Since we do not have access to the database that stores the publications for the original site, it makes sense to use a web crawler to scrape the data from https://ptolemy.berkeley.edu/projects/icyphy/ rather than entering it manually. The crawler first looks at the publications collection page for each year, then follows each publication link to retrieve its details.
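The first step, collecting publication links from a yearly collection page, can be sketched as below. This is a minimal illustration using only the standard library; the `pubs/` URL prefix and the sample HTML are assumptions for demonstration, not the real page structure.

```python
from html.parser import HTMLParser

class PubLinkParser(HTMLParser):
    """Collects href values of anchor tags that point at publication pages.

    The "pubs/" prefix is a hypothetical URL layout used for illustration.
    """
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.startswith("pubs/"):
                self.links.append(href)

# Hypothetical fragment of a yearly publications collection page.
sample = ('<ul><li><a href="pubs/123.html">Paper A</a></li>'
          '<li><a href="pubs/456.html">Paper B</a></li></ul>')

parser = PubLinkParser()
parser.feed(sample)
print(parser.links)  # → ['pubs/123.html', 'pubs/456.html']
```

In the actual crawler, the HTML would be fetched per year (e.g. with `urllib.request`) and each collected link visited in turn.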
The idea is to use a general schema to store each publication (title, journal, year, etc.) and generate the citation styles on the fly, instead of storing three citation styles directly in the publication entry, which reduces the burden on data entry. To achieve this, the crawler looks at the BibTeX of each publication, which is the closest available representation of such a general schema, and parses it with a BibTeX parser. It then generates .md files in the _publications directory.
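The BibTeX-to-Markdown step can be sketched roughly as follows. The regex-based field extraction is a simplification standing in for a real BibTeX parser, and the front-matter keys and sample entry are hypothetical; the point is only to show the general-schema idea.

```python
import re

def parse_bibtex(entry: str) -> dict:
    """Extract simple key = {value} fields from one BibTeX entry.

    A deliberately minimal stand-in for a full BibTeX parser: it does not
    handle nested braces, quoted values, or string concatenation.
    """
    fields = dict(re.findall(r'(\w+)\s*=\s*\{([^}]*)\}', entry))
    head = re.match(r'\s*@(\w+)\s*\{\s*([^,\s]+)\s*,', entry)
    if head:
        fields["entrytype"], fields["key"] = head.group(1), head.group(2)
    return fields

def to_markdown(fields: dict) -> str:
    """Render the general schema as Jekyll-style front matter for a
    file in the _publications directory (field names are assumptions)."""
    lines = ["---"]
    for k in ("title", "journal", "year"):
        if k in fields:
            lines.append(f"{k}: {fields[k]}")
    lines.append("---")
    return "\n".join(lines)

# Hypothetical BibTeX entry for demonstration.
entry = """@article{sample2020,
  title = {A Sample Paper},
  journal = {A Sample Journal},
  year = {2020}
}"""

print(to_markdown(parse_bibtex(entry)))
```

Because the .md file stores only the schema fields, each citation style can be rendered from the same front matter at site-build time instead of being typed three times.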