lmullen / legal-modernism

Law and legal practice modernized in the nineteenth-century United States. We are studying and visualizing the history of the modernization of American law.
https://legalmodernism.org
MIT License

Create data for missing textbooks #94

Open: lmullen opened this issue 2 months ago

lmullen commented 2 months ago

Data should be in two tables.

BOOK METADATA:
- bibliographicid (ignore)
- year
- title
- vols
- subjects (can be an array)
- psmid (make something up)
- author (more than one can be added, e.g., Abbott, Benjamin Vaughan; Barringer, Victor Clay)
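A minimal sketch of the book metadata table as a CSV writer, assuming list-valued fields (subjects, author) are flattened into one cell joined by "; ", which matches the author example above. The `BookMetadata` class, `write_book_metadata` function, and the CSV layout are hypothetical; bibliographicid is left out since it is marked "ignore".

```python
import csv
import io
from dataclasses import dataclass, field
from typing import TextIO


@dataclass
class BookMetadata:
    """One row of the BOOK METADATA table (bibliographicid is omitted)."""
    psmid: str
    year: int
    title: str
    vols: int
    subjects: list[str] = field(default_factory=list)
    authors: list[str] = field(default_factory=list)


def write_book_metadata(books: list[BookMetadata], out: TextIO) -> None:
    """Write books as CSV. Assumption: arrays are joined with '; '."""
    writer = csv.writer(out)
    writer.writerow(["psmid", "year", "title", "vols", "subjects", "author"])
    for b in books:
        writer.writerow([
            b.psmid,
            b.year,
            b.title,
            b.vols,
            "; ".join(b.subjects),  # array field flattened into one cell
            "; ".join(b.authors),   # "Last, First Middle" names, semicolon-separated
        ])


# Usage with placeholder values (not real corpus data):
buf = io.StringIO()
write_book_metadata(
    [BookMetadata("momlextra00001", 1870, "Hypothetical Title", 1,
                  ["Contracts", "Torts"], ["Doe, Jane", "Roe, Richard"])],
    buf,
)
```

Because author names themselves contain commas, the csv module quotes that cell automatically, so the file still parses cleanly.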

PAGE OCR DATA (one row per page):
- pageid (format: page 1 = 00010)
- psmid (make it up; this identifies the volume)
- ocrtext
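A sketch of building page rows for one volume. The only pageid example given is page 1 = 00010, so this assumes the page number is zero-padded to four digits and followed by a trailing 0; check that against the real data before relying on it. `make_pageid` and `page_rows` are hypothetical helper names.

```python
def make_pageid(page: int) -> str:
    """Format a 1-based page number as a pageid, e.g. page 1 -> '00010'.

    Assumption: four-digit zero-padded page number plus a trailing '0',
    which reproduces the single example in the issue.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{page:04d}0"


def page_rows(psmid: str, pages: list[str]) -> list[dict[str, str]]:
    """Build one PAGE OCR DATA row per page of a single volume."""
    return [
        {"pageid": make_pageid(n), "psmid": psmid, "ocrtext": text}
        for n, text in enumerate(pages, start=1)
    ]
```

For example, `page_rows("momlextra00001", ["first page", "second page"])` gives pageids 00010 and 00020, both tied to the same psmid.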

lmullen commented 2 months ago

For psmids use something like this:

momlextra00001
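One way to mint made-up psmids in that pattern, assuming the scheme is the fixed prefix "momlextra" followed by a five-digit zero-padded counter (inferred from the single example). `make_psmids` is a hypothetical helper.

```python
def make_psmids(count: int, prefix: str = "momlextra", width: int = 5) -> list[str]:
    """Generate sequential made-up psmids: momlextra00001, momlextra00002, ..."""
    if count >= 10 ** width:
        # Avoid ids that silently grow wider than the assumed fixed width.
        raise ValueError(f"cannot fit {count} ids in {width} digits")
    return [f"{prefix}{i:0{width}d}" for i in range(1, count + 1)]
```

Keeping the width fixed means the ids sort correctly as plain strings.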

kfunk074 commented 1 month ago

Metadata is here: side_corpus.csv

Page OCR data is stored here: https://drive.google.com/drive/folders/1vzENEoxKK74cAI_m5qpVLLBFl9moKW3P?usp=sharing

Each folder is labeled with its psmid, and each page is a separate text file in the folder.
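A sketch of turning a local copy of that Drive folder into page OCR rows. It assumes the folders have already been downloaded, that each subfolder name is the psmid, and that each page file is named `<pageid>.txt` (for example `00010.txt`); the file-naming convention is an assumption, not something stated above. `load_page_ocr` is a hypothetical name.

```python
from pathlib import Path


def load_page_ocr(root: Path) -> list[dict[str, str]]:
    """Collect one row per page from <root>/<psmid>/<pageid>.txt.

    Assumptions: the subfolder name is the psmid and the file name
    without its extension is the pageid.
    """
    rows = []
    for volume in sorted(p for p in root.iterdir() if p.is_dir()):
        # Zero-padded pageids sort in page order as plain strings.
        for page_file in sorted(volume.glob("*.txt")):
            rows.append({
                "pageid": page_file.stem,
                "psmid": volume.name,
                "ocrtext": page_file.read_text(encoding="utf-8"),
            })
    return rows
```

If the page files turn out to use some other naming scheme, only the `pageid` line needs to change.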