Open lmullen opened 2 months ago
For psmids, use something like this: momlextra00001
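A minimal sketch of how made-up psmids in this style could be generated. The `momlextra` prefix comes from the example above; the five-digit zero-padded counter is an assumption inferred from that single example, and `make_psmid` is a hypothetical helper name, not something that exists in the project.

```python
PSMID_PREFIX = "momlextra"  # prefix taken from the example psmid above


def make_psmid(n: int, width: int = 5) -> str:
    """Return a made-up psmid for the n-th extra volume.

    Assumption: the counter is 1-based and zero-padded to `width` digits,
    which is only inferred from the single example "momlextra00001".
    """
    if n < 1:
        raise ValueError("volume counter must start at 1")
    return f"{PSMID_PREFIX}{n:0{width}d}"
```

Calling `make_psmid(1)` gives `momlextra00001`, matching the example.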
Metadata here: side_corpus.csv
Page OCR data is stored here: https://drive.google.com/drive/folders/1vzENEoxKK74cAI_m5qpVLLBFl9moKW3P?usp=sharing
Each folder is labeled with its psmid, and each page is a separate text file in the folder.
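The folder layout described above could be walked roughly like this, once the Drive folder has been downloaded locally. This is a sketch under assumptions: `root` is a hypothetical local path, the page files are assumed to end in `.txt` and to be UTF-8 encoded, and `iter_pages` is an illustrative name.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_pages(root: Path) -> Iterator[Tuple[str, str, str]]:
    """Yield (psmid, page file stem, OCR text) for every page under root.

    Assumptions: each subfolder of `root` is named with its psmid, and
    each page is a separate .txt file inside that subfolder.
    """
    for volume_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        psmid = volume_dir.name  # the folder name is the psmid
        for page_file in sorted(volume_dir.glob("*.txt")):
            # errors="replace" keeps going if a file has bad bytes
            text = page_file.read_text(encoding="utf-8", errors="replace")
            yield psmid, page_file.stem, text
```

Sorting by file name keeps pages in order only if the file names sort lexically in page order, which is worth checking against the real data.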
Data should be in two tables.
BOOK METADATA:
- bibliographicid (ignore)
- year
- title
- vols
- subjects (can be an array)
- psmid (make something up)
- author (can also add) (e.g., Abbott, Benjamin Vaughan; Barringer, Victor Clay)

PAGE OCR DATA, one row per page:
- pageid (format: page 1 = 00010)
- psmid (make it up, this identifies the volume)
- ocrtext
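One possible shape for the two tables, sketched with SQLite from the Python standard library. Everything beyond the column names is an assumption: the table names `book_metadata` and `page_ocr`, the column types, storing `subjects` as a JSON-encoded array, and dropping `bibliographicid` because it is marked "ignore". The `page_id` helper assumes the pageid is the page number times ten, zero-padded to five digits, which is inferred only from "page 1 = 00010" and should be confirmed.

```python
import json
import sqlite3

# Hypothetical schema; the issue only names the columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS book_metadata (
    psmid    TEXT PRIMARY KEY,  -- made up, identifies the volume
    year     INTEGER,
    title    TEXT,
    vols     TEXT,
    subjects TEXT,              -- JSON-encoded array of subjects
    author   TEXT               -- e.g. several names separated by "; "
);
CREATE TABLE IF NOT EXISTS page_ocr (
    pageid   TEXT NOT NULL,
    psmid    TEXT NOT NULL REFERENCES book_metadata (psmid),
    ocrtext  TEXT,
    PRIMARY KEY (psmid, pageid) -- one row per page of a volume
);
"""


def page_id(page_number: int) -> str:
    """Format a 1-based page number as a pageid (assumed: number * 10, 5 digits)."""
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    return f"{page_number * 10:05d}"


def encode_subjects(subjects: list) -> str:
    """Serialize the subjects array for storage in the TEXT column."""
    return json.dumps(subjects)


def create_tables(conn: sqlite3.Connection) -> None:
    """Create both tables if they do not exist yet."""
    conn.executescript(SCHEMA)
```

Keying `page_ocr` on `(psmid, pageid)` reflects that pageids repeat across volumes while the psmid identifies the volume. A database with a native array type could store `subjects` directly instead of as JSON.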