simonw / sfms-history

The sfms-history project
https://sfms-history.vercel.app

Only publish content in the PUBLIC folder #14

Closed by simonw 2 years ago

simonw commented 2 years ago

New decision: publish everything in INTAKE and PUBLIC, but exclude PROCESSED INTAKE DOCUMENTS subfolder.
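
A minimal sketch of that rule as a path filter, in case it helps to see it spelled out. It assumes each file is identified by a slash-separated key whose first component is the top-level folder, and that PROCESSED INTAKE DOCUMENTS should be skipped wherever it appears in the path. The function and constant names are hypothetical and are not taken from the repo.

```python
from pathlib import PurePosixPath

# Assumption: these are top-level folder names in the synced bucket.
PUBLISHED_ROOTS = {"INTAKE", "PUBLIC"}
# Assumption: this folder is excluded at any depth below a published root.
EXCLUDED_FOLDERS = {"PROCESSED INTAKE DOCUMENTS"}


def should_publish(key: str) -> bool:
    """Return True if a file at this key should be published."""
    parts = PurePosixPath(key).parts
    # Only files under INTAKE or PUBLIC are candidates.
    if not parts or parts[0] not in PUBLISHED_ROOTS:
        return False
    # Check folder components only (everything except the file name).
    return not any(part in EXCLUDED_FOLDERS for part in parts[:-1])
```

For example, should_publish("PUBLIC/minutes.pdf") is True, while a key under INTAKE/PROCESSED INTAKE DOCUMENTS/ is rejected.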

simonw commented 2 years ago

I synced Google Drive to S3 again (using the method I'll use for #15), then ran s3-ocr dedupe sfms-history and s3-ocr start sfms-history --all to ensure everything had been OCR'd.

Now running this: https://github.com/simonw/sfms-history/blob/cc4d3675d58cbdf6ae676dc1794dfc9033e73269/build-db.sh#L3-L4

s3-ocr index sfms-history index.db

Will take a while because it needs to suck down all of that JSON.

simonw commented 2 years ago

Still need to exclude stuff like:

INTAKE/SFMShistory_intake_2022.03.13/scans_2022.03.13_membership
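
One way to drop a folder like this after the index has been built is to delete the matching rows from the SQLite database. The sketch below is illustrative only: it assumes a table named pages with a path column holding the full key, which may not match what s3-ocr index actually writes, and remove_folder is a hypothetical helper. It compares prefixes with substr() rather than LIKE, because the folder name contains underscores and LIKE would treat those as wildcards.

```python
import sqlite3

# Folder named in the comment above; a trailing slash is added when matching
# so that only keys inside this folder are removed.
EXCLUDED_FOLDER = "INTAKE/SFMShistory_intake_2022.03.13/scans_2022.03.13_membership"


def remove_folder(conn: sqlite3.Connection, folder: str = EXCLUDED_FOLDER) -> int:
    """Delete rows whose path is inside the folder; return how many were deleted."""
    prefix = folder.rstrip("/") + "/"
    # Assumption: a hypothetical "pages" table with a "path" column.
    cursor = conn.execute(
        "DELETE FROM pages WHERE substr(path, 1, ?) = ?",
        (len(prefix), prefix),
    )
    conn.commit()
    return cursor.rowcount
```

Usage would be along the lines of remove_folder(sqlite3.connect("index.db")), run after the index step and before deploying.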

simonw commented 2 years ago

Done and deployed in:

See https://sfms-history.vercel.app/docs