simonw / sfms-history

The sfms-history project
https://sfms-history.vercel.app

Automate building of database in GitHub Actions #19

Closed by simonw 2 years ago

simonw commented 2 years ago

Currently the workflow just downloads a previously built database; I want to run the build-dbs.sh script in GitHub Actions instead.

Current: https://github.com/simonw/sfms-history/blob/547e277e7614a2c7ca4aa244afcf175715fede44/.github/workflows/deploy.yml#L29-L33

simonw commented 2 years ago

There are two database files involved here.

index.db is the database of raw OCR data pulled by s3-ocr index. It can be generated from scratch by pulling content from the sfms-history bucket, but that takes around 10 minutes because it needs to pull multiple GBs of JSON.

The s3-ocr index command only fetches new data, so it's fine to run on every build provided the previous index.db file is available.
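
Why a previous index.db makes re-runs cheap can be sketched as a set difference: look up which OCR result keys are already recorded locally and only fetch the rest. This is a minimal illustration, not s3-ocr's actual implementation; the `fetched` table, its `key` column and both helper functions are hypothetical, and the bucket listing is passed in as a plain list rather than read from S3.

```python
from __future__ import annotations

import sqlite3


def keys_to_fetch(conn: sqlite3.Connection, bucket_keys: list[str]) -> list[str]:
    """Return the bucket keys not yet recorded in the local index.

    Hypothetical schema: a `fetched` table with one row per OCR JSON key
    that has already been downloaded. s3-ocr's real schema may differ.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS fetched (key TEXT PRIMARY KEY)")
    already = {row[0] for row in conn.execute("SELECT key FROM fetched")}
    # Keep the bucket's listing order while skipping keys we already have
    return [key for key in bucket_keys if key not in already]


def record_fetched(conn: sqlite3.Connection, keys: list[str]) -> None:
    """Mark keys as fetched so the next run skips them."""
    conn.executemany(
        "INSERT OR IGNORE INTO fetched (key) VALUES (?)", [(k,) for k in keys]
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # First run: nothing is indexed yet, so every key is new
    first = keys_to_fetch(conn, ["a.json", "b.json"])
    record_fetched(conn, first)
    # Second run: only the newly added key needs downloading
    print(keys_to_fetch(conn, ["a.json", "b.json", "c.json"]))  # → ['c.json']
```

Starting from an empty database, the same logic degrades to fetching everything, which is the slow from-scratch case described above.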

The build script then generates sfms.db, which is the file that gets deployed.

simonw commented 2 years ago

I could cache index.db in the GitHub Actions cache, but cache entries expire. Instead I'm going to store it in the sfms-history S3 bucket, since that already exists and I already have credentials for it.

simonw commented 2 years ago

Steps to add:

  1. Download existing index.db from S3
  2. Run s3-ocr index sfms-history index.db to update it with new OCR results
  3. Push the database back up to S3
  4. Run ./build-db.sh
  5. Deploy the result
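
The five steps above can be sketched as a small Python driver that shells out to each tool in order. This is an illustration under assumptions, not the workflow the repo actually uses: the `aws s3 cp` commands for moving index.db to and from the bucket, and the `datasette publish vercel` deploy command (which needs the datasette-publish-vercel plugin), are stand-ins, since the issue does not say which tools perform those steps. `build_plan` and `run_plan` are hypothetical helper names.

```python
from __future__ import annotations

import shlex
import subprocess

BUCKET = "sfms-history"  # bucket named in the issue


def build_plan(bucket: str = BUCKET) -> list[list[str]]:
    """Return the pipeline as a list of argv lists, one per step."""
    s3_uri = f"s3://{bucket}/index.db"
    return [
        # 1. Download the existing index.db (assumption: AWS CLI)
        ["aws", "s3", "cp", s3_uri, "index.db"],
        # 2. Fetch only the new OCR results into it
        ["s3-ocr", "index", bucket, "index.db"],
        # 3. Push the updated database back to S3 (assumption: AWS CLI)
        ["aws", "s3", "cp", "index.db", s3_uri],
        # 4. Build sfms.db
        ["./build-db.sh"],
        # 5. Deploy (assumption: datasette-publish-vercel plugin)
        ["datasette", "publish", "vercel", "sfms.db", "--project", bucket],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = False) -> None:
    for argv in plan:
        print("+", shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing step, so an index.db
            # that failed to update is never uploaded or deployed
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: print the commands without executing anything
    run_plan(build_plan(), dry_run=True)
```

Uploading index.db before the build step means the slow indexing work is saved back to the bucket even if the build or deploy later fails.
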

simonw commented 2 years ago

I'll need to add these to requirements.txt:

simonw commented 2 years ago

I'm adding two secrets to this repo:

simonw commented 2 years ago

I created a valid blank SQLite database file with sqlite-utils create-database index.db and uploaded that to the root of the sfms-history bucket.
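
For reference, a similar valid-but-empty database can be produced with Python's standard library. Opening a connection does not by itself write the database header, so the sketch below forces the header onto disk by creating and dropping a throwaway table. That trick, the `_init` table name and both helper functions are assumptions for illustration, not how `sqlite-utils create-database` works internally.

```python
import sqlite3
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite file


def create_blank_database(path: str) -> None:
    """Create a valid SQLite database file containing no tables."""
    conn = sqlite3.connect(path)
    try:
        # Creating then dropping a throwaway table forces SQLite to write
        # page 1 (which holds the file header) to disk
        conn.execute("CREATE TABLE _init (id INTEGER)")
        conn.execute("DROP TABLE _init")
        conn.commit()
    finally:
        conn.close()


def is_sqlite_file(path: str) -> bool:
    """Check for the SQLite magic header at the start of the file."""
    return Path(path).read_bytes()[:16] == SQLITE_MAGIC
```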

simonw commented 2 years ago

This was tough because of this bug, now fixed:

But... the deploy has gone out now, and /index.db in the S3 bucket is an 18.5MB database!

https://sfms-history.vercel.app/docs now lists only the documents that we want to be there.