simonw / scrape-open-data

Scrape various open data directories to create an index of what's available out there
https://open-data.datasette.io

Build and publish SQLite database of open data #3

Closed. simonw closed this 2 years ago

simonw commented 2 years ago

I tried this initially using sqlite-utils insert --flatten against the .jsonl files, but ended up with a table with hundreds of columns.

I'm going to be selective about which columns I include in the database instead.
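A rough sketch of the difference between the two approaches, using only the standard library. The column names in `KEEP` and the sample records are hypothetical, not the ones used in this repository, and `flatten` only imitates the way `--flatten` joins nested keys with an underscore:

```python
import json
import sqlite3

# Hypothetical column choices; the real selection lives in the repo's build step.
KEEP = ("id", "name", "description", "updated_at")


def flatten(record, prefix=""):
    """Collapse nested dicts into one level, joining keys with "_".

    Every distinct nested key becomes its own column, which is how a
    deeply nested feed can end up with hundreds of columns.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def to_sql_value(value):
    """Store nested values as JSON text so sqlite3 can bind them."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value


def load_selected(conn, table, lines, keep=KEEP):
    """Insert only the `keep` columns from newline-delimited JSON."""
    columns = ", ".join(f'"{c}"' for c in keep)
    placeholders = ", ".join("?" for _ in keep)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the .jsonl input
        record = json.loads(line)
        # Missing keys become NULL rather than raising.
        rows.append(tuple(to_sql_value(record.get(c)) for c in keep))
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Made-up records standing in for lines of a scraped .jsonl file.
sample = [
    '{"id": "abc-1", "name": "Street trees", "metadata": {"license": {"name": "CC0"}, "owner": {"id": 7}}}',
    '{"id": "abc-2", "name": "Bike lanes", "description": "Lane segments", "extra": {"x": 1}}',
]

conn = sqlite3.connect(":memory:")
load_selected(conn, "datasets", sample)
```

With an explicit list the table keeps a fixed, small schema however many nested keys the source records contain. The cost is that fields outside the list are dropped.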

simonw commented 2 years ago

I'm going to deploy this to Cloud Run with the datasette-block-robots plugin (so it doesn't get crawled constantly and cost a lot of money to run). I'll publish it at open-data.datasette.io.
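For illustration, here is how a well-behaved crawler would interpret a blanket-disallow robots.txt. This sketch assumes the plugin's default is to serve a `/robots.txt` that disallows every path for every user agent; the `ROBOTS_TXT` string and the `is_crawlable` helper are illustrative, not taken from the plugin:

```python
from urllib.robotparser import RobotFileParser

# Assumed default response body: block all paths for all user agents.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_crawlable(ROBOTS_TXT, "https://open-data.datasette.io/", "Googlebot")
```

This only deters crawlers that honor robots.txt. It does not stop clients that ignore it.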

simonw commented 2 years ago

Went with this: https://github.com/simonw/scrape-open-data/blob/d1647a1b08ab91f0668895ad3b5ef233dd29f1f5/.github/workflows/deploy.yml#L34-L41

simonw commented 2 years ago

I configured the CNAME for open-data.datasette.io to point to ghs.googlehosted.com.

simonw commented 2 years ago

Now live at https://open-data-j7hipcg4aq-uc.a.run.app. I've also told Cloud Run about the custom subdomain, which should go live in 10-15 minutes.

simonw commented 2 years ago

https://open-data.datasette.io/ works now.