tomwhite / covid-19-uk-data

Coronavirus (COVID-19) UK Historical Data
http://tom-e-white.com/covid-19-uk-data/
The Unlicense

Auto-updating datasette instance #17

Closed · tomwhite closed this issue 4 years ago

tomwhite commented 4 years ago

It would be great if we could have a datasette instance that was automatically updated (e.g. every hour).

One way of doing it: https://twitter.com/psychemedia/status/1243222423287271424
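
For illustration, the trigger for that kind of workflow might look something like this in GitHub Actions (the cron schedule and paths below are just placeholders, not a final design):

```yaml
# Hypothetical trigger block for a redeploy workflow (illustrative only)
on:
  schedule:
    - cron: '0 * * * *'   # rebuild and republish at the top of every hour
  push:
    paths:
      - 'data/**'         # ...or redeploy whenever the data files change
```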

simonw commented 4 years ago

The three best options for this are:

All three should be essentially free for a small project like this. Which one are you most comfortable with?

simonw commented 4 years ago

I usually use Cloud Run for this kind of project: https://simonwillison.net/2020/Jan/21/github-actions-cloud-run/

tomwhite commented 4 years ago

I've used Heroku in the past, but have most experience with Google Cloud - so probably Google Cloud Run. I saw https://simonwillison.net/2020/Jan/21/github-actions-cloud-run/ which has lots of useful details, but I was hoping it might be a bit simpler if there's no postprocessing to do.

This repo already has a SQLite database (https://github.com/tomwhite/covid-19-uk-data/blob/master/data/covid-19-uk.db) which is updated whenever the data changes, so it might be simplest to just publish that. Alternatively, it should be straightforward to generate a SQLite database from the CSV files in the data directory.
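
If publishing the checked-in .db turns out to be the simplest route, I imagine a single workflow step along these lines could be enough (only a sketch; the Cloud Run service name is a placeholder):

```yaml
# Sketch: publish the database that is already committed to the repo.
# Assumes gcloud has been authenticated earlier in the workflow.
- name: Publish checked-in database to Cloud Run
  run: |
    pip install datasette
    datasette publish cloudrun data/covid-19-uk.db \
      --service=covid-19-uk-datasette   # placeholder service name
```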

simonw commented 4 years ago

My preference is to automate building the database in the action (or CI script, or whatever) - here's an example of a build script I wrote that uses csvs-to-sqlite for that: https://github.com/simonw/global-power-plants-datasette/blob/ece947cb869e2786fae5a6a6316ac1a77430cbdf/.travis.yml#L11

If you have a separate mechanism for building the SQLite database, though, you can skip that step and just run datasette publish against the .db file you've already created.
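
Very roughly, the build-then-publish steps could look something like this (the CSV paths and service name are placeholders rather than your actual setup):

```yaml
# Sketch: build the SQLite file in the workflow, then publish it.
- name: Build SQLite database from the CSV files
  run: |
    pip install csvs-to-sqlite datasette
    csvs-to-sqlite data/*.csv covid-19-uk.db   # CSV paths are illustrative
- name: Publish to Cloud Run
  run: datasette publish cloudrun covid-19-uk.db --service=covid-19-uk-datasette
```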

tomwhite commented 4 years ago

That makes sense. I actually already use your csvs-to-sqlite in the processing pipeline for preparing the data in this repo :)

So I think the way forward is to use Google Cloud Run triggered by a GitHub Action, just like you did in your blog post. I will try to work through it in the next couple of days. Thanks for your guidance!
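
Roughly, I'm imagining a workflow skeleton like the one below, with the build and publish steps filled in along the lines sketched above (all names here are placeholders):

```yaml
# Rough skeleton of the overall workflow (names are placeholders;
# the individual steps are sketched elsewhere in this thread).
name: Deploy Datasette to Cloud Run
on:
  push:
    paths:
      - 'data/**'
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      # ...then: build the database, authenticate to Google Cloud,
      # and run datasette publish cloudrun
```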

simonw commented 4 years ago

I'm happy to help review your action YML file as you work on it - I find they usually take quite a bit of iterating to get them working. Setting up the secrets for Cloud Run is particularly fiddly in my experience.
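
For what it's worth, the authentication part of the workflow usually ends up looking roughly like this; the secret names below are placeholders, and the service account key has to be created and added to the repository secrets first:

```yaml
# Sketch of wiring Cloud Run credentials in from repository secrets.
# GCP_SA_KEY and GCP_PROJECT are placeholder secret names.
- name: Authenticate to Google Cloud
  run: |
    echo "${{ secrets.GCP_SA_KEY }}" | base64 --decode > /tmp/gcloud-key.json
    gcloud auth activate-service-account --key-file=/tmp/gcloud-key.json
    gcloud config set project "${{ secrets.GCP_PROJECT }}"
    gcloud config set run/region europe-west1   # region is an assumption
```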

tomwhite commented 4 years ago

Hi @simonw, I've managed to set up a GitHub Action that publishes a datasette instance on Cloud Run. It was quite fiddly, but the instructions you published at https://simonwillison.net/2020/Jan/21/github-actions-cloud-run/ were invaluable - thank you for documenting the details so clearly!

I made a few notes of extra things that I thought were worth mentioning:

It's running at https://covid-19-uk-datasette-65tzkjlxkq-ew.a.run.app/

The PR for this is #26, in case you've got any comments. Thanks!

tomwhite commented 4 years ago

Fixed in #26