updating metadata.json without recreating the app

simonw / datasette

An open source multi-tool for exploring and publishing data

https://datasette.io

Apache License 2.0

updating metadata.json without recreating the app #639

Open pkoppstein opened 4 years ago

pkoppstein commented 4 years ago

I've successfully "uploaded" an SQLite database (with a metadata.json file) to Heroku using:

$ datasette publish heroku so-sales.db -m metadata.json -n so-sales

The question is: how can I modify the (small) metadata.json file without having to upload the (large) SQLite database again?

The directions on heroku indicate I should run:

heroku git:clone -a so-sales

But this just results in an empty directory with a warning: warning: You appear to have cloned an empty repository.

I've been able to "clone" the heroku "app" using the command:

$ heroku slugs:download -a so-sales

but this is not a git repository....

Ideally, it seems to me, there'd be an option of the datasette CLI to allow a file to be updated, or there'd be some way to create a local git "clone" of the app so that the heroku instructions for "Deploying with git" would apply.

(p.s. I ran datasette publish heroku -m metadata.json -n so-sales in the hope that that would not cause the .db file to be wiped, but of course it was.)

(p.p.s. Thanks for Datasette!)

simonw commented 4 years ago

Unfortunately I don't think it's possible to do this with Heroku. Heroku treats all deployments as total replacements - that's part of how they achieve zero-downtime deployments, since they run the new deployment at the same time as the old deployment and then switch traffic over at the load balancer.

I did have one idea that's relevant here: #238 - which would provide a mechanism for metadata.json to be hosted on a separate URL (e.g. a gist) and have Datasette periodically fetch a new copy. I closed that in favour of #357 - a plugin hook for loading metadata. That's still something I'm interested in exploring.
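To make the idea in #238 concrete, here is a minimal sketch of "fetch metadata.json from a URL and refresh it periodically". It is not Datasette code: the class name PeriodicMetadata, the helper fetch_json, the ttl_seconds parameter, and the example URL are all assumptions made for illustration, and how such an object would be wired into Datasette is left open.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=10):
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


class PeriodicMetadata:
    """Hypothetical helper: cache remote metadata and refresh it after a TTL."""

    def __init__(self, url, ttl_seconds=300, fetch=fetch_json, clock=time.monotonic):
        self._url = url
        self._ttl = ttl_seconds
        self._fetch = fetch      # injectable so the class can be tested offline
        self._clock = clock      # injectable for the same reason
        self._cached = None
        self._fetched_at = None  # None means "never fetched"

    def get(self):
        """Return the cached metadata, refetching if the copy is older than the TTL."""
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            try:
                self._cached = self._fetch(self._url)
            except Exception:
                if self._fetched_at is None:
                    raise  # nothing cached yet, so the error must surface
                # Otherwise keep serving the previous copy until the next TTL.
            self._fetched_at = now
        return self._cached
```

With this shape, editing the hosted file (for example a gist) would be picked up within ttl_seconds, with no redeploy of the database.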

pkoppstein commented 4 years ago

@simonw - Thanks for the reply!

My reading of the heroku documents is that if one sets things up using git, then one can use "git push" (from a {local, GitHub, GitLab} git repository to Heroku) to "update" a Heroku deployment, but I'm not sure exactly how this works. However, assuming there is some way to use "git push" to update the Heroku deployment, the question becomes how can one do this in conjunction with datasette.

Again, based on my reading of the Heroku documents, it would seem that the following should work (but it doesn't quite):

1) Use datasette to create a deployment (named MYAPP)
2) Put it in maintenance mode
3) heroku git:clone -a MYAPP (this results in an empty repository, as expected)
4) In another directory, heroku slugs:download -a MYAPP
5) Copy the downloaded slug into the repository
6) Make some change to metadata.json
7) Commit and push it back
8) Take the deployment out of maintenance mode
9) Refresh the deployment
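For readers who want to follow along, the steps above can be sketched as a small Python script that only prints the commands rather than running them. The function name plan_metadata_update is hypothetical, and so are the slug_dir and repo_dir arguments: where heroku slugs:download unpacks the slug is an assumption here, so check the actual directory before copying. The final "refresh" step is not shown because the thread does not say which command was used.

```python
import shlex


def plan_metadata_update(app, slug_dir, repo_dir):
    """Return the commands for the workflow as argument lists, in order.

    Nothing is executed. slug_dir is wherever the downloaded slug was
    unpacked (an assumption, verify it locally); repo_dir is the empty
    git clone of the Heroku app.
    """
    return [
        ["heroku", "maintenance:on", "-a", app],
        ["heroku", "git:clone", "-a", app, repo_dir],
        ["heroku", "slugs:download", "-a", app],
        # Copy the slug contents into the empty clone.
        ["cp", "-R", f"{slug_dir}/.", repo_dir],
        # Edit metadata.json in repo_dir by hand at this point.
        ["git", "-C", repo_dir, "add", "-A"],
        ["git", "-C", repo_dir, "commit", "-m", "Update metadata.json"],
        ["git", "-C", repo_dir, "push", "heroku", "master"],
        ["heroku", "maintenance:off", "-a", app],
    ]


if __name__ == "__main__":
    # Directory names below are placeholders for illustration.
    for command in plan_metadata_update("MYAPP", "slug/app", "MYAPP-repo"):
        print(shlex.join(command))
```

Printing the plan first makes it easy to review each command before running anything against a live app.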

Using the heroku console, I've verified that the edits appear on heroku, but somehow they are not reflected in the running app.

I'm hopeful that with some small tweak or perhaps the addition of a bit of voodoo, this strategy will work.

I think it will be important to get this working for another reason: getting Heroku, Cloudcube, and datasette to work together, to overcome the slug size limitation so that large SQLite databases can be deployed to Heroku using Datasette.

simonw commented 4 years ago

@jacobian does this sound like something that could work?

jacobian commented 4 years ago

A bit of background: the reason heroku git:clone brings down an empty directory is that datasette publish heroku uses the builds API, rather than a git push, to release the app. I originally did this because it seemed like a lower bar than having a working git, but the downside is, as you found out, that tweaking the created app is hard.

So there's one option -- change datasette publish heroku to use git push instead of heroku builds:create.

@pkoppstein - what you suggested seems like it ought to work (you don't need maintenance mode, though). I'm not sure why it doesn't.

You could also look into using the slugs API to download the slug, change metadata.json, re-pack and re-upload the slug.

Ultimately, though, I think @simonw's idea of reading metadata.json from an external source might be better (#357). Reading from an alternate URL would be fine, or you could also just stuff the whole metadata.json into a Heroku config var, and write a plugin to read it from there.
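As a rough illustration of the config var idea, the sketch below parses metadata from an environment variable, which is how Heroku exposes config vars to the running app. The variable name DATASETTE_METADATA_JSON and the function load_metadata_from_env are made up for this example, and hooking the result into Datasette would depend on the plugin hook discussed in #357, which is not shown here.

```python
import json
import os

# Hypothetical config var name; nothing in Datasette or Heroku defines it.
ENV_VAR = "DATASETTE_METADATA_JSON"


def load_metadata_from_env(environ=os.environ, var=ENV_VAR):
    """Parse metadata from an environment variable holding a JSON object.

    Returns an empty dict when the variable is unset or empty, and raises
    ValueError when it is set but does not contain a JSON object.
    """
    raw = environ.get(var)
    if not raw:
        return {}
    try:
        metadata = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{var} does not contain valid JSON: {exc}") from exc
    if not isinstance(metadata, dict):
        raise ValueError(f"{var} must contain a JSON object")
    return metadata
```

The variable could then be set with something like heroku config:set DATASETTE_METADATA_JSON="$(cat metadata.json)" -a so-sales, which changes the metadata without touching the database. Heroku limits the total size of config vars, so this suits a small metadata.json only.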

Hope this helps a bit!

pkoppstein commented 4 years ago

@jacobian - Thanks for your help. Having to upload an entire slug each time a small change is needed in metadata.json seems no better than the current situation so I probably won't go down that rabbit hole just yet. In any case, the really important goal is moving the SQLite file out of Heroku in a way that the Heroku app can still read it efficiently. Is this possible? Is Cloudcube the right place to start? Is there any alternative?

pkoppstein commented 4 years ago

@simonw, @jacobian - I was able to resolve the metadata.json issue by adding -m metadata.json to the Procfile. Now git push heroku master picks up the changes, though I have the impression that Heroku is doing more work than necessary (e.g. one of the informational messages is: Installing requirements with pip).
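The Procfile change can be sketched as a small helper that appends the flag to the web process line if it is missing. The function name add_metadata_flag is hypothetical, and the Procfile line in the usage example is made up for illustration; the real line generated for a given app may differ.

```python
import shlex


def add_metadata_flag(procfile_text, metadata_path="metadata.json"):
    """Append '-m <metadata_path>' to the web line of a Procfile if absent.

    Other process types and lines are left unchanged.
    """
    out = []
    for line in procfile_text.splitlines():
        name, sep, command = line.partition(":")
        if sep and name.strip() == "web":
            args = shlex.split(command)
            has_flag = any(
                a in ("-m", "--metadata") or a.startswith("--metadata=")
                for a in args
            )
            if not has_flag:
                line = f"{line.rstrip()} -m {metadata_path}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative Procfile content only, not copied from a real deployment.
    example = "web: datasette serve so-sales.db --host 0.0.0.0 --port $PORT\n"
    print(add_metadata_flag(example), end="")
```

Running the helper twice leaves the file unchanged the second time, so it is safe to apply before each push.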

I also had to set the environment variable WEB_CONCURRENCY -- I used WEB_CONCURRENCY=1.

I am still anxious to know whether it's possible for Datasette on Heroku to access the SQLite file at another location. Cloudcube seems the most promising, and I'm hoping it can be done by tweaking the Procfile suitably, but maybe that's too optimistic?