simonw / datasette-publish-fly

Datasette plugin for publishing data using Fly
Apache License 2.0
20 stars 7 forks

Volume support plus integration tests #12

Closed - simonw closed this 2 years ago

simonw commented 2 years ago

Refs #10. Still needed:

simonw commented 2 years ago

Better name for the --rw option could be:

I think I like --create x - it's pretty clear, though it does risk confusion with the Datasette serve --create option which is a flag, not an option that takes a value.

simonw commented 2 years ago

Maybe --create-db with -c as a shortcut?

Yes - decided to go with this.

simonw commented 2 years ago

I'm going to imitate datasette publish vercel which has this option:

  --generate-dir DIRECTORY        Output generated application files and stop
                                  without deploying

Maybe also this option from datasette publish cloudrun:

  --show-files                    Output the generated Dockerfile and
                                  metadata.json
simonw commented 2 years ago

I'm just going to implement --generate-dir.

simonw commented 2 years ago

I need the user to specify a region in which to create the volume.

But... I don't want them to have to remember the region codes off the top of their heads.

It's surprisingly hard to programmatically retrieve a list of regions, see:

So I think I'm going to default to using the user's closest region instead, which I can get from the Fly GraphQL API:

{
  nearestRegion {
    code
  }
}

To https://api.fly.io/graphql

simonw commented 2 years ago

The GraphQL thing seems to require authentication:

In [29]: httpx.post(
    ...:     "https://api.fly.io/graphql",
    ...:     json={"query": "{\n  nearestRegion {\n    code\n  }\n}"},
    ...:     headers={"accept": "application/json"},
    ...: ).content
Out[29]: b'{"data":null,"errors":[{"message":"You must be authenticated to view this.","locations":[{"line":2,"column":3}],"path":["nearestRegion"],"extensions":{"code":"UNAUTHORIZED"}}]}'
simonw commented 2 years ago

Oh but this DOES work, using a token obtained using flyctl auth token:

In [4]: httpx.post(
   ...:     "https://api.fly.io/graphql",
   ...:     json={"query": "{\n  nearestRegion {\n    code\n  }\n}"},
   ...:     headers={
   ...:         "accept": "application/json",
   ...:         "Authorization": "Bearer xxx",
   ...:     },
   ...: ).json()
Out[4]: {'data': {'nearestRegion': {'code': 'sjc'}}}
simonw commented 2 years ago

That command fails like so if the user is not logged in:

% flyctl auth token 
Error No access token available. Please login with 'flyctl auth login'
% echo $?
1

It has a working --json flag if the user is logged in - if they are not, the output is not valid JSON.

% flyctl auth token --json
{
    "token": "..."
}
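Putting those pieces together, here's a rough sketch (not the plugin's actual code - the function names are mine) of getting the token from flyctl and then asking the GraphQL API for the nearest region:

import json
import subprocess
import httpx

def get_fly_token():
    # flyctl exits non-zero with a non-JSON error message if the user is not logged in
    result = subprocess.run(
        ["flyctl", "auth", "token", "--json"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise RuntimeError("Not logged in - run 'flyctl auth login' first")
    return json.loads(result.stdout)["token"]

def get_nearest_region(token):
    response = httpx.post(
        "https://api.fly.io/graphql",
        json={"query": "{ nearestRegion { code } }"},
        headers={
            "accept": "application/json",
            "Authorization": "Bearer {}".format(token),
        },
    )
    return response.json()["data"]["nearestRegion"]["code"]

print(get_nearest_region(get_fly_token()))  # e.g. 'sjc'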
simonw commented 2 years ago

I'm going to rename --generate-dir to just --generate.

I will add that as an alias to publish vercel for consistency.

simonw commented 2 years ago

The --generate option should work even if the user is not logged in, but other operations should not.

simonw commented 2 years ago

This just worked!

datasette publish fly -a create-volume-test --create-volume 1 -c uploads
simonw commented 2 years ago

I'm going to rename --generate-dir to just --generate.

I will add that as an alias to publish vercel for consistency.

Actually I won't do this, because datasette publish vercel has these two options:

  --generate-dir DIRECTORY        Output generated application files and stop without
                                  deploying
  --generate-vercel-json          Output generated vercel.json file and stop without
                                  deploying

Renaming one of those to --generate is more confusing than leaving them as they are.

simonw commented 2 years ago

What happens if you run this twice?

datasette publish fly -a create-volume-test --create-volume 1 -c uploads

I expected an error when it tried to create the same volume twice, but it seems to have continued without errors?

Oh that's weird... the GraphQL query shows two volumes with the same name:

{
  "data": {
    "apps": {
      "nodes": [
        {
          "name": "create-volume-test",
          "volumes": {
            "nodes": [
              {
                "name": "create_volume_test_volume"
              },
              {
                "name": "create_volume_test_volume"
              }
            ]

That was using:

{
  apps {
    nodes {
      name
      volumes {
        nodes {
          name
        }
      }
    }
  }
}

To https://api.fly.io/graphql

simonw commented 2 years ago

Confirmed, I now have two volumes with the same name:

% flyctl volumes list -a create-volume-test
ID                      NAME                        SIZE    REGION  ZONE    ATTACHED VM CREATED AT    
vol_8zmjnv88pm1vywgx    create_volume_test_volume   1GB     sjc     0ad1    f881fbcc    4 minutes ago   
vol_okgj5453k29ry2wz    create_volume_test_volume   1GB     sjc     c0a5                3 hours ago     

Here's why: https://fly.io/docs/reference/volumes/

There can be multiple volumes of the same volume name in a region. Each volume has a unique ID to distinguish itself from others to allow for this. This allows multiple instances of an app to run in one region; creating three volumes named myapp_data would let up to three instances of the app start up and run. A fourth instance would find no volume to attach to and the system would look elsewhere to start it up.

So in this case it looks like I created a new volume but the app continued to attach to the previously created volume.

simonw commented 2 years ago

Idea: I could allow --create-volume 1 to be combined with --volume NAME, to let people set the name of the volume they are creating.

simonw commented 2 years ago

I'm getting a bit uncomfortable about volume creation here. It's a bit messy that people might accidentally create multiple volumes of the same size and the same name, each of which they presumably get charged for.

simonw commented 2 years ago

How about: the volume is always called datasette. If you really want to customize its name you can do so.

simonw commented 2 years ago

Cases to consider then:

simonw commented 2 years ago

If I rename --create-volume to --volume-size this might work better:

datasette publish fly -a create-volume-test --volume-size 1 --create-db uploads

Then for subsequent deploys maybe you just do this?

datasette publish fly -a create-volume-test

Ideally this would automatically detect that the app called create-volume-test has a volume assigned to it already, and reuse that volume. But how will it remember to include /data/uploads.db in the list of databases served?

Also, what happens if you then run:

datasette publish fly -a create-volume-test --volume-size 2
simonw commented 2 years ago

Ideally this would automatically detect that the app called create-volume-test has a volume assigned to it already, and reuse that volume. But how will it remember to include /data/uploads.db in the list of databases served?

To solve this, I would need the ability to introspect which read-write databases are currently mounted. I considered a few ways of doing this:

One last option that's worth trying: could I start Datasette using datasette ... /data/*.db ?

simonw commented 2 years ago

To test a re-deploy I did the following:

datasette publish fly --generate /tmp/d --create-volume 1 -c uploads -a create-volume-test
cd /tmp/d
cp ~/Dropbox/Development/datasette/fixtures.db .

Then I edited the Dockerfile to look like this (note in particular the last line):

FROM python:3.8
COPY . /app
WORKDIR /app

ENV DATASETTE_SECRET '55c...'
RUN pip install -U datasette
RUN datasette inspect fixtures.db --inspect-file inspect-data.json
ENV PORT 8080
EXPOSE 8080
CMD datasette serve --host 0.0.0.0 fixtures.db --cors --inspect-file inspect-data.json /data/*.db --create --port $PORT

Then I ran:

flyctl deploy

A few minutes later the result was https://create-volume-test.fly.dev/ - which served fixtures.db and the existing /data/uploads.db file.

So... the /data/*.db trick works great!

simonw commented 2 years ago

But would this work?

datasette serve --host 0.0.0.0 fixtures.db --cors --inspect-file inspect-data.json \
  /data/*.db /data/uploads.db --create --port $PORT

This is for the initial case where the /data/uploads.db file doesn't exist on the volume yet - I was previously relying on the --create flag to create this, but would that work in combination with the /data/*.db wildcard?

simonw commented 2 years ago

No, this is a problem. I tried this locally:

% mkdir /tmp/data
% cp ~/Dropbox/Development/datasette/fixtures.db /tmp/data 
% datasette /tmp/data/*.db /tmp/data/created.db --create -p 8852
...
INFO:     Uvicorn running on http://127.0.0.1:8852 (Press CTRL+C to quit)
^CINFO:     Shutting down
% datasette /tmp/data/*.db /tmp/data/created.db --create -p 8852
...
INFO:     127.0.0.1:49533 - "GET / HTTP/1.1" 200 OK

The first time I ran Datasette I got two databases - fixtures and created

BUT... when I ran Datasette the second time it looked like this:

[screenshot of the resulting Datasette instance]

This is the same result you get if you run:

datasette /tmp/data/fixtures.db /tmp/data/created.db /tmp/data/created.db

This is caused by this Datasette issue:

So... either I teach Datasette to de-duplicate multiple identical file paths passed to the command, or I can't use /data/*.db in the Dockerfile here and I need to go back to other solutions for the challenge described in this comment: https://github.com/simonw/datasette-publish-fly/pull/12#issuecomment-1031971831

simonw commented 2 years ago

I'm going to treat this as a bug in Datasette and fix it.
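The heart of that fix is just order-preserving de-duplication of the resolved file paths - roughly this (a sketch of the idea, not the actual Datasette patch):

from pathlib import Path

def deduplicate_paths(paths):
    # Resolve each path so that an explicit /data/created.db and the same file
    # matched by /data/*.db count as duplicates; keep the first occurrence
    seen = set()
    unique = []
    for path in paths:
        resolved = Path(path).resolve()
        if resolved not in seen:
            seen.add(resolved)
            unique.append(path)
    return unique

# deduplicate_paths(["/data/fixtures.db", "/data/created.db", "/data/created.db"])
# -> ["/data/fixtures.db", "/data/created.db"]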

simonw commented 2 years ago

Released https://github.com/simonw/datasette/releases/tag/0.60.2 with a fix for that bug, so now I can use datasette /data/*.db /data/create-me.db --create provided I have 0.60.2 as the minimum version.

simonw commented 2 years ago

Huh... actually it's not so important to depend on datasette>=0.60.2 for the plugin - what matters is the version of Datasette that gets used in the deployed application.

I'll add this to the pip install command too then.

simonw commented 2 years ago

Hmm... actually I'm using temporary_docker_directory() which currently does this:

    if branch:
        install = [f"https://github.com/simonw/datasette/archive/{branch}.zip"] + list(
            install
        )
    else:
        install = ["datasette"] + list(install)

So passing branch="datasette>=0.60.2" won't do the right thing. But I think passing install=["datasette>=0.60.2"] WILL do the right thing, provided pip install datasette datasette>=0.60.2 doesn't get confused.

simonw commented 2 years ago

I tested locally and these both do the expected thing:

pip install datasette 'datasette==0.60.1'
pip install datasette 'datasette>=0.60.2'
simonw commented 2 years ago

... but, from looking at the rest of that code:

RUN pip install -U {install_from}
RUN datasette inspect {files} --inspect-file inspect-data.json
ENV PORT {port}
EXPOSE {port}
CMD {cmd}""".format(
        apt_get_extras=APT_GET_DOCKERFILE_EXTRAS.format(" ".join(apt_get_extras))
        if apt_get_extras
        else "",
        environment_variables=...,
        install_from=" ".join(install),

It looks like passing install=["datasette>=0.60.2"] will break things, because it will add this line to the Dockerfile:

pip install datasette>=0.60.2

Which should be:

pip install 'datasette>=0.60.2'
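If that did need fixing, quoting each requirement before joining would be enough - a sketch, not what temporary_docker_directory() actually does:

import shlex

def install_from(install):
    # Quote each requirement so that datasette>=0.60.2 survives the shell in the
    # generated RUN pip install line instead of being treated as a redirect
    return " ".join(shlex.quote(requirement) for requirement in install)

# install_from(["datasette", "datasette>=0.60.2"])
# -> "datasette 'datasette>=0.60.2'"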
simonw commented 2 years ago

I'm going to leave this, and trust that running pip install datasette in a Docker build in a fresh Fly environment will indeed install the most recent version from PyPI without needing a hint that it needs to install a version higher than 0.60.2.

simonw commented 2 years ago

OK, now I can use datasette /data/create-me.db /data/*.db --create in the Dockerfile in order to load all files that exist on the volume already AND create create-me.db if it doesn't already exist there.

What does my final design look like?

Initial deploys look like this:

datasette publish fly -a create-volume-test fixtures.db --volume-size 1 --create-db uploads

Subsequent deploys look like this:

datasette publish fly -a create-volume-test fixtures.db

That subsequent deploy can look at flyctl volumes list -a create-volume-test to figure out if it should attach a volume. Then when it's running the /data/*.db option will ensure that any database files already in the volume will continue to be served.
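A rough sketch of that detection plus first-time creation (the helper names are mine, and I'm assuming flyctl volumes list accepts the same --json flag that flyctl auth token does):

import json
import subprocess

def existing_volumes(app):
    # Assumption: flyctl volumes list supports --json like flyctl auth token does
    result = subprocess.run(
        ["flyctl", "volumes", "list", "-a", app, "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

def ensure_volume(app, region, size_gb, name="datasette"):
    # Reuse a volume if the app already has one, otherwise create it
    if existing_volumes(app):
        return
    subprocess.run(
        ["flyctl", "volumes", "create", name,
         "-a", app, "--region", region, "--size", str(size_gb)],
        check=True,
    )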

If you run this it will create a brand new DB while continuing to serve the existing ones:

datasette publish fly -a create-volume-test fixtures.db --create-db second

In both cases fixtures.db is uploaded and served as a read-only, immutable database.

The one confusing aspect of this design is that --volume-size 1 parameter in the first call. This creates a new volume called datasette the first time you run the command - but it does nothing on subsequent calls (and can be omitted, but if it's included it will be ignored).

Would it be more obvious if it was --create-volume instead? I think that might read better:

datasette publish fly -a create-volume-test fixtures.db --create-volume 1 --create-db uploads

This creates a volume if one does not exist, otherwise it gets ignored.

One catch: Fly itself DOES allow you to create multiple volumes with the same name - but datasette publish fly would not let you do this. Does that matter? Will people be confused that they cannot call --create-volume x more than once to create more than one volume?

I'm also assuming here that the volume being called datasette will work. I can offer a --volume-name x option for people who want to use a different volume name there.

simonw commented 2 years ago

Yeah, I'm going with --create-volume and --create-db.

I'm going to ditch the -c shortcut too, since it could be a shortcut for either.

simonw commented 2 years ago

I'm renaming --generate back to --generate-dir.

simonw commented 2 years ago

I'm going to remove all of these checks - none of them make sense any more, because I'm changing the implementation such that, if a volume is detected for the existing application, that volume will automatically be reused:

https://github.com/simonw/datasette-publish-fly/blob/ac6a03e96b8c23eaddb4d223ededc5618f0d6c94/datasette_publish_fly/__init__.py#L101-L112

simonw commented 2 years ago

Hardest remaining task now (aside from integration tests and documentation) is:

simonw commented 2 years ago

I'm going to write the documentation next, then finish the implementation.

simonw commented 2 years ago

New documentation is here: https://github.com/simonw/datasette-publish-fly/blob/644f155cbc4f93ebbd77b49775444e844f500823/README.md#using-fly-volumes-for-writable-databases

simonw commented 2 years ago

That documentation should show that you need to upload the read-only databases each time too.

Would be cool if you could optionally copy the read-only database into volume storage so you didn't need to ship it again in the future - probably better to solve that later on with a datasette-upload-database plugin though. This plugin is complicated enough already!

simonw commented 2 years ago

Latest updated documentation: https://github.com/simonw/datasette-publish-fly/blob/e5f0732022239a360ca2ff363a8b200319c34c6d/README.md#using-fly-volumes-for-writable-databases

simonw commented 2 years ago

At this point it should be feature complete. Next steps are a bunch of manual testing and an updated integration test.

I can come up with a manual testing script, execute that, then use that to create the automated integration tests.

simonw commented 2 years ago

I created a hash for password:

pbkdf2_sha256$260000$35a5c9d08a2116aa91e6012474df127b$Zgt5b9fDkc2ivG9syQq296rZ7VmF37MfHfOKUKfOLrQ=
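For reference, that hash is in the Django-style pbkdf2_sha256$iterations$salt$base64-digest format - something like this sketch would generate one (not necessarily the exact code datasette-auth-passwords uses):

import base64
import hashlib
import secrets

def hash_password(password, iterations=260_000):
    # Produces pbkdf2_sha256$<iterations>$<salt>$<base64 digest>, matching the hash above
    salt = secrets.token_hex(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("utf-8"), iterations
    )
    return "pbkdf2_sha256${}${}${}".format(
        iterations, salt, base64.b64encode(digest).decode("ascii")
    )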

Test script:

datasette publish fly fixtures.db \
--app test-script-1 \
--create-volume 1 \
--create-db tiddlywiki \
--plugin-secret datasette-auth-passwords root_password_hash 'pbkdf2_sha256$260000$35a5c9d08a2116aa91e6012474df127b$Zgt5b9fDkc2ivG9syQq296rZ7VmF37MfHfOKUKfOLrQ=
' \
--install datasette-auth-passwords \
--install datasette-tiddlywiki

Then wait for that deploy, then test it - including adding some data to TiddlyWiki.

Then run:

echo '{"name": "Barry"}' | sqlite-utils insert other.db docs -
datasette publish fly fixtures.db other.db \
--app test-script-1 \
--plugin-secret datasette-auth-passwords root_password_hash 'pbkdf2_sha256$260000$35a5c9d08a2116aa91e6012474df127b$Zgt5b9fDkc2ivG9syQq296rZ7VmF37MfHfOKUKfOLrQ=
' \
--install datasette-auth-passwords \
--install datasette-tiddlywiki \
--install datasette-graphql

And check that the new databases were deployed, but the volume is still there with the TiddlyWiki data in it.

simonw commented 2 years ago

It took a minute or so to stop giving me certificate errors, but it's live after running the first step.

Logging in at https://test-script-1.fly.dev/-/login doesn't work though - it says I got the password wrong.

https://test-script-1.fly.dev/-/metadata returns

{
    "plugins": {
        "datasette-auth-passwords": {
            "root_password_hash": {
                "$env": "DATASETTE_AUTH_PASSWORDS_ROOT_PASSWORD_HASH"
            }
        }
    }
}

https://test-script-1.fly.dev/-/plugins confirms the plugin is installed.

It looks like the secret is there:

% flyctl secrets list -a test-script-1
Update available 0.0.295 -> v0.0.296.
Run "flyctl version update" to upgrade.
NAME                                        DIGEST                           DATE      
DATASETTE_AUTH_PASSWORDS_ROOT_PASSWORD_HASH 36f7986d3f66f00dceeec56e29b2fe27 5m29s ago 
simonw commented 2 years ago

Urgh... I ran flyctl version update and it broke my flyctl because I did an upgrade to macOS Monterey the other day and now Homebrew won't work properly if I don't upgrade Xcode (12.6GB).

simonw commented 2 years ago

Urgh. Updating Xcode downloaded 12.6GB and then just hung on a spinner. So I killed the appstoreagent process using Activity Monitor based on the tip I found here https://discussions.apple.com/thread/252478041

But that didn't work. Then I tried installing flyctl from Brew again and got this:

Error: Your Xcode (12.4) is too outdated.
Please update to Xcode 13.2.1 (or delete it).
Xcode can be updated from the App Store.

Error: Your Command Line Tools are too outdated.
Update them from Software Update in System Preferences or run:
  softwareupdate --all --install --force

If that doesn't show you any updates, run:
  sudo rm -rf /Library/Developer/CommandLineTools
  sudo xcode-select --install

So I tried running those.

simonw commented 2 years ago

Still getting errors about needing the latest Xcode. I'm worried that I've bricked it.

simonw commented 2 years ago

Waited for a few hours with the App Store app open on the Xcode page and it seems to have fixed it. I then ran Xcode for the first time and allowed it to install extra components.

And now brew install superfly/tap/flyctl runs without errors and flyctl is working again. So unblocked!

simonw commented 2 years ago

I'm going to try running this command again:

 % datasette publish fly fixtures.db \                                                 
--app test-script-1 \
--create-volume 1 \
--create-db tiddlywiki \
--plugin-secret datasette-auth-passwords root_password_hash 'pbkdf2_sha256$260000$35a5c9d08a2116aa91e6012474df127b$Zgt5b9fDkc2ivG9syQq296rZ7VmF37MfHfOKUKfOLrQ=' \
--install datasette-auth-passwords \
--install datasette-tiddlywiki

I have a hunch that the password may have broken because I had an accidental newline at the end of the --plugin-secret line the first time I ran it.

simonw commented 2 years ago

I think running --create-volume a second time may do the wrong thing - it may create a duplicate volume rather than noticing and sticking with the existing volume called datasette.

simonw commented 2 years ago

The tool should definitely output the URL at the end, with a note that it may take a few minutes to start working if this was the first deploy.
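Something as small as this at the end of the publish command would cover it (a sketch - the https://<app>.fly.dev pattern matches the URLs earlier in this thread):

import click

def announce_deployment(app, first_deploy):
    url = "https://{}.fly.dev/".format(app)
    message = "Your Datasette instance is deploying to {}".format(url)
    if first_deploy:
        message += " - it may take a few minutes before that URL starts working"
    click.echo(message)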

simonw commented 2 years ago

OK, after that second deploy https://test-script-1.fly.dev/-/login now successfully logs in with root / password!