Closed: simonw closed this 2 years ago
A better name for the --rw option could be:
--create name
--create-db name
I think I like --create x - it's pretty clear, though it does risk confusion with the Datasette serve --create option, which is a flag, not an option that takes a value.
Maybe --create-db with -c as a shortcut?
Yes - decided to go with this.
I'm going to imitate datasette publish vercel, which has this option:
--generate-dir DIRECTORY  Output generated application files and stop without deploying
Maybe also this option from datasette publish cloudrun:
--show-files  Output the generated Dockerfile and metadata.json
I'm just going to implement --generate-dir.
I need the user to specify a region in which to create the volume.
But... I don't want them to have to remember the region codes off the top of their heads.
It's surprisingly hard to programmatically retrieve a list of regions, see:
So I think I'm going to default to using the user's closest region instead, which I can get from the Fly GraphQL API:
{
nearestRegion {
code
}
}
The GraphQL thing seems to require authentication:
In [29]: httpx.post(
...: "https://api.fly.io/graphql",
...: json={"query": "{\n nearestRegion {\n code\n }\n}"},
...: headers={"accept": "application/json"},
...: ).content
Out[29]: b'{"data":null,"errors":[{"message":"You must be authenticated to view this.","locations":[{"line":2,"column":3}],"path":["nearestRegion"],"extensions":{"code":"UNAUTHORIZED"}}]}'
Oh but this DOES work, using a token obtained using flyctl auth token:
In [4]: httpx.post(
...: "https://api.fly.io/graphql",
...: json={"query": "{\n nearestRegion {\n code\n }\n}"},
...: headers={
...: "accept": "application/json",
...: "Authorization": "Bearer xxx",
...: },
...: ).json()
Out[4]: {'data': {'nearestRegion': {'code': 'sjc'}}}
That command fails like so if the user is not logged in:
% flyctl auth token
Error No access token available. Please login with 'flyctl auth login'
% echo $?
1
It has a working --json flag if the user is logged in - if they are not logged in it produces output that is not JSON.
% flyctl auth token --json
{
"token": "..."
}
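Putting those two pieces together, here is a minimal sketch (not the plugin's actual code - the helper names are hypothetical) of obtaining the token from flyctl and building the authenticated nearestRegion request:

```python
import json
import subprocess

NEAREST_REGION_QUERY = "{\n  nearestRegion {\n    code\n  }\n}"


def get_fly_token():
    # "flyctl auth token --json" exits non-zero and prints a non-JSON
    # error message when the user is not logged in, so check the
    # return code before parsing
    result = subprocess.run(
        ["flyctl", "auth", "token", "--json"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    return json.loads(result.stdout)["token"]


def nearest_region_request(token):
    # Returns (url, json_payload, headers) for an authenticated
    # GraphQL POST, e.g. httpx.post(url, json=payload, headers=headers)
    return (
        "https://api.fly.io/graphql",
        {"query": NEAREST_REGION_QUERY},
        {"accept": "application/json", "Authorization": f"Bearer {token}"},
    )
```

The response would then be parsed as data["nearestRegion"]["code"], matching the sjc result shown above.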
I'm going to rename --generate-dir to just --generate. I will add that as another alias to publish vercel for consistency.
The --generate option should work even if the user is not logged in, but other operations should not.
This just worked!
datasette publish fly -a create-volume-test --create-volume 1 -c uploads
Actually I won't do this, because datasette publish vercel has these two options:
--generate-dir DIRECTORY  Output generated application files and stop without deploying
--generate-vercel-json    Output generated vercel.json file and stop without deploying
Renaming one of those to --generate is more confusing than leaving them as they are.
What happens if you run this twice?
datasette publish fly -a create-volume-test --create-volume 1 -c uploads
I expect an error when it tries to create the same volume twice, but it seems to have continued without errors?
Oh that's weird... the GraphQL query shows two volumes with the same name:
{
  "data": {
    "apps": {
      "nodes": [
        {
          "name": "create-volume-test",
          "volumes": {
            "nodes": [
              {
                "name": "create_volume_test_volume"
              },
              {
                "name": "create_volume_test_volume"
              }
            ]
          }
        }
      ]
    }
  }
}
That was using:
{
  apps {
    nodes {
      name
      volumes {
        nodes {
          name
        }
      }
    }
  }
}
Confirmed, I now have two volumes with the same name:
% flyctl volumes list -a create-volume-test
ID NAME SIZE REGION ZONE ATTACHED VM CREATED AT
vol_8zmjnv88pm1vywgx create_volume_test_volume 1GB sjc 0ad1 f881fbcc 4 minutes ago
vol_okgj5453k29ry2wz create_volume_test_volume 1GB sjc c0a5 3 hours ago
Here's why: https://fly.io/docs/reference/volumes/
"There can be multiple volumes of the same volume name in a region. Each volume has a unique ID to distinguish itself from others to allow for this. This allows multiple instances of an app to run in one region; creating three volumes named myapp_data would let up to three instances of the app to start up and run. A fourth instance would find no volume to attach to and the system would look elsewhere to start it up."
So in this case it looks like I created a new volume but the app continued to attach to the previously created volume.
Idea: I could allow --create-volume 1 to be combined with --volume NAME to let people set the name of the volume they are creating.
I'm getting a bit uncomfortable about volume creation here. It's a bit messy that people might accidentally create multiple volumes of the same size and the same name, each of which they presumably get charged for.
How about: the volume is always called datasette. If you really want to customize its name you can do so.
Cases to consider then:
If I rename --create-volume to --volume-size this might work better:
datasette publish fly -a create-volume-test --volume-size 1 --create-db uploads
Then for subsequent deploys maybe you just do this?
datasette publish fly -a create-volume-test
Ideally this would automatically detect that the app called create-volume-test has a volume assigned to it already, and reuse that volume. But how will it remember to include /data/uploads.db in the list of databases served?
Also, what happens if you then run:
datasette publish fly -a create-volume-test --volume-size 2
To solve this, I would need the ability to introspect which read-write databases are currently mounted. I considered a few ways of doing this:
- /-/databases.json JSON API - problem here is that it might be protected by authentication, or not yet available if the app hasn't been fully deployed
- flyctl secrets list - you only get back the digest
One last option that's worth trying: could I start Datasette using datasette ... /data/*.db?
To test a re-deploy I did the following:
datasette publish fly --generate /tmp/d --create-volume 1 -c uploads -a create-volume-test
cd /tmp/d
cp ~/Dropbox/Development/datasette/fixtures.db .
Then I edited Dockerfile to look like this (note in particular the last line):
FROM python:3.8
COPY . /app
WORKDIR /app
ENV DATASETTE_SECRET '55c...'
RUN pip install -U datasette
RUN datasette inspect fixtures.db --inspect-file inspect-data.json
ENV PORT 8080
EXPOSE 8080
CMD datasette serve --host 0.0.0.0 fixtures.db --cors --inspect-file inspect-data.json /data/*.db --create --port $PORT
Then I ran:
flyctl deploy
A few minutes later the result was https://create-volume-test.fly.dev/ - which served fixtures.db and the existing /data/uploads.db file.
So... the /data/*.db trick works great!
But would this work?
datasette serve --host 0.0.0.0 fixtures.db --cors --inspect-file inspect-data.json \
/data/*.db /data/uploads.db --create --port $PORT
This is for the initial case where the /data/uploads.db file doesn't exist on the volume yet - I was previously relying on the --create flag to create this, but would that work in combination with the /data/*.db wildcard?
No, this is a problem. I tried this locally:
% mkdir /tmp/data
% cp ~/Dropbox/Development/datasette/fixtures.db /tmp/data
% datasette /tmp/data/*.db /tmp/data/created.db --create -p 8852
...
INFO: Uvicorn running on http://127.0.0.1:8852 (Press CTRL+C to quit)
^CINFO: Shutting down
% datasette /tmp/data/*.db /tmp/data/created.db --create -p 8852
...
INFO: 127.0.0.1:49533 - "GET / HTTP/1.1" 200 OK
The first time I ran Datasette I got two databases - fixtures and created.
BUT... when I ran Datasette the second time it looked like this:
This is the same result you get if you run:
datasette /tmp/data/fixtures.db /tmp/data/created.db /tmp/data/created.db
This is caused by this Datasette issue:
So... either I teach Datasette to de-duplicate multiple identical file paths passed to the command, or I can't use /data/*.db in the Dockerfile here and I need to go back to other solutions for the challenge described in this comment: https://github.com/simonw/datasette-publish-fly/pull/12#issuecomment-1031971831
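Teaching Datasette to de-duplicate would look roughly like this - a sketch of the shape of the fix, not Datasette's actual implementation:

```python
from pathlib import Path


def dedupe_files(files):
    # Keep only the first occurrence of each resolved path, so that
    # "datasette /data/*.db /data/created.db" does not end up serving
    # /data/created.db twice once the wildcard also matches it
    seen = set()
    result = []
    for f in files:
        resolved = Path(f).resolve()
        if resolved not in seen:
            seen.add(resolved)
            result.append(f)
    return result
```

Resolving each path first means the same file given via two different spellings (relative vs. absolute) would also be caught.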
I'm going to treat this as a bug in Datasette and fix it.
Released https://github.com/simonw/datasette/releases/tag/0.60.2 with a fix for that bug, so now I can use datasette /data/*.db /data/create-me.db --create provided I have 0.60.2 as the minimum version.
Huh... actually it's not so important to depend on datasette>=0.60.2 for the plugin - what matters is the version of Datasette that gets used in the deployed application.
I'll add this to the pip install command too then.
Hmm... actually I'm using temporary_docker_directory() which currently does this:
if branch:
install = [f"https://github.com/simonw/datasette/archive/{branch}.zip"] + list(
install
)
else:
install = ["datasette"] + list(install)
So passing branch="datasette>=0.60.2" won't do the right thing. But I think passing install=["datasette>=0.60.2"] WILL do the right thing, provided pip install datasette datasette>=0.60.2 doesn't get confused.
I tested locally and these both do the expected thing:
pip install datasette 'datasette==0.60.1'
pip install datasette 'datasette>=0.60.2'
... but, from looking at the rest of that code:
RUN pip install -U {install_from}
RUN datasette inspect {files} --inspect-file inspect-data.json
ENV PORT {port}
EXPOSE {port}
CMD {cmd}""".format(
apt_get_extras=APT_GET_DOCKERFILE_EXTRAS.format(" ".join(apt_get_extras))
if apt_get_extras
else "",
environment_variables=...,
install_from=" ".join(install),
It looks like passing install=["datasette>=0.60.2"] will break things, because it will add this line to the Dockerfile:
pip install datasette>=0.60.2
Which should be:
pip install 'datasette>=0.60.2'
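A hypothetical fix (not the actual temporary_docker_directory() code) would be to quote each requirement with shlex.quote when building that Dockerfile line:

```python
import shlex

install = ["datasette>=0.60.2", "datasette-graphql"]

# shlex.quote leaves plain package names untouched but wraps anything
# containing shell-special characters such as ">" in single quotes
line = "RUN pip install -U " + " ".join(shlex.quote(i) for i in install)
print(line)
# produces: RUN pip install -U 'datasette>=0.60.2' datasette-graphql
```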
I'm going to leave this, and trust that running pip install datasette in a Docker build in a fresh Fly environment will indeed install the most recent version from PyPI without needing a hint that it needs to install a version higher than 0.60.2.
OK, now I can use datasette /data/create-me.db /data/*.db --create in the Dockerfile in order to load all files that exist on the volume already AND create create-me.db if it doesn't already exist there.
What does my final design look like?
Initial deploys look like this:
datasette publish fly -a create-volume-test fixtures.db --volume-size 1 --create-db uploads
Subsequent deploys look like this:
datasette publish fly -a create-volume-test fixtures.db
That subsequent deploy can look at flyctl volumes list -a create-volume-test to figure out if it should attach a volume. Then when it's running, the /data/*.db option will ensure that any database files already in the volume will continue to be served.
If you run this it will create a brand new DB while continuing to serve the existing ones:
datasette publish fly -a create-volume-test fixtures.db --create-db second
In both cases fixtures.db is uploaded and served as a read-only, immutable database.
The one confusing aspect of this design is the --volume-size 1 parameter in the first call. This creates a new volume called datasette the first time you run the command - but it does nothing on subsequent calls (it can be omitted; if included it is ignored).
Would it be more obvious if it was --create-volume instead? I think that might read better:
datasette publish fly -a create-volume-test fixtures.db --create-volume 1 --create-db uploads
This creates a volume if one does not exist, otherwise it gets ignored.
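That create-if-missing behaviour could be sketched like this. Everything here is an assumption, not the plugin's actual code: in particular I'm assuming flyctl volumes list supports --json and that its output includes a "Name" field, and ensure_volume is a hypothetical helper:

```python
import json
import subprocess


def volume_names(flyctl_json_output):
    # Parse the JSON emitted by "flyctl volumes list -a APP --json";
    # the "Name" field is an assumption about flyctl's output shape
    return {v["Name"] for v in json.loads(flyctl_json_output)}


def ensure_volume(app, size_gb, region, name="datasette"):
    # Create the volume only if one with this name does not exist yet;
    # returns True if a new volume was created, False if reused
    listed = subprocess.run(
        ["flyctl", "volumes", "list", "-a", app, "--json"],
        capture_output=True, text=True, check=True,
    )
    if name in volume_names(listed.stdout):
        return False  # reuse the existing volume
    subprocess.run(
        ["flyctl", "volumes", "create", name,
         "-a", app, "--region", region, "--size", str(size_gb)],
        check=True,
    )
    return True
```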
One catch: regular Fly DOES allow you to create multiple volumes with the same name - but datasette publish fly would not let you do this. Does that matter? Will people be confused that they cannot call --create-volume x more than once to create more than one volume?
I'm also assuming here that the volume being called datasette will work. I can offer a --volume-name x option for people who want to use a different volume name there.
Yeah, I'm going with --create-volume and --create-db.
I'm going to ditch the -c shortcut too, since it could be a shortcut for either.
I'm renaming --generate back to --generate-dir.
I'm going to remove all of these checks - none of them make sense any more, because I'm changing the implementation such that, if a volume is detected for the existing application, that volume will automatically be reused:
Hardest remaining task now (aside from integration tests and documentation) is:
I'm going to write the documentation next, then finish the implementation.
That documentation should show how you need to upload the read only databases each time too.
Would be cool if you could optionally copy the read-only database into volume storage so you didn't need to ship it again in the future - probably better to solve that later on with a datasette-upload-database plugin though. This plugin is complicated enough already!
At this point it should be feature complete. Next steps are a bunch of manual testing and an updated integration test.
I can come up with a manual testing script, execute that, then use that to create the automated integration tests.
I created a hash for the password "password":
pbkdf2_sha256$260000$35a5c9d08a2116aa91e6012474df127b$Zgt5b9fDkc2ivG9syQq296rZ7VmF37MfHfOKUKfOLrQ=
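For reference, a hash in this pbkdf2_sha256$iterations$salt$base64 format can be generated with the standard library. This is a sketch of the scheme datasette-auth-passwords appears to use, not its actual code; the salt parameter is exposed here only to make the function deterministic for illustration:

```python
import base64
import hashlib
import secrets


def hash_password(password, salt=None, iterations=260_000):
    # Produce a hash in the pbkdf2_sha256$iterations$salt$b64digest
    # format shown above; a random hex salt is generated when none
    # is supplied
    if salt is None:
        salt = secrets.token_hex(16)
    dk = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt.encode(), iterations
    )
    digest = base64.b64encode(dk).decode()
    return f"pbkdf2_sha256${iterations}${salt}${digest}"
```

With a fixed salt the function is deterministic, which is how a stored hash can later be verified against a submitted password.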
Test script:
datasette publish fly fixtures.db \
--app test-script-1 \
--create-volume 1 \
--create-db tiddlywiki \
--plugin-secret datasette-auth-passwords root_password_hash 'pbkdf2_sha256$260000$35a5c9d08a2116aa91e6012474df127b$Zgt5b9fDkc2ivG9syQq296rZ7VmF37MfHfOKUKfOLrQ=
' \
--install datasette-auth-passwords \
--install datasette-tiddlywiki
Then wait for that deploy, then test it - including adding some data to TiddlyWiki.
Then run:
echo '{"name": "Barry"}' | sqlite-utils insert other.db docs -
datasette publish fly fixtures.db other.db \
--app test-script-1 \
--plugin-secret datasette-auth-passwords root_password_hash 'pbkdf2_sha256$260000$35a5c9d08a2116aa91e6012474df127b$Zgt5b9fDkc2ivG9syQq296rZ7VmF37MfHfOKUKfOLrQ=
' \
--install datasette-auth-passwords \
--install datasette-tiddlywiki \
--install datasette-graphql
And check that the new databases were deployed, but the volume is still there with the TiddlyWiki data in it.
It took a minute or so to stop giving me certificate errors, but it's live after running the first step.
Logging in at https://test-script-1.fly.dev/-/login doesn't work though - it says I got the password wrong.
https://test-script-1.fly.dev/-/metadata returns
{
"plugins": {
"datasette-auth-passwords": {
"root_password_hash": {
"$env": "DATASETTE_AUTH_PASSWORDS_ROOT_PASSWORD_HASH"
}
}
}
}
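That {"$env": ...} reference means Datasette substitutes the value of the named environment variable at runtime, so the secret itself never appears in the published metadata. A simplified sketch of that substitution (not Datasette's actual code):

```python
import os


def resolve_env_refs(value):
    # Recursively replace {"$env": "NAME"} dicts in a metadata
    # structure with the value of the named environment variable
    if isinstance(value, dict):
        if set(value) == {"$env"}:
            return os.environ.get(value["$env"])
        return {k: resolve_env_refs(v) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(v) for v in value]
    return value
```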
https://test-script-1.fly.dev/-/plugins confirms the plugin is installed.
It looks like the secret is there:
% flyctl secrets list -a test-script-1
Update available 0.0.295 -> v0.0.296.
Run "flyctl version update" to upgrade.
NAME DIGEST DATE
DATASETTE_AUTH_PASSWORDS_ROOT_PASSWORD_HASH 36f7986d3f66f00dceeec56e29b2fe27 5m29s ago
Urgh... I ran flyctl version update and it broke my flyctl, because I did an upgrade to macOS Monterey the other day and now Homebrew won't work properly if I don't upgrade Xcode (a 12.6GB download).
Urgh. Updating Xcode downloaded 12.6GB and then just hung on a spinner. So I killed the appstoreagent process using Activity Monitor, based on the tip I found here: https://discussions.apple.com/thread/252478041
But that didn't work. Then I tried installing flyctl from Brew again and got this:
Error: Your Xcode (12.4) is too outdated.
Please update to Xcode 13.2.1 (or delete it).
Xcode can be updated from the App Store.
Error: Your Command Line Tools are too outdated.
Update them from Software Update in System Preferences or run:
softwareupdate --all --install --force
If that doesn't show you any updates, run:
sudo rm -rf /Library/Developer/CommandLineTools
sudo xcode-select --install
So I tried running those.
Still getting errors about needing the latest Xcode. I'm worried that I've bricked it.
Waited for a few hours with the App Store app open on the Xcode page and it seems to have fixed it. I then ran Xcode for the first time and allowed it to install extra components.
And now brew install superfly/tap/flyctl runs without errors and flyctl is working again. So unblocked!
I'm trying running this command again:
% datasette publish fly fixtures.db \
--app test-script-1 \
--create-volume 1 \
--create-db tiddlywiki \
--plugin-secret datasette-auth-passwords root_password_hash 'pbkdf2_sha256$260000$35a5c9d08a2116aa91e6012474df127b$Zgt5b9fDkc2ivG9syQq296rZ7VmF37MfHfOKUKfOLrQ=' \
--install datasette-auth-passwords \
--install datasette-tiddlywiki
I have a hunch that the password may have broken because I had an accidental newline at the end of the --plugin-secret line the first time I ran it.
I think running --create-volume a second time may do the wrong thing - it may create a duplicate volume rather than noticing and sticking with the existing volume called datasette.
The tool should definitely output the URL at the end, with a note that it may take a few minutes to start working if this was the first deploy.
OK, after that second deploy https://test-script-1.fly.dev/-/login now successfully logs in with root / password!
Refs #10. Still needed:
- --generate-dir option
- Rename --generate-dir to just --generate (and back again, changed my mind)
- --create-volume working
- --create-db working
- --create-volume tests
- --volume-name x working
- --volume-name x tests