simonw / datasette-publish-fly

Datasette plugin for publishing data using Fly
Apache License 2.0

Support Fly volumes #10

Closed: simonw closed this issue 2 years ago

simonw commented 2 years ago

Especially interesting given the 3GB of persistent volume storage now available in the free tier: https://fly.io/blog/free-postgres/

simonw commented 2 years ago

Documentation: https://fly.io/docs/reference/volumes/

First create it:

fly volumes create myapp_data --region lhr --size 40

(That size is in GB)

Then mount it using this in fly.toml:

[mounts]
source="myapp_data"
destination="/data"
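If the plugin generates fly.toml itself, the same `[mounts]` table could be rendered from a volume name. This is a minimal sketch, assuming the file is assembled as a string; `mounts_section` is a hypothetical helper, not part of datasette-publish-fly.

```python
def mounts_section(source: str, destination: str = "/data") -> str:
    """Render a fly.toml [mounts] table for a named Fly volume (hypothetical helper)."""
    # Assumes volume names and paths contain no double quotes, so they can be
    # written as plain TOML basic strings without escaping
    return "\n".join(
        [
            "[mounts]",
            f'source="{source}"',
            f'destination="{destination}"',
        ]
    ) + "\n"


print(mounts_section("myapp_data"))
```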
simonw commented 2 years ago

Initial thoughts on the design: add a new option that creates a volume and mounts it on the deployed Datasette instance, like this:

datasette publish fly demo.db --app myflyapp \
  --region sfo --volume-size 1G --rw test.db

This would create a volume called myflyapp-volume and mount it as /data, then create a new empty read-write database file at /data/test.db and start Datasette against it. It would package and deploy demo.db too, as an immutable database baked into the image.
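The runtime half of that design might look roughly like this, assuming the plugin controls the container's start command: make sure the empty read-write database exists on the mounted volume, then serve the baked-in file immutably (`-i`) alongside the volume file. `prepare_and_build_command` is a hypothetical name, and the command line the plugin actually generates may differ.

```python
import sqlite3
from pathlib import Path
from typing import List


def prepare_and_build_command(immutable_db: str, rw_db: str, port: int = 8080) -> List[str]:
    """Hypothetical helper: ensure rw_db exists, then build a datasette serve argv."""
    rw_path = Path(rw_db)
    rw_path.parent.mkdir(parents=True, exist_ok=True)
    if not rw_path.exists():
        # Opening a missing path makes SQLite create an empty database file
        conn = sqlite3.connect(rw_path)
        conn.execute("PRAGMA user_version")
        conn.close()
    return [
        "datasette", "serve",
        "--host", "0.0.0.0",
        "--port", str(port),
        "-i", immutable_db,  # baked into the image, opened immutable
        str(rw_path),        # lives on the /data volume, opened read-write
    ]
```

On Fly this would be called with something like `prepare_and_build_command("demo.db", "/data/test.db")`.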

I don't really like that --rw test.db option. Also it would be nice if an existing database could be copied up to the new /data volume, though that's hard since I can't see an obvious Fly mechanism for copying files to a volume yet.

simonw commented 2 years ago

For the first prototype it would be really fun to get datasette-tiddlywiki working.

simonw commented 2 years ago

Asked about copying files to a volume here: https://twitter.com/simonw/status/1484419391475236864

Idea for a workaround: an authenticated Datasette plugin that supports uploading a SQLite file and stashing it in a directory is something I've wanted for a while already.

simonw commented 2 years ago

Frictionless authentication is going to be a big deal here: most Datasette plugins that write to a database reasonably expect users to be authenticated somehow.

So I might need a good mechanism for setting up some kind of auth. It could even generate a password and print it to the console at deploy time (if the user doesn't specify one).
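Generating that fallback password is straightforward with Python's `secrets` module. A minimal sketch; the function names are hypothetical, and how the password would then be enforced on the deployed instance (for example via an auth plugin) is not shown.

```python
import secrets
from typing import Optional


def generate_deploy_password(nbytes: int = 16) -> str:
    """Hypothetical: create a random password when the user doesn't supply one."""
    # token_urlsafe uses the OS CSPRNG and returns unpadded URL-safe base64 text
    return secrets.token_urlsafe(nbytes)


def resolve_password(user_supplied: Optional[str]) -> str:
    if user_supplied:
        return user_supplied
    password = generate_deploy_password()
    # Printed once at deploy time so the user can sign in afterwards
    print(f"Generated password: {password}")
    return password
```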

simonw commented 2 years ago

Maybe --create-volume 1 to create a new 1GB volume, and --volume name-of-volume to mount an existing named volume.
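Under that proposal, `--create-volume 1` would translate into a `flyctl volumes create` call along the lines of the one shown earlier. A sketch, assuming a hypothetical naming scheme of `<app>_volume` with hyphens turned into underscores (the volume names in this thread use underscores); `--volume NAME` would skip this step and just mount NAME.

```python
from typing import List


def volume_name_for_app(app: str) -> str:
    # Hypothetical naming scheme: Fly app names may contain hyphens, but the
    # volume names used in this thread use underscores, so swap them
    return app.replace("-", "_") + "_volume"


def create_volume_command(app: str, region: str, size_gb: int) -> List[str]:
    """Build the flyctl argv for --create-volume SIZE (hypothetical helper)."""
    return [
        "flyctl", "volumes", "create", volume_name_for_app(app),
        "--app", app,
        "--region", region,
        "--size", str(size_gb),  # size is in GB
    ]
```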

simonw commented 2 years ago

OK, I figured out how this needs to work in:

I'm going to design it like this:

Here's a nasty edge case: what should happen in the following example?

# Create and deploy a new instance
datasette publish fly fixtures.db --create-volume 1 -a tiddlywiki --install datasette-tiddlywiki --rw tiddlywiki
# App called "tiddlywiki" should now be live with a `/data/tiddlywiki.db` database attached in that volume

# But now we try to deploy again to the same named app, but without the --create-volume flag
datasette publish fly fixtures.db -a tiddlywiki --install datasette-tiddlywiki datasette-graphql

Here the first command creates a new volume called tiddlywiki_volume and attaches it to the deployment. But the second one should presumably still configure the existing application to mount that volume, even though it wasn't specified on the command line.

This means the tooling needs to detect when an existing app has volumes attached to it and re-mount that volume on future deployments, by including it in the generated fly.toml file.
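One way to express that rule is a small decision function. This is a hypothetical sketch of the logic, not the plugin's actual code: a freshly created volume wins, then an explicit `--volume NAME`, and with neither flag the app's existing volume is re-mounted. Names are de-duplicated because Fly allows several volumes to share one name.

```python
from typing import List, Optional


def volume_to_mount(
    created_volume: Optional[str],
    explicit_volume: Optional[str],
    existing_volumes: List[str],
) -> Optional[str]:
    """Decide which volume the generated fly.toml should mount (hypothetical)."""
    names = sorted(set(existing_volumes))
    # 1. A volume created by --create-volume during this deploy wins
    if created_volume:
        return created_volume
    # 2. Otherwise honour an explicit --volume NAME, if the app has it
    if explicit_volume:
        if explicit_volume not in names:
            raise ValueError(f"No volume called {explicit_volume!r} on this app")
        return explicit_volume
    # 3. No flags: keep mounting what the app already has, so a redeploy
    #    without --create-volume doesn't silently drop the volume
    if len(names) == 1:
        return names[0]
    if len(names) > 1:
        raise ValueError("App has several volumes; pass --volume to pick one")
    return None
```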

I was hoping the output from fly apps list --json might help here, but it doesn't show my app as having any volumes, even though it has one:

    {
        "ID": "simon-tiddlywiki-3",
        "Name": "simon-tiddlywiki-3",
        "State": "",
        "Status": "running",
        "Deployed": true,
        "Hostname": "simon-tiddlywiki-3.fly.dev",
        "AppURL": "",
        "Version": 0,
        "Release": null,
        "Organization": {
            "ID": "",
            "InternalNumericID": "",
            "Name": "",
            "Slug": "personal",
            "Type": "",
            "Domains": {
                "Nodes": null,
                "Edges": null
            },
            "WireGuardPeers": {
                "Nodes": null,
                "Edges": null
            },
            "DelegatedWireGuardTokens": {
                "Nodes": null,
                "Edges": null
            },
            "HealthCheckHandlers": null,
            "HealthChecks": null,
            "LoggedCertificates": null
        },
        "Secrets": null,
        "CurrentRelease": {
            "ID": "",
            "Version": 0,
            "Stable": false,
            "InProgress": false,
            "Reason": "",
            "Description": "",
            "Status": "",
            "DeploymentStrategy": "",
            "User": {
                "ID": "",
                "Name": "",
                "Email": ""
            },
            "CreatedAt": "2022-01-21T23:05:53Z"
        },
        "Releases": {
            "Nodes": null
        },
        "IPAddresses": {
            "Nodes": null
        },
        "IPAddress": null,
        "Builds": {
            "Nodes": null
        },
        "SourceBuilds": {
            "Nodes": null
        },
        "Changes": {
            "Nodes": null
        },
        "Certificates": {
            "Nodes": null
        },
        "Certificate": {
            "ID": "",
            "AcmeDNSConfigured": false,
            "AcmeALPNConfigured": false,
            "Configured": false,
            "CertificateAuthority": "",
            "CreatedAt": "0001-01-01T00:00:00Z",
            "DNSProvider": "",
            "DNSValidationInstructions": "",
            "DNSValidationHostname": "",
            "DNSValidationTarget": "",
            "Hostname": "",
            "Source": "",
            "ClientStatus": "",
            "IsApex": false,
            "IsWildcard": false,
            "Issued": {
                "Nodes": null
            }
        },
        "Config": {
            "Definition": null,
            "Services": null,
            "Valid": false,
            "Errors": null
        },
        "ParseConfig": {
            "Definition": null,
            "Services": null,
            "Valid": false,
            "Errors": null
        },
        "Allocations": null,
        "Allocation": null,
        "DeploymentStatus": null,
        "Autoscaling": null,
        "VMSize": {
            "Name": "",
            "CPUCores": 0,
            "MemoryGB": 0,
            "MemoryMB": 0,
            "PriceMonth": 0,
            "PriceSecond": 0
        },
        "Regions": null,
        "BackupRegions": null,
        "Volumes": {
            "Nodes": null
        },
        "TaskGroupCounts": null,
        "ProcessGroups": null,
        "HealthChecks": null,
        "PostgresAppRole": null,
        "Image": null,
        "ImageUpgradeAvailable": false,
        "ImageVersionTrackingEnabled": false,
        "ImageDetails": {
            "Registry": "",
            "Repository": "",
            "Tag": "",
            "Version": "",
            "Digest": ""
        },
        "LatestImageDetails": {
            "Registry": "",
            "Repository": "",
            "Tag": "",
            "Version": "",
            "Digest": ""
        }
    }

Thankfully it looks like fly volumes list -a simon-tiddlywiki-3 --json does show me what I need to know for this:

[
    {
        "id": "vol_wod56vj56dm4ny30",
        "App": {
            "Name": ""
        },
        "Name": "simon_tiddlywiki_volume_3",
        "SizeGb": 1,
        "Snapshots": {
            "Nodes": null
        },
        "Region": "sjc",
        "Encrypted": true,
        "CreatedAt": "2022-01-21T21:11:21Z",
        "AttachedAllocation": {
            "ID": "",
            "IDShort": "057f0672",
            "Version": 0,
            "TaskName": "app",
            "Region": "",
            "Status": "",
            "DesiredStatus": "",
            "Healthy": false,
            "Canary": false,
            "Failed": false,
            "Restarts": 0,
            "CreatedAt": "0001-01-01T00:00:00Z",
            "UpdatedAt": "0001-01-01T00:00:00Z",
            "Checks": null,
            "Events": null,
            "LatestVersion": false,
            "PassingCheckCount": 0,
            "WarningCheckCount": 0,
            "CriticalCheckCount": 0,
            "Transitioning": false,
            "PrivateIP": "",
            "RecentLogs": null,
            "AttachedVolumes": {
                "Nodes": null
            }
        },
        "Host": {
            "ID": "c0a5"
        }
    }
]
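Pulling the relevant fields out of that output could look like this. A sketch assuming the JSON keys shown above (`id`, `Name`, `Region`, `SizeGb`); newer flyctl releases may change this shape, and `parse_volumes` / `existing_volume_names` are hypothetical names.

```python
import json
import subprocess
from typing import Dict, List


def parse_volumes(volumes_json: str) -> List[Dict[str, object]]:
    """Extract the useful fields from `fly volumes list --json` output."""
    volumes = json.loads(volumes_json) or []  # treat null as "no volumes"
    return [
        {"id": v["id"], "name": v["Name"], "region": v["Region"], "size_gb": v["SizeGb"]}
        for v in volumes
    ]


def existing_volume_names(app: str) -> List[str]:
    # Requires flyctl to be installed and logged in
    proc = subprocess.run(
        ["flyctl", "volumes", "list", "-a", app, "--json"],
        capture_output=True, text=True, check=True,
    )
    return [v["name"] for v in parse_volumes(proc.stdout)]
```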
simonw commented 2 years ago

I'm also going to add an integration test suite, similar to the one in s3-credentials (here), that exercises Fly directly so I can spot breaking changes better in the future.

simonw commented 2 years ago

The basic setup for the integration suite is to add this to conftest.py:

import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--integration",
        action="store_true",
        default=False,
        help="run integration tests",
    )

def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "integration: mark test as integration test, only run with --integration",
    )

def pytest_collection_modifyitems(config, items):
    if config.getoption("--integration"):
        # Also run integration tests
        return
    skip_integration = pytest.mark.skip(reason="use --integration option to run")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_integration)

And then this in test_integration.py:

# These integration tests only run with "pytest --integration" -
# they execute live calls against Fly and clean up after themselves
from click.testing import CliRunner
import pytest

# Mark all tests in this module with "integration":
pytestmark = pytest.mark.integration

@pytest.fixture(autouse=True)
def cleanup():
    cleanup_any_resources()
    yield
    cleanup_any_resources()

def test_basic():
    pass

def cleanup_any_resources():
    pass
simonw commented 2 years ago
import json
import subprocess

def cleanup_any_resources():
    # List every Fly app visible to the current flyctl login
    proc = subprocess.run(
        ["flyctl", "apps", "list", "--json"], capture_output=True, check=True
    )
    apps = json.loads(proc.stdout)
    app_names = [app["Name"] for app in apps]
    # Delete any starting with publish-fly-temp-
    to_delete = [app_name for app_name in app_names if app_name.startswith("publish-fly-temp-")]
    for app_name in to_delete:
        subprocess.run(["flyctl", "apps", "destroy", app_name, "--yes", "--json"])
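For that cleanup to work, each test needs to deploy to an app whose name carries the `publish-fly-temp-` prefix. A minimal sketch of a hypothetical helper for generating such names; the random suffix keeps parallel or repeated runs from colliding.

```python
import secrets

TEMP_PREFIX = "publish-fly-temp-"


def temp_app_name() -> str:
    # 4 random bytes -> 8 lowercase hex characters, valid in a Fly app name;
    # the prefix is what cleanup_any_resources() looks for when deleting apps
    return TEMP_PREFIX + secrets.token_hex(4)
```

A test would pass this name to `-a`, and the autouse `cleanup` fixture would then destroy the app after the test finishes.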
simonw commented 2 years ago

Moving this to a PR.

simonw commented 2 years ago

Wrote about it here: https://simonwillison.net/2022/Feb/15/fly-volumes/