Explicit tooling support or documentation for out-of-band migrations

superfly / litefs

FUSE-based file system for replicating SQLite databases across a cluster of machines

Apache License 2.0

4.06k stars 96 forks source link

Explicit tooling support or documentation for out-of-band migrations #268

Open irskep opened 1 year ago

irskep commented 1 year ago

I'm exploring options for hosting side projects super cheaply, so fly.io + LiteFS seem like a great route to take. I have a very simple Python/Flask app.

A very common thing to do while developing web apps is change the database schema outside the application process, often completely separately from a deploy. It seems like this is very challenging to do with LiteFS, because there's no way to temporarily claim ownership of write operations and apply migrations safely.

I added some code to my app that tries to run whatever migrations it can on startup, but I worry that this is some combination of buggy, flaky, dangerous, and error-prone. I have no idea if it's the right thing to do.

So my request is, could we have some documentation on best practices for running migrations, or maybe some tooling support? I love the idea of running SQLite in production for a small web site, but worrying about this is pointing me back toward Postgres just to get a somewhat normal and safe database migration workflow.

benbjohnson commented 1 year ago

@irskep Running one-off transactions like migrations is a pain right now. I'm actually working on write forwarding (#56) right now to solve this issue. It'll allow you to run write transactions from any node by temporarily borrowing the write lock, updating the database, and then sending the change set for the transaction back to the primary. It'll be released with v0.4.0 in the next few weeks.

I'm keeping this open and tagging it to that release to remind me to write up docs about best practices on this too. Thanks!

irskep commented 1 year ago

It sounds like getting a shell where you can run your migrations will still be an exercise for the reader. If that's the case, it looks like my fly.io app is better off just using SQLite with a persistent volume and taking the performance hit. Assuming that's all true, it might be worth an additional callout in the docs. The fly.io blog pushes LiteFS pretty aggressively, so it would be helpful to have tempered expectations if certain workflows are complicated to implement.

(Still a cool project though, I am watching this space!)

Edit: Got a bit farther; I assume the fly.io workflow will ultimately be something to do with flyctl ssh console, but it's still not clear to me how you'd be able to run a script inside litefs mount in a container, since the container has already bound the web process to the port it would use to communicate. It's likely I'm still misunderstanding something.

benbjohnson commented 1 year ago

Got a bit farther; I assume the fly.io workflow will ultimately be something to do with flyctl ssh console

Yes, that's correct. If you run fly ssh console then you can run the sqlite3 CLI against the SQLite database:

# The -s flag allows you to select one of your VMs.
fly ssh console -s

Once you're in the VM, you can run:

sqlite3 /litefs/my.db

Then you can run SQL commands just like a regular SQLite database.

Right now, you'll need to ssh into the primary node which is not terribly intuitive from the CLI. Once write forwarding is done, you'll be able to ssh into any node and run your script.

it's still not clear to me how you'd be able to run a script inside litefs mount in a container, since the container has already bound the web process to the port it would use to communicate

I'm not quite clear on this point. Once you fly ssh console in you'll be in a regular shell inside your VM. You can execute any scripts you need from here. If you need to upload a script, you can use fly sftp shell to perform SFTP commands (like put).

irskep commented 1 year ago

Thanks for the detailed instructions! I really appreciate the time you're taking to help me work through my misunderstanding.

I think I understand where my confusion came from. The litefs-example Dockerfile runs the service inside litefs mount. I think I misunderstood this to mean that in order for LiteFS to work, it needs to run the service in a subprocess. But what the comment actually says is that running the service inside litefs mount just ensures that the litefs filesystem is mounted before the service runs.

I brought up the bound port because I tried to run litefs mount -- <a script> and got an error related to that. But I could have just run the script directly (assuming write forwarding was implemented, or I was on the primary already) and it would have worked fine, because the litefs filesystem was already mounted.

I hope this gives you some insight into the misunderstandings and misconceptions medium-competent learners might have when reading your docs. :-) My expertise is in application code and I only dabble in deployment.

benbjohnson commented 1 year ago

@irskep Thanks for the explanation. It really helps to hear about where folks have issues in detail. That's an easy thing to get tripped up on. I'll get that clarified in the docs.