andyhorng opened this issue 1 year ago
That's a good idea. The biggest hurdle right now is that using Consul makes it so your app needs to know how to redirect requests to the primary and that can get a little wonky—especially when running migrations as part of a deployment.
I'm reworking the docs to focus more on using the static leasing (where a single node is always primary). That's much simpler to get working for application developers. The trade-off is that you'll have some write availability loss during deploy but we have improvements coming to apps to reduce deploy times significantly.
We also have some improvements on dynamic leasing with Consul coming in the near future so it won't be such a pain. :)
Good direction focusing on static leasing. It's a much simpler setup/adoption story for a typical dev adopter to reconfigure manually per deployment (especially when deployments happen multiple times per day) versus also provisioning Consul.
Sounds great! I'm excited to see your work. The static lease is a really good idea, I like the trade-off, and it's well-suited for most of the web apps I've developed.
I'm curious about how the static leasing will work. Does Fly provide any mechanism for us to have different environment variables if it's the primary node? That seems like a good way to implement it.
> Does Fly provide any mechanism for us to have different environment variables if it's the primary node?
We don't currently have a concept of a "primary node" inside Fly.io since many applications use our nodes ephemerally. We have talked about adding that though.
The easiest way to run a static setup is to decide on a primary region and deploy 1 node there. Set an environment variable called `PRIMARY_REGION` with that region and then you can reference it as `${PRIMARY_REGION}.${FLY_APP_NAME}.internal`.
We are also rolling out a new version of Fly Apps that will have more stable hostnames so you'll be able to reference those instead.
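For example, a rough sketch (the region name is a placeholder; 20202 is the LiteFS port used later in this thread):

```bash
# Pin the primary region and expose it to the app, e.g. in your Dockerfile or
# entrypoint script ("dfw" is just an example region):
export PRIMARY_REGION=dfw

# Replicas can then reach the primary at a stable internal hostname:
echo "http://${PRIMARY_REGION}.${FLY_APP_NAME}.internal:20202"
```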
I am trying to get this working right now, but the `hostname` option has to be set on the replica nodes to the hostname of the primary--something I don't know until after I've deployed on Fly.
-- EDIT --
I think I get it; you're suggesting to not `fly-replay` to a specific instance but to a region. So check for the `.primary` file but don't use the contents for now.
> I think I get it; you're suggesting to not `fly-replay` to a specific instance but to a region. So check for the `.primary` file but don't use the contents for now.
Yes, that's correct. `${PRIMARY_REGION}.${FLY_APP_NAME}.internal` should work to reference the region (assuming you set the `PRIMARY_REGION` environment variable).
@jwhear If you post the `litefs.yml` config file then I can give feedback on that too.
This combo is working for me:
```yaml
# Directory where the application will find the database
fuse:
  dir: "/data"

# Directory where the LiteFS will actually place LTX files
data:
  dir: "/var/lib/litefs"
  retention: "1h"
  retention-monitor-interval: "5m"

lease:
  type: "static"
  candidate: ${FLY_REGION == "dfw"}
  advertise-url: "http://${PRIMARY_REGION}.${FLY_APP_NAME}.internal:20202"

# In production we want to exit on error and thus restart the instance
# On staging leave everything as-is so that we can SSH in and diagnose
exit-on-error: ${FLY_APP_NAME != "work-staging"}
```
In the app, if the request would cause a DB write and the `.primary` file exists, set `fly-replay: region={PRIMARY_REGION}` and send an empty response.
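Roughly, the app-side check looks like this (a minimal sketch, not the actual code; the `/data` FUSE path and `PRIMARY_REGION` come from the config above, everything else is illustrative):

```go
package main

import (
	"net/http"
	"os"
	"path/filepath"
)

// replayWritesToPrimary wraps a handler: on a replica (where LiteFS creates a
// ".primary" file inside the FUSE directory), write requests get an empty body
// plus a fly-replay header so Fly's proxy re-runs them in the primary region.
func replayWritesToPrimary(fuseDir string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// The .primary file only exists when this node is NOT the primary.
		_, err := os.Stat(filepath.Join(fuseDir, ".primary"))
		isReplica := err == nil

		// Treat anything other than GET/HEAD as a potential DB write.
		isWrite := r.Method != http.MethodGet && r.Method != http.MethodHead

		if isReplica && isWrite {
			w.Header().Set("fly-replay", "region="+os.Getenv("PRIMARY_REGION"))
			w.WriteHeader(http.StatusConflict) // body stays empty; fly-proxy acts on the header
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8080", replayWritesToPrimary("/data", mux))
}
```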
Nice. Yeah, that looks good. You can also compare two environment variables so you could do:
```yaml
lease:
  candidate: ${FLY_REGION == PRIMARY_REGION}
```
Thank you for this great project!
I can't use Consul, so I use LiteFS as a library and just created a Raft-based lease for leader election and request forwarding. This is incorporated into a boilerplate code generation tool called sqlc-grpc. It's experimental and I need to monitor and check the trade-offs. Are there plans to embed an HA leasing system without external dependencies into LiteFS?
> Are there plans to embed an HA leasing system without external dependencies into LiteFS?
I'm not sure. Having run Raft-based clusters, it's kind of a pain. Also, adding distributed consensus to your application nodes can be problematic when they're under high load as they can lose leadership easily. I've found that moving the leasing system off the application nodes is usually a good approach.
That makes sense. Thank you!
I'm planning a cutover of my staging environment shortly as 0.3 appears to be good enough feature-wise.
Some questions around reliability though:
1. Has there been any data-consistency / long-term testing / production use? From my preliminary testing, the failover on node down seems very solid with Consul and I haven't had any issues with replication so far on dev. Key word here is so far.
2. When is the approximate ETA for 0.4? 1 month? 6 months? 1 year? I see there are a few fixes on 0.3 features in the trunk, so I suppose I could run off that.
3. Will 0.4 be a drop-in replacement for 0.3?
> Has there been any data-consistency / long-term testing / production use?
We use LiteFS internally at Fly.io but we use it with the static lease because it works better for our particular setup. We also do a long-running chaos test with the Consul lease for each PR, which runs a geographically spread-out cluster where nodes are randomly killed every few minutes.
@kentcdodds has been running with the Consul lease for a while too. He may be able to chime in with additional information.
It's always good to have additional backups though. If you're running on Fly.io, we take automatic snapshots every 24 hours. LiteFS v0.4.0 will also have a `litefs export` command to perform a fast snapshot to disk. That can be good if you want to take more frequent backups.
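For example, something like this could run on a schedule on the primary (a sketch; `my.db` and the backup path are placeholders, and the exact flags are best checked against `litefs export -h`):

```bash
# Snapshot the database out of LiteFS into a plain SQLite file.
litefs export -name my.db /backups/my-$(date +%F).db
```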
> When is the approximate ETA for 0.4?
I'm expecting the v0.4.0 release to go out early next week. I was expecting to get it out in March but we had some internal projects that I needed to jump on temporarily.
> Will 0.4 be a drop-in replacement for 0.3?
From a data file perspective, yes. There's nothing to upgrade in that sense. However, some backwards incompatible internal API changes needed to be made to support compression (#132) so you can't run a mixed-version cluster (e.g. v0.3.0 nodes with v0.4.0 nodes).
There are also some minor cosmetic changes in the `litefs.yml` configuration file and the FUSE library has been upgraded so it requires `fuse3` instead of just regular `fuse` on your VM.
I forgot that I scaled down to 1 node a while back so most of the time I've been running Consul with a single node and infrequent deploys. I just scaled up to 9 instances and it started up without issue. I'll report back if I have any trouble. I'm currently running `sha-9ff02a3` which is some iteration of v0.4.0 (mostly because I like and use the http-proxy). With all that context, I've been running issue-free for a couple months. Happy with that. I'll let you know if I run into any bumps now that I'm back to multi-instance.
@kentcdodds - cool. A few more questions if you have the time.
1. Are you on nomad or on machines? I'm trying to figure out if I need to run my own Consul cluster right now. There seems to be some GraphQL API that allows it to be enabled.
2. What's the approximate write qps that you're dealing with?
@tjheeta, first I'll say that I'm a bit in over my head when it comes to infra. I just have very particular requirements I have placed upon myself and I suppose that comes with being the guinea pig for some new tech even when I don't really know what I'm doing.
My use case is my personal website: https://kentcdodds.com
So while I am pretty small scale, don't close the tab yet. I actually have some pretty unique features to my site and I get a lot of traffic for a personal website. I also publish my analytics: https://kcd.im/kcd-fathom as well as my real-world performance metrics: https://kcd.im/kcd-metronome
My site is also completely open source so you can take a look at the source code as well if you like: https://github.com/kentcdodds/kentcdodds.com
With that context...
So yeah, not very large scale I'm afraid. I look forward to more people throwing heavier scale at LiteFS. I'm confident it can handle it.
I hope that's helpful!
Oh, I should also mention that I've got two databases in LiteFS on my site. The cache is probably in the range of upper hundreds of queries per second, possibly thousands. Still, smaller scale, but definitely more than "simple blog" level stuff I think :)
@tjheeta I've worked with Kent a bit on debugging his site since he was a super early adopter of LiteFS so I can try to answer a bit.
> Are you on nomad or on machines?
He's on nomad (apps v1) so he's using the multi-tenant Consul cluster we provide. We'll be making that available on machines & apps v2 soon but I don't have an ETA for that.
> What's the approximate write qps that you're dealing with?
Kent answered this already but I'll give a little extra info. LiteFS on FUSE is typically good for tens of write transactions per second because the FUSE overhead of the syscalls is fairly high.
I am planning to make a VFS version of LiteFS available in the near future. That will eliminate that syscall overhead but I would expect throughput to be about 50% of the normal SQLite write throughput since it has double the writes (once to the WAL & once to the LTX transaction file). I think that should handle in the thousands of writes per second once it's decently optimized. My plan is to make it so you simply run a `load_extension()` and all the other LiteFS configuration is the same.
Thanks for all the information. Staging environment is running on litefs now out of 3 regions and using the multi-tenant consul provided by fly on machines. Not sure that I should be, but I am. There was a bit of a hiccup bringing a cloned machine up, but I could not isolate it to litefs.
> LiteFS on FUSE is typically good for tens of write transactions per second because the FUSE overhead of the syscalls is fairly high.
That is surprisingly low, however, still should work for my use case.
> He's on nomad (apps v1) so he's using the multi-tenant Consul cluster we provide. We'll be making that available on machines & apps v2 soon but I don't have an ETA for that.
Found out that it is possible to enable the Consul cluster on machines via https://api.fly.io/graphql. This doesn't give a `FLY_CONSUL_URL`, but the URL is usable.
Request:
```graphql
mutation {
  enablePostgresConsul(input: {appId: "someappid"}) {
    consulUrl
  }
}
```
Response:
```json
{
  "data": {
    "enablePostgresConsul": {
      "consulUrl": "https://someurl"
    }
  }
}
```
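For reference, a sketch of sending that with curl (assuming flyctl is installed for the auth token; replace "someappid" with your app name):

```bash
curl -s https://api.fly.io/graphql \
  -H "Authorization: Bearer $(fly auth token)" \
  -H "Content-Type: application/json" \
  -d '{"query":"mutation { enablePostgresConsul(input: {appId: \"someappid\"}) { consulUrl } }"}'
```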
However, given all the recent hubbub around reliability/consul, should the multi-tenant fly consul be used?
> That is surprisingly low, however, still should work for my use case.
Yeah, the long-term aim is to provide a seamless experience for smaller applications and those tend to have very low write throughput. Honestly, many web applications are less than 1 write/sec. Reads are still quite fast as SQLite can leverage the OS page cache.
Medium-sized applications that need higher write throughput will need to load an extension, which isn't too hard, but also not quite as seamless.
> However, given all the recent hubbub around reliability/consul, should the multi-tenant fly consul be used?
We don't have any plans to discontinue the multi-tenant Consul support. Multi-tenancy always has its trade-offs though so you may have better reliability if you ran your own Consul app.
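For anyone weighing that option, the Consul lease side of `litefs.yml` looks roughly like this (a sketch; field names should be checked against the LiteFS config docs, and the advertise URL is a placeholder for whatever address other nodes can reach this node at):

```yaml
lease:
  type: "consul"
  # URL that other nodes use to reach *this* node when it becomes primary
  advertise-url: "http://<this-node-internal-hostname>:20202"
  candidate: true
  consul:
    # FLY_CONSUL_URL is injected when using Fly's multi-tenant Consul;
    # point this at your own Consul app instead if you run one.
    url: "${FLY_CONSUL_URL}"
    key: "litefs/${FLY_APP_NAME}"
```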
I'm looking to have the primary stay only in a few regions. I see there are a few issues that potentially may do what I'm looking for:
https://github.com/superfly/litefs/issues/176 https://github.com/superfly/litefs/issues/178
It doesn't look like there's an IN operator right now:
```yaml
candidate: ${FLY_REGION in ["dfw", "ord", "abc"]}
```
but is there an OR operator currently?
```yaml
candidate: ${FLY_REGION == "dfw"} || ${FLY_REGION == "ord"} || ${FLY_REGION == "abc"}
```
I'm hesitant to add more complicated expression parsing because it opens up a can of worms (e.g. if there's `OR` then there should be `AND` and probably parentheses). I think the best option is to set an environment variable in a bash script and then embed that in the config.
```bash
#!/bin/bash
if [ "$FLY_REGION" = "dfw" ] || [ "$FLY_REGION" = "ord" ]; then
  export LITEFS_CANDIDATE=true
else
  export LITEFS_CANDIDATE=false
fi
exec litefs mount
```
And then in your `litefs.yml`:
```yaml
lease:
  candidate: ${LITEFS_CANDIDATE}
```
(Pardon my terrible bash scripting)
I don't have anything particularly useful to say, but just wanted to mention that I've been using litefs on https://coolstuff.app for 6-ish months now with essentially zero issues (beyond bugs I introduced myself with things like `fly-replay` and some minor issues pre 0.3).
We're still in private beta so traffic is low, but regular usage happens in two regions on v1 apps (`syd` and `yyz`, and will be adding Europe soon), so it's not a super trivial setup. Our primary is in `yyz`, with `fly-replay` used to redirect mutable requests, and we run background workers which handle all their mutation via litefs file descriptor locking (I wrote a simple Python library to handle this which I've been meaning to open source). Beyond that, it's just regular SQLite, so it all just works.
I know it's still pre-production technically, but litefs has been solid for me so far and I have no regrets using it.
Thanks for the feedback, @ben-davis. 🙏 I just wanted to say that coolstuff.app is super snappy! I'm glad to hear LiteFS is working well for you.
I already use Litestream as my primary database solution with Fly.io. This architecture is very simple and I really like it. However, deployment takes too long, and I can't do a rolling restart because the volume needs to be unmounted and remounted to the new instance.
So, my question: I am aware that LiteFS is currently in beta and not recommended for use in production. However, in my case, I believe it is worth trying. I would like to know what potential problems or caveats I might encounter when using it in production and how to avoid them.
Maybe we can list out those potential problems and create a document for others who are also considering using LiteFS in production.