ccll closed this issue 1 year ago
Sorry, my bad. I just found out there is a solution to the DB migration problem: https://github.com/superfly/litefs/issues/56
Thanks for the great work! I'll dig into that solution later.
I'd also like to contribute some of my naive thoughts on the problem.
If we could force some node to be elected, or manually specify one as primary, then we could run the DB migration on the proper node.
One method that came to mind is to use the `candidate` setting: if LiteFS could live reload its config file on SIGHUP, we could transfer the primary role during rolling updates.
For example, say we have 5 v1 app nodes running and 1 of them is primary:
- [...] `candidate = true`, it should join the cluster as a replica
- [...] `candidate = false`
- `kill -SIGHUP` to let v1 nodes live reload the config (meanwhile still serving read requests)
- [...] `candidate = true` for future failover)
Another method could be to let the user manually update the Consul KV to force a node to become primary, and force all LiteFS nodes to reconnect to this new primary. (I'm wondering if this works already, as I haven't dug into the code yet :)
These methods would bring some write downtime during the switch of the primary, so just my 2 cents.
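To make the live-reload idea concrete, here is a minimal sketch of a generic SIGHUP-triggered config reload in Python. It is not LiteFS code, and nothing in this thread confirms that LiteFS reloads its config on SIGHUP. The `key = value` file format and the names `parse_config`, `reload_config`, and `install_reload_handler` are all assumptions made for illustration.

```python
import signal

# Hypothetical in-memory settings for an illustrative daemon (not LiteFS).
config = {"candidate": True}


def parse_config(text):
    """Parse simple `key = value` lines with boolean values into a dict."""
    parsed = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parsed[key.strip()] = value.strip().lower() == "true"
    return parsed


def reload_config(path):
    """Re-read the file at `path` and merge its settings into `config`."""
    with open(path) as f:
        config.update(parse_config(f.read()))


def install_reload_handler(path):
    """Reload `path` whenever this process receives SIGHUP (Unix only)."""
    signal.signal(signal.SIGHUP, lambda signum, frame: reload_config(path))
```

With such a handler installed, editing the file and sending SIGHUP to the process would flip `candidate` without a restart, which is the behavior the steps above rely on.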
@ccll I agree that forcing the promotion of a node would be ideal for migrations. There's one issue related to handing off the primary (#11) that's sorta related, but I added another one (#299) around using the new `litefs run` command to force a promotion on candidate nodes so you can run your migration script.
Ideally, you'd need to deploy to one of your candidate nodes first so they can apply the migration changes immediately. However, it's best if your migrations can work with both the prior version of your application and the new version of your application. Distributed databases are a pain. :)
According to the docs, only replica nodes that are connected to a primary will have the '.primary' file.
https://fly.io/docs/litefs/primary/
Then my question is: when the file is absent, how do I distinguish between the two cases, a primary node and a replica that isn't connected to a primary?
The scenario is database schema migrations, which I guess should run on the primary node, right? This is not a network request, so I can't simply redirect it to the primary node; I need a way to detect the primary on every node and run the migrations only there.
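A minimal Python sketch of the local check, to show where the ambiguity comes from. It assumes the '.primary' file lives at the root of the LiteFS mount directory and holds the primary's hostname. The function name `read_primary_hostname` and the `/litefs` path in the usage note are hypothetical.

```python
from pathlib import Path


def read_primary_hostname(mount_dir):
    """Return the contents of `<mount_dir>/.primary`, or None if it is absent.

    None is ambiguous: the node may be the primary, or it may be a
    replica that is not currently connected to a primary.
    """
    try:
        return (Path(mount_dir) / ".primary").read_text().strip()
    except FileNotFoundError:
        return None
```

For example, `read_primary_hostname("/litefs")` returning a hostname would mean "I am a connected replica", while `None` alone is not enough to decide that it is safe to run migrations.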
For now I can `curl` the Consul API to see if the current node is primary, but that's not as convenient as a local '.primary' file. How about having this file on every node, including the primary?
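For reference, the Consul lookup could be sketched in Python roughly as follows. Consul's KV endpoint (`GET /v1/kv/<key>`) returns a JSON array whose entries carry a base64-encoded `Value`. Which key LiteFS uses for its lease and what it stores there are not stated in this thread, so `consul_url` and `key` are placeholders and the function names are hypothetical.

```python
import base64
import json
import urllib.request


def decode_kv_value(body):
    """Decode the first entry's base64 `Value` from a Consul KV GET response body."""
    entries = json.loads(body)
    raw = entries[0].get("Value")
    # Consul reports an empty value as null.
    return base64.b64decode(raw).decode() if raw else None


def fetch_kv_value(consul_url, key):
    """GET <consul_url>/v1/kv/<key> and return the decoded value.

    urlopen raises urllib.error.HTTPError if Consul answers 404 (key not found).
    """
    with urllib.request.urlopen(f"{consul_url}/v1/kv/{key}") as resp:
        return decode_kv_value(resp.read())
```

A migration script could compare the decoded value against the local node's identity, assuming the stored value identifies the primary.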