superfly / litefs

FUSE-based file system for replicating SQLite databases across a cluster of machines
Apache License 2.0

Write Forwarding #56

Closed benbjohnson closed 1 year ago

benbjohnson commented 2 years ago

Currently, LiteFS supports a single primary node that performs all the writes. However, there are situations where it would be useful to have multiple nodes that can write—even if it means taking a performance hit. Two common examples are background jobs & out-of-band migrations.

This could work by having the primary hand off the write lock to another node temporarily:

  1. Given N₁ is the primary and N₂ is a replica.
  2. N₂ sends a request to acquire a write lock from N₁.
  3. N₁ acquires the write lock on behalf of N₂ and holds it for the duration of the request.
  4. N₂ ensures it has the latest transaction data.
  5. N₂ executes its write transaction locally.
  6. N₂ sends the LTX file for the transaction back to N₁.
  7. If N₁ still holds the lock, it commits the transaction and notifies N₂.
  8. If N₁ no longer holds the lock or has been demoted, it rejects the transaction and notifies N₂.
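
The steps above can be sketched as a small in-memory model. This is only an illustration of the proposed handoff, not LiteFS code: the `Primary` type, its methods, the error values, and the rule that a forwarded transaction must carry the next TXID are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Hypothetical errors for this sketch; not part of LiteFS.
var (
	ErrRejected = errors.New("transaction rejected: lock lost or primary demoted")
	ErrLockHeld = errors.New("write lock already held on behalf of another node")
)

// Primary is a hypothetical stand-in for N₁.
type Primary struct {
	mu        sync.Mutex
	isPrimary bool
	holder    string // replica the write lock is currently held for
	txID      uint64 // ID of the last committed transaction
}

// AcquireFor models steps 2-3: the primary takes the write lock on behalf of
// a replica. It returns the current TXID so the replica can check that it
// has the latest transaction data (step 4).
func (p *Primary) AcquireFor(replicaID string) (uint64, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.isPrimary {
		return 0, ErrRejected
	}
	if p.holder != "" {
		return 0, ErrLockHeld
	}
	p.holder = replicaID
	return p.txID, nil
}

// Commit models steps 6-8: the replica sends back its transaction, and the
// primary commits only if it is still primary and still holds the lock for
// that replica. Requiring the next TXID is an assumption of this sketch.
func (p *Primary) Commit(replicaID string, ltxTXID uint64) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.isPrimary || p.holder != replicaID {
		return ErrRejected
	}
	p.holder = "" // the lock is released whether or not the commit succeeds
	if ltxTXID != p.txID+1 {
		return ErrRejected
	}
	p.txID = ltxTXID
	return nil
}

// Demote models the primary losing its role, which drops any handed-off lock.
func (p *Primary) Demote() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.isPrimary = false
	p.holder = ""
}

func main() {
	n1 := &Primary{isPrimary: true}

	// Successful forwarded write from N₂.
	txID, _ := n1.AcquireFor("n2")
	fmt.Println("commit:", n1.Commit("n2", txID+1))

	// N₁ is demoted while N₂ is writing, so the transaction is rejected.
	txID, _ = n1.AcquireFor("n2")
	n1.Demote()
	fmt.Println("commit after demotion:", n1.Commit("n2", txID+1))
}
```

In a real cluster the acquire and commit calls would be network requests and the lock would presumably need a timeout; both are left out here to keep the control flow visible.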

How the client application requests the lock handoff is still to be determined. It could be transparent, but then users might see slow writes without realizing they could be forwarding those writes to the primary instead. Maybe this should be a flag in the config to enable it?

kentcdodds commented 1 year ago

Am I to understand from this that you'd normally end up with much better write performance if you don't leverage this feature and instead use a fly-replay response to make sure writes happen in the primary?

I was hoping this feature would mean I could simplify my application code by not worrying about whether I'm in the primary region for writes, but based on the issue it looks like that would significantly impact write speed 😬

benbjohnson commented 1 year ago

> Am I to understand from this that you'd normally end up with much better write performance if you don't leverage this feature and instead use a fly-replay response to make sure writes happen in the primary?

Yes, that's mostly correct. Writes will always be much faster on the primary itself.

This feature is for apps with low write throughput (e.g. tens of writes per second) that don't want to mess around with the .primary file. Ideally, you would set up a set of candidate nodes in a single primary region, and other nodes would redirect writes using fly-replay with the region (instead of a specific instance). That removes most LiteFS-specific application handling, except for checking the current region.

For example, you could make ord your primary region and start 2 instances there with candidate set to true in your litefs.yml file. That gives you redundancy if one of the nodes goes down or you need to do a rolling deploy. Then you could start one read-only replica (e.g. candidate: false) per region on the edge. Those read-only nodes can set fly-replay: region=ord on any write request, and it will go to one of the two nodes in ord. If it goes to the primary in ord, the primary simply performs the write; if it goes to the replica in ord, write forwarding lets LiteFS handle the write transparently (although more slowly).