Closed · benbjohnson closed this issue 1 year ago
Am I to understand from this that you'd normally end up with much better write performance if you don't leverage this feature and instead use a `fly-replay` response to make sure writes happen in the primary?

I was hoping this feature would mean I could simplify my application code by not worrying about whether I'm in the primary region for writes, but based on the issue it looks like that would significantly impact write speed 😬
> Am I to understand from this that you'd normally end up with much better write performance if you don't leverage this feature and instead use a `fly-replay` response to make sure writes happen in the primary?
Yes, that's mostly correct. Writes will always be much faster on the primary itself.
This feature is for apps with low write throughput (e.g. tens of writes per second) that don't want to mess around with the `.primary` file. Ideally, you would set up a group of candidate nodes in a single primary region and have the other nodes redirect writes using `fly-replay` with the region (instead of a specific instance). That removes most LiteFS-specific application handling, except for checking the current region.
For example, you could decide to make `ord` your primary region and start 2 instances there with `candidate` set to `true` in your `litefs.yml` file. That gives you redundancy if one of the nodes goes down or you need to do a rolling deploy. Then you could start one node per region on the edge as a read-only replica (e.g. `candidate: false`). Those read-only nodes can set `fly-replay: region=ord` on any write requests, and the request will go to one of the two nodes in `ord`. If it goes to the primary in `ord`, it simply performs the write; if it goes to the replica in `ord`, write forwarding lets LiteFS handle the write transparently (although with slower performance).
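For reference, a minimal sketch of the two `litefs.yml` variants described above. Only the `candidate` line is the point here; the surrounding `lease` keys are assumptions about a typical setup and should be adapted to your deployment.

```yaml
# litefs.yml for the two nodes in the primary region (ord) -- sketch only
lease:
  type: "consul"     # assumption: a Consul-backed lease; use your lease backend
  candidate: true    # this node may become (or take over as) primary

# litefs.yml for edge nodes in every other region -- sketch only
# lease:
#   type: "consul"
#   candidate: false # pure replica; redirect writes with fly-replay instead
```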
Currently, LiteFS supports a single primary node that performs all the writes. However, there are situations where it would be useful to have multiple nodes that can write—even if it means taking a performance hit. Two common examples are background jobs & out-of-band migrations.
This could work by having the primary hand off the write lock to another node temporarily.
It is still to be determined exactly how the client application would request the lock handoff. It could be transparent, but then users might silently see slow write performance in cases where they could have been forwarding writes to the primary instead. Maybe this should be a flag in the config to enable it?