superfly / litefs

FUSE-based file system for replicating SQLite databases across a cluster of machines
Apache License 2.0

Newly written data is not replicated when the database is updated on the primary without any replica connected #282

Closed legionxiong closed 1 year ago

legionxiong commented 1 year ago

This bug can be easily reproduced as follows:

1. Prepare three litefs nodes using a Consul lease: Node-A, Node-B, and Node-C. A is the primary; B and C are replicas.
2. Disconnect B and C from the primary (stop the replicas' litefs processes).
3. Update the database on A, then stop litefs on A.
4. Start litefs on B and C; B or C becomes the new primary.
5. Start litefs on A (A becomes a replica).

Expected: The data written in step 3 is replicated to the other nodes.
Actual: The data written in step 3 was cleared.

client transaction id (0000000000000037) exceeds primary transaction id (0000000000000034), clearing client position
transaction file for txid 0000000000000001 no longer available

benbjohnson commented 1 year ago

@legionxiong Sorry, I thought I had responded to this issue already. We could detect that server A is ahead of B/C and forward the missing transactions. However, it's more typical that B/C will have accepted other writes by the time A comes back. There are also edge cases where A could be much further ahead than B/C and it'd be difficult to ensure that the lineage hasn't been broken.
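
For illustration, the comparison behind the first log line quoted in the report can be sketched in Go as follows. This is a simplified sketch, not LiteFS's actual code: `parseTXID` and `replicaAction` are hypothetical names, and the "behind" and "equal" branches are assumptions. The only details taken from this thread are the 16-digit hexadecimal transaction ID format and the fact that a client ahead of the primary has its position cleared.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseTXID parses a 16-digit hexadecimal transaction ID as printed in the logs.
// Hypothetical helper for this sketch; not part of the LiteFS API.
func parseTXID(s string) (uint64, error) {
	if len(s) != 16 {
		return 0, fmt.Errorf("invalid txid length: %q", s)
	}
	return strconv.ParseUint(s, 16, 64)
}

// replicaAction returns what a primary might do for a connecting client.
// Only the first branch is described in the thread; the others are assumptions.
func replicaAction(clientTXID, primaryTXID uint64) string {
	switch {
	case clientTXID > primaryTXID:
		// The client has transactions the primary never saw, so its
		// position is discarded and its extra writes are lost.
		return "clear client position"
	case clientTXID < primaryTXID:
		return "send missing transactions" // assumed behavior
	default:
		return "up to date" // assumed behavior
	}
}

func main() {
	client, err := parseTXID("0000000000000037")
	if err != nil {
		panic(err)
	}
	primary, err := parseTXID("0000000000000034")
	if err != nil {
		panic(err)
	}
	fmt.Println(replicaAction(client, primary)) // prints "clear client position"
}
```

With the transaction IDs from the report, node A (0x37) is three transactions ahead of the new primary (0x34), which is the case that leads to A's position being cleared. A comparison of transaction IDs alone cannot tell whether B/C accepted different writes in the meantime, which is the lineage concern raised above.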

We do have plans for synchronous writes, which will provide better guarantees about windows of transaction loss. I'm going to close this issue in favor of that, since this seems like it would add a fair bit of complexity for a relatively rare scenario.