question: replication over unreliable Networks / network partitions / disconnects

aldas commented 1 year ago

This is the same question as https://github.com/benbjohnson/litestream/discussions/143

How well would LiteFS work in situations where unreliable network connections are involved?

For example: we have a fleet of vessels that have cellular coverage near port (99+% of the time), but occasionally those vessels are so far out at sea that there is no persistent connection. An outage usually lasts a couple of hours, but in extreme cases it can last a week. Our use case has several edge databases (or tables) that need to sync their data to a central one.

I would guess it is the same as a network partition between N servers in a typical data center setup.

I assume the answer is pretty much the same as for Litestream: it should work without problems, unless the WAL hits the disk limits?

benbjohnson commented 1 year ago

@aldas LiteFS should work fine. It'll print out a bunch of log messages saying that it can't connect, but you will still have read availability on the disconnected node.

One thing you'll need to adjust is the retention period. By default, it's 10 minutes. That's how long transaction files are stored on disk. If your database doesn't have many writes then you could increase this to hours (or days or weeks). We've merged in LZ4 compression (#249) too so that should help with disk space. If a node reconnects and the next transaction file is no longer available, then it will do a full snapshot of the primary and resume.
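To get a feel for the trade-off between retention period and disk space, here is a rough back-of-the-envelope sketch in Python. It is not LiteFS code: the function names, the write rate, the average transaction file size, and the compression ratio are all hypothetical values for illustration. It also simplifies the catch-up decision to a time comparison, whereas LiteFS decides based on whether the next transaction file is still on disk.

```python
from datetime import timedelta


def retained_bytes(writes_per_hour: float,
                   avg_txn_bytes: float,
                   retention: timedelta,
                   compression_ratio: float = 1.0) -> float:
    """Estimate disk space used by retained transaction files.

    compression_ratio is compressed size / original size
    (1.0 means no compression). All inputs are assumptions
    you would measure for your own workload.
    """
    hours = retention.total_seconds() / 3600
    return writes_per_hour * hours * avg_txn_bytes * compression_ratio


def catch_up_mode(offline: timedelta, retention: timedelta) -> str:
    """Simplified model: a node that was offline longer than the
    retention period is assumed to need a full snapshot."""
    return "incremental" if offline <= retention else "snapshot"


if __name__ == "__main__":
    week = timedelta(days=7)
    # Hypothetical workload: 100 transactions/hour, about 4 KiB each.
    print(retained_bytes(100, 4096, week))
    print(catch_up_mode(timedelta(hours=3), week))
    print(catch_up_mode(timedelta(days=8), week))
```

With these assumed numbers, a one-week retention period costs on the order of tens of megabytes, which suggests that for a low-write database a retention period that covers the longest expected outage is cheap compared with re-sending a full snapshot over a cellular link.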

Eventually, we'll add compaction for the retained transaction files so they don't take up too much space over longer periods of time.

aldas commented 1 year ago

Thank you!

superfly / litefs

question: replication over unreliable Networks / network partitions / disconnects #254