huahaiy opened 3 years ago
Any ideas on how to do initial replication?
Scenario: we have a Datalevin DB of a non-trivial size (a few GBs, large enough that copying it to a replica takes some time).
A new replica comes up and needs to catch up with the main server by copying the data over.
While copying the data, the main server also receives new write requests.
How would this be handled with an LMDB database? The replica is in the process of copying but also needs to apply new transactions.
The only approaches I can imagine involve some combination of snapshotting, keeping a transaction log, and shipping that log.
Sure. The copy is taken from a snapshot, so transactions committed after that snapshot need to be buffered and then applied once the copy is complete.
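A minimal in-memory sketch of the snapshot-then-replay idea, assuming each transaction carries a monotonically increasing `txid` assigned by the primary and only writes key/value pairs (no deletes). `Tx`, `Replica`, `start_copy`, `on_tx` and `finish_copy` are hypothetical names for illustration, not part of Datalevin or LMDB.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    txid: int     # hypothetical: monotonically increasing id from the primary
    writes: dict  # key -> value pairs written by this transaction


@dataclass
class Replica:
    data: dict = field(default_factory=dict)
    snapshot_txid: int = -1   # txid the snapshot being copied corresponds to
    applied_txid: int = -1    # last txid reflected in `data`
    buffer: list = field(default_factory=list)
    copying: bool = False

    def start_copy(self, snapshot_txid: int) -> None:
        self.copying = True
        self.snapshot_txid = snapshot_txid

    def on_tx(self, tx: Tx) -> None:
        # While the snapshot is still being copied, only buffer new transactions.
        if self.copying:
            self.buffer.append(tx)
        else:
            self._apply(tx)

    def finish_copy(self, snapshot: dict) -> None:
        self.data = dict(snapshot)
        self.applied_txid = self.snapshot_txid
        self.copying = False
        # Replay buffered transactions in txid order; _apply skips those
        # already contained in the snapshot.
        for tx in sorted(self.buffer, key=lambda t: t.txid):
            self._apply(tx)
        self.buffer.clear()

    def _apply(self, tx: Tx) -> None:
        if tx.txid <= self.applied_txid:
            return  # already in the snapshot or applied earlier
        self.data.update(tx.writes)
        self.applied_txid = tx.txid


r = Replica()
r.start_copy(snapshot_txid=10)
r.on_tx(Tx(10, {"a": 0}))  # already part of the snapshot, skipped on replay
r.on_tx(Tx(11, {"a": 2}))
r.on_tx(Tx(12, {"b": 3}))
r.finish_copy({"a": 1})
print(r.data)  # {'a': 2, 'b': 3}
```

Keying everything on `txid` is what makes it safe for the buffer to overlap the snapshot: anything at or below the snapshot's id is simply ignored.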
I think a good way to copy this might be to get access to the LMDB pages, plus some hooks that fire when they change. If we scan the pages, hash them locally (which should be fast), and send these over the wire, it should make things easier to cope with. Working with 4k pages might be easier than working with the individual keys and values.
WDYT?
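The page-hash comparison suggested above could look roughly like this. It treats the data file as raw bytes (LMDB itself offers no page API), so it assumes a consistent file image, for example one produced by LMDB's `mdb_env_copy`, and an assumed fixed page size of 4096 bytes. `page_hashes` and `pages_to_send` are hypothetical helpers.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for illustration


def page_hashes(data: bytes, page_size: int = PAGE_SIZE) -> list:
    """Hash each fixed-size page of a file image."""
    return [hashlib.sha256(data[off:off + page_size]).hexdigest()
            for off in range(0, len(data), page_size)]


def pages_to_send(primary: bytes, replica_hashes: list,
                  page_size: int = PAGE_SIZE) -> dict:
    """Return {page_number: page_bytes} for pages the replica lacks or has stale."""
    out = {}
    for pgno, digest in enumerate(page_hashes(primary, page_size)):
        if pgno >= len(replica_hashes) or replica_hashes[pgno] != digest:
            out[pgno] = primary[pgno * page_size:(pgno + 1) * page_size]
    return out


primary = bytearray(3 * PAGE_SIZE)
primary[PAGE_SIZE + 5] = 7      # page 1 changed on the primary
replica = bytes(2 * PAGE_SIZE)  # replica is stale and one page short
diff = pages_to_send(bytes(primary), page_hashes(replica))
print(sorted(diff))  # [1, 2]
```

Only the replica's hashes travel to the primary, and only differing pages travel back, which is where the savings over a full copy would come from.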
Pages are low-level implementation details that LMDB does not expose an API for.
Pages are not necessarily larger than individual values either; e.g. one could have a 500 MB document as a value that spans many pages.
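To put a number on the 500 MB example: LMDB stores a value that does not fit in a regular node on a run of contiguous overflow pages, counted by its `OVPAGES` macro. This sketch mirrors that macro, assuming 4 KiB pages, a 16-byte page header (a 64-bit build), and reading "500 MB" as 500 MiB.

```python
PAGEHDRSZ = 16  # assumed LMDB page header size on a 64-bit build


def overflow_pages(value_size: int, page_size: int = 4096) -> int:
    """Pages occupied by one large value, following LMDB's OVPAGES macro."""
    return (PAGEHDRSZ - 1 + value_size) // page_size + 1


print(overflow_pages(500 * 1024 * 1024))  # 128001
```

So a single such value would change roughly 128,000 pages at once, which is why per-page hashing is not automatically finer-grained than per-value syncing.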
One idea for implementing read-only replication is to write an mdb file parser, like the one described at https://blog.separateconcerns.com/2016-04-03-lmdb-format.html .
Use that to sync the database over as pages. The mdb file records which pages are free and the latest transaction ID. I think this might be enough to sync a DB that has gone offline and missed some transactions.
Not sure if this is going to be good enough, but I thought it might be useful to know.
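As a starting point for such a parser, here is a sketch that reads only the two meta pages at the start of `data.mdb` and picks the current one by transaction ID. The field offsets are derived from the `MDB_page`, `MDB_meta` and `MDB_db` structs in LMDB's `mdb.c` for a 64-bit little-endian build; treat them as an assumption to check against the linked write-up and the LMDB source. It does not walk B-tree or free-list pages, and `fake_meta_page` is a hypothetical helper that builds synthetic test data, not a real LMDB file.

```python
import struct

MDB_MAGIC = 0xBEEFC0DE
PAGEHDRSZ = 16  # assumed sizeof(MDB_page header), 64-bit build
DB_SIZE = 48    # assumed sizeof(MDB_db), 64-bit build


def read_meta(buf: bytes, page_offset: int) -> dict:
    """Decode magic, version, mapsize, page size, last_pg and txnid of one meta page."""
    m = page_offset + PAGEHDRSZ
    magic, version = struct.unpack_from("<II", buf, m)
    if magic != MDB_MAGIC:
        raise ValueError("not an LMDB meta page")
    (mapsize,) = struct.unpack_from("<Q", buf, m + 16)  # after magic, version, address
    dbs = m + 24
    # The page size lives in md_pad of the free-list DB (dbs[0]).
    (psize,) = struct.unpack_from("<I", buf, dbs)
    last_pg, txnid = struct.unpack_from("<QQ", buf, dbs + 2 * DB_SIZE)
    return {"version": version, "mapsize": mapsize, "psize": psize,
            "last_pg": last_pg, "txnid": txnid}


def current_meta(buf: bytes) -> dict:
    """LMDB alternates commits between meta pages 0 and 1; the higher txnid wins."""
    meta0 = read_meta(buf, 0)
    meta1 = read_meta(buf, meta0["psize"])
    return max(meta0, meta1, key=lambda meta: meta["txnid"])


def fake_meta_page(psize: int, txnid: int, last_pg: int) -> bytes:
    """Hypothetical helper: synthetic meta page using the same assumed offsets."""
    page = bytearray(psize)
    struct.pack_into("<II", page, PAGEHDRSZ, MDB_MAGIC, 1)
    struct.pack_into("<Q", page, PAGEHDRSZ + 16, 1 << 20)
    struct.pack_into("<I", page, PAGEHDRSZ + 24, psize)
    struct.pack_into("<QQ", page, PAGEHDRSZ + 24 + 2 * DB_SIZE, last_pg, txnid)
    return bytes(page)


buf = fake_meta_page(4096, 7, 42) + fake_meta_page(4096, 8, 43)
print(current_meta(buf)["txnid"])  # 8
```

A replica that went offline could compare its own current `txnid` with the primary's to decide whether it is behind before syncing any pages.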
One idea is to use Apache Arrow-related formats for this, which may also make analytics easier.
Before we implement the fully distributed store feature, maybe we could have a read-only replica for the server.
Although we do not have a WAL, we can still propagate the transaction messages to the replicas: e.g. the master acts as a client of the replicas and simply forwards every transaction message it receives to them.
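A minimal sketch of that forwarding scheme, with in-process method calls standing in for the network. The sequence numbers and the gap check are hypothetical additions so a replica notices if it missed a message; `Master`, `Replica`, `transact` and `receive` are illustrative names, not Datalevin APIs.

```python
class Replica:
    """Read-only copy that applies forwarded transaction messages in order."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.last_seq = 0

    def receive(self, seq: int, tx: dict) -> None:
        # Refuse gaps so a replica never silently diverges from the master.
        if seq != self.last_seq + 1:
            raise RuntimeError(f"expected seq {self.last_seq + 1}, got {seq}")
        self.data.update(tx)
        self.last_seq = seq


class Master:
    """Applies each transaction locally, then forwards the same message."""

    def __init__(self, replicas: list) -> None:
        self.data: dict = {}
        self.seq = 0
        self.replicas = replicas

    def transact(self, tx: dict) -> None:
        self.data.update(tx)     # commit locally first
        self.seq += 1
        for r in self.replicas:  # then act as a client of each replica
            r.receive(self.seq, tx)


replicas = [Replica(), Replica()]
master = Master(replicas)
master.transact({"a": 1})
master.transact({"a": 2, "b": 3})
print(all(r.data == master.data for r in replicas))  # True
```

Forwarding synchronously like this keeps replicas in lockstep but lets one slow replica stall every write; a per-replica queue would decouple them at the cost of replica lag.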