Replication issues - Githubissues

ssbc / ssb-db

A database of unforgeable append-only feeds, optimized for efficient replication for peer to peer protocols

https://scuttlebot.io/

MIT License

1.17k stars 75 forks source link

Replication issues #12

Closed pfrazee closed 10 years ago

pfrazee commented 10 years ago

Setup: From a fresh start, NodeA/FeedA had 3 messages (create, "First post", "Second post"). NodeB follows FeedA; NodeB adds NodeA to network.

Replication run 1: NodeB is informed the current sequence for FeedA is 3. NodeA then only sends seq 1 and 2 (wrong behavior!). NodeB correctly processes.

Replication run 2: NodeB informs NodeA that local sequence for FeedA is 2; NodeA tells NodeB latest for FeedA is 3. NodeA then sends NodeB seq 2. NodeB errors out with "sequence out of error, expected 3, got 2", which is correct.

Looks like NodeA is not sending the last message in the feed. Off by one error?

pfrazee commented 10 years ago

Progress...

There are two problems.

NodeA creates a history stream and adds it to the replication pull-stream (elegant!). It starts the history stream from the given latest sequence. NodeB told NodeA it has seq 2 for FeedA, so NodeA starts history stream at seq 2. NodeA thus sends NodeB seq 2, when NodeB expects seq 3.

Updated https://github.com/dominictarr/secure-scuttlebutt/blob/master/replicate.js#L25 to read source.add(sbs.createHistoryStream(data.id, data.sequence+1, opts.live)) PR: #11

Replication run 3: out-of-order (off-by-one error) solved, replication finishes. That's problem 2.

Good, so start fresh and run again. Now NodeA always emits the right messages, but NodeB doesn't always get them. Each run makes some indeterminate amount of progress. Race condition for handling received messages? Debugging...

pfrazee commented 10 years ago

Had to break out some packet sniffers on this one.

It looks like the receiving end is breaking the duplex stream when it caps it's many-source. (NodeA gets all messages on the wire, but NodeB stops receiving them.) If I comment out the many() cap call in replicate() for NodeB, everything transfers.

I've been digging through this a bit, but I'm not sure how to solve it. Pull-many needs to cap but not end the duplex stream. Ideas...

not call many cap() and just leave the connection open,
not call many cap() until all syncing has finished,
pass the { live: true } option in to replicate() and keep the history stream open