peer-base / js-delta-crdts

Delta State-based CRDTs in Javascript

WIP: Changes to rga compareIds() method #36

Closed jimpick closed 5 years ago

jimpick commented 5 years ago

@dirkmc pointed me at a random failure we were seeing in CI, e.g.:

https://travis-ci.org/peer-base/peer-base/builds/480105251#L1070

The failure was intermittent. It rarely showed up until 8 or more peers were participating in a collaboration, and the more peers there were, the more likely the replicas were to diverge.

I spent the weekend figuring out how to reproduce the problem reliably. I found if I scaled the collaboration to 15 peers, it would almost always fail. I tweaked a number of parameters, and managed to get it to fail with just 8 peers, which was much easier to collect data from. I traced all the rga CRDT .join() calls and dumped them in serialized format to stdout so I could do some forensics. Then, by hand, I recreated the sequence of calls, and identified where the inconsistency was creeping in. You can see my manual work here:

https://github.com/jimpick/delta-crdt-ordering-demo/blob/master/reconstruct3.js
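For illustration, the tracing step described above could look roughly like the sketch below. It is not the code used in the linked repository: `traceJoin`, `unionJoin`, and the log entry shape are hypothetical names, and it assumes the joined states can be serialized with `JSON.stringify` (states that hold `Map` or `Set` values would need a custom `serialize` function).

```javascript
// Wrap a two-argument join function so that every call is recorded.
// Inputs are serialized BEFORE the join runs, in case join mutates them.
function traceJoin (join, log = (entry) => console.log(JSON.stringify(entry)), serialize = JSON.stringify) {
  let callNumber = 0
  return function tracedJoin (s1, s2) {
    const call = ++callNumber
    const before = { s1: serialize(s1), s2: serialize(s2) }
    const result = join(s1, s2)
    log({ call, s1: before.s1, s2: before.s2, result: serialize(result) })
    return result
  }
}

// Toy join used only to exercise the wrapper: sorted set union of two arrays.
// (Hypothetical stand-in, not the RGA join.)
const unionJoin = (a, b) => [...new Set([...a, ...b])].sort()

// Usage: each call prints one JSON line that can be replayed by hand later.
const tracedUnion = traceJoin(unionJoin)
tracedUnion(['a'], ['b'])
tracedUnion(['a', 'b'], ['c'])
```

Recording the inputs as strings keeps each log line self-contained, so a failing sequence can be pasted into a script and replayed in order.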

I reduced that down to a simpler test case and debugged it interactively:

https://github.com/jimpick/delta-crdt-ordering-demo/blob/master/reconstruct3a.js

There were a couple of problems in the compareIds() method which I fixed in this patch.
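As background on why a comparator bug can make replicas diverge: if every replica is to place concurrent inserts in the same position, the id comparison has to behave as a consistent total order. The sketch below is not the patched `compareIds()`; it assumes a hypothetical id shape of `[counter, replicaId]` and only shows one way to check a comparator for antisymmetry and transitivity over a sample of ids.

```javascript
// Hypothetical id shape: [counter, replicaId]. NOT the library's actual format.
function compareIdsSketch (a, b) {
  const [counterA, replicaA] = a
  const [counterB, replicaB] = b
  if (counterA !== counterB) return counterA < counterB ? -1 : 1
  if (replicaA === replicaB) return 0
  // Tie-break on replica id so concurrent ids still have a fixed order
  return replicaA < replicaB ? -1 : 1
}

// Returns true if `compare` is antisymmetric and transitive over `ids`.
function isConsistentOrder (compare, ids) {
  for (const a of ids) {
    for (const b of ids) {
      // Antisymmetry: compare(a, b) and compare(b, a) must have opposite signs
      if (Math.sign(compare(a, b)) !== -Math.sign(compare(b, a))) return false
      for (const c of ids) {
        // Transitivity: a < b and b < c must imply a < c
        if (compare(a, b) < 0 && compare(b, c) < 0 && !(compare(a, c) < 0)) return false
      }
    }
  }
  return true
}

const sampleIds = [[1, 'a'], [1, 'b'], [2, 'a'], [3, 'c']]
console.log(isConsistentOrder(compareIdsSketch, sampleIds))

// A comparator that ignores the tie-break is not antisymmetric for equal counters
const brokenCompare = (a, b) => (a[0] <= b[0] ? -1 : 1)
console.log(isConsistentOrder(brokenCompare, sampleIds))
```

A check like this only samples the id space, so it can reveal an inconsistent comparator but cannot prove one correct.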

I still need to write some test cases, and fix the existing test cases, which are currently broken.

With this patch, I'm able to scale up to 20 peers writing simultaneously.

I did a successful test with 12 peers, each writing 500 characters, for a total document size of 6000 characters!

I'm currently running a 20 peer test with 1000 characters each ... I had to slow down the speed of the input - hopefully I'll know if it passed by the morning. :-)

pgte commented 5 years ago

@jimpick since these are small changes, I preemptively merged to master so I can also easily run some tests locally.

pgte commented 5 years ago

I can confirm: there were some random failures happening in peer-pads e2e stress tests that now don't seem to happen any more. :)

@jimpick do you think you could make CI green again on master? (invite sent)

dirkmc commented 5 years ago

Wow, fantastic work Jim!