Closed jimpick closed 5 years ago
I used the new peerPadDevTools.dumpState() facility from the browser devtools console to capture the state in base64 format from Chrome, where I had a good editing session, and Firefox, where I had a corrupt editing session:
These files can be viewed like this:
$ cat state-chrome-good.txt | bin/peerpad-print-dump | head
Value:
Dynamic Data and Capabilities WG — December 11th, 2018
- **Lead:** @pgte
- **Notetaker:** @jimpick
- **Attendees:**
- @pgte
- @arkadiy
- @jimpick
$ cat state-firefox-bad.txt | bin/peerpad-print-dump | head
Value:
- Required for reasonable pinner functionality- /3
Versioned graph synchronization implementation supports multiple named graphs- Versioned graph synchronization proposal
https://github.com/ipld/replication/pull - IPNS Multi-Writer style API
- Simple version graph synchronization pinner
- Started using IPLD structures (go-ipld-cbor) in the communication protocol
- Moving away from Protobufs at the recommendation of the libp2p team
(you can see that the state isn't the same for some reason)
@jimpick I'm beginning to suspect that this may be a problem related to Firefox. In peer-base our current tests are Chrome-only. I'll do some work to do tests in Firefox also.
Here is the diff:
https://ipfs.io/ipfs/QmUDcL9me5g1jUtRV3XBRKAYcSykqV8SVkRQzLAPFjG98t/chrome-firefox.diff
If I take the state captured from Chrome, and then apply the state from Firefox to it, using the 'rga' CRDT, I get a different result than if I do it in the opposite order. That tells me that the CRDT state is corrupt / inconsistent.
To get to the bottom of this, I'm going to dig into the data to figure out what is causing the 'join' algorithm to give different results. That might give some clues about where to look in the code to find the source of the problem.
I made a temporary repo so I can try to create a reduction of the problem down to a manageable size...
Searching for clues ... the rga array appears to be broken into 21 separate chunks that are almost randomly 'shuffled' between the state captured from chrome vs. the state captured from firefox
https://github.com/jimpick/investigate-rga/blob/master/ranges.txt
I was hoping it might be a little more simpler, thus easier to replicate possible failure modes. Without the history, it's hard to guess the sequence of events that caused this.
I'm playing around with the rga CRDT a bit ...
Just a hunch, but I can see that if a delta that depends on a state (eg. an .insertAt()) gets applied elsewhere before the dependent state is applied, things get corrupted.
Here's an example where I try to do some bad things:
https://github.com/jimpick/investigate-rga/blob/master/break-rga.js
It looks like if I apply the delta from the insertAt() 'too early', the resulting string is wrong in 50% of the test runs.
@jimpick This is great!
The result of my local test run agains the latest delta-crdts v0.8.0 is the following:
→ npm ls delta-crdts
peer-base@0.11.0 /Users/pedroteixeira/projects/ipfs/peer-base/peer-base
└── delta-crdts@0.8.0
→ node test.js
ABC: abc
DEF: def
ABCDEF (1): defabc
ABCDEF (2): defabc
X: X
ABCDEFX (1): Xdefabc
ABCDEFX (2): Xdefabc
ABCDEFX (3): Xdefabc
ABCDEFX (4): Xdefabc
ABCDEFX (5): Xdefabc
ABCDEFX (6): Xdefabc
ABCDEFY (1): defYabc
ABCDEFY (1a): defYabc
ABCDEFY (2): defYabc
ABCDEFY (3): fYdeabc
ABCDEFY (4): fYdeabc
ABCDEFY (5): fYdeabc
ABCDEFY (6): defYabc
X State [ Map { null => null, 'kgGheA==' => 'X' },
Set {},
Map { null => 'kgGheA==', 'kgGheA==' => null } ]
Y Delta [ Map { 'kgOjZGVm' => 'f', 'kgeoYWJjZGVmeTE=' => 'Y' },
Set {},
Map {
null => 'kgOjZGVm',
'kgOjZGVm' => 'kgeoYWJjZGVmeTE=',
'kgeoYWJjZGVmeTE=' => null } ]
DEFY (1): defY
DEFY (2): fYde
ABCY (1): fYabc
ABCY (2): fYabc
This contrasts with v0.7.1, which is clearly wrong:
→ npm i delta-crdts@0.7
npm http fetch GET 304 https://registry.npmjs.org/delta-crdts 3266ms (from cache)
npm http fetch GET 304 https://registry.npmjs.org/hash-it 90ms (from cache)
npm http fetch GET 304 https://registry.npmjs.org/array.prototype.flatmap 148ms (from cache)
npm http fetch GET 304 https://registry.npmjs.org/curriable 61ms (from cache)
npm http fetch GET 304 https://registry.npmjs.org/fast-stringify 86ms (from cache)
npm http fetch GET 304 https://registry.npmjs.org/json-prune 111ms (from cache)
npm http fetch POST 200 https://registry.npmjs.org/-/npm/v1/security/audits/quick 6014ms
npm WARN acorn-jsx@5.0.1 requires a peer of acorn@^6.0.0 but none is installed. You must install peer dependencies yourself.
+ delta-crdts@0.7.1
removed 1 package, updated 1 package and audited 55287 packages in 41.626s
found 4 low severity vulnerabilities
run `npm audit fix` to fix them, or `npm audit` for details
pedroteixeira@MacBook-Pro-4:~/projects/ipfs/peer-base/peer-base (master)
→ node test.js
ABC: abc
DEF: def
ABCDEF (1): defabc
ABCDEF (2): defabc
X: X
ABCDEFX (1): Xdefabc
ABCDEFX (2): Xdefabc
ABCDEFX (3): Xdefabc
ABCDEFX (4): Xdefabc
ABCDEFX (5): Xdefabc
ABCDEFX (6): Xdefabc
ABCDEFY (1): defYabc
ABCDEFY (1a): defYabc
ABCDEFY (2): defYabc
ABCDEFY (3): dfYeabc
ABCDEFY (4): dfYeabc
ABCDEFY (5): dfYeabc
ABCDEFY (6): defYabc
X State [ Map { null => null, '-1fxv8_T--LgS1I' => 'X' },
Set {},
Map { null => '-1fxv8_T--LgS1I', '-1fxv8_T--LgS1I' => null } ]
Y Delta [ Map {
'--ezTmO5G_OV-kXa_j' => 'f',
'-9jrgaOWNaCZOLOtBF1B6S8K' => 'Y' },
Set {},
Map {
null => '--ezTmO5G_OV-kXa_j',
'--ezTmO5G_OV-kXa_j' => '-9jrgaOWNaCZOLOtBF1B6S8K',
'-9jrgaOWNaCZOLOtBF1B6S8K' => null } ]
DEFY (1): defY
DEFY (2): dfYe
ABCY (1): fYabc
ABCY (2): fYabc
@jimpick could you check this against v0.8 on your end?
@jimpick also, could you PR these tests into js-delta-crdts?
Closing this, as we appear to have solved the corruption problems.
During this morning's DDC WG update, we tried to use peer-pad to take the team notes.
Some browser sessions had the correct document, and worked as expected.
Some browser sessions didn't have the entire document, or may have been corrupted.
Jim has an example of both types of browser sessions.
Figure out a way to capture the state from both browser sessions so we can do some forensic analysis.