Closed staltz closed 3 years ago
Some users said that Manyverse v0.2108.2 was working fine, and that one had
8.0.4
7.0.3
Here are the diffs between
Learned some really strange things about this bug!
I kept two terminal tabs open, one that was constantly monitoring the contents of the ebt file on the phone, and the other was constantly monitoring the ebt file on the desktop. Once, the @-UMK
file on the desktop showed 8592
instead of -1
! Then I disconnected the peers (by shutting down the mobile peer), and the desktop went back to showing the number -1
in the file.
Note that the number went to 8592
but never to 8601
which would have been the correct "latest" content. And also the above experiment didn't happen consistently. I started both peers again and again and only SoMeTiMeS did it do the sudden jump to 8592
. Most of the times it stayed on -1
.
A couple of times, one of the files (either the mobile or the desktop) would suddenly become all blank, like the whole file was just the JSON {}
. Then, sometime later, randomly, it would go back to usual.
Finally, I took the nuclear option of deleting the EBT files @-UMK
on desktop and @QlCT
on mobile, and after a while, the files were re-created with 8592
coming back (not the expected 8601
), and after a while (~2min) the number -1
came back.
I now reverted ssb-ebt to 7.0.3 and epidemic-broadcast-trees to 8.0.4 and I still see the exact same behavior (and re-starting the peers over and over again many times). So at least we can rule out new changes to code. I also removed the ebt files, just to make sure, and same behavior.
To make sure we're not dealing with a feed fork, I ended up checking what is the message at 8592
by QlCT, and it's a private message. I looked at the JSON of that message on both desktop and mobile and they are identical except for the KVT timestamp.
I want to lend energy to this quest. Please let me know if having a person pairing on this would help (sometimes a small party is good when hunting down an elusive and dangerous beast)
Yeah it would be awesome to pair. I don't know a lot of the inner details of epidemic-broadcast-trees but it's super useful that I can reproduce it on two peers and tweak code for both quickly.
Indeed. I'll try and prepare a debugging branch of both SSB-EBT and EBT that should give us an idea of what is going on.
Can you try and run desktop and phone with these 2 branches and report what output you get?
https://github.com/ssbc/ssb-ebt/pull/62 https://github.com/ssbc/epidemic-broadcast-trees/pull/55
As for pairing, agree it would be great to pair on this issue. Especially since the bug can be reproduced. I'm good with remote debugging like this. EBT is a state beast and the code is really hard to read. Itching to tidy up the code without making big changes just to make it easier to understand.
Edit: while reading some of the code I decided to do a tiny refactor of things: https://github.com/ssbc/epidemic-broadcast-trees/pull/56
I'll try tomorrow for sure! :)
Does it have to be phone - phone or can we reproduce it locally with scuttle-testbot and it's connect function. Maybe hook some methods to watch the things trigger
A few things we discovered:
On top of what arj already mentioned, there was also a bug in ssb-validate2 where a fork was detected, but as a false positive.
I discovered what the problem there is: the validation code is recreating the key for a private message using ssbKeys.hash(JSON.stringify(msgval,null,2))
but the msgVal's content
is actually the unboxed object, while it should be the boxed string, for the calculation of the key to be correct.
Damn. Thought it was something like that. The good thing is that once these things are deployed replication should start working again.
The fix is so obvious, it's going to hurt to review. :D
The good thing is that once these things are deployed replication should start working again.
Yes, no need to fiddle with ebt file recovery.
Pretty sure this can be closed. Thanks @arj03 !
I'm getting a lot of bug reports in Manyverse that things are not replicating between two peers currently connected and mutually following. Finally I managed to reproduce it myself, and here's one thing I found:
@+UMK
is mobile Manyverse@QlCT
is desktop Manyverse@QlCT
has new content that@+UMK
should download, but does not9.0.0
8.0.1
5.1.0
1.0.0
File
~/.ssb/ebt/@QlCT
on@+UMK
looked like this:File
~/.ssb/ebt/@-UMK
on@QlCT
looked like this:As you can see, that
-1
seems problematic. It means that QlCT is forbidding +UMK to get new updates, perhaps mistaking it as some disallowed peer. Could this be some initialization bug?cc @arj03