Open EmiM opened 2 years ago
Sorry I don't have a good idea of what is going on here. It's mostly good to hear this behavior is rare, as it's less of a blocker that way. The replicator needs to be rewritten in my opinion which I'm in the process of doing as part of orbitdb.
Well it's rare but it's not that rare. There are just a handful of us testing the orbitdb-based application and we've seen it several times already in our casual use of the app.
And "everyone converges to the same state" is the core feature of a CRDT, so if OrbitDB doesn't do this reliably and deterministically that's a big deal.
Are there any options beyond "wait for rewrite"? For example:
Also, when should we expect the work you're doing to be ready?
You could try using 0.27 for the time being and see if the behavior is still there. It's possible this isn't related to changes in #933
orbit-db: 0.28.3 orbit-db-store: 4.3.3
Hi, we are using OrbitDb for developing a p2p chat application. Some time ago we noticed that once in a while users are lacking older messages ("message" - entry in a EventStore). It happens rarely but it already happened at least few times and we were sure that it wasn't a connection issue simply because new messages were arriving and could be sent with no problem.
Right now we are relying on
replicate.progress
event to send newly received messages to frontend. After some intensive testing I managed to get to the broken state (missing one message) and gather some logs.(Notice missing "I received a message but Windows did not start replicating missing messages. Will it trigger now?" on the left side).
What happened was that
replicate.progress
event didn't fire for this particular message because none of the conditions inonReplicationProgress
(https://github.com/orbitdb/orbit-db-store/blob/main/src/Store.js#L98) were met.These are the logs from the application with a broken state. They are a bit messy because I was logging the db snapshot on every 'replicate.progress' to see how oplog is changing. app1MissingMessage.log
This is the final snapshot that proves that the "missing" entry is in the local db, information about receiving it just wasn't propagated: app1MissingMessagesFinalSnapshot.log
Looking at the logs of the last 3 messages I noticed that the Replicator received "Yes it did trigger" message before "I received a message but Windows did not start replicating missing messages. Will it trigger now?". I am not sure if this matters but after "Yes it did trigger" the
replicationStatus
wasn't recalculated properly thusreplicate.progress
didn't happen:Unfortunatelly I don't have a working test yet because the path for reproducing the problem is a bit random. Opening and closing apps (aka peers) in some order seems to do the trick. I'll provide a test as soon as I create one.
Do you have any idea what could've happened here?