ssbc / ssb-ebt

secure scuttlebutt replication with epidemic-broadcast-trees
MIT License
18 stars 10 forks source link

Change p-defer to a simple waiting queue system #55

Closed arj03 closed 3 years ago

arj03 commented 3 years ago

A bit of background. In https://github.com/ssbc/ssb-ebt/pull/53 we replaced obz with p-defer (a Promise) and added the initialized check to the block function. This change actually broke the test/integration/block3.js test inside https://github.com/ssb-ngi-pointer/ssb-replication-scheduler because it was expecting that if you blocked someone that EBT would get that info right away, instead because of the promise it will get it a bit later leading to the blocked message being sent over the network. A promise then will add the cb as a microtask to a queue and only run only on next tick. For more information see this post. After reading that I'm quite skeptical about promises for anything performance critical. The problem is that the all the tasking and waiting is really bad for performance, not to mention can easily lead to subtile concurrency bugs like the one in block.

Now why this PR against an open branch instead of against master that has the bug. Because we are close to landing this partial-replication-changes branch and I didn't want to mess with the potential merge conflicts.

I also think it would be a good idea to move the block3 test from replication scheduler to this repo instead.

staltz commented 3 years ago

About moving block3 to here, lets not do that literally, just spiritually. Block3 in repl sched is about publishing contact msgs which means integration with ssb-friends. In ssb-ebt we can make a new test that is very similar but doesn't have ssb-friends, just uses request and block directly.

arj03 commented 3 years ago

It should be like this:

Writing this out I came to the same conclusion that the block3 test inside ssb-replication-scheduler still has its place.

staltz commented 3 years ago

So is the problem on Carol's side or on Bob's side (although they are running the same ssb-ebt code)?

arj03 commented 3 years ago

Carols side

staltz commented 3 years ago

What I still don't understand is: adding a setTimeout 100ms on the sbot.post in ssb-ebt should then fix the problem, right? Because then we make sure that the block() call happens before the sbot.post. But it didn't fix it...

arj03 commented 3 years ago

A good question.

My initial analysis was wrong. If post was the problem, then it is strange that both post and block are using a promises and if we change it to something else that it then works. Instead of post it was the cb in db.add. I have updated the earlier comment with the proper trace.

I've tried commenting out the redundant self.onAppend call in EBT and both ssb-ebt and ssb-replication-scheduler works. Even EBT itself works ;)