ssbc / go-ssb

Go implementation of ssb (work in progress!)
https://scuttlebutt.nz

Fix multiple index concurrency issues #308

Closed KyleMaas closed 1 year ago

KyleMaas commented 1 year ago

Fixes #293, #292, #289 (ran over 1000 times with no FAILs) and should also fix #250.

So, there's a lot to unpack here:

  1. Publishing messages now waits for indexes to finish. I'd much rather we had some way to query Margaret outside of the indexes and use that so we didn't hold up the caller, but I have not found any way to do that. This at least fixes a lot of the problems and allows us to move forward with stabilizing things at the cost of convenience for the caller.
  2. Indexing now holds the WaitGroup for a short time after each message. The delay is theoretically unnecessary, but its net effect should be to delay publishing of new messages (not appending, because those rely on previous sequence numbers) by about 100 milliseconds. In exchange, it fixes the problem of the index wait terminating prematurely and allowing other processes to proceed before the indexes are actually caught up. It is theoretically unnecessary because, if we had a way to query the Luigi pumps to see whether anything was left in the source queue, we could use that more directly to determine whether we need to continue waiting. Unfortunately, I did not find a way to do that, or even a way to patch it into Luigi without major restructuring of Luigi. So this is the best solution I could come up with: it should have the same effect, and it can be patched out later because it's still fully encapsulated within sbot's index system. This, I'm pretty sure, is why we got that one stray TestNames failure in #250. It actually makes a lot of sense now that I've had a chance to chew on it for a while.
  3. TestSignMessages doesn't use sbot. It has a completely separate system for indexes, and that system has no mechanism to wait for indexes to catch up. So, to keep the test in use and usefully testing the functionality it's meant to test, I've added some short delays to allow indexes to catch up. I ran this test over 1000 times without any FAILs, so it should be fixed now.
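
To illustrate the idea in item 2, here is a minimal, self-contained sketch of a wait that stays held for a short linger period after each message. It is not the code in this PR: `indexTracker`, `index`, `wait`, `demo`, and the 100 ms / 30 ms timings are all hypothetical names and values chosen for illustration. It also uses a mutex-guarded counter rather than `sync.WaitGroup`, purely to keep the sketch free of `Add`/`Wait` ordering concerns.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// indexTracker is a hypothetical stand-in for the WaitGroup used by sbot's
// indexes. None of these names come from go-ssb.
type indexTracker struct {
	mu      sync.Mutex
	cond    *sync.Cond
	pending int           // holds that have not been released yet
	indexed int           // messages processed so far
	linger  time.Duration // how long each hold outlives its message
}

func newIndexTracker(linger time.Duration) *indexTracker {
	t := &indexTracker{linger: linger}
	t.cond = sync.NewCond(&t.mu)
	return t
}

// index processes one message and releases its hold only after the linger
// period, so a message arriving shortly afterwards still finds the tracker held.
func (t *indexTracker) index(msg string) {
	t.mu.Lock()
	t.pending++
	t.indexed++ // stand-in for the real index update
	t.mu.Unlock()

	time.AfterFunc(t.linger, func() {
		t.mu.Lock()
		t.pending--
		t.mu.Unlock()
		t.cond.Broadcast()
	})
}

// wait blocks until no hold is outstanding, then reports how many messages
// have been indexed.
func (t *indexTracker) wait() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	for t.pending > 0 {
		t.cond.Wait()
	}
	return t.indexed
}

// demo indexes one message, lets a second one arrive during the linger
// window, and reports what a waiting publisher would observe.
func demo() (indexed int, heldForLinger bool) {
	const linger = 100 * time.Millisecond
	t := newIndexTracker(linger)
	start := time.Now()

	t.index("msg 1")
	// The second message arrives while the first hold is still lingering.
	time.AfterFunc(30*time.Millisecond, func() { t.index("msg 2") })

	indexed = t.wait()
	return indexed, time.Since(start) >= linger
}

func main() {
	indexed, held := demo()
	fmt.Println("indexed:", indexed, "waited at least the linger period:", held)
}
```

Without the linger, `wait` could return in the gap between the two messages, which is the premature termination described above. The trade-off is the same as in the PR: every wait takes at least one linger period longer than strictly necessary.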
KyleMaas commented 1 year ago

Thanks!