Closed AskAlexSharov closed 1 year ago
Ouch, seems our newly incorporated cache has a bug.
Our nodes are crashing because of this, and we need it fixed ASAP. If no one is working on this, we can take it.
sure, go ahead.
Correct behaviour is to ignore the message.
The Has check on the cache is done here with correct synchronization. However, Has can return false even if the message is already in the cache, because the message is outdated (time since first seen > TTL).
We can either:
I am fine with both, but my intuition inclines toward the second option. WDYT, @vyzo?
just ignore the message, I would say; panic is very bad.
don't do the sweep in Has, will need xlock.
Also, we need to implement background sweeping at some point, for both cache impls, as this eager sweep business has implications for locking.
There are many places where processLoop is locked unnecessarily :) Network.Connectedness can lock it for a while.
> don't do the sweep in Has, will need xlock.

I know, but it seemed fine. Background sweeping is a better approach though, so let's just remove the panic for now, as you've said.
yeah, although we try to make it fast.
But point taken, ok, let's fix Has.
But please do remove the panic, it needs to ignore the message (and maybe log something in debug).
We are racing ;)
To summarize: Let's fix Has while at it. If you feel so inclined, feel free to do bg sweeping.
Other than that, the panic must go.
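For illustration, a minimal sketch of what "ignore the message instead of panicking" could look like. The `seenCache` type and the `markSeen` helper are hypothetical names made up for this example and are not the actual go-libp2p-pubsub code; the only assumption is that adding to the cache can report a duplicate to the caller.

```go
package main

import (
	"fmt"
	"sync"
)

// seenCache is a hypothetical stand-in for the cache discussed in this thread.
type seenCache struct {
	mu   sync.Mutex
	seen map[string]struct{} // set of message IDs already seen
}

func newSeenCache() *seenCache {
	return &seenCache{seen: make(map[string]struct{})}
}

// Add records id and reports whether it was newly added.
// A duplicate is reported by returning false, never by panicking.
func (c *seenCache) Add(id string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.seen[id]; ok {
		return false
	}
	c.seen[id] = struct{}{}
	return true
}

// markSeen is a hypothetical caller: it drops duplicates with a debug-style
// log line and returns true only for messages that should be processed.
func markSeen(c *seenCache, id string) bool {
	if !c.Add(id) {
		fmt.Printf("debug: ignoring duplicate message %s\n", id)
		return false
	}
	return true
}

func main() {
	c := newSeenCache()
	fmt.Println(markSeen(c, "msg-1")) // first delivery is accepted
	fmt.Println(markSeen(c, "msg-1")) // duplicate is ignored
}
```

Because the presence check and the insert happen under one lock inside `Add`, two racing deliveries of the same message cannot both be accepted, and the loser is simply dropped.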
More generally, if we are using bg sweeping, we can greatly simplify everything.
We don't need queues, just a map of mid to expiry. Has shouldn't even check time, just map presence. And bg sweeping just clears everything expired.
We can and should do this for both implementations.
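A rough sketch of the simplified design described above, under stated assumptions: `expiryCache`, `sweep`, and `background` are hypothetical names, and the sweep interval is an arbitrary choice. The cache is just a map of message ID to expiry, `Has` only checks map presence under a read lock, and a background goroutine removes expired entries.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// expiryCache is a hypothetical cache: a map of message ID to expiry time.
type expiryCache struct {
	mu  sync.RWMutex
	ttl time.Duration
	m   map[string]time.Time
}

func newExpiryCache(ttl time.Duration) *expiryCache {
	return &expiryCache{ttl: ttl, m: make(map[string]time.Time)}
}

// Add stores id with an expiry of now+ttl and reports whether it was new.
func (c *expiryCache) Add(id string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.m[id]; ok {
		return false
	}
	c.m[id] = time.Now().Add(c.ttl)
	return true
}

// Has checks map presence only; it never looks at time, so a read lock suffices.
func (c *expiryCache) Has(id string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.m[id]
	return ok
}

// sweep deletes every entry whose expiry is not after the current time.
func (c *expiryCache) sweep() {
	now := time.Now()
	c.mu.Lock()
	defer c.mu.Unlock()
	for id, expiry := range c.m {
		if !expiry.After(now) {
			delete(c.m, id)
		}
	}
}

// background sweeps on every tick until ctx is cancelled.
func (c *expiryCache) background(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.sweep()
		case <-ctx.Done():
			return
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	c := newExpiryCache(50 * time.Millisecond)
	go c.background(ctx, 10*time.Millisecond)

	c.Add("msg-1")
	fmt.Println("present right after Add:", c.Has("msg-1"))
	time.Sleep(200 * time.Millisecond)
	fmt.Println("present after TTL and sweeps:", c.Has("msg-1"))
}
```

One consequence of this design is that an expired entry stays visible to `Has` until the next sweep, so the effective lifetime is the TTL plus up to one sweep interval. In exchange, only `Add` and `sweep` need the exclusive lock.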
Ok, bg sweeping can be done in a separate PR. Let's remove the panic as a start: https://github.com/libp2p/go-libp2p-pubsub/pull/522.
@vyzo, mind releasing a patch?
We certainly can, but I was thinking of doing bg sweeping first (unless you want to do it).
Fair, no need to keep that tech debt around.