libp2p / go-libp2p-pubsub

The PubSub implementation for go-libp2p
https://github.com/libp2p/specs/tree/master/pubsub
Other
323 stars 187 forks source link

seenMessages TTL should be based on last observed time, not first #502

Closed stbrody closed 1 year ago

stbrody commented 1 year ago

The seenMessage cache prevents us from processing and re-broadcasting a pubsub message that we have already seen within the seenMsgTTL (default 2 minutes). Accessing an existing message, however, does not update the timestamp associated with the entry in the cache.

This means that if a single pubsub message is being received frequently and consistently, we'll re-broadcast it every seenMsgTTL, rather than never re-broadcasting it as is the desired behavior. If every time we received a duplicate pubsub message we pushed back the expiration time in the cache, that would give us the desired behavior of never re-broadcasting a message that we have already seen within the last seenMsgTTL seconds.

stbrody commented 1 year ago

Please note that we have experienced a severe issue in the past with seeing a flood of pubsub messages suddenly come in all at once. On closer observation we notice that many of the messages are duplicate messages that we have seen before. We reported it a while ago against the JS implementation here, but we have seen it more recently on go-ipfs v12 as well. In the most recent occurrence, we never saw the same message duplicated within a 2 minute window, but would often see the same messages coming into our node every 2-3 minutes.

So I think it's possible that this relatively small change could have a significant benefit for us to avoid these quite disastrous pubsub floods that we have been experiencing sporadically.

stbrody commented 1 year ago

We also discussed the more recent occurrence of the pubsub flood issue in this slack thread with some PL folks, for reference: https://filecoinproject.slack.com/archives/C025ZN5LNV8/p1661459082059149

vyzo commented 1 year ago

This makes sense, care for a patch?

vyzo commented 1 year ago

Probably as an option, it can be turned on by default.

stbrody commented 1 year ago

@vyzo Looks like I'd have to either modify or fork the timecache library as it doesn't provide any way to update the expiration time for an existing entry (it even errors if you try to add the same key twice). Do you know who the maintainer is? Any idea if I'd be better off making a PR to the existing library vs forking it or making a new TTL cache implementation?

vyzo commented 1 year ago

It's not really maintained, we should probably move it in tree.

smrz2001 commented 1 year ago

Hey @vyzo, I just put up a PR for this issue. Please take a look when you get a chance 🙏🏼