libp2p / go-libp2p-pubsub

The PubSub implementation for go-libp2p
https://github.com/libp2p/specs/tree/master/pubsub

Duplicate Messages and High Outgoing Traffic in GossipSub #551

Closed b00f closed 11 months ago

b00f commented 1 year ago

Regarding the "High network usage issue," we have encountered problems with GossipSub where identical messages with the same message ID are not ignored by the network, resulting in message duplication. Additionally, Node Gossip sends messages to all nodes, leading to unnecessary outgoing traffic. Do you have any guidance on optimizing message distribution and addressing these issues to enhance network efficiency?

vyzo commented 1 year ago

Perhaps add a message validator with application-level semantics for recognizing old messages?

Also, use a hash for the message ID, and you can also try increasing the seen cache interval.

Hope that helps.
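One way to read the validator suggestion is sketched below, as an illustration only. It assumes the messages are block announcements carrying a height, and it treats anything at or below the highest height already accepted as old. `BlockAnnouncement`, `StaleFilter`, and `Verdict` are hypothetical names, not part of go-libp2p-pubsub; in a real node the same check would presumably sit inside a function registered with `RegisterTopicValidator` and return `pubsub.ValidationIgnore` for stale messages.

```go
package main

import "sync"

// Verdict mirrors the accept/ignore outcomes a pubsub validator can return.
// It is a stand-in for illustration, not the library's type.
type Verdict int

const (
	Accept Verdict = iota // deliver and relay the message
	Ignore                // drop silently, without penalizing the sender
)

// BlockAnnouncement is a hypothetical payload; only Height is inspected.
type BlockAnnouncement struct {
	Height uint64
	Hash   string
}

// StaleFilter remembers the highest block height accepted so far.
type StaleFilter struct {
	mu      sync.Mutex
	highest uint64
	seenAny bool
}

// Validate accepts an announcement only if its height is strictly greater
// than every height accepted before; equal or lower heights are ignored.
func (f *StaleFilter) Validate(a BlockAnnouncement) Verdict {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.seenAny && a.Height <= f.highest {
		return Ignore
	}
	f.highest = a.Height
	f.seenAny = true
	return Accept
}
```

Ignoring equal heights is a simplification: an application that must see competing blocks at the same height would need a finer rule, for example keyed on the block hash.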

b00f commented 1 year ago

@vyzo, thanks for your prompt response and guidance. Regarding your comment:

Could we consider implementing a message validator at the application level to identify old messages, perhaps?

We have already implemented an approach called "Node Gossip." Gossip nodes are nodes that specifically broadcast messages to non-Gossip nodes within the network. In contrast, non-Gossip nodes use validators to consume the messages they receive and then ignore them. This significantly reduces bandwidth usage. Currently, we have 12 Gossip nodes and over 500 non-Gossip nodes in our network. You can find more details here: Network Link.

Additionally, could we use a hash for message IDs and try increasing the seen cache interval?

We are already using a hash for message IDs, which consists of the first 20 bytes of the hashed data.
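A content-derived ID of this kind could look like the sketch below. The thread does not say which hash function is used, so SHA-256 is an assumption, as are the hex encoding and the helper name `messageID`. In go-libp2p-pubsub such a function would be installed through the `pubsub.WithMessageIdFn` option; the point of deriving the ID from the payload alone is that two nodes publishing the same bytes produce the same ID.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// messageID returns the first 20 bytes of the SHA-256 digest of the payload,
// hex-encoded for readability. SHA-256 is an assumption; the thread does not
// name the hash function.
func messageID(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:20])
}
```

Because the sender and sequence number are not part of the input, `messageID` gives the same result for identical block announcements regardless of which node published them.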

The question we have is: There are messages, such as block announcements, that can be broadcast by multiple nodes almost simultaneously. These data packets contain identical information and, consequently, share the same message ID (we have a test case to confirm this). Based on our log analysis, it appears that nodes receive several identical messages almost simultaneously. For example, if both Node_A and Node_B send msg_x with the same ID, Node_C receives it twice. Is there a way to prevent this?

Another concern is that the outgoing bandwidth usage of the Gossip nodes seems to be slightly higher than we expected. Are there any methods we can explore to reduce it? For instance, would setting WithFloodPublish(false) help?

vyzo commented 1 year ago

Maybe add some randomized delay for the simultaneous broadcast issue? This could help, albeit at the expense of latency.

vyzo commented 1 year ago

Flood publishing only affects the actual publication (first hop), so turning it off will probably not have any effect on the gossip nodes.

b00f commented 11 months ago

Update on the identical messages: we tried to simulate the situation with a learning test. The good news is that libp2p ignores identical messages when it receives them from different peers.