Closed aschmahmann closed 3 years ago
My off-the-cuff thought here is that it's difficult to create peer-to-peer gossip algorithms that operate optimally for all possible topologies.
In this particular case, the hub is deliberately acting like a broker, and therefore you may want to increase D
for it (it should be configurable). Ideally you'd maintain D
in lockstep with the peer count the hub observes, but I don't know if it's adjustable during runtime.
Essentially what you're getting by using gossipsub in this design is the topic-based messaging functionality. You're not interested in the routing, because that's always static.
it's difficult to create peer-to-peer gossip algorithms that operate optimally for all possible topologies.
Sure, although if you implemented a backoff so peers stopped GRAFTing peers that just PRUNEd them that would at least give some convergence.
I don't know if it's adjustable during runtime
In Go it's a global variable, so it's situational (e.g. I can't run two gossipsub topics, or even instances, with different D
)
You're not interested in the routing, because that's always static.
I suspect many users will not know their exact network topology ahead of time and will rely on various libp2p Discovery mechanisms to find peers to connect to. The chaotic nature of just discovering peers instead of carefully GRAFTing them into an optimal topology can certainly result in this GRAFT-PRUNE loop emerging (especially when you take into account NAT'd nodes connecting to the rest of the network).
This is resolved by gossipsub v1.1, which is the first iteration moving to episub. See https://github.com/libp2p/go-libp2p-pubsub/pull/234
@vyzo this is fixed now, right?
Yes, if you enable peer exchange (WithPeerExchange(true)
), then the mesh will converge.
Considering this fixed with https://github.com/libp2p/go-libp2p-pubsub/pull/234 and gossipsub v1.1. Closing. Please comment in case I am missing something.
The rules for gossipsub's GRAFT and PRUNE messages are:
D_low
, GRAFT peers until we're atD
D_high
PRUNE peers until we're atD
If the mesh is a star with a single node connected to
D_high + 1
nodes then the center node will be fighting to PRUNE nodes while the edge nodes GRAFT. This will go on indefinitely.While it's not the end of the world to send these extra messages 1/second (current heartbeat), it's definitely a waste of bandwidth for basically no reason. The "correct" answer is probably to have some tree forming algorithm as part of gossipsub. However, for now it'd be pretty straightforward to implement a backoff system where if A sends GRAFT to B and B sends PRUNE to A that A waits a while before trying to send GRAFT again.
Thoughts @raulk @Stebalien @vyzo ?