Pubsub should consolidate its peering

MichaelMure commented 3 months ago

I've been playing with GossipSub recently, and I noticed that pubsub doesn't consolidate its peering by itself. To be clear, I'm not talking about general peer discovery (as in, "find peers for that topic as I have none"). I'm talking about pubsub receiving messages from peers in that topic but not trying to connect to them directly, even though the gossip topology is below the requirements.

As I understand it, the expected solution is to pair GossipSub with a DHT and the WithDiscovery() option, so that pubsub can ask for more peers when below the topology requirements. That's quite a heavy solution in my opinion, especially when pubsub already knows those peers, just not their multiaddrs.

If no DHT is set up, the peering is very brittle. If there is a single bootstrap node, every peer's communication goes through that bootstrap node in a star topology, with an obvious single point of failure.

What I've ended up doing is adding two extra messages to my tiny app protocol and packaging that into a WithDiscovery():

DISC_QUERY: broadcast a request for participants to reply with their AddrInfo
DISC_RESP: reply with an AddrInfo

It works really well: the topology grows and is resilient. When some peers are on the same LAN, they even find each other directly without mDNS. But that feels gross and really sub-optimal. Flooding the topic with answers is bad, having an external component is bad, and having to integrate that into the app protocol is bad.

Would it be possible to have an opt-in solution where pubsub itself queries for other peers, as part of the pubsub protocol? That would be so much cleaner and more efficient.

vyzo commented 3 months ago

We already have a PX (peer exchange) mechanism as part of mesh pruning; a common pattern is to set up (some) bootstrap nodes with PX enabled and the mesh degree set to 0. That way you can leverage the bootstrap nodes for peer discovery.

Regardless, I am open to adding a general PX control message, but we would have to work through this in libp2p/specs.

MichaelMure commented 3 months ago

> a common pattern is to set up (some) bootstrap nodes with PX enabled and the mesh degree set to 0

Could you detail that a bit more? I've tried that option and found that it had zero useful effect. I also couldn't find documentation on how I was supposed to use it.

What I saw (even though my program was set up for graceful termination of the libp2p components) was this: when the peers were connected through that single bootstrap node and the node was shut down, no failover to another peer happened. The pubsub peers ended up disconnected from each other, even though they had been receiving each other's messages a moment before.

vyzo commented 3 months ago

You need to configure a score above the PX threshold for your bootstrapper, otherwise the peers will ignore PX.

You can do this with an application level score.

vyzo commented 3 months ago

And don't forget to enable PX emission in the bootstrap node!
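
Putting the suggestions above together, the configuration might look roughly like the following sketch. It is not taken from the thread and has not been checked against any particular go-libp2p-pubsub release. The helper names `BootstrapperOptions` and `PeerOptions`, the score of 100, and every threshold value are illustrative assumptions, and the `peer` import path assumes a recent go-libp2p (older versions used `go-libp2p-core`).

```go
package pxsketch

import (
	"time"

	pubsub "github.com/libp2p/go-libp2p-pubsub"
	"github.com/libp2p/go-libp2p/core/peer"
)

// BootstrapperOptions (hypothetical helper) configures a bootstrap node:
// the mesh degree is set to 0 so that it prunes peers instead of keeping
// them in its mesh, and PX emission is enabled so that those prunes
// carry other peers' records.
func BootstrapperOptions() []pubsub.Option {
	params := pubsub.DefaultGossipSubParams()
	params.D = 0
	params.Dlo = 0
	params.Dhi = 0
	params.Dout = 0
	params.Dscore = 0
	return []pubsub.Option{
		pubsub.WithGossipSubParams(params),
		pubsub.WithPeerExchange(true), // enable PX emission
	}
}

// PeerOptions (hypothetical helper) configures a regular peer so that
// the known bootstrappers score above AcceptPXThreshold; PX from any
// peer scoring below that threshold is ignored.
func PeerOptions(bootstrappers map[peer.ID]bool) []pubsub.Option {
	scoreParams := &pubsub.PeerScoreParams{
		// Application-level score: only bootstrappers get a bonus.
		AppSpecificScore: func(p peer.ID) float64 {
			if bootstrappers[p] {
				return 100 // illustrative value, above AcceptPXThreshold
			}
			return 0
		},
		AppSpecificWeight: 1,
		DecayInterval:     time.Second,
		DecayToZero:       0.01,
	}
	thresholds := &pubsub.PeerScoreThresholds{
		// All values below are illustrative assumptions.
		GossipThreshold:             -10,
		PublishThreshold:            -50,
		GraylistThreshold:           -100,
		AcceptPXThreshold:           10,
		OpportunisticGraftThreshold: 1,
	}
	return []pubsub.Option{pubsub.WithPeerScore(scoreParams, thresholds)}
}
```

Either slice of options would be passed to `pubsub.NewGossipSub(ctx, host, opts...)` on the corresponding node.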

libp2p / go-libp2p-pubsub

Pubsub should consolidate its peering #576