
Waku GossipSub libp2p PoC #37

Closed · oskarth closed this issue 4 years ago

oskarth commented 4 years ago

This is the initial step for https://github.com/vacp2p/research/issues/22; ongoing work is in https://github.com/status-im/nim-waku/ under v2.

Problem

As per https://github.com/vacp2p/research/issues/22 we want to experiment with running Waku with GossipSub and libp2p. To do this we need to understand if this is feasible, and what it'd give us.

Solution

Write a PoC minimal "Waku" client that uses GossipSub in the context of nim-libp2p. The PoC should create a basic quicksim running some nodes, similar to the one for Waku v1. These nodes will form a mesh and do some gossiping. This should test our assumptions, as well as build better intuition for GossipSub/nim-libp2p behavior and for other changes that have to be made (bridging, etc.).
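To make the setup concrete, here is a minimal toy harness of the kind of quicksim described above, in Python rather than Nim for brevity. `Node`, `full_mesh`, and all field names are hypothetical stand-ins, not the nim-libp2p API; the sketches further down in this thread reuse this harness.

```python
import itertools

class Node:
    def __init__(self, name):
        self.name = name
        self.peers = []      # all connected peers
        self.topics = set()  # topics this node is subscribed to
        self.mesh = {}       # topic -> set of mesh peers (gossipsub overlay)
        self.fanout = {}     # topic -> fanout peers (publish-only state)
        self.seen = set()    # message ids already handled (dedup)
        self.received = []   # messages delivered to the application

    def connect(self, other):
        self.peers.append(other)
        other.peers.append(self)

def full_mesh(n):
    """Wire n nodes into a full mesh, as in the first quicksim runs."""
    nodes = [Node(f"node{i:02d}") for i in range(n)]
    for a, b in itertools.combinations(nodes, 2):
        a.connect(b)
    return nodes
```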

This will later be complemented with theoretical calculations (a la Whisper scalability model) and empirical observations (e.g. cluster load).

For more details on some blockers we'll keep an eye on, see https://notes.status.im/waku-libp2p#

Acceptance criteria

A PoC simulation that runs N nodes, forms a mesh and does gossiping in a way that is informative

oskarth commented 4 years ago

At least got a minimal out-of-proc ping-pong wakusub <-> floodsub thingy going:

[screenshot]

oskarth commented 4 years ago

And here we see FloodSub with 6 nodes [the connected-nodes metric double counts sometimes], 5 subscribed to the same topic, and 100 messages being delivered 100*5 times. As expected, but good to verify.

[screenshot]
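The arithmetic checks out in the toy harness above: with flood-style forwarding and dedup by message id, each of the 5 subscribers delivers each message exactly once. A sketch, not the real FloodSub code:

```python
def flood_publish(node, topic, msg_id, msg):
    """FloodSub-style relay: forward to every peer, dedup by message id."""
    if msg_id in node.seen:
        return
    node.seen.add(msg_id)
    if topic in node.topics:
        node.received.append(msg)
    for p in node.peers:
        flood_publish(p, topic, msg_id, msg)

nodes = full_mesh(6)
for n in nodes[1:]:              # 5 of the 6 nodes subscribe
    n.topics.add("foobar")
for i in range(100):
    flood_publish(nodes[0], "foobar", f"id-{i}", f"msg-{i}")

assert sum(len(n.received) for n in nodes) == 100 * 5   # 500 deliveries
```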

Next step is to hook it up to GossipSub and show the difference, as well as to understand how the gossiping behaves.

oskarth commented 4 years ago

GossipSub shows ~100 messages, i.e. less duplication. [Even for 6 nodes, no tweaking etc. needed; it happened automatically]

[screenshot]
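The difference comes from GossipSub forwarding only along grafted mesh links rather than every connection. A rough sketch of that eager path in the toy harness (D and the mesh construction are simplified from the spec; real grafting is mutual and maintained at heartbeats):

```python
import random

D = 6  # gossipsub's target mesh degree (GossipSubD in the spec)

def build_mesh(nodes, topic):
    """Each subscriber grafts up to D peers that share the topic.
    Simplified: grafts are not made mutual here."""
    for n in nodes:
        if topic not in n.topics:
            continue
        cand = [p for p in n.peers if topic in p.topics]
        n.mesh[topic] = set(random.sample(cand, min(D, len(cand))))

def gossip_publish(node, topic, msg_id, msg):
    """Eager path: forward only along mesh links, dedup by message id."""
    if msg_id in node.seen:
        return
    node.seen.add(msg_id)
    if topic in node.topics:
        node.received.append(msg)
    for p in node.mesh.get(topic, ()):
        gossip_publish(p, topic, msg_id, msg)
```

With dedup, each subscriber still delivers a message once, but duplicate wire traffic is bounded by the mesh degree D per hop rather than by the total number of links.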

oskarth commented 4 years ago

After playing around with this a bit more, this is not a realistic example. It is a full mesh, and de facto messages seem to come from one peer.

The mesh forms gradually, going 0, 1, 2, 2, 3, 4 peers (8 msgs sent), then 4, 4 (108 msgs sent). These events happen at the heartbeat every second. If I introduce a time delay and use the same topic (same mesh) to send 100 more messages, the total message count is 208, which is not what I would expect with 4 peers in the overlay mesh.

Going to make a partial mesh where connectivity is more sparse, e.g. sender <-> a <-> b/c/d <-> e <-> receiver.
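For reference, that sparse topology in the toy harness (names as in the diagram):

```python
# sender <-> a <-> b/c/d <-> e <-> receiver, built with the toy Node class
n = {name: Node(name) for name in ["sender", "a", "b", "c", "d", "e", "receiver"]}
n["sender"].connect(n["a"])
for mid in ("b", "c", "d"):
    n["a"].connect(n[mid])
    n[mid].connect(n["e"])
n["e"].connect(n["receiver"])
```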

oskarth commented 4 years ago

Got a basic failing case. With the following topology we get:

1) Node B sends a subscribe message to node 5 for topic foobar
2) Node A sends a message to node 0 on topic foobar
3) Even though nodes 00..05 are connected, they don't relay messages

[screenshot]

[screenshot]

This is where the gossiping part with control messages comes into play, I believe.
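The toy model reproduces this failure if we assume, as GossipSub does, that a node only forwards messages on topics it is itself subscribed to (and hence meshed in): the intermediates never join the foobar mesh, so the message dies at the first hop. A sketch, reusing the harness above:

```python
# a <-> 00 <-> 01 <-> ... <-> 05 <-> b, where only a and b subscribe to foobar
chain = [Node("a")] + [Node(f"{i:02d}") for i in range(6)] + [Node("b")]
for left, right in zip(chain, chain[1:]):
    left.connect(right)

chain[0].topics.add("foobar")
chain[-1].topics.add("foobar")
build_mesh(chain, "foobar")   # a and b find no subscribed neighbors to graft

gossip_publish(chain[0], "foobar", "id-0", "msg-0")
assert chain[-1].received == []   # b never gets the message
```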

oskarth commented 4 years ago

Here a and b have foobar in their mesh, and 0 and 1 have foobar in gossipsub. Nothing is stored in fanout state.

One thing that isn't clear to me is whether fanout should be used for 0/1 to relay further. Also see https://github.com/libp2p/go-libp2p-pubsub/blob/master/pubsub.go#L656-L661 for announcing the topic on subscribe.
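My reading of that code, sketched with the hypothetical harness: fanout is only consulted on the local publish path when the publisher is not subscribed, while relaying another peer's message goes through the mesh alone, which would explain why 0/1 don't pass it on. This is an interpretation, not the actual go-libp2p-pubsub logic:

```python
def publish(node, topic, msg_id, msg):
    """Local publish only: an unsubscribed publisher lazily builds fanout
    peers; relayed traffic never touches this path."""
    node.seen.add(msg_id)   # don't re-handle our own message if it echoes back
    if topic in node.topics:
        targets = node.mesh.get(topic, set())    # subscribed: use the mesh
    else:
        if topic not in node.fanout:             # lazily pick fanout peers
            cand = [p for p in node.peers if topic in p.topics]
            node.fanout[topic] = set(cand[:D])
        targets = node.fanout[topic]
    for p in targets:
        gossip_publish(p, topic, msg_id, msg)
```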

oskarth commented 4 years ago

It appears that topic propagation is external to GossipSub.

Writing up an RFC for how this could be amended.

oskarth commented 4 years ago

RFC: https://forum.vac.dev/t/rfc-topic-propagation-extension-to-libp2p-pubsub/47/2

oskarth commented 4 years ago

Update/Note to self: after discussion earlier this week I'm trying a different approach: essentially treating the topic at this level as an app-level topic, not a 1:1 mapping with current Waku topics, then relying on the mesh being formed and on gossip to get the message volume down. (This can later be supplanted with topic-interest for light nodes.)
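A sketch of that mapping, with hypothetical names: all nodes join a single pubsub topic so the mesh stays well connected, and the old Waku topics travel inside the message envelope, filtered app-side:

```python
PUBSUB_TOPIC = "waku"   # the single shared pubsub topic (hypothetical name)

def wrap(waku_topic, payload):
    """Carry the Waku-level topic inside the message, not in the mesh."""
    return {"waku_topic": waku_topic, "payload": payload}

def handle(node, envelope, filters):
    """App-level filtering: deliver only envelopes whose waku_topic matches
    this node's filters; everything else is still relayed by the mesh."""
    if envelope["waku_topic"] in filters:
        node.received.append(envelope["payload"])
```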

One issue I've noticed when running this is that leaf nodes are often left out of a mesh, with "no more peers". E.g. if a leaf node connects to a highly connected node that has 14 peers, that node might form a mesh with ~4-6 other peers and leave the leaf node out. Even if GRAFT etc. is sent, this is enough to miss messages.

Some approaches I want to explore:

1) We could subscribe to more nodes and hope the mesh works
2) We could leverage "IHAVE" msgs when reconnecting
3) We could ensure nodes are persistent (i.e. a different part of the peer pool)

Also investigate under what conditions IHAVE messages appear, as I haven't seen them.

oskarth commented 4 years ago

I'd consider this done for now; we have a plan: https://discuss.status.im/t/waku-version-2-pitch/1776