Tuning GossipSub's D parameter in Waku

alrevuelta commented 11 months ago

tldr: We assess the impact of reducing GossipSub's D parameter from 6 (the current value) to 3, analyzing i) the degradation in message propagation delay and ii) the savings in bandwidth. Shadow simulations of a network with 1000 nodes and small messages (10kB) show that this change (D 6->3) worsens message propagation times from 627 ms to 849 ms for 95% of the messages. In exchange, bandwidth consumption is halved on average.

In https://github.com/waku-org/research/issues/42 we presented the expected message propagation times for the current waku configuration, which are mostly determined by the gossipsub configuration (D=6 with Dlow=4 and Dhigh=12). In this issue we i) explain the rationale and tradeoffs behind these configuration parameters and ii) show how they affect propagation delay and bandwidth. We also assess whether it would make sense to change the existing configuration.

Note: We use the same simulation setup and tools as in https://github.com/waku-org/research/issues/42, see tool and branch.

As explained in https://github.com/waku-org/research/issues/42, the tradeoff behind D (the number of peers in the mesh, aka full connections) is clear:

A higher value implies lower propagation times since a message reaches all nodes in fewer hops. But bandwidth consumption increases linearly (roughly D times the base bandwidth).
A lower value implies higher propagation times since a message has to travel more hops. But bandwidth consumption is less.

The question is, where should waku be in this trade-off? Note that every node is free to change these parameters, so what we discuss here are reasonable defaults. Note also that waku could have multiple "profiles", as @chaitanyaprem suggested: e.g. profile1 could be meant for nodes in datacenters and profile2 for average users, with each profile having different parameters such as D values.

Theory

Using ceil(log(N)/log(D)) we can see, for different numbers of nodes N and D values, the relationship between the number of hops needed to reach all nodes (worst case) and the bandwidth consumption. Note that ihave/iwant messages are not taken into account, since they represent a small part of the bandwidth usage.

For example:

In a network with 1000 nodes and D=6, a message has to travel at most 4 hops to reach all nodes. As the right side of the plot shows, the bandwidth amplification factor would be 6. This means that if the messages sent in the network amount to 3 Mbps, then the nodes would be consuming 18 Mbps.
In a network with 1000 nodes and D=3, the maximum number of hops is 7.

(plot generated with this)
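The two examples above can be reproduced with a few lines of Python. This is only a sketch of the simplified model used here, not the script that generated the plot: the function names `max_hops` and `node_bandwidth_mbps` are made up for illustration, and the hop count is computed with an integer loop (the smallest h with D^h >= N), which is equivalent to ceil(log(N)/log(D)) but avoids floating-point rounding when N is an exact power of D.

```python
def max_hops(n_nodes: int, d: int) -> int:
    """Smallest h with d**h >= n_nodes, i.e. ceil(log(n_nodes) / log(d))."""
    if n_nodes < 1 or d < 2:
        raise ValueError("need n_nodes >= 1 and d >= 2")
    hops, reached = 0, 1
    while reached < n_nodes:
        reached *= d  # each extra hop multiplies the reachable nodes by d
        hops += 1
    return hops


def node_bandwidth_mbps(base_mbps: float, d: int) -> float:
    """Linear model from the text: base traffic times D (ihave/iwant ignored)."""
    return base_mbps * d


if __name__ == "__main__":
    for d in (6, 3):
        hops = max_hops(1000, d)
        bandwidth = node_bandwidth_mbps(3.0, d)
        print(f"N=1000, D={d}: max hops={hops}, bandwidth={bandwidth} Mbps")
```

Under this model, N=1000 gives 4 hops and 18 Mbps for D=6, and 7 hops and 9 Mbps for D=3, matching the figures quoted above.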

Relevant points in the tradeoff:

There is a point beyond which increasing D no longer lowers the number of hops but still increases bandwidth. It wouldn't make sense to go beyond that point, as no benefit is obtained.
The lower D is, the higher the impact on the number of hops. For example, going from D=4 to D=3 has a greater impact than going from D=6 to D=5.
The bandwidth scales linearly with D: the amplification over the base bandwidth equals D (plus a negligible percentage to account for ihave/iwants).
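These points can be checked numerically for the 1000-node case. The sketch below uses the same simplified hop model (smallest h with D^h >= N); the helper names `max_hops` and `hops_table` are hypothetical, and real gossipsub meshes will not match this idealised tree exactly.

```python
def max_hops(n_nodes: int, d: int) -> int:
    """Smallest h with d**h >= n_nodes, i.e. ceil(log(n_nodes) / log(d))."""
    hops, reached = 0, 1
    while reached < n_nodes:
        reached *= d
        hops += 1
    return hops


def hops_table(n_nodes: int, d_values) -> dict:
    """Map each candidate D to its worst-case hop count."""
    return {d: max_hops(n_nodes, d) for d in d_values}


table = hops_table(1000, range(2, 13))

if __name__ == "__main__":
    for d in range(3, 13):
        saved = table[d - 1] - table[d]
        # Bandwidth amplification under the linear model is simply D.
        print(f"D={d - 1}->{d}: hops {table[d - 1]}->{table[d]} "
              f"(saves {saved}), bandwidth x{d}")
```

For N=1000 the table shows that D=4 to D=3 costs two extra hops (5 to 7) while D=6 to D=5 costs one (4 to 5), and that D=6 through D=9 all need 4 hops, so raising D within that range only adds bandwidth.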

Simulations

In https://github.com/waku-org/research/issues/42 we ran simulations with the current waku configuration of D=6 (Dlow=4, Dhigh=12). Below, we present the simulation results for a new scenario where:

D=3, Dlow=2, Dhigh=4

New results (yellow, green) are presented together with the results from https://github.com/waku-org/research/issues/42 (red, blue) so that they can be compared.

Conclusions

Best-case propagation time (lower part of the whiskers) is not affected by the change of D. This is because the nodes that are just 1 hop away from the publisher get the message in 1 hop regardless of D.
For small messages (10kB), changing D from 6 (current) to 3 has a visible impact on the propagation delay. The average goes from 508 ms to 610 ms. However, the worst-case delay is affected more, since we go from 4 to 7 hops.
Such a change of D (6 to 3) will decrease bandwidth consumption by half. This means that either waku throughput can be increased or the bandwidth requirement to run a node can be reduced (see https://github.com/waku-org/research/issues/31).
Note that these simulations apply to a network of 1000 nodes. As shown in the first figure, the larger the network, the larger the delay for a fixed D. We can most likely extrapolate based on these results.
Besides lowering D, we might want to pick closer Dlow and Dhigh values. The current configuration of 4 to 12 makes bandwidth very unpredictable: some nodes might use 4x the base bandwidth and others 12x.
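To make the last point concrete, the sketch below compares the bandwidth spread implied by the current bounds (4/12) and the simulated ones (2/4). It assumes the same linear model as above, where a node with k mesh peers uses roughly k times the base bandwidth; the `MeshConfig` class and its methods are hypothetical and not part of nwaku or libp2p.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MeshConfig:
    """Hypothetical container for gossipsub mesh-degree settings."""
    d: int
    d_low: int
    d_high: int

    def amplification_range(self) -> Tuple[int, int]:
        """Min and max bandwidth multiplier a node may see (linear model)."""
        return (self.d_low, self.d_high)

    def spread(self) -> float:
        """Ratio between the worst-case and best-case multiplier."""
        return self.d_high / self.d_low


current = MeshConfig(d=6, d_low=4, d_high=12)   # current waku defaults
proposed = MeshConfig(d=3, d_low=2, d_high=4)   # values used in the simulation

if __name__ == "__main__":
    for name, cfg in (("current", current), ("proposed", proposed)):
        low, high = cfg.amplification_range()
        print(f"{name}: D={cfg.d}, bandwidth {low}x to {high}x "
              f"(spread {cfg.spread():.1f})")
```

With these numbers the current bounds allow a 3x difference between nodes, while the simulated bounds allow 2x; tighter bounds would narrow that further.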

Based on this data, we should be ready to assess: do we want to lower D to reduce the bandwidth requirement? Or at least for some "profiles"?

alrevuelta commented 11 months ago

Weekly Update

achieved: nwaku simulations showing the impact on message propagation delay of reducing gossipsub's D value. The main goal is to reduce bandwidth consumption in exchange for a worse propagation delay.
next: assess whether we want to move forward with changing D.

fryorcraken commented 11 months ago

@jm-clius @alrevuelta feel free to correct epic label if needed.

chaitanyaprem commented 11 months ago

I do like the suggestion of reducing the D value, but I wonder on what basis libp2p recommends Dlow to be 4. It would be good to find the study and simulations they have done before considering reducing D to 3 (which is below 4). I also like the idea of changing this using profiles; that way, nodes which can handle more connections and bandwidth help propagate messages faster in the network.
