Open alrevuelta opened 11 months ago
Weekly Update
@jm-clius @alrevuelta feel free to correct epic label if needed.
I do like the suggestion of reducing D value, but wondering what was the basis on which libp2p recommends Dlow to be 4. Would be good to find the study and simulations they have done before considering reducing to 3 (which is below 4). Also I like the idea of changing this using profiles, that way nodes which can handle more connections and bandwidth help propagate messages faster in the network.
tldr: We assess the impact of reducing gossipsub's `D` parameter from 6 (current value) to 3, analyzing i) the degradation in message propagation delay and ii) the savings in bandwidth. We observe with shadow simulations that in a network with 1000 nodes and using small messages (`10kB`), this change (`D` 6->3) worsens message propagation times from `627 ms` to `849 ms` for 95% of the messages. In exchange, bandwidth consumption is halved on average.

In https://github.com/waku-org/research/issues/42 we presented the expected message propagation times for the current waku configuration, mostly affected by the gossipsub configuration (`D=6` with `Dlow=4` and `Dhigh=12`). In this issue we i) explain the rationale/tradeoffs for selecting these configuration parameters and ii) show how they affect the propagation delay and bandwidth. We also assess whether it would make sense to change the existing configuration.

Note: We use the same simulation setup and tools as in https://github.com/waku-org/research/issues/42, see tool and branch.

As explained in https://github.com/waku-org/research/issues/42, the tradeoff behind `D` (number of peers in the mesh, aka full connections) is clear: a higher `D` means fewer hops and a lower propagation delay but more bandwidth, and a lower `D` means the opposite.

The question is, where should waku be in this trade-off? Note that every node is free to change these parameters, so what we discuss here are the reasonable defaults. Note also that waku could have multiple "profiles" as @chaitanyaprem suggested, so e.g. `profile1` could be meant for nodes in datacenters and `profile2` for average users, each profile having different parameters such as `D` values.

Theory
Using `ceil(log(N)/log(D))` we can see, for different numbers of nodes `N` and `D` values, the relationship between the number of hops needed to reach all nodes (worst case) and the bandwidth consumption. Note that ihave/iwant messages are not taken into account, since they represent a small part of the bandwidth usage.

For example:
- With `D=6`, the maximum number of hops that a message has to travel to reach all nodes is 4. And as we can see on the right side, the bandwidth amplification factor would be 6. This means that if the messages sent in the network amount to 3 Mbps, then the nodes would be consuming 18 Mbps.
- With `D=3`, the max number of hops is 7.

(plot generated with this)

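The hop and bandwidth figures above can be reproduced with a few lines of Python. This is only a sketch of the formula, not the script behind the plot: the function names are made up for illustration, `N=1000` is assumed to match the simulated network size, and the hop count is computed with integer arithmetic (smallest `h` with `D**h >= N`), which is equivalent to `ceil(log(N)/log(D))` but avoids floating point rounding at exact powers.

```python
def max_hops(n_nodes: int, d: int) -> int:
    """Worst-case hops to reach n_nodes: smallest h with d**h >= n_nodes."""
    if n_nodes < 1 or d < 2:
        raise ValueError("need n_nodes >= 1 and d >= 2")
    hops, reached = 0, 1
    while reached < n_nodes:
        reached *= d
        hops += 1
    return hops


def consumed_mbps(base_mbps: float, d: int) -> float:
    """Bandwidth used by a node: base traffic amplified D times.

    ihave/iwant overhead is ignored, as in the text.
    """
    return base_mbps * d


def tradeoff_table(n_nodes: int, d_values, base_mbps: float):
    """Return (D, worst-case hops, consumed Mbps) for each D value."""
    return [(d, max_hops(n_nodes, d), consumed_mbps(base_mbps, d))
            for d in d_values]


if __name__ == "__main__":
    # Assumed inputs: 1000 nodes, 3 Mbps of messages in the network.
    for d, hops, mbps in tradeoff_table(1000, range(2, 9), 3.0):
        print(f"D={d}: hops={hops}, bandwidth={mbps:.0f} Mbps")
```

Under these assumptions the table gives 4 hops and 18 Mbps for `D=6`, 5 hops for both `D=5` and `D=4`, and 7 hops for `D=3`, so each step down in `D` costs more hops the lower `D` already is.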
Relevant points in the tradeoff:
- The lower `D` is, the higher the impact on the number of hops. For example, going from `D=4` to `D=3` has a greater impact than `D=6` to `D=5`.
- Bandwidth consumption grows linearly with `D`, where the amplification over the base bandwidth equals `D` times (plus some negligible small % to account for ihave/iwants).

Simulations
In https://github.com/waku-org/research/issues/42 we ran simulations with the current waku configuration of `D=6` (4/12). Below, we present the simulation results for the scenario where:
- `D=3`, `Dlow=2`, `Dhigh=4`

New results (yellow, green) are presented together with the results from https://github.com/waku-org/research/issues/42 (red, blue) so that they can be compared.
Conclusions
- The minimum propagation delay is not affected by `D`. This is of course because the nodes that are just 1 hop away from the publisher get the message in 1 hop no matter `D`.
- With small messages (`10kB`), changing `D` from 6 (current) to 3 has a visible impact on the propagation delay. The average goes from `508 ms` to `610 ms`. However, the worst-case delay is affected more, since we went from 4 to 7 hops.
- Reducing `D` (6 to 3) will decrease bandwidth consumption by half. This means that either waku throughput can be increased or that we can reduce the bandwidth requirement to run a node (see https://github.com/waku-org/research/issues/31).
- Regardless of `D`, we might want to pick closer `Dlow` and `Dhigh` values. The current configuration of 4 to 12 makes bandwidth very unpredictable. Some nodes might use 4x and others 12x.

Based on this data, we should be ready to assess: do we want to lower `D` to reduce the bandwidth requirement? Or at least for some "profiles"?
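To put numbers on the `Dlow`/`Dhigh` point, here is a rough sketch of the bandwidth range a node can end up in for a given pair of mesh bounds. It assumes amplification equals the current mesh size (ignoring ihave/iwant) and reuses the 3 Mbps base traffic from the Theory example; the helper names are hypothetical.

```python
def bandwidth_range(base_mbps: float, d_low: int, d_high: int):
    """Return (min, max) Mbps for a mesh size between d_low and d_high."""
    if not 0 < d_low <= d_high:
        raise ValueError("need 0 < d_low <= d_high")
    return (base_mbps * d_low, base_mbps * d_high)


def spread(d_low: int, d_high: int) -> float:
    """Ratio between the heaviest and lightest possible node."""
    if not 0 < d_low <= d_high:
        raise ValueError("need 0 < d_low <= d_high")
    return d_high / d_low


if __name__ == "__main__":
    base = 3.0  # Mbps of messages in the network (assumed)
    for name, lo, hi in [("current 4/12", 4, 12), ("simulated 2/4", 2, 4)]:
        low_bw, high_bw = bandwidth_range(base, lo, hi)
        print(f"{name}: {low_bw:.0f} to {high_bw:.0f} Mbps "
              f"(spread {spread(lo, hi):.1f}x)")
```

With these inputs the current 4/12 bounds span 12 to 36 Mbps (a 3x spread), while the simulated 2/4 bounds span 6 to 12 Mbps (a 2x spread).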