nwaku simulation requirements

alrevuelta commented 1 year ago

As discussed in a meeting, we agreed on using the current features that we have in wakurtosis to run some simulations and try to i) learn more about nwaku behaviour with a significant amount of nodes and ii) showcase all the developed features and start using them in practice.

Here is a list of nice-to-have requirements, based mostly on what we discussed previously with this first test in mind, plus some more specific details. Treating wakurtosis as a black box, I think we can divide these "requirements" into:

Inputs: number of nodes, configuration, connectivity, traffic, etc.
Outputs: a set of metrics collected during the simulation, coming from different sources and of different types (time series vs. distributions).

Simulation 1

Inputs:

Only nwaku nodes
Only relay protocol
Only one pubsub topic
Number of nodes: 300
Using discv5, with peers "randomly" forming a mesh, i.e. no hardcoded connections.
Simulation time: 6 hours.
Traffic injected via the existing RPC method is fine. waku-publisher is an alternative, but not required.
Traffic (both at the same time)
- a) 50 messages per second of 5 kB each. Fixed or Gaussian distributed is fine.
- ~~b) 5 messages per second of 200 kB each. Fixed or Gaussian distributed is fine.~~
In order to see gossiping in the network, each node must be connected to at most 25 nodes.
Release v0.16.0
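As a quick sanity check on traffic a), here is a minimal sketch of a send schedule (fixed or Gaussian-jittered intervals) plus the resulting injected volume. The function name `injection_times` and its parameters are hypothetical and not part of wakurtosis; the jitter value is an assumption.

```python
import random

RATE_HZ = 50            # messages per second (traffic a)
MSG_SIZE_B = 5_000      # 5 kB payload per message
DURATION_S = 6 * 3600   # 6 hours

def injection_times(rate_hz, duration_s, jitter_sd=0.0, seed=None):
    """Return send timestamps (seconds from start) for rate_hz * duration_s messages.

    jitter_sd == 0: fixed interval of 1/rate_hz.
    jitter_sd > 0:  each interval drawn from a Gaussian centred on 1/rate_hz
                    (clamped at 0 so timestamps never go backwards).
    """
    n = int(rate_hz * duration_s)
    if jitter_sd == 0.0:
        # Compute each timestamp directly to avoid float accumulation error.
        return [i / rate_hz for i in range(n)]
    rng = random.Random(seed)
    mean_gap = 1.0 / rate_hz
    t, times = 0.0, []
    for _ in range(n):
        times.append(t)
        t += max(0.0, rng.gauss(mean_gap, jitter_sd))
    return times

total_msgs = RATE_HZ * DURATION_S            # 1,080,000 messages
total_payload_b = total_msgs * MSG_SIZE_B    # 5.4 GB of payload injected
```

Every node should receive each message at least once, so each node sees at least 250 kB/s of inbound payload; gossip duplicates and protocol overhead come on top of that.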

Output: the PDF report shared in the past would be a good format for the results. I would suggest adding more information, such as the release version, a timestamp, and some time series from Prometheus (waku and nim-libp2p metrics). So I suggest keeping the existing report data from the PDFs we shared:

Propagation time (distribution): nice to have, not a requirement
Peak CPU usage (distribution): nice to have, not a requirement
Peak mem usage (distribution): nice to have, not a requirement
Total network IO (distribution): nice to have, not a requirement
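A minimal sketch of how one of these "distribution" entries could be produced: take the peak of each node's samples, then summarize the 300 peaks. The input shape (a dict of node id to list of samples) is an assumption, not the format wakurtosis actually stores.

```python
import statistics

def peak_per_node(series_by_node):
    """Map each node id to the maximum of its samples (e.g. CPU % over time)."""
    return {node: max(samples) for node, samples in series_by_node.items()}

def summarize(values):
    """Summarize one value per node as a distribution (min/median/p95/max/mean)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "min": min(values),
        "p50": statistics.median(values),
        "p95": cuts[94],          # cut point 95 of 99
        "max": max(values),
        "mean": statistics.fmean(values),
    }
```

For example, `summarize(list(peak_per_node(cpu_series).values()))` would give one row of the report per metric.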

I would also suggest plotting a time series of each of the above metrics for a few randomly selected nodes (say, 5). With 300 nodes, displaying all of them would be too much, but time series from a handful of nodes can help validate the simulation.
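A minimal sketch of picking the nodes to plot, assuming hypothetical node names like `nwaku-0` to `nwaku-299`. Using a fixed seed (and recording it in the report) makes the same subset reproducible across plots and reruns.

```python
import random

def pick_nodes(node_ids, k=5, seed=0):
    """Return k node ids chosen uniformly at random, reproducibly for a given seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(node_ids, k))

nodes = [f"nwaku-{i}" for i in range(300)]   # hypothetical naming scheme
selected = pick_nodes(nodes)
```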

And add on top:

Message loss (distribution), e.g. if the network has 300 nodes and a message only reaches 298 of them, track that. Unsure if this feature is ready.
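A minimal sketch of the loss computation described above, assuming a hypothetical input that maps each message id to the set of nodes that received it (the publisher counted as a receiver). With 300 nodes and a message seen by 298, that message's loss is 2/300, about 0.67%.

```python
def message_loss(received_by, n_nodes):
    """Return (per-message loss fraction, overall loss fraction).

    received_by: dict of message id -> set of node ids that received it.
    """
    if not received_by:
        return {}, 0.0
    per_message = {
        msg_id: 1.0 - len(nodes) / n_nodes for msg_id, nodes in received_by.items()
    }
    missing = sum(n_nodes - len(nodes) for nodes in received_by.values())
    overall = missing / (n_nodes * len(received_by))
    return per_message, overall
```

The per-message values give the loss distribution; `overall` is the fraction of all expected deliveries that never happened.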

And the following Prometheus metrics. As with the others, some time series from a handful of random nodes, ~~and calculate the probability density function of the rest (or a similar statistical "summarized" representation).~~

libp2p_gossipsub_peers_per_topic_mesh: important to check that it stays between D_low and D_high, the healthy range of mesh peers for a topic.
libp2p_gossipsub_received_total: used to validate the message rate. (increase(libp2p_gossipsub_received_total[1m]))/60 gives the number of messages per second.
libp2p_peers: number of connected peers.
(added): bandwidth over time (from cAdvisor is OK)
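The two checks above can be sketched offline on exported samples. `per_second_rate` approximates `increase(counter[1m])/60`, treating a drop in value as a counter reset like Prometheus does, but without Prometheus's extrapolation to the window edges. `mesh_out_of_bounds` flags mesh-size samples outside [D_low, D_high]; the gossipsub v1.0 spec suggests D_low = 4 and D_high = 12 as defaults, but the values nwaku/nim-libp2p actually use should be checked. Both helpers and the `(timestamp, value)` input format are hypothetical.

```python
def per_second_rate(samples, window_s=60.0):
    """Approximate increase(counter[window]) / window from (timestamp, value) samples."""
    t_end = samples[-1][0]
    window = [(t, v) for t, v in samples if t >= t_end - window_s]
    increase = 0.0
    for (_, prev), (_, cur) in zip(window, window[1:]):
        # A decrease means the counter restarted from zero.
        increase += cur - prev if cur >= prev else cur
    return increase / window_s

def mesh_out_of_bounds(mesh_peers, d_low, d_high):
    """Return the (timestamp, peers) samples outside the healthy [d_low, d_high] range."""
    return [(t, n) for t, n in mesh_peers if not d_low <= n <= d_high]
```

At 50 msg/s with a 15 s scrape interval, the counter should grow by about 750 per scrape, and `per_second_rate` should return roughly 50.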

Can you confirm whether:

it is possible to calculate the message loss rate.
it is possible to calculate the probability density function of Prometheus metrics. If not, time series from a handful of random nodes are fine for this simulation.
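One simple way to estimate a probability density function from per-node values of a Prometheus metric is a normalized histogram, sketched below (a kernel density estimate would be a smoother alternative). The function name and bin count are assumptions for illustration.

```python
def histogram_pdf(values, n_bins=10):
    """Return [(bin_left_edge, density)] so that sum(density * bin_width) == 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # all values equal: use one unit-width bin
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # max value goes in the last bin
        counts[i] += 1
    n = len(values)
    return [(lo + i * width, c / (n * width)) for i, c in enumerate(counts)]
```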

Daimakaimura commented 1 year ago

Thank you for putting this together @alrevuelta I do have some questions / comments:

Regarding traffic injection, you mention "both at the same time". At the moment we only support a single source of traffic.
Regarding the NWaku Prometheus metrics, would you like us to calculate the distributions and add them to the final PDF figure?
Yes, it is possible to calculate the message loss rate (it is already calculated)
Yes, we can calculate the PDFs of those 3 Prometheus metrics and add them to the final PDF. However, this is not implemented at the moment, and I am wondering whether you would like to wait for it to be added or would rather get some results without those metrics ASAP.

alrevuelta commented 1 year ago

Regarding traffic injection, you mention "both at the same time". At the moment we only support a single source of traffic.

No problem, will edit the requirements with just one source of traffic.

Regarding the NWaku Prometheus metrics, would you like us to calculate the distributions and add them to the final PDF figure?

Since I assume this is not implemented, I'm fine with having the raw Prometheus time series (without the distribution) for now.

Yes, it is possible to calculate the message loss rate (it is already calculated)

Great feature!

Yes, we can calculate the PDFs of those 3 Prometheus metrics and add them to the final PDF. However, this is not implemented at the moment, and I am wondering whether you would like to wait for it to be added or would rather get some results without those metrics ASAP.

No problem, let's stick to just the time series metrics to have something ASAP. PDFs are nice, but in a short simulation (e.g. a few hours) the time series are probably enough. So let's leave that for now :)

vacp2p/wakurtosis, issue #108: nwaku simulation requirements