Closed jm-clius closed 1 year ago
@D4nte I think this node is closer to what you had in mind in this comment: https://github.com/status-im/nwaku/issues/754#issuecomment-1161443548
@jm-clius This is something that interests me and I think it would be a great way of starting with both Nim and libp2p, perhaps using nim-libp2p. I've done similar work in the Ethereum consensus layer with armiarma. This project:

- crawls the network via discv5, estimating the total peers the network has
- monitors gossip messages on topics such as beacon_block, voluntary_exit, etc.

@alrevuelta, indeed! With the operator trial having just launched, the value that a monitoring node will add has grown significantly. A basic monitoring node is indeed a good next step after the basic canary service to estimate at least two things as a start: network size (using discv5) and the number of active applications (using the contentTopic prefix) on the default network. Liveness checks, a way of determining client type, version, etc. would be a very valuable bonus!
Related project: nebula-crawler, similar to armiarma.
@jm-clius any opinion on building on top of WakuNode, or directly on libp2p and discv5? The former option allows less flexibility, so I'm leaning toward the latter.

I don't have strong feelings about this, other than to keep it very simple as a start. Although we should imagine the types of questions we may want to ask in future, our immediate need is just to monitor two aspects:
While I can imagine that (1), network size estimation, could be built separately on discv5 (note the protocol ID change for Waku), once we start adding more discovery methods we may need an integration point similar to a WakuNode.
For (2) I can barely think of another way to do this properly other than a (simplified) node participating in the network, subscribing to the default pubsub topic and counting the number of content topics/applications.
We should also foresee that we may want to monitor more and more application-specific network metrics as we go along, which seems to me to be easiest to trigger from a node that's an active participant in the network.
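Counting applications, as in (2), boils down to grouping observed content topics by their first path segment. A minimal sketch in Python, with made-up topic strings following the `/{application}/{version}/{topic-name}/{encoding}` convention:

```python
from collections import Counter

def app_of(content_topic: str) -> str:
    """Extract the application name (first path segment) from a Waku
    content topic of the form /{application}/{version}/{topic-name}/{encoding}."""
    parts = content_topic.strip("/").split("/")
    return parts[0] if parts and parts[0] else "unknown"

def count_applications(content_topics):
    """Tally observed messages per application prefix."""
    return Counter(app_of(t) for t in content_topics)

# Example topics as a monitoring node might observe them on the
# default pubsub topic (illustrative values):
seen = [
    "/toy-chat/2/huilong/proto",
    "/toy-chat/2/huilong/proto",
    "/my-app/1/notifications/proto",
]
print(count_applications(seen))  # Counter({'toy-chat': 2, 'my-app': 1})
```

The number of distinct keys in the counter then approximates the number of active applications on the network.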
@jm-clius Regarding using nwaku or directly discv5/libp2p, we both agreed that nwaku may be more suitable. However, I'm having second thoughts about using WakuDiscoveryV5 for peer discovery.
The main reason is that we may have to go lower level for peer discovery, using discv5 primitives directly. I guess WakuDiscoveryV5 is not optimized for finding all peers in the network, but rather a random subset of them.
I guess that in a network of a few hundred it won't make much of a difference, but if we reach 1k-5k I'm not sure we will discover them all. Perhaps spamming findRandomPeers is enough, but changing the k param (aka BUCKET_SIZE) might be interesting.
But perhaps I'm getting ahead of myself here.
Indeed, whereas most monitoring we want to do in future makes more sense from an application perspective, building a discv5 crawler of some kind may indeed require a more precise, separate service (i.e. the monitoring node could spin up this service too, but it may be a new service built directly on discv5).
However, I don't want the initial chunk of this work to be too complex - we just need a rough idea of how the current network, which is very small, is growing, and we may get away with just using findRandomPeers() for now (which should eventually give us an approximate answer with some confidence).
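One way repeated findRandomPeers() rounds can yield an approximate size with some confidence is a capture-recapture estimate over two rounds of lookups. A rough sketch, under the (idealised) assumption that each round returns a roughly uniform random sample of peer IDs, which discv5 random walks only approximate:

```python
import random

def estimate_network_size(sample_a, sample_b):
    """Lincoln-Petersen capture-recapture: given two independent random
    samples of peer IDs, estimate total size as |A| * |B| / |A intersect B|."""
    overlap = len(set(sample_a) & set(sample_b))
    if overlap == 0:
        return None  # samples too small (or network too large) to estimate
    return len(sample_a) * len(sample_b) // overlap

# Simulate a network of 2000 peers and two rounds of 200 random lookups each.
random.seed(1)
network = [f"peer-{i}" for i in range(2000)]
round_a = random.sample(network, 200)
round_b = random.sample(network, 200)
print(estimate_network_size(round_a, round_b))  # prints the size estimate
```

More rounds shrink the variance; if the overlap stays empty, that itself signals the network is much larger than the sample size.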
@kaiserd wdyt? There may be a simple technical solution I'm missing.
To avoid confusion regarding the term WakuDiscoveryV5:
33/WAKU2-DISCV5 is our discv5 spec for Waku, which uses a different protocol-id.
You would have to adhere to this spec in order to discover Waku nodes.
However, I assume you were referring to waku_discv5.nim, which adds some functions useful for Waku on top of nim-eth/discv5.
For your implementation, you would not be limited to waku_discv5.nim. You are free to use nim-eth/discv5 directly. Our version in vendor/nim-eth is based on a feature branch that already supports 33/WAKU2-DISCV5.
Imo, the cleanest way to implement this would be a new type of node. This node would only mount the protocols necessary for this purpose and feature the additional metrics gathering and logging. A good starting point would be copying the current node and removing everything that is not necessary for now. As @jm-clius pointed out, we do not know what we want to measure in the future, so it would be good for this node to work on the Waku v2 layer (which allows it to look into Waku protocols as well as deeper into the stack).
@kaiserd Great info, thanks! Some follow-up questions:

- Regarding the protocol-id: can't we open a PR to nim-eth and make it configurable, so we avoid forking it? Or am I missing something?
- I see wakuv2.prod and wakuv2.test fleets, but how are they segregated? I can't find any flag in ./build/wakunode2 --help that allows for deployments in different networks, and the ENRs look the same (apart from the keys). So how are both networks segregated? In the Ethereum consensus layer they use fork-id and fork-digest for this, which is encoded in the ENR. They use this to segregate mainnet from testnets. And actually I think that Gnosis Chain uses the same network with a different fork-id/digest, but to the best of my knowledge the protocol-id is the same.

I can maybe help answer some of these questions:
We don't fork nim-eth. The protocol-id is set as a compile-time definition.
Right, let me change "forking" to "branching". As I can see, we use a branch selectable-protocol-id that has to be rebased onto master (i.e. https://github.com/status-im/nwaku/pull/1276) when we want to bump nim-eth. Can't that change be part of master?
We do not separate the DHTs between different fleets. The different fleets can be (and should eventually be) part of the same network - there's no strong reason to keep them separated, although we could come up with ways to do it if need be.
Interesting, I totally assumed they were completely different networks, like Eth Goerli and Eth Mainnet. And since with RLN we are piggybacking on Ethereum, I expected wakuv2.test to be using e.g. Goerli smart contracts and wakuv2.prod mainnet smart contracts (when we eventually reach that point). Seems weird to me that they are part of the same network, since they should imho be different environments.
Anyway, this goes beyond this issue, but it's great input because now I know which peers I should expect to discover.
I naively thought that there was some mechanism to filter out nodes that you were not interested in, so that you don't store them in your DHT.
Filtering would boil down to random walk searching, if Waku nodes are rare within the whole network. There is new research regarding resilient but efficient topic discovery in discv5. The blog post Waku v2 Ambient Peer Discovery, which expands on the forum post, discusses this. (Currently, I focus on the anonymity track, but will go back to that research eventually. For now, we decided to wait on new results from discv5 research and stick with a separate Waku network.)
> Can't that change be part of master?

At the moment, this is still somewhat "experimental", and it is not part of the Ethereum discv5 spec which nim-eth is following.
> I totally assumed they were completely different networks
While the Ethereum forkid can be transmitted in the discovery network, it does not actually fork the discovery layer; it forks the Ethereum layer. The protocol-id forks the discovery layer.
Regarding Waku, discovery and Relay are two separate overlay networks, too. Waku protocols (like libp2p protocols) feature a protocol ID. As long as nodes of different fleets have matching (fuzzy matching is possible) protocol IDs, they are interoperable.
The discovery layer is oblivious to Waku capabilities (for now), and there is only a single Waku discovery network. So, nodes of different fleets naturally discover each other as long as there is a common node in the discovery network. For instance, if you run your Waku node, and you add both a test-fleet and a prod-fleet node as bootstrap nodes, you would "connect" these fleets.
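To illustrate the fuzzy matching of protocol IDs mentioned above, one plausible rule is to require the same protocol path and major version while tolerating minor/patch differences. This is a hypothetical sketch of such a rule, not the matching logic nwaku or nim-libp2p actually implements:

```python
def protocols_compatible(local: str, remote: str) -> bool:
    """Hypothetical fuzzy match for protocol IDs like /vac/waku/relay/2.0.0:
    same path segments and same major version => compatible."""
    l = local.strip("/").split("/")
    r = remote.strip("/").split("/")
    if l[:-1] != r[:-1]:          # codec path must match exactly
        return False
    # compare only the major component of the trailing version
    return l[-1].split(".")[0] == r[-1].split(".")[0]

print(protocols_compatible("/vac/waku/relay/2.0.0", "/vac/waku/relay/2.1.0"))  # True
print(protocols_compatible("/vac/waku/relay/2.0.0", "/vac/waku/store/2.0.0"))  # False
```

Under such a rule, test-fleet and prod-fleet nodes advertising the same relay protocol ID would indeed interoperate.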
@jm-clius I made some progress on the monitoring node #1290
1) "walk" the discv5 DHT in the network to get an idea of the number of nodes participating in the network
2) try to get an idea of the number of unique peer IDs seen in the network (using various discovery methods)
3) keep track of the number of messages per: protocol, content topic, application (first part of content topic)
So far 1) and 2) are covered, with an extra extension: beyond discovering peers, the monitoring node also tries to connect to them, identifying their user-agent (e.g. nwaku, go-waku) and supported protocols (e.g. /vac/waku/relay/2.0.0). See more in the PR. Some of the metrics:
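Once peers have been dialed and identified, the tallies behind these metrics reduce to simple counters over the identify results. A sketch with hypothetical records (peer IDs, field names, and values are made up for illustration):

```python
from collections import Counter

# Hypothetical records, as a monitoring node might collect them after
# dialing discovered peers and running libp2p identify on each.
peers = [
    {"peer_id": "16U...a", "user_agent": "nwaku",
     "protocols": ["/vac/waku/relay/2.0.0"]},
    {"peer_id": "16U...b", "user_agent": "go-waku",
     "protocols": ["/vac/waku/relay/2.0.0", "/vac/waku/store/2.0.0-beta4"]},
    {"peer_id": "16U...c", "user_agent": "nwaku",
     "protocols": ["/vac/waku/relay/2.0.0"]},
]

by_client = Counter(p["user_agent"] for p in peers)
by_protocol = Counter(proto for p in peers for proto in p["protocols"])
print(by_client)    # Counter({'nwaku': 2, 'go-waku': 1})
print(by_protocol)  # Counter({'/vac/waku/relay/2.0.0': 3, '/vac/waku/store/2.0.0-beta4': 1})
```

Exporting these counters as labelled gauges is then enough to chart client and protocol distribution over time.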
More on that in the PR, but wanted to discuss 3). Some thoughts:
Since /waku/2/default-waku/proto is common, should we monitor just that one?

@alrevuelta, thanks for the great progress on the monitoring node! I will review (1) and (2) also in the WIP PR.
On (3):
A node is only aware of messages on the pubsubTopics that it's subscribed to. In general nodes subscribe only to the default pubsubTopic, namely /waku/2/default-waku/proto, and relay all messages on that topic. This will be the focus of the monitoring node for now (though I see no reason to limit it to only one pubsubTopic, in case we introduce sharding in future).

> as we scale perhaps will become impossible, because it will require a single node to be aware of everything

This would only be the case if we introduce pubsubTopic sharding for all nodes in the network. Currently any given node is aware of every relay message on the network (since we share a single pubsubTopic) and just counting content topics does not seem like a major extra requirement to me?

> Can you elaborate more on "keep track of the number of messages per protocol"?
Actually I'm not sure now what I meant. A monitoring node will by definition only be aware of relay messages. Ignore this requirement for now - we are far more interested in counting the number of applications on the network. 😅
I'm trying to differentiate between the messages that are gossiped, which the node can get, and the ones that are point-to-point, which we can't get.
Exactly, though to a first approximation all point-to-point protocols require messages to be relayed (even if you interact directly with a service node using a request-response protocol, it will either gossip a message on your behalf or return gossiped messages).
Will split this into 2 PRs: one for 1) and 2), and one for 3).
Makes sense to me!
As a follow-up on this, the last remaining task to close this would be to run the networkmonitor in our fleets and make all the data available via a Grafana dashboard (with the already discussed restrictions).
@alrevuelta I'm moving this issue then to the next release milestone to reflect the outstanding task (an alternative would be to create a separate issue for that task and close this one, but that may be unnecessary admin :) )
Problem
We need a node that can be deployed to a specific Waku v2 network that can be used to gather various network metrics and monitor the overall health of the network. Note that this should not be confused with the "Waku v2 canary service" which will function more like a tool that can be used for checking the health of specific nodes on an ad-hoc basis.
Suggestion
This node could, for example,
Tracking:
- networkmonitor exposing its data in a Grafana dashboard