waku-org / docs.waku.org

Waku Documentation Portal
https://docs.waku.org

Document how to monitor and interpret Waku node health #165

Open jm-clius opened 6 months ago

jm-clius commented 6 months ago

We need to document how operators or encapsulating applications can monitor the health of their Waku nodes and how to interpret monitoring results for various use cases.

This is primarily a documentation exercise, but may require some engineering to ensure the appropriate metrics and information are retrievable from the node.

The necessary health information needs to be available both via the REST API and in any of the software bindings (Nim, C, etc.).

As a starting point, the following information from a node should be monitorable, and we should provide clear, documented guidelines on how to interpret it:

  1. Is the Waku node running?
  2. What protocols are currently mounted on the node?
  3. If relay is mounted, how many peers are connected for each pubsub topic?
  4. If relay is mounted, what is the current relay bandwidth for each pubsub topic?
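
To make the four items above concrete, here is a minimal sketch of how an operator-side script might interpret them once they are retrievable. It does not call any real Waku API: `NodeSnapshot`, `TopicStats`, `interpret`, and the threshold values `min_peers` and `max_bandwidth_bps` are all hypothetical names and numbers chosen for illustration, and the snapshot is assumed to have been filled in from whatever the REST API or bindings end up exposing.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Health(Enum):
    DOWN = "down"
    DEGRADED = "degraded"
    HEALTHY = "healthy"


@dataclass
class TopicStats:
    """Per-pubsub-topic relay figures (hypothetical shape)."""
    peers: int             # connected relay peers on this topic
    bandwidth_bps: float   # observed relay bandwidth, bytes per second


@dataclass
class NodeSnapshot:
    """One point-in-time view of the node (hypothetical shape)."""
    running: bool
    protocols: List[str] = field(default_factory=list)
    relay_topics: Dict[str, TopicStats] = field(default_factory=dict)


def interpret(
    snapshot: NodeSnapshot,
    min_peers: int = 4,                      # assumed threshold, not a Waku default
    max_bandwidth_bps: float = 1_000_000.0,  # assumed threshold, not a Waku default
) -> Tuple[Health, List[str]]:
    """Map a snapshot to a coarse health level plus human-readable reasons."""
    if not snapshot.running:
        return Health.DOWN, ["node is not running"]

    reasons: List[str] = []
    if "relay" in snapshot.protocols:
        if not snapshot.relay_topics:
            reasons.append("relay is mounted but no pubsub topic is reported")
        for topic, stats in sorted(snapshot.relay_topics.items()):
            if stats.peers < min_peers:
                reasons.append(f"{topic}: only {stats.peers} relay peers")
            if stats.bandwidth_bps > max_bandwidth_bps:
                reasons.append(f"{topic}: bandwidth {stats.bandwidth_bps:.0f} B/s is high")

    return (Health.DEGRADED if reasons else Health.HEALTHY), reasons


if __name__ == "__main__":
    snap = NodeSnapshot(
        running=True,
        protocols=["relay", "store"],
        # Example topic name in the static-sharding format.
        relay_topics={"/waku/2/rs/1/0": TopicStats(peers=2, bandwidth_bps=1200.0)},
    )
    level, why = interpret(snap)
    print(level.value, why)
```

Returning the reasons alongside the level is a deliberate choice in this sketch: the documentation would need to explain why a node counts as degraded for a given use case, not only that it does.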

romanzac commented 6 months ago

@jm-clius Thanks a lot for opening this. It would also be great, if we already have an interested operator/integrator, to have them comment on what info they would like to have and how they would like to interact with the Waku node. The Vac QA team could then test what we agree on, saving time and trouble for the integrator.

chaitanyaprem commented 6 months ago

Tagging a related issue on applications monitoring node health, https://github.com/waku-org/go-waku/issues/1021, which was done for Status. See also https://github.com/status-im/status-go/issues/4628, which gives a brief on how apps can interpret node health. Note that the above is for static sharding scenarios; it can be abstracted along similar lines for users of the autosharded network as well.

romanzac commented 6 months ago

Adding an issue that proposes a solution for some scenarios: https://github.com/waku-org/go-waku/issues/921

vpavlin commented 5 months ago

  • Is the Waku node running?

This is a bit vague - what does "node running" mean?

I think the following items make sense though: which protocols are mounted, and details about each (number of peers and some info on bandwidth).

Do we need to prioritize https://github.com/waku-org/nwaku/issues/2173 to get the detailed info into the REST API?