zeta-chain / node

ZetaChain’s blockchain node and an observer validator client
https://zetachain.com
MIT License

Enhancing Network Monitoring: Including Upgrade-Specific Metrics in Prometheus #1495

Open gzukel opened 6 months ago

gzukel commented 6 months ago

Issue Description:

As part of our ongoing effort to improve network management and decision-making, we have identified a need to add upgrade-specific metrics to our Prometheus monitoring system. The goal is a comprehensive, authoritative view of network upgrades that resolves the discrepancies we currently observe through our a3 console and pvtop toolsets and stays aligned with ongoing network changes.

Proposed Metrics Integration:

To achieve this, we propose adding the following metrics to the Prometheus metrics suite, particularly focusing on the network running on telemetry port 26660:

These metrics are essential for accurate, real-time monitoring and decision-making regarding network upgrades and maintenance. By integrating them into our Prometheus setup, we aim to resolve current tool discrepancies and enhance our network's operational transparency and efficiency.
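As a rough illustration of what exposing upgrade information to Prometheus could look like, here is a minimal Python sketch that renders two gauges in the Prometheus text exposition format. The metric names (`upgrade_plan_height`, `upgrade_blocks_remaining`), the `name` label, and the helper functions are hypothetical and are not the metrics proposed in this issue; the sketch only assumes that an upgrade plan has a name and a target block height.

```python
def _escape_label(value):
    """Escape a label value per the Prometheus text exposition format."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_gauge(name, help_text, value, labels=None):
    """Render one gauge (HELP, TYPE, and a single sample) as text."""
    label_str = ""
    if labels:
        pairs = ",".join(
            f'{key}="{_escape_label(val)}"' for key, val in sorted(labels.items())
        )
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{label_str} {value}\n"
    )


def render_upgrade_metrics(plan_name, plan_height, current_height):
    """Render hypothetical upgrade gauges for a scheduled upgrade plan."""
    # Clamp at zero so the gauge never goes negative after the upgrade height.
    blocks_remaining = max(plan_height - current_height, 0)
    labels = {"name": plan_name}
    return render_gauge(
        "upgrade_plan_height",  # hypothetical metric name
        "Block height at which the scheduled upgrade activates.",
        plan_height,
        labels,
    ) + render_gauge(
        "upgrade_blocks_remaining",  # hypothetical metric name
        "Blocks left until the scheduled upgrade height.",
        blocks_remaining,
        labels,
    )


if __name__ == "__main__":
    print(render_upgrade_metrics("v2", 1000, 900), end="")
```

In a real node these values would be registered with the existing telemetry stack rather than rendered by hand; the sketch is only meant to make the shape of such metrics concrete.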

brewmaster012 commented 6 months ago

These are very desirable results; however, it will be quite challenging to get definitive or reliable answers to these questions. The key difficulty is that this is a distributed consensus protocol whose participants can fail in arbitrary ways. It may be possible to get suggestive answers, similar to what we can infer from pvtop and from the consensus page on the Tendermint 26656 port. My recommendation is a more streamlined and contextual presentation of the information from the consensus page (26656) to aid human interpretation.
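One way to picture a "more streamlined" presentation is to condense the vote bit-array strings from the consensus state into a short summary. The Python sketch below assumes the summary string has the form `BA{<n>:<bits>} <voted>/<total> = <fraction>`, where `x` marks a validator whose vote was seen and `_` marks one that is missing; the function name and the returned fields are hypothetical, and the result is only suggestive, since a node sees just the votes it has received so far.

```python
import re

# Assumed format of a vote summary string, e.g. "BA{4:xx_x} 75/100 = 0.75".
_VOTE_SUMMARY = re.compile(r"BA\{(\d+):([x_]*)\}\s+(\d+)/(\d+)")


def summarize_votes(bit_array):
    """Condense a vote bit-array string into a small, readable summary.

    Returns None if the string does not match the assumed format.
    """
    match = _VOTE_SUMMARY.search(bit_array)
    if match is None:
        return None
    validators = int(match.group(1))
    bits = match.group(2)
    voted_power = int(match.group(3))
    total_power = int(match.group(4))
    return {
        "validators": validators,
        # Indices of validators whose vote this node has not seen.
        "missing_indices": [i for i, bit in enumerate(bits) if bit == "_"],
        "power_fraction": voted_power / total_power if total_power else 0.0,
        # Strictly more than 2/3 of voting power, using integer arithmetic.
        "has_two_thirds": voted_power * 3 > total_power * 2,
    }


if __name__ == "__main__":
    print(summarize_votes("BA{4:xx_x} 75/100 = 0.75"))
```

A summary like this could name the validators behind `missing_indices` to show an operator who has not yet voted in the current round, which is the kind of context that is hard to read off the raw consensus page.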