gzukel opened this issue 6 months ago
These would be very desirable results; however, it will be quite challenging to get definitive or reliable answers to these questions.
The key difficulty is that this is a distributed consensus protocol whose participants can fail in arbitrary ways.
It may be possible to get suggestive answers, similar to what we can infer from pvtop and from the consensus page on Tendermint's port 26656.
My recommendation is to present the information from that consensus page (port 26656) in a more streamlined, contextual form to aid human interpretation.
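As an illustration of what a more streamlined presentation could look like, the sketch below condenses a consensus-state dump into one line per round and vote type. It assumes the JSON shape used by Tendermint's consensus-state dump (a `round_state` object with a `height/round/step` string and a `height_vote_set` list whose `prevotes_bit_array` / `precommits_bit_array` strings look like `BA{4:xxx_} 70/100 = 0.70`). That shape is an assumption to verify against the actual page, and `parse_bit_array` and `summarize_round_state` are hypothetical helpers, not part of pvtop or Tendermint.

```python
import re
from typing import Any

# ASSUMPTION: bit-array strings look like "BA{4:xxx_} 70/100 = 0.70", i.e.
# validator count, one x/_ per validator (possibly wrapped across lines),
# then voted/total voting power and a rounded fraction.
_BIT_ARRAY_RE = re.compile(r"BA\{(\d+):([x_\s]*)\}\s+(\d+)/(\d+)\s+=\s+[0-9.]+")


def parse_bit_array(s: str) -> dict[str, int]:
    """Extract validator count, validators that voted, and voting power."""
    m = _BIT_ARRAY_RE.search(s)
    if m is None:
        raise ValueError(f"unrecognised bit array: {s!r}")
    return {
        "validators": int(m.group(1)),
        "voted": m.group(2).count("x"),
        "power_voted": int(m.group(3)),
        "power_total": int(m.group(4)),
    }


def summarize_round_state(doc: dict[str, Any]) -> str:
    """Render one line per round and vote type instead of the raw dump."""
    rs = doc["result"]["round_state"]
    height, rnd, step = rs["height/round/step"].split("/")
    lines = [f"height={height} round={rnd} step={step}"]
    for hvs in rs["height_vote_set"]:
        for kind in ("prevotes", "precommits"):
            ba = parse_bit_array(hvs[f"{kind}_bit_array"])
            total = ba["power_total"]
            share = ba["power_voted"] / total if total else 0.0
            # BFT quorum means strictly more than 2/3 of total voting power.
            quorum = 3 * ba["power_voted"] > 2 * total
            lines.append(
                f"  round {hvs['round']} {kind}: "
                f"{ba['voted']}/{ba['validators']} validators, "
                f"{share:.1%} of voting power, +2/3={'yes' if quorum else 'no'}"
            )
    return "\n".join(lines)
```

Against a live node, the JSON could be fetched with `urllib.request.urlopen` and decoded with `json.load` before being passed to `summarize_round_state`. This still only gives the suggestive picture described above: it shows who has voted so far, not why the others have not.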
Issue Description:
In our continuous effort to improve network management and decision-making, we have identified a crucial need to add upgrade-specific metrics to our Prometheus monitoring. This enhancement aims to provide a comprehensive, authoritative view of network upgrades, resolve the discrepancies we currently observe between our a3 console and pvtop tools, and keep our monitoring aligned with ongoing network changes.
Proposed Metrics Integration:
To achieve this, we propose adding the following metrics to the Prometheus metrics suite, specifically to the metrics the node exposes on its telemetry port, 26660:
These metrics are essential for accurate, real-time monitoring and decision-making regarding network upgrades and maintenance. By integrating them into our Prometheus setup, we aim to resolve current tool discrepancies and enhance our network's operational transparency and efficiency.