Open shezaan-hashgraph opened 1 month ago
Network can be added to all metrics in multiple ways. You can add it yourself to all metrics by adding it to the extraLabels in the Prometheus remote write configuration. We can add it manually to each metric, but this wouldn't add it to metrics without the hedera_ prefix. We can set management.metrics.tags.network=${hedera.mirror.monitor.network} via config, but it's a bit redundant for us since we already have the namespace, which is named after the network. You can try setting that property yourselves.
For the others, it's doable but would take some rework of the internals. Currently we're not passing the node info to the metrics layer. We're just extracting the list of node account IDs from the SDK Transaction in the metrics layer. It's something we can explore.
namespace label to network via the PromQL query, but that doesn't seem to be a transparent way of doing this, particularly when there is an incident and one of our engineers isn't able to ascertain where the network label came from without making sense of the query itself. Just my 2 cents here, but runtime re-labeling is possible using the label_replace function of PromQL:

max by(node, network) (
  label_replace(
    hedera_mirror_monitor_publish_handle_seconds_max{application="hedera-mirror-monitor", status="SUCCESS"},
    "network",
    "$1",
    "namespace",
    "(.*)"
  )
)

However, for sufficiently complex queries, these re-label expressions can become cumbersome. I would also imagine that when Grafana SaaS eventually becomes prohibitively expensive and we decide to run our own Prometheus and Grafana servers, such label transformations could add up to significant overhead/load.
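
For anyone reading the label_replace query without PromQL background, here is a rough Python sketch of what that function does to a single series' labels. It is only an emulation under simplifying assumptions (it handles $N group references only, and the sample label values are hypothetical), not how Prometheus implements it.

```python
import re


def label_replace(labels, dst_label, replacement, src_label, regex):
    """Emulate PromQL's label_replace for one series' label set (simplified)."""
    # PromQL anchors the regex, so it must match the entire source value.
    # A missing source label is treated as an empty string.
    match = re.fullmatch(regex, labels.get(src_label, ""))
    if match is None:
        return dict(labels)  # no match: the series is returned unchanged
    # Translate PromQL-style $1 references into Python's \g<1> form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    value = match.expand(template)
    result = dict(labels)
    if value:
        result[dst_label] = value
    else:
        result.pop(dst_label, None)  # an empty value means the label is absent
    return result


# Hypothetical series labels, for illustration only.
series = {"application": "hedera-mirror-monitor", "namespace": "testnet", "node": "0.0.3"}
relabeled = label_replace(series, "network", "$1", "namespace", "(.*)")
print(relabeled["network"])
```

The network value only exists in the query result, never in the stored series, which is why someone looking at a dashboard during an incident has to read the query to find out where the label came from.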
W.r.t. adding the label management.metrics.tags.network=${hedera.mirror.monitor.network} via config, which config would that be? I'm guessing it's not the values.yaml file since I don't see such a Helm config exposed.
Without the node_ip we are limited to writing queries that restrict us to the node_id (which uses the account_id) and network groups, i.e. we can only group by node_id and network. So at best we can determine if there is an issue with the proxy or node, but would be unable to further determine which one of the two is problematic without some debugging. Having the node_ip would help us figure out if the node is the problem by enabling us to drill down to the node itself.
Lastly, we also need the proxy_ip of the proxy in front of each node. Production, for example, has 2 proxies in front of each node. If one of them has a problem, we would need to know which one.
Problem
The DevOps team would like to request that the following labels and their corresponding values be added to all hedera-mirror-monitor metrics, as we consider them standard labels that we can use in Prometheus for grouping purposes.
- network: Name of the network e.g. testnet, mainnet, previewnet, other
- node_id: Node IDs are not the same as account IDs e.g. node_id: 0 corresponds to account_id: 0.0.3 or account_id: 3
- proxy_ip: This is the IP address of the proxy
Solution
Export metrics for hedera-mirror-monitor
- network: Name of the network e.g. testnet, mainnet, previewnet, other
- node_id: Node IDs are not the same as account IDs e.g. node_id: 0 corresponds to account_id: 0.0.3 or account_id: 3
- proxy_ip: This is the IP address of the proxy
Alternatives
No response