near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

High cardinality metric #11988

Open Lusitaniae opened 3 weeks ago

Lusitaniae commented 3 weeks ago

Describe the bug In Prometheus-based monitoring systems, metrics with high cardinality (a large number of unique label combinations) cause problems.

To Reproduce When we pull metrics we'll get something like this for each validator:

near_current_validator_stake{account_id="01node.poolv1.near", instance="", job="near", num_expected_blocks="112", num_expected_chunks="704", num_produced_blocks="112", num_produced_chunks="703", public_key="ed25519:5xz7EbcnPqabwoFezdJBxieK8S7XLsdHHuLwM4vLLhFt", shards="1", slashed="false"}

This metric is highlighted in the screenshot below as having high cardinality (manageable for now).

Expected behavior num_expected_blocks, num_expected_chunks, num_produced_blocks, and num_produced_chunks should each be their own metric instead of a label:

near_validator_expected_blocks{account_id="01node.poolv1.near"} 112
near_validator_expected_chunks{account_id="01node.poolv1.near"} 704
near_validator_produced_blocks{account_id="01node.poolv1.near"} 112
near_validator_produced_chunks{account_id="01node.poolv1.near"} 703
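To illustrate why counters cause trouble as labels, here is a minimal Python sketch (not nearcore code). In Prometheus, each distinct combination of label values is a separate time series, so a label whose value changes as blocks are produced creates a new series on every change. The function names and the simulated counts are hypothetical.

```python
def series_with_counter_labels(samples):
    """Count distinct series when counters are encoded as label values."""
    series = set()
    for account_id, produced_blocks, produced_chunks in samples:
        # Every new combination of label values is a brand-new time series.
        series.add(("near_current_validator_stake", account_id,
                    produced_blocks, produced_chunks))
    return len(series)


def series_with_separate_metrics(samples):
    """Count distinct series when counters are exported as metric values."""
    series = set()
    for account_id, _blocks, _chunks in samples:
        # The counts are sample values, so the label set never changes.
        series.add(("near_validator_produced_blocks", account_id))
        series.add(("near_validator_produced_chunks", account_id))
    return len(series)


# Hypothetical scrapes of one validator whose counters grow over time.
samples = [("01node.poolv1.near", n, 6 * n) for n in range(1, 101)]

print(series_with_counter_labels(samples))    # 100 series for one validator
print(series_with_separate_metrics(samples))  # 2 series for one validator
```

With the counters as values, the number of series per validator stays fixed no matter how many blocks or chunks are produced.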

Screenshots image


Additional context https://docs.victoriametrics.com/faq/#what-is-high-cardinality

Lusitaniae commented 3 weeks ago

Alternatively, this could be moved into an external exporter that gathers network-wide metrics from a single place, because every NEAR node we run currently exposes the same duplicate metrics (if we had 100 nodes, we'd have 100x the exact same metrics everywhere).
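A rough back-of-the-envelope sketch of the duplication, in Python. It assumes the scraper attaches a distinct `instance` label per node (as Prometheus-style scrapers typically do), so each node's copy is stored as a separate series. The function name and the fleet sizes are hypothetical.

```python
def stored_series(validators, nodes, per_node=True):
    """Series stored for one per-validator metric.

    per_node=True: every node exposes every validator's series, and the
    per-node `instance` label makes each copy a distinct series.
    per_node=False: a single exporter exposes each series once.
    """
    return validators * nodes if per_node else validators


# Hypothetical fleet: 100 nodes, 200 validators in the network.
print(stored_series(200, 100))                  # 20000
print(stored_series(200, 100, per_node=False))  # 200
```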

nagisa commented 3 weeks ago

Where are you getting the cardinality screenshot from? It might be useful to keep a reference handy for this.

Though I imagine we could also implement a cardinality check in neard itself, e.g. at the time when those metrics are gathered in order to respond to a GET /metrics request.
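As a rough sketch of what such a check might look like (written in Python for brevity, not as neard code), the following counts sample lines per metric name in Prometheus text exposition output and reports names above a limit. The function names, the `limit` parameter, the sample values, and `example_other_metric` are assumptions for illustration only.

```python
import re
from collections import Counter

# Matches the metric name at the start of a sample line.
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def series_per_metric(exposition_text):
    """Count sample lines (one per label set) for each metric name."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def high_cardinality(exposition_text, limit):
    """Return metric names whose series count exceeds `limit`."""
    counts = series_per_metric(exposition_text)
    return sorted(name for name, n in counts.items() if n > limit)


# Hypothetical scrape output.
sample = """\
# TYPE near_current_validator_stake gauge
near_current_validator_stake{account_id="a.near",num_produced_blocks="112"} 1
near_current_validator_stake{account_id="b.near",num_produced_blocks="98"} 1
example_other_metric 42
"""
print(high_cardinality(sample, limit=1))  # ['near_current_validator_stake']
```

A check on a single scrape only sees how many series exist at that moment. Catching label values that change between scrapes, as the block and chunk counters do here, would additionally require comparing label sets across successive scrapes.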

Lusitaniae commented 3 weeks ago

The dashboard is from vmui https://docs.victoriametrics.com/#vmui (part of VictoriaMetrics, a Prometheus-compatible monitoring system).

There are also projects like https://github.com/thought-machine/prometheus-cardinality-exporter that monitor this.

Lusitaniae commented 3 weeks ago

I think in the end a near_exporter that provides network-wide metrics is probably best.