High cardinality metric

Lusitaniae commented 3 months ago

Describe the bug In Prometheus-based monitoring systems, metrics with high cardinality (a large number of unique label combinations) cause problems.

To Reproduce When we pull metrics we'll get something like this for each validator:

near_current_validator_stake{account_id="01node.poolv1.near", instance="", job="near", num_expected_blocks="112", num_expected_chunks="704", num_produced_blocks="112", num_produced_chunks="703", public_key="ed25519:5xz7EbcnPqabwoFezdJBxieK8S7XLsdHHuLwM4vLLhFt", shards="1", slashed="false"}

This metric is highlighted in the screenshot below as having high cardinality (manageable for now).
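To illustrate why this label layout inflates cardinality, here is a minimal Python sketch (not nearcore code). It assumes that a Prometheus-style system identifies a time series by its metric name plus its full label set. The scrape values are made up for a single validator, and `series_key` and `count_series` are hypothetical helper names.

```python
def series_key(name, labels):
    """A series is identified by its metric name plus its full, sorted label set."""
    return (name, tuple(sorted(labels.items())))


def count_series(samples):
    """Count the distinct series among (name, labels) pairs."""
    return len({series_key(name, labels) for name, labels in samples})


# Hypothetical scrapes of one validator as it produces blocks during an epoch.
produced_per_scrape = [110, 111, 112]

# Layout from the report: the changing counter is stored in a label.
as_label = [
    ("near_current_validator_stake",
     {"account_id": "01node.poolv1.near", "num_produced_blocks": str(n)})
    for n in produced_per_scrape
]

# Proposed layout: the counter is the sample value, so the label set is stable.
as_value = [
    ("near_validator_produced_blocks", {"account_id": "01node.poolv1.near"})
    for _ in produced_per_scrape
]

print(count_series(as_label))  # 3: one new series per distinct counter value
print(count_series(as_value))  # 1: a single series whose value changes
```

With four such counters in the label set, every change to any of them starts a new series for that validator.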

Expected behavior num_expected_blocks, num_expected_chunks, num_produced_blocks, and num_produced_chunks should each be their own metric instead of a label

near_validator_expected_blocks{account_id="01node.poolv1.near"} 112
near_validator_expected_chunks{account_id="01node.poolv1.near"} 704
near_validator_produced_blocks{account_id="01node.poolv1.near"} 112
near_validator_produced_chunks{account_id="01node.poolv1.near"} 703
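A rough Python sketch of that transformation, for illustration only: it takes the label set from the sample above and emits one metric line per counter, keeping only `account_id` as a label. The function name `split_counters` is hypothetical, and `near_validator_expected_blocks` is assumed as the name for the expected-blocks counter.

```python
# Labels that are really counters, mapped to the proposed metric names.
# The near_validator_* names are a proposal, not metrics neard exposes today.
COUNTER_LABELS = {
    "num_expected_blocks": "near_validator_expected_blocks",
    "num_expected_chunks": "near_validator_expected_chunks",
    "num_produced_blocks": "near_validator_produced_blocks",
    "num_produced_chunks": "near_validator_produced_chunks",
}


def split_counters(labels):
    """Turn counter-valued labels into separate metric lines keyed by account_id."""
    account = labels["account_id"]
    lines = []
    for label, metric in COUNTER_LABELS.items():
        value = int(labels[label])  # label values arrive as strings
        lines.append(f'{metric}{{account_id="{account}"}} {value}')
    return lines


# Label set taken from the sample in this report.
sample = {
    "account_id": "01node.poolv1.near",
    "num_expected_blocks": "112",
    "num_expected_chunks": "704",
    "num_produced_blocks": "112",
    "num_produced_chunks": "703",
}

for line in split_counters(sample):
    print(line)
```

The remaining labels (public_key, shards, slashed) change rarely, so they are left out of this sketch; whether they stay on the stake metric is a separate decision.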

Screenshots

Version (please complete the following information):

nearcore
mainnet

Additional context https://docs.victoriametrics.com/faq/#what-is-high-cardinality

Lusitaniae commented 3 months ago

Alternatively, this could be moved into an external exporter that gathers network-wide metrics from a single place, because every near node we run currently exposes the same validator metrics (if we had 100 nodes, we'd have 100x the exact same metrics everywhere)

nagisa commented 3 months ago

Where are you getting the cardinality screenshot from? It might be useful to keep a reference handy for this.

Though I imagine we could also implement a cardinality check in neard itself, e.g. at the time when those metrics are gathered together in order to respond to a GET /metrics.
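One possible shape for such a check, sketched in Python rather than in neard's Rust and not based on any existing neard code: count the distinct label sets per metric name in the text that would be returned for GET /metrics, and flag names above a limit. The function names, the limit, and the simplified line-matching regex are all assumptions.

```python
import re
from collections import defaultdict

# Matches "name{labels} value" or "name value". This is a simplification of the
# Prometheus text exposition format, good enough for counting series.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+\S+')


def series_per_metric(exposition_text):
    """Return {metric name: number of distinct label sets} for one response."""
    seen = defaultdict(set)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            seen[match.group(1)].add(match.group(2) or "")
    return {name: len(label_sets) for name, label_sets in seen.items()}


def over_limit(exposition_text, limit):
    """Names of metrics whose series count exceeds the (hypothetical) limit."""
    counts = series_per_metric(exposition_text)
    return sorted(name for name, n in counts.items() if n > limit)
```

A check like this only sees one response at a time, so it would catch a metric with many label sets at once but not series that churn between scrapes.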

Lusitaniae commented 3 months ago

The dashboard is from vmui https://docs.victoriametrics.com/#vmui (the web UI of VictoriaMetrics, a Prometheus-compatible monitoring system)

There are also projects like https://github.com/thought-machine/prometheus-cardinality-exporter that can monitor this

Lusitaniae commented 3 months ago

I think in the end a near_exporter that provides network-wide metrics is probably best

near / nearcore

High cardinality metric #11988