Add vector_metrics_cardinality_total metric

vectordotdev / vector

A high-performance observability data pipeline.

https://vector.dev

Mozilla Public License 2.0


Add vector_metrics_cardinality_total metric #6075

Open jszwedko opened 3 years ago

jszwedko commented 3 years ago

Proposal: add an internal metric measuring the cardinality of metrics for sources (and maybe transforms?).

We currently have vector_internal_metrics_cardinality_total which reflects the cardinality of internal metrics, but there isn't an easy way to discover the cardinality of metrics coming out of a given source or transform.

I propose we add an additional internal metric, vector_metrics_cardinality_total, tagged with the component kind, type, and name, that reflects the cardinality of metrics seen from a given component.

Example:

vector_metrics_cardinality_total{component_kind="source",component_name="prometheus",component_type="prometheus_scrape"} 107
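To make "cardinality of metrics seen from a given component" concrete, here is a minimal sketch that counts distinct series (metric name plus tag set) per component and renders them in the format of the example above. `ComponentId`, `SeriesKey`, and `CardinalityTracker` are hypothetical names for illustration, not Vector's actual types, and the sketch ignores the hot-path cost of cloning keys.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical identity of the component that emitted a metric.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ComponentId {
    pub kind: String,
    pub name: String,
    pub component_type: String,
}

/// A series is identified by its metric name plus its full tag set.
/// BTreeMap keeps tags ordered, so equal tag sets hash identically.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SeriesKey {
    pub name: String,
    pub tags: BTreeMap<String, String>,
}

/// Tracks the set of distinct series seen per component.
#[derive(Default)]
pub struct CardinalityTracker {
    seen: HashMap<ComponentId, HashSet<SeriesKey>>,
}

impl CardinalityTracker {
    /// Records one observed metric; repeated series do not raise the count.
    pub fn observe(&mut self, component: &ComponentId, series: SeriesKey) {
        self.seen.entry(component.clone()).or_default().insert(series);
    }

    /// Number of distinct series seen from `component` so far.
    pub fn cardinality(&self, component: &ComponentId) -> usize {
        self.seen.get(component).map_or(0, HashSet::len)
    }

    /// Renders one Prometheus-style line per component, sorted for stable output.
    pub fn render(&self) -> Vec<String> {
        let mut lines: Vec<String> = self
            .seen
            .iter()
            .map(|(c, series)| {
                format!(
                    "vector_metrics_cardinality_total{{component_kind=\"{}\",component_name=\"{}\",component_type=\"{}\"}} {}",
                    c.kind, c.name, c.component_type, series.len()
                )
            })
            .collect();
        lines.sort();
        lines
    }
}
```

Because this set only ever grows, the count never decreases, which is what makes the expiry question below relevant.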

Open questions:

Do we expire metrics from this count after a period? Maybe configurable?
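One possible shape for expiry, sketched under assumptions: remember when each series was last seen and drop entries older than a configurable TTL whenever the count is read. `ExpiringCardinality` and its `ttl` parameter are hypothetical, and the caller passes the current time explicitly so the behavior is easy to test.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-component tracker that forgets series after `ttl`.
pub struct ExpiringCardinality {
    ttl: Duration,
    last_seen: HashMap<String, Instant>,
}

impl ExpiringCardinality {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, last_seen: HashMap::new() }
    }

    /// Records (or refreshes) the time at which `series` was last observed.
    pub fn observe(&mut self, series: &str, now: Instant) {
        self.last_seen.insert(series.to_string(), now);
    }

    /// Drops series not seen within `ttl` of `now`, then returns how many remain.
    pub fn cardinality(&mut self, now: Instant) -> usize {
        let ttl = self.ttl;
        self.last_seen
            .retain(|_, seen| now.saturating_duration_since(*seen) <= ttl);
        self.last_seen.len()
    }
}
```

Expiring lazily on read keeps the write path to a single map insert, at the cost of holding stale entries until the next read.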

MOZGIII commented 3 years ago

It's a good idea. Extracting the fields and building the metrics::Key carries a certain cost, so we should benchmark heavily to avoid regressions. The potential problem here is that metric updates always sit on very hot paths, so even a single extra allocation can be costly.

Just to clarify: is the proposal to track the cardinality of the internal metrics emitted by each individual component? If so, we might want to include internal_ somewhere in the name.

bruceg commented 1 year ago

The internal metrics controller, when capturing metrics, already adds a synthetic metric containing the number of metrics observed during that run. A suggested implementation, then, would be to build up a table keyed by component type/name identifiers during metric capture, increment the matching entry for each such metric seen during the current scan, and add the resulting counts to the output at the end of the scan.
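The scan-time approach described above could look roughly like the following sketch. The `Metric` struct and `capture_with_cardinality` function are hypothetical stand-ins, not the controller's real types or API; the sketch assumes each captured metric carries `component_type` and `component_name` tags and that each series appears once per scan.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified captured metric.
#[derive(Clone, Debug, PartialEq)]
pub struct Metric {
    pub name: String,
    pub tags: BTreeMap<String, String>,
    pub value: f64,
}

/// Counts captured metrics per (component_type, component_name) during one
/// scan and appends one synthetic cardinality metric per component.
pub fn capture_with_cardinality(registry: Vec<Metric>) -> Vec<Metric> {
    // Table built up during the scan; BTreeMap gives a stable output order.
    let mut per_component: BTreeMap<(String, String), u64> = BTreeMap::new();
    for metric in &registry {
        // Metrics without both component tags are not attributed to anything.
        if let (Some(ty), Some(name)) = (
            metric.tags.get("component_type"),
            metric.tags.get("component_name"),
        ) {
            *per_component.entry((ty.clone(), name.clone())).or_insert(0) += 1;
        }
    }

    // Add the synthetic metrics to the result at the end of the scan.
    let mut result = registry;
    for ((component_type, component_name), count) in per_component {
        let tags = BTreeMap::from([
            ("component_type".to_string(), component_type),
            ("component_name".to_string(), component_name),
        ]);
        result.push(Metric {
            name: "vector_metrics_cardinality_total".to_string(),
            tags,
            value: count as f64,
        });
    }
    result
}
```

Since the table is rebuilt on every scan, this variant needs no persistent state on the update path, which sidesteps the hot-path allocation concern raised earlier in the thread.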

johnhtodd commented 8 months ago

Our use case (DNS statistics) makes this very important, as we have the potential for cardinality explosion. While we can regulate that with maximum cardinality limits, it is still very useful to understand our behavior under normal (non-maximum) cardinality conditions. That lets us see what is happening on the distributed Vector instances, which may have dramatically different cardinality sets and which have a significant impact on our end-of-funnel collector.