open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[k8sclusterreceiver] k8s.node.condition metric not aggregatable in current form #33760

Open sirianni opened 3 days ago

sirianni commented 3 days ago

Component(s)

receiver/k8scluster

What happened?

Description

The use of -1 for ConditionUnknown greatly hinders usability of the k8s.node.condition metric.

For example, it's not possible to get a simple count of ready nodes in a k8s cluster, because each -1 subtracts from the sum. Such a count would be useful for writing an alert that compares k8s.daemonset.ready_nodes to sum(k8s.node.condition{condition="ready"}).

This is another example of the Splunk team continuing to push the antipattern of encoding enumerations in the metric value. While this may be usable in the Splunk backend, it doesn't work well in most other metric systems (Datadog, New Relic, Prometheus, etc.).

This metric should instead be modeled like the kube_node_status_condition metric from kube-state-metrics, which includes status as an attribute, following the OpenMetrics StateSet pattern. This allows queries of the form:

sum by(condition) (kube_node_status_condition{condition="ready", status="true"})
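As a rough illustration of the StateSet-style modeling, the sketch below expands one node condition into one data point per possible status, with value 1 for the current status and 0 for the others. The function names, the `node` label, and the sample nodes are hypothetical; this is not the receiver's or kube-state-metrics' actual code.

```python
# Possible states of a node condition, modeled as a label instead of a value.
STATUSES = ("true", "false", "unknown")


def to_stateset(node, condition, status):
    """Expand one condition into one (labels, value) point per possible status.

    Exactly one point has value 1 (the current status); the rest are 0.
    """
    return [
        ({"node": node, "condition": condition, "status": s}, 1 if s == status else 0)
        for s in STATUSES
    ]


def sum_where(points, condition, status):
    """Rough equivalent of sum(metric{condition=..., status=...})."""
    return sum(
        value
        for labels, value in points
        if labels["condition"] == condition and labels["status"] == status
    )


# Hypothetical cluster: two ready nodes, one unknown, one not ready.
points = []
for node, status in [("node-a", "true"), ("node-b", "true"),
                     ("node-c", "unknown"), ("node-d", "false")]:
    points.extend(to_stateset(node, "ready", status))

if __name__ == "__main__":
    print("ready nodes:", sum_where(points, "ready", "true"))
    print("unknown nodes:", sum_where(points, "ready", "unknown"))
```

Because every value is either 0 or 1, a plain sum over any status is a valid node count, and nodes in the Unknown state no longer subtract from the total.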

Collector version

v0.103.0

Environment information

No response

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

github-actions[bot] commented 3 days ago

Pinging code owners: