kubernetes / kube-state-metrics

Add-on agent to generate and expose cluster-level metrics.
https://kubernetes.io/docs/concepts/cluster-administration/kube-state-metrics/
Apache License 2.0

[node metrics] add message field to kube_node_status_condition #2404

Open daveoy opened 1 month ago

daveoy commented 1 month ago

What would you like to be added:

Exposing the message field of each node condition on kube_node_status_condition would be helpful for visualization and alerting purposes, especially for custom conditions.

Why is this needed:

To enhance alerts based on these conditions, or to add context to visualizations of node status conditions.

Describe the solution you'd like

Add a message label to the existing time series produced by the metric family generator function that tracks this metric.

Additional context

Could it be as simple as adding the desired field as a label here: https://github.com/kubernetes/kube-state-metrics/blob/85762cdf9790999957d8e9afdfc7253b1fa705db/internal/store/node.go#L478-L507 ?
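For illustration only, here is a rough Python sketch of what a sample line might look like if a message label were added. It does not use kube-state-metrics code: `escape_label_value` and `format_sample` are hypothetical helpers, and the node name and message text are made-up values. The escaping follows the Prometheus text exposition format (backslash, double quote, and newline are escaped in label values).

```python
def escape_label_value(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    # Backslash must be escaped first so later escapes are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict, value) -> str:
    """Render one sample line: name{label="value",...} value."""
    body = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
    return f"{name}{{{body}}} {value}"


# Hypothetical sample: the "message" label is the proposed addition.
line = format_sample(
    "kube_node_status_condition",
    {
        "node": "node-a",
        "condition": "Ready",
        "status": "true",
        "message": "kubelet is posting ready status",
    },
    1,
)
print(line)
```

Every distinct message string would produce a distinct series under this scheme, which is the trade-off the rest of the thread discusses.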

ricardoapl commented 4 weeks ago

I would be happy to submit a patch to support your use case.

However, I'm afraid such a label could have a significant impact on cardinality. It seems to me that the DiskPressure, MemoryPressure, PIDPressure, and NetworkUnavailable conditions have a small, bounded set of values for the message field, whereas the Ready condition doesn't.

Let's wait for others to comment on this.

daveoy commented 4 weeks ago

Thanks for that. I'm happy to contribute the patch as well.

Regarding cardinality: perhaps it could be feature-flagged or something?

dgrisonnet commented 2 weeks ago

I agree with @ricardoapl: message is unbounded, so it could cause cardinality explosions if we introduced it, especially if variable information such as timestamps is included in it.

> Regarding cardinality: perhaps it could be feature-flagged or something?

I wouldn't be in favor of adding any feature to kube-state-metrics that could harm users' clusters, even behind feature flags. In the end, we would still be responsible for the cardinality explosions that are bound to happen with such metrics.
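The cardinality concern raised in this thread can be illustrated with a small Python sketch. It only counts distinct label sets across repeated observations. The `count_series` and `observe` helpers and both message strings are assumptions made up for the example, not real kubelet output.

```python
def count_series(samples):
    """Count distinct label sets, i.e. distinct time series."""
    return len({tuple(sorted(s.items())) for s in samples})


def observe(message_for_scrape, scrapes=5):
    """Simulate the label set seen on each of several scrapes of one node condition."""
    return [
        {
            "node": "node-a",
            "condition": "Ready",
            "status": "false",
            "message": message_for_scrape(i),
        }
        for i in range(scrapes)
    ]


# A fixed message keeps a single series across scrapes.
bounded = observe(lambda i: "kubelet has sufficient memory available")

# A message that embeds a changing timestamp creates a new series each time it changes.
unbounded = observe(lambda i: f"container runtime down since 12:00:{i:02d}")

print(count_series(bounded))    # 1
print(count_series(unbounded))  # 5
```

With a variable message, the series count grows with the number of distinct strings ever observed, multiplied across every node and condition.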

daveoy commented 2 weeks ago

How about if enabling the feature automatically added a labeldrop blacklist that drops message, reason, etc., which the user would then have to explicitly repeal:

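metric_relabel_configs:
# labeldrop must run on scraped samples, so it goes under
# metric_relabel_configs rather than relabel_configs
- action: labeldrop
  regex: message
- action: labeldrop
  regex: reason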
Or perhaps a default regex in the Kubernetes service discovery objects (ServiceMonitor, PodMonitor, etc.), plus docs for non-Kubernetes configurations? That way users could allow a chosen set of regex-matching messages through and contain the chaos.
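The "allow a certain set of regex-matching messages through" idea could look roughly like the following Python sketch. It is not an existing kube-state-metrics or Prometheus feature: `ALLOWED`, `bound_message`, the `"other"` fallback value, and the example patterns are all assumptions chosen for illustration.

```python
import re

# Hypothetical allowlist of message patterns the user has opted in to.
ALLOWED = [
    re.compile(r"kubelet has sufficient \w+ available"),
    re.compile(r"kubelet is posting ready status"),
]


def bound_message(message, allowed=ALLOWED, fallback="other"):
    """Return the message if it fully matches an allowed pattern, else a fixed fallback."""
    for pattern in allowed:
        if pattern.fullmatch(message):
            return message
    return fallback


print(bound_message("kubelet has sufficient memory available"))
print(bound_message("container runtime down since 2024-01-01T00:00:00Z"))
```

The label stays bounded only if each pattern itself matches a small set of strings. A loose pattern such as `.*` would bring the cardinality problem straight back.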

dgrisonnet commented 1 day ago

/assign @ricardoapl
/triage accepted