kubernetes-monitoring / kubernetes-mixin

A set of Grafana dashboards and Prometheus alerts for Kubernetes.
Apache License 2.0
2.08k stars 597 forks source link

resource_alerts: drop redundant KSM selector from `KubeCPUOvercommit` #947

Closed rexagod closed 1 month ago

rexagod commented 2 months ago

Fixes the alert KubeCPUQuotaOvercommit not firing as expected.

cc @simonpasquier

rexagod commented 2 months ago

ACK, I noticed the outer sum will drop the labels anyway (except cluster), and since the innermost metric already respects the KSM selector, the outermost KSM selector is redundant. I missed this earlier, fixed.

povilasv commented 2 months ago

Could you elaborate why drop of this selector fixes something?

I feel like job label is the required one for all the dashboards / rules / alerts, so that you can correctly select the Prometheus job. It's also mentioned in the spec https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/DESIGN.md#proposal

rexagod commented 2 months ago

👋🏼 Thank you for looking at this, @povilasv! It's essentially the fact that namespace_cpu:kube_pod_container_resource_requests:sum has no job labels for kubeStateMetricsSelector to be respected by, so the current expression yields no metrics, which ends up affecting the alert this is used in.

However, I'd like to explicitly mention that kubeStateMetricsSelector is still respected in the inner expressions pre-filtering for the same metric.

rexagod commented 1 month ago

Gentle ping, @povilasv. 🙂

povilasv commented 1 month ago

understood, thanks!