kubernetes-monitoring / kubernetes-mixin

A set of Grafana dashboards and Prometheus alerts for Kubernetes.
Apache License 2.0

KubeMemoryOvercommit (and family) does not take cordoned nodes into consideration #770

Open mladedav opened 2 years ago

mladedav commented 2 years ago

We had a cluster where someone cordoned a few nodes, and we only found out when a node was restarted. I think cordoned nodes should be taken into account by the alert, because with those nodes unavailable for scheduling, a node failure may leave the cluster unable to schedule all pods.
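
To illustrate the concern, here is a minimal Python sketch of an overcommit check of the "would requests still fit if the largest node were lost" kind. It is not the mixin's actual rule: the `Node` type, the `overcommitted` function, the `exclude_cordoned` flag, and the numbers are all hypothetical. It also simplifies by treating cordoned capacity as entirely unavailable, which is conservative because pods already running on a cordoned node keep running there.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_gib: float
    cordoned: bool = False  # hypothetical flag; mirrors spec.unschedulable


def overcommitted(nodes, requested_gib, exclude_cordoned):
    """Return True if requests would not fit after losing the largest usable node."""
    usable = [
        n.allocatable_gib
        for n in nodes
        if not (exclude_cordoned and n.cordoned)
    ]
    if not usable:
        # No usable capacity at all: any request is overcommitted.
        return requested_gib > 0
    # Capacity left if the single largest usable node fails.
    return requested_gib > sum(usable) - max(usable)


# Made-up cluster: four equal nodes, two of them cordoned.
nodes = [
    Node("a", 16),
    Node("b", 16),
    Node("c", 16, cordoned=True),
    Node("d", 16, cordoned=True),
]
requested = 24  # total memory requests in GiB (illustrative)

print(overcommitted(nodes, requested, exclude_cordoned=False))  # False: 24 <= 64 - 16
print(overcommitted(nodes, requested, exclude_cordoned=True))   # True: 24 > 32 - 16
```

With cordoned nodes counted as capacity the check stays quiet, while excluding them shows that one more node failure would leave pods unschedulable. In a Prometheus rule, one possible input for such a filter is the `kube_node_spec_unschedulable` metric from kube-state-metrics, though how to wire it into the alert expression is left open here.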

github-actions[bot] commented 1 month ago

This issue has not had any activity in the past 30 days, so the stale label has been added to it.

Thank you for your contributions!

skl commented 1 month ago

Seems like a genuine concern. I'm not sure KubeMemoryOvercommit is the right alert for this, but an alert that says "cluster is unable to schedule all pods" sounds useful.