KubeMemoryOvercommit (and family) does not take cordoned nodes into consideration

kubernetes-monitoring / kubernetes-mixin

A set of Grafana dashboards and Prometheus alerts for Kubernetes.

Apache License 2.0

KubeMemoryOvercommit (and family) does not take cordoned nodes into consideration #770

Open mladedav opened 2 years ago

mladedav commented 2 years ago

We had a cluster where someone cordoned a few nodes, and we only found out when a node was restarted. I think cordoned nodes should be taken into account by the alert, because with those nodes excluded from scheduling, a single node failure may leave the cluster unable to schedule all pods.
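To illustrate the concern, here is a minimal Python sketch of the kind of check involved. It assumes the alert compares total memory requests against the allocatable memory that remains after losing the largest node; that is a simplification, not the mixin's actual rule. The `Node` class, the `overcommitted` function, and the sample numbers are all hypothetical. In a real cluster the cordon state could presumably come from a metric such as kube-state-metrics' `kube_node_spec_unschedulable`.

```python
from dataclasses import dataclass

GiB = 2**30


@dataclass
class Node:
    # Hypothetical, simplified view of a node for this sketch.
    name: str
    allocatable_bytes: int
    unschedulable: bool = False  # True when the node is cordoned


def overcommitted(nodes, requested_bytes, ignore_cordoned=True):
    """Return True if the requests would not fit after losing the largest usable node.

    With ignore_cordoned=True, cordoned nodes are excluded from capacity,
    because no pod can be (re)scheduled onto them.
    """
    usable = [
        n.allocatable_bytes
        for n in nodes
        if not (ignore_cordoned and n.unschedulable)
    ]
    if not usable:
        # No schedulable capacity at all: any request is overcommitted.
        return requested_bytes > 0
    capacity_after_failure = sum(usable) - max(usable)
    return requested_bytes > capacity_after_failure


# Four 16 GiB nodes, two of them cordoned, 40 GiB of memory requested.
nodes = [
    Node("node-a", 16 * GiB),
    Node("node-b", 16 * GiB),
    Node("node-c", 16 * GiB, unschedulable=True),
    Node("node-d", 16 * GiB, unschedulable=True),
]

print(overcommitted(nodes, 40 * GiB, ignore_cordoned=False))  # prints False
print(overcommitted(nodes, 40 * GiB))                         # prints True
```

Counting the cordoned nodes, the cluster looks able to survive a node failure (48 GiB left for 40 GiB of requests). Excluding them, only 16 GiB would remain, which is the situation the alert currently misses.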

github-actions[bot] commented 1 month ago

This issue has not had any activity in the past 30 days, so the stale label has been added to it.

The stale label will be removed if there is new activity
The issue will be closed in 7 days if there is no new activity
Add the keepalive label to exempt this issue from the stale check action

Thank you for your contributions!

skl commented 1 month ago

This seems like a genuine concern. I'm not sure KubeMemoryOvercommit is the right alert for this, but an alert that says "cluster is unable to schedule all pods" sounds useful.