Closed: bobheadxi closed this 3 years ago
We recently ran into issues where unschedulable pods caused upgrades to fail silently. @pecigonzalo suggested the following:
I would do it by just checking unschedulable pod metrics. There are quite a few Kube metrics that I believe we don't alert on but that would provide this information. I would generally implement most of what is in https://monitoring.mixins.dev/kubernetes/ (basically https://github.com/kubernetes-monitoring/kubernetes-mixin) and https://gitlab.com/gitlab-com/runbooks/-/tree/master/ (lots of good monitoring examples and dashboards there). AFAIR there are a few for pod state.
Each dashboard currently has a "Kubernetes monitoring" section that we can expand with this information.
@pecigonzalo brought up an awesome resource we can use for this: https://awesome-prometheus-alerts.grep.to/rules#kubernetes
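To make the "check unschedulable pod metrics" idea concrete, here is a minimal sketch of what such a check looks at. It assumes kube-state-metrics is deployed and exposes `kube_pod_status_unschedulable` with `namespace` and `pod` labels; that assumption should be verified against the cluster. The function name `unschedulable_pods` and the sample data are hypothetical and only illustrate the logic. A real alert would more likely be a Prometheus rule on the same metric rather than a script.

```python
import re

# Matches one sample line in the Prometheus text exposition format, e.g.
#   kube_pod_status_unschedulable{namespace="prod",pod="frontend-0"} 1
# Simplification: label values containing "}" are not handled.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def unschedulable_pods(metrics_text, metric="kube_pod_status_unschedulable"):
    """Return sorted (namespace, pod) pairs whose sample for `metric` equals 1.

    The metric name is an assumption about what kube-state-metrics exposes.
    """
    pods = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        if float(match.group("value")) != 1:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        pods.append((labels.get("namespace", ""), labels.get("pod", "")))
    return sorted(pods)


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# TYPE kube_pod_status_unschedulable gauge
kube_pod_status_unschedulable{namespace="prod",pod="frontend-0"} 1
kube_pod_status_unschedulable{namespace="prod",pod="gitserver-0"} 0
kube_pod_status_phase{namespace="prod",pod="frontend-0",phase="Pending"} 1
"""

stuck = unschedulable_pods(SAMPLE)
```

An alert built on the same signal would fire when this list stays non-empty for some grace period, so that pods that are only briefly pending during a rollout do not page anyone.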