kubernetes-monitoring / kubernetes-mixin

A set of Grafana dashboards and Prometheus alerts for Kubernetes.
Apache License 2.0

Memory quota panel: memory limits % must include cache #804

Closed: nanouck closed this issue 3 weeks ago

nanouck commented 1 year ago

https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/05a58f765eda05902d4f7dd22098a2b870f7ca1e/dashboards/resources/pod.libsonnet#L240

The cache usage is counted within the container’s cgroup, whose size can be restricted by the memory limit. In our case, the pod was OOM-killed because WSS (working set size) + cache reached the limit.

https://community.ibm.com/community/user/aiops/blogs/riley-zimmerman/2021/07/02/memory-measurements-complexities-part2
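
As a rough illustration of the gap described above, here is a minimal sketch with made-up numbers. The byte values and the helper name `limit_ratio` are hypothetical and not taken from any real cluster. It compares the ratio the panel shows today, WSS / limit, with the ratio proposed in this issue, (WSS + cache) / limit.

```python
MIB = 1024 * 1024


def limit_ratio(used_bytes: int, limit_bytes: int) -> float:
    """Return used memory as a fraction of the container memory limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return used_bytes / limit_bytes


# Hypothetical sample values for a single container.
working_set = 600 * MIB  # stands in for container_memory_working_set_bytes
cache = 380 * MIB        # stands in for container_memory_cache
limit = 1024 * MIB       # stands in for the container memory limit

current_panel = limit_ratio(working_set, limit)
proposed_panel = limit_ratio(working_set + cache, limit)

print(f"WSS / limit:           {current_panel:.1%}")   # 58.6%
print(f"(WSS + cache) / limit: {proposed_panel:.1%}")  # 95.7%
```

With these assumed values the current panel would show the container at well under 60% of its limit, while the combined figure is already close to the limit.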

-          'sum(container_memory_working_set_bytes{%(cadvisorSelector)s, %(clusterLabel)s="$cluster", namespace="$namespace", pod="$pod", container!="", image!=""}) by (container) / sum(cluster:namespace:pod_memory:active:kube_pod_container_resource_limits{%(clusterLabel)s="$cluster", namespace="$namespace", pod="$pod"}) by (container)' % $._config,
+          'sum(container_memory_working_set_bytes{%(cadvisorSelector)s, %(clusterLabel)s="$cluster", namespace="$namespace", pod="$pod", container!="", image!=""} + container_memory_cache{%(cadvisorSelector)s, %(clusterLabel)s="$cluster", namespace="$namespace", pod="$pod", container!="", container!="POD"}) by (container) / sum(cluster:namespace:pod_memory:active:kube_pod_container_resource_limits{%(clusterLabel)s="$cluster", namespace="$namespace", pod="$pod"}) by (container)' % $._config,
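
To see what the proposed expression looks like once the `%(...)s` placeholders are filled in, the sketch below renders it in Python, whose `%` formatting Jsonnet's is, as far as I understand, modelled on. The `config` values here are assumptions standing in for the mixin's `$._config`. The Grafana variables (`$cluster`, `$namespace`, `$pod`) are left untouched, as they would be in the dashboard.

```python
# Hypothetical config values; in the mixin these come from $._config.
config = {
    "cadvisorSelector": 'job="cadvisor"',
    "clusterLabel": "cluster",
}

# The proposed panel query, split across lines only for readability.
QUERY_TEMPLATE = (
    'sum(container_memory_working_set_bytes{%(cadvisorSelector)s, '
    '%(clusterLabel)s="$cluster", namespace="$namespace", pod="$pod", '
    'container!="", image!=""} '
    '+ container_memory_cache{%(cadvisorSelector)s, '
    '%(clusterLabel)s="$cluster", namespace="$namespace", pod="$pod", '
    'container!="", container!="POD"}) by (container) '
    '/ sum(cluster:namespace:pod_memory:active:kube_pod_container_resource_limits{'
    '%(clusterLabel)s="$cluster", namespace="$namespace", pod="$pod"}) '
    'by (container)'
)

# Substitute the config placeholders, leaving Grafana's $variables as-is.
rendered = QUERY_TEMPLATE % config
print(rendered)
```

The rendered string can be pasted into the Prometheus expression browser after replacing the `$`-variables by hand, which is a quick way to check the query before changing the dashboard.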

github-actions[bot] commented 4 weeks ago

This issue has not had any activity in the past 30 days, so the stale label has been added to it.

Thank you for your contributions!