[Open] minrk opened this issue 4 years ago
FWIW, `container_memory_rss` and presumably all container metrics are affected. You need to add `container!="POD"` to your query.
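As a sketch (metric and label names taken from the discussion in this thread; the exact label set depends on your kubelet/cAdvisor version and scrape config), the filtered query might look like:

```promql
# Exclude the pause container ("POD") and the pod-level cgroup
# aggregate (the series with an empty `container` label) so each
# container is counted exactly once.
sum by (namespace, pod) (
  container_memory_rss{container!="POD", container!=""}
)
```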
You can look at the kubernetes-mixin project for ready-to-use dashboards and alerts that are tested by others. Generated YAMLs can be obtained from https://monitoring.mixins.dev/kubernetes/
I also have the same problem
We have the exact same issue, and I'm surprised this isn't more widely recognised as a problem. It causes double counting on all of our container metrics.
Oddly, ours only shows up starting 12 hours in the past (current metrics aren't affected, but metrics older than 12 hours are).
This also ends up blowing up cardinality and metric series at scale, which is a major problem with almost all metrics providers.
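To gauge how much these duplicate series contribute, a rough check (assuming the same cAdvisor label scheme discussed above) is to compare the count of empty-`container` series against the total:

```promql
# Series for the pod-level aggregate only (no container label)
count(container_memory_rss{container=""})

# versus the total series count for the metric
count(container_memory_rss)
```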
We have some grafana charts from prometheus that look like:

However, we noticed after a recent upgrade to GKE v1.17.9-gke.1504 (from 1.16) that resources seemed to spike, and it turns out that in addition to an entry for each container, there is a matching entry with `container` undefined, which appears to be the sum of the actual containers, so our charts started always reporting ~2x the 'real' usage. Is there a recommended fix for this, or do we need to manually add `container!=""` to all of our queries to get accurate sums? Or am I misinterpreting what the undefined-container entry is?

The missing-container results are always missing the `container`, `image`, and `name` keys and are otherwise identical to the 'real' metrics. Our prometheus is public, so you can see the results here.
I didn't find this exact issue searching for it, but happy for a close & link if this turns out to be a duplicate.
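For anyone inspecting this themselves: assuming the label set described above, the extra entries can be selected directly, and should roughly match the per-pod sum of the real container series:

```promql
# The pod-level cgroup series carry no container/image/name labels
container_memory_rss{container=""}

# ...which should approximately equal this per-pod sum
sum by (namespace, pod) (
  container_memory_rss{container!="", container!="POD"}
)
```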