kubernetes-monitoring / kubernetes-mixin

A set of Grafana dashboards and Prometheus alerts for Kubernetes.
Apache License 2.0

Fix apiserver calculations not matching `resource` correctly #917

Open jalev opened 4 months ago

jalev commented 4 months ago

The current matcher scope=~"resource|" does not match anything, even when there is a resource label, so the expression always returns vector(0). This pushes the calculations into negative numbers, which in turn makes the availability come out greater than 1:

Screenshot 2024-04-26 at 16 20 13 Screenshot 2024-04-26 at 16 23 32

When you change the matcher to scope="resource":

Screenshot 2024-04-26 at 16 21 23 Screenshot 2024-04-26 at 16 23 42

The same applies to other queries that use scope=~"resource|", e.g. the error burndown:

Screenshot 2024-04-26 at 16 28 53

and after fixing the query:

Screenshot 2024-04-26 at 16 29 04
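
To illustrate how a negative intermediate term can push availability above 1, here is a minimal arithmetic sketch. It assumes a simplified formula, `1 - (too_slow + errors) / total`, with `too_slow = count - within_slo`; the function name, parameters, and numbers are all hypothetical and are not taken from the mixin's actual rules.

```python
def availability(total, count, within_slo, errors):
    """Simplified availability model (an assumption, not the mixin's query)."""
    # Requests that missed the latency target: all counted requests
    # minus those that landed in a bucket within the SLO threshold.
    too_slow = count - within_slo
    return 1 - (too_slow + errors) / total


# Consistent inputs: the within-SLO count never exceeds the request count.
healthy = availability(total=1000, count=1000, within_slo=990, errors=5)

# Inconsistent inputs: the within-SLO count exceeds the request count,
# so too_slow is negative and the result ends up above 1.
skewed = availability(total=1000, count=1000, within_slo=1020, errors=5)

print(f"healthy={healthy:.3f} skewed={skewed:.3f}")
```

Whenever the subtracted term is larger than the count it is subtracted from, the "too slow" share turns negative and the ratio exceeds 1, whatever the underlying cause of the mismatch is.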

lorenzofelletti commented 2 months ago

Hi @jalev, this is weird: in my case scope=~"resource|" matches both when the scope label equals resource and when it is empty. I'm also sporadically seeing availability greater than 100%, though I've tracked it down to a difference between cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{le="+Inf"} and cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d, which should be equal. I'm still not sure what causes it, but it might be related to this one here. Did you check whether this is your case too?
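
For reference, Prometheus regex matchers are fully anchored, and a series without a given label is treated as having an empty value for it. The sketch below mimics that with Python's `re.fullmatch` as a stand-in (Prometheus itself uses RE2 syntax, which behaves the same for a pattern this simple); the helper name `matcher_selects` is hypothetical.

```python
import re


def matcher_selects(pattern, label_value):
    """Approximate a Prometheus =~ matcher with a fully anchored regex.

    A series that lacks the label is modelled as label_value == "".
    """
    return re.fullmatch(pattern, label_value) is not None


# The pattern "resource|" has two alternatives: "resource" and the empty string.
for value in ["resource", "", "namespace", "cluster"]:
    print(repr(value), matcher_selects("resource|", value))
```

Under these assumptions the pattern selects `scope="resource"` and an empty or missing `scope`, but not `namespace` or `cluster`.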

jalev commented 2 months ago

I'll take a look and come back 👍

github-actions[bot] commented 3 days ago

This PR has been automatically marked as stale because it has not had any activity in the past 30 days. The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity. Thank you for your contributions!