kubernetes-monitoring / kubernetes-mixin

A set of Grafana dashboards and Prometheus alerts for Kubernetes.
Apache License 2.0

Error "multiple matches for labels: many-to-one matching must be explicit (group_left/group_right)" on Kubernetes/Networking Dashboards #951

Closed gsmith-sas closed 2 weeks ago

gsmith-sas commented 2 months ago

When I attempt to deploy three of the Grafana dashboards (Kubernetes / Networking / Cluster, Kubernetes / Networking / Namespace (Pods), and Kubernetes / Networking / Namespace (Workload)) with Grafana 11, the dashboards fail to load properly. An error icon appears on each chart on these dashboards, reporting the error "multiple matches for labels: many-to-one matching must be explicit (group_left/group_right)".

I am deploying Grafana as part of the Kube-Prometheus Stack Helm chart, which deploys this mixin. I was originally deploying Grafana 11.0.0 using Kube-Prometheus Stack Helm chart version 60.4.0. I noticed that an update was released three days ago, so I have just redeployed with Grafana 11.1.0 and Kube-Prometheus Stack Helm chart version 61.1.1, and the problem remains. Here's a screenshot showing the error message (for one of the queries): image

dragoangel commented 2 months ago

Should be something like sum(sum(rate(container_network_transmit_bytes_total{cluster="$cluster",namespace=~"$namespace"}[$__rate_interval])) by (cluster,namespace,pod) * on (cluster,namespace,pod) GROUP_RIGHT() sum(kube_pod_info{host_network="false"}) without(pod_ip,uid))

gsmith-sas commented 2 months ago

@jkroepke Can you take a look at this issue? It appears you made the most recent changes to the queries; perhaps there is a simple typo? Or does the current syntax work in your deployment? Thanks!

jkroepke commented 2 months ago

Could someone test this one?

sum by (namespace) (
    rate(container_network_receive_bytes_total{cluster="$cluster",namespace!=""}[$__rate_interval])
  * on (cluster, namespace, pod) group_left ()
    topk by (cluster, namespace, pod) (
      1,
      max by (cluster, namespace, pod) (kube_pod_info{host_network="false"})
    )
)

gsmith-sas commented 2 months ago

That PromQL executes without errors. I've included a screenshot of the resulting chart below. Is that what the chart should look like?
image

gsmith-sas commented 1 month ago

@jkroepke Do you need additional help testing any other revised PromQL queries for the Kubernetes/Networking dashboards? Or, are they similar enough that you have what you need to get these dashboards back to a good state?

povilasv commented 1 month ago

@jkroepke do you know why we need this part? What is the issue here? :thinking:

    topk by (cluster, namespace, pod) (
      1,
      max by (cluster, namespace, pod) (kube_pod_info{host_network="false"})
    )

jkroepke commented 1 month ago

If you deploy StatefulSets, then the same pod name can exist more than once in the same namespace/cluster, but with different labels (e.g. IP).

So I replicated the logic from other alerts:

https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/b836f0200a49645bda8d8623576e002814c934a6/rules/apps.libsonnet#L20-L22
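
To make the duplicate-series problem concrete, here is a minimal Python sketch of what the inner `max by (cluster, namespace, pod) (...)` step buys you. It is a toy model of `on (...) group_left ()` matching, not Prometheus code: the helper names, the label values (`c1`, `db`, `pg-0`, the IPs and UIDs) and the error text are all made up for illustration, and the real Prometheus error message is not reproduced here. It also does not model the outer `topk`.

```python
# Toy model of PromQL vector matching; NOT the Prometheus implementation.
# All names and label values below are hypothetical.
MATCH_LABELS = ("cluster", "namespace", "pod")


def match_key(labels):
    """Project a label set onto the labels listed in on(...)."""
    return tuple(labels.get(name, "") for name in MATCH_LABELS)


def multiply_group_left(left, right):
    """Model `left * on (cluster, namespace, pod) group_left () right`.

    Each series is a (labels_dict, value) pair. The right-hand side is the
    "one" side, so two right-hand series sharing a match key are an error.
    """
    one_side = {}
    for labels, value in right:
        key = match_key(labels)
        if key in one_side:
            raise ValueError(f"duplicate series on the 'one' side for {key}")
        one_side[key] = value
    return [
        (labels, value * one_side[match_key(labels)])
        for labels, value in left
        if match_key(labels) in one_side
    ]


def max_by_match_labels(series):
    """Model `max by (cluster, namespace, pod) (...)`: one series per key."""
    groups = {}
    for labels, value in series:
        key = match_key(labels)
        groups[key] = max(value, groups.get(key, value))
    return [(dict(zip(MATCH_LABELS, key)), value) for key, value in groups.items()]


# One traffic series, and two kube_pod_info-like series for the same pod name
# that differ only in pod_ip and uid (the StatefulSet case described above).
traffic = [({"cluster": "c1", "namespace": "db", "pod": "pg-0"}, 1500.0)]
pod_info = [
    ({"cluster": "c1", "namespace": "db", "pod": "pg-0", "pod_ip": "10.0.0.5", "uid": "old"}, 1.0),
    ({"cluster": "c1", "namespace": "db", "pod": "pg-0", "pod_ip": "10.0.0.9", "uid": "new"}, 1.0),
]

try:
    multiply_group_left(traffic, pod_info)
    raw_join_failed = False
except ValueError:
    raw_join_failed = True

# Collapsing the right-hand side first leaves exactly one series per pod.
deduplicated = multiply_group_left(traffic, max_by_match_labels(pod_info))
```

In this model the raw join fails because two right-hand series share the key `(c1, db, pg-0)`, while the pre-aggregated join returns a single series with the traffic value unchanged (multiplied by 1).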

povilasv commented 1 month ago

Ah, :100: feel free to open your PR, would definitely merge it https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/964 :+1:

jkroepke commented 1 month ago

#964 is not complete yet. I am still missing the workload/namespace-level dashboards.

jkroepke commented 1 month ago

#964 is done now. It would be great if someone could also test it on their own infrastructure.

gsmith-sas commented 2 weeks ago

@jkroepke I will test these changes when they are available via the Kube-Prometheus Stack Helm chart project. Thank you for fixing this!