kubernetes-monitoring / kubernetes-mixin

A set of Grafana dashboards and Prometheus alerts for Kubernetes.

Label Matching Problem in PromQL Queries for TCP Retransmission and Syn Retransmission Rates #949


apankevics commented 2 months ago

Issue Description

There is a label matching problem in the following PromQL queries:

sum by (instance) (rate(node_netstat_Tcp_RetransSegs{%(clusterLabel)s="$cluster"}[%(grafanaIntervalVar)s]) / rate(node_netstat_Tcp_OutSegs{%(clusterLabel)s="$cluster"}[%(grafanaIntervalVar)s]) * on (%(clusterLabel)s,namespace,pod) kube_pod_info{host_network="false"})
sum by (instance) (rate(node_netstat_TcpExt_TCPSynRetrans{%(clusterLabel)s="$cluster"}[%(grafanaIntervalVar)s]) / rate(node_netstat_Tcp_RetransSegs{%(clusterLabel)s="$cluster"}[%(grafanaIntervalVar)s]) * on (%(clusterLabel)s,namespace,pod) kube_pod_info{host_network="false"})

Problem

The sum by (instance) aggregation is applied to the ratio calculation, but kube_pod_info is not aggregated on the instance label and instance does not appear in the on clause. As a result, the join is performed only on the cluster, namespace, and pod labels, the instance label is dropped from the joined result, and the outer aggregation can produce incorrect or misleading values.
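
As an illustration of the vector-matching behaviour involved (a minimal, hypothetical example; left_ratio and the label values are made up), one-to-one matching with an explicit on() clause keeps only the labels listed in on() on the result, so any label omitted there, such as instance, is no longer available to an outer aggregation:

# Hypothetical input series (labels illustrative only):
#   left_ratio{cluster="c1", namespace="monitoring", pod="node-exporter-abc", instance="10.0.0.1:9100"}
#   kube_pod_info{cluster="c1", namespace="monitoring", pod="node-exporter-abc", host_network="false"}

# One-to-one matching with on() keeps only the listed labels, so instance is dropped:
left_ratio * on (cluster, namespace, pod) kube_pod_info{host_network="false"}
#   => {cluster="c1", namespace="monitoring", pod="node-exporter-abc"}

# The outer aggregation then groups series that no longer carry an instance label,
# collapsing everything into a single instance-less series:
sum by (instance) (left_ratio * on (cluster, namespace, pod) kube_pod_info{host_network="false"})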

Steps to Reproduce

  1. Execute the above PromQL queries in Prometheus:
    sum by (instance) (rate(node_netstat_Tcp_RetransSegs{%(clusterLabel)s="$cluster"}[%(grafanaIntervalVar)s]) / rate(node_netstat_Tcp_OutSegs{%(clusterLabel)s="$cluster"}[%(grafanaIntervalVar)s]) * on (%(clusterLabel)s,namespace,pod) kube_pod_info{host_network="false"})
    sum by (instance) (rate(node_netstat_TcpExt_TCPSynRetrans{%(clusterLabel)s="$cluster"}[%(grafanaIntervalVar)s]) / rate(node_netstat_Tcp_RetransSegs{%(clusterLabel)s="$cluster"}[%(grafanaIntervalVar)s]) * on (%(clusterLabel)s,namespace,pod) kube_pod_info{host_network="false"})
  2. Observe the queries as rendered by the dashboard, where the interval variable resolves to 1m0s:
    sum by (instance) (rate(node_netstat_Tcp_RetransSegs{%(clusterLabel)s="$cluster"}[1m0s]) / rate(node_netstat_Tcp_OutSegs{%(clusterLabel)s="$cluster"}[1m0s]) * on (%(clusterLabel)s,namespace,pod) kube_pod_info{host_network="false"})
    sum by (instance) (rate(node_netstat_TcpExt_TCPSynRetrans{%(clusterLabel)s="$cluster"}[1m0s]) / rate(node_netstat_Tcp_RetransSegs{%(clusterLabel)s="$cluster"}[1m0s]) * on (%(clusterLabel)s,namespace,pod) kube_pod_info{host_network="false"})
  3. Notice that the results are incorrect due to the mismatch in labels used in the join operation.

Expected Behavior

The queries should correctly aggregate and join the metrics on the appropriate labels to avoid misleading results.

Possible Solution

To fix the issue, ensure that the instance label is considered in the join operation or modify the aggregation strategy. One possible solution might be to aggregate kube_pod_info on the instance label as well.
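
One way to keep instance available to the outer aggregation, sketched here with concrete values (a cluster label and Grafana's $__rate_interval) in place of the mixin's Jsonnet placeholders, would be a group_left() join, which preserves the left-hand side's labels. This is an untested sketch of the idea rather than a proposed patch, and it assumes kube_pod_info is unique per (cluster, namespace, pod):

# Sketch only: group_left() keeps the node-exporter labels (including instance)
# on the joined result, so the outer sum by (instance) still has something to group on.
sum by (instance) (
  rate(node_netstat_Tcp_RetransSegs{cluster="$cluster"}[$__rate_interval])
  /
  rate(node_netstat_Tcp_OutSegs{cluster="$cluster"}[$__rate_interval])
  * on (cluster, namespace, pod) group_left()
    kube_pod_info{host_network="false"}
)

Whether joining node-exporter series against kube_pod_info is meaningful at all is a separate question; see the comment below.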

Changes

The label matching problem was introduced in the following commit:

d63872c

github-actions[bot] commented 3 days ago

This issue has not had any activity in the past 30 days, so the stale label has been added to it.

skl commented 2 days ago

There do seem to be issues with queries like this:

https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/dec42129576a19dc175aea145adbbc0d2fabe074/dashboards/network-usage/cluster-total.libsonnet#L441-L448

The original problem addressed by the commit mentioned in the issue description (#972) seems valid, but the current solution of joining against kube_pod_info is likely to work only with queries based on cAdvisor/kube-state-metrics, not with node exporter.

My suggestion would be to remove these new joins from the node exporter queries and keep them where they make sense.
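
For concreteness, a minimal sketch of what that could look like for the TCP retransmission panel, again with concrete values in place of the Jsonnet placeholders and simply dropping the kube_pod_info join (not a tested patch):

# Node-exporter-only version: no kube_pod_info join, aggregation stays on instance.
sum by (instance) (
  rate(node_netstat_Tcp_RetransSegs{cluster="$cluster"}[$__rate_interval])
  /
  rate(node_netstat_Tcp_OutSegs{cluster="$cluster"}[$__rate_interval])
)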