kubernetes-monitoring / kubernetes-mixin

A set of Grafana dashboards and Prometheus alerts for Kubernetes.
Apache License 2.0

Cannot use rules on mixed kube-state-metrics/node-exporter deployments #67

Open xrstf opened 6 years ago

xrstf commented 6 years ago

In our setup, we have two clusters:

  1. One of them is under our control and we have node-exporter (NE) deployed and scraped from the Prometheus running in that cluster.
  2. We also have a second cluster where no NE is deployed. Our Prometheus federates from the Prometheus in that "foreign" cluster, adding a cluster label to all imported metrics.

In both clusters, kube-state-metrics (KSM) is deployed.

This leaves us in a situation where our Prometheus has KSM+NE metrics about our own nodes but only KSM metrics about the foreign nodes. The "nodes as seen by KSM" therefore no longer match the "nodes as seen by NE", and the rules break because Prometheus gets confused about the grouping labels.

For our use case, we fixed this by restricting the two "base recording rules", :kube_pod_info_node_count: and node_namespace_pod:kube_pod_info:, to only count KSM metrics without a cluster label (edited directly in the generated YAML, for testing purposes):

 - name: node.rules
   rules:
-  - expr: sum(min(kube_pod_info) by (node))
+  - expr: sum(min(kube_pod_info{cluster=""}) by (node))
     record: ':kube_pod_info_node_count:'
   - expr: |
-      max(label_replace(kube_pod_info{job="kube-state-metrics"}, "pod", "$1", "pod", "(.*)")) by (node, namespace, pod)
+      max(label_replace(kube_pod_info{cluster="",job="kube-state-metrics"}, "pod", "$1", "pod", "(.*)")) by (node, namespace, pod)
     record: 'node_namespace_pod:kube_pod_info:'
   - expr: |
       count by (node) (sum by (node, cpu) (

This seems to have fixed the problem. I was wondering if we can/should submit a PR to introduce a new config variable to the mixins to allow people to customize the selection of KSM metrics.

metalmatze commented 6 years ago

Normally the kubeStateMetricsSelector in the config should be enough. Apparently it's not for you, but I'm not really sure that having a cluster label selector is something most users want. 🤔
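
For reference, a minimal sketch of what overriding kubeStateMetricsSelector could look like. The import path is hypothetical and depends on how the mixin is vendored, and the sketch assumes the rules interpolate the selector into their label matchers:

```jsonnet
// Sketch only: the import path below is an assumption about the local vendoring layout.
local kubernetes = import 'kubernetes-mixin/mixin.libsonnet';

kubernetes {
  _config+:: {
    // Match only KSM series without a cluster label, i.e. those scraped
    // locally rather than federated from the foreign cluster.
    kubeStateMetricsSelector: 'job="kube-state-metrics",cluster=""',
  },
}
```

Any rule that queries kube_pod_info without a selector, as the :kube_pod_info_node_count: expression in the diff above does, would not pick up this matcher, which may be why the selector alone is not sufficient here.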