grafana / k8s-monitoring-helm


Question: Finding metric targets via prometheus.operator.* or discovery.kubernetes.* #381

Open marshallford opened 4 months ago

marshallford commented 4 months ago

Hello again! While reviewing the default metrics.river config I noticed that for the handful of services that export cluster/system metrics (node-exporter, etc.), the discovery.kubernetes.* and discovery.relabel components are used to find the relevant targets for those services. Similarly, I noted that the default config also uses the prometheus.operator.* components to find targets via the Prometheus Operator CRs, which, among other benefits, avoids the need to add discovery.* components to an ever-growing config.
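For reference, here's a rough sketch of the two approaches as I understand them (the component labels, the relabel rule, and the remote_write endpoint are made up, not the chart's actual config):

```river
// Approach 1: explicit discovery + relabel + scrape for a single service.
discovery.kubernetes "node_exporter" {
  role = "pod"
}

discovery.relabel "node_exporter" {
  targets = discovery.kubernetes.node_exporter.targets

  // Keep only the node-exporter pods (illustrative rule).
  rule {
    source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_name"]
    regex         = "prometheus-node-exporter.*"
    action        = "keep"
  }
}

prometheus.scrape "node_exporter" {
  targets    = discovery.relabel.node_exporter.output
  forward_to = [prometheus.remote_write.default.receiver]
}

// Approach 2: let the Prometheus Operator CRs (ServiceMonitors) drive discovery.
prometheus.operator.servicemonitors "all" {
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "https://prometheus.example.com/api/prom/push"
  }
}
```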

Assuming I have that all straight, I'll get to my actual question: given that the Helm charts for kube-state-metrics, node-exporter, opencost, and even grafana-agent itself all include ServiceMonitor resources, why write configuration to scrape those targets when the charts already describe how the services should be scraped? In addition, is there any concern that the targets will be scraped twice (if any of the charts mentioned enable creation of the ServiceMonitor by default)?

Thanks!

petewall commented 4 months ago

Yes, there is a chance of double-scraping, but I think it's worth the risk considering:

  1. This chart controls the deployment of Node Exporter, Kube-State-Metrics, OpenCost, and the Agents, so we can turn those off by default.
  2. This chart aims to be an easy replacement for people who are used to instrumenting their cluster with the Grafana Agent Operator, kube-prometheus-stack, and many other implementations. I wanted ServiceMonitor support in order to minimize the effort needed to "convert" everything, especially custom application metrics.

It's making me think of ways to minimize double-scraping, though. Perhaps a check could be added to helm test that looks at a metric like kube_node_info and flags when it is coming from multiple sources.
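Something along these lines, as a rough sketch (assuming kube_node_info carries a node label and that a duplicate scrape shows up as extra series per node with differing job labels):

```promql
# Flags any node whose kube_node_info series arrives from more than one scrape job.
count by (node) (
  count by (node, job) (kube_node_info)
) > 1
```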

bentonam commented 1 month ago

To add some additional context to this issue: with large Kubernetes deployments, using prometheus.scrape -> prometheus.relabel can be more beneficial than ServiceMonitors, PodMonitors, and Probes. This is because the prometheus.relabel component supports a max_cache_size argument (default: 100000) which can be tuned to meet the needs of the components it is relabeling; it should be roughly 2x-5x the sample size of the largest scrape job. Tuning it can dramatically reduce CPU by allowing the relabel cache to be fully leveraged.
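For example, a rough sketch of the wiring (the 500000 value and component labels are placeholders, and the upstream discovery.relabel and prometheus.remote_write components are assumed to exist elsewhere in the config):

```river
prometheus.scrape "kube_state_metrics" {
  targets    = discovery.relabel.kube_state_metrics.output
  forward_to = [prometheus.relabel.kube_state_metrics.receiver]
}

prometheus.relabel "kube_state_metrics" {
  // Default is 100000. Size it at roughly 2x-5x the sample count of the
  // largest scrape job so the relabel cache stops evicting entries.
  max_cache_size = 500000
  forward_to     = [prometheus.remote_write.default.receiver]

  // Illustrative rule; any metric relabeling would go here.
  rule {
    source_labels = ["namespace"]
    regex         = "kube-system"
    action        = "keep"
  }
}
```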

Currently, ServiceMonitors, PodMonitors, Probes, etc. are handled with their own internal relabeling, which does not support tuning of the cache size. However, there is an open issue with Alloy to have those components use prometheus.relabel instead: https://github.com/grafana/alloy/issues/888