wavefrontHQ / wavefront-collector-for-kubernetes

Monitoring Kubernetes Clusters using Wavefront
https://www.wavefront.com/

Scrape annotation on service doesn't work correctly #563

Closed jeesmon closed 1 year ago

jeesmon commented 1 year ago

According to the doc here:

The collector can dynamically discover pods/services annotated with prometheus.io/scrape

If we annotate a service with prometheus.io/scrape, the collector scrapes metrics from only a single endpoint behind the service.

From the collector log:

wavefront-collector-vjwvx wavefront-collector time="2023-03-24T02:37:18Z" level=info msg="Finished querying source" latency=44.800952ms name="prometheus_source: http://172.20.197.130:8082/metrics" total_metrics=158
wavefront-collector-vjwvx wavefront-collector time="2023-03-24T02:38:18Z" level=info msg="Finished querying source" latency=43.862206ms name="prometheus_source: http://172.20.197.130:8082/metrics" total_metrics=158

But when curling the metrics endpoint directly from another pod, the number of metric lines varies from request to request:

curl -s 172.20.197.130:8082/metrics|grep -v '#'|wc -l
158
curl -s 172.20.197.130:8082/metrics|grep -v '#'|wc -l
510
curl -s 172.20.197.130:8082/metrics|grep -v '#'|wc -l
510
curl -s 172.20.197.130:8082/metrics|grep -v '#'|wc -l
439

Each endpoint holds a different set of metrics, but the collector is getting metrics from only one endpoint.

oppegard commented 1 year ago

Hi @jeesmon, if you want to scrape all the pods behind a service, you'll need to annotate each of those pods with prometheus.io/scrape. The service-level annotation scrapes with the behavior of a K8s Service: each scrape request is routed to a single pod based on the load balancing algorithm of the Service in question.
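To illustrate the difference between the two annotation levels, here is a minimal simulation, not the collector's actual code. The pod names are hypothetical, the sample counts are the ones from the curl output earlier in this thread, and `random.choice` is only an assumed stand-in for whatever load balancing the Service really applies.

```python
import random

# Hypothetical pods behind one Service. The sample counts are the values
# seen in the curl output earlier in this thread.
POD_SAMPLE_COUNTS = {"pod-a": 158, "pod-b": 510, "pod-c": 439}


def scrape_via_service(rng):
    """One scrape through the Service: a single pod answers the request.

    Assumption: random selection stands in for the Service's real
    load balancing algorithm.
    """
    pod = rng.choice(sorted(POD_SAMPLE_COUNTS))
    return POD_SAMPLE_COUNTS[pod]


def scrape_each_pod():
    """One scrape per annotated pod: every pod is queried directly."""
    return dict(POD_SAMPLE_COUNTS)


if __name__ == "__main__":
    rng = random.Random()
    # Service-level annotation: each scrape sees only one pod's metrics.
    print("via service:", [scrape_via_service(rng) for _ in range(4)])
    # Pod-level annotation: each scrape interval covers every pod.
    print("per pod:", scrape_each_pod())
```

With the service-level annotation, any given scrape returns one pod's metrics; with pod-level annotations, all three sets are collected.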

jeesmon commented 1 year ago

@oppegard Thanks for the reply. As you can see, load balancing a metrics endpoint is not the intended behavior in many cases, since each pod behind a service will have different metrics. If you look at the Prometheus ServiceMonitor, it scrapes each endpoint behind the service, so we get aggregated metrics from all pods.
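The per-endpoint behavior described above could be sketched roughly as follows. This is an illustration only, not how the collector or the Prometheus Operator is implemented. The function names are hypothetical, the port and path are taken from the collector log in this thread, and the addresses are assumed to be the pod IPs listed in the Service's Endpoints object (for example from `kubectl get endpoints <service>`).

```python
from urllib.request import urlopen


def count_samples(body):
    """Count lines that contain no '#', roughly like `grep -v '#' | wc -l`."""
    return sum(1 for line in body.splitlines() if "#" not in line)


def fetch_metrics(url, timeout=5.0):
    """Fetch one metrics page over HTTP and return it as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def scrape_endpoints(addresses, port=8082, path="/metrics", fetch=fetch_metrics):
    """Scrape every address directly, bypassing the Service's load balancing.

    `addresses` is assumed to hold the pod IPs behind the Service.
    Returns a dict mapping each address to its sample count.
    """
    return {
        addr: count_samples(fetch(f"http://{addr}:{port}{path}"))
        for addr in addresses
    }
```

Calling `scrape_endpoints` with the pod IPs would return one count per pod, rather than the single, varying count seen when going through the service address.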

oppegard commented 1 year ago

Sorry for the confusion @jeesmon. We do not support the equivalent of the Prometheus Operator's ServiceMonitor resource. We have raised this issue with our product management team for consideration though. We'll also try adding a note in our documentation to make the pod-level and service-level scrape behavior clearer.