newrelic-experimental / monitoring-kubernetes-with-opentelemetry


Pod Scraping does not seem to be working. #76

Closed. tvalchev2 closed this issue 1 year ago.

tvalchev2 commented 1 year ago

Description

We are using the 0.3.1 release of nrotelk8s, which added pod scraping in addition to service scraping for metrics. However, I don't see any metrics coming from pod scraping. The metrics disappeared/stopped coming in when I removed the Prometheus annotations from the Services, and they are not being picked up via the Pod annotations (for example, in the flux-system and zookeeper namespaces).

Steps to Reproduce

Install the 0.3.1 nrotelk8s release with the global config enabled and 2 accounts. The flux-system Pods (for example, notification-controller) are annotated with:

    prometheus.io/port: '8080'
    prometheus.io/scrape: 'true'

When I expose the notification-controller through a Service with the same annotations, metrics are sent to the opsteam NR account, but when I remove the annotations from the Service and use the Pod annotations instead, no metrics are sent.
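
To make the difference concrete, here is roughly how the annotations are placed in the two cases (simplified sketch, not our actual manifests; the image tag and container details are just illustrative):

    # Case 1: annotations on the Service -> metrics arrive in the opsteam account.
    apiVersion: v1
    kind: Service
    metadata:
      name: notification-controller
      namespace: flux-system
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8080'
    spec:
      selector:
        app: notification-controller
      ports:
        - port: 8080
          targetPort: 8080
    ---
    # Case 2: the same annotations on the Deployment's pod template -> no metrics arrive.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: notification-controller
      namespace: flux-system
    spec:
      selector:
        matchLabels:
          app: notification-controller
      template:
        metadata:
          labels:
            app: notification-controller
          annotations:
            prometheus.io/scrape: 'true'
            prometheus.io/port: '8080'
        spec:
          containers:
            - name: manager
              image: ghcr.io/fluxcd/notification-controller:v1.0.0  # illustrative
              ports:
                - containerPort: 8080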

Expected Behavior

The metrics of the Pod should be sent to the corresponding account in New Relic (for example, the opsteam NR account for flux-system, or the devteam1 NR account for, say, the solr-devteam1-int namespace). This still works if the metrics are exposed through Service annotations, but not through Pod annotations.

Relevant Logs / Console output

There are no error logs I could find in the STS regarding pod scraping in the flux-system namespace. I did find the following logs for the zookeeper namespace scraping, but I guess the problem there is that I did not annotate the Prometheus metrics port; I only have prometheus.io/scrape: 'true' on the zookeeper pods.

Here are the logs for the zookeeper namespace; however, there are no error logs for the flux-system namespace, where the pods are annotated correctly:

    2023-10-30T10:40:19.092Z    warn    internal/transaction.go:123 Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1698662419087, "target_labels": "{__name__=\"up\", instance=\"10.240.0.238:3888\", job=\"kubernetes-pods\", namespace=\"zookeeper\", node=\"aks-system-14590509-vmss00000a\", pod=\"zookeeper-2\"}"}
    2023-10-30T10:40:40.285Z    warn    internal/transaction.go:123 Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1698662440284, "target_labels": "{__name__=\"up\", instance=\"10.240.0.190:2181\", job=\"kubernetes-pods\", namespace=\"zookeeper\", node=\"aks-system-14590509-vmss000009\", pod=\"zookeeper-1\"}"}
    2023-10-30T10:40:48.604Z    warn    internal/transaction.go:123 Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1698662448602, "target_labels": "{__name__=\"up\", instance=\"10.240.0.52:2181\", job=\"kubernetes-pods\", namespace=\"zookeeper\", node=\"aks-system-14590509-vmss000008\", pod=\"zookeeper-0\"}"}
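
If I read the kubernetes-pods discovery correctly, without a prometheus.io/port annotation it falls back to the pods' declared container ports, which would explain why the targets above are the ZooKeeper client/election ports (2181/3888) instead of a metrics endpoint. So for zookeeper I probably just need something like this on the pod template (port and path values are placeholders for whatever actually serves the metrics in my setup):

    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '7000'     # placeholder: the port serving metrics
        prometheus.io/path: '/metrics' # placeholder: the metrics path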

Your Environment

utr1903 commented 1 year ago

@tvalchev2 The problem is with filtering... The newly added kubernetes-pods scrape job is not included in the namespace-based filtering. Therefore, when the filter is applied, all of the metrics coming from that scrape job are dropped by the filterprocessor.
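
Roughly speaking (illustrative config only, not the chart's actual one; the job name and namespace list below are made up), the per-team filter is a "drop everything except" OTTL condition, and kubernetes-pods is simply not part of the allow list yet:

    processors:
      filter/opsteam:
        error_mode: ignore
        metrics:
          datapoint:
            # Drop every datapoint that is NOT (known scrape job AND allowed namespace).
            # The Prometheus receiver maps the scrape job name to the service.name
            # resource attribute. Since "kubernetes-pods" is missing from the job
            # check, all of its datapoints fail the inner condition and are dropped.
            - 'not (resource.attributes["service.name"] == "kubernetes-services" and IsMatch(attributes["namespace"], "^(flux-system|kube-system)$"))'
            # The fix is to allow the new job as well, e.g.:
            # 'not (IsMatch(resource.attributes["service.name"], "^(kubernetes-services|kubernetes-pods)$") and IsMatch(attributes["namespace"], "^(flux-system|kube-system)$"))'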

-> Will fix it today :)

utr1903 commented 1 year ago

Fixed with #77. I'll publish a new release today.