newrelic-experimental / monitoring-kubernetes-with-opentelemetry


Pod Scraping does not seem to be working. #76

Closed. tvalchev2 closed this issue 1 year ago.

tvalchev2 commented 1 year ago

Description

We are using the 0.3.1 release of nrotelk8s, which added pod scraping in addition to service scraping for metrics. However, I don't see any metrics coming from pod scraping. The metrics disappeared/stopped coming in when I removed the Prometheus annotations from the Services, and they are not being picked up via the Pod annotations (for example, in the flux-system and zookeeper namespaces).

Steps to Reproduce

Install the 0.3.1 nrotelk8s release with the global config enabled and 2 accounts. The flux-system Pods (for example, notification-controller) are annotated with:

    prometheus.io/port: '8080'
    prometheus.io/scrape: 'true'

When I expose the notification-controller through a Service with the same annotations, metrics are sent to the opsteam NR account, but when I remove the annotations from the Service and use the Pod annotations instead, no metrics are sent.
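
To make the difference concrete, here is roughly how the annotations are placed in the two cases (simplified sketch, not our actual manifests; the image tag and container details are just illustrative):

    # Case 1: annotations on the Service -> metrics arrive in the opsteam account.
    apiVersion: v1
    kind: Service
    metadata:
      name: notification-controller
      namespace: flux-system
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '8080'
    spec:
      selector:
        app: notification-controller
      ports:
        - port: 8080
          targetPort: 8080
    ---
    # Case 2: the same annotations on the Deployment's pod template -> no metrics arrive.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: notification-controller
      namespace: flux-system
    spec:
      selector:
        matchLabels:
          app: notification-controller
      template:
        metadata:
          labels:
            app: notification-controller
          annotations:
            prometheus.io/scrape: 'true'
            prometheus.io/port: '8080'
        spec:
          containers:
            - name: manager
              image: ghcr.io/fluxcd/notification-controller:v1.0.0  # illustrative
              ports:
                - containerPort: 8080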

Expected Behavior

The metrics of the Pod should be sent to the corresponding account in New Relic (for example, the opsteam NR account for flux-system, or the devteam1 NR account for, say, the solr-devteam1-int namespace). This still works if the metrics are exposed through Service annotations, but not through Pod annotations.

Relevant Logs / Console output

There are no error logs I could find in the STS regarding pod scraping in the flux-system namespace. I did find the following logs for the zookeeper namespace scraping, but I guess the problem there is that I did not annotate the Prometheus metrics port; I only have prometheus.io/scrape: 'true' on the zookeeper pods.

Here are the logs for the zookeeper namespace; however, there are no error logs for the flux-system namespace, where the pods are annotated correctly:

    2023-10-30T10:40:19.092Z    warn    internal/transaction.go:123 Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1698662419087, "target_labels": "{__name__=\"up\", instance=\"10.240.0.238:3888\", job=\"kubernetes-pods\", namespace=\"zookeeper\", node=\"aks-system-14590509-vmss00000a\", pod=\"zookeeper-2\"}"}
    2023-10-30T10:40:40.285Z    warn    internal/transaction.go:123 Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1698662440284, "target_labels": "{__name__=\"up\", instance=\"10.240.0.190:2181\", job=\"kubernetes-pods\", namespace=\"zookeeper\", node=\"aks-system-14590509-vmss000009\", pod=\"zookeeper-1\"}"}
    2023-10-30T10:40:48.604Z    warn    internal/transaction.go:123 Failed to scrape Prometheus endpoint    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1698662448602, "target_labels": "{__name__=\"up\", instance=\"10.240.0.52:2181\", job=\"kubernetes-pods\", namespace=\"zookeeper\", node=\"aks-system-14590509-vmss000008\", pod=\"zookeeper-0\"}"}
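
If I read the kubernetes-pods discovery correctly, without a prometheus.io/port annotation it falls back to the pods' declared container ports, which would explain why the targets above are the ZooKeeper client/election ports (2181/3888) instead of a metrics endpoint. So for zookeeper I probably just need something like this on the pod template (port and path values are placeholders for whatever actually serves the metrics in my setup):

    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '7000'     # placeholder: the port serving metrics
        prometheus.io/path: '/metrics' # placeholder: the metrics path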

Your Environment

utr1903 commented 1 year ago

@tvalchev2 The problem is with filtering... The newly added kubernetes-pods scrape job is not included in the namespace-based filtering. Therefore, when the filter is applied, all of the metrics coming from that scrape job are dropped by the filterprocessor.
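
Roughly speaking (illustrative config only, not the chart's actual one; the job name and namespace list below are made up), the per-team filter is a "drop everything except" OTTL condition, and kubernetes-pods is simply not part of the allow list yet:

    processors:
      filter/opsteam:
        error_mode: ignore
        metrics:
          datapoint:
            # Drop every datapoint that is NOT (known scrape job AND allowed namespace).
            # The Prometheus receiver maps the scrape job name to the service.name
            # resource attribute. Since "kubernetes-pods" is missing from the job
            # check, all of its datapoints fail the inner condition and are dropped.
            - 'not (resource.attributes["service.name"] == "kubernetes-services" and IsMatch(attributes["namespace"], "^(flux-system|kube-system)$"))'
            # The fix is to allow the new job as well, e.g.:
            # 'not (IsMatch(resource.attributes["service.name"], "^(kubernetes-services|kubernetes-pods)$") and IsMatch(attributes["namespace"], "^(flux-system|kube-system)$"))'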

-> Will fix it today :)

utr1903 commented 1 year ago

Fixed with #77. I'll publish a new release today.