newrelic-experimental / monitoring-kubernetes-with-opentelemetry

Apache License 2.0

extraScrapeJobs exposed for custom metrics are not working #130

Closed: tvalchev2 closed this issue 6 months ago

tvalchev2 commented 7 months ago

Description

We are using release 0.8.0 of nrotelk8s. Version 0.3.0 added pod scraping alongside service scraping for metrics. However, if one defines a new job there, its metrics are not sent over. Our use case: we normally use metric whitelisting to drop unneeded metrics, but for one customer who wants to run performance tests we want to send all metrics, while keeping the whitelist in place for the other customers/teams on the cluster.

Steps to Reproduce

To reproduce, I defined a new scrape job called, for example, kubernetes-pods-namespace-devteam1. However, I don't see any metrics for it arriving in New Relic:

    extraScrapeJobs:
      - job_name: 'kubernetes-pods-namespace-devteam1' 
        metric_relabel_configs:
          - action: keep
            source_labels: [namespace]
            regex: namespace-devteam1        
        scrape_interval: 60s
        honor_labels: true
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_slow]
            action: drop
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $$1:$$2
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node     
      # Kubernetes pod specific scrape job
      - job_name: 'kubernetes-pods'
        metric_relabel_configs:
          - action: keep
            regex: controller_runtime_reconcile_errors_total|controller_runtime_reconcile_time_seconds|controller_runtime_reconcile_total|go_info|go_memstats_alloc_bytes|gotk_reconcile_condition|gotk_reconcile_duration_seconds|process_cpu_seconds_total|rest_client_requests_total
            replacement: $1
            separator: ;
            source_labels: [__name__]

        scrape_interval: 60s
        honor_labels: true
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_slow]
            action: drop
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $$1:$$2
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node   

I suspect that the problem lies in the filter processor in /helm/charts/collectors/templates/statefulset-otelcollector.yaml. The job is implemented and is probably scraping properly, but its metrics get dropped by the filter processor, since the processor only allows the job name kubernetes-pods.

Expected Behavior

The custom scrape jobs defined under extraScrapeJobs should work; their job names should also be added to the namespace filter processor so their metrics are not dropped.


Possible solution

An easy fix might be to match the job name in the filter processor with a regex such as kubernetes-pods.* instead of the literal kubernetes-pods, so that custom jobs sharing that prefix pass through.
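For illustration, such a regex-based predicate could look roughly like the following. This is only a sketch, not the chart's actual template; it assumes the filter processor matches on the service.name resource attribute, which the Prometheus receiver populates from the scrape job name:

```yaml
# Sketch only: keep metrics whose originating scrape job matches
# kubernetes-pods.* rather than only the literal job kubernetes-pods.
processors:
  filter:
    metrics:
      include:
        match_type: regexp
        resource_attributes:
          - key: service.name
            value: kubernetes-pods.*
```

The actual fix would need to fit into the existing template logic, since the chart builds this processor configuration dynamically.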

utr1903 commented 6 months ago

You are right. The scrape job names are not dynamically appended to the filter processor predicates. I'm on it!

tvalchev2 commented 6 months ago

While you are at it, could I ask for your help in defining this so that it is easier on the collectors' memory usage? With my example above, every pod on the cluster is scraped by the second job; the metric_relabel_configs step then drops everything outside the desired namespaces before the data is sent to New Relic, but everything on the cluster still gets scraped.

Should I define something in the relabel_configs like:

          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          - action: keep
            source_labels: [namespace]
            regex: namespace-devteam1          

Or maybe it is better to define it under kubernetes_sd_configs?

    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - namespace1-devteam1
            - namespace2-devteam1

Or does kubernetes_sd_configs.namespaces.names perhaps support regex, e.g. .*devteam1.*?

From my understanding, doing it this way means only the targeted namespaces get scraped, not everything.
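For reference, scoping discovery itself would look roughly like this (a sketch; note that namespaces.names in Prometheus kubernetes_sd_configs takes literal namespace names only, so regex is not supported there):

```yaml
scrape_configs:
  - job_name: kubernetes-pods-namespace-devteam1
    kubernetes_sd_configs:
      - role: pod
        # Discovery is limited to these namespaces, so pods elsewhere
        # are never contacted at all. Literal names only, no regex.
        namespaces:
          names:
            - namespace1-devteam1
            - namespace2-devteam1
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
```

Restricting discovery this way avoids the cost of scraping every pod and discarding most of the data afterwards.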

Any help and tips are welcome :)

utr1903 commented 6 months ago

@tvalchev2 Bug fixed with #131. New release coming! :)