open-telemetry / opentelemetry-collector-contrib

Metric relabel ignored #33089

Open milosgajdos opened 3 months ago

milosgajdos commented 3 months ago

Component(s)

receiver/prometheus

Describe the issue you're reporting

This was originally raised in https://github.com/open-telemetry/opentelemetry-operator/issues/2961

We have a small EKS cluster that runs kube-state-metrics to report metrics from it.

The metrics are scraped by the ADOT collector, which has the following scrape config (the rest of the config is omitted):


receivers:
  prometheus:
    config:
      global:
        scrape_interval: 1m
        scrape_timeout: 40s

      scrape_configs:
      - job_name: 'kube-state-metrics'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kube-state-metrics
            action: keep
          - target_label: cluster
            replacement: our-cluster-name

This should add the cluster="our-cluster-name" label to every metric scraped by the kube-state-metrics job; alas, it's completely ignored for some reason.
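
For reference, the last rule above relies on relabel defaults; to my understanding (a sketch assuming Prometheus's documented defaults of action: replace and regex: (.*)), it is equivalent to this explicit form:

        relabel_configs:
          # with no source_labels, regex (.*) matches the empty string,
          # so the literal replacement is written into target_label
          - action: replace
            regex: '(.*)'
            target_label: cluster
            replacement: our-cluster-name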

Other things we've tried:

          scrape_configs:
          - job_name: 'kube-state-metrics'
            kubernetes_sd_configs:
              - role: endpoints
            relabel_configs:
              - source_labels: [__meta_kubernetes_service_name]
                regex: kube-state-metrics
                action: keep
            metric_relabel_configs:
              - target_label: cluster
                replacement: our-cluster-name

I am not sure if this is a bug or if we are missing something here, but this seems like it should work, as it does with other Prometheus scrapers. Equally, if you check https://relabeler.promlabs.com/, this should add the cluster label in. If anyone has any ideas, that'd be greatly appreciated!

github-actions[bot] commented 3 months ago

Pinging code owners:

dashpole commented 3 months ago

Either method should work. Can you reproduce it with the collector's self-observability metrics, like in the readme?

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/d66c0dc324fcfaaa0ab6fed8e7140588cbdb05d1/receiver/prometheusreceiver/README.md?plain=1#L96
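
A minimal reproduction along those lines (a sketch based on the README example, assuming the collector exposes its own metrics on 0.0.0.0:8888) could be:

    receivers:
      prometheus:
        config:
          scrape_configs:
            # scrape the collector's self-observability metrics
            - job_name: 'otel-collector'
              scrape_interval: 5s
              static_configs:
                - targets: ['0.0.0.0:8888']
              # same rule as in the issue, applied after the scrape
              metric_relabel_configs:
                - target_label: cluster
                  replacement: our-cluster-name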

milosgajdos commented 3 months ago

Either method should work

In theory, yes; in practice, no. As an OSS maintainer myself, I wouldn't have the heart to make this up.

Can you reproduce it with the collector's self-observability metrics, like in the readme?

I haven't tried static_configs yet, no. Good shout, worth a shot, but the config I've mentioned in this issue should work; alas, it does not. Will report back, thanks.

milosgajdos commented 3 months ago

~~The only thing that worked for us~~ Sticking external_labels into the global config like so does seem to make it somewhat work:

    receivers:
      prometheus:
        config:
          global:
            scrape_interval: 1m
            scrape_timeout: 40s
            external_labels:
              cluster: our-cluster-name

Neither static_configs nor a "regular" target_label replacement inside a specific job config does the trick; either we're missing something here or this is somehow swallowed by the config parser 🤷‍♂️

milosgajdos commented 3 months ago

Actually, I may have spoken too soon. It seems to work in a rather... strange way. The global config above should add that label to every metric; alas, it does not.

Some metrics are suspiciously missing it.

Example: the label is missing from kube_node_info scraped from kube-state-metrics (similarly kube_node_status_addresses, etc.). Weirdly enough, kube_pod_info, which is also exported by kube-state-metrics, does have the label.

Seems like a very strange heisenbug 🫠 😮‍💨
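One way I could dig into this further (a sketch, assuming the distribution ships the core debug exporter, which ADOT may not) would be to route the scrape through a pipeline with a detailed debug exporter and inspect which labels actually arrive on each metric:

    exporters:
      debug:
        verbosity: detailed   # prints every data point with its attributes

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [debug]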

github-actions[bot] commented 1 month ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

milosgajdos commented 1 month ago

FYI: This is still an issue.