open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[receiver/prometheus] "job or instance cannot be found from labels" when honor_labels is true with Federation #23060

Closed: ravitri closed this issue 11 months ago

ravitri commented 1 year ago

Component(s)

receiver/prometheus

What happened?

Description

Metrics ingestion with the prometheus receiver fails when a match[] filter scrapes the /federate endpoint and the selection includes a label/metric from a recording rule that has no job or instance label.

Details

The requirement is to scrape the /federate endpoint using a match[] filter on a label or metric of the recording rule(s) with honor_labels: true, but this does not work. However, the metric is exported to the otlphttp endpoint when honor_labels: false.

The debug logs report the following (logs modified to redact details and extra labels):

2023-06-05T10:06:05.521Z    debug    scrape/scrape.go:1648    Unexpected error    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "prometheus-self", "target": "http://dummy-service.dummy-namespace.svc:9090/federate?match%5B%5D=%7Botel_collect%3D%22true%22%7D", "series": "recording_rule_metric{otel_collect=\"true\",instance=\"\"}", "error": "job or instance cannot be found from labels"}
2023-06-05T10:06:05.521Z    debug    scrape/scrape.go:1368    Append failed    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "prometheus-self", "target": "http://dummy-service.dummy-namespace.svc:9090/federate?match%5B%5D=%7Botel_collect%3D%22true%22%7D", "error": "job or instance cannot be found from labels"}

Steps to Reproduce

  1. Create a recording rule for testing purposes with the label otel_collect: "true":
apiVersion: monitoring.rhobs/v1
kind: PrometheusRule
metadata:
  name: test
spec:
  groups:
  - name: test-recording-rule
    interval: 30s
    rules:
    - record: recording_rule_metric
      expr: up{key="value"}
      labels:
        otel_collect: "true"
  2. Make sure that all match[] filter conditions return results when scraping the /federate endpoint directly.
$ curl -G --data-urlencode 'match[]={otel_collect="true"}' localhost:9090/federate
recording_rule_metric{otel_collect="true",instance=""} 1 1685965626428
$ curl -G --data-urlencode 'match[]={__name__="recording_rule_metric"}' localhost:9090/federate
recording_rule_metric{otel_collect="true",instance=""} 1 1685965626428
$ curl -G --data-urlencode 'match[]={__name__="dummy_metric"}' localhost:9090/federate
dummy_metric{endpoint="metrics",instance="10.128.11.43:60000",job="dummy-operator",name="defaut",pod="pod-88686c598-dr2f4"} 0 1685965629439
  3. Create the OpenTelemetryCollector definition:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-example
spec:
  mode: deployment
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: prometheus-self
              scrape_interval: 30s
              scrape_timeout: 10s
              metrics_path: /federate
              scheme: http
              honor_labels: true
              enable_http2: true
              kubernetes_sd_configs:
              - role: service
                namespaces:
                  own_namespace: false
                  names:
                  - dummy-namespace
                selectors:
                - role: service
                  label: "dummy-key=dummy-value"
              params:
                'match[]':
                   - '{otel_collect="true"}'
                   - '{__name__="recording_rule_metric"}'
                   - '{__name__="dummy_metric"}'
    exporters:
      otlphttp:
        endpoint: "https://dummy.example.com/api/otlp"
        headers:
          Authorization: "Api-Token ${TOKEN}"
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [otlphttp]
  4. Check the metrics at the otlphttp endpoint to confirm.

Expected Result

  1. Scraping the /federate endpoint with the otel_collect: "true" match filter should work when honor_labels: true.

Actual Result

  1. The recording_rule_metric metric is visible at the otlphttp endpoint when honor_labels: false; it does not have job or instance labels.
  2. The match[] filter does not work with {otel_collect="true"} nor with {__name__="recording_rule_metric"} when honor_labels: true.
  3. The dummy_metric metric has the job and instance labels, and it is exported to the otlphttp endpoint even when honor_labels: true.

Collector version

0.77.0

Environment information

Environment

OS: Red Hat Enterprise Linux CoreOS 412.86.202303241612-0 (Ootpa)
Platform: AWS
Kubernetes: v1.25.7

OpenTelemetry Collector configuration

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: prometheus-self
          scrape_interval: 30s
          scrape_timeout: 10s
          metrics_path: /federate
          scheme: http
          honor_labels: true
          enable_http2: true
          kubernetes_sd_configs:
          - role: service
            namespaces:
              own_namespace: false
              names:
              - dummy-namespace
            selectors:
            - role: service
              label: "dummy-key=dummy-value"
          params:
            'match[]':
              - '{otel_collect="true"}'
              - '{__name__="recording_rule_metric"}'
              - '{__name__="dummy_metric"}'
exporters:
  otlphttp:
    endpoint: "https://dummy.example.com/api/otlp"
    headers:
      Authorization: "Api-Token ${TOKEN}"
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlphttp]

Log output

2023-06-05T10:06:05.521Z    debug    scrape/scrape.go:1648    Unexpected error    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "prometheus-self", "target": "http://dummy-service.dummy-namespace.svc:9090/federate?match%5B%5D=%7Botel_collect%3D%22true%22%7D", "series": "recording_rule_metric{otel_collect=\"true\",instance=\"\"}", "error": "job or instance cannot be found from labels"}
2023-06-05T10:06:05.521Z    debug    scrape/scrape.go:1368    Append failed    {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "prometheus-self", "target": "http://dummy-service.dummy-namespace.svc:9090/federate?match%5B%5D=%7Botel_collect%3D%22true%22%7D", "error": "job or instance cannot be found from labels"}

Additional context

No response

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

ravitri commented 1 year ago

cc @frzifus

frzifus commented 1 year ago

thx @ravitri, I will have a look asap.

ravitri commented 1 year ago

@frzifus - I would like to share an update on this.

We looked further into the recording rule expression and added job and instance to the aggregation as well, and it worked for us. Example:

BEFORE:

sum by (mylabel)(sum_over_time(dummy_metric[1m]))

AFTER:

sum by (job, instance, mylabel)(sum_over_time(dummy_metric[1m]))

What we understood is that with honor_labels: true the receiver relies on either the job or the instance label being present on the scraped series, and reports an error otherwise. I am inclined to think this is the expected behavior, but from a recording-rules and cardinality point of view I am also wondering whether honor_labels: true should tolerate the absence of job and instance.
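
For reference, the test rule from the reproduction steps, rewritten with this change, would look roughly like the following (same placeholder metric and label names as above; the aggregation keeps job and instance so the federated series still carries both labels):

apiVersion: monitoring.rhobs/v1
kind: PrometheusRule
metadata:
  name: test
spec:
  groups:
  - name: test-recording-rule
    interval: 30s
    rules:
    - record: recording_rule_metric
      # keeping job and instance in the by() clause means the federated series
      # still has both labels, so the prometheus receiver accepts it with honor_labels: true
      expr: sum by (job, instance, mylabel) (sum_over_time(dummy_metric[1m]))
      labels:
        otel_collect: "true"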

Would like to have your thoughts on it as well. Thanks a lot!

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 11 months ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.