open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

prometheusreceiver upsert job to service.name #8060

Closed: CoderPoet closed this issue 1 year ago

CoderPoet commented 2 years ago

Describe the bug: When using the prometheusreceiver, if the user has declared service.name in the resource attributes, it will be overridden by the job label.

Steps to reproduce

  1. Export some metrics with the service.name resource attribute (screenshots attached)

  2. Use the OTel prometheusreceiver to pull the app metrics (screenshot attached)

What did you expect to see? The service_name should be kafka-producer, not the job value.

What did you see instead? The service_name is the job value.

What version did you use? v0.45.1

What config did you use?

    extensions:
    processors:
    receivers:
      prometheus:
        config:
          global:
            evaluation_interval: 1m
            scrape_interval: 30s
            scrape_timeout: 10s
          scrape_configs:
          - job_name: kubernetes-pods
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - action: keep
              regex: true
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_scrape
            - action: replace
              regex: (.+)
              source_labels:
              - __meta_kubernetes_pod_annotation_prometheus_io_path
              target_label: __metrics_path__
            - action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2
              source_labels:
              - __address__
              - __meta_kubernetes_pod_annotation_prometheus_io_port
              target_label: __address__
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: replace
              source_labels:
              - __meta_kubernetes_namespace
              target_label: k8s_namespace_name
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: k8s_pod_name
            - action: drop
              regex: Pending|Succeeded|Failed
              source_labels:
              - __meta_kubernetes_pod_phase
            - source_labels:
              - __meta_kubernetes_service_name
              target_label: k8s_service_name
      zipkin: {}
    service:
      extensions:
      - health_check
      - pprof
      - zpages
      pipelines:
        metrics:
          exporters:
          processors:
          receivers:
          - prometheus

Environment macOS, Go 1.17.6

Additional context https://github.com/open-telemetry/opentelemetry-collector/pull/3139#discussion_r811756868

Aneurysm9 commented 2 years ago

Setting the job label's value as service.name is required by the specification. The prometheus exporter to prometheus receiver pipeline cannot maintain the information that service.name (or any other resource attribute) was set as a resource attribute because the Prometheus exposition format does not provide a mechanism for distinguishing resource attributes from metric attributes.

Since you appear to be using the OTel Go SDK to generate these metrics, would it be possible to use the OTLP exporter and receiver? Those components will properly distinguish between, and preserve, resource and metric attributes.
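
For reference, a minimal sketch of the OTLP-based setup suggested here (the endpoint, the logging exporter, and the pipeline layout are assumptions for illustration, not taken from this issue): the application exports over OTLP instead of exposing a Prometheus endpoint, so resource attributes such as service.name are preserved end to end:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # assumed listen address; adjust as needed

exporters:
  logging: {}   # placeholder exporter, purely for illustration

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [logging]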

CoderPoet commented 2 years ago

> Setting the job label's value as service.name is required by the specification. The prometheus exporter to prometheus receiver pipeline cannot maintain the information that service.name (or any other resource attribute) was set as a resource attribute because the Prometheus exposition format does not provide a mechanism for distinguishing resource attributes from metric attributes.
>
> Since you appear to be using the OTel Go SDK to generate these metrics, would it be possible to use the OTLP exporter and receiver? Those components will properly distinguish between, and preserve, resource and metric attributes.

Thank you very much for your reply! But since OTel also supports exposing metrics in the Prometheus format, this problem is bound to come up and needs to be solved, right? Does the specification need to be rethought?

CoderPoet commented 2 years ago

> Setting the job label's value as service.name is required by the specification. The prometheus exporter to prometheus receiver pipeline cannot maintain the information that service.name (or any other resource attribute) was set as a resource attribute because the Prometheus exposition format does not provide a mechanism for distinguishing resource attributes from metric attributes.
>
> Since you appear to be using the OTel Go SDK to generate these metrics, would it be possible to use the OTLP exporter and receiver? Those components will properly distinguish between, and preserve, resource and metric attributes.

And my question is: the prometheusreceiver sets service.name to the value of the job, but according to the specification, shouldn't it actually be the user-configured service name, or the service the target belongs to, rather than the job value?

(screenshot of the relevant specification text attached)

crobertson-conga commented 2 years ago

I just ran into this. It doesn't make sense that the service doing the scraping gets to rename the service if it is already set. Especially if the endpoint is an aggregation of multiple services. My specific use case is using the Span Metrics Processor to set up timing for multiple services. The path recommended to use for rate-limiting makes this bug appear.
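
To illustrate the scraping side of that kind of setup (the job name, target address, and port here are assumptions for illustration, not taken from this comment): whatever value is configured as job_name in the prometheus receiver ends up as service.name on every scraped metric, regardless of any service_name label the metrics already carry:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: spanmetrics-scrape   # this value becomes service.name on the scraped metrics
          static_configs:
            - targets: ["0.0.0.0:8889"]  # assumed address of the endpoint being scraped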

CoderPoet commented 2 years ago

> I just ran into this. It doesn't make sense that the service doing the scraping gets to rename the service if it is already set. Especially if the endpoint is an aggregation of multiple services. My specific use case is using the Span Metrics Processor to set up timing for multiple services. The path recommended to use for rate-limiting makes this bug appear.

Yes, I also feel that not all metrics need the service.name attribute, so could the prometheusreceiver consider not setting it?

crobertson-conga commented 2 years ago

Note, I did use the method that Aneurysm9 suggested (hooking up the pipelines via otlp) and that seems to work rather well for my use case.

CoderPoet commented 2 years ago

> Note, I did use the method that Aneurysm9 suggested (hooking up the pipelines via otlp) and that seems to work rather well for my use case.

Yes, using OTLP is not a problem and meets expectations, but we also need to consider how to stay compatible with the prometheusreceiver.

dashpole commented 2 years ago

Sorry, just finding this. After https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/8265, we will support receiving resource information from the target info metric. It should be possible after that to choose either to keep service.name (from job), or to upsert it from the service_name resource attribute (which was set in the client lib) by using the resource processor.

github-actions[bot] commented 2 years ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

alexrudd commented 1 year ago

For anyone else who comes across this: here's the processor config I used to correct my service_name label, as it had been getting combined with the prometheus receiver's job_name.

processors:
  resource:
    attributes:
    # Copy the scraped service_name label over the job-derived service.name resource attribute.
    - key: service.name
      from_attribute: service_name
      action: upsert
    # Drop the now-redundant service_name attribute.
    - key: service_name
      action: delete

Though annoyingly, this will delete the service_name label from the target_info metric. If anyone knows how to have this processor apply only to metrics other than target_info, please share.
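
For completeness, a processor like this also has to be referenced in the metrics pipeline to take effect; a minimal sketch, with the exporter left as a placeholder assumption:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resource]
      exporters: [logging]   # placeholder; use whatever exporter the pipeline already has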

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 1 year ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.