open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[Prometheus Remote Write Exporter] Forward metrics labels #10115

Closed clouedoc closed 2 years ago

clouedoc commented 2 years ago

Is your feature request related to a problem? Please describe.

I want to be able to cluster my metrics by host in Prometheus. On Prometheus' side, I do not get a host label to select:

image

I only get a job label.

Previously, by using Datadog and the otlp exporter, I could aggregate metrics by host name, deployment version, etc. I believe the Prometheus Remote Write exporter is not forwarding these OTLP attributes as labels.

Describe the solution you'd like

I want the following labels to be forwarded to Prometheus:

These will allow me to see if a specific version of my program is using more memory, and when.

Describe alternatives you've considered

Additional context

Here is my configuration:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:8080
      grpc:
        endpoint: 0.0.0.0:4040

exporters:
  logging:

  # Data sources: traces, metrics
  # On-premise endpoint
  otlphttp:
    endpoint: XXX

  otlp/grafana-cloud-tempo:
    endpoint: tempo-us-central1.grafana.net:443
    headers:
      authorization: XXX
  prometheusremotewrite/grafana-cloud:
    endpoint: https://prometheus-prod-10-prod-us-central-0.grafana.net/api/prom/push
    headers:
      authorization: XXX
processors:
  batch:
    timeout: 10s

extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679

service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, otlphttp, otlp/grafana-cloud-tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, otlphttp, prometheusremotewrite/grafana-cloud]

dmitryax commented 2 years ago

cc @Aneurysm9 as code owner

clouedoc commented 2 years ago

I managed to solve my specific issue by using a prometheus exporter and starting up a sibling Prometheus instance that scrapes opentelemetry-collector and writes to Grafana Cloud. Note: I had to enable an option in the prometheus exporter to convert resource attributes into Prometheus labels.
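
For readers trying to reproduce this workaround, here is a minimal sketch of the two pieces involved. It is not the poster's actual configuration: the listen port `8889`, the `otel-collector` hostname, the `otel-collector` job name, and the `basic_auth` credentials are assumptions made for illustration. The option shown is `resource_to_telemetry_conversion`, the one named later in this thread.

```yaml
# --- File 1: collector config (sketch) ---
# Expose metrics for scraping instead of pushing them with remote write.
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889        # assumed port, not taken from the thread
    resource_to_telemetry_conversion:
      enabled: true               # copy resource attributes onto each metric as labels

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]

---
# --- File 2: prometheus.yml for the sibling Prometheus instance (sketch) ---
scrape_configs:
  - job_name: otel-collector                 # hypothetical job name
    static_configs:
      - targets: ["otel-collector:8889"]     # assumed hostname of the collector

remote_write:
  - url: https://prometheus-prod-10-prod-us-central-0.grafana.net/api/prom/push
    basic_auth:                              # assumed auth scheme; use your own credentials
      username: XXX
      password: XXX
```

The two documents belong in separate files; they are shown together only to make the data flow visible (collector exposes, Prometheus scrapes and forwards).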

dmitryax commented 2 years ago

@clouedoc is this still an issue with the Prometheus Remote Write Exporter, or can we close it?

clouedoc commented 2 years ago

@dmitryax This is still an issue with Prometheus Remote Write Exporter that should be addressed, at least in the docs. It would be interesting to have @Aneurysm9's take on it. You can close if it clutters the tracker; I consider my job done as long as future explorers can find this thread 😁

gouthamve commented 2 years ago

Hi, this is now done using the target_info metric: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#resource-attributes-1

Please let us know if target_info is not available or doesn't work for your use-case.
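
To make the `target_info` approach concrete, here is a hedged sketch of a Prometheus recording rule that joins a metric with `target_info` to pull a resource attribute onto it. Several names are assumptions: `system_cpu_utilization` is borrowed from a later comment in this thread, `host_name` assumes the `host.name` resource attribute is present and has been converted to a Prometheus label name, and the join on `job` and `instance` assumes both labels exist on the metric and on `target_info`. The group and rule names are hypothetical.

```yaml
# Prometheus rule file (sketch, names are illustrative)
groups:
  - name: target_info_joins
    rules:
      # target_info has the value 1, so multiplying by it leaves the metric
      # value unchanged while group_left copies host_name onto the result.
      - record: system_cpu_utilization:with_host_name
        expr: |
          system_cpu_utilization
            * on (job, instance) group_left (host_name)
          target_info
```

If your series carry only a `job` label, as in the original report, the `on (...)` clause would need to be adjusted, and the join only works when it matches exactly one `target_info` series per metric series.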

jack78901 commented 2 years ago

Could this be an issue where there is an undocumented feature of the Prometheus Remote Write exporter (through the exporter helper) that does the same thing as the Prometheus Exporter?

I recently ran into the same issue where I needed certain labels that were appearing as resource attributes but were not getting properly added to the actual metrics as data point attributes (read that as Prometheus labels).

I was able to add the same option that @clouedoc used on the Prometheus Exporter directly to the Prometheus Remote Write exporter. Namely:

    exporters:
      prometheusremotewrite:
        endpoint: "https://example.com"
        resource_to_telemetry_conversion:
          enabled: true

The problem with the target_info metric is that it is not associated with other metrics (such as system_cpu_utilization from the hostmetrics receiver) in any way, which makes it impossible to see how the CPU is doing for a particular host.

clouedoc commented 2 years ago

@jack78901 really interesting finding; this removes the need for a Prometheus intermediate server altogether. Thank you for reporting it.

@gouthamve I have to admit that I have a hard time understanding what target_info is on a first read. I built my alerting system on the configuration I mentioned in my earlier comments, so I do not want to break it unnecessarily, but I'm interested in how you would approach this problem with target_info. My use case also involves collecting CPU metrics, so I'm not sure whether this would solve it. Thank you for bringing this property to my attention.

dmitryax commented 2 years ago

@clouedoc @gouthamve do we still need any doc updates to highlight this config option, or can the ticket be closed?

clouedoc commented 2 years ago

Hello @dmitryax, I proposed a doc update via #11860.

dmitryax commented 2 years ago

Thanks @clouedoc