open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Prometheus exporter does not convert time units to seconds #18903

Open jonatan-ivanov opened 1 year ago

jonatan-ivanov commented 1 year ago

Component(s)

exporter/prometheus

What happened?

Description

Prometheus uses seconds as the time unit by default. If I send an OTLP histogram with a different time unit, the values are not converted to seconds (as they should be) but are exported as-is.

Steps to Reproduce

Send a histogram with unit: "milliseconds" to the OTel Collector, where the receiver is otlp over http/protobuf (I think any otlp receiver should produce the same result) and the exporter is prometheus. Then check the exporter's Prometheus /metrics endpoint. For example:

metrics {
  name: "test.timer"
  unit: "milliseconds"
  histogram {
    data_points {
      start_time_unix_nano: 1677210838494000000
      time_unix_nano: 1677210839021000000
      count: 1
      sum: 123.0
    }
    aggregation_temporality: AGGREGATION_TEMPORALITY_CUMULATIVE
  }
}

Expected Result

test_timer_sum{...} 0.123

Actual Result

test_timer_sum{...} 123

Collector version

otel/opentelemetry-collector-contrib:cdf47846a7ff

Environment information

Environment

OS: macOS 13.2.1

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  prometheus:
    endpoint: '0.0.0.0:9090'
    metric_expiration: 1m
    enable_open_metrics: true
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]

Log output

No response

Additional context

No response

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Aneurysm9 commented 1 year ago

The specification requires that the unit be handled as follows:

The Unit of an OTLP metric point SHOULD be converted to the equivalent unit in Prometheus when possible. This includes:

  • Converting from abbreviations to full words (e.g. "ms" to "milliseconds").
  • Dropping the portions of the Unit within brackets (e.g. {packets}). Brackets MUST NOT be included in the resulting unit. A "count of foo" is considered unitless in Prometheus.
  • Special case: Converting "1" to "ratio".
  • Converting "foo/bar" to "foo_per_bar".

The resulting unit SHOULD be added to the metric as OpenMetrics UNIT metadata and as a suffix to the metric name unless the metric name already contains the unit, or the unit MUST be omitted. The unit suffix comes before any type-specific suffixes.

That does not include changing the unit to a different unit or modifying the value in any way.
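
For illustration, the quoted rules can be sketched in a few lines of Python. This is not the exporter's actual (Go) implementation: the function names and the abbreviation table are assumptions made up for this example, and mixed units such as "{packets}/s" are not handled. Note that only the unit string and the metric name are transformed; the sample value is never touched.

```python
import re

# Hypothetical, partial abbreviation table; the real mapping may differ.
ABBREVIATIONS = {
    "ms": "milliseconds",
    "s": "seconds",
    "us": "microseconds",
    "ns": "nanoseconds",
    "By": "bytes",
}

def prometheus_unit(otlp_unit):
    """Translate an OTLP unit string following the rules quoted above."""
    # Drop bracketed annotations such as {packets}; what remains may be empty (unitless).
    unit = re.sub(r"\{[^}]*\}", "", otlp_unit).strip()
    # Special case: "1" becomes "ratio".
    if unit == "1":
        return "ratio"
    # Expand abbreviations and turn "foo/bar" into "foo_per_bar".
    parts = [ABBREVIATIONS.get(part, part) for part in unit.split("/")]
    return "_per_".join(parts)

def prometheus_name(otlp_name, otlp_unit):
    """Append the unit as a name suffix unless the name already contains it."""
    name = re.sub(r"[^a-zA-Z0-9_:]", "_", otlp_name)
    unit = prometheus_unit(otlp_unit)
    if unit and unit not in name:
        name = f"{name}_{unit}"
    return name

# Type-specific suffixes such as _sum come after the unit suffix.
print(prometheus_name("test.timer", "milliseconds") + "_sum")
```

Under these assumptions, the histogram from the report would be exposed with the unit in its name, while its sum would still be 123.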

jonatan-ivanov commented 1 year ago

Since the Prometheus exporter is the concern of the collector, I think the client should never know that the data that it published in OTLP format will be converted to Prometheus format. Because of this, I think any unit that is supported by OTLP should work and the client should not care. Maybe the Prometheus exporter is not configured right now but it will be starting from tomorrow. I think making a change on the exporters should not involve changing all the clients.

Can this behavior lead to impossible scenarios?

shakuzen commented 1 year ago

> The resulting unit SHOULD be added to the metric as OpenMetrics UNIT metadata and as a suffix to the metric name unless the metric name already contains the unit

The bold part (adding the unit as a suffix to the metric name) is not happening. The actual result included above shows that the metric name from the Prometheus exporter is test_timer_sum, with no unit in the name. A consumer of the exporter's Prometheus-format output has no way to know what the unit is. I can't say whether the former part (the UNIT metadata) is happening, because I was never able to get the Prometheus exporter to return OpenMetrics format, even when setting enable_open_metrics: true. Regardless, the UNIT metadata is not part of the Prometheus format, so it wouldn't help consumers that scrape the exporter in Prometheus format rather than OpenMetrics format.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jonatan-ivanov commented 1 year ago

@Aneurysm9 Could you please check the last two comments and mark this issue so that it won't be auto-closed?

gouthamve commented 1 year ago

> The bold part is not happening. The included actual result above shows the metric name from the Prometheus exporter is test_timer_sum - no unit in the name.

This is now happening in the latest releases (since https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/20519).

Regarding converting milliseconds to seconds, while this is possible in fixed-bucket histograms, it is not possible to do in exponential histograms. This was one of the main motivations to adopt seconds as the default unit for HTTP (and hopefully other) duration measurements in OTel Semantic Conventions. Ideally the producer will send seconds (as defined in the semantic conventions).

I don't think converting milliseconds to seconds is appropriate in fixed-bucket histograms while not converting in exponential histograms.
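
To make the asymmetry concrete, here is a rough Python sketch. The function names, the dict layout, and the bucket bounds are assumptions for illustration only (the histogram in the report carries no buckets), and none of this is collector code. For an explicit-bucket histogram, changing the unit only rescales the sum and the bucket bounds, while the counts stay the same. For an exponential histogram, bucket boundaries sit on the fixed grid base**index with base = 2**(2**-scale), so a boundary divided by 1000 generally no longer falls on that grid and the counts cannot simply be carried over.

```python
import math

def scale_explicit_histogram(hist, factor):
    """Rescale an explicit-bucket histogram, e.g. factor=0.001 for ms -> s."""
    return {
        "count": hist["count"],
        "sum": hist["sum"] * factor,
        "explicit_bounds": [b * factor for b in hist["explicit_bounds"]],
        "bucket_counts": list(hist["bucket_counts"]),  # counts are unchanged
    }

def exponential_grid_position(value, scale):
    """Position of value on the boundary grid base**i, where base = 2**(2**-scale)."""
    return math.log2(value) * (2 ** scale)

def is_exponential_boundary(value, scale):
    """True if value coincides (numerically) with a bucket boundary at this scale."""
    position = exponential_grid_position(value, scale)
    return math.isclose(position, round(position), abs_tol=1e-9)

# Explicit buckets: the 123 ms sample from the report, with assumed bounds.
hist_ms = {
    "count": 1,
    "sum": 123.0,
    "explicit_bounds": [100.0, 250.0, 500.0],
    "bucket_counts": [0, 1, 0, 0],
}
hist_s = scale_explicit_histogram(hist_ms, 0.001)
print(hist_s)

# Exponential buckets at scale 0 (base 2): 128 ms is a boundary, 0.128 s is not.
print(is_exponential_boundary(128.0, 0), is_exponential_boundary(0.128, 0))
```

Re-bucketing an exponential histogram after scaling would therefore require splitting or merging counts across buckets, which changes the data rather than just relabeling it.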

shakuzen commented 1 year ago

> This is now happening in the latest releases (since https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/20519).

Yes, we noticed in Micrometer when it broke our integration tests: https://github.com/micrometer-metrics/micrometer/issues/3796.

> This was one of the main motivations to adopt seconds as the default unit for HTTP (and hopefully other) duration measurements in OTel Semantic Conventions. Ideally the producer will send seconds (as defined in the semantic conventions).

I think this is tying things together that shouldn't be tied together. OTLP is a format for telemetry data; it defines the data model but not the semantic naming. Someone should not have to use the OTel semantic conventions to successfully use OTLP or the OTel Collector. I understand all of these things are branded OpenTelemetry, but it would help adoption, and make them more useful to users, if they could be used separately. And it was my understanding that they were intended to be usable without adopting everything.

It hurts the Collector's general usefulness if the Prometheus exporter expects its input to already be in seconds, matching data produced specifically for Prometheus/OpenMetrics. If the producer is a Prometheus client, it's clear which unit conventions it should follow, but not all producers know where the data they produce will be stored, especially if it is in OTLP format (and sent to the Collector), which is supported by different backends.

> Regarding converting milliseconds to seconds, while this is possible in fixed-bucket histograms, it is not possible to do in exponential histograms.

That's unfortunate and I don't have any solution. It feels like it leaves us in this bad state where the Collector can't deliver its full potential of being a universal adapter. Users are going to have to make more breaking changes to align with its limitations.

jonatan-ivanov commented 1 year ago

FYI: it seems that starting from 0.80.0 the unit was removed (which breaks our integration tests again): https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/23229

jonatan-ivanov commented 1 year ago

@Aneurysm9 Could you please add the never stale label on the issue so that I don't need to play ping-pong with the bot?

jonatan-ivanov commented 1 year ago

@Aneurysm9 or someone else: Could you please add the never stale label on the issue so that I don't need to play ping-pong with the bot?