grafana / alloy

OpenTelemetry Collector distribution with programmable pipelines
https://grafana.com/oss/alloy
Apache License 2.0

Stale time series misbehavior when converting opentelemetry metrics to prometheus #1712

Open sulfin opened 1 month ago

sulfin commented 1 month ago

What's wrong?

In a setup where we scrape Prometheus metrics, convert them to OpenTelemetry with the otelcol.receiver.prometheus component, and then convert them back to Prometheus with the otelcol.exporter.prometheus component, a time series that goes stale has its value set to 9218868437227405000 for 5 minutes before it disappears.

I looked through the code and I think the problem is in internal/component/otelcol/exporter/prometheus/internal/convert.go. In the writeSeries function, when the data point doesn't have a value, the stale marker is set using float64(value.StaleNaN), but elsewhere in the code (e.g. internal/component/otelcol/receiver/prometheus/internal/transaction.go, in the getOrCreateMetricFamily function) the StaleNaN value is used as math.Float64frombits(value.StaleNaN). I recompiled with math.Float64frombits(value.StaleNaN) and the problem was solved.

I don't know Go and Prometheus well enough to spot potential side effects, but I am confident this is the source of the issue.

Steps to reproduce

Create an Alloy config with this chain: prometheus.scrape -> otelcol.receiver.prometheus -> otelcol.exporter.prometheus -> prometheus.remote_write

Make a time series go stale; its value is then reported as 9218868437227405000.

System information

Linux Docker container

Software version

Grafana Alloy v1.3.1

Configuration

prometheus.scrape "demo" {
  targets = [
    {"__address__" = "stale-metric-generator:8080"},
  ]
  forward_to = [otelcol.receiver.prometheus.test.receiver]
  scrape_interval = "10s"
}

otelcol.receiver.prometheus "test" {
  output {
    metrics = [otelcol.exporter.prometheus.test.input]
  }
}

otelcol.exporter.prometheus "test" {
  forward_to = [prometheus.remote_write.test.receiver]
}

prometheus.remote_write "test" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"
  }
}

Logs

No response

github-actions[bot] commented 1 week ago

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!