zio / zio-metrics-connectors

Monitoring, Metrics and Diagnostics for ZIO
https://zio.dev/zio-metrics-connectors
Apache License 2.0
30 stars 24 forks source link

ZIO Metrics Micrometer histogram issue #64

Open calvinlfer opened 1 month ago

calvinlfer commented 1 month ago

I’ve found an issue for histograms where the numbers in the buckets are not getting populated properly - the wrong numbers are showing up. This happens for the zio micrometer integration which is then hooked up to Prometheus. This is not an issue for the direct Prometheus connector.

val histogramMetric: Metric.Histogram[Double] =
    Metric.histogram("histogramMetricReproducer", MetricKeyType.Histogram.Boundaries(Chunk(0.10, 1.0, 5.0, 10.0)))

  val exampleProgram =
    for {
      _ <- histogramMetric.update(0.04)
      _ <- histogramMetric.update(0.5)
      _ <- histogramMetric.update(0.5)
      _ <- histogramMetric.update(6.0)
    } yield ()

If you run this in SampleMicrometerApp and look at the metrics on http://localhost:8080/micrometer/prometheusMetrics You end up with

# HELP histogramMetricReproducer  
# TYPE histogramMetricReproducer histogram
histogramMetricReproducer_bucket{le="0.1"} 0 --> the value 0.04 is <= 0.1 but not showing up here
histogramMetricReproducer_bucket{le="1.0"} 3
histogramMetricReproducer_bucket{le="5.0"} 3
histogramMetricReproducer_bucket{le="10.0"} 4
histogramMetricReproducer_bucket{le="+Inf"} 4
histogramMetricReproducer_count 4
histogramMetricReproducer_sum 7.04
# HELP histogramMetricReproducer_max  
# TYPE histogramMetricReproducer_max gauge
histogramMetricReproducer_max 6.0

which is super odd, it did not record 0.04 in the <0.1 bucket :S

Running this directly against the prometheus connector with the same code produces the correct results:

histogramMetricReproducer_bucket{le="0.1",} 1.0 1725633095942
histogramMetricReproducer_bucket{le="1.0",} 3.0 1725633095942
histogramMetricReproducer_bucket{le="5.0",} 3.0 1725633095942
histogramMetricReproducer_bucket{le="10.0",} 4.0 1725633095942
histogramMetricReproducer_bucket{le="+Inf",} 4.0 1725633095942

histogramMetricReproducer_sum 7.04 1725633095942
histogramMetricReproducer_count 4.0 1725633095942
histogramMetricReproducer_min 0.04 1725633095942
histogramMetricReproducer_max 6.0 1725633095942

i found something very interesting:

  val histogramMetric: Metric.Histogram[Double] =
    Metric.histogram("histogramMetricReproducer", MetricKeyType.Histogram.Boundaries(Chunk(1.0, 10.0, 20.0, 100.0)))

  val exampleProgram =
    for {
      _ <- histogramMetric.update(1)
      _ <- histogramMetric.update(8)
      _ <- histogramMetric.update(15)
      _ <- histogramMetric.update(90)
    } yield ()

If you increase it past 1, then it captures correctly via micrometer:

# HELP histogramMetricReproducer  
# TYPE histogramMetricReproducer histogram
histogramMetricReproducer_bucket{le="1.0"} 1
histogramMetricReproducer_bucket{le="10.0"} 2
histogramMetricReproducer_bucket{le="20.0"} 3
histogramMetricReproducer_bucket{le="100.0"} 4
histogramMetricReproducer_bucket{le="+Inf"} 4
histogramMetricReproducer_count 4
histogramMetricReproducer_sum 114.0
# HELP histogramMetricReproducer_max  
# TYPE histogramMetricReproducer_max gauge
histogramMetricReproducer_max 90.0 

This is problematic because zio-http and tapir integrations use seconds as their histogram boundaries for ZIO Metrics histograms which result in small numbers and will cause metric latencies not to appear properly if you use the micrometer integration.

I tried to dig deeper to see what the cause is and I believe the problem resides with Micrometer

test("catch zio histogram updates") {
    val name     = "testHistogram"
    val testData = Seq(0.04, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5)
    val buckets  = MetricKeyType.Histogram.Boundaries(Chunk(0.10, 1.0, 3.0, 5.0))

    testMetric(MetricKey.histogram(name, buckets))(testData).map { searchResult =>
      val summary                     = searchResult.summary()
      val snapshot: HistogramSnapshot = summary.takeSnapshot()
      println("ACTUAL: " + snapshot.histogramCounts().toList)
      println(
        "EXPECTED: " + buckets.values
          .map(bound => new CountAtBucket(bound, testData.count(_ <= bound).toDouble))
          .toList,
      )
      assertTrue(
        summary.getId.getBaseUnit eq null,
        snapshot.count() == testData.size.toLong,
        snapshot.total() == testData.sum,
        snapshot.max() == testData.max,
        snapshot.histogramCounts().toList == buckets.values
          .map(bound => new CountAtBucket(bound, testData.count(_ <= bound).toDouble))
          .toList,
      )
    }
  }

I also added some debug statements here to print what is being fed into updateHistogram and it appears the correct values are being fed in https://github.com/zio/zio-metrics-connectors/blob/53965271606b09a003d58f90db9227a934d1d20b/micrometer/src/main/scala/zio/metrics/connectors/micrometer/MicrometerMetricListener.scala#L40

calvinlfer commented 1 month ago

@Grryum is kindly following up here https://github.com/micrometer-metrics/micrometer/issues/2592