zio / zio-metrics-connectors

Monitoring, Metrics and Diagnostics for ZIO
https://zio.dev/zio-metrics-connectors
Apache License 2.0
30 stars 24 forks source link

#34 Adhere to Prometheus Exposition format #38

Closed tup916 closed 1 year ago

tup916 commented 1 year ago

Closes #34

TLDR: PrometheusEncoder produces JVM metrics that violate Prometheus's specified exposition format. Furthermore, it produces parsing errors when linked with telegraf. The solution is to group all the metrics of the same type together so that there is exactly ONE TYPE and HELP line in order to adhere to Prometheus's specified exposition format.

Problem

Under the Prometheus Exposition Formats, "Only one HELP line and only one TYPE line may exist for any given metric name" and "The TYPE line for a metric name must appear before the first sample is reported for that metric name". The current implementation violates this specification, and downstream metric collecting agents such as Telegraf depend on this specification to parse metrics, which is currently failing with parsing errors. There are plenty of reproducable violating examples if we include DefaultJvmMetrics to our metrics.

Solution

We need to group together the metrics that have the same TYPE line but may have different labels. For each group of metrics with the same TYPE, we need one TYPE line, followed by one HELP line, followed by the numerous metrics of the given type.

As an example, prior to this change, the output of DefaultJvmMetrics encoded the jvm memory committed bytes as:

# TYPE jvm_memory_bytes_committed gauge
# HELP jvm_memory_bytes_committed
jvm_memory_bytes_committed{area="heap",} 1.073741824E9 1681853772729
...
... other metrics
...
# TYPE jvm_memory_bytes_committed gauge
# HELP jvm_memory_bytes_committed
jvm_memory_bytes_committed{area="nonheap",} 3.06315264E8 1681853772729

This violates the Prometheus specification as described above.

With the changes in this pull request, the output looks like:

# TYPE jvm_memory_bytes_committed gauge
# HELP jvm_memory_bytes_committed
jvm_memory_bytes_committed{area="heap",} 1.073741824E9 1681853772729
jvm_memory_bytes_committed{area="nonheap",} 3.06315264E8 1681853772729

Suggested improvements

I noticed that I am introducing some magic numbers (thm.take(2)) coming out of the assumption that each Chunk of String encoded will start with TYPE and HELP strings. To make the code more readable, we could introduce two case classes, case class EncodedHead(typeComment: String, helpComment: String) case class EncodedMetric(encodedHead: EncodedHead, metrics: Chunk[String]) and change the PrometheusEncoder.encode(..) method to return EncodedMetric instead of Chunk[String]. This will allow us to make the groupMetricByType method cleaner to read:

  def groupMetricByType(report: Chunk[EncodedMetric]): Chunk[Chunk[String]] = {
    Chunk.fromIterable(report.groupMap(thm => thm.encodedHead)(thm => thm.metrics).map {
      case (EncodedHead(t, h), m) => Chunk(t, h) ++ m.flatten
    })
  }

However, I am unsure about the performance tradeoffs with creating instances of the two introduced classes for each metric, so I am leaving it as a suggestion. I'd be happy to include it in if deemed viable.

CLAassistant commented 1 year ago

CLA assistant check
All committers have signed the CLA.