micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.41k stars 976 forks source link

Bad DistributionSummary histogram for OpenTSDB #3866

Open Furcube opened 1 year ago

Furcube commented 1 year ago

Describe the bug OpenTSDBMeterRegistry applies TimeUnit scaling for DistributionSummary

Environment Micrometer version 1.11.0 Micrometer registry OpenTSDBMeterRegistry OS: windows, linux jdk 17.0.6

To Reproduce How to reproduce the bug: Create DistributionSummary, increment with different values, push to victoriametrics. Result: grafana can plot it using histogram_quantile, but values are much smaller than expected because following scaling being applied:

    public double bucket(TimeUnit unit) {
        return TimeUtils.nanosToUnit(bucket, unit);
    }

Expected behavior histogram_quantile produces expected quantiles for supplied data

Additional context Fixed it with this patch:

diff --git a/implementations/micrometer-registry-opentsdb/src/main/java/io/micrometer/opentsdb/OpenTSDBMeterRegistry.java b/implementations/micrometer-registry-opentsdb/src/main/java/io/micrometer/opentsdb/OpenTSDBMeterRegistry.java
--- a/implementations/micrometer-registry-opentsdb/src/main/java/io/micrometer/opentsdb/OpenTSDBMeterRegistry.java  (revision f6f8eaed9fe7ff1399964bf85b7b850ad0732187)
+++ b/implementations/micrometer-registry-opentsdb/src/main/java/io/micrometer/opentsdb/OpenTSDBMeterRegistry.java  (date 1685389707505)
@@ -201,7 +201,7 @@
         }

         if (histogramCounts.length > 0) {
-            metrics.addAll(writeHistogram(wallTime, summary, histogramCounts, count, getBaseTimeUnit()));
+            metrics.addAll(writeHistogram(wallTime, summary, histogramCounts, count, null));
         }

         return metrics.stream();
marcingrzejszczak commented 7 months ago

Sorry for the delay! Can you write a test that proves that this is not working fine? Or at least show the histogram values (the current ones vs expected ones)? Thank you.

Furcube commented 7 months ago

To reproduce without additional setup:

package io.micrometer.opentsdb;

import com.sun.net.httpserver.HttpServer;
import io.micrometer.core.instrument.Clock;
import io.micrometer.core.instrument.DistributionSummary;
import io.micrometer.core.instrument.MockClock;

import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class DsTest {
    public static void main(String[] args) throws IOException {

        final HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 4242), 0);

        final String[] data = new String[1];

        server.createContext("/api/put", exchange -> {
            try (InputStream requestBody = exchange.getRequestBody()) {
                final String value = new String(new GZIPInputStream(requestBody).readAllBytes(), StandardCharsets.UTF_8);
                data[0] = value;
                exchange.sendResponseHeaders(200, 0);
            }
        });
        server.start();

        try {
            final OpenTSDBMeterRegistry registry = new OpenTSDBMeterRegistry(new OpenTSDBConfig() {
                @Override
                public String get(String key) {
                    return null;
                }

                @Override
                public OpenTSDBFlavor flavor() {
                    return OpenTSDBFlavor.VictoriaMetrics;
                }
            }, Clock.SYSTEM);
            final DistributionSummary ds = DistributionSummary.builder("ds").publishPercentileHistogram(true).register(registry);

            ds.record(1_000);
            ds.record(1_000);
            ds.record(2_000);

            registry.publish();

            System.out.println(data[0]);

        } finally {
            server.stop(0);
        }
    }
}

Without fix it will produce vmrange like these:

  {
    "metric": "ds_bucket",
    "timestamp": 1703938667608,
    "value": 1,
    "tags": {
      "vmrange": "1.5e-6...2.0e-6"
    }
  }

Which have values scaled down

Expected following vmrange

  {
    "metric": "ds_bucket",
    "timestamp": 1703938641939,
    "value": 1,
    "tags": {
      "vmrange": "1.5e3...2.0e3"
    }
  }

Prometheus-style is not so obvious, so i pushed it to victoriametrics then used query histogram_quantile(0.50, sum(ds_bucket) by (le))

And it produced following:

current

[
  {
    "metric": {},
    "value": [
      1703938448.183,
      "0.000001002"
    ],
    "group": 1
  }
]

expected

[
  {
    "metric": {},
    "value": [
      1703938513.557,
      "1002"
    ],
    "group": 1
  }
]