kumahq / kuma

🐻 The multi-zone service mesh for containers, Kubernetes and VMs. Built with Envoy. CNCF Sandbox Project.
https://kuma.io/install
Apache License 2.0

[MeshMetric] Broken prometheus histograms #11039

Open tolwi opened 1 month ago

tolwi commented 1 month ago

What happened?

Policy

```yaml
apiVersion: kuma.io/v1alpha1
kind: MeshMetric
metadata:
  name: ...
  namespace: kuma-system
spec:
  targetRef:
    kind: MeshService
    name: ...
  default:
    sidecar:
      profiles:
        appendProfiles:
          - name: Basic
    backends:
      - type: Prometheus
        prometheus:
          port: 5670
          path: /metrics
```

After digging for a while, I believe I've found the root cause: Prometheus and OTel represent histogram buckets differently. Envoy exposes a 'cumulative' histogram, which is then read into an OTel histogram and treated as 'native'. When that histogram is exported to Prometheus, it should therefore be converted back into the 'cumulative' format.

tolwi commented 1 month ago
How to reproduce:

```golang
package metrics

import (
	"context"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/stretchr/testify/assert"
	otel_prom "go.opentelemetry.io/otel/exporters/prometheus"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/metric/metricdata"
	"net/http/httptest"
	"strings"
	"testing"
	"time"
)

const InHistogram = `
# TYPE envoy_cluster_upstream_cx_connect_ms histogram
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="0.5"} 0
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="1"} 0
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="5"} 0
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="10"} 0
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="25"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="50"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="100"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="250"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="500"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="1000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="2500"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="5000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="10000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="30000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="60000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="300000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="600000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="1800000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="3600000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="+Inf"} 1
envoy_cluster_upstream_cx_connect_ms_sum{envoy_cluster_name="ads_cluster"} 20.5
envoy_cluster_upstream_cx_connect_ms_count{envoy_cluster_name="ads_cluster"} 1
`

type testProducer struct {
	metrics metricdata.ScopeMetrics
}

func (p *testProducer) Produce(ctx context.Context) ([]metricdata.ScopeMetrics, error) {
	return []metricdata.ScopeMetrics{p.metrics}, nil
}

func Test(t *testing.T) {
	metrics, err := AggregatedOtelMutator()(strings.NewReader(InHistogram))
	assert.NoError(t, err)

	p := &testProducer{
		metrics: metricdata.ScopeMetrics{
			Metrics: FromPrometheusMetrics(metrics, "", "", "", make(map[string]string), time.Now()),
		},
	}

	promExporter, err := otel_prom.New(otel_prom.WithProducer(p))
	assert.NoError(t, err)
	sdkmetric.NewMeterProvider(sdkmetric.WithReader(promExporter))

	handler := promhttp.Handler()
	recorder := httptest.NewRecorder()
	handler.ServeHTTP(recorder, httptest.NewRequest("GET", "/metrics", nil))
	println(string(recorder.Body.Bytes()))
}
```