kumahq / kuma

🐻 The multi-zone service mesh for containers, Kubernetes and VMs. Built with Envoy. CNCF Sandbox Project.
https://kuma.io/install
Apache License 2.0

[MeshMetric] Broken prometheus histograms #11039

Open tolwi opened 3 months ago

tolwi commented 3 months ago

What happened?

Policy

apiVersion: kuma.io/v1alpha1
kind: MeshMetric
metadata:
  name: ...
  namespace: kuma-system
spec:
  targetRef:
    kind: MeshService
    name: ...
  default:
    sidecar:
      profiles:
        appendProfiles:
          - name: Basic
    backends:
      - type: Prometheus
        prometheus:
          port: 5670
          path: /metrics

After digging for a while, I believe I've found the root cause: Prometheus and OTel represent histograms inconsistently. Envoy exposes a 'cumulative' histogram, which is then read into an OTel histogram that is treated as 'native'. Therefore, when this histogram is exported to Prometheus, it should be converted back into the 'cumulative' format.

tolwi commented 3 months ago
How to reproduce:

```golang
package metrics

import (
	"context"
	"net/http/httptest"
	"strings"
	"testing"
	"time"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/stretchr/testify/assert"
	otel_prom "go.opentelemetry.io/otel/exporters/prometheus"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/metric/metricdata"
)

const InHistogram = `
# TYPE envoy_cluster_upstream_cx_connect_ms histogram
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="0.5"} 0
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="1"} 0
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="5"} 0
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="10"} 0
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="25"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="50"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="100"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="250"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="500"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="1000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="2500"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="5000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="10000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="30000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="60000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="300000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="600000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="1800000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="3600000"} 1
envoy_cluster_upstream_cx_connect_ms_bucket{envoy_cluster_name="ads_cluster",le="+Inf"} 1
envoy_cluster_upstream_cx_connect_ms_sum{envoy_cluster_name="ads_cluster"} 20.5
envoy_cluster_upstream_cx_connect_ms_count{envoy_cluster_name="ads_cluster"} 1
`

type testProducer struct {
	metrics metricdata.ScopeMetrics
}

func (p *testProducer) Produce(ctx context.Context) ([]metricdata.ScopeMetrics, error) {
	return []metricdata.ScopeMetrics{p.metrics}, nil
}

func Test(t *testing.T) {
	metrics, err := AggregatedOtelMutator()(strings.NewReader(InHistogram))
	assert.NoError(t, err)

	p := &testProducer{
		metrics: metricdata.ScopeMetrics{
			Metrics: FromPrometheusMetrics(metrics, "", "", "", make(map[string]string), time.Now()),
		},
	}

	promExporter, err := otel_prom.New(otel_prom.WithProducer(p))
	assert.NoError(t, err)
	sdkmetric.NewMeterProvider(sdkmetric.WithReader(promExporter))

	handler := promhttp.Handler()
	recorder := httptest.NewRecorder()
	handler.ServeHTTP(recorder, httptest.NewRequest("GET", "/metrics", nil))
	println(string(recorder.Body.Bytes()))
}
```
github-actions[bot] commented 1 week ago

This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant, please comment on it or attend the next triage meeting.