open-telemetry / opentelemetry-go

OpenTelemetry Go API and SDK
https://opentelemetry.io/
Apache License 2.0

Performance vs. Prometheus SDK #5542

Open yurishkuro opened 1 week ago

yurishkuro commented 1 week ago

Description

Jaeger is in the process of migrating from the Prometheus SDK to the OTEL SDK. We're currently blocked by a massive performance degradation, as illustrated by this benchmark: https://github.com/jaegertracing/jaeger/pull/5676. Are we using the OTEL SDK incorrectly? We're seeing a 10-25x slowdown compared to the Prometheus SDK.

$ go test -benchmem -benchtime=2s -bench=Benchmark ./internal/metrics/
BenchmarkPrometheusCounter-10        342003924         6.984 ns/op        0 B/op        0 allocs/op
BenchmarkOTELCounter-10               33299455        71.73 ns/op         0 B/op        0 allocs/op
BenchmarkOTELCounterWithLabel-10      12442818       190.6 ns/op         16 B/op        1 allocs/op

Environment

Steps To Reproduce

https://github.com/jaegertracing/jaeger/pull/5676

Expected behavior

Counter increments should be in the same ballpark as Prometheus counters.

pellared commented 1 week ago

To me, this looks like correct usage of the OTel Metrics SDK.

dashpole commented 1 week ago

To summarize my findings in https://github.com/open-telemetry/opentelemetry-go/pull/5544.

yurishkuro commented 1 week ago

Bound instruments OTEP https://github.com/open-telemetry/oteps/blob/main/text/metrics/0070-metric-bound-instrument.md
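
For readers unfamiliar with the idea, here is a minimal sketch of what a bound instrument buys you. The types below (`Counter`, `BoundCounter`, `Bind`) are hypothetical and use only the standard library; they are not the OTel Go API and not the design from the OTEP. The assumption being illustrated is that the per-call cost with attributes is dominated by finding the right series, so resolving it once up front leaves only an atomic add on the hot path.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Counter is a hypothetical, simplified counter keyed by one label value.
// It is NOT the OTel API; it only illustrates the binding idea.
type Counter struct {
	mu     sync.RWMutex
	series map[string]*atomic.Int64
}

func NewCounter() *Counter {
	return &Counter{series: make(map[string]*atomic.Int64)}
}

// lookup returns the storage cell for a label value, creating it if needed.
func (c *Counter) lookup(label string) *atomic.Int64 {
	c.mu.RLock()
	cell, ok := c.series[label]
	c.mu.RUnlock()
	if ok {
		return cell
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if cell, ok = c.series[label]; ok {
		return cell
	}
	cell = new(atomic.Int64)
	c.series[label] = cell
	return cell
}

// Add is the unbound path: every call pays for the lock and map lookup.
func (c *Counter) Add(label string, n int64) { c.lookup(label).Add(n) }

// BoundCounter holds a pre-resolved cell for one label value.
type BoundCounter struct{ cell *atomic.Int64 }

// Bind resolves the series once, so later Add calls skip the lookup.
func (c *Counter) Bind(label string) BoundCounter {
	return BoundCounter{cell: c.lookup(label)}
}

// Add on a bound counter is a single atomic increment.
func (b BoundCounter) Add(n int64) { b.cell.Add(n) }

// Value reads the current total for a label value.
func (c *Counter) Value(label string) int64 { return c.lookup(label).Load() }

func main() {
	c := NewCounter()
	c.Add("GET", 1) // unbound: lookup on every call

	bound := c.Bind("GET") // lookup happens once here
	bound.Add(2)           // hot path: atomic add only

	fmt.Println(c.Value("GET")) // both paths update the same series
}
```

Both paths write to the same underlying cell, so binding changes only the cost of each call, not the recorded value.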

dashpole commented 1 week ago

We should take a close look at the exemplar reservoir performance when exemplars are disabled. It currently makes up a substantial (~50%) portion of the overhead for the no-attributes case.

This appears to be caused by the time.Now() call made for each measurement. We should at least consider moving the time.Now() call into the exemplar reservoir so that it is only invoked when we are actually recording an exemplar.
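
A rough sketch of the change being suggested, using hypothetical types rather than the SDK's actual reservoir interface. The `reservoir` struct, its `enabled` flag, and the injectable `now` clock are assumptions made for illustration; the clock is injected only so the number of clock reads can be counted.

```go
package main

import (
	"fmt"
	"time"
)

// exemplar and reservoir are hypothetical stand-ins, not SDK types.
type exemplar struct {
	value int64
	ts    time.Time
}

type reservoir struct {
	enabled bool
	now     func() time.Time // injectable clock so reads can be counted
	stored  []exemplar
}

// offerEager takes a timestamp from the caller, so the caller has to read
// the clock on every measurement even when nothing will be stored.
func (r *reservoir) offerEager(value int64, ts time.Time) {
	if !r.enabled {
		return
	}
	r.stored = append(r.stored, exemplar{value: value, ts: ts})
}

// offerLazy reads the clock only when an exemplar is actually stored.
func (r *reservoir) offerLazy(value int64) {
	if !r.enabled {
		return
	}
	r.stored = append(r.stored, exemplar{value: value, ts: r.now()})
}

// countingClock wraps time.Now and counts how often it is called.
func countingClock(calls *int) func() time.Time {
	return func() time.Time {
		*calls++
		return time.Now()
	}
}

func main() {
	var eagerCalls, lazyCalls int
	eager := &reservoir{now: countingClock(&eagerCalls)} // exemplars disabled
	lazy := &reservoir{now: countingClock(&lazyCalls)}   // exemplars disabled

	for i := int64(0); i < 1000; i++ {
		eager.offerEager(i, eager.now()) // clock read per measurement
		lazy.offerLazy(i)                // no clock read while disabled
	}
	fmt.Println(eagerCalls, lazyCalls) // prints "1000 0"
}
```

With exemplars disabled, the eager variant still reads the clock once per measurement, while the lazy variant never does.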

dashpole commented 1 week ago

I also found that the benchmark results did not change when I swapped the OTel Prometheus exporter for a manual reader (which is expected). I'm removing the Prometheus exporter label.

dashpole commented 1 week ago

https://github.com/open-telemetry/opentelemetry-go/pull/5545 is a ~45% performance improvement for the zero-attributes case, and a ~20% performance improvement for the single-attribute case.