open-telemetry / opentelemetry-go

OpenTelemetry Go API and SDK
https://opentelemetry.io/docs/languages/go
Apache License 2.0

Metric benchmark investigation #5544

Closed dashpole closed 3 months ago

dashpole commented 3 months ago

Looking into https://github.com/open-telemetry/opentelemetry-go/issues/5542. Closing, as this is not meant to be merged.

My original local benchmark:

$ go test -benchmem -benchtime=2s -bench=Bench .
goos: linux
goarch: amd64
pkg: go.opentelemetry.io/opentelemetry-go/sdk/benchmark
cpu: AMD EPYC 7B12
BenchmarkPrometheusCounter-24       762422826       3.154 ns/op       0 B/op       0 allocs/op
BenchmarkOTELCounter-24              21401206     113.2 ns/op         0 B/op       0 allocs/op
BenchmarkOTELCounterWithLabel-24      8772984     271.5 ns/op        16 B/op       1 allocs/op

Exemplar collection accounts for roughly 50 to 57 ns of the overhead (https://github.com/open-telemetry/opentelemetry-go/commit/a1bead9640dbcc045de3df8b1c6c05e1f08eba28), even though I believe we shouldn't be collecting exemplars by default and we aren't doing tracing. This is probably a good area to optimize. With exemplar collection removed:

BenchmarkOTELCounter-24              42491396      56.06 ns/op        0 B/op       0 allocs/op
BenchmarkOTELCounterWithLabel-24     10961318     220.1 ns/op        16 B/op       1 allocs/op
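
To illustrate the kind of early exit that could avoid this cost, here is a minimal sketch. It is not the SDK's code: the `measurement`, `reservoir`, and `sum` types are hypothetical, and it assumes exemplars only matter when an exemplar reservoir is configured and the measurement was taken under a sampled span.

```go
package main

import "fmt"

// measurement is a hypothetical record of one Add call.
type measurement struct {
	value   int64
	sampled bool // assumed: true only when a sampled span is in the context
}

// reservoir is a hypothetical exemplar store that just counts offers.
type reservoir struct{ offered int }

func (r *reservoir) Offer(m measurement) { r.offered++ }

// sum is a hypothetical aggregator with an optional exemplar reservoir.
type sum struct {
	total int64
	res   *reservoir // nil when exemplars are disabled
}

// record adds the value and only touches the reservoir when exemplars are
// enabled and the measurement is sampled; otherwise it returns early.
func (s *sum) record(m measurement) {
	s.total += m.value
	if s.res == nil || !m.sampled {
		return
	}
	s.res.Offer(m)
}

func main() {
	s := &sum{res: &reservoir{}}
	s.record(measurement{value: 1})                // no tracing: reservoir skipped
	s.record(measurement{value: 2, sampled: true}) // sampled: reservoir used
	fmt.Println(s.total, s.res.offered)
}
```

Whether a check like this recovers the full difference measured above is not something this sketch shows; it only illustrates the shape of the fast path.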

Attribute cardinality limiting seems to account for a very small (~2ns) portion of the overhead.

Lookup based on the attribute set accounts for ~40ns of the overhead for the no-attributes case, and the vast majority of the overhead for the with-attributes case. OTel would need to introduce bound instruments to remove this chunk of overhead.

BenchmarkOTELCounter-24             148504042      16.23 ns/op        0 B/op       0 allocs/op
BenchmarkOTELCounterWithLabel-24     38502283      61.73 ns/op       16 B/op       1 allocs/op
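
The difference between a per-call lookup and a bound instrument can be sketched as follows. This is an illustration under assumptions, not the SDK's implementation: `attrKey` stands in for a comparable attribute set, and `counter`, `Bind`, and `boundCounter` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// attrKey is a hypothetical stand-in for a comparable attribute set.
type attrKey string

// counter keeps one running sum per attribute set.
type counter struct {
	mu   sync.Mutex
	sums map[attrKey]*int64
}

func newCounter() *counter {
	return &counter{sums: make(map[attrKey]*int64)}
}

// slot returns the sum for key, creating it if needed. Callers hold c.mu.
func (c *counter) slot(key attrKey) *int64 {
	v, ok := c.sums[key]
	if !ok {
		v = new(int64)
		c.sums[key] = v
	}
	return v
}

// Add is the unbound path: every call pays for the map lookup.
func (c *counter) Add(key attrKey, incr int64) {
	c.mu.Lock()
	*c.slot(key) += incr
	c.mu.Unlock()
}

// boundCounter remembers the slot for one attribute set.
type boundCounter struct {
	mu  *sync.Mutex
	sum *int64
}

// Bind performs the lookup once and returns a handle to the slot.
func (c *counter) Bind(key attrKey) boundCounter {
	c.mu.Lock()
	defer c.mu.Unlock()
	return boundCounter{mu: &c.mu, sum: c.slot(key)}
}

// Add on the bound handle skips the lookup entirely.
func (b boundCounter) Add(incr int64) {
	b.mu.Lock()
	*b.sum += incr
	b.mu.Unlock()
}

// Value reports the current sum for key, or 0 if it was never recorded.
func (c *counter) Value(key attrKey) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.sums[key]; ok {
		return *v
	}
	return 0
}

func main() {
	c := newCounter()
	c.Add("route=/home", 1) // lookup on every call
	b := c.Bind("route=/home")
	b.Add(2) // no lookup
	fmt.Println(c.Value("route=/home"))
}
```

Both paths update the same slot, so a bound handle trades a one-time lookup for cheaper increments afterwards.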

Our counter increment function (with locking) accounts for ~8ns of the overhead: we take a simple lock and increment a counter value. Prometheus appears to have implemented some optimizations for this. Benchmarks with the measurement step removed entirely:

BenchmarkOTELCounter-24             299598176       7.974 ns/op       0 B/op       0 allocs/op
BenchmarkOTELCounterWithLabel-24     43538810      55.06 ns/op       16 B/op       1 allocs/op

The remaining overhead comes from the API and from the Options pattern, which requires calling NewAddConfig. This would presumably be eliminated if instruments were already bound to attributes.
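
A simplified sketch of why the Options pattern adds per-call work: each `Add` has to fold its variadic options into a config before recording. All names here (`addConfig`, `addOption`, `withAttrs`, `newAddConfig`, `Bind`) are hypothetical and only mirror the general shape of the pattern, not the actual OpenTelemetry API.

```go
package main

import "fmt"

// addConfig is a hypothetical per-call configuration.
type addConfig struct {
	attrs []string
}

// addOption is a hypothetical functional option that updates the config.
type addOption func(addConfig) addConfig

// withAttrs returns an option that appends attributes to the config.
func withAttrs(attrs ...string) addOption {
	return func(c addConfig) addConfig {
		c.attrs = append(c.attrs, attrs...)
		return c
	}
}

// newAddConfig folds all options into a fresh config.
func newAddConfig(opts []addOption) addConfig {
	var c addConfig
	for _, o := range opts {
		c = o(c)
	}
	return c
}

// counter records a sum and the attributes of the most recent call.
type counter struct {
	sum       int64
	lastAttrs []string
}

// Add rebuilds the config from its options on every call.
func (c *counter) Add(incr int64, opts ...addOption) {
	cfg := newAddConfig(opts)
	c.lastAttrs = cfg.attrs
	c.sum += incr
}

// boundCounter carries attributes that were resolved once, up front.
type boundCounter struct {
	c     *counter
	attrs []string
}

// Bind resolves the options a single time.
func (c *counter) Bind(opts ...addOption) boundCounter {
	return boundCounter{c: c, attrs: newAddConfig(opts).attrs}
}

// Add on the bound handle does no option processing.
func (b boundCounter) Add(incr int64) {
	b.c.lastAttrs = b.attrs
	b.c.sum += incr
}

func main() {
	c := &counter{}
	c.Add(1, withAttrs("k=v")) // options applied on this call
	b := c.Bind(withAttrs("k=v"))
	b.Add(2) // options already applied at Bind time
	fmt.Println(c.sum, c.lastAttrs)
}
```

In this sketch the bound handle moves the option handling out of the hot path, which is the effect the paragraph above anticipates for bound instruments.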