open-telemetry / oteps

OpenTelemetry Enhancement Proposals
https://opentelemetry.io
Apache License 2.0
337 stars 164 forks source link

OTEP 149: Base-2 exponential histogram mapping reference implementation #179

Closed jmacd closed 3 years ago

jmacd commented 3 years ago

Adds two implementations of the base-2 exponential histogram mapping from the review thread of https://github.com/open-telemetry/opentelemetry-proto/pull/322, both for posterity and to record a program that generates the table of constants used in the LookupTable implementation.

goos: darwin
goarch: amd64
pkg: github.com/open-telemetry/oteps/text/metrics/0149
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkHistogram/mapping_lookup_3-12          100000000           10.40 ns/op
BenchmarkHistogram/boundary_lookup_3-12         80854910            14.96 ns/op
BenchmarkHistogram/mapping_logarithm_3-12       60637861            20.04 ns/op
BenchmarkHistogram/boundary_logarithm_3-12      137750319            8.696 ns/op
BenchmarkHistogram/mapping_lookup_10-12         100000000           10.30 ns/op
BenchmarkHistogram/boundary_lookup_10-12        80481817            14.98 ns/op
BenchmarkHistogram/mapping_logarithm_10-12      60081852            19.97 ns/op
BenchmarkHistogram/boundary_logarithm_10-12     138988699            8.708 ns/op

cc: @yzhuge @oertl

yzhuge commented 3 years ago

NrSketch lookup method is slightly different. Its indices array has 2x entries as the boundary array. Because of the denser indices array, it only need to do one correction on "rough". See code at https://github.com/newrelic-experimental/newrelic-sketch-java/blob/main/src/main/java/com/newrelic/nrsketch/indexer/SubBucketLookupIndexer.java#L106 and doc at https://github.com/newrelic-experimental/newrelic-sketch-java/blob/main/Indexer.md#how-the-lookup-indexer-works

In the "go" code context like https://github.com/open-telemetry/oteps/pull/179/files#diff-5d6d040c9f07b433ed472d22341f4f4b598113cd3ee2d08b3c21d655a8e1fb61R132, it would need to only check "mantissa >= lt.boundaries[rough+1]". It does not need to look at [rough+2].

This is a trade off of cpu vs memory. NrSketch saves cpu time, at the cost of memory.

oertl commented 3 years ago

@yzhuge

This is a trade off of cpu vs memory. NrSketch saves cpu time, at the cost of memory.

I am not sure if the NrSketch approach is really signifcantly faster. Since the accessed elements are likely in the same cache line the extra array lookup will be very cheap. Furthermore, I expect that modern compilers parallelize the comparisons using SIMD instructions and remove the branching. It might even be that the NrSketch is slower as the lookup array is larger, which increases the chance of cache misses.

jmacd commented 3 years ago

Both of your methods are blazingly fast--even the method based on logarithm() is relatively speedy. I clock @oertl's method at 10ns and the logarithm method at 20ns. We have diminishing returns here, it's a good thing!

reyang commented 3 years ago

I'm not good at Golang so I didn't look into the code, the description in README.md looks good to me (with some minor comments).

yzhuge commented 3 years ago

To monitor events 1ms or longer (like web server request), even the log method is fine. 1ms is 1M ns. So an extra 10ns from the log method is so tiny. Adding an actual histogram around the mapping method, it will cost around 20ns per insert (lookup method), and 30ns per insert (add "synchronized" on Java functions for concurrent access). With log method, these would be around 30 and 40ns per insert, still small for 1ms events. Benchmark data from NrSketch.

I think the majority of OTel applications are in the 1ms or longer event range (application monitoring). At 1 microsecond per event, then 30ns monitoring cost (3% of 1microsecond) will begin to matter. These will be less common use cases. One thing I can think of is tracking per message send/recv time in a network router. At those level, we are closer to hardware and custom (possibly hardware assisted) monitoring will likely be needed.

jmacd commented 3 years ago

I've posted a link to the output of generate 16 i.e. a large precomputed lookup table that could be useful for validation. https://gist.github.com/jmacd/64031dc003410db6eb3ae04a1db286fd

jmacd commented 3 years ago

I will re-open this when it is ready following the feedback. I plan to move the code to the otel-go-contrib repo and incorporate copies of it in some new text near the end of OTEP 149.

jmacd commented 3 years ago

FYI I moved the contents of this OTEP into https://github.com/open-telemetry/opentelemetry-specification/pull/1935