open-telemetry / community

OpenTelemetry community content
https://opentelemetry.io
Apache License 2.0
785 stars 237 forks source link

[Donation Proposal]: OTel Exponential Histogram Golang reference implementation #1403

Closed jmacd closed 1 month ago

jmacd commented 1 year ago

Description

Lightstep wishes to donate the library developed for computing and merging OpenTelemetry exponential histograms in Golang.

Benefits to the OpenTelemetry community

This library implements the exponential histogram data structure required to support this aggregation in OpenTelemetry SDKs. It is well unit-tested and has been in production use at Lightstep for 9 months.

This library was used as the reference implementation for Python and Javascript OpenTelemetry SDKs contributed by Lightstep engineers @ocelotl and @mwear, hence the library's design has received additional testing and validation.

Reasons for donation

This library carefully implements an exacting specification, which is not trivial to do for the full range of floating point values especially at the boundaries and for subnormal numbers. The engineering time invested in this library makes it appealing to use in other parts of the OpenTelemetry ecosystem, including the logging exporter (where it prints boundaries) and the statsd receiver (where it aggregates histogram and timing observations). Any OpenTelemetry collector processor (in Golang) that wants to merge exponential histograms will want to use code that is available in this library.

This library would be useful in supporting exponential histograms for the OTel-Go metrics SDK. We believe this code does not belong in that repository but should still be made available to OpenTelemetry developers using Golang.

Repository

https://github.com/lightstep/go-expohisto

Existing usage

The statsdreceiver uses this library.

The loggingexporter does not use this library and it has bugs that have not been addressed. This PR would address the bugs by using functions from this library.

Maintenance

Maintenance will be initially by @jmacd (the original author) and @ocelotl (who has audited this code).

We do not expect much maintenance will be necessary except to add new features, where this library can continue to serve as a reference implementation. For example, work will be needed to add support for zero tolerance and other forms of limit, but those are areas that have not been specified in OTel.

Licenses

It was written with the intent of this donation, already has OpenTelemetry copyright statements in its headers.

Trademarks

n/a

Other notes

No response

trask commented 1 year ago

This library would be useful in supporting exponential histograms for the OTel-Go metrics SDK. We believe this code does not belong in that repository

can you explain just a bit why not? thx

jmacd commented 1 year ago

I actually do think this code belongs in that repository, https://github.com/open-telemetry/opentelemetry-go/issues/2501 https://github.com/open-telemetry/opentelemetry-go/issues/3027 but my proposal is unpopular with the maintainers.

My understanding is that because the library supports public APIs it creates a burden on the maintainers of that repository, whereas an internal library would allow supporting the exponential histogram without additional maintenance burden.

jmacd commented 1 year ago

Thought experiment: Couldn't the users of the dedicated exponential histogram API avoid a direct dependency on its API by making use of OTel instruments and a privately-held SDK configured just so?

The answer will be yes in some cases, but probably not yes in all cases. For the statsdreceiver use-case, I think the answer is yes -- with a lot of work -- the OTel-Go SDK could implement all the aggregation responsibilities of the statsdreceiver (including exponential histogram) and the receiver would use its Collect() API to emit pdata-formatted metrics into the OTC pipeline. This would could even help: the statsdreceiver could be put into cumulative mode easily, whereas today we rely on prometheusexporter to perform delta->cumulative translation)

Unfortunately, I think the answer is "not always".

This thought experiment would be blocked on a multi-count feature](https://github.com/open-telemetry/opentelemetry-specification/issues/3369).

This would not help with the use-case in https://github.com/open-telemetry/opentelemetry-collector/pull/5938, as an example (to correctly print an exponential histogram).

I have a feeling that a realistic metrics aggregation processor, one that could safely and correctly strip attributes from metrics data, would run into a real need to control its data structures directly. This processor will need an exponential histogram, too, to control when it merges and when it's dropped from memory, so although it would be nice to hide the expohisto inside the OTel-Go SDK, I think we need more than that.

trask commented 1 month ago

@jmacd we're doing some triage of old issues, should we close this, or do you want to keep it open? thanks

mtwo commented 1 month ago

Closing, as I don't think that this is relevant anymore (?). @jmacd let us know if this is incorrect and we'll re-open this issue.