Open etep opened 2 months ago
This is a very important capability, but it is not well documented. I was planning to defer this till 1.0 (strictly to keep scope under control!), so I didn't spend much time on optimizing it either!
We can probably add support for picking Explicit vs Exponential using an environment variable, as I am unsure whether we can keep the View mechanism for changing the aggregation. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk_exporters/otlp.md#additional-environment-variable-configuration
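For illustration, here is a minimal sketch of what reading that setting could look like. The variable name and its two values come from the linked OTLP exporter spec; the enum and both helper functions are hypothetical, and none of this is the opentelemetry-rust API. Treating the value case-insensitively and falling back to explicit buckets on unknown input are assumptions of this sketch.

```rust
use std::env;

/// Histogram aggregation kinds named in the OTLP metrics exporter spec.
/// (Hypothetical enum for this sketch, not an SDK type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HistogramAggregation {
    ExplicitBucket,
    Base2Exponential,
}

/// Environment variable defined by the OTLP metrics exporter spec.
pub const ENV_VAR: &str = "OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION";

/// Hypothetical helper: map the raw value to an aggregation kind.
/// Unset or unrecognized values fall back to explicit buckets,
/// which the spec lists as the default.
pub fn parse_aggregation(raw: Option<&str>) -> HistogramAggregation {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("base2_exponential_bucket_histogram") => HistogramAggregation::Base2Exponential,
        _ => HistogramAggregation::ExplicitBucket,
    }
}

/// Hypothetical helper: read the setting from the process environment.
pub fn aggregation_from_env() -> HistogramAggregation {
    parse_aggregation(env::var(ENV_VAR).ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the mapping easy to unit test without mutating process state.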
If you have bandwidth to help, I'd appreciate it. I'll keep the issue open to track this for the stable release.
@cijothomas -
Thank you for the note. Not sure how much bandwidth I have to pitch in, but there's a tentative interest on my side.
My experience & the docs: First the story of why I'm here. I wanted to set up OTel metrics for a Rust project I work on. At the moment, I don't want to learn about the full scope of OTel; I just want (you know) an exponential histogram, with data getting sent out, say, to a collector, then onward to a cloud service provider.
Unfortunately, the minimal setup for this is not well documented, and I haven't found (yet) any example projects to look at. Maybe there's something useful that I haven't found yet!
Thoughts on the impl.
The project I'm working on claims to be very sensitive to perf. Regardless of the overall contribution to CPU time, we want all the code paths to be efficient. Exponential histogram has some good things for our use case, but the impl. seems to expand the histogram by copying as the buckets start to fill in - see, e.g., this in the record method.
I'm curious to know if there is appetite for enabling a pre-allocated bucket structure or any other ideas here, e.g. append all samples to a vector and then allocate the buckets using min/max of that when the data is ready to be flushed. I'm sure there are other good ideas to be had around this topic - not sure if it will fit in with the project's needs, APIs, etc.
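To make the second idea concrete, here is a rough sketch of buffering samples and allocating the buckets once at flush time from the observed min/max. It uses only the standard library; `BufferedHistogram`, `Flushed`, and `bucket_index` are hypothetical names, not the SDK's types. It assumes the base-2 exponential mapping where bucket `i` covers `(base^i, base^(i+1)]` with `base = 2^(2^-scale)`, handles positive finite values only, and caps the scale at 20.

```rust
/// Bucket index of a positive value at `scale`.
/// Bucket i covers (base^i, base^(i+1)], with base = 2^(2^-scale).
fn bucket_index(value: f64, scale: i32) -> i64 {
    (value.log2() * 2f64.powi(scale)).ceil() as i64 - 1
}

/// Result of a flush: `counts[k]` belongs to bucket index `offset + k`.
pub struct Flushed {
    pub scale: i32,
    pub offset: i64,
    pub counts: Vec<u64>,
}

/// Hypothetical recorder: the hot path only pushes into a vector.
pub struct BufferedHistogram {
    samples: Vec<f64>,
    max_scale: i32,
    max_buckets: usize,
}

impl BufferedHistogram {
    pub fn new(capacity: usize, max_scale: i32, max_buckets: usize) -> Self {
        Self {
            samples: Vec::with_capacity(capacity), // allocated once, up front
            max_scale: max_scale.min(20),          // assumed upper bound on scale
            max_buckets: max_buckets.max(1),       // at least one bucket
        }
    }

    /// Hot path: no bucket math; zero, negative, and non-finite values are skipped.
    pub fn record(&mut self, value: f64) {
        if value.is_finite() && value > 0.0 {
            self.samples.push(value);
        }
    }

    /// Cold path: choose a scale from min/max, allocate the buckets once, then count.
    pub fn flush(&mut self) -> Option<Flushed> {
        if self.samples.is_empty() {
            return None;
        }
        let (min, max) = self
            .samples
            .iter()
            .fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), &v| (lo.min(v), hi.max(v)));
        // Lower the scale until [min, max] fits in `max_buckets` buckets.
        let mut scale = self.max_scale;
        let (lo, hi) = loop {
            let (lo, hi) = (bucket_index(min, scale), bucket_index(max, scale));
            if ((hi - lo) as usize) < self.max_buckets {
                break (lo, hi);
            }
            scale -= 1;
        };
        let mut counts = vec![0u64; (hi - lo + 1) as usize];
        for &v in &self.samples {
            counts[(bucket_index(v, scale) - lo) as usize] += 1;
        }
        self.samples.clear(); // keeps the buffer's capacity for the next interval
        Some(Flushed { scale, offset: lo, counts })
    }
}
```

The trade-off is memory: the buffer grows with the number of samples per collection interval (and `push` reallocates if `capacity` is exceeded), whereas in-place buckets stay bounded by the bucket limit. Whether that fits the SDK's aggregation and concurrency model is an open question.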
Thank you!
Unfortunately, the minimal setup for this is not well documented
You are right. The main reason is that Exponential Histogram is very new in the OTel world. It is not yet supported by all vendors, and the mapping to Prometheus is still experimental. In other words, using ExplicitBuckets is still the main scenario with Histograms today. Once ExponentialHistogram is more mainstream, it should get better treatment in docs/examples.
claims to be very sensitive to perf.
It is always a pleasure to hear such statements! We did a lot of work recently to optimize performance for Metrics aggregation, but Exponential Histograms were left out of that (to keep the initial scope for 1.0 under control). There is some active work going on now so that ExponentialHistograms get perf gains similar to the other aggregations.
And yes, we welcome ideas to improve the performance of Histogram aggregations, including the idea of pre-allocating as much as possible! I would appreciate it if you could open a separate issue with the suggestions, to better track them.
Describe the solution you'd like:
The examples include an explicit bucket histogram but not an exponential histogram. It's not clear whether one needs to create a metric and then an aggregation, or whether directly creating the aggregation is feasible. @jtescher