open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

deltatocumulative: Number of buckets in exponential histograms should be capped #33277

Open euroelessar opened 5 months ago

euroelessar commented 5 months ago

Component(s)

processor/deltatocumulative

What happened?

Description

Currently the size of an individual histogram is unbounded and can grow until the collector runs out of memory (OOM). This is most visible when an application's data spans a wide range overall but each individual delta datapoint covers only a narrow range. In this case the cumulative aggregation keeps the scale and keeps growing the number of buckets to fit all the data.

Steps to Reproduce

Send two histogram datapoints with the same scale but drastically different offsets.
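To illustrate why this makes the bucket count grow, here is a rough sketch of the arithmetic. It uses the logarithm-based index mapping from the OpenTelemetry exponential histogram data model; `bucketIndex` and `spanWidth` are hypothetical helper names, not functions from the processor, and floating-point rounding near bucket boundaries is ignored.

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex returns the exponential-histogram bucket index of a positive
// value v at the given scale, where bucket i covers (base^i, base^(i+1)]
// and base = 2^(2^-scale). Hypothetical helper, for illustration only.
func bucketIndex(v float64, scale int) int {
	return int(math.Ceil(math.Log2(v)*math.Ldexp(1, scale))) - 1
}

// spanWidth returns how many contiguous buckets a merged histogram needs
// to cover both lo and hi at the same scale.
func spanWidth(lo, hi float64, scale int) int {
	return bucketIndex(hi, scale) - bucketIndex(lo, scale) + 1
}

func main() {
	// Two datapoints at the same scale, each narrow on its own, but far
	// apart from each other: the merged bucket array must span the gap.
	for _, scale := range []int{0, 3, 8} {
		fmt.Printf("scale=%d buckets=%d\n", scale, spanWidth(1, 1024, scale))
	}
}
```

Because the width grows by roughly a factor of two per scale step, a cumulative state that keeps a high scale needs far more buckets than either delta datapoint did.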

Expected Result

The cumulative exponential histogram is downscaled to keep ≈160 buckets.
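A minimal sketch of what such downscaling could look like, assuming a simplified bucket representation. The `Buckets` type, `downscaleOnce`, `capBuckets`, and the `minScale` floor are hypothetical names chosen for illustration; this is not the processor's code and not necessarily what any linked fix does.

```go
package main

import "fmt"

// Buckets is a simplified stand-in for one side (positive or negative)
// of an exponential histogram: Counts[i] belongs to index Offset+i.
type Buckets struct {
	Offset int32
	Counts []uint64
}

// minScale is an assumed lower bound on the scale, used here only to
// guarantee that capBuckets terminates.
const minScale int32 = -10

// downscaleOnce lowers the scale by one: every bucket index i maps to
// i>>1 (floor division by two), so adjacent pairs of buckets merge.
func downscaleOnce(b Buckets) Buckets {
	newOffset := b.Offset >> 1
	if len(b.Counts) == 0 {
		return Buckets{Offset: newOffset}
	}
	last := (b.Offset + int32(len(b.Counts)) - 1) >> 1
	out := Buckets{Offset: newOffset, Counts: make([]uint64, last-newOffset+1)}
	for i, c := range b.Counts {
		out.Counts[(b.Offset+int32(i))>>1-newOffset] += c
	}
	return out
}

// capBuckets downscales until at most maxBuckets remain or minScale is hit.
func capBuckets(scale int32, b Buckets, maxBuckets int) (int32, Buckets) {
	for len(b.Counts) > maxBuckets && scale > minScale {
		b = downscaleOnce(b)
		scale--
	}
	return scale, b
}

func main() {
	// 1000 buckets at scale 5, capped to 160.
	wide := Buckets{Offset: -500, Counts: make([]uint64, 1000)}
	for i := range wide.Counts {
		wide.Counts[i] = 1
	}
	scale, capped := capBuckets(5, wide, 160)
	fmt.Println(scale, len(capped.Counts))
}
```

Downscaling trades resolution for bounded memory: the total count is preserved, but each remaining bucket covers a wider value range.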

Actual Result

The exponential histogram grows its number of buckets instead, leading to OOM.

Collector version

v0.101.0

Environment information

Environment

OS: Linux
Compiler (if manually compiled): go 1.22.2

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

github-actions[bot] commented 5 months ago

Pinging code owners:

github-actions[bot] commented 3 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

sirianni commented 3 months ago

Hitting this issue as well. [image]

edma2 commented 2 months ago

https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/34157 should address it; we've been using it in production for over a month. Can someone please review?

edma2 commented 6 days ago

This problem is still relevant and we would not be able to use deltatocumulative in production without https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/34157.

@sh0rez @RichieSams @jpkrohling please review 😄