open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Refactor tail-sampling processor - Decision cache #31583

Open jpkrohling opened 6 months ago

jpkrohling commented 6 months ago

The decision cache can be a regular cache, where the key is the trace ID and the value is the decision. We can use a ring buffer to limit the number of entries in this cache, but that's an implementation detail. Whenever a new span arrives, we'd run it through the cache and immediately release or drop it based on a previous decision.
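
For concreteness, a minimal sketch of that flow could look like the following (all names are illustrative, not the processor's actual types): on span arrival, look up the trace ID and act on any prior decision immediately.

```go
package main

import "fmt"

// Decision is the outcome previously reached for a trace.
// All names here are illustrative, not the processor's actual types.
type Decision int

const (
	Unknown Decision = iota
	Sampled
	NotSampled
)

// decisionCache maps a trace ID to a prior sampling decision. The eviction
// strategy (ring buffer, LRU, ...) is left out here as an implementation detail.
type decisionCache struct {
	decisions map[[16]byte]Decision
}

func newDecisionCache() *decisionCache {
	return &decisionCache{decisions: make(map[[16]byte]Decision)}
}

func (c *decisionCache) Get(traceID [16]byte) (Decision, bool) {
	d, ok := c.decisions[traceID]
	return d, ok
}

func (c *decisionCache) Put(traceID [16]byte, d Decision) {
	c.decisions[traceID] = d
}

// onSpanArrival shows the intended flow: a late-arriving span for an
// already-decided trace is released or dropped immediately instead of
// being buffered again.
func onSpanArrival(cache *decisionCache, traceID [16]byte) {
	switch d, _ := cache.Get(traceID); d {
	case Sampled:
		fmt.Println("forward span immediately")
	case NotSampled:
		fmt.Println("drop span immediately")
	default:
		fmt.Println("no prior decision: buffer the span and wait")
	}
}

func main() {
	cache := newDecisionCache()
	traceID := [16]byte{0x01}
	cache.Put(traceID, Sampled)
	onSpanArrival(cache, traceID) // prints "forward span immediately"
}
```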

github-actions[bot] commented 6 months ago

Pinging code owners for processor/tailsampling: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.

kentquirk commented 6 months ago

Drop decisions only need a single bit ("we dropped this"), so there's a benefit to splitting the decision cache into a Bloom or cuckoo filter for drops and a separate cache for keeps (attaching metadata like sampling probability to spans may require the keep cache to be more than a filter).

Cache size should be configurable.
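
A rough sketch of that split, with a toy Bloom-style bit set standing in for a real Bloom/cuckoo filter and a plain map standing in for a bounded keep cache (none of these names come from the processor):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// dropFilter is a toy Bloom-style bit set standing in for the "we dropped
// this" side: membership costs a few bits per trace and may yield false
// positives, but never false negatives. A real implementation would use a
// proper Bloom or cuckoo filter with a configurable size.
type dropFilter struct {
	bits []byte
}

func newDropFilter(sizeBytes int) *dropFilter {
	return &dropFilter{bits: make([]byte, sizeBytes)}
}

// indexes derives two bit positions for a trace ID from a single FNV hash.
func (f *dropFilter) indexes(traceID []byte) (uint64, uint64) {
	h := fnv.New64a()
	h.Write(traceID)
	sum := h.Sum64()
	n := uint64(len(f.bits)) * 8
	return sum % n, (sum >> 32) % n
}

func (f *dropFilter) Add(traceID []byte) {
	i, j := f.indexes(traceID)
	f.bits[i/8] |= 1 << (i % 8)
	f.bits[j/8] |= 1 << (j % 8)
}

func (f *dropFilter) MayContain(traceID []byte) bool {
	i, j := f.indexes(traceID)
	return f.bits[i/8]&(1<<(i%8)) != 0 && f.bits[j/8]&(1<<(j%8)) != 0
}

// keepEntry carries the metadata a "keep" decision may need to attach to
// late-arriving spans, e.g. the sampling probability behind the decision.
type keepEntry struct {
	SamplingProbability float64
}

func main() {
	drops := newDropFilter(1 << 10) // filter size should be configurable
	keeps := map[string]keepEntry{} // a bounded LRU in practice

	drops.Add([]byte("trace-0001"))
	keeps["trace-0002"] = keepEntry{SamplingProbability: 0.1}

	fmt.Println(drops.MayContain([]byte("trace-0001"))) // true
	fmt.Println(keeps["trace-0002"].SamplingProbability) // 0.1
}
```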

djluck commented 6 months ago

@kentquirk can you elaborate a bit more on why we'd need to capture sampling probability per trace? My understanding is that by entering a trace ID into the "sampled" cache, we're stating that all spans for this trace should be sampled.

jpkrohling commented 6 months ago

@kentquirk, do you want to work on this one? I can assign this to you. I'd like to work on converting the metrics to use OTel, which could be helpful in adding observability to the cache.

kentquirk commented 6 months ago

@djluck You're correct that the probability applies to all spans in a trace, but sampling probability / adjusted count is of interest when analyzing the results. When I see a trace I want to know "how many traces like this occurred?" For a simple example, if I normally sample at 10% but keep all errors, I need to be able to multiply non-error traces by 10 in order to calculate the appropriate fraction of errors. To do that, I need to know what the sampling probability was. That data needs to be attached to the outgoing spans.
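
A quick worked example of that arithmetic (the numbers are hypothetical): with errors kept at 100% and non-errors sampled at 10%, each kept non-error trace stands for 10 traces, i.e. adjusted count = 1 / sampling probability.

```go
package main

import "fmt"

func main() {
	// Hypothetical tally over sampled traces: errors kept at 100%,
	// non-error traces sampled at 10%.
	sampledErrors, sampledNonErrors := 5.0, 20.0
	errorProb, nonErrorProb := 1.0, 0.1

	// Adjusted count = sampled count / sampling probability.
	estErrors := sampledErrors / errorProb          // 5 errors estimated
	estNonErrors := sampledNonErrors / nonErrorProb // 200 non-errors estimated

	// Without the 10x adjustment we'd report 5/25 = 20% errors instead of ~2.4%.
	fmt.Printf("estimated error fraction: %.3f\n", estErrors/(estErrors+estNonErrors))
}
```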

@jpkrohling You can assign it to me. It might end up getting split into a couple of pieces.

github-actions[bot] commented 3 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

jamesmoessis commented 3 months ago

Hi @kentquirk @jpkrohling, this is certainly something I think this processor could use. It would enable early sampling decisions and prevent many spans from being stored in memory for longer than needed.

The discussion around Bloom/cuckoo filters is definitely on the right track. Looking through Refinery's code, it uses a dual cuckoo-filter mechanism which seems to be quite memory efficient and has a relatively low rate of false positives.

Another possible implementation is an LRU cache. To make it twice as memory efficient, you could also store just the right half of the trace ID as a uint64.
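
A sketch of that idea, assuming golang-lru v2 purely as an example of a bounded LRU (the key function, cache size, and value type are illustrative):

```go
package main

import (
	"encoding/binary"
	"fmt"

	lru "github.com/hashicorp/golang-lru/v2"
)

// rightHalfKey halves the memory spent on keys by indexing the cache on the
// low 8 bytes of the 16-byte trace ID. Trace IDs are effectively random, so
// collisions on 64 bits are negligible at realistic cache sizes.
func rightHalfKey(traceID [16]byte) uint64 {
	return binary.BigEndian.Uint64(traceID[8:])
}

func main() {
	// Capacity should be configurable; 100k entries is an arbitrary example.
	cache, err := lru.New[uint64, bool](100_000)
	if err != nil {
		panic(err)
	}

	var traceID [16]byte
	copy(traceID[:], "0123456789abcdef")

	cache.Add(rightHalfKey(traceID), true) // true = "sampled"
	sampled, ok := cache.Get(rightHalfKey(traceID))
	fmt.Println(sampled, ok) // true true
}
```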

As my team is in the midst of implementing tail sampling at quite a large scale, I'd be interested in taking a look at this in the coming weeks.

jpkrohling commented 3 months ago

I would prefer it if someone from Honeycomb implemented the cuckoo filter, to avoid any misunderstandings about the code ending up similar to Refinery's. I thought an LRU cache would be the optimal choice except for its size, so your suggestion of using only the right half of the IDs is wonderful!

jpkrohling commented 2 months ago

For now, I'll implement the cache as a circular buffer, and we can have alternative implementations right after that.
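
A minimal sketch of a circular-buffer decision cache (illustrative only, not the actual implementation): the ring bounds memory by overwriting the oldest entry once full, while a map over its contents gives O(1) lookups.

```go
package main

import "fmt"

// ringCache is a fixed-size circular buffer of "sampled" trace-ID keys:
// once full, each insert overwrites the oldest entry. The map gives O(1)
// lookups while the ring bounds memory.
type ringCache struct {
	ids    []uint64        // ring of keys; oldest slot is overwritten first
	next   int             // next slot to write
	filled int             // how many slots are occupied
	set    map[uint64]bool // membership index over the ring's contents
}

func newRingCache(size int) *ringCache {
	return &ringCache{ids: make([]uint64, size), set: make(map[uint64]bool, size)}
}

func (c *ringCache) Put(id uint64) {
	if c.set[id] {
		return // already cached
	}
	if c.filled == len(c.ids) {
		delete(c.set, c.ids[c.next]) // ring is full: evict the oldest entry
	} else {
		c.filled++
	}
	c.ids[c.next] = id
	c.set[id] = true
	c.next = (c.next + 1) % len(c.ids)
}

func (c *ringCache) Contains(id uint64) bool { return c.set[id] }

func main() {
	c := newRingCache(2)
	c.Put(1)
	c.Put(2)
	c.Put(3) // evicts 1, the oldest entry
	fmt.Println(c.Contains(1), c.Contains(2), c.Contains(3)) // false true true
}
```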

github-actions[bot] commented 3 weeks ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.