open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.9k stars 2.27k forks source link

Tail sampling processor: add a way to sample all spans that have a span link to a sampled span. #33568

Open isobelormiston opened 3 months ago

isobelormiston commented 3 months ago

Component(s)

processor/tailsampling

Is your feature request related to a problem? Please describe.

I often have spans X and Y that are part of different traces but are linked via a span link. There appears to be no way to configure the tail sampling processor such that whenever one of these spans is sampled, the other is also sampled.

Describe the solution you'd like

Suppose I have two spans, X and Y, that are part of different traces but are linked via a span link. I want to be able to set up the tail sampling processor such that whenever X is sampled, Y is also sampled, as is the whole trace that Y belongs to. e.g.

processors:
  tail_sampling:
    sample_linked_spans: true
    ...

A (possibly simpler) alternative to achieve this: we can update our code to give the two spans the same value of an attribute some-new-attribute. So, we will be able to achieve what we need if the tail sampling processor can be updated so that we can specify an attribute, and if X is sampled, any spans that share that attribute are also sampled (along with their whole trace). e.g.

processors:
  tail_sampling:
    policies: [{
        name: some-new-attribute-policy
        type: shared_attribute
        attribute: some-new-attribute
    },...

Describe alternatives you've considered

No response

Additional context

https://github.com/open-telemetry/opentelemetry-specification/issues/2918 is a similar problem for head sampling

It appears that the probabilistic sampler can be configured to sample based on some other attribute other than the trace ID. This is similar to what we might need: as well as the processor collecting together all spans that share a trace ID, and sampling all these, it would collect together all spans that share a custom attribute, and sample all these.

github-actions[bot] commented 3 months ago

Pinging code owners:

isobelormiston commented 2 months ago

Update: I asked this on the Cloud Native Computing Foundation slack channel. The main problem with this is that linked traces won't necessarily be processed by the same collector instance. So this would likely require an update to both the load balancing processor and the tail sampling processor.

jpkrohling commented 2 months ago

@isobelormiston, I believe the caching feature that @jamesmoessis just built for the tail-sampling would work well for that. I agree that further changes to the load-balancing would also be needed. If you are familiar with that component, would you be able to create a proposal for it?

jamesmoessis commented 2 months ago

This is an interesting and tricky problem. I'm unsure of the best way to solve this.

Since only one span in the trace knows about the span link, it seems infeasible to route the span to the same node that the linked trace would be routed to. It also doesn't consider that the linked trace could have also been linked to yet another trace. So, I think we need to keep strictly sharding by trace ID.

The only feasible way I could think of is a cache (e.g. Redis) which holds the span link information and sampling decisions. This could struggle if there were a lot of linked spans though.

@isobelormiston I think if you have the capability, the best solution would be to make sure that sampling.priority is set on your linked spans in uniform, and then look for that attribute on the tail sampler using numeric_attribtue policy (0 == not sampled, > 0 == sample). Similar to how you described.

I think it's the easiest and most reasonable course for action if you have access to the code / instrumentation to add those attributes.

github-actions[bot] commented 3 weeks ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Darkemon commented 2 weeks ago

I wonder, if TraceIdRatioBasedSampler on client and collector sides behaves identically and is deterministic - the same result for the same id with specific ratio. Then I could predict if a span is going to be sampled by collector and add sampling.priority: 1 to the linked spans.

jpkrohling commented 1 week ago

I believe @jmacd was working on having them compatible among themselves, but I'm not 100% what's the current state.

Darkemon commented 1 week ago

@jmacd could you confirm the sampler works this way?

At least golang implementation of the sampler is deterministic https://github.com/open-telemetry/opentelemetry-go/blob/932a4d8a5f2536645618d7aee8e5da6b8e3b6751/sdk/trace/sampling.go#L71C1-L84C2

isobelormiston commented 1 week ago

That sounds reasonable, but in our particular case we're not sampling with the trace ID sampling policy. We're using the error-policy and latency-policy, so unfortunately there's no way we'll be able to predict in advance whether a span will be sampled.

@jamesmoessis on using the numeric_attribute policy, and making sure we've an attribute set consistently across all linked spans: the difficulty here is again that we don't know in advance whether we're going to want to sample span X, so we wouldn't know what to set the sampling.priority to ahead of time for the two linked spans.

Since only one span in the trace knows about the span link, it seems infeasible to route the span to the same node that the linked trace would be routed to.

This is a good point, I think this would make it very difficult to implement the required changes to the load balancer exporter.