Open isobelormiston opened 3 months ago
Pinging code owners:
processor/tailsampling: @jpkrohling
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Update: I asked this on the Cloud Native Computing Foundation slack channel. The main problem with this is that linked traces won't necessarily be processed by the same collector instance. So this would likely require an update to both the load balancing processor and the tail sampling processor.
@isobelormiston, I believe the caching feature that @jamesmoessis just built for the tail-sampling would work well for that. I agree that further changes to the load-balancing would also be needed. If you are familiar with that component, would you be able to create a proposal for it?
This is an interesting and tricky problem. I'm unsure of the best way to solve this.
Since only one span in the trace knows about the span link, it seems infeasible to route the span to the same node that the linked trace would be routed to. It also doesn't consider that the linked trace could have also been linked to yet another trace. So, I think we need to keep strictly sharding by trace ID.
The only feasible way I could think of is a cache (e.g. Redis) which holds the span link information and sampling decisions. This could struggle if there were a lot of linked spans though.
@isobelormiston I think if you have the capability, the best solution would be to make sure that sampling.priority
is set on your linked spans in uniform, and then look for that attribute on the tail sampler using numeric_attribtue
policy (0 == not sampled, > 0 == sample). Similar to how you described.
I think it's the easiest and most reasonable course for action if you have access to the code / instrumentation to add those attributes.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I wonder, if TraceIdRatioBasedSampler on client and collector sides behaves identically and is deterministic - the same result for the same id with specific ratio. Then I could predict if a span is going to be sampled by collector and add sampling.priority: 1
to the linked spans.
I believe @jmacd was working on having them compatible among themselves, but I'm not 100% what's the current state.
@jmacd could you confirm the sampler works this way?
At least golang implementation of the sampler is deterministic https://github.com/open-telemetry/opentelemetry-go/blob/932a4d8a5f2536645618d7aee8e5da6b8e3b6751/sdk/trace/sampling.go#L71C1-L84C2
That sounds reasonable, but in our particular case we're not sampling with the trace ID sampling policy. We're using the error-policy
and latency-policy
, so unfortunately there's no way we'll be able to predict in advance whether a span will be sampled.
@jamesmoessis on using the numeric_attribute
policy, and making sure we've an attribute set consistently across all linked spans: the difficulty here is again that we don't know in advance whether we're going to want to sample span X, so we wouldn't know what to set the sampling.priority
to ahead of time for the two linked spans.
Since only one span in the trace knows about the span link, it seems infeasible to route the span to the same node that the linked trace would be routed to.
This is a good point, I think this would make it very difficult to implement the required changes to the load balancer exporter.
Component(s)
processor/tailsampling
Is your feature request related to a problem? Please describe.
I often have spans X and Y that are part of different traces but are linked via a span link. There appears to be no way to configure the tail sampling processor such that whenever one of these spans is sampled, the other is also sampled.
Describe the solution you'd like
Suppose I have two spans, X and Y, that are part of different traces but are linked via a span link. I want to be able to set up the tail sampling processor such that whenever X is sampled, Y is also sampled, as is the whole trace that Y belongs to. e.g.
A (possibly simpler) alternative to achieve this: we can update our code to give the two spans the same value of an attribute
some-new-attribute
. So, we will be able to achieve what we need if the tail sampling processor can be updated so that we can specify an attribute, and if X is sampled, any spans that share that attribute are also sampled (along with their whole trace). e.g.Describe alternatives you've considered
No response
Additional context
https://github.com/open-telemetry/opentelemetry-specification/issues/2918 is a similar problem for head sampling
It appears that the probabilistic sampler can be configured to sample based on some other attribute other than the trace ID. This is similar to what we might need: as well as the processor collecting together all spans that share a trace ID, and sampling all these, it would collect together all spans that share a custom attribute, and sample all these.