Open kalyanaj opened 6 months ago
Based on OTEP 235, updated requirement 3 above to include trace flags (so that a root span can determine whether the Random Trace ID Flag is set).
@jmacd please take a look: since you self-assigned this, I assume you are the sponsor for this?
https://github.com/jmacd/go-sampler/blob/main/README.md is a work in progress.
[Filing this per the discussion in the Sampling SIG. The goal is to improve clarity for everyone involved (including me) - by attempting to summarize in one place the motivation for why we need to enhance the Sampling API: what problems we need to solve and how we will solve them.]
Note: This is a work in progress. Feedback and corrections are welcome, and I will iterate on this text.
V2 of Sampler API: What, Why, and How?
Executive Summary
V1 of the Sampler and Span Processor APIs was defined in the OpenTelemetry specification a few years ago. Since then, customers and community members have shared a good deal of feedback on how to improve them.
The requests include supporting deferred dropping of spans, making certain additional fields available for the sampling decision, better support for consistent sampling of linked traces, and the ability to have isolated processor/exporter pipelines.
Hence, we must introduce a V2 of the Sampling API to solve these problems. This will make sampling more powerful and flexible for OpenTelemetry customers.
Problems with the current Sampling API
Here are a few problems that have been identified with the current sampling API:
1. No support for deferred dropping of spans
Currently, once a sampler decides to drop a span, the span is dropped before it reaches the span processor or exporter. There is no support for deferring the drop to a later stage in data collection, such as an out-of-process collector. Why is this a problem? There are two reasons:
For more details, see the below issues:
2. No support for customizing behavior per exporter
In the Tracing API today, there is no clean way to have multiple processing and export pipelines with isolated behaviors. When multiple processors are configured, each subsequent processor sees the changes made by the prior one. However, there are situations where you want independent sampling behavior and independent processing and exporting of spans.
For example, in the Metrics SDK's Reader and Exporter model, it is possible to have independent MetricReader and MetricExporter pipelines:
In the tracing API, we need a way of specifying that each exporter should have isolated control over its custom processing and sampling decision.
For more details, see Sampling: Each exporter should have isolated control over its sampler decision and custom processing · Issue #3284 · open-telemetry/opentelemetry-specification (github.com).
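As a rough illustration of the isolation being asked for, here is a minimal sketch in plain Python. It does not use the OpenTelemetry SDK: Span, Pipeline, redact_user, and no_op are hypothetical names invented for this example, and the per-pipeline deep copy is only one possible way to achieve isolation, not a proposed design.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Span:
    """Hypothetical stand-in for an ended span."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Pipeline:
    """One isolated sampler -> processor -> exporter chain (hypothetical)."""
    sampler: Callable[[Span], bool]
    processor: Callable[[Span], None]
    exported: List[Span] = field(default_factory=list)  # stands in for an exporter

    def on_end(self, span: Span) -> None:
        if not self.sampler(span):
            return  # this pipeline drops the span; other pipelines are unaffected
        private = copy.deepcopy(span)  # mutations stay inside this pipeline
        self.processor(private)
        self.exported.append(private)


def redact_user(span: Span) -> None:
    span.attributes.pop("user.id", None)


def no_op(span: Span) -> None:
    pass


# Pipeline A keeps every span but redacts an attribute.
# Pipeline B keeps only error spans and leaves them untouched.
pipeline_a = Pipeline(sampler=lambda s: True, processor=redact_user)
pipeline_b = Pipeline(
    sampler=lambda s: s.attributes.get("error") == "true",
    processor=no_op,
)

spans = [
    Span("GET /ok", {"user.id": "42"}),
    Span("GET /fail", {"user.id": "43", "error": "true"}),
]
for span in spans:
    for pipeline in (pipeline_a, pipeline_b):
        pipeline.on_end(span)
```

In this sketch, pipeline B still sees user.id on the error span even though pipeline A removed it from its own copy, which is the behavior the current shared-processor chain cannot provide cleanly.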
3. Certain fields are not available when making a sampling decision
A few fields are not available today when making a sampling decision. They are listed below, with a summary of why each would be helpful:
1) SpanID: This can be useful when each span corresponds to an item in a batch. In such cases, customers want to use the SpanID to make decisions. While a random number could be generated and used instead, having the SpanID could help achieve consistent sampling decisions across logs and spans.
2) Instrumentation Library: Customers want this to suppress spans from certain instrumentation libraries.
3) Resources: Customers want this so that they can make sampling decisions based on resource attributes such as service.name.
4) TraceFlags: For new spans, ShouldSample doesn't currently have a way to know the new Span's TraceFlags, so it can't determine whether the Random Trace ID Flag is set. Hence, we should consider taking TraceFlags as an additional parameter.
For more details, see the below issues:
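To make the four fields concrete, here is a minimal sketch in plain Python of what a sampling decision could look like if they were available. SamplingParameters and the three helper functions are hypothetical names for illustration, not an existing SDK type or a proposed signature. The flag values follow the W3C Trace Context trace-flags byte (0x01 for sampled, 0x02 for random-trace-id in Level 2). The SpanID check assumes span IDs are uniformly distributed.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

SAMPLED_FLAG = 0x01          # W3C Trace Context "sampled" bit
RANDOM_TRACE_ID_FLAG = 0x02  # W3C Trace Context Level 2 "random-trace-id" bit


@dataclass(frozen=True)
class SamplingParameters:
    """Hypothetical V2 input to ShouldSample; not an existing SDK type."""
    trace_id: int
    span_id: int
    trace_flags: int
    instrumentation_scope: str
    resource: Dict[str, str] = field(default_factory=dict)


def has_random_trace_id(params: SamplingParameters) -> bool:
    """True when the Random Trace ID Flag is set in the trace flags."""
    return bool(params.trace_flags & RANDOM_TRACE_ID_FLAG)


def keep_by_span_id(params: SamplingParameters, one_in_n: int) -> bool:
    """Deterministic per-span decision (assumes uniformly distributed span IDs)."""
    return params.span_id % one_in_n == 0


def should_sample(
    params: SamplingParameters,
    suppressed_scopes: Set[str],
    sampled_services: Set[str],
) -> bool:
    """Suppress chosen instrumentation scopes, then filter on service.name."""
    if params.instrumentation_scope in suppressed_scopes:
        return False
    return params.resource.get("service.name") in sampled_services


params = SamplingParameters(
    trace_id=0xABC,
    span_id=8,
    trace_flags=SAMPLED_FLAG | RANDOM_TRACE_ID_FLAG,
    instrumentation_scope="http.client",
    resource={"service.name": "checkout"},
)
decision = should_sample(params, {"noisy.library"}, {"checkout"})
```

Because the decision in keep_by_span_id depends only on the SpanID, any other signal that carries the same SpanID (for example a log record) could reach the same decision without coordination.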
4. The description of a sampler is immutable, which makes it less useful
Currently, the sampler's description is immutable. Ideally, this should be mutable, so that a sampler's current behavior (e.g., its current sampling rate - say if it was updated by talking to a config service) can be used programmatically or for debugging purposes.
For more details, see issue Remove unreasonable restriction on Sampler's description to be immutable · Issue #2095 · open-telemetry/opentelemetry-specification (github.com).
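A minimal sketch of the idea in plain Python follows. RateSampler, update_rate, and get_description are hypothetical names for this example and the description format is an assumption. The point is only that the description reflects the sampler's current state rather than its state at construction.

```python
class RateSampler:
    """Hypothetical sampler whose description tracks its current rate."""

    def __init__(self, rate: float) -> None:
        self._rate = rate

    def update_rate(self, rate: float) -> None:
        # For example, called after polling a remote config service.
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be within [0, 1]")
        self._rate = rate

    def get_description(self) -> str:
        # Computed on each call, so it always reflects the live rate.
        return f"RateSampler{{{self._rate}}}"


sampler = RateSampler(0.5)
before = sampler.get_description()
sampler.update_rate(0.1)
after = sampler.get_description()
```

With an immutable description, the second call would still have to report the original rate, which is misleading when debugging a remotely configured sampler.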
5. Composing samplers in a consistent manner is difficult
With the current sampler model, it is difficult to compose samplers in a way that plays well with consistent probability sampling requirements. For an example of a specific problem, please see this comment:
This is being addressed by the following OTEP.
Solution Approach
This section is TBD (work in progress).