open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0
3.73k stars 888 forks source link

Define TraceState handling for root spans with `NewRootTraceState` option #4241

Open jmacd opened 2 weeks ago

jmacd commented 2 weeks ago

What are you trying to achieve?

As discussed in https://github.com/open-telemetry/opentelemetry-specification/pull/4162, there is a pending sampling specification that defines the use of TraceState set before creating a root span. In some respects, #4162 calls for Trace SDKs to have a mechanism for inserting trace randomness before calling the Sampler, since the SDK knows whether the TraceID is random according to Trace Context Level 2. On the other hand, we know there exist use-cases for consistent sampling using randomness provided by the user, which can support a number of coordinating sampling practices.

The OTel tracing API specification is ambiguous on how a user can modify the TraceState of a root span. It states:

Implementations MUST provide an option to create a `Span` as
a root span, and MUST generate a new `TraceId` for each root span created.
For a Span with a parent, the `TraceId` MUST be the same as the parent.
Also, the child span MUST inherit all `TraceState` values of its parent by default.

IOW it defines what the SDK will do for child spans, not for root spans. On the MUST-provided option to create a new root span, I've seen inconsistency in at least one OTel SDK. When the NewRoot option is selected in OTel-Go, the SDK resets the TraceState to an empty value. Otherwise, it leaves the incoming TraceState of the root context. It is possible to control the TraceState of a root span, except not when using NewRoot.

What did you expect to see?

The API specification should have an explicit option such as NewRootTraceState that sets the TraceState of root spans. SDKs should use NewRootTraceState if set, otherwise an empty TraceState when creating root spans. The incoming TraceState should not be used implicitly in cases where the TraceState is non-empty and the TraceID is invalid.

danielgblanco commented 2 weeks ago

@jmacd assuming here as well that the Sampling SIG will be working on this. Thanks.