open-telemetry / opentelemetry-go-contrib

Collection of extensions for OpenTelemetry-Go.
https://opentelemetry.io/
Apache License 2.0
1.08k stars 509 forks source link

B3 Propagators should set sample bit when debug flag is set #367

Closed TommyCpp closed 3 years ago

TommyCpp commented 3 years ago

As suggested in b3 propagation spec

Debug is an emphasized accept decision that implies accept

Currently, when debug flag is set, the span context's traceflags will be set to FlagsDebug. One example can be found in tests. Also when trace flag is set to FlagsDebug only, the SpanContext.IsSampled method will return false

I digged into commit history to find some ideas. I saw below lines in commit message of c506e99.

Update SpanContext.IsSampled to return if the sampling decision is to sample rather than if the sample bit is set. This means that if the debug bit is also set it will return true.

but it didn't really change the isSampled method. As for now SpanContext.IsSampled still returns whether the sample bit is set.

MrAlias commented 3 years ago

Given the debug bit implies the accept sample decision there is not an explicit need to set the sampled flag directly, but it does seem correct to evaluate this flag when reporting if the span is sampled.

I had originally made the change to IsSampled that you referenced from that commit, but ended up reverting based on the change to existing behavior. I wanted to limit the changes included in that PR, but forgot to follow up on that. It seems valid that we updated that method with something like the following:

// IsSampled returns if the sampling bit is set in the trace flags.
func (sc SpanContext) IsSampled() bool {
    return (sc.TraceFlags&FlagsSampled == FlagsSampled) || sc.isDebug()
}

The sampling consequences of this need to be fully explored though. Changing the OpenTelemetry implementation to support B3 and causing other sampling decisions for other propagators to change is not desired.

TommyCpp commented 3 years ago

I think W3C TraceContext format only have sampled for now so TraceContext should be good. CorrelationContext doesn't seems to have anything to do with sample decision. So we could probably make the change

mrveera commented 3 years ago

can i pick this one?

MrAlias commented 3 years ago

Digging a bit further into this again I realized I had reverted the original change because the IsSampled method directly states that it just returns if the Sampled bit is set:

https://github.com/open-telemetry/opentelemetry-go/blob/224629bd0ba38bd65f6abd347ddff878e7d7e4dc/api/trace/span_context.go#L194

This is in contrast to the interpreted meaning that it returns if the span is sampled. From this, and based on the reference implementation this was based on ...

https://github.com/openzipkin/zipkin-go/blob/c29478e51bfb2e9c93e0e9f5e015e5993a490399/propagation/b3/spancontext.go#L52-L57

https://github.com/openzipkin/zipkin-go/blob/c29478e51bfb2e9c93e0e9f5e015e5993a490399/propagation/b3/spancontext.go#L164-L165

the sampled bit is not set when the debug flag is set given it's implied state.

This means that to implement the changes this issue suggests we would diverge from the reference implementation and need to change the semantic meaning of the IsSampled method. I do not think we should do the former, but possibly could see the latter making sense.

@veera83372 I would hold off on implementing this until we can resolve a direction.

TommyCpp commented 3 years ago

I was looking at the Java implementation and they set the debug bit and sample bit at the same time when the decision is debug.

https://github.com/openzipkin/brave/blob/34dfe4a3c5f7e5e2712465f0144d79df485375e6/brave/src/main/java/brave/propagation/SamplingFlags.java#L26-L30

I guess the problem is if we change IsSampled implantation, it may cause unexpected behavior with existing code for users. But given we are before GA. breaking changes are kind of expected.

MrAlias commented 3 years ago

I guess the problem is if we change IsSampled implantation, it may cause unexpected behavior with existing code for users.

That is going to be the worry for sure :+1: .

I was looking at the Java implementation and they set the debug bit and sample bit at the same time when the decision is debug.

I'm definitely not an expert in Java, but looking at that link I'm not seeing the sampling flag being set when the debug flag is set, rather the other way around (setting the debug flag when the debug and sampling flag are set).

https://github.com/openzipkin/brave/blob/34dfe4a3c5f7e5e2712465f0144d79df485375e6/brave/src/main/java/brave/propagation/SamplingFlags.java#L29

(Guessing I'm ignorant of something there though)

I think this points out an important thing we should take into consideration in that we probably want to follow the implementation in the language most users of this project will use (i.e. Go). The reason I didn't use the referred library directly was the overhead of translating from the libraries representation to our SpanContext didn't seem worth it. But I think the closer we can stay to that implementation the better chances we will have with compatibility.

I tried looking through the B3 implementation repo as well as the specification for issues and related PRs around this topic but did not find any. @TommyCpp do you think it would be worth opening an issue there to get some guidance on this?

TommyCpp commented 3 years ago

@MrAlias Yes I definitely think we could open an issue in B3 repo to clearify thing a little bit. https://github.com/openzipkin/b3-propagation/issues/43

About the Java implementation/Brave,

setting the debug flag when the debug and sampling flag are set

Yes that was actually my point. In that implementation when flag extracted is debug. The sample bit and debug bit in tracing context will both be set to 1. If the flag extracted is sampled then only sample bit in tracing context will be set to 1.

sample() method will only check if the sample bit is set. So when flag extracted is either debug or sampled, this method will return true

And also you are right we should probably stick to Go version. It's just I find comparing multiple different versions of implementations can be a good way to understand the protocol in general.

MrAlias commented 3 years ago

Looks like https://github.com/openzipkin/b3-propagation/issues/43#issuecomment-698632676 indicates that we should indeed set the sampling bit when debug is set. I'm :+1: for a PR for this at this point. @TommyCpp guessing you were hoping to do this work or would you be okay if someone else wanted to jump on this?

I'm going to migrate this issue to the contrib repo as the B3 propagator now resides there.

TommyCpp commented 3 years ago

@MrAlias Yep i'd love to! Could finish this along with the Jaeger propagator over the weekend.