open-telemetry / opentelemetry-java

OpenTelemetry Java SDK
https://opentelemetry.io
Apache License 2.0
1.96k stars 806 forks source link

B3 Propagator can't properly handle missing `X-B3-Sampled` header #3517

Open jkwatson opened 3 years ago

jkwatson commented 3 years ago

If the X-B3-Sampled header is missing, the propagator is considering this as "not sampled" when creating the resulting TraceFlags. This will cause Samplers to then consider the parent as "not sampled" when actually B3 says that this should "leave the decision to the server".

I think this might be a bigger issue, however. When propagation is happening in Otel, we only have 2 options: sampled & not-sampled. We don't have a value that represents "you decide".

This might be an issue for the specs and interop with B3.

flangknecht commented 2 years ago

Hi @jkwatson, there's another issue with B3 propagation and the X-B3-Sampled header: In the OpenZipkin B3 Propagation Spec it says

Sampling state is almost always sent with Identifiers, but it can be sent alone (a predefined decision).

With the OTel Java SDK, the sampling decision is only evaluated if both Trace and Span IDs can be extracted from B3 headers. Thus, OTel-Java is oblivious to external sampling decisions from systems that do not themselves partake in tracing.

https://github.com/open-telemetry/opentelemetry-java/blob/3518c98c457516374f1f5f3eacc0d7133cd67908/extensions/trace-propagators/src/main/java/io/opentelemetry/extension/trace/propagation/B3PropagatorExtractorMultipleHeaders.java#L33-L59

The main use case where I would like to only send the X-B3-Sampled header is to suppress trace creation for monitoring endpoints (think Kubernetes liveness and readiness probes). For now, we're able to work around this by sending 0-padded 1 trace and span IDs (traceid: 00000000000000000000000000000001, spanid: 0000000000000001).

anuraaga commented 2 years ago

Thanks for bringing this up @flangknecht - I agree that we should respect the sampling-decision-only case. Unfortunately our code is quite entangled to the invalid span == no parent span, starting a new trace, so using 0 IDs to represent this would be tough in the SDK too. The only approach I can think of that doesn't cause these spans to participate in other areas, for example logs / exemplar injection (by using valid trace IDs like 000001) is to have a special SpanContext that is represented as invalid, except when making a parenting decision in SdkSpanBuilder here

https://github.com/open-telemetry/opentelemetry-java/blob/main/sdk/trace/src/main/java/io/opentelemetry/sdk/trace/SdkSpanBuilder.java#L173

Does anyone else have any ideas on this?

anuraaga commented 2 years ago

Well, exemplar I think already checks whether a span is sampled or not I think. Perhaps it's our logs injection logic which is a bit off, should we be checking sampled instead of invalid?

https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/log4j/log4j-context-data/log4j-context-data-2.16/library-autoconfigure/src/main/java/io/opentelemetry/instrumentation/log4j/v2_16/OpenTelemetryContextDataProvider.java#L34

jkwatson commented 2 years ago

As the original issue stated, I think this is a gap in otel's general interop with B3, and needs to be sorted in the specification, so it can be done uniformly across all the languages.

anuraaga commented 2 years ago

I think a missing header delegating a sampling decision is a totally foreign concept and definitely needs to be in the spec.

An explicit zero is just a compression mechanism though so it seems like it should be simpler. But I guess it's true the main trickiness is treating no span in context identically to invalid span in context, and guess we need to see how to deal with that

dipsk2 commented 10 months ago

We hit the same issue.

From the library's point of view, can you atleast provide a way for users to over-ride the default behavior (of setting it to 0 = not sampled), so that we can set it to an appropriate value ourselves that fits our infra setup. I think it will be very helpful for people inter-operating (and transitioning slowly from b3 -> w3c) and solve compatibility issues in a company with tons of microservices on different versions of spring-boot for e.g.

E.g. the proxies usually generates the span/trace ids, but does not send sampling flag to the server. In this case, since a proxy is a given for us, this results into NO tracing at all for us (by default) when we want to support B3 in the input.

I have created an issue in spring-boot project which also has a sample project for re-producing the issue - if needed. https://github.com/spring-projects/spring-boot/issues/37994