spring-projects-experimental / spring-cloud-sleuth-otel

Spring Cloud Sleuth support for OpenTelemetry
https://spring-projects-experimental.github.io/spring-cloud-sleuth-otel/docs/current/reference/html/
Apache License 2.0
111 stars 33 forks source link

Baggage is not propagated to child span #124

Closed yaoxuwan closed 1 year ago

yaoxuwan commented 2 years ago

Describe the bug Baggage is not propagated to child span. I have been able to get it to work with the default brave instrumentation but it appears that context is lost when call method OtelTracer::nextSpan.

Sample curl http://localhost --header 'x-request-id: 123' image image

marcingrzejszczak commented 2 years ago

Can you create a sample that replicates this? Please use Java and Maven if possible

yaoxuwan commented 2 years ago

@marcingrzejszczak sample

image

ghost commented 2 years ago

I'm seeing the same thing when using an @Async method.

ghost commented 2 years ago

Recreate in https://github.com/borismorris-anaplan/context-lost. Run with gradle run and send a GET to localhost:8080/hello with a header X-My-Baggage: SomeValue

In the logs you will see

{
    "instant": {
        "epochSecond": 1665170988,
        "nanoOfSecond": 800056000
    },
    "thread": "http-nio-8080-exec-1",
    "level": "INFO",
    "loggerName": "com.example.contextlost.HelloController",
    "message": "About to do some service stuff",
    "endOfBatch": false,
    "loggerFqcn": "org.apache.logging.slf4j.Log4jLogger",
    "contextMap": {
        "X-My-Baggage": "SomeValue",
        "spanId": "74a2b07177d603dd",
        "traceId": "6373e78687247e6e71839c26dcc0b369"
    },
    "threadId": 28,
    "threadPriority": 5,
    "timestamp": "1665170988800"
}

Followed by

{
    "instant": {
        "epochSecond": 1665170988,
        "nanoOfSecond": 814273000
    },
    "thread": "ThreadPoolTaskExecutor-1",
    "level": "INFO",
    "loggerName": "com.example.contextlost.MyServiceImpl",
    "message": "Doing some service stuff",
    "endOfBatch": false,
    "loggerFqcn": "org.apache.logging.slf4j.Log4jLogger",
    "contextMap": {
        "spanId": "cbe903b59ebf349f",
        "traceId": "6373e78687247e6e71839c26dcc0b369"
    },
    "threadId": 48,
    "threadPriority": 5,
    "timestamp": "1665170988814"
}

The second message is missing the X-My-Baggage key in the context.


Having debugged a little I can see in org.springframework.cloud.sleuth.instrument.async.TraceRunnable#run the parent has a context of type ArrayBasedContext containing the appropriate baggage, however the created child span has no context at all - it's an empty atomic reference.

marcioemiranda commented 2 years ago

Hello,

I have faced an issue using Sleuth with Otel that may be related to this ticket. We have a custom instrumentation of a messaging library that works fine with Brave but did not work on the consumer side when we enabled the otel bridge.

The relevant code snippet follows:

public Span nextSpan(MessageHeaderAccessor headers, Consumer<Span> spanCustomizer) {

        Span.Builder extracted = propagator.extract(headers, getter); //<-- baggage fields should be extracted from headers into trace context here. Context becomes span builder parent 

        //HACK
        Context extractedContext = this.extractContext(headers, getter);
        extractedContext.makeCurrent();
       //HACK

        Span span = extracted.start();

        Context context = OtelTraceContext.toOtelContext(span.context());
        logger.debug("@@@ Otel context {}",context.toString()); //<-- with the above hack I can see the context with baggage data propagated

        spanCustomizer.accept(span);

        threadLocalSpan.set(span); //<-- with the proper context in span, baggage is propagated to logs

        if (logger.isTraceEnabled()) {
            logger.trace("preHandle newSpan {}", span);
            logger.trace("preHandle currentSpan {}", tracer.currentSpan());
        }

        return span;
    }

Here is the analysis of the problem, starting with the propagator:


@Override
    public <C> Span.Builder extract(C carrier, Getter<C> getter) {
        Context extracted = this.propagator.extract(Context.current(), carrier, new TextMapGetter<C>() {
            @Override
            public Iterable<String> keys(C carrier) {
                return fields();
            }

            @Override
            public String get(C carrier, String key) {
                return getter.get(carrier, key);
            }
        }); //<-- this part works , the context is properly extracted with baggage information

        io.opentelemetry.api.trace.Span span = io.opentelemetry.api.trace.Span.fromContextOrNull(extracted);
        if (span == null || span.equals(io.opentelemetry.api.trace.Span.getInvalid())) {
            return OtelSpanBuilder.fromOtel(tracer.spanBuilder(""));
        }

        return OtelSpanBuilder.fromOtel(this.tracer.spanBuilder("").setParent(extracted)); // <-- let us drill down here
    }

The snippet bellow creates an otel span builder and sets the extracted context as parent, that´s fine

this.tracer.spanBuilder("").setParent(extracted)

The other part just wraps the otel span builder ...

return OtelSpanBuilder.fromOtel(...)

... as we can see in the bridge implementation of SpanBuilder

OtelSpanBuilder(io.opentelemetry.api.trace.SpanBuilder delegate) {
        this.delegate = delegate;
    }

    static Span.Builder fromOtel(io.opentelemetry.api.trace.SpanBuilder builder) {
        return new OtelSpanBuilder(builder);
    }

Now, when we start the span we have the following code from the bridge implementation of SpanBuilder

@Override
    public Span start() {
        io.opentelemetry.api.trace.Span span = this.delegate.startSpan();
        if (StringUtils.hasText(this.name)) {
            span.updateName(this.name);
        }
        if (this.error != null) {
            span.recordException(error);
        }
        this.annotations.forEach(span::addEvent);
        return OtelSpan.fromOtel(span); //<-- this is the problematic part
    }

The code above uses otel span builder (delegate) to start the span, decorate with some calls and then wraps otel´s span with OtelSpan.

Now, let´s take a look into OtelSpan

OtelSpan(io.opentelemetry.api.trace.Span delegate) {
        this.delegate = delegate;
        if (delegate instanceof SpanFromSpanContext) {
            SpanFromSpanContext fromSpanContext = (SpanFromSpanContext) delegate;
            this.context = fromSpanContext.otelTraceContext.context;
        }
        else {
            this.context = new AtomicReference<>(Context.current());
        }
    }

static Span fromOtel(io.opentelemetry.api.trace.Span span) {
        return new OtelSpan(span);
    }

We can see that it creates the wrapper class from the static method passing otel´s span. In the constructor it checks the implementation class, which in this case was SdkSpan, and then sets the context to otel´s Context.current(), but at this point Context.current() was empty.

The hack consisted of extracting the context using otel´s propagator api and making it the current context before starting the span.

public <C> Context extractContext(C carrier, Getter<C> getter) {

        Context extracted = this.propagation.getTextMapPropagator().extract(Context.current(), carrier, new TextMapGetter<C>() {
            @Override
            public Iterable<String> keys(C carrier) {
                return fields();
            }

            @Override
            public String get(@Nullable C carrier, String key) {

                return getter.get(carrier, key);
            }
        });

        return extracted;
    } 

Hope that helps ...

CarlosAFroes commented 2 years ago

... and it did help :) @marcioemiranda

I also had the same issue and marcioemiranda's investigation saved me precious time. Although the baggage propagation works fine with this 'hack' are we getting a fix for it in the future?

thx

marcingrzejszczak commented 2 years ago

We will - I have it on my radar