openzipkin / brave

Java distributed tracing implementation compatible with Zipkin backend services.
Apache License 2.0
2.36k stars 713 forks source link

Preserve MDC when using onLastOperator - webflux #1059

Open tony-clarke-amdocs opened 4 years ago

tony-clarke-amdocs commented 4 years ago

There was a huge performance problem when using sleuth in the reactive stack. See here

This was partially addressed by this

But the downside 1478 of contents of MDC would be lost anytime event loop thread is changed. @marcingrzejszczak suggests that to fix this a change would be required in brave. See here

Rational I don't see how we can use spring cloud sleuth with reactive spring without taking a huge performance hit or having an invalid MDC.

Example Scenario I want to use Spring Cloud Gateway and add custom key/values to MDC.

codefromthecrypt commented 4 years ago

I think this issue needs to become more concrete as it mentions mostly things not in this repository.

tony-clarke-amdocs commented 4 years ago

I hope @marcingrzejszczak can explain since he was the one that said it needed a change in brave :)

marcingrzejszczak commented 4 years ago

I think it comes down to a native Brave, Reactor support and improved context propagation. I can try to patch stuff in Sleuth but we could try with making Brave somehow (I have no idea yet how) provide an option not to use ThreadLocal but Reactor context to pass the tracing context. I don't even know if that's possible though if someone wants to just do tracer.currentSpan() from anywhere in their code.

codefromthecrypt commented 4 years ago

thanks, that's indeed more to the point, I guess something similar to the rxjava context stuff?

codefromthecrypt commented 4 years ago

I will try to help triage this

anuraaga commented 4 years ago

Having trace context in reactor context sounds similar to the Armeria integration

https://github.com/line/armeria/blob/master/brave/src/main/java/com/linecorp/armeria/common/brave/RequestContextCurrentTraceContext.java

codefromthecrypt commented 4 years ago

good point, it is indeed more the other way around (our rxjava being wrapping in a context vs armeria pulling a context)

codefromthecrypt commented 4 years ago

ps the referenced issues want coherence in the MDC state (as driven by brave), which is something we need to look closely at hooks on, if at reactor level.

codefromthecrypt commented 4 years ago

what I mean is that in order to do this we need users to be more aware of the reactor context (which may already be taken care of in webflux).. in fact at least limited to webflux this may be solvable. In armeria it isn't just the context storage but also that some lifecycle/propagation occurs in the instrumentation such that things inherit propertly. We'd probably need to look at that. For starters here's the reactor context docs https://projectreactor.io/docs/core/release/reference/#context

anuraaga commented 4 years ago

Looked at the context API a bit and noticed it's possible to get a context statically (e.g., in Brave's CurrentRequestContext) https://projectreactor.io/docs/core/3.3.0.RC1/api/reactor/core/publisher/Mono.html#subscriberContext--, I couldn't find a way to set it, which is needed when creating a new span. So approach similar to Armeria where CurrentRequestContext reads and stores from reactor's context may not be feasible right now.

codefromthecrypt commented 4 years ago

I'm going to start by refactoring what's in sleuth as it will teach me what functionality is needed, which will hone what we may or may not have to optimize

cambierr commented 3 years ago

Any news here guys ?