Closed 0x006EA1E5 closed 2 years ago
Thanks for creating this issue. I'll try to summarize the current state of things here.
MVC and WebFlux have a different instrumentation and that's intended.
With Spring MVC, we're indeed opening an Observation.Scope
so that the current context is put in the ThreadLocal
.
With Spring WebFlux (and as of #29388) we're not opening a scope, because populating the ThreadLocal
doesn't make sense. Instead, we're setting the current observation in the reactor context so that it can be picked up as a parent observation later for tracing.
Micrometer documents the various instrumentation strategies for reactive libraries. If I'm not mistaken, using the .tap()
or handle()
operators will set things up for you, or you can get the reactor context and set up thread locals yourself, as demonstrated in this micrometer sample.
I think you're conflating observation scopes and traces.
The Observation API provides an abstraction, and ObservationHandler
implementations can react to observations and create metrics, traces as a result. If you're not convinced, you can check out the tracing handler implementation in micrometer. An Observation.Scope
is merely a way to wrap calls and make the tracing context available to that snippet. The Javadoc for openScope()
and parentObservation()
should heopfully clear that up.
We will not open a scope in the reactive Observation filter as it won't solve this problem. Your sample is using a simple, synchronous variant of a Controller handler, but using the same setup and logging from the pipeline (within a deferContextual
for example) will not yield the expected result.
The critical difference here is that executing a Controller handler method in WebFlux can be asynchronous - it sets up the reactive pipeline but nothing gets executed until the subscription happens. In the case of a reactive web application, this subscription happens on any or even multiple server worker threads.
I'm closing this issue as a result, pinging @marcingrzejszczak so that he can correct me if I said something wrong! Thanks!
I can add one more thing that setting scope in reactive (so putting in thread local) and closing out on finally can lead to the situation where observation is put in thread local for thread 1 but it's cleared for thread 2 (cause on finally would be called in thread 2). So we would be polluting thread number 1.
I think you're conflating observation scopes and traces.
Well, this is probably true; I am just trying to learn this stuff now, so apologies if I have misunderstood. :)
I think I follow you now, and see the issue.
I have had another look at this. It seems that it can be made to work using a Reactor Hook
, to call ContextSnapshot.setThreadLocalsFrom
around a Subscriber
's onNext()
:
https://github.com/0x006EA1E5/spring-boot-3-reactive-trace-logging/blob/use-reative-hook/src/main/java/me/geales/spring/trace/logging/ObservationContextSnapshotLifter.java#L54
This seems to work well across multiple execution threads.
Would something like the above be problematic, perhaps from a performance point of view?
The thing with all this is, the initial problem statement remains: users will expect the Observation
stuff to "just work". As it does in MVC, people will want it to work in a WebFlux app, e.g., if something writes a log deep inside the work of processing a request, it should have the trace info available.
It seems to me that expecting users to known about ContextSnapshot.setThreadLocalsFrom
, and paste that boilerplate all over their code, is an unfortunate barrier to adopting "observability" in the Spring WebFlux eco-system 😿
So getting as much of this to work "out of the box" seems like a reasonable thing to expect, at least in Spring Boot.
I have had another look at this. It seems that it can be made to work using a Reactor Hook, to call ContextSnapshot.setThreadLocalsFrom around a Subscriber's onNext(): https://github.com/0x006EA1E5/spring-boot-3-reactive-trace-logging/blob/use-reative-hook/src/main/java/me/geales/spring/trace/logging/ObservationContextSnapshotLifter.java#L54
This seems to work well across multiple execution threads.
Would something like the above be problematic, perhaps from a performance point of view?
The MDC topic is extensively documented in Reactor and your performance concerns have been experienced by others.
The thing with all this is, the initial problem statement remains: users will expect the Observation stuff to "just work". As it does in MVC, people will want it to work in a WebFlux app, e.g., if something writes a log deep inside the work of processing a request, it should have the trace info available.
Taking a step back, this is not really about observability but the fact that MDC is based on ThreadLocal
. We could discuss here the runtime behavior and programming style of reactive and how ThreadLocal
could be replaced in such an environment, but that's partially the goal of the entire context-propagation project. Spring Framework here merely applies the guidance provided by context-propagation, micrometer-observation and reactor-core.
It seems to me that expecting users to known about ContextSnapshot.setThreadLocalsFrom, and paste that boilerplate all over their code, is an unfortunate barrier to adopting "observability" in the Spring WebFlux eco-system 😿
So getting as much of this to work "out of the box" seems like a reasonable thing to expect, at least in Spring Boot.
You can follow the reasoning and multiple solutions considered in reactor/reactor-core#3149.
I understand the appeal of that feature, but I think that making the MDC available everywhere in reactive code is a non-goal in general. This is likely to make the runtime cost higher than just using a Servlet-based approach. Issues related to ThreadLocal
usage are not limited to reactive, but also a well-known design problem for Loom.
Thanks for the detailed reply.
It is interesting that since https://github.com/reactor/reactor-core/pull/3180 it is now the default behaviour to transparently restore the ThreadLocals for the handle
and tap
operators, if the context-propagation
library is available at runtime.
One last question: is it worth considering, in the special case where the handler method returns a non-reactive type, to use the handle
operator instead of Mono.just(result)
?
Something like:
if (adapter == null) {
return Mono.just(args)
.handle((arguments, synchronousSink) -> synchronousSink.next(method.invoke(getBean(), arguments)))
.map(value -> new HandlerResult(this, value, returnType, bindingContext));
}
This then delegates all of the special handling to Reactor.
Sorry, the above suggestion is for org.springframework.web.reactive.result.method.InvocableHandlerMethod#invoke
I don't think we should apply this suggestion as this would only be applied for synchronous method signatures but not on reactive ones. This would confuse developers even more. We should document this properly.
Please have a look at the outcome of the related issue: https://github.com/spring-projects/spring-boot/issues/33372#issuecomment-1431142013
Good morning! I'm running into the similar issue when switching from Spring Cloud Sleuth to Micrometer Tracing. With Sleuth everything just seems to work with Reactor based Spring project. When upgrading our service to Spring Boot 3.1.0 and adding the dependency,
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
the traceId
and spanId
are no logging present in the logs. I've also tried using Hooks.enableAutomaticContextPropagation()
, that didn't work either. So it seems that when updating to Spring Boot 3.x, we lose the ability for distributed tracing. I've read this post and many others that seemingly tracing is not supported using Spring Webflux. If there's an update on when this will be available to Webflux projects that would great to now. Thank you!
@joseph-freeman I'm afraid this is a bit vague. We do have working applications supporting this case. Please create a new issue with a minimal sample application showing the problem.
I'm having exactly the same issue. Using spring boot 3.1.0 with spring-boot-starter-web and micrometer-tracing-bridge-brave I can see the traceId and spanId from the logs. If I switch to spring-boot-starter-webflux the traceId and spanId are gone.
This is the logging pattern for both versions:
logging.pattern.level=%5p [%X{traceId:-},%X{spanId:-}]
Same thing @gianielsevier , this description is not enough for us to help you. Please create a new issue with a sample application showing the problem.
Affects: 6.0.0-RC4
Expected Behaviour When using
spring-boot-starter-webflux
I expect log message written in@RequestMapping
s methods to have thetraceId
andspanId
in the MDC.When using
spring-boot-starter-web
(non-reactive) this is indeed the observed behaviour, but it is not when usingspring-boot-starter-webflux
, as the logger MDC has notraceId
norspanId
properties.Having the
traceId
andspanId
in the MDC is helpful, as it allows correlation of spans to log messages. This is an established use case, see for example the "jump from trace to logs" feature in Grafana Tempo.Observed Behaviour See this repo for example https://github.com/0x006EA1E5/spring-boot-3-reactive-trace-logging
I can see when using
spring-boot-starter-web
that this functions as expected, as the non-reactiveServerHttpObservationFilter
first create, then starts a newObservation
, then wraps the rest of the filter chain in a try-with-resources block that opens anObservation.Scope
.However, using
spring-boot-starter-webflux
, with the reactiveServerHttpObservationFilter
, noObservation.Scope
is opened, although anObservation
is created andstart()
edIt seems odd, that although an
Observation
is created and started, noObservation.Scope
is explicitly opened.I can see that the
Observation.Scope
isAutoCloseable
. Indeed, it appears that if theAutoCloseable
is not closed at the end of a request, then the next request maintains the sametraceId
, which seems obviously wrong.Therefore, it would seem that the
AutoCloseable
should be created as a step in theMono
, and closed in adoFinally()
handler, but I don't understand this well.See this thread for more discussion: https://gitter.im/spring-projects/spring-boot?at=636bb545abfb5860bb0aca48
Alternatively, perhaps when an
Observation
is started, a defaultObservation.Scope
should be started, which should be closed when theObservation
isstop()
ed?