spring-projects / spring-boot

Spring Boot
https://spring.io/projects/spring-boot
Apache License 2.0
74.58k stars 40.55k forks source link

Spring Boot 3 Webflux project missing traceId and spanId in logs #33372

Closed sanderino666 closed 1 year ago

sanderino666 commented 1 year ago

When upgrading to Spring Boot 3 (and Micrometer) I noticed the traceId and spanId are not added to the logs when it's a Webflux project.

Webflux: https://github.com/sanderino666/webflux-micrometer-logs

2022-11-26T17:27:44.991+01:00 INFO [,,] 19376 --- [ctor-http-nio-3] com.example.demo.DemoApplication : home() has been called

MVC: https://github.com/sanderino666/mvc-micrometer-logs

2022-11-26T17:24:12.050+01:00 INFO [,63823dabe8475ad8d31ec1fc81853dcc,d31ec1fc81853dcc] 1848 --- [nio-8080-exec-1] com.example.demomvc.DemoMvcApplication : home() has been called

bclozel commented 1 year ago

This is the expected behavior. See https://github.com/spring-projects/spring-framework/issues/29466

I'm keeping this opened for now as we need to find the best place to document this.

sanderino666 commented 1 year ago

Interesting, we are switching from Spring Cloud Sleuth to Micrometer and with SCS this all worked just like it did in a MVC app. Was that sheer luck?

I am trying to understand the answers spring-projects/spring-framework#29466, does it mean we have to wrap each time we want pass the traceId and spanId to the logs?

codependent commented 1 year ago

I was also surprised when upgraded and didn't see the traceId/spanId in the logs anymore. It'd be really helpful to document the change of behaviour and how to make it work in webflux applications.

codependent commented 1 year ago

I guess we are expected to open the scope using the observation located in the context, aren't we?

This allowed the creation of the traceId/spanId. However the WebClient doesn't seem to be propagating the correlation headers. Could you point at how we can achieve that? I think Sleuth automatically instrumented the WebClient, is there an equivalent with Micrometer Tracing or we should implement it by ourselves?

    @GetMapping("/hello")
    fun hello(): Mono<String> {
        return webClient.get().uri("http://localhost:7654/helloWc")
            .retrieve()
            .bodyToMono(String::class.java)
            .doOnNext {
                logger.info("Retrieved hello")
            }
            .contextWrite{ context: Context ->
                val observation = context.get<Observation>(ObservationThreadLocalAccessor.KEY)
                observation.openScope()
                logger.info("Generated traceId/spanId")
                context
            }
    }

    @GetMapping("/helloWc")
    fun helloWc(): Mono<String> {
        return Mono.just("HelloWc")
            .map {
                logger.info("HelloWc map!")
                it
            }
            .contextWrite{ context: Context ->
                val observation = context.get<Observation>(ObservationThreadLocalAccessor.KEY)
                observation.openScope()
                logger.info("Propagated traceId - generated spanId")
                context
            }
    }
2022-11-28T23:34:15.761+01:00  INFO [server,638537626c6674288df2d3f47ac89a95,8df2d3f47ac89a95] 15811 --- [ctor-http-nio-2] d.DemotracingApplication$$SpringCGLIB$$0 : Generated traceId/spanId
2022-11-28T23:34:20.625+01:00  INFO [server,6385376b55b888c43c9ede86ca645b6f,3c9ede86ca645b6f] 15811 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : Propagated traceId - generated spanId
2022-11-28T23:34:22.242+01:00  INFO [server,6385376b55b888c43c9ede86ca645b6f,3c9ede86ca645b6f] 15811 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : HelloWc map!
2022-11-28T23:34:23.074+01:00  INFO [server,638537626c6674288df2d3f47ac89a95,8df2d3f47ac89a95] 15811 --- [ctor-http-nio-2] d.DemotracingApplication$$SpringCGLIB$$0 : Retrieved hello

Notice the Propagated traceId - generated spanId has a new traceId (6385376b55b888c43c9ede86ca645b6f) instead of the original one (638537626c6674288df2d3f47ac89a95)

wilkinsona commented 1 year ago

/cc @marcingrzejszczak

marcingrzejszczak commented 1 year ago

WebClient automatically injects trace context to the headers - https://github.com/spring-projects/spring-framework/blob/main/spring-webflux/src/test/java/org/springframework/web/reactive/function/client/WebClientObservationTests.java . We have also standalone samples for that here https://github.com/micrometer-metrics/micrometer-samples/tree/main/webclient

I guess we are expected to open the scope using the observation located in the context, aren't we?

You don't have to because you will pollute the threads. What we do is we require the ObservationThreadLocalAccessor.KEY to be there in Reactor Context and then we manually correlate observations - https://github.com/spring-projects/spring-framework/blob/main/spring-webflux/src/main/java/org/springframework/web/reactive/function/client/DefaultWebClient.java#L464

Have you created the WebFilter as a bean like this? https://github.com/micrometer-metrics/micrometer-samples/blob/main/webclient/src/main/java/com/example/micrometer/WebClientApplication.java#L48 Maybe your ObservationRegistry wasn't injected. For sure you shouldn't open scopes as you showed that in the sample.

marcingrzejszczak commented 1 year ago

I am trying to understand the answers https://github.com/spring-projects/spring-framework/issues/29466, does it mean we have to wrap each time we want pass the traceId and spanId to the logs?

For now yes, you can read more about it here https://micrometer.io/docs/observation#instrumentation_of_reactive_libraries

codependent commented 1 year ago

@marcingrzejszczak thanks for the clarification, I'll go over the references you provided.

codependent commented 1 year ago

@marcingrzejszczak

I've updated the sample, now it looks like this:

@RestController
@SpringBootApplication
class DemotracingApplication(
    private val webClient: WebClient,
    private val observationRegistry: ObservationRegistry,
    private val tracer: Tracer
) {

    private val logger = LoggerFactory.getLogger(javaClass)

    @GetMapping("/hello")
    fun hello(): Mono<String> {
        val observation: Observation = Observation.start("webclient-sample", observationRegistry)
        return Mono.just(observation).flatMap { span ->
            observation.scoped {
                logger.info("Hello!")
            }
            webClient.get().uri("http://localhost:7654/helloWc")
                .retrieve()
                .bodyToMono(String::class.java)
                .doOnNext {
                    observation.scoped {
                        logger.info("Retrieved hello")
                    }
                }
        }.doFinally { signalType ->
            observation.stop()
        }.contextWrite { it.put(ObservationThreadLocalAccessor.KEY, observation) }
    }

    @GetMapping("/helloWc")
    fun helloWc(): Mono<String> {
        val observation: Observation = Observation.start("webclient-samplewc", observationRegistry)
        observation.scoped {
            logger.info("HelloWc!")
        }
        return Mono.just("HelloWc")
            .map {
                observation.scoped {
                    logger.info("HelloWc map!")
                }
                it
            }
    }
}

The WebClient is defined as a bean:

@Configuration
internal class WebClientConfiguration {

    @Bean
    fun webClient (): WebClient {
        return WebClient.builder().build()
    }
}

I can see the generated traceId/spanId in the observation.scoped block logs. However, the service that is invoked by the WebClient still shows a different traceId:

2022-11-29T17:05:58.008+01:00  INFO [server,63862de503de9741d4bf5e608895f0b6,d4bf5e608895f0b6] 19035 --- [ctor-http-nio-2] d.DemotracingApplication$$SpringCGLIB$$0 : Hello!
2022-11-29T17:05:58.243+01:00  INFO [server,63862de60d0ff4fcb42ff242ce42f22c,b42ff242ce42f22c] 19035 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : HelloWc!
2022-11-29T17:05:58.244+01:00  INFO [server,63862de60d0ff4fcb42ff242ce42f22c,b42ff242ce42f22c] 19035 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : HelloWc map!
2022-11-29T17:05:58.265+01:00  INFO [server,63862de503de9741d4bf5e608895f0b6,d4bf5e608895f0b6] 19035 --- [ctor-http-nio-2] d.DemotracingApplication$$SpringCGLIB$$0 : Retrieved hello

What am I missing?

wilkinsona commented 1 year ago

@codependent You should inject the auto-configured WebClient.Builder rather than creating a new one and then call build() on it to create your WebClient bean. Using the auto-configured builder ensures that the client's instrumented for observation.

codependent commented 1 year ago

@wilkinsona How strange, I updated it to use the WebClient.Builder but still see that requests aren't being correlated.

The sample project I'm using is available here in case you can have a look a it: https://github.com/codependent/demotracing

@Configuration
internal class WebClientConfiguration {

    @Bean
    fun webClient (webClientBuilder: WebClient.Builder): WebClient {
        return webClientBuilder.build()
    }
}
2022-11-29T17:51:10.870+01:00  INFO [server,6386387e6592debc20bd5a56f3a991a9,20bd5a56f3a991a9] 19544 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : Hello!
2022-11-29T17:51:10.969+01:00  INFO [server,6386387ed0cd998c9e9e1aa80a2fd1ea,9e9e1aa80a2fd1ea] 19544 --- [ctor-http-nio-4] d.DemotracingApplication$$SpringCGLIB$$0 : HelloWc!
2022-11-29T17:51:10.970+01:00  INFO [server,6386387ed0cd998c9e9e1aa80a2fd1ea,9e9e1aa80a2fd1ea] 19544 --- [ctor-http-nio-4] d.DemotracingApplication$$SpringCGLIB$$0 : HelloWc map!
2022-11-29T17:51:10.994+01:00  INFO [server,6386387e6592debc20bd5a56f3a991a9,20bd5a56f3a991a9] 19544 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : Retrieved hello
codependent commented 1 year ago

I had a look at the Webflux sample and adapted the code:

    @GetMapping("/hello")
    fun hello(): Mono<String> {
        return Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
                .use { scope: ContextSnapshot.Scope ->
                    val traceId = tracer.currentSpan()!!.context().traceId()
                    logger.info("<ACCEPTANCE_TEST> <TRACE:{}> Hello!", traceId)
                    webClient.get().uri("http://localhost:7654/helloWc")
                        .retrieve()
                        .bodyToMono(String::class.java)
                        .doOnNext {
                            logger.info("Retrieved helloWc {}", it)
                        }
                }
        }
    }

    @GetMapping("/helloWc")
    fun helloWc(): Mono<String> {
        return Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
                .use { scope: ContextSnapshot.Scope ->
                    val traceId = tracer.currentSpan()!!.context().traceId()
                    logger.info("<ACCEPTANCE_TEST> <TRACE:{}> HelloWc", traceId)
                    Mono.just(traceId)
            }
        }
    }

Now I can see that the log correlation almost works as expected:

2022-11-29T22:08:33.824+01:00  INFO [server,638674d1a9fb39a38867f5b4c3bbe2ec,8867f5b4c3bbe2ec] 20745 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:638674d1a9fb39a38867f5b4c3bbe2ec> Hello!
2022-11-29T22:08:33.938+01:00  INFO [server,638674d1a9fb39a38867f5b4c3bbe2ec,49ff37d958d5d315] 20745 --- [ctor-http-nio-4] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:638674d1a9fb39a38867f5b4c3bbe2ec> HelloWc
2022-11-29T22:08:33.960+01:00  INFO [server,,] 20745 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : Retrieved helloWc io.micrometer.context.DefaultContextSnapshot$DefaultScope@7739ec0c

If you look at the last line of the log, that correspondes to the doOnNext block, it doesn't show the traceId/spanId.

Modifying the code to include another nested ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY) call inside the ´doOnNext{}´, the traceId/spanId are shown:

    @GetMapping("/hello")
    fun hello(): Mono<String> {
        return Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
                .use { scope: ContextSnapshot.Scope ->
                    val traceId = tracer.currentSpan()!!.context().traceId()
                    logger.info("<ACCEPTANCE_TEST> <TRACE:{}> Hello!", traceId)
                    webClient.get().uri("http://localhost:7654/helloWc")
                        .retrieve()
                        .bodyToMono(String::class.java)
                        .doOnNext {
                            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
                                .use { scope: ContextSnapshot.Scope ->
                                    logger.info("Retrieved helloWc {}", it)
                                }
                        }
                }
        }
    }
2022-11-29T22:12:58.103+01:00  INFO [server,638675da838c746176c5388aedc7127f,76c5388aedc7127f] 20849 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:638675da838c746176c5388aedc7127f> Hello!
2022-11-29T22:12:58.215+01:00  INFO [server,638675da838c746176c5388aedc7127f,3fd552c513cb339b] 20849 --- [ctor-http-nio-4] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:638675da838c746176c5388aedc7127f> HelloWc
2022-11-29T22:12:58.237+01:00  INFO [server,638675da838c746176c5388aedc7127f,76c5388aedc7127f] 20849 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : Retrieved helloWc 638675da838c746176c5388aedc7127f

It seems the change from Sleuth to Micrometer Tracing is going to need quite a lot of work on our codebases. Can you confirm this is the proper usage?

BTW, thanks for your answers.

marcingrzejszczak commented 1 year ago

It seems the change from Sleuth to Micrometer Tracing is going to need quite a lot of work on our codebases. Can you confirm this is the proper usage?

For now, if you want to have the loggers have correlated MDC context, then yes, it will require some work. Because you could instead of doing doOnNext just call tap() or handle() and then you'll get the MDC correlated for you.

I'm in touch with @OlegDokuka and the reactor team to work on something more automated that would resemble how Sleuth worked but would be more performant and less buggy. But that will take some time.

codependent commented 1 year ago

It seems the change from Sleuth to Micrometer Tracing is going to need quite a lot of work on our codebases. Can you confirm this is the proper usage?

For now, if you want to have the loggers have correlated MDC context, then yes, it will require some work. Because you could instead of doing doOnNext just call tap() or handle() and then you'll get the MDC correlated for you.

I'm in touch with @OlegDokuka and the reactor team to work on something more automated that would resemble how Sleuth worked but would be more performant and less buggy. But that will take some time.

Great to know!

One last thing, it'd be good to document how to proceed with Kotlin coroutines as well. This approach doesn't look quite idiomatic, is there a better (more idiomatic) way to do the MDC setup in this case?

    @PostMapping
    @ResponseStatus(CREATED)
    suspend fun registerMetric(
        Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
            logger.info("registerMetric()")
            metricEventProducer.send(authentication.clientId, metric).toMono()
        }.awaitSingle()
    }
marcingrzejszczak commented 1 year ago

Hmm theoretically if you have micrometer-core on the classpath it should work out of the box? (https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/kotlin/io/micrometer/core/instrument/kotlin/AsContextElement.kt)

codependent commented 1 year ago

@marcingrzejszczak micrometer-core is actually on the classpath, I'm setting the context using asContextElement:

class DemotracingApplication(
    private val webClient: WebClient,
    private val tracer: Tracer,
    private val observationRegistry: ObservationRegistry
) {

    private val logger = LoggerFactory.getLogger(javaClass)

    @GetMapping("/hello")
    suspend fun hello(): String {
        val context = observationRegistry.asContextElement()
        return withContext(context) {
            val traceId = tracer.currentSpan()?.context()?.traceId()
            logger.info("<ACCEPTANCE_TEST> <TRACE:{}> Hello!", traceId)
            val response: String = webClient.get().uri("http://localhost:7654/helloWc")
                .retrieve()
                .bodyToMono(String::class.java)
                .awaitSingle()
            response
        }

    }

    @GetMapping("/helloWc")
    suspend fun helloWc(): String {
        val context = observationRegistry.asContextElement()
        return withContext(context) {
            val traceId = tracer.currentSpan()?.context()?.traceId()
            logger.info("<ACCEPTANCE_TEST> <TRACE:{}> HelloWc", traceId)
            "Hi there"
        }
    }

}

Unfortunately MDC doesn't seem to be set and traceId is null:

2022-11-30T12:41:30.734+01:00  INFO [server,,] 23335 --- [ctor-http-nio-3] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:null> Hello!
2022-11-30T12:45:08.144+01:00  INFO [server,,] 23335 --- [ctor-http-nio-4] d.DemotracingApplication$$SpringCGLIB$$0 : <ACCEPTANCE_TEST> <TRACE:null> HelloWc
marcingrzejszczak commented 1 year ago

I'm not an expert with kotlin - can you debug that the kotlin classes in micrometer do actually get called?

codependent commented 1 year ago

What I've seen so far is:

@GetMapping("/hello")
suspend fun hello(): String {
    val context: CoroutineContext = observationRegistry.asContextElement()
    return withContext(context) {

When withContext is invoked it ends up calling KotlinObservationContextElement.updateThreadContext():

package io.micrometer.core.instrument.kotlin;

class KotlinObservationContextElement implements ThreadContextElement<ContextSnapshot.Scope> {

    @Override
    @SuppressWarnings("MustBeClosedChecker")
    public ContextSnapshot.Scope updateThreadContext(CoroutineContext coroutineContext) {
        return this.contextSnapshot.setThreadLocals(ObservationThreadLocalAccessor.KEY::equals);
    }
    public Scope setThreadLocals(Predicate<Object> keyPredicate) {
        Map<Object, Object> previousValues = null;
        for (ThreadLocalAccessor<?> accessor : this.contextRegistry.getThreadLocalAccessors()) {

Here the this.contextRegistry.getThreadLocalAccessors() is an empty Collection.

Unlike the Reactive sample, with coroutines ObservationThreadLocalAccessor.setValue(Observation value) is never called

    @Override
    public void setValue(Observation value) {
        Observation.Scope scope = value.openScope();
        scopes.set(scope);
    }
rp199 commented 1 year ago

@codependent I'm also interested in getting this to work in a webflux+coroutine environment and I think I know why you sample code doesn't work.

As far as I've understood, the observationRegistry.asContextElement() relies on the ThreadLocal to load the ContextSnapshot (not 100% sure what is this, but I understand that it's where the "observation settings and values" for the given span are "stored"). In your example, the ThreadLocal would be empty as there's nothing that configures/decorates for that request (unless something leaks from other requests ofc) . I guess this extensions was intended to be used on the 1 thread per request model, where the ThreadLocal can be (most of the times) trusted.

As we are in the "reactor" world here, we need to have a way of building this context element from the reactor context instead. I managed to get it to work by using the following snippet on filter (I'm using functional endpoints):

val coContextElement = ContextSnapshot.setThreadLocalsFrom(
                coroutineContext[ReactorContext]!!.context,
                ObservationThreadLocalAccessor.KEY
            ).use {
                observationRegistry.asContextElement()
            }

withContext(coContextElement){
// Everything should work just fine inside this scope
}

I've quickly tested this and it seems to be working fine, I'm able to log with all MDC fields, including trace and span id.

Anyway, I think there could be a simpler way to achieve this, as this is verbose and also feels that is doing more steps than needed:

I'm deducting this based on my understanding of coroutines and reactor contexts, I don't understand much what happens behind the scenes on the tracing part, so be aware that I could be assuming wrong things here.

It would be nice to have a better way to achieve here. Maybe an extension that could build the context element directly from the reactor context?

Would it be also nice it we could manage to get the traceId/spanId and any other baggage fields from the reactor context directly, instead of having to having to inject a tracer and doing these workarounds (on spring sleuth, it was possible to set and extract baggage values purely on the reactor context, here I couldn't find a direct way).

codependent commented 1 year ago

@rp199 you nailed it!

I agree with you, things were quite easier with Sleuth, where all the instrumentation was completely transparent from our code. This new approach feels kind of invasive but I guess there must be good reasons (performance, bugs...) for it.

As you mentioned it'd be nice to have a simpler way to deal with the obvervations and context propagation.

akoufa commented 1 year ago

Hello to all. I have created the following helper (still hacky code, will improve it later)

suspend fun Logger.debugAsync(message: String) { Mono.just(Unit).handle { t, u -> error(message) u.next(t) }.awaitSingleOrNull()

This works and I am seeing traceId and spanId in the logs.

Also using the io.micrometer.context-propagation the traceId is successfully passed to the other microservices

codependent commented 1 year ago

@rp199 you mentioned that you applied it using a filter, I guess you meant a HandlerFilterFunction implementation, right? Would you mind showing some code of the function|handler|filter?

I was wondering if there could be a way to move up that configuration to a traditional WebFilter and came across two problems

  1. Logger invocations inside the filter aren't being correlated
@Component
class ObservationContextPropagationFilter: WebFilter {

    private val logger = LoggerFactory.getLogger(javaClass)
    override fun filter(exchange: ServerWebExchange, chain: WebFilterChain): Mono<Void> {
        return Mono.deferContextual { contextView: ContextView ->
            ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY)
                .use { scope: ContextSnapshot.Scope ->
                    logger.info("Filter")
                    chain.filter(exchange)
                }
        }
    }
}
2022-12-02T11:15:40.322+01:00  INFO [server,,] 34193 --- [ctor-http-nio-3] .e.d.ObservationContextPropagationFilter : Filter
  1. It doesn't seem to be possible to set the coroutine context from a WebFilter and propagate it downstream.

I'm scraching my head to see if I can come up with a way to configure the appropriate context without having to modify lots of places in a huge codebase.

@akoufa That's a nice workaround, the only downside is that we would need to wrap quite a few methods, for different log levels and number of arguments:

suspend fun Logger.observedInfo(message: String) {
    observedInfoGen { info(message) }
}

suspend fun Logger.observedInfo(message: String, arg: Any) {
    observedInfoGen { info(message, arg) }
}

suspend fun Logger.observedInfo(message: String, arg: Any, arg2: Any) {
    observedInfoGen{ info(message, arg, arg2) }
}

private suspend fun observedInfoGen(f: () -> Any?) {
    Mono.just(Unit)
        .handle { t: Unit, u: SynchronousSink<Any> ->
            f()
            u.next(t)
        }.awaitSingleOrNull()
}

Besides, when having multiple calls...

logger.observedInfo("Hey")
...
logger.observedInfo("My message is {}", "Hey")

... each one of them sets up the current Observation into the context. I wonder if it might be less performant that setting it up only once per request using withContext(coContextElement).

akoufa commented 1 year ago

@codependent Yes unfortunately my solution has a downside to adapt every log statement or at least the important ones like debug, error needed for log correlation when debugging. But I did not found any other less invasive solution yet.

I think you just need to add .contextCapture() . It work's without it but according to the docs they are adding it after the handle or tap . But I did not figure out yet why it seems to work without it. Maybe @marcingrzejszczak can help here

marcingrzejszczak commented 1 year ago

I think you just need to add .contextCapture() . It work's without it but according to the docs they are adding it after the handle or tap . But I did not figure out yet why it seems to work without it. Maybe @marcingrzejszczak can help here

You don't need to use contextCapture if the reactor context already contains all the necessary information (like the ObservationThreadLocalAccessor.KEY entry)

bclozel commented 1 year ago

From a Spring Boot perspective, I think we're missing a NOTE in our documentation (and possibly our migration guide) about this. I'm turning this issue into a documentation one.

@marcingrzejszczak is there a page we could point people to about reactive+tracing+MDC logging specifically? The context propagation page seems too generic IMO; I know that work is being done to improve support here, so we should have a resource that explains the current situation somewhere. What do you think?

akoufa commented 1 year ago

@marcingrzejszczak Ok I don't understand then, why if I did not manually put ObservationThreadLocalAccessor.KEY entry to the Reactor Context, I see trace id and span id in the logs using the handle approache I posted in a previous comment. Does Spring Boot or any other of the micrometer libs put this information in the Reactor Context?

marcingrzejszczak commented 1 year ago

@marcingrzejszczak Ok I don't understand then, why if I did not manually put ObservationThreadLocalAccessor.KEY entry to the Reactor Context, I see trace id and span id in the logs using the handle approache I posted in a previous comment. Does Spring Boot or any other of the micrometer libs put this information in the Reactor Context?

Yes, Framework does it here https://github.com/spring-projects/spring-framework/blob/main/spring-web/src/main/java/org/springframework/web/filter/reactive/ServerHttpObservationFilter.java#L123

akoufa commented 1 year ago

@marcingrzejszczak Ok thank you. Last question: So this means that I can use the function:

suspend fun Logger.d(message: String) {
    Mono.just(Unit).handle { t, u ->
        debug(message)
        u.next(t)
    }.awaitSingleOrNull()
}

in any place in my Spring Boot code and the Reactor context will have the correct values so that trace and span id will be included in the logs?

rp199 commented 1 year ago

@rp199 you mentioned that you applied it using a filter, I guess you meant a HandlerFilterFunction implementation, right? Would you mind showing some code of the function|handler|filter?

@codependent I'm using the coRouter to build my routes and filters, so that means that I could use suspending functions directly there. But, it turns out that I made a wrong assumption there and I'm afraid that the it's not possible to achieve this via filter.

What I tried:

    @Bean
    fun router() = coRouter {
        GET("/test") {
            logger.info("Hello from test") // traceId and spanId missing in the logs. coroutineContext.currentObservation() returns null
            ServerResponse.ok().buildAndAwait()
        }
        filter { serverRequest, next ->
            val r = ContextSnapshot.setThreadLocalsFrom(
                coroutineContext[ReactorContext]!!.context,
                ObservationThreadLocalAccessor.KEY
            ).use {
                observationRegistry.asContextElement()
            }
            withContext(r) {
                logger.info("Hello from filter") // traceId and spanId set in the logs. coroutineContext.currentObservation() is present 
                next(serverRequest)
            }
        }
    }

Within the withContext block, you can launch several coroutines and change dispatchers, and the context information will be always there, and logs work just fine. BUT on the routes itself, this context element will be missing and the coroutineContext.currentObservation() will always return null.

The root cause of this issue is on the bridge between reactor and coroutines... The strategy used across the project it's to wrap the suspending functions with:

            mono(Dispatchers.Unconfined) {
                             suspendFun()
            }

This is also used to build the handler functions on the routes (org.springframework.web.reactive.function.server.CoRouterFunctionDsl):

    private fun asHandlerFunction(init: suspend (ServerRequest) -> ServerResponse) = HandlerFunction {
        mono(Dispatchers.Unconfined) {
            init(it)
        }
    }

The mono function internally copies the reactor context into the coroutine context, but unfortunately any other coroutine context defined in the filter is lost. I was assuming that this was taken care behind the scenes, but I was wrong. As per the current implementation of these functions, I see no obvious easy way to change in order to guarantee that any coroutine context defined on the filter is propagated to the child coroutines.

It seems to be changing back and forth between "reactor" and "coroutine" world, I see suspending functions being wrapped with monos in several places, which mean that even if the filter would accept a coroutine context, that could be lost in the next step.

TL;DR - Any custom coroutineContext defined in the filter is not propagated to the routes.

I think the workaround proposed by @akoufa it's still the best option as of now to ensure the MDC logging. We are using something similar in production projects even with spring sleuth as we heavily rely on coroutines, and if we don't do that the mdc context is easily lost or buggy, due to hopping between threads (e.g launching coroutines in different dispatchers)

I'm still interested into having another look at the CoRouterFunctionDsl to see if there could be a way to ensure that the filter coroutineContext could be propagated down the wire, will try it later.

akoufa commented 1 year ago

The cumbersome solution I have proposed above seems to be working, albeit only with the Otel implementation of the Tracing lib as with Brave I was getting for all the requests the same traceId.

I continued on this path and I have created the following:

class CustomThreadLocalAccessor : ThreadLocalAccessor<CustomContext> {
    override fun key() = CUSTOM_CONTEXT_KEY

    override fun getValue() = null

    override fun reset() {
        MDC.remove(X_REQUEST_ID)
    }

    override fun setValue(value: MsgContext) {
        MDC.put(X_REQUEST_ID, value.requestId)
    }
}

Then I only have to put the X_REQUEST_ID to the Reactor Context in a contextWrite function and the X_REQUEST_ID will be shown in the logs correctly.

After researching though I found out that the Slf4JBaggageEventListener would do the same for me if I had included the X_REQUEST_ID as a baggage. But unfortunately baggage seems to not work correctly. I also did not found anything in the documentation. How should I add values from headers to baggage in a Spring Boot Webflux application. Defining remote and corellation fields in the application properties does not work.

@marcingrzejszczak @bclozel Do you maybe have any idea on this? Thanks

marcingrzejszczak commented 1 year ago

Baggage with OTel tracing could have some bugs, we're planning to fix them in 1.0.1 (https://github.com/micrometer-metrics/tracing/milestone/11?closed=1)

marcingrzejszczak commented 1 year ago

Can you update your code to tracing 1.0.1 and see if things got fixed ?

akoufa commented 1 year ago

@marcingrzejszczak Thanks for the great work. I have updated to Spring Boot 3.0.2 which contains tracing 1.0.1 and seems to work. I will check it out futher thoroughly and will come back with a final confirmation the next days.

marcingrzejszczak commented 1 year ago

Cool, so let's close this issue then

akoufa commented 1 year ago

@marcingrzejszczak The inclusion of the baggage in the logs does work as expected now. But propagating the baggage to other services does not work yet. We have to still add a request filter to our webclients and add the values ourselves to the headers

bclozel commented 1 year ago

@akoufa are you building your WebClient instances by using the WebClient.Builder bean auto-configured by Spring Boot? Without that, the ObservationRegistry is not configured on the client.

Something like this:

@Component
class MyService {

  private final WebClient client;

  public MyService(WebClient.Builder builder) {
    this.client = builder.build();
  }

  public void service() {
    // using this.client
  }

}
singhbaljit commented 1 year ago

Is there a way to change the header key for the traceId when making a WebClient request? I'm making a call to a third-party APIs that only accepts X-Correlation-ID.

Also, what is the best way to extract the traceId from the Reactor context (or ServerWebExchange)? I can propagate that header myself using ExchangeFilterFunction. But I would also like to access the traceId for additional processing.

akoufa commented 1 year ago

@bclozel Yes I am using the autoconfigured builder like this:

@Bean
fun webclient(httpClient: HttpClient, webClientBuilder: Builder) = webClientBuilder
    .clientConnector(ReactorClientHttpConnector(httpClient))
    .filter(requestExchangeFilterFunction)
    .build()

and actually the traceparent is successfully passed to the next microservices. Only the baggage is missing and not passed through automatically.

jonathannaguin commented 1 year ago

I am also a bit lost here... I went throught the migration guides and I can't honestly make Spring Webflux work with tracing as I used to with Sleuth.

marcingrzejszczak commented 1 year ago

It won't work at this point the way it worked with Sleuth. We're working on improving that. Currently you need to use tap, handle or ContextSnapshot API directly https://micrometer.io/docs/observation#instrumentation_of_reactive_libraries

jonathannaguin commented 1 year ago

@marcingrzejszczak I also tried to use that manual instrumentation however that seems to require using the following construct in every class:

return Mono.deferContextual(contextView -> {
            try (ContextSnapshot.Scope scope = ContextSnapshot.setThreadLocalsFrom(contextView, ObservationThreadLocalAccessor.KEY))
            {}
        });

I tried to put this logic on the @RestController layer but any subsequent call drops the context.

Wondering if it is worth resurrecting the Spring Sleuth logic for Reactor but with micrometer in mind.

marcingrzejszczak commented 1 year ago

@marcingrzejszczak I also tried to use that manual instrumentation however that seems to require using the following construct in every class:

Yes, that would be required.

Wondering if it is worth resurrecting the Spring Sleuth logic for Reactor but with micrometer in mind.

We are working on that :)

jonathannaguin commented 1 year ago

@marcingrzejszczak do we have a release plan for that?

Anonymous-Coward commented 1 year ago

@marcingrzejszczak We are working on that :)

I just hope you move the logic that was there in Sleuth to Micrometer, rather than resurrecting the Sleuth dependency. Wouldn't be fun to have to reconfigure dependencies dozens of projects once this is out.

jonathannaguin commented 1 year ago

@marcingrzejszczak would you be able to provide details on which project this logic will be implemented and when? Thanks

marcingrzejszczak commented 1 year ago

Please check this entry in the docs https://micrometer.io/docs/observation#instrumentation_of_reactive_libraries . For your convenience I'm copying it below

In Reactor 3.5.3 release (through this PR) an option to turn on automated context propagation was added. To use this, please ensure that you’re using the following projects at minimum in the following versions:

To use the feature call the new Reactor’s Hook method (e.g. in your public static void main method) like this

Hooks.enableAutomaticContextPropagation();

So just manually bump the versions of all of the libraries presented above and you will have this feature available. I guess we're planning to add this support to the next Boot release cc @bclozel ?

chemicL commented 1 year ago

I just created #34201 - please watch the progress on that one.

In the meantime, feel free to try out the feature we just added in reactor-core 3.5.3: https://github.com/reactor/reactor-core/pull/3335.

At the start of the application, call Hooks.enableAutomaticContextPropagation() and the logs should be present in the entire reactive chain, so there's no need to use handle or tap with this feature enabled.

[edit] while posting I noticed @marcingrzejszczak provided all the necessary details, so just follow what's in the post above :-)

Please report any issues discovered using this mechanism.

bclozel commented 1 year ago

Closing in favor of #34201

bclozel commented 1 year ago

Reopening as #34201 will not solve this issue completely.

VinishKumar1 commented 1 year ago

Still not able to see the traceid and spanid propagate to calls inside methods like .doOnError() or .onErrorResume(). Were those cases be handled as well? @marcingrzejszczak