Guidance on logging with MDC

reactor / reactor-core

Non-Blocking Reactive Foundation for the JVM

http://projectreactor.io

Apache License 2.0

4.97k stars 1.2k forks source link

Guidance on logging with MDC #1985

Closed rstoyanchev closed 4 years ago

rstoyanchev commented 4 years ago

The Reactor documentation has Adding Context To a Reactive Sequence that covers the topic of ThreadLocal alternatives but there is no mention of how to log with MDC context.

@simonbasle has a blog post but I think it must not be easy to find because various suggestions keep showing up that typically tend to not work in all cases or have significant trade-offs.

There should be some guidance that is easier to find. Possibly even a short section in the Reactor docs with a link to the blog post. Whatever it ends up being, we can further link to it from the Spring Framework docs.

A more ambitious option would be to provide logging framework integrations but that could end up as a separate issue. In the mean time some minimal documentation would really help.

ttddyy commented 4 years ago

Hi,

I had usecase using MDC with WebClient and summarized how to do it in my blog post, MDC with WebClient in WebMVC.

I put implementations with Schedulers.onScheduleHook and Schedulers.addExecutorServiceDecorator, as well as Hooks.onEachOperator.

The implementation is I only considerd for WebMVC environment(I haven't thought about pure reactive env), but it is helpful to have such info as well since usage of WebClient brings people to use reactor in servlet environment.

membersound commented 4 years ago

What about the solution presented with reactor.core.publisher.Hooks? https://github.com/archie-swif/webflux-mdc/tree/master/src/main/java/com/example/webfluxmdc

@ttddyy could you give a working example with Schedulers.onScheduleHook() for spring-boot-2.2.x? I tried both approaches. While the one with Hooks.onEachOperator() worked, the Schedulers solution did not. But as the later looks really nice, I'd prefer it. Still I don't know why it's not working and the MDC is not copied correctly.

membersound commented 4 years ago

Here a proof that the mdc is not retained using the Schedulers approach:

    public static ExchangeFilterFunction logRequest =
        ExchangeFilterFunction.ofRequestProcessor(clientRequest -> {
            System.out.println("MDC=" + MDC.getCopyOfContextMap());
            return Mono.just(clientRequest);
        });

    public static ExchangeFilterFunction logResponse =
        ExchangeFilterFunction.ofResponseProcessor(clientResponse -> {
            System.out.println("MDC=" + MDC.getCopyOfContextMap());
            return Mono.just(clientResponse);
        });

    public static ExchangeFilterFunction mdcFilter = (request, next) -> {
        // here runs on main(request's) thread
        Map<String, String> map = MDC.getCopyOfContextMap();
        return next.exchange(request)
                .doOnRequest(value -> {
                    // here runs on reactor's thread
                    if (map != null) {
                        MDC.setContextMap(map);
                    }
                });
    };

@Configuration
@ConditionalOnClass(WebClient.class)
public class WebClientLoggingAutoConfiguration implements WebClientCustomizer {
    static {
        Schedulers.onScheduleHook("mdc", runnable -> {
            Map<String, String> map = MDC.getCopyOfContextMap();
            return () -> {
                if (map != null) {
                    MDC.setContextMap(map);
                }
                try {
                    runnable.run();
                } finally {
                }
            };
        });
    }

    @Override
    public void customize(WebClient.Builder builder) {
        builder.filter(mdcFilter)
                .filter(logRequest)
                .filter(logResponse);
    }
}

If you execute this, the MDC is correctly filled during the Request logging. But on Response logging the MDC is empty!

ttddyy commented 4 years ago

Hi @membersound

Thanks for checking the post.

There are couple of things: First, I had typo on my blog post. For mdcFilter, it should be doOnNext instead of doOnRequest. The intention here is to perform MDC-set-operation(mdcFilter) immediately after WebClient exchange has happened in reactor thread. (Sorry for the confusion, I fixed this on my blog post)

    public static ExchangeFilterFunction mdcFilter = (request, next) -> {
        // here runs on main(request's) thread
        Map<String, String> map = MDC.getCopyOfContextMap();
        return next.exchange(request)
                .doOnNext(value -> {       //   <======= HERE
                    // here runs on reactor's thread
                    if (map != null) {
                        MDC.setContextMap(map);
                    }
                });
    };

Another thing is the order of applying filters to WebClient. Since the MDC-set-operation(mdcFilter) needs to happen BEFORE logResponse filter, mdcFilter needs to be added AFTER logResponse filter.

WebClient webClient = WebClient.builder()
        .filter(logResponse)
        .filter(logRequest)
        .filter(mdcFilter)
        .build();

Please try with these two changes.

membersound commented 4 years ago

@ttddyy thanks for the guidance, I could get it working when using your filter order. Could you go a bit more into detail why only the reverse order works? Are those filters evaluated buttomup? (I'd expect topdown evaluation instead).

Plus I had to change the doOnNext() to doOnEach(), as otherwise the MDC does not get propagated in case of errors!

       return next.exchange(request)
                .doOnEach(value -> {
                    // here runs on reactor's thread
                    if (map != null) MDC.setContextMap(map);
                });

Now the question is: is the Schedulers.onScheduleHook approach better than the Hooks.onEachOperator?

@simonbasle Thanks for adding a faq to MDC logging directly on a Mono/Flux. But this issue is about finding a global MDC configuration so that not each mono/flux has to be boilerplated with logging code. Maybe you could extend the docs regarding this?

mddubey commented 3 years ago

Hello Everyone,

In our webflux project we are using the publisher hooks as mentioned above to populate traceIDs which are used to trace logs across microservices. It is working very well functionally, but there are performance issues because of this. Please check below Without TraceIDs

For  1000 concurrent requests mean response time is 0.6 sec.
For  5000 concurrent requests mean response time is 1.5 sec.
For  10000 concurrent requests mean response time is 2.8 sec.
For  20000 concurrent requests mean response time is 4.8 sec.

With traceIds

For  1000 concurrent requests mean response time is 0.8 sec.
For  5000 concurrent requests mean response time is 3.2 sec.
For  10000 concurrent requests mean response time is 6.2 sec.
For  20000 concurrent requests mean response time is 17 sec.

If you see the response time increase exponentially with the load while without traceIDs it is much better. Any suggestion if there are better ways of using traceids without hampering the performance.

Note: There is a slight change in our implementation, wee are also calling the copyToMdC method in the onError and onComplete methods too, because we observed the traceIds missing in some error scenarios. The exact implementation can be found here.

HaloFour commented 3 years ago

I ran into the same problems. We're using OpenTelemetry and I was sticking the trace info into MDC and then using a Reactor operator to copy the MDC between threads. This seemed to work well but does not scale. Using features in log4j2 I've found a few mitigations that so far seem to work pretty well. They generally require leaning on storing the information in the OpenTelemetry context and using a Reactor operator to propagate that around instead.

First I am using a custom ContextDataProvider which reads Span.current() and if there is a current span it returns a map of the trace/span properties. We populate trace.id, span.id, span.name and optionally parent.id if the span is not a root span.

Second I am using a custom implementation of ThreadContextMap which primarily stores additional MDC content in the Context as a mutable map instead of as thread locals. This does require priming the Context with such a map at the beginning of each request as the Context itself is immutable, which I am doing in a WebFilter. Apart from the trace properties, we're storing about 8-12 values in MDC, including the request path, the mapped template URI and the controller class and method handling the request.

Our not-very-scientific k6-based benchmarks are showing that with 1000 RPS before these changes we were seeing a P95 of 2.89s, and afterwards P95 is down to 11.43ms. YMMV.

Guillaume-Mayer commented 3 years ago

Hello fellow programmers,

Sorry, but I don't understand how the guidance examples resolve the MDC logging problem.

The main point IMHO is to be able to set context variables so they can be displayed in each logging entry occurring in the same context after that, it's useful because it enables the third party library log entries to show those variables.

But in the guide examples, the MDC variables are set just before the log statement and unset just after it. Wouldn't it be just as simple to include the variables directly in the log message in this case?

I'm probably missing something, but still, if so, I guess I won't be the only one. Can that point be explained more clearly?

Anyway, many thanks for your impressive work

PaulDMooney commented 2 years ago

I'm interested in this as well. It would be interesting to have a way that the MDC can be populated for the entire life of a particular stream of events. Currently I have a way to set it for code that is in my control but as soon as I delegate to another library like mongodb driver the contextual information in the logs is lost.