spring-projects / spring-boot

Spring Boot helps you to create Spring-powered, production-grade applications and services with absolute minimum fuss.
https://spring.io/projects/spring-boot
Apache License 2.0
75.47k stars 40.76k forks source link

No transactionId anymore in implementation classes of ErrorWebExceptionHandler #42600

Closed sigriswil closed 1 month ago

sigriswil commented 1 month ago

Spring boot version: 3.3.2 Spring cloud dependencies version: 2023.0.3 sample issue spring cloud gateway repository: https://github.com/sigriswil/spring-gateway-sample

It contains a SpringBoot application that can be started locally. When reaching the endpoint /, it raises an exception which triggers the class GlobalErrorWebExceptionHandler which implements ErrorWebExceptionHandler.

In the main method, Hooks.enableAutomaticContextPropagation() and ContextRegistry.getInstance().registerThreadLocalAccessor(...) are called in FirstGatewayFilter.

Add the MDC's trasactionId to the FirstGatewayFilter, An exception occurs in the SecondGatewayFilterFactory. transactionId is output from SecondGatewayFilterFactory and FirstGatewayFilter, but transactionId is not output from GlobalErrorWebExceptionHandler.

INFO 48248 --- c.e.s.FirstGatewayFilter: transactionId created. MDC: {transactionId=2e58da57-2f01-4d29-b63d-61b2e9682301}
INFO 48248 --- c.e.s.SecondGatewayFilterFactory: MDC: {transactionId=2e58da57-2f01-4d29-b63d-61b2e9682301}
ERROR 48248 --- c.e.s.GlobalErrorWebExceptionHandler: MDC: null
...
INFO 48248 --- c.e.s.FirstGatewayFilter: doFinally, MDC: {transactionId=2e58da57-2f01-4d29-b63d-61b2e9682301}
philwebb commented 1 month ago

I think someone from the Micrometer, Reactor or Spring Cloud team will need to help with this issue. Spring Boot isn't really involved at this level.

It looks to me like io.micrometer.context.DefaultContextSnapshotFactory.setAllThreadLocalsFrom is clearing the MDC thread local.

@spencergibb @chemicL are you able to help identify what's causing the issue and which project can help?

chemicL commented 1 month ago

I have no knowledge about Spring Cloud Gateway's lifecycle with the AbstractGatewayFilterFactory but this situation where there is a global instance of ErrorWebExceptionHandler sounds similar to the case encountered with Spring Framework where observability was added as a WebFilter which was at a more internal layer than the catch-all ExceptionHandlingWebHandler which is outside (check related discussion and fix). So MDC keys set in one AbstractGatewayFilterFactory (FirstGatewayFilterFactory) won't be visible to any filter that is ordered prior to it. I am certain @spencergibb is able to provide guidance :)

Regarding the comment of clearing thread locals: If these MDC entries were not cleared by the context-propagation then you'd face leaks from one request to another. The solution to the problem is proper ordering of call scopes and making sure things are properly cleared when exiting a scope and handling something unrelated.