micrometer-metrics / tracing

Provides tracing abstractions over tracers and tracing system reporters.
https://micrometer.io
Apache License 2.0
248 stars 45 forks source link

Baggage is lost within spring-statemachine #359

Closed davefarrelly closed 1 month ago

davefarrelly commented 1 year ago

I have a project that is using spring-statemachine which was previously using spring-cloud-sleuth for context-propagation but since updating to spring-boot 3.X.X we have migrated to micrometer-tracing/context-propagation.

I have noticed that we are losing our custom baggage fields after we switch states in the state machine, and have created a basic reproducer spring-statemachine-with-micrometer.zip which produces the following logs when the endpoint /statemachine/start is called. The TRACE_ID disappears after "State changed to: START"

{"@timestamp":"2023-09-18T13:37:57.609+01:00","@version":"1","message":"Starting state machine","logger_name":"com.davefarrelly.baggage.BaggageController","THREAD":"reactor-http-nio-3","level":"INFO","TRACE_ID":"f59ba968-79f2-476f-b240-0c93a4e0932d"}
{"@timestamp":"2023-09-18T13:37:57.758+01:00","@version":"1","message":"Start ction called!","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"reactor-http-nio-3","level":"INFO","TRACE_ID":"f59ba968-79f2-476f-b240-0c93a4e0932d"}
{"@timestamp":"2023-09-18T13:37:57.844+01:00","@version":"1","message":"State changed to: FIRST","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"reactor-http-nio-3","level":"INFO","TRACE_ID":"f59ba968-79f2-476f-b240-0c93a4e0932d"}
{"@timestamp":"2023-09-18T13:37:57.846+01:00","@version":"1","message":"First action called!","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-1","level":"INFO","TRACE_ID":"f59ba968-79f2-476f-b240-0c93a4e0932d"}
{"@timestamp":"2023-09-18T13:37:57.851+01:00","@version":"1","message":"State changed to: START","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"reactor-http-nio-3","level":"INFO","TRACE_ID":"f59ba968-79f2-476f-b240-0c93a4e0932d"}
{"@timestamp":"2023-09-18T13:37:57.891+01:00","@version":"1","message":"State changed to: SECOND","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-1","level":"INFO"}
{"@timestamp":"2023-09-18T13:37:57.891+01:00","@version":"1","message":"Second action called!","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-2","level":"INFO"}
{"@timestamp":"2023-09-18T13:37:57.898+01:00","@version":"1","message":"State changed to: THIRD","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-2","level":"INFO"}
{"@timestamp":"2023-09-18T13:37:57.898+01:00","@version":"1","message":"Third action called!","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-3","level":"INFO"}
{"@timestamp":"2023-09-18T13:37:57.904+01:00","@version":"1","message":"State changed to: END","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-3","level":"INFO"}

This behavior was only seen after the migration to micrometer-tracing.

Versions I am working with:

Also for reference I am attaching a version built using spring-boot-starter-webflux2.7.1 and spring-cloud-starter-sleuth:3.1.5 in which the TRACE_ID is correctly propagated throughout the statemachine spring-statemachine-with-sleuth.zip

MichaelHoOfficial commented 1 year ago

@marcingrzejszczak Could you help check we face this issue as well as critical in this moment as the TraceID no longer working.

marcingrzejszczak commented 11 months ago

AFAIR there was some reactive stuff going on in the example. Can you please update to the latest versions and see if the problem persists?

davefarrelly commented 11 months ago

Hi @marcingrzejszczak, tried with the latest versions:

spring-boot-starter-webflux:3.2.0
micrometer-core:1.12.0
micrometer-tracing:1.2.0
micrometer-tracing-bridge-brave:1.2.0
context-propagation:1.1.0

Unfortunately now I am not seeing the TRACE_ID in my logs at all:

{"@timestamp":"2023-12-08T13:33:50.529Z","@version":"1","message":"Starting state machine","logger_name":"com.davefarrelly.baggage.BaggageController","THREAD":"reactor-http-nio-3","level":"INFO"}
{"@timestamp":"2023-12-08T13:33:50.574Z","@version":"1","message":"Start action called!","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"reactor-http-nio-3","level":"INFO"}
{"@timestamp":"2023-12-08T13:33:50.599Z","@version":"1","message":"State changed to: FIRST","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"reactor-http-nio-3","level":"INFO"}
{"@timestamp":"2023-12-08T13:33:50.6Z","@version":"1","message":"First action called!","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-1","level":"INFO"}
{"@timestamp":"2023-12-08T13:33:50.601Z","@version":"1","message":"State changed to: START","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"reactor-http-nio-3","level":"INFO"}
{"@timestamp":"2023-12-08T13:33:50.617Z","@version":"1","message":"State changed to: SECOND","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-1","level":"INFO"}
{"@timestamp":"2023-12-08T13:33:50.627Z","@version":"1","message":"Second action called!","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-2","level":"INFO"}
{"@timestamp":"2023-12-08T13:33:50.629Z","@version":"1","message":"State changed to: THIRD","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-2","level":"INFO"}
{"@timestamp":"2023-12-08T13:33:50.639Z","@version":"1","message":"Third action called!","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-3","level":"INFO"}
{"@timestamp":"2023-12-08T13:33:50.642Z","@version":"1","message":"State changed to: END","logger_name":"com.davefarrelly.baggage.StateMachineConfiguration","THREAD":"parallel-3","level":"INFO"}

Maybe I am setting up the baggage incorrectly?

@Configuration
public class BaggageConfig {
    @Bean
    BaggageField traceIdField() {
        return BaggageField.create("TRACE_ID");
    }

    @Bean
    public CurrentTraceContext.ScopeDecorator mdcScopeDecorator(BaggageField traceIdField) {
        return MDCScopeDecorator.newBuilder()
                .clear()
                .add(CorrelationScopeConfig.SingleCorrelationField.newBuilder(traceIdField).flushOnUpdate().build())
                .build();
    }
}
github-actions[bot] commented 11 months ago

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

github-actions[bot] commented 10 months ago

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open.

davefarrelly commented 10 months ago

I'm not sure why this ticket was closed, I provided requested information. Is there any more info needed from me?

marcingrzejszczak commented 10 months ago

Hey, it's accidental - we've attached a bot to close issues and it must have closed the issue. I'm reopening it.

mrodal commented 2 months ago

we are having this same issue, did you find a solution @davefarrelly ?

davefarrelly commented 2 months ago

we are having this same issue, did you find a solution @davefarrelly ?

@mrodal I haven't gotten around to actually fixing it yet, but I now believe it is due to a bad implementation on our side. We have implemented the action on our states by using the standard Action interface, but I suspect we actually need to switch to the ReactiveAction to get the context carried through the states.

mrodal commented 2 months ago

We are using spring webmvc and face the same issue of losing the tracing context. Im debugging the spring state machine code and the context seems to be getting lost on ReactiveStateMachineExecutor when using a Sinks.many().multicast()

I believe the issue is on the spring-statemachine library. This didnt happen on the previous version we were using (2.x) that doesnt use the reactive executor

mrodal commented 2 months ago

We fixed it by removing Hooks.enableAutomaticContextPropagation() from the application starter. The issue can be reproduced with this code

        log.info("log 1");
        var sink = Sinks.one();

        sink.asMono().flatMap(s->{
            log.info("log 3"); // Context missing here
            return Mono.just(1);
        }).subscribe();

        sink.emitValue("", Sinks.EmitFailureHandler.FAIL_FAST);

related to: https://github.com/reactor/reactor-core/issues/3887

marcingrzejszczak commented 1 month ago

cc @chemicL

chemicL commented 1 month ago

No need to open so many issues and keep discussing this so widely :) The spring-statemachine one should suffice, as that's where the problem lies. Please check my description of the problem: https://github.com/reactor/reactor-core/issues/3887#issuecomment-2340359937

jonatan-ivanov commented 1 month ago

Duplicate of https://github.com/spring-projects/spring-statemachine/issues/1162