micrometer-metrics / micrometer

An application observability facade for the most popular observability tools. Think SLF4J, but for observability.
https://micrometer.io
Apache License 2.0
4.45k stars 978 forks source link

Current span in Tracer lost after WebClient call - Span id differs in between Request vs Response #5472

Open pratapjai143 opened 6 days ago

pratapjai143 commented 6 days ago

Describe the bug

After sending request to downstream service using webclient the span id with the context getting lost while the response returned from the webclient.

I am logging the request response using a ExchangeFilterFunction and the span id in request vs response is different.

image

image

Environment

Local development environment

To Reproduce

https://github.com/pratapjai143/webclient-micrometer-demo

Please run this project and send request to http://localhost:7070/parent-service/parent-web and you can see in the log that the span id is different between request and response.

Expected behavior Span id should be retained in the response of web client

Additional context Related issues:-

(https://github.com/micrometer-metrics/micrometer/issues/3785)

marcingrzejszczak commented 6 days ago

cc @chemicL @bclozel

chemicL commented 5 days ago

The trace-id is the same, so both spans have the same parent. And it makes sense. The request span is on behalf of the server which issues the remote request. The response span is associated with the child span generated on behalf of the remote request. In case you'd want to see the same span, perhaps there's an answer how to hook in to WebClient after when the child Observation is spawned - perhaps there are some interceptors at that level?

jonatan-ivanov commented 5 days ago

In order to verify @chemicL's comment, please try to print out the parent spanId too, does that behavior makes sense to you?

pratapjai143 commented 5 days ago

Hi @chemicL @jonatan-ivanov Thanks for your response. For a single request it makes sense to track the request and response based on the trace id as it's returning in the response.

For an e.g. now I have a new endpoint http://localhost:7070/parent-service/parent-web-bulk which takes an array and iterate over the array and make a call to the downstream for each element within that array.

Request:-

image

End Point Code - ParentController :-

image

As for a single request to parent controller, we are sending multiple request to downstream Child Controller so how we can related the request response for each request that we send to child controller as the span id changes in the response.

Request & Response Log:- Span Id differs and same span id maintains for all the downstream request.

image

Would you please guide me how to resolve this issue as in nutshell we would like to relate the request and response that we send to downstream?

Thanks

chemicL commented 5 days ago

Consider logging in your ExchangeFilterFunction not during assembly but at the runtime of the reactive chain -> instead of e.g.

log.info(...)
return Mono.just(clientRequest);

do

return Mono.defer(() -> {
    log.info(...);
    return Mono.just(clientRequest);
});

I feel this might do the trick. It might also be something the framework could consider -> evaluating the user-provided filter functions at runtime instead of assembly time. @bclozel WDYT?

pratapjai143 commented 5 days ago

Thanks heaps @chemicL now the correct trace and span id are printed in the log and I can related the request and response.

image

Now there is another issue. As I am sending multiple request to child service for one incoming request to the parent service so I need to trace the request in the in child service using the span id and trace id combination as the trace id is same for all the request received by the child service.

Parent Log:-

image

Child Log:-

image

Grand Child Log:-

image

In real time scenario the span id which I am logging in my parent service that same span id should propagate to the downstream so that if I want to trace any request in case of any exception then I should be able to trace.

As now there is no way I can trace the different request in my downstream system so need to fix this.

Any help would be highly appreciated.

Thanks.

chemicL commented 4 days ago

I am getting confused a bit. The hierarchical nature of spans is something that you can use to your advantage, e.g. by viewing them in the GUI of the distributed collector of your choice, e.g. Jaeger. The trace id is the one that allows identifying a tree-like behaviour when delegating a server call to multiple, potentially cascading, downstream services. In case you really need what you described, it feels this SO answer by @jonatan-ivanov should help: https://stackoverflow.com/questions/77044612/should-micrometer-generate-a-new-span-each-time-a-spring-boot-server-receives-a

Consider consulting the documentation on distributed tracing a bit more. The fact that the server starts its own span as a child of the incoming one can be used to see where time is spent - whether the client spends more time processing or the server.

jonatan-ivanov commented 3 days ago

I'm confused too, I'm not sure I understand what you would like to achieve. On top of what Dariusz already wrote above, to me this seems working as intended, every single operation has its own spanId and the operations that belong together share the same traceId, please also see: https://docs.micrometer.io/tracing/reference/glossary.html

I also encourage you to look at the data using the GUI of a distributed tracing backend like Zipkin, Jaeger, Grafana/Tempo.

pratapjai143 commented 3 days ago

Thanks @jonatan-ivanov @chemicL for your response. Our case is very unique.

We have three services e.g. Parent, Child and Grand Child.

For a single request to my Parent service from upstream, I am sending multiple requests concurrently using WebClient to Child and in turn Child service is making a call to Grand Child service.

For an e.g. my Parent service receives a single request and for that single request to my parent service, we are triggering multiple requests to the Child service, so if the span id in the child will be different what I am printing in my log of Parent service, then there is no way I can trace the request in my child service as the trace id will be same for all the requests sent to the Child service for a single request to parent service.

So we need a way how to trace a request using the span id (Parent -> Child -> Grand Child) as the trace id remains same across all the services for a given request to my parent service so using trace id we can't do much in terms of tracing a request.

Hopefully the problem statement is clear now and will get some help in resolving this issue.

Thanks.

jonatan-ivanov commented 16 hours ago

For a single request to my Parent service from upstream, I am sending multiple requests concurrently using WebClient to Child and in turn Child service is making a call to Grand Child service.

Let's see if I understand your use-case, let's say you have these three services: A, B, C respectively and the following workflow:

In this scenario (this is not very unique), there is one trace (one traceId) and 4 spans (all with different ids). Also, the parent ids will reflect the parent child-relationships of the calls (see above). Your situation is a bit more complicated since there will be at least two spans for each http call (one for the client and one for the server) but the gist of it is the same.

so if the span id in the child will be different what I am printing in my log of Parent service, then there is no way I can trace the request in my child service as the trace id will be same for all the requests sent to the Child service for a single request to parent service.

You can ignore all the spanIds for this, the only thing you need to look at is the traceId to see which spans belong together. If you want the spanIds what you print out in the logs match, please see what @chemicL wrote above:

In case you'd want to see the same span, perhaps there's an answer how to hook in to WebClient after when the child Observation is spawned - perhaps there are some interceptors at that level?

Please notice that this is likely to be a WebClient feature and does not need a change in Micrometer, I would recommend opening am issue for Spring Framework.

You can also look at Brave's joint span feature that mimics the behavior you described (continues the client span on the server side): https://github.com/spring-projects/spring-boot/pull/35165