micrometer-metrics / context-propagation

Context Propagation API

Propagation of Tracing Context to AWS SDK SqsAsyncClient #262

Open sondemar opened 3 days ago

sondemar commented 3 days ago

Hi, I am trying to propagate the tracing context with the Micrometer Observation API while using the AWS SDK SqsAsyncClient, which operates on Netty's event-loop model.

Even though I have registered ObservationThreadLocalAccessor and ObservationAwareSpanThreadLocalAccessor with ContextRegistry and instrumented the executors (I described the SqsAsyncClient configuration details in this discussion):

executor.setTaskDecorator(new ContextPropagatingTaskDecorator());

I am still unable to access the currently opened scope because the executors are invoked from the Netty EventLoop thread.
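For reference, my accessor registration looks roughly like this (simplified sketch; the static registration style and the Tracer-based constructor are just how I wired it, actual bean configuration omitted):

```java
import io.micrometer.context.ContextRegistry;
import io.micrometer.observation.contextpropagation.ObservationThreadLocalAccessor;
import io.micrometer.tracing.Tracer;
import io.micrometer.tracing.contextpropagation.ObservationAwareSpanThreadLocalAccessor;

public class ContextPropagationConfig {

    // 'tracer' is the micrometer-tracing Tracer used by the application
    public static void registerAccessors(Tracer tracer) {
        ContextRegistry registry = ContextRegistry.getInstance();
        registry.registerThreadLocalAccessor(new ObservationThreadLocalAccessor());
        registry.registerThreadLocalAccessor(new ObservationAwareSpanThreadLocalAccessor(tracer));
    }
}
```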

I am seeking any possible workaround until the issue is resolved.

The only solution I can think of involves using an implementation of ContextAccessor and a custom ObservationContextHolder:

executor.setTaskDecorator(runnable -> factory.captureAll(ObservationContextHolder.storedValues()).wrap(runnable));

where ObservationContextHolder properly stores values for the keys ObservationThreadLocalAccessor.KEY and ObservationAwareSpanThreadLocalAccessor.KEY.
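Roughly along these lines (illustrative sketch; the class and method names are mine):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

import io.micrometer.context.ContextAccessor;
import io.micrometer.observation.contextpropagation.ObservationThreadLocalAccessor;
import io.micrometer.tracing.contextpropagation.ObservationAwareSpanThreadLocalAccessor;

// Holds the current observation/span under the two well-known accessor keys
// so they can be re-captured from a thread the instrumentation never reached.
// The shared mutable map is what makes this approach not thread-safe.
public final class ObservationContextHolder {

    private static final Map<Object, Object> VALUES = new HashMap<>();

    public static void store(Object observation, Object span) {
        VALUES.put(ObservationThreadLocalAccessor.KEY, observation);
        VALUES.put(ObservationAwareSpanThreadLocalAccessor.KEY, span);
    }

    public static Map<Object, Object> storedValues() {
        return VALUES;
    }
}

// ContextAccessor that lets factory.captureAll(...) read the holder's map
final class StoredValuesContextAccessor implements ContextAccessor<Map<Object, Object>, Map<Object, Object>> {

    @Override
    public boolean canReadFrom(Class<?> contextType) {
        return Map.class.isAssignableFrom(contextType);
    }

    @Override
    public void readValues(Map<Object, Object> source, Predicate<Object> keyPredicate, Map<Object, Object> target) {
        source.forEach((key, value) -> {
            if (keyPredicate.test(key)) {
                target.put(key, value);
            }
        });
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T readValue(Map<Object, Object> source, Object key) {
        return (T) source.get(key);
    }

    @Override
    public boolean canWriteTo(Class<?> contextType) {
        return Map.class.isAssignableFrom(contextType);
    }

    @Override
    public Map<Object, Object> writeValues(Map<Object, Object> values, Map<Object, Object> target) {
        target.putAll(values);
        return target;
    }
}
```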

However, this solution is not thread-safe.

Could you please advise on a proper workaround or solution?

jonatan-ivanov commented 2 days ago

For the sake of completeness, there is also https://github.com/awspring/spring-cloud-aws/issues/646 and https://github.com/netty/netty/issues/8546. What you can do as a user is to add a 👍🏼 to the issue description and a comment saying that you need this. (I saw your comments on the spring-cloud-aws issue. 👍🏼)

Reactor Netty is instrumented; as far as I know, it has support at the event loop/network level. Maybe that's something you can enable and reuse?

Also, as far as I can understand, if you want to instrument SQS, you might be on the wrong level (the network/HTTP level). If you instrument the event loop and add tracing information to the HTTP request that you send to AWS, I don't think that information will be propagated to the client. What you should do instead is add the tracing information to the SQS message headers (which are sent to AWS over HTTP); those are delivered to the client along with the SQS message. So instead of instrumenting the low-level HTTP client, you should (wrap? and) instrument the SQS client itself. Does this make sense?
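Something along these lines, perhaps (untested sketch; the wrapper name, the Propagator wiring, and the attribute handling are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

import io.micrometer.tracing.Span;
import io.micrometer.tracing.Tracer;
import io.micrometer.tracing.propagation.Propagator;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;
import software.amazon.awssdk.services.sqs.model.MessageAttributeValue;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;
import software.amazon.awssdk.services.sqs.model.SendMessageResponse;

// Hypothetical wrapper: inject the current trace context into SQS message
// attributes before sending, so the consuming side can continue the trace.
public final class TracingSqsSender {

    private final SqsAsyncClient sqs;
    private final Tracer tracer;
    private final Propagator propagator;

    public TracingSqsSender(SqsAsyncClient sqs, Tracer tracer, Propagator propagator) {
        this.sqs = sqs;
        this.tracer = tracer;
        this.propagator = propagator;
    }

    public CompletableFuture<SendMessageResponse> send(String queueUrl, String body) {
        Map<String, MessageAttributeValue> attributes = new HashMap<>();
        Span span = tracer.currentSpan();
        if (span != null) {
            // Write each propagation field (e.g. traceparent) as a string message attribute
            propagator.inject(span.context(), attributes,
                    (carrier, key, value) -> carrier.put(key,
                            MessageAttributeValue.builder().dataType("String").stringValue(value).build()));
        }
        return sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl(queueUrl)
                .messageBody(body)
                .messageAttributes(attributes)
                .build());
    }
}
```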

Don't get me wrong, instrumenting Netty could also be useful, but I think that's step two, and you might not end up needing it.

sondemar commented 2 days ago

> For the sake of completeness, there is also awspring/spring-cloud-aws#646 and netty/netty#8546. What you can do as a user is to add a 👍🏼 to the issue description and a comment saying that you need this. (I saw your comments on the spring-cloud-aws issue. 👍🏼)

I am already involved in awspring/spring-cloud-aws#646 through this PR.

> Reactor Netty is instrumented; as far as I know, it has support at the event loop/network level. Maybe that's something you can enable and reuse?

In the discussion, I pointed out that the AWS SDK manages the Reactor API under the hood and only exposes SqsAsyncClient, whose API is based on asynchronous CompletableFuture.
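So the best I can do from the caller side is capture a snapshot before the call and restore it in the continuation, something like this (sketch; the client and request are placeholders):

```java
import java.util.concurrent.CompletableFuture;

import io.micrometer.context.ContextSnapshot;
import io.micrometer.context.ContextSnapshotFactory;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;
import software.amazon.awssdk.services.sqs.model.SendMessageResponse;

public class SnapshotAroundSend {

    static CompletableFuture<SendMessageResponse> send(SqsAsyncClient sqs, SendMessageRequest request) {
        // Capture thread-local state on the calling thread, where the scope is still open...
        ContextSnapshot snapshot = ContextSnapshotFactory.builder().build().captureAll();
        return sqs.sendMessage(request).whenComplete((response, error) -> {
            // ...and restore it on the Netty event loop thread that completes the future
            try (ContextSnapshot.Scope scope = snapshot.setThreadLocals()) {
                // the observation, trace context and MDC are visible again inside this scope
            }
        });
    }
}
```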

> If you instrument the event loop and add tracing information to the HTTP request that you send to AWS, I don't think that information will be propagated to the client. What you should do instead is add the tracing information to the SQS message headers (which are sent to AWS over HTTP); those are delivered to the client along with the SQS message. So instead of instrumenting the low-level HTTP client, you should (wrap? and) instrument the SQS client itself. Does this make sense?

I am already propagating tracing information to SQS via message headers (using a custom version of SenderContext), but the key point is that I would also like to maintain observability of the entire Spring Cloud AWS API invocation (including tracing, logging, and metrics).
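In other words, I would like the whole asynchronous invocation to be observed, along these lines (sketch; the observation name and wiring are illustrative):

```java
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class ObservedSqsSend {

    static void send(SqsAsyncClient sqs, SendMessageRequest request, ObservationRegistry registry) {
        // Start an observation on the calling thread (span, timer sample and MDC entries)...
        Observation observation = Observation.start("sqs.send", registry);
        sqs.sendMessage(request).whenComplete((response, error) -> {
            // ...and close it on whichever thread completes the future
            if (error != null) {
                observation.error(error);
            }
            observation.stop();
        });
    }
}
```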