opentracing / opentracing-java

OpenTracing API for Java. 🛑 This library is DEPRECATED! https://github.com/opentracing/specification/issues/163
http://opentracing.io
Apache License 2.0
1.68k stars 344 forks source link

Managed scope dies in an async chain. #318

Closed dzonekl closed 5 years ago

dzonekl commented 5 years ago

When doing requests in async fashion using Completable futures, the active span or Scope is not available throughout the subsequent steps of the async. chain.

So:

CompletableFuture<..>  future = ... (Produces a future with a custom pool... which sets an active span)

future.thenCompose( o -> {
...
      Span activeSpan = tracer.activeSpan();
      // activeSpan == null;
} ) 

I am sure this is a known limitation of the default ScopeManager, which is the ThreadLocalScopeManager, what are the alternatives?

yurishkuro commented 5 years ago

You need to configure the executor pools used by futures to be instrumented so that they propagate the span to new threads.

sjoerdtalsma commented 5 years ago

I believe the ‘thenCompose’ is called from the same background thread the span was propagated to but in a different completionstage where the new scope for the propagated span was already closed.

This problem could possibly be solved by using the ContextAwareCompletableFuture from my library in combination with the OpenTracing support jar.

See https://github.com/talsma-ict/context-propagation for more details. I believe there is also an issue filed (and mostly fixed) about the CompletionStage problem of CompletableFuture there.

Please ping me for further details.. (sorry, on my phone now)

Met vriendelijke groet, Sjoerd Talsma

Op 25 okt. 2018 om 22:46 heeft Yuri Shkuro notifications@github.com het volgende geschreven:

You need to configure the executor pools used by futures to be instrumented so that they propagate the span to new threads.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

sjoerdtalsma commented 5 years ago

Ah, here it is:

dzonekl commented 5 years ago

Thank You, @sjoerdtalsma @yurishkuro , An underlying library produces the futures, of which I have no immediate control. Out of curiosity will ContextAwareCompletableFuture end up in the opentracing code ? What I need is a practical solution, What I am trying now is to use a custom ScopeManager, which persists in the described case.

sjoerdtalsma commented 5 years ago

@dzonekl If you have no control over the CompletableFuture you got, you could make sure that the CompletableFuture you thenCompose with, is produced by an ExecutorService threadpool that is context aware (like @yurishkuro suggested)

sjoerdtalsma commented 5 years ago

Out of curiosity will ContextAwareCompletableFuture end up in the opentracing code

I currently have no plans to do so; they serve different purposes: OpenTracing provides an API, my context-propagation is a library that propagates (all kinds of) contexts.

dzonekl commented 5 years ago

...you could make sure that the CompletableFuture you thenCompose with, is produced by an ExecutorService threadpool that is context aware (like @yurishkuro suggested)

you mean ?

thenComposeAsync(
        Function<? super T, ? extends CompletionStage<U>> fn,
        Executor executor)

mmmh, my goal is to make this work without having to touch the calling code of this library, async flows are already hard to get cleanly readable. I guess my best bet is to actually instrument the library properly, or use a custom ScopeManager.

Thanks for your help.

dzonekl commented 5 years ago

In order to understand what is happening, I created a custom scope manager, which logs what is happening [1]. Prior to thread ault-executor-3, being started, we see the scope being unset (toRestore == null), done by the TracingHandlerInterceptor#afterConcurrentHandlingStarted, later on the original traceid is restored. Why is the active scope closed when the handler starts an async request? What is the idea?

[1] New traceId, when doing Async request.

2018-10-28 09:45:58.615  INFO 47257 --- [nio-8080-exec-1] e.b.e.config.AxonConfig$FutureScope      : New scope
2018-10-28 09:45:58.615  INFO 47257 --- [nio-8080-exec-1] b.e.c.AxonConfig$OpenApiApplicationScope : Setting scope: FutureScope{, wrapped=fba08ffe035cf5f3:fba08ffe035cf5f3:0:1 - POST, finishOnClose=false, toRestore=nothing to restore}
2018-10-28 09:45:58.617  INFO 47257 --- [nio-8080-exec-1] e.b.e.a.rest.AssetDefinitionController   : Creating asset definition: class Token {
    class Asset {
        symbol: KLH
        longName: x coin
        precision: 8
        minimumQuantity: 0.01
        active: true
    }
}
2018-10-28 09:45:58.617  INFO 47257 --- [nio-8080-exec-1] e.b.e.config.AxonConfig$FutureScope      : New scope
2018-10-28 09:45:58.617  INFO 47257 --- [nio-8080-exec-1] b.e.c.AxonConfig$OpenApiApplicationScope : Setting scope: FutureScope{, wrapped=fba08ffe035cf5f3:cd7d093fb5ae1814:fba08ffe035cf5f3:1 - query, finishOnClose=true, toRestore=fba08ffe035cf5f3:fba08ffe035cf5f3:0:1 - createAssetDefinition}
2018-10-28 09:45:58.618  INFO 47257 --- [nio-8080-exec-1] i.j.internal.reporters.LoggingReporter   : Span reported: fba08ffe035cf5f3:cd7d093fb5ae1814:fba08ffe035cf5f3:1 - query
2018-10-28 09:45:58.619  INFO 47257 --- [nio-8080-exec-1] e.b.e.config.AxonConfig$FutureScope      : Restoring scope
2018-10-28 09:45:58.619  INFO 47257 --- [nio-8080-exec-1] b.e.c.AxonConfig$OpenApiApplicationScope : Setting scope: FutureScope{, wrapped=fba08ffe035cf5f3:fba08ffe035cf5f3:0:1 - createAssetDefinition, finishOnClose=false, toRestore=nothing to restore}
2018-10-28 09:45:58.619  INFO 47257 --- [nio-8080-exec-1] e.b.e.config.AxonConfig$FutureScope      : Restoring scope
2018-10-28 09:45:58.619  INFO 47257 --- [nio-8080-exec-1] b.e.c.AxonConfig$OpenApiApplicationScope : Unsetting scope
2018-10-28 09:45:58.628  INFO 47257 --- [ault-executor-3] e.b.e.config.AxonConfig$FutureScope      : New scope
2018-10-28 09:45:58.628  INFO 47257 --- [ault-executor-3] b.e.c.AxonConfig$OpenApiApplicationScope : Setting scope: FutureScope{, wrapped=9c3dd7f238513fdc:9c3dd7f238513fdc:0:1 - command, finishOnClose=true, toRestore=nothing to restore}
2018-10-28 09:45:58.629  INFO 47257 --- [ault-executor-3] i.j.internal.reporters.LoggingReporter   : Span reported: 9c3dd7f238513fdc:9c3dd7f238513fdc:0:1 - command
2018-10-28 09:45:58.629  INFO 47257 --- [ault-executor-3] e.b.e.config.AxonConfig$FutureScope      : Restoring scope
2018-10-28 09:45:58.629  INFO 47257 --- [ault-executor-3] b.e.c.AxonConfig$OpenApiApplicationScope : Unsetting scope
2018-10-28 09:45:58.661  INFO 47257 --- [nio-8080-exec-2] e.b.e.config.AxonConfig$FutureScope      : New scope
2018-10-28 09:45:58.661  INFO 47257 --- [nio-8080-exec-2] b.e.c.AxonConfig$OpenApiApplicationScope : Setting scope: FutureScope{, wrapped=fba08ffe035cf5f3:fba08ffe035cf5f3:0:1 - createAssetDefinition, finishOnClose=false, toRestore=nothing to restore}
2018-10-28 09:45:58.662  INFO 47257 --- [nio-8080-exec-2] e.b.e.config.AxonConfig$FutureScope      : Restoring scope
2018-10-28 09:45:58.662  INFO 47257 --- [nio-8080-exec-2] b.e.c.AxonConfig$OpenApiApplicationScope : Unsetting scope
2018-10-28 09:45:58.662  INFO 47257 --- [nio-8080-exec-2] e.b.e.config.AxonConfig$FutureScope      : New scope
2018-10-28 09:45:58.662  INFO 47257 --- [nio-8080-exec-2] b.e.c.AxonConfig$OpenApiApplicationScope : Setting scope: FutureScope{, wrapped=fba08ffe035cf5f3:fba08ffe035cf5f3:0:1 - createAssetDefinition, finishOnClose=true, toRestore=nothing to restore}
2018-10-28 09:45:58.663  INFO 47257 --- [nio-8080-exec-2] i.j.internal.reporters.LoggingReporter   : Span reported: fba08ffe035cf5f3:fba08ffe035cf5f3:0:1 - createAssetDefinition
2018-10-28 09:45:58.663  INFO 47257 --- [nio-8080-exec-2] e.b.e.config.AxonConfig$FutureScope      : Restoring scope
2018-10-28 09:45:58.663  INFO 47257 --- [nio-8080-exec-2] b.e.c.AxonConfig$OpenApiApplicationScope : Unsetting scope
dzonekl commented 5 years ago

Hey guys, I digged further and stumbled upon opentracing, spring-cloud-core async support, which seems to deal with the same issue:

https://github.com/opentracing-contrib/java-spring-cloud/blob/master/instrument-starters/opentracing-spring-cloud-core/src/main/java/io/opentracing/contrib/spring/cloud/async/instrument/TracedThreadPoolTaskExecutor.java

sjoerdtalsma commented 5 years ago

Hey guys, I digged further and stumbled upon opentracing, spring-cloud-core async support, which seems to deal with the same issue:

..deal with, or suffers from the same issue?
I doubt the callbacks in the ListenableFuture are running with the correct active span..

dzonekl commented 5 years ago

@sjoerdtalsma are you saying the Async, code is not working correctly?

sjoerdtalsma commented 5 years ago

@sjoerdtalsma are you saying the Async, code is not working correctly?

I'm by no means a Spring expert, but what I gather is that a ListenableFuture is similar in concept to a CompletableFuture (probably in Spring before Java 8 existed). In that sense the callbacks might provide similar behaviour as the CompletionStage concept from the CompletableFuture (the then... methods).

The span is propagated into the 'main' execution body of the original Future, but I don't know where the callback is called from. If it's from 'within' the main execution body (don't see how, but ok) all is well. Otherwise, the code in the ListenableFuture callback will most likely not be aware of the active span. You'd have to test that though.

This becomes very complex very quick and it's hard to reason about things without concrete test cases. We haven't even touched upon undetermined codepaths like A.runAfterEither(B, Runnable) --> which active span should be active in the runnable, the active span from A or from B ?

dzonekl commented 5 years ago

@sjoerdtalsma I tried your context library [1], with logging on.

The executor is configured to create threads prefixed with trace-d, from the log I can see the ContextAwareExecutorService in action, and the snapshots being reactivated. Unfortunately the 2nd thread created, has a different traceid reported, so it seems the propagation didn't work.

I pass on this executor, to regular CompletableFuture.then..Async(...)methods.

Could it be the GlobalTracer needs to be configured for this to work, or am I not using the library properly? (I don't use your completablefuture , as these are produced by the library I call...).

[1]

2018-11-07 17:15:04.808 INFO 27441 --- [nio-8080-exec-1] e.b.e.a.rest.AssetDefinitionController : Creating asset definition: class Token { class Asset { symbol: MVB longName: x coin precision: 8 minimumQuantity: 0.01 maximumQuantity: 2 active: true } } 2018-11-07 17:15:04.818 TRACE 27441 --- [nio-8080-exec-1] n.t.context.ContextManagers : Active context of nl.talsmasoftware.context.opentracing.SpanManager@425bcbb4 added to new snapshot: nl.talsmasoftware.context.opentracing.SpanManager$ScopeContext@7c91894e. 2018-11-07 17:15:04.822 TRACE 27441 --- [nio-8080-exec-1] nl.talsmasoftware.context.Timing : nl.talsmasoftware.context.opentracing.SpanManager.getActiveContext: 693,490ns 2018-11-07 17:15:04.823 TRACE 27441 --- [nio-8080-exec-1] nl.talsmasoftware.context.Timing : nl.talsmasoftware.context.ContextManagers.createContextSnapshot: 7,080,102ns 2018-11-07 17:15:04.825 TRACE 27441 --- [ traced-1] n.t.context.ContextManagers : Context reactivated from snapshot by nl.talsmasoftware.context.opentracing.SpanManager@4fe3514b: nl.talsmasoftware.context.opentracing.SpanManager$ScopeContext@ae1c6d3. 2018-11-07 17:15:04.826 TRACE 27441 --- [ traced-1] nl.talsmasoftware.context.Timing : nl.talsmasoftware.context.opentracing.SpanManager.initializeNewContext: 994,711ns 2018-11-07 17:15:04.827 TRACE 27441 --- [ traced-1] nl.talsmasoftware.context.Timing : nl.talsmasoftware.context.ContextSnapshot.reactivate: 2,925,039ns 2018-11-07 17:15:05.834 INFO 27441 --- [ traced-1] i.j.internal.reporters.LoggingReporter : Span reported: 84a0ac2cb1d3159e:47245443fc1580a7:84a0ac2cb1d3159e:1 - query 2018-11-07 17:15:05.839 TRACE 27441 --- [ traced-1] n.t.context.delegation.RunnableAdapter : Call result ignored by RunnableAdapter: null 2018-11-07 17:15:05.863 TRACE 27441 --- [ault-executor-0] n.t.context.ContextManagers : Active context of nl.talsmasoftware.context.opentracing.SpanManager@79ed8aa7 added to new snapshot: nl.talsmasoftware.context.opentracing.SpanManager$ScopeContext@1a335116. 2018-11-07 17:15:05.863 TRACE 27441 --- [ault-executor-0] nl.talsmasoftware.context.Timing : nl.talsmasoftware.context.opentracing.SpanManager.getActiveContext: 159,219ns 2018-11-07 17:15:05.863 TRACE 27441 --- [ault-executor-0] nl.talsmasoftware.context.Timing : nl.talsmasoftware.context.ContextManagers.createContextSnapshot: 842,939ns 2018-11-07 17:15:05.864 TRACE 27441 --- [ traced-2] n.t.context.ContextManagers : Context reactivated from snapshot by nl.talsmasoftware.context.opentracing.SpanManager@6701fd74: nl.talsmasoftware.context.opentracing.SpanManager$ScopeContext@22037ff0. 2018-11-07 17:15:05.864 TRACE 27441 --- [ traced-2] nl.talsmasoftware.context.Timing : nl.talsmasoftware.context.opentracing.SpanManager.initializeNewContext: 111,767ns 2018-11-07 17:15:05.864 TRACE 27441 --- [ traced-2] nl.talsmasoftware.context.Timing : nl.talsmasoftware.context.ContextSnapshot.reactivate: 769,370ns 2018-11-07 17:15:06.117 INFO 27441 --- [ traced-2] i.j.internal.reporters.LoggingReporter : Span reported: 7d7aff76f16affbe:7d7aff76f16affbe:0:1 - command 2018-11-07 17:15:06.117 TRACE 27441 --- [ traced-2] n.t.context.delegation.RunnableAdapter : Call result ignored by RunnableAdapter: null 2018-11-07 17:15:06.223 INFO 27441 --- [nio-8080-exec-2] i.j.internal.reporters.LoggingReporter : Span reported: 84a0ac2cb1d3159e:84a0ac2cb1d3159e:0:1 - createAssetDefinition

sjoerdtalsma commented 5 years ago

@dzonekl Yes, my library uses the GlobalTracer.

note: This repository is about opentracing and maybe folks who are not interested in my library may appreciate it if we move our discussion there?

dzonekl commented 5 years ago

@sjoerdtalsma I feel this question is more appropriate to be discussed in the opentracing spring cloud community git repo. As you might have noticed from the log, the async request is actually invoking Grpc to communicate over the wire, where Netty is used and a default executor. My suspicion is that, this is where the context is not propagated. So the question is really how can I instrument Grpc with Opentracing (I believe Grpc uses opensensus). Thanks for all your help sofar, I learned a few things :-)

anthonyvk commented 4 years ago

You need to configure the executor pools used by futures to be instrumented so that they propagate the span to new threads.

May I know how to achieve that?

whiskeysierra commented 4 years ago

https://github.com/opentracing-contrib/java-concurrent

anthonyvk commented 4 years ago

You need to configure the executor pools used by futures to be instrumented so that they propagate the span to new threads.

May I know how to achieve that?

whiskeysierra commented 4 years ago

See my link above

DarthJonathan commented 2 years ago

Anyone figured how to use

https://github.com/opentracing-contrib/java-concurrent

?

So we need to specify the threadpool everytime we're running thenCompose / thenApply? I'm using aws async client to spawn new thread using the traced threadpool. Example

        ExecutorService tracedExecutorService = new TracedExecutorService(
                Executors.newFixedThreadPool(50),
                tracer
        );

        S3AsyncClient.builder()
                            .overrideConfiguration(builder -> builder.addExecutionInterceptor(new TracingExecutionInterceptor(this.tracer)))
                            .asyncConfiguration(builder -> builder.advancedOption(
                                    SdkAdvancedAsyncClientOption.FUTURE_COMPLETION_EXECUTOR,
                                    tracedExecutorService
                                )
                            )
...

Edit : even this does not work