opentracing-contrib / java-xray-tracer

Java OpenTracing implementation backed by AWS X-Ray
Apache License 2.0
23 stars 9 forks source link

AWSXRaySpan.finish cannot find subsegment #6

Closed will-sargent-eero closed 2 years ago

will-sargent-eero commented 5 years ago

Tried out the Opentracing API integration today -- it loaded successfully, but showed this on finishing the request:

[error] i.o.c.a.x.AWSXRaySpan - [] Failed to close underlying AWS trace Entity: Failed to end subsegment: subsegment cannot be found.
com.amazonaws.xray.exceptions.SubsegmentNotFoundException: Failed to end subsegment: subsegment cannot be found.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.amazonaws.xray.strategy.RuntimeErrorContextMissingStrategy.contextMissing(RuntimeErrorContextMissingStrategy.java:17)
    at com.amazonaws.xray.contexts.ThreadLocalSegmentContext.endSubsegment(ThreadLocalSegmentContext.java:54)
    at com.amazonaws.xray.AWSXRayRecorder.endSubsegment(AWSXRayRecorder.java:464)
    at com.amazonaws.xray.entities.SubsegmentImpl.close(SubsegmentImpl.java:122)
    at io.opentracing.contrib.aws.xray.AWSXRaySpan.finish(AWSXRaySpan.java:396)
    at io.opentracing.contrib.aws.xray.AWSXRaySpan.finish(AWSXRaySpan.java:377)
will-sargent-eero commented 5 years ago

I should mention that this is happening in the context of a Play application, which is an asynchronous webframework all the way down, so anything based on ThreadLocal (which it looks like ThreadLocalSegmentContext is based on) is not going to work...

ashleymercer commented 5 years ago

As it happens I'm also using Play, so hopefully I can help you out :) although in fact this should apply to any async / threaded code. (Also, forgive the very verbose answer, but I thought it might be useful to have a reference since this issue has been raised a couple of times).

I don't think this issue is specific to the X-Ray libraries, since the fundamental problem is that we need to somehow keep track of the current Span, and there are only really two ways of doing this: either explicitly, or implicitly [1]. The same would be true of any other tracing library.

Explicit passing

For single threaded code inside a single method, this is very easy: when the Span is activated you get back a corresponding Scope instance, and you can just call Scope.close().

For single threaded code which calls out to another method, and where the Span should be completed inside that other method, you're probably going to need to pass the Scope instance down as a method parameter.

For multi-threaded code you'll have to re-activate the current Span inside your other thread by doing something like (again, assuming Scala):

val tracer: Tracer = ...

val outerScope: Scope = tracer.startSpan("span-name").startActive(false)
val someFuture: Future[_] = ...

// We immediately close the "outer" scope but this *doesn't* finish
// the `Span` (since we used `startActive(false)`). This prevents
// our `Span` from leaking into other code if this current thread
// gets reused in e.g. a thread pool
outerScope.close()

// NB use `transform` here (rather than `onComplete`) since we want
// to enforce that the `Span` is closed *before* the `Future` returns
// This helps ensure that e.g start and stop times are realistic
someFuture transform { result =>

  // This sets re-activates the `Span` on the current thread which
  // might be different from the thread where it was created above
  val innerScope = tracer.scopeManager.activate(outerScope.span, true)

  // Maybe do something else with `result`

  // Since we called `activate` with `finishOnClose = true`, this will
  // automatically finish the span and send it to X-Ray
  innerScope.close()
  result
}

Implicit passing

Explicit passing has some drawbacks: even for vanilla code you end up passing a lot of Scope parameters around. This is why the AWS X-Ray library uses ThreadLocal by default [1]: it saves you having to pass the Scope around by assuming (implicitly) that you're always going to want to tie the current thread to a specific Span. Other similar solutions exist in other frameworks (e.g. request-scoped beans in spring).

For threaded code, the only viable solution I've found for Java / Scala is using aspect-oriented programming to automatically handle copying Span instances across threads: typically, you want to

For example, there's a Scala-specific tracing library called kamon which custom AOP library called kanela to do just this; their implementation of weaving scala Future automatically can be seen here.

I've had success with using Eclipse aspectj to do much the same, although that code is currently proprietary so I'm afraid I can't share it.

There's also the opentracing-contrib/java-specialagent library, although I haven't used that but it seems do something similar (runtime weaving of bytecode).

[1] Assuming you're using Scala, I mean "implicitly" in the more general sense rather than the Scala implicit sense [2] There's an exception to this in lambda where it does some additional magic, but that's a complication for another day...

will-sargent-eero commented 5 years ago

I'm using Lightbend Cinnamon as a backend: https://developer.lightbend.com/docs/telemetry/current/extensions/opentracing/opentracing.html

It's using instrumentation to pass the active span as a continuation and automatically instrumenting futures and actions using Byte Buddy, which has the advantage that the opentracing API doesn't need to be exposed. I think it should be possible to customize it, but first I need to get X-Ray to never use a threadlocal ever.

ashleymercer commented 5 years ago

Ahh so it sounds like you have Cinnamon / OpenTracing integration set up (to instrument your Play / akka / Future code), but using X-Ray as the tracer implementation? I've not used Cinnamon myself, but glancing through the docs it looks like:

This library mimics the ThreadLocal behaviour of the X-Ray library in order to make sure that the internal states of both are kept consistent. It works for me with the "vanilla" weaving approach described above, as long as Spans are always correctly activated and deactivated through the corresponding ScopeManager / Scope objects.

The problem here is that we don't know how Cinnamon goes about activating and deactivating Spans - if it's not calling ScopeManager then all bets are off. Having said that, I would hope it is using those methods, since that's the intended design of the API: maybe worth raising the issue with Lightbend support? Unfortunately Cinnamon is (I believe) closed source, so it's not easily inspectable.

first I need to get X-Ray to never use a threadlocal ever

I suspect what you'd need here is a custom SegmentContext implementation (along with the corresponding SegmentContextResolver) and wiring those into your AWSXRayRecorder instance: that would allow you to "resolve" the current context (i.e. the current span) from Cinnamon rather than from a ThreadLocal.

will-sargent-eero commented 5 years ago

Almost there, but I need an entity from AWSXRaySpan and I can't get at it because it's package private.

ashleymercer commented 5 years ago

It would be possible to make this library return AWS-specific types (e.g. AWSXRayTracer.activeSpan() would return an AWSXRaySpan, from which you could get the underlying Entity).

However, I wonder if this is a code smell? I've certainly changed my mind on this a couple of times, but my current thinking is this: surely the whole point of using something like OT is that you can switch out different implementations without having to change anything? If your code is explicitly relying on AWS-specific types, you lose that flexibility. At the very least switching to, say, Jaeger or Zipkin might break your build; or worse it might compile, but fail at runtime - or even just silently skip sections of code (if e.g. you're using reflection / isInstanceOf / pattern match).

I've found it useful to expose some of the AWS-specific operations through the OT interface, as per the documentation: for example, calling Span.setTag("error", true) actually calls Entity.setError(true) under the covers, without needing direct access to AWS-specific classes. This approach is IMHO better than accessing Entity directly: it will always compile, and will always run safely even with non-AWS implementations (although for non-AWS obviously these just revert to being "normal" tags).

Can I ask what you need to do directly on the Entity? Would it be sufficient to expose additional special tag names for your purposes?

ashleymercer commented 4 years ago

Hi @will-sargent-eero,

Prompted by https://github.com/opentracing-contrib/java-xray-tracer/issues/10 I realise this repo probably needs some TLC - did you manage to resolve this issue in the end? If so, would you mind documenting how here, for the benefit of any other Lightbend / Cinnamon users?

Thanks

will-sargent-eero commented 4 years ago

I could not resolve this issue.

On Mon, Nov 9, 2020 at 5:04 AM Ashley Mercer notifications@github.com wrote:

Hi @will-sargent-eero https://github.com/will-sargent-eero,

Prompted by #10 https://github.com/opentracing-contrib/java-xray-tracer/issues/10 I realise this repo probably needs some TLC - did you manage to resolve this issue in the end? If so, would you mind documenting how here, for the benefit of any other Lightbend / Cinnamon users?

Thanks

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/opentracing-contrib/java-xray-tracer/issues/6#issuecomment-723999222, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMUFROT7PIEZANGW3KM75U3SO7SEXANCNFSM4I5VQCTA .