opentracing-contrib / java-jaxrs

OpenTracing Java JAX-RS instrumentation
Apache License 2.0
37 stars 33 forks source link

Spans/traces leak between requests if child spans not finished correctly #109

Closed rnorth closed 5 years ago

rnorth commented 5 years ago

I don't think this is a bug per se, but perhaps a missed opportunity: the current implementation relies on developers' span handling being bug-free, otherwise extremely undesirable traces are generated.

As an example, we had a service where a span had been started but not finished due to a bug in service code. This imbalance between start/finish meant that, when the server started handling a new request, there was already an unrelated span in on the threadlocal. Future requests served on the same thread would all pile on top of the same trace.

The effect for us was massive, useless traces, with many thousands of spans and lasting for minutes or hours.

Obviously 'don't forget to finish spans' would be good advice, but I think the JAX-RS filter could do more to limit the damage when such a bug inevitably occurs.

ServerTracingFilter seems to actively try and join any existing (thread local) spans - I can see that this was a deliberate move as part of #74. I'm wondering if this behaviour could be made configurable, though, so that users can opt out.

If it helps, I've created an example repo to demonstrate the issue:

We have forked ServerTracingFilter locally to do this, so could easily submit a PR if you'd like.

FWIW we did also try a few approaches to fix this in SpanFinishingFilter, but found it difficult to correctly clear out thread local state. In the end, an approach of ignoring the state in ServerTracingFilter was the only thing that worked for us.

I think this is similar to, but not the same as, #87.

pavolloffay commented 5 years ago

@rnorth thanks for reporting!

What is your fix? Add a parameter to jax-rs filter to skip any thread locals?

rnorth commented 5 years ago

So far (internally) I've just taken the crude approach of disabling it in code - but I'll happily submit as a PR where it is made configurable. I'll add a suitable with method to ServerTracingDynamicFeature.Builder.

I think the only thing I'm not sure of is what the default should be. I'm strongly inclined to not disrupt the current default behaviour, even though in my view it's a bit of a gotcha.

What do you think?

pavolloffay commented 5 years ago

I am fine with changing the default behavior. We could even set the parent active span as a second child-of. But I am not sure if it is supported by all tracing systems.

sjoerdtalsma commented 5 years ago

Isn't it better to do a 'hard' clear of the Scope after the request (before it is returned to the threadpool) using https://github.com/opentracing/opentracing-java/pull/313 ?

FWIW we did also try a few approaches to fix this in SpanFinishingFilter, but found it difficult to correctly clear out thread local state

Can you elaborate? Would OT-java PR 313 above help at all or do you need something else?