helidon-io / helidon

Java libraries for writing microservices
https://helidon.io
Apache License 2.0
3.44k stars 562 forks source link

Helidon tracing custom spans #8861

Open boris-senapt opened 3 weeks ago

boris-senapt commented 3 weeks ago

What is the correct way to create custom spans in Helidon Tracing? I see different places use different approaches to getting the current span, so I end up having to use a utility like the below

private <T> T runInSpan(final Span span, final Callable<T> callable) {
    final var context = Context.builder()
            .update(it -> Contexts.context().ifPresent(it::parent))
            .build();
    // OpenTelemetryDataPropagationProvider uses the span from the context
    context.register(span);
    // WebClientTracing uses the SpanContext from the context
    context.register(span.context());
    // Not sure about this one, but TracingFilter adds it
    context.register(TracingConfig.class, span.context());
    return Contexts.runInContext(context, () -> {
        // enables propagation on the underlying OTEL framework
        try (final var scope = span.activate()) {
            return callable.call();
        } catch (RuntimeException ex) {
            span.end(ex);
            throw ex;
        } finally {
            span.end();
        }
    });
}

If I miss any of the 3 above steps, my custom scope is only propagated "sometimes" and I end up with a mess


Environment Details

tjquinno commented 3 weeks ago

The issue description talks about creating custom spans but the code deals quite a bit with context propagation.

A few thoughts...

  1. Helidon's tracing implementations should automatically propagate the current tracing information when a new Context is created, without your code having to do that. There are some changes in this area in 4.0.10 (releasing soon) which fix problems with propagation of tracing information via Context.
  2. Do you see the same problems if you do not use GraalVM?
  3. In general, and specifically for both of the above questions, it would be great if you could share a simple project that shows in specific coding examples what you expect to work that does not. Your utility avoids for you (and therefore hides) which specific problem or problems might be present in particular coding situations.
  4. I'd like to better understand what "sometimes" and "a mess" mean for you! ;-)
boris-senapt commented 3 weeks ago

The issue description talks about creating custom spans but the code deals quite a bit with context propagation.

A few thoughts...

  1. Helidon's tracing implementations should automatically propagate the current tracing information when a new Context is created, without your code having to do that. There are some changes in this area in 4.0.10 (releasing soon) which fix problems with propagation of tracing information via Context.
  2. Do you see the same problems if you do not use GraalVM?
  3. In general, and specifically for both of the above questions, it would be great if you could share a simple project that shows in specific coding examples what you expect to work that does not. Your utility avoids for you (and therefore hides) which specific problem or problems might be present in particular coding situations.
  4. I'd like to better understand what "sometimes" and "a mess" mean for you! ;-)

Yes, you are right - my main concern is the "correct" current span being propagated in various ways; and the propagation for

is all different.

I'll find some time to build a demonstrator to show what I'm encountering - hopefully this week.

boris-senapt commented 3 weeks ago

I've been working on a reproducer, and you're right @tjquinno that 4.0.10 fixes the main issue I was having in that the context propagation was looking for the current span in the Helidon Context not in the ThreadLocal that OTEL uses.

I did find weirdness though - the code in io.helidon.common.context.ContextAwareExecutorImpl check for an active context before it propagates anything

Optional<Context> context = Contexts.context();
if (context.isPresent()) {

But OpenTelemetryDataPropagationProvider actually makes no use of the context - it propagates the Helidon Tracing active Scope - it doesn't really care about or reference the Helidon Context.

But what that means is that if you don't have a Helidon Context no propagation happens - even though one is not required.


Interestingly Log4jMdcPropagator has the same behaviour.

So I wonder if the problem isn't tracing at all, but the ContextAwareExecutorImpl?

tjquinno commented 1 week ago

TL;DR: As you mentioned, there are several types of "propagation" possibly in play in this discussion, some involving tracing specifically, some using Context and some not, etc. All types of propagation should "just work."

But if--even with the recent tracing improvements--you are still seeing specific problems, please report them here (or in other issues) so we can reproduce and track each one...preferably with a reproducer.


Longer discussion...

You pasted a bit of code from ContextAwareExecutorImpl (which actually appears in both wrap methods in that class). Do you observe a specific problem that you think is caused by that code?

As you point out, that code propagates nothing if there is no current context to propagate into. But I think--perhaps @tomas-langer can confirm--that there is almost always--perhaps always always--a current context, the global context if nothing else: https://github.com/helidon-io/helidon/blob/aa3ab1772be6fdffe09a5fa028eed01ef627a546/common/context/src/main/java/io/helidon/common/context/Contexts.java#L66-L77 So propagation should happen all the time as contexts come and go.

You mentioned that the Helidon OTel propagation provider does not use the context, but it does not really need to. The ContextAwareExecutorImpl#wrap methods do the following:

Any code can register data in a Context but it is the propagation providers that actually capture, set, and restore their own particular states in the thread.