open-telemetry / opentelemetry-java

OpenTelemetry Java SDK
https://opentelemetry.io
Apache License 2.0
1.96k stars 807 forks source link

Allow closing scopes out of order #5303

Open laurit opened 1 year ago

laurit commented 1 year ago

Currently closing scopes out of order is disallowed in https://github.com/open-telemetry/opentelemetry-java/blob/3d5424a54ae6930e721e8c4813ad956047009394/context/src/main/java/io/opentelemetry/context/ThreadLocalContextStorage.java#L48 This poses a challenge when implementing instrumentation for Zio. Zio instrumentation is similar to what we have for kotlin coroutines. Every time a fiber is restored we activate otel context, when fiber is suspended we close the scope and remember the current context so that it could be activated when the fiber is restored again. The problem is that the user code executing in fibers can also activate scopes. Fiber could get suspended when user code has started the span and activated the scope but has not closed them yet. In such situation attempt to close our scope fails because it is not the current one. As a workaround we can run Context.root().makeCurrent() on suspend to avoid leaking scope to the thread that executed the fiber. Kotlin coroutine instrumentation does not run into the same problem because there the programming model encourages users to keep the active scope in coroutine context so instead of context.makeCurrent() the user is expected to use withContext(context.asContextElement()) {...}. As an alternative to allowing scopes to close out of order we could continue using Context.root().makeCurrent() to reset the scope or see if we can, similarly to kotlin coroutine instrumentation, let users keep scope in some zio fiber specific context. I don't particularly like the last option as it will surely create problems when fibers call instrumentations that expect to get the scope from the thread local. If allowing closing scopes out of order is not desirable I'd welcome guidance on how such instrumentation should be implemented.

jack-berg commented 1 year ago

I wrote a little test to better understand the flow and behavior:

    ContextKey<String> fiberKey = ContextKey.named("fiber");
    ContextKey<String> userKey = ContextKey.named("user");

    Context threadContext = Context.root().with(fiberKey, "1");
    Context fiberContext = Context.root().with(fiberKey, "2");
    Context userContext = fiberContext.with(userKey, "a");

    // Establish thread context before fiber.
    Scope threadScope = threadContext.makeCurrent();
    Assertions.assertThat(Context.current().get(fiberKey)).isEqualTo("1");
    Assertions.assertThat(Context.current().get(userKey)).isEqualTo(null);

    // Restore fiber. Activate fiber context.
    Scope fiberScope = fiberContext.makeCurrent();
    Assertions.assertThat(Context.current().get(fiberKey)).isEqualTo("2");
    Assertions.assertThat(Context.current().get(userKey)).isEqualTo(null);

    // Set user context within fiber.
    Scope userScope = userContext.makeCurrent();
    Assertions.assertThat(Context.current().get(fiberKey)).isEqualTo("2");
    Assertions.assertThat(Context.current().get(userKey)).isEqualTo("a");

    // Suspend fiber. Close fiber scope, remember current context. Context should be restored to initial state.
    fiberContext = Context.current();
    fiberScope.close();
    // THIS ASSERTION FAILS with the current behavior of if (!closed && current() == toAttach)
    Assertions.assertThat(Context.current().get(fiberKey)).isEqualTo("1");
    Assertions.assertThat(Context.current().get(userKey)).isEqualTo(null);

    // Restore fiber. Activate context.
    fiberScope = fiberContext.makeCurrent();
    Assertions.assertThat(Context.current().get(fiberKey)).isEqualTo("2");
    Assertions.assertThat(Context.current().get(userKey)).isEqualTo("a");

Indeed closing the fiber scope fails and the fiber context is leaked into the thread when it should close.

Relaxing the if (!closed && current() == toAttach) condition to if (!closed) fixes the problem.

If I understand your suggestion to use Context.root().makeCurrent() is meant to reset the thread scope to the root after suspension rather than trying to close the fiber scope, which may fail. This means that when the fiber is suspended the thread scope becomes empty instead of attempting to restore it to the state before the fiber was restored. Hope I got that right?

If so, Context.root().makeCurrent() doesn't seem like a bad option if the threads running the zio fibers aren't expected to do anything else.