Unable to start span as current

IvannKurchenko commented 1 year ago

Reproducible example:

object Test extends IOApp {
  override def run(args: List[String]): IO[ExitCode] = {
    val ec = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

    Resource
      .eval(IO(GlobalOpenTelemetry.get))
      .evalMap(OtelJava.forAsync[IO])
      .evalMap(_.tracerProvider.get("test"))
      .use { tracer =>
        for {
          _ <- tracer.spanBuilder("test").build.use { span =>
            println(s"java span context: ${Span.current().getSpanContext}")
            IO(println("otel4s span context: " + span.context))
          }
        } yield ()
      }
      .as(ExitCode.Success)
      .evalOn(ec)
  }
}

Would print following output:

java span context: ImmutableSpanContext{traceId=00000000000000000000000000000000, spanId=0000000000000000, traceFlags=00, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=false}
otel4s span context: WrappedSpanContext(ImmutableSpanContext{traceId=318854a5bd6ac0dd7b0a926f89c97ecb, spanId=925ad3a126cec272, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true})

Current otel4s API does not provide the ability to make the span current and started span is not current. Because of this, I assume, a combination with existing Java instrumentations does not work - because spans that are supposed to be child spans started by auto-instrumentations have no proper parent.

Happy to submit PR. But I am not sure which way is preferable:

Start all spans as current by default?
Provider def makeCurrent: F[Unit] wrapper method.
Inject configuration into TracerProvider to make started span current.

I assume a combination of second and third options is preferable.

Thanks.

rossabaker commented 1 year ago

This is a tricky problem and worth much more thought.

Our "current" span is based on a cats.mtl.Local[F, Vault], most typically backed by an IOLocal. The Java API is based on a ThreadLocal context. We could almost get a transparent integration with a Local instance based on ThreadLocal, but there are still two problems:

It won't work for asynchronous processes. This is tricky in the Java instrumentation, too, and involves taskWrapping.
We have a Vault, and they have a Context, and we can't transparently bridge them because the Java client's context keys are encapsulated. Our propagator code already does a translation of known keys, but it's best effort. We could perhaps make an implementation of Context that delegates to our Vault. We might need to look deeper at ContextStorage. 🤔

IvannKurchenko commented 1 year ago

@rossabaker Thanks, for the reply. Yeah, I totally agree with both points:

I understand that there are two ecosystems involved here, e.g. context should both to be propagated from Java native OTEL Span/Tracing context to IO's Local Vault and vice versa generally. As of now, the only case I see for context propagation is from Local[Vault] down to Java Span.context for instrumentation libs. I think, logical continuation is also to have it other way around from Span.context to Local[Vault], but can't think of use cases.
Application developer would need to make sure that all ExecutionContext's are wrapped into proxy that supports OTEL context propagation, for instance, though: Context.taskWrapping or similar. In this case one possible way is to have TelemetryApp extends IOApp or TelemetryResourceApp extends ResourceApp that will provide handy way to setup IORuntime with necessary underlying context propagation ExecutionContext's.

I just wanted to callout problem that as of now otel4s does not leverage of existing Java libs instrumentations, which is a big advantage of OTEL ecosystem. At current implementation, application developer will be forced to implement tracing and metrics middleware for common things like JDBC or Kafka client. From my point of view, it would be really nice to same OTEL experience like in other ecosystems, for instance in Akka or Java: just to plug instrumentation, configure it and around 80% of work it will do for you out of the box.

Happy to hear your thoughts.

rossabaker commented 1 year ago

Right now at work, we're tracing JDBC via Doobie and Natchez. The transactor gives us a hook into obtaining a Connection, and we can create decorators around the Connection - Statement - ResultSet triumvirate. Of course, it would have been much easier to use standard Java instrumentation, and we can only get as far as we get because Doobie had the correct hook. I think we should absolutely explore a deeper integration like this. The value in the Java SDK is not just that all the formats and propagation is written, but that it's already integrated with so many things we're already calling from Scala.

iRevive commented 1 year ago

We can still expose some utility methods from WrappedSpanContext to ease the interop pain. E.g. WrapperSpanContext#wrap and WrapperSpanContext#unwrap.

It will not solve the thread local's propagation issue, but at least an end user can extend the existing Java's span (and vice versa).

I agree that we should seamlessly integrate with Java thread-local propagation, but this topic is quite complicated. Especially taking into account the Scala-first implementation of the tracing logic.

IvannKurchenko commented 1 year ago

Guys, last question. I'm curious about overall instrumentation concept from otel4s perspective. Let's consider example of having Kafka (though fs2-kafka) or Elastic (though elastic4s) clients in codebase. What' the way to go to instrument common cases like those: are there will be set of custom instrumentations for higher level clients (e.g. elastic4s and fs2) or you are planning to leverage existing instrumentation? Or app devs supposed to make tracing middleware by themselves? Thanks.

bpholt commented 1 year ago

Throwing this out there, so take it for what it's worth, but we'll soon have a pure-Scala implementation as well, which means otel4s will work with Scala.js too. It'd be interesting to see how propagation works in that context and how best to enable interop with the JS ecosystem.

rossabaker commented 1 year ago

are there will bte set of custom instrumentations for higher level clients (e.g. elastic4s and fs2) or you are planning to leverage existing instrumentation

I think we'd want to leverage existing instrumentation when we can.

Things that are pure Scala, we're on our own. For example, [Skunk](Skunk](https://github.com/typelevel/skunk/commit/2a9b4692f48e09fae33ed66dd9780a3921c1fade).
Things where we are invoked by Java, I suspect all we need is existing instrumentation. For example, http4s-jetty.
Things where we wrap Java, take advantage of the existing instrumentation, but consider spanning it ourselves if the wrapper adds non-negligible timing or metadata. fs2-kafka would be an example in this category, but I'm not familiar enough with it to know whether the Scala bits merit their own instrumentation.

That third category is where we'd seriously need a smooth handoff between otel4s state and opentelemetry-java state.

rossabaker commented 1 year ago

I think we can replace the Java SDK's context storage with an implementation that wraps around our IOLocal Vault.

One way is to implement an SPI. SPIs need to have a no-arg constructor, which means we would need to do unutterable things with respect to IO.

Another way is a wrapper (we call these middlewares in various other Typelevel projects). That would give us the facility to create one in IO and invoke it before loading the SDK.

IvannKurchenko commented 1 year ago

@rossabaker Recently I thought about following case: what if we have two fibers running on top of one thread. For instance second fiber started before first one. Do I understand correctly that in this case second fiber span context would override first five span context, taking into account that Java's OTEL context stored in thread locals.

I assume there are much more corner cases like this can be found, which makes me think and agree with you, that due two runtimes are involved this would be pretty hard, or even feasible at all to have stable Java's OTEL interop.

So, as a conclusion, I can assume that perhaps IO native instrumentation, at least for the most popular libs, is the way to go for app or lib developers. Does it make sense?

armanbilge commented 1 year ago

I just put up a PR on Cats Effect that proof-of-concepts how we can safely expose IOLocals as ThreadLocals in delay(...), blocking(...), etc. If you are willing to eat a static global IOLocal key (not the end of the world, similar to how Vault keys are used) this should make it possible to implement a custom ContextStorageProvider SPI.

https://github.com/typelevel/cats-effect/pull/3636

rossabaker commented 1 year ago

Neat. One tricky thing about ContextStorage is that it while its attach and scope might implement a signature like local, their use in Java is non-local: there's no callback with automatic scope management, but rather the application calls attach and is responsible for closing the scope. I'm not yet sure how those calls should propagate back to the Scala code.

typelevel / otel4s

Unable to start span as current #202