typelevel / otel4s

An OpenTelemetry library for Scala based on Cats-Effect
https://typelevel.org/otel4s
Apache License 2.0
173 stars 35 forks source link

Unable to start span as current #202

Open IvannKurchenko opened 1 year ago

IvannKurchenko commented 1 year ago

Reproducible example:

object Test extends IOApp {
  override def run(args: List[String]): IO[ExitCode] = {
    val ec = ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

    Resource
      .eval(IO(GlobalOpenTelemetry.get))
      .evalMap(OtelJava.forAsync[IO])
      .evalMap(_.tracerProvider.get("test"))
      .use { tracer =>
        for {
          _ <- tracer.spanBuilder("test").build.use { span =>
            println(s"java span context: ${Span.current().getSpanContext}")
            IO(println("otel4s span context: " + span.context))
          }
        } yield ()
      }
      .as(ExitCode.Success)
      .evalOn(ec)
  }
}

Would print following output:

java span context: ImmutableSpanContext{traceId=00000000000000000000000000000000, spanId=0000000000000000, traceFlags=00, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=false}
otel4s span context: WrappedSpanContext(ImmutableSpanContext{traceId=318854a5bd6ac0dd7b0a926f89c97ecb, spanId=925ad3a126cec272, traceFlags=01, traceState=ArrayBasedTraceState{entries=[]}, remote=false, valid=true})

Current otel4s API does not provide the ability to make the span current and started span is not current. Because of this, I assume, a combination with existing Java instrumentations does not work - because spans that are supposed to be child spans started by auto-instrumentations have no proper parent.

Happy to submit PR. But I am not sure which way is preferable:

I assume a combination of second and third options is preferable.

Thanks.

rossabaker commented 1 year ago

This is a tricky problem and worth much more thought.

Our "current" span is based on a cats.mtl.Local[F, Vault], most typically backed by an IOLocal. The Java API is based on a ThreadLocal context. We could almost get a transparent integration with a Local instance based on ThreadLocal, but there are still two problems:

  1. It won't work for asynchronous processes. This is tricky in the Java instrumentation, too, and involves taskWrapping.
  2. We have a Vault, and they have a Context, and we can't transparently bridge them because the Java client's context keys are encapsulated. Our propagator code already does a translation of known keys, but it's best effort. We could perhaps make an implementation of Context that delegates to our Vault. We might need to look deeper at ContextStorage. 🤔
IvannKurchenko commented 1 year ago

@rossabaker Thanks, for the reply. Yeah, I totally agree with both points:

I just wanted to callout problem that as of now otel4s does not leverage of existing Java libs instrumentations, which is a big advantage of OTEL ecosystem. At current implementation, application developer will be forced to implement tracing and metrics middleware for common things like JDBC or Kafka client. From my point of view, it would be really nice to same OTEL experience like in other ecosystems, for instance in Akka or Java: just to plug instrumentation, configure it and around 80% of work it will do for you out of the box.

Happy to hear your thoughts.

rossabaker commented 1 year ago

Right now at work, we're tracing JDBC via Doobie and Natchez. The transactor gives us a hook into obtaining a Connection, and we can create decorators around the Connection - Statement - ResultSet triumvirate. Of course, it would have been much easier to use standard Java instrumentation, and we can only get as far as we get because Doobie had the correct hook. I think we should absolutely explore a deeper integration like this. The value in the Java SDK is not just that all the formats and propagation is written, but that it's already integrated with so many things we're already calling from Scala.

iRevive commented 1 year ago

We can still expose some utility methods from WrappedSpanContext to ease the interop pain. E.g. WrapperSpanContext#wrap and WrapperSpanContext#unwrap.

It will not solve the thread local's propagation issue, but at least an end user can extend the existing Java's span (and vice versa).

I agree that we should seamlessly integrate with Java thread-local propagation, but this topic is quite complicated. Especially taking into account the Scala-first implementation of the tracing logic.

IvannKurchenko commented 1 year ago

Guys, last question. I'm curious about overall instrumentation concept from otel4s perspective. Let's consider example of having Kafka (though fs2-kafka) or Elastic (though elastic4s) clients in codebase. What' the way to go to instrument common cases like those: are there will be set of custom instrumentations for higher level clients (e.g. elastic4s and fs2) or you are planning to leverage existing instrumentation? Or app devs supposed to make tracing middleware by themselves? Thanks.

bpholt commented 1 year ago

Throwing this out there, so take it for what it's worth, but we'll soon have a pure-Scala implementation as well, which means otel4s will work with Scala.js too. It'd be interesting to see how propagation works in that context and how best to enable interop with the JS ecosystem.

rossabaker commented 1 year ago

are there will bte set of custom instrumentations for higher level clients (e.g. elastic4s and fs2) or you are planning to leverage existing instrumentation

I think we'd want to leverage existing instrumentation when we can.

That third category is where we'd seriously need a smooth handoff between otel4s state and opentelemetry-java state.

rossabaker commented 1 year ago

I think we can replace the Java SDK's context storage with an implementation that wraps around our IOLocal Vault.

One way is to implement an SPI. SPIs need to have a no-arg constructor, which means we would need to do unutterable things with respect to IO.

Another way is a wrapper (we call these middlewares in various other Typelevel projects). That would give us the facility to create one in IO and invoke it before loading the SDK.

IvannKurchenko commented 1 year ago

@rossabaker Recently I thought about following case: what if we have two fibers running on top of one thread. For instance second fiber started before first one. Do I understand correctly that in this case second fiber span context would override first five span context, taking into account that Java's OTEL context stored in thread locals.

I assume there are much more corner cases like this can be found, which makes me think and agree with you, that due two runtimes are involved this would be pretty hard, or even feasible at all to have stable Java's OTEL interop.

So, as a conclusion, I can assume that perhaps IO native instrumentation, at least for the most popular libs, is the way to go for app or lib developers. Does it make sense?

armanbilge commented 1 year ago

I just put up a PR on Cats Effect that proof-of-concepts how we can safely expose IOLocals as ThreadLocals in delay(...), blocking(...), etc. If you are willing to eat a static global IOLocal key (not the end of the world, similar to how Vault keys are used) this should make it possible to implement a custom ContextStorageProvider SPI.

rossabaker commented 1 year ago

Neat. One tricky thing about ContextStorage is that it while its attach and scope might implement a signature like local, their use in Java is non-local: there's no callback with automatic scope management, but rather the application calls attach and is responsible for closing the scope. I'm not yet sure how those calls should propagate back to the Scala code.