typelevel / otel4s

An OpenTelemetry library for Scala based on Cats-Effect
https://typelevel.org/otel4s
Apache License 2.0

Tracing a Resource While Staying in Resource #194

Open ChristopherDavenport opened 1 year ago

ChristopherDavenport commented 1 year ago

So here is an approach I found that was able to get correct resource timing.

https://github.com/ChristopherDavenport/natchez-http4s-otel/blob/otel2/core/src/main/scala/io/chrisdavenport/natchezhttp4sotel/ClientMiddleware.scala#L76-L87

How am I supposed to trace the lifetime of a resource while staying in Resource?

Clients are Request[F] => Resource[F, Response[F]], so we don't want to use the resource directly inline.

I assumed I would use wrapResource on spanBuilder, but use still seems to collapse the result to F instead of keeping the Resource, which is necessary to keep the TCP connection alive.

Does anyone have an approach for preserving the Resource while tracing it?
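To make the timing problem concrete, here is a minimal sketch that uses only cats-effect (no otel4s, no http4s). The object ClientSpanTiming, the helpers span and connection, and the event strings are hypothetical stand-ins: a "span" is a Resource that logs start/end events, and a "connection" is a Resource that logs open/close, mimicking the Request[F] => Resource[F, Response[F]] shape. It compares a span that only surrounds acquisition (collapsing to F) with one that stays in Resource.

```scala
//> using dep org.typelevel::cats-effect:3.5.4

import cats.effect.{IO, Ref, Resource}
import cats.effect.unsafe.implicits.global

object ClientSpanTiming {
  type Log = Ref[IO, List[String]]

  // Hypothetical stand-in for a span: records when it starts and ends.
  def span(log: Log, name: String): Resource[IO, Unit] =
    Resource.make(log.update(_ :+ s"start $name"))(_ => log.update(_ :+ s"end $name"))

  // Hypothetical stand-in for a client call: the "connection" stays open
  // until the Resource is released.
  def connection(log: Log): Resource[IO, String] =
    Resource.make(log.update(_ :+ "open").as("response"))(_ => log.update(_ :+ "close"))

  // Collapsing to F: the span only surrounds acquisition, so it ends
  // before the response is read and before the connection is closed.
  def collapsed(log: Log): Resource[IO, String] =
    Resource
      .make(span(log, "request").use(_ => connection(log).allocated))(_._2)
      .map(_._1)

  // Staying in Resource: finalizers run in reverse order, so the span
  // ends only after the connection has been closed.
  def stayingInResource(log: Log): Resource[IO, String] =
    span(log, "request").flatMap(_ => connection(log))

  // Runs a traced call, "reads" the response, and returns the event log.
  def eventLog(traced: Log => Resource[IO, String]): IO[List[String]] =
    for {
      log <- Ref.of[IO, List[String]](Nil)
      _   <- traced(log).use(resp => log.update(_ :+ s"read $resp"))
      out <- log.get
    } yield out
}
```

eventLog(collapsed) yields List("start request", "open", "end request", "read response", "close"), while eventLog(stayingInResource) yields List("start request", "open", "read response", "close", "end request"), i.e. only the second span covers the whole connection lifetime.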

iRevive commented 1 year ago

This behavior was supported in the very first draft. It turned out that it does not work the way we want with Local[F, Vault].

There are a few discussions on this topic:
1) https://github.com/typelevel/otel4s/pull/105
2) https://github.com/typelevel/otel4s/issues/88
3) https://github.com/typelevel/otel4s/pull/107#issuecomment-1414410325

ChristopherDavenport commented 1 year ago

So we can't trace the lifetime of an http4s client resource? I'm not sure I'm following.

iRevive commented 1 year ago

We will remove wrapResource in the upcoming release, as a follow-up to #273.

To summarize: it is currently not possible to trace a resource (e.g., its individual stages: acquire, use, release) while staying within the Resource, and it will not be possible in the future either, because the API is bound to Local[F, Vault] semantics :(

iRevive commented 1 month ago

Local semantics don't fit well with Resource.
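The mismatch shows up in a simplified model of a Kleisli-based Local. This sketch uses cats' Kleisli directly rather than otel4s, with a String (a span name) as a hypothetical stand-in for Context; the object and value names are illustrative. Kleisli#local changes the context only for the effect it wraps, so a context installed during a Resource's acquire step is no longer visible in the step that runs afterwards (the use step).

```scala
//> using dep org.typelevel::cats-effect:3.5.4

import cats.data.Kleisli
import cats.effect.IO
import cats.effect.unsafe.implicits.global

object KleisliLocalScope {
  // Hypothetical stand-in: the "context" is just the name of the current span.
  type Traced[A] = Kleisli[IO, String, A]

  // Reads the current context (analogous to Local#ask).
  val currentSpan: Traced[String] = Kleisli.ask[IO, String]

  // Analogous to Local#local: the new context is visible only inside `fa`.
  def withSpan[A](name: String)(fa: Traced[A]): Traced[A] =
    fa.local[String](_ => name)

  // An acquire step that starts a span and observes the context inside it.
  val acquire: Traced[String] = withSpan("client-request")(currentSpan)

  // Whatever runs next (for a Resource, the `use` step) sees the outer context again.
  val afterAcquire: Traced[String] = acquire.flatMap(_ => currentSpan)
}
```

Run with "root" as the initial context, acquire yields "client-request" but afterAcquire yields "root". An IOLocal, by contrast, is mutable fiber-local state that can be set in acquire and read in use, which is what the IOLocal-based workaround in this thread relies on.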

[!Warning] The code below is a leaky abstraction; please don't use it.

However, you can work around this limitation by manipulating the context manually:

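import cats.Monad
import cats.effect.{IOLocal, LiftIO, Resource}
import cats.syntax.all._
import org.typelevel.otel4s.Attribute
import org.typelevel.otel4s.trace.{Span, SpanBuilder, SpanOps, Tracer}

class IOLocalTracer[F[_]: Monad: LiftIO, Ctx](underlying: Tracer[F], ioLocal: IOLocal[Ctx]) extends Tracer[F] {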
  def meta: Tracer.Meta[F] = underlying.meta
  ... // forward all other methods
  def spanBuilder(name: String): SpanBuilder[F] = 
    DelegateSpanBuilder(underlying.spanBuilder(name), ioLocal)
}

case class DelegateSpanBuilder[F[_]: Monad: LiftIO, Ctx](
   builder: SpanBuilder[F],
   ioLocal: IOLocal[Ctx]
) extends SpanBuilder[F] {
  def addAttribute[A](attribute: Attribute[A]): SpanBuilder[F] = copy(builder.addAttribute(attribute))
  ... // forward all other methods
  def build: SpanOps[F] = {
    val b = builder.build
    new SpanOps[F] {
      def resource: Resource[F, SpanOps.Res[F]] =
        for {
          res <- b.resource
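          // manually install the span's context for the lifetime of the resource
          _ <- Resource.make(
            for {
              previous <- ioLocal.get.to[F]            // context before the span was started
              traced   <- res.trace(ioLocal.get.to[F]) // context with the span set as current
              _        <- ioLocal.set(traced).to[F]    // trace only scopes it to the wrapped effect, so set it explicitly
            } yield previous
          )(previous => ioLocal.set(previous).to[F])   // restore the original context on release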
        } yield res
      def use[A](f: Span[F] => F[A]): F[A] = b.use(f)
      def use_ : F[Unit] = b.use_
    }
  }
}

And wire everything together:

IOLocal(Context.root).flatMap { implicit ioLocal =>
  OtelJava.autoConfigured[IO]().use { otel4s =>
    otel4s.tracerProvider.get("tracer").flatMap { t =>
      implicit val tracer: Tracer[IO] = new IOLocalTracer(t, ioLocal)
      Tracer[IO].span("test").resource.use { _ =>
        Tracer[IO].currentSpanContext.debug() // prints the current span
      }
    }
  }
}
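Stripped of otel4s, the core of this workaround is a save/set/restore of an IOLocal around the Resource's lifetime. Below is a minimal sketch, assuming a plain IOLocal[String] as a stand-in for the tracing context; the helper scoped and the object name are hypothetical.

```scala
//> using dep org.typelevel::cats-effect:3.5.4

import cats.effect.{IO, IOLocal, Resource}
import cats.effect.unsafe.implicits.global

object IOLocalResourceScope {
  // Hypothetical helper: installs `value` in the IOLocal for the lifetime of
  // the Resource and restores the previous value on release.
  def scoped[A](local: IOLocal[A], value: A): Resource[IO, Unit] =
    Resource.make(local.getAndSet(value))(previous => local.set(previous)).map(_ => ())

  // Reads the IOLocal inside the Resource and again after it has been released.
  val program: IO[(String, String)] =
    for {
      local  <- IOLocal("root")
      inside <- scoped(local, "client-request").use(_ => local.get)
      after  <- local.get
    } yield (inside, after)
}
```

program evaluates to ("client-request", "root"): the value is visible inside use and restored after release. This works only because acquire, use, and release all run on the same fiber.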

Technically, we can do something similar under the hood. For example, if we detect that Local is backed by an IOLocal, we can apply custom propagation logic.

Downsides:
1) Unpredictable behavior in some cases: https://github.com/typelevel/cats-effect/issues/3100, https://github.com/typelevel/cats-effect/pull/3360
2) Different behavior for different effects: e.g., Kleisli[IO, Context, *] has a built-in Local[Kleisli[IO, Context, *], Context] instance that doesn't require IOLocal
3) A Resource may be acquired on one fiber and finalized on another.

Hence, the abstraction leaks and may create more problems than it solves.
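Downside 3 can be reproduced with the same kind of hypothetical scoped helper (repeated here so the block is self-contained). If the finalizer runs on a different fiber, it restores the value only on that fiber, and the acquiring fiber keeps the span's context after the Resource has been released. This is a sketch with a plain IOLocal[String], not otel4s code.

```scala
//> using dep org.typelevel::cats-effect:3.5.4

import cats.effect.{IO, IOLocal, Resource}
import cats.effect.unsafe.implicits.global

object CrossFiberRelease {
  // Hypothetical helper: set the IOLocal on acquire, restore it on release.
  def scoped[A](local: IOLocal[A], value: A): Resource[IO, Unit] =
    Resource.make(local.getAndSet(value))(previous => local.set(previous)).map(_ => ())

  val program: IO[String] =
    for {
      local    <- IOLocal("root")
      release  <- scoped(local, "client-request").allocated.map(_._2) // acquired on this fiber
      _        <- release.start.flatMap(_.join)                        // finalized on a child fiber
      leftover <- local.get                                            // this fiber was never restored
    } yield leftover
}
```

program evaluates to "client-request": the child fiber inherited the context, restored "root" only for itself, and the span's context leaked past the Resource's lifetime on the original fiber.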