steffan-westcott / clj-otel

An idiomatic Clojure API for adding telemetry to your libraries and applications using OpenTelemetry.
https://cljdoc.org/d/com.github.steffan-westcott/clj-otel-api
Apache License 2.0
183 stars 12 forks source link

Suggestion: propagate context through Clojure bindings #2

Closed Limess closed 6 months ago

Limess commented 1 year ago

Thanks for creating this library - great stuff, and great documentation!

I'm not sure if this is a great idea, but I'll ask it anyway:

Would it be sensible to propagate span context onto a Clojure binding, in addition to the current context propagation on ThreadLocal context? New spans can then fetch context off this binding if it exists.

Many Clojure async constructs (futures, agents, I think go threads) propagate bindings - preventing the need to manually propagate context in these scenarios.

steffan-westcott commented 1 year ago

Thank you for making your suggestion. Someone asked about the same topic on the #clj-otel channel in Clojurians.

I think there may be an opportunity to use Clojure's dynamic scope as you suggest. Still, it is unclear how best to achieve this and retain harmony with the Java OpenTelemetry implementation. In the design of clj-otel so far, I have sought not to diverge too much from the design of the Java library.

The Java library implements a "current context" concept and a mechanism for push/pop contexts on local threads. This model only allows for a thread to process a single context. There is a Java OpenTelemetry library extension that supports Kotlin coroutines, where a thread can process several coroutines, and each coroutine works in a context (I'm not a Kotlin user, so I am not sure). I believe Kotlin coroutines are similar to async Clojure code in that processing is not necessarily performed on isolated threads. I need to study how this works as it may point the way to best support dynamic scope contexts when working with async code in Clojure.

I should also point out that I am unsure of the merits of dynamic scoped objects. This article by Stuart Sierra makes some excellent points about why it may be considered a bad idea.

danielcompton commented 1 year ago

Following up from the Honeycomb Pollinator's Slack, I shared this snippet

(defmacro with-span [span-opts & body]
  `(let [span-opts# ~span-opts
         source#    (into {:line ~(:line (meta &form))
                           :file ~*file*
                           :ns   ~(str *ns*)}
                          (:source span-opts#))
         span-opts# (assoc span-opts#
                           :source source#
                           :parent *current-context*)]
     (span/with-span-binding' [context# span-opts#]
       (binding [*current-context* context#]
         ~@body))))

We're using core.async very heavily, and so using binding seemed like the only sensible way to introduce the Java OpenTelemetry library without doing massive surgery on our whole codebase.

I don't know if it's really necessary for this library to propagate context through bindings, it's not a lot of code to do it yourself. If it was possible to do it in the library, I'd definitely want it to be opt-in.

I agree with Stuart Sierra's points that Dynamically Scoped Singleton Resources (e.g. DB connections) are an anti pattern. I think that OpenTelemetry Context falls under the Safe Dynamic Scope section:

Remember that dynamic scope in Clojure is really thread-local binding. Therefore, it’s best suited to operations that are confined to a single thread. [...] The entire operation happens on a single thread, in a single call stack. It has dynamic extent.

This is the intent of OpenTelemetry's Context object, it just doesn't work quite that way in core.async programs.

What would have helped me when adopting clj-otel was a clearer demarcation of which functions set Context into the ThreadLocal, and which ones were just returning a new Context. I can have a pass at adding some documentation where this was unclear.

steffan-westcott commented 1 year ago

@Limess @danielcompton I have implemented an opt-in solution in release 0.2.2 of clj-otel. If you can try this out, I would appreciate your feedback.