imandra-ai / ocaml-opentelemetry

Instrumentation for https://opentelemetry.io
http://docs.imandra.ai/ocaml-opentelemetry/
33 stars 7 forks source link

Use Lwt's sequence-associated storage instead of our own Thread_local #32

Closed ELLIOTTCABLE closed 1 year ago

ELLIOTTCABLE commented 1 year ago

This stores the implicit 'global' surrounding-scope off the callstack in Lwt's "sequence-associated storage" instead of our own Thread_local, as this makes more sense in the context of green threads.

I tried to match the code-style, let me know if you have any tweaks.

ELLIOTTCABLE commented 1 year ago

This actually has non-obvious and subtle performance consequences — we've been talking about this internally at @Ahrefs for a bit. To quote @mfp,

tldr

  • Lwt TLS works by associating a heterogeneous map to promises
  • If you naïvely spawn async threads with ignore (make_some_promise ()) or equivalent (which we do in multiple places, and is wrong for multiple reasons btw.), you're saving the whole map. Lwt has no way to know which keys will be used in subsequent lookups.
  • This can lead to leaks if you have long-running async threads that are launched in the middle of a promise with a heavy Lwt.with_value map.

Basically, it makes "using Lwt wrong in a mostly-harmless way, suddenly harmful." And 'it' in this case is a supposedly-transparent debugging and tracing library. Ew. I think we'll continue with this approach internally, but we have a more total control than an open-source library can have …

… unfortunately, not making this change is also problematic. (Actual)threads are complicated in a Lwt context … I'm not sure how to proceed.

If we merge this, it should be with some significant documentation, educating users up-front about how to be more principled and careful with their promises … idk. Thoughts?

ELLIOTTCABLE commented 1 year ago

One thing that might minimize the damage is if we can get away with only the span_id, and ditching the assumption of access to any other metadata about the surrounding context, possibly? At least in that case, any pathological cases are only leaking, idk, 8 bytes plus a little?

Or we can dislocate the rest of the metadata Scope.t into a process-or-thread global map, I suppose?

  type t = {
    trace_id: Trace_id.t;
    span_id: Span_id.t;
    mutable events: Event.t list;
    mutable attrs: key_value list;
  }
c-cube commented 1 year ago

Ah, so it's really a global map, I see. I thought it might have been stored in promises, like cancellation state, but this means this is a bit too tricky. Perhaps you could use an ahrefs overlay to experiment with this for a while?