A way to set thread-global tags?

brendanlong commented 6 years ago

In some cases, we want to tag every log message with things like a process ID, a request ID, request path, user name, etc. I was hoping I could do this by setting thread-global tags with Scheduler.with_local, then looking them up inside my custom Output. Unfortunately, it seems like outputs run in a different Async context so they can’t find the tags.

I can think of two ways Async.Log could support our use-case:

Pass the async context or thread-local universal map to log outputs
Add a way to mutate messages as soon as they're logged, independent of Output

(1) solves exactly our use-case, but (2) is probably simpler, more performant, and also more general-purpose.

As an example, it would be nice if I could do something like:

open Async

let tags_key = Univ_map.Key.create ~name:"tags" [%sexp_of: (string * string) list]

let () =
  Log.add_rewriter (fun msg ->
    match Scheduler.find_local tag_key with
    | None -> msg
    | Some tags ->
      Log.Message.add_tags message tags);
  Scheduler.with_local tags_key ["example", "tag"] ~f:(fun () ->
    Log.info "Some message") (* will log "Some message" with tags ["example, "tag"] *)

See https://discuss.ocaml.org/t/thread-global-tags-in-async-log/2739/3

samdk commented 6 years ago

Are the tags (and outputs) static, or do you expect them to change? If static, would adding a function something like the following get you what you want?:

val with_mapped_messages : Log.Output.t -> f:(Message.t -> Message.t) -> Log.Output.t

Then if I'm understanding you right you could do something like this:

let extra_tags = ... in
let original_outputs = Log.get_output log in
let outputs_with_tags =
  List.map original_outputs ~f:(fun output ->
    Log.Output.with_mapped_messages ~f:(fun message ->
      Log.Message.add_tags message extra_tags))
in
Log.set_output log outputs_with_tags

Which would be very easy to add, and fairly general.

(2) seems pretty reasonable as a thing to want, but would be a larger change in log code-wise, and doesn't fit quite as well philosophically: right now basically all processing (except for log level filtering) happens in the Output.ts, and this would change that. I'm not entirely opposed to it or anything, but I'd want to think harder about it.

(I'm also generally hesitant to make any guarantees about what async context things are going to run in.)

brendanlong commented 6 years ago

Unfortunately the tags depend on the Async context. For example, our servers handle HTTP requests in parallel, and we don't want to accidentally log something for one request using the tags from another (tags could include HTTP path, user, etc.).

samdk commented 6 years ago

Ah, okay, I think I understand a little better now.

I'm still thinking this through. I have no hard objections to (2) on its own, and the most obvious implementation would run the rewriter synchronously when the log call is made, which I think would get you what you want. We'd have to think hard before making guarantees about that in the interface though.

(If we did this I'm inclined to just have the rewriter just be a single function rather than a list of functions, because that simplifies the set of edge cases. I'm assuming that'd still work for you.)

One thing that's unclear to me is how much this actually solves this problem in the general case of wanting to attach context to log statements made in external libraries. It's relatively easy in async to run things in a different execution context without thinking very hard, so it's not clear to me that this is actually going to be applied in every case. Have you looked deeply enough into this to have a good sense one way or the other?

brendanlong commented 6 years ago

Are you thinking of cases like if we send data to a background worker? I agree that what we want is probably impossible in that case, but it would be nice to handle this in more common cases at least. It's not super important to us that this works 100%, since this is mostly just making logs easy to find, but going from "only works with our internal code" to "works with everything that isn't doing background processing" would be a pretty big jump I think.

(A more common case is a new async context which is a child of the current one, but I think the parent thread-locals are still available in that case?)

gtrak commented 6 years ago

Clojure's thread-local bindings and binding-conveyor-fn (called from any helper that delegates to another java thread in order to preserve bindings) is the functionality I was looking for (though synchronous) before seeing that Scheduler context's and Univ-map could be used for this.

Any high-level functions that switch threads take care to preserve the current thread-local context and pop it off after the stack unwinds.

An alternative might be a logger-specific context instead of using the top-level Scheduler context?

brendanlong commented 4 years ago

I think I found a fix for us, by passing the execution context around manually: https://github.com/arenadotio/ocaml-mssql/pull/33

I suspect it should be possible for Async to do this for us though. It looks like Thread_safe intentionally uses the main context: https://github.com/janestreet/async_unix/blob/v0.11.0/src/thread_safe.ml#L18

I wonder if there's some way to detect that the current context isn't random and should be preserved. I'm having trouble following it, but it looks like In_thread.run is trying to preserve the correct context already: https://github.com/janestreet/async_unix/blob/v0.11.0/src/in_thread.ml#L26

brendanlong commented 4 years ago

Actually using Scheduler.with_context didn't help, but Scheduler.preserve_context does. Unfortunately, Scheduler.preserve_context is a lot harder for us to use, but it gets us most of what we wanted: https://github.com/arenadotio/ocaml-mssql/pull/35

samdk commented 4 years ago

We recently also added a [transform] optional argument Log.create (and set/get functions for changing it on existing Log.ts) which let you do pre-async/output mutation, I think roughly solution (2) you described originally.

brendanlong commented 4 years ago

Thanks! I added this to our Async.Log integration and our tags are showing up correctly now:

  Log.Global.set_transform
    (Some
       (fun msg ->
         (* synchronously add global tags to the message. If we do this later, we will frequently
            get tags from the wrong async context.
            See https://github.com/janestreet/async_unix/issues/16#issuecomment-620830885 *)
         let open Log.Message in
         let level = level msg
         and raw_message = raw_message msg
         and tags = tags msg |> Logger0.merge_global_tags
         and time = time msg in
         Log.Message.create ?level ~tags ~time raw_message));

janestreet / async_unix

A way to set thread-global tags? #16