imandra-ai / ocaml-opentelemetry

Instrumentation for https://opentelemetry.io
http://docs.imandra.ai/ocaml-opentelemetry/

opentelemetry-cohttp: make sure to emit data before termination #41

Open tatchi opened 1 year ago


I was playing around with this little example that uses opentelemetry_client_cohttp_lwt:

let run () =
  let open Lwt.Syntax in
  let* () =
    Opentelemetry_lwt.Trace.with_ "my trace" @@ fun _scope ->
    print_endline "bar" |> Lwt.return
  in
  Lwt.return_unit

let () =
  Opentelemetry_lwt.Globals.service_name := "my service";
  let config =
    Opentelemetry_client_cohttp_lwt.Config.make ~debug:true
      ~url:"http://localhost:4318" ()
  in
  Opentelemetry_client_cohttp_lwt.with_setup ~config () @@ fun () ->
  Lwt_main.run (run ())

I was surprised not to see any traces in the Jaeger UI (the same example with ocurl works fine), especially since the logs show the spans being sent:

bar
send spans { resource = ... }
opentelemetry: exiting…

Adding a small delay fixes the issue:

diff --git a/src/bin/main.ml b/src/bin/main.ml
index fc28bf1..1416c72 100644
--- a/src/bin/main.ml
+++ b/src/bin/main.ml
@@ -6,6 +6,7 @@ let run () =
     Opentelemetry_lwt.Trace.with_ "my trace" @@ fun _scope ->
     print_endline "bar" |> Lwt.return
   in
+  let* () = Lwt_unix.sleep 2.0 in
   Lwt.return_unit

 let () =

My understanding is that the cleanup function doesn't have time to finish before the program terminates:

let cleanup () =
  if !debug_ then Printf.eprintf "opentelemetry: exiting…\n%!";
  Lwt.async (fun () ->
      let* () = emit_all_force httpc encoder in
      Httpc.cleanup httpc;
      Lwt.return ())

I see that we use Lwt.async, which doesn't wait for the promise to resolve. See, for example, this code:

let cleanup () =
  Lwt.async (fun () ->
      let open Lwt.Syntax in
      print_endline "start cleanup";
      (* simulate delay *)
      let* () = Lwt_unix.sleep 2.0 in
      print_endline "end cleanup";
      Lwt.return_unit)

let run () =
  print_endline "hello";
  cleanup () |> Lwt.return

let () = Lwt_main.run (run ())

And the output (note that "end cleanup" is never printed):

hello
start cleanup

So in the case of opentelemetry_client_cohttp_lwt, the let* () = emit_all_force httpc encoder doesn't have time to complete before the program terminates.
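For comparison, here is a variant of the toy example above in which cleanup returns its promise instead of wrapping it in Lwt.async, so the caller can wait for it. This is only a sketch of the idea, not the library's code: the `events` list and `record` helper are hypothetical additions used to observe the order of events, and the delay is shortened to 0.1 s. It assumes the lwt and lwt.unix libraries.

```ocaml
(* Requires the lwt and lwt.unix libraries. *)

(* Hypothetical helper: print each event and remember it for inspection. *)
let events : string list ref = ref []

let record s =
  print_endline s;
  events := s :: !events

(* Unlike the Lwt.async version, the promise is returned to the caller. *)
let cleanup () : unit Lwt.t =
  let open Lwt.Syntax in
  record "start cleanup";
  (* simulate delay *)
  let* () = Lwt_unix.sleep 0.1 in
  record "end cleanup";
  Lwt.return_unit

let run () : unit Lwt.t =
  record "hello";
  Lwt.return_unit

(* Lwt.finalize runs [cleanup] after [run] settles and waits for it,
   so Lwt_main.run only returns once cleanup has finished. *)
let main () : unit Lwt.t = Lwt.finalize run cleanup

let () = Lwt_main.run (main ())
```

With this shape, "end cleanup" is reached before the program exits, because the main promise does not resolve until the cleanup promise does.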

Should we make sure that all data is emitted before the program terminates (both after "normal" completion and after explicit interruption)?

I guess one way to fix this would be to make the cleanup function return a unit Lwt.t, so that callers can properly wait for its resolution. The problem is that cleanup is part of a common interface that the different backends implement, so I don't think it's possible to force an Lwt value there. Maybe we could make it parametric or something?
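To make the "parametric" idea concrete, here is a rough sketch of a backend interface that is abstract over its IO type, so that cleanup can return `unit IO.t`. Every name here (`IO`, `BACKEND`, `Direct_io`, `Thunk_io`, `With_setup`, `record`) is hypothetical and not taken from ocaml-opentelemetry. `Thunk_io` is only a stdlib stand-in for a promise type; an Lwt backend would presumably use `type 'a t = 'a Lwt.t` with `Lwt.return` and `Lwt.bind`.

```ocaml
(* Hypothetical sketch: none of these names come from ocaml-opentelemetry. *)

(* The minimal effect interface a backend is parametric over. *)
module type IO = sig
  type 'a t
  val return : 'a -> 'a t
  val bind : 'a t -> ('a -> 'b t) -> 'b t
end

(* A backend whose cleanup returns a value in its own IO type,
   so that the caller is able to wait for it. *)
module type BACKEND = sig
  module IO : IO
  val cleanup : unit -> unit IO.t
end

(* Blocking backends can use the identity monad. *)
module Direct_io = struct
  type 'a t = 'a
  let return x = x
  let bind x f = f x
end

(* Stdlib stand-in for a promise-based IO: nothing runs until [run]. *)
module Thunk_io = struct
  type 'a t = unit -> 'a
  let return x = fun () -> x
  let bind m f = fun () -> f (m ()) ()
  let run m = m ()
end

(* Records what happened, most recent event first. *)
let log : string list ref = ref []
let record s = log := s :: !log

module Direct_backend = struct
  module IO = Direct_io
  let cleanup () = record "direct: flushed"
end

module Thunk_backend = struct
  module IO = Thunk_io
  let cleanup () = fun () -> record "thunk: flushed"
end

(* Generic setup code: run the user function, then chain cleanup after it,
   all inside the backend's own IO type. *)
module With_setup (B : BACKEND) = struct
  let with_setup (f : unit -> 'a B.IO.t) : 'a B.IO.t =
    B.IO.bind (f ()) (fun x ->
        B.IO.bind (B.cleanup ()) (fun () -> B.IO.return x))
end
```

The trade-off is that the shared interface becomes a functor (or carries an extra type parameter), which is more invasive than the current `unit -> unit` cleanup, but each backend then decides for itself what "waiting for cleanup" means.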