ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org
Other
5.51k stars 1.11k forks source link

Domain.join loses backtrace #12362

Open talex5 opened 1 year ago

talex5 commented 1 year ago
let foo () =
  if true then failwith "Error"

let () =
  Printexc.record_backtrace true;
  let d = Domain.spawn (fun () ->
    foo ();
    assert false
  ) in
  Domain.join d

The above code prints:

$ ocaml test.ml
Exception: Failure "Error".
Raised at Stdlib__Domain.join in file "domain.ml", line 258, characters 16-24
Called from <unknown> in file "./test.ml", line 10, characters 2-15
Called from Topeval.load_lambda in file "toplevel/byte/topeval.ml", line 89, characters 4-14

It would be good to include the backtrace from the new domain. Eio has code to do this (https://github.com/ocaml-multicore/eio/pull/571), but @avsm suggested reporting it here too, at least as a documentation issue.

I also noticed that Printexc.record_backtrace true doesn't seem to apply to new domains (but OCAMLRUNPARAM=b does). Is this intentional?

Also, ocaml, ocamlc, and ocamlopt all give quite different backtraces, which seems surprising:

$ cat test.ml
let run fn =
  Domain.join (Domain.spawn (fun () ->
      Printexc.record_backtrace true;
      fn ()
    ))

let run_with_bt f =
  let d = Domain.spawn (fun () ->
      Printexc.record_backtrace true;
      match f () with
      | x -> Ok x
      | exception ex ->
        let bt = Printexc.get_raw_backtrace () in
        Error (ex, bt)
    ) in
  match Domain.join d with
  | Ok x -> x
  | Error (ex, bt) -> Printexc.raise_with_backtrace ex bt

let foo () =
  if Sys.opaque_identity true then failwith "Error"

let () =
  Printexc.record_backtrace true;
  run_with_bt (fun () ->
      foo ();
      assert false
    )

$ ocamlopt -o test.exe test.ml && ./test.exe
Fatal error: exception Failure("Error")
Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33
Called from Test.foo in file "test.ml", line 21, characters 35-51
Called from Test.run_with_bt.(fun) in file "test.ml", line 10, characters 12-16

$ ocamlc -o test.exe test.ml && ./test.exe
Fatal error: exception Failure("Error")
Called from unknown location
Called from unknown location
Called from unknown location

$ ocaml test.ml
Exception: Failure "Error".
Raised at Stdlib.failwith in file "stdlib.ml", line 29, characters 17-33
Called from (fun) in file "./test.ml", line 26, characters 6-12
Called from run_with_bt.(fun) in file "./test.ml", line 10, characters 12-16
Re-raised at run_with_bt in file "./test.ml", line 18, characters 22-57
Called from <unknown> in file "./test.ml", line 25, characters 2-63
Called from Topeval.load_lambda in file "toplevel/byte/topeval.ml", line 89, characters 4-14

$ ocaml --version
The OCaml toplevel, version 5.1.0~beta1
gasche commented 1 year ago

I also noticed that Printexc.record_backtrace true doesn't seem to apply to new domains (but OCAMLRUNPARAM=b does). Is this intentional?

Thanks for the report! I created a separate issue to track this: #12363.

github-actions[bot] commented 4 months ago

This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.

OlivierNicole commented 4 months ago

I can still reproduce the behaviour signaled in this issue.

gasche commented 4 months ago

Propagating the backtrace from the spawnee to the spawner makes sense; there is a small performance cost, but it is fine if we think of domain creation/joins are rare events. I would implement this by tweaking the Domain.Raw.state type, changing Finished of ('a, exn) result into Finished of ('a, exn * Printexc.raw_backtrace option). This value is built from the C side in runtime/domain.c, so changing this (and reraising appropriately on Domain.join) needs a bit of C-level manipulation of OCaml values, that is never pleasant. (See also the conversation in https://github.com/ocaml/ocaml/pull/12409#discussion_r1271486676 )