Long story (below) short: local GC roots (in this case in an array) should be initialized with Val_unit instead of 0 like the CAMLlocal* macros do it.
In the case described below, while in the middle of passing multiple out arguments, GC may be triggered, which could see improperly initialized parts of the output tuple and segfault.
Currently this PR is a draft because I've made the fix in only one place in camlidl code generation that mattered for the specific case (where it fixes the segfault). If my understanding of the OCaml runtime is correct and such fix is indeed needed, I can try to extend it to other parts of the code generation.
This PR fixes all such places in camlidl I managed to find.
#0 caml_darken (v=0, p=0x7fffffffb728) at major_gc.c:285
#1 0x00005555577da51d in caml_do_local_roots_nat (f=f@entry=0x5555577dc930 <caml_darken>, bottom_of_stack=<optimized out>, last_retaddr=<optimized out>, gc_regs=<optimized out>, local_roots=<optimized out>)
at roots_nat.c:514
#2 0x00005555577da6e5 in caml_do_roots (f=0x5555577dc930 <caml_darken>, do_globals=<optimized out>) at roots_nat.c:432
#3 0x00005555577da732 in caml_darken_all_roots_start () at roots_nat.c:357
#4 0x00005555577dd250 in start_cycle () at major_gc.c:407
#5 caml_major_collection_slice (howmuch=howmuch@entry=-1) at major_gc.c:1089
#6 0x00005555577de741 in caml_gc_dispatch () at minor_gc.c:500
#7 0x00005555577de851 in caml_check_urgent_gc (extra_root=<optimized out>, extra_root@entry=1) at minor_gc.c:575
#8 0x00005555577de8e4 in caml_alloc_small_dispatch (wosize=2, flags=flags@entry=1, nallocs=nallocs@entry=1, encoded_alloc_lens=encoded_alloc_lens@entry=0x0) at minor_gc.c:524
#9 0x00005555577e00f9 in caml_alloc_small (wosize=2, tag=tag@entry=255) at alloc.c:68
#10 0x00005555577f4645 in alloc_custom_gen (ops=0x5555587e7c00 <camlidl_apron_custom_var_ptr>, bsz=<optimized out>, mem=0, max_major=1, mem_minor=0, max_minor=1) at custom.c:50
#11 0x000055555777d5d4 in camlidl_environment_ap_environment_vars ()
#12 0x0000555556dd5a0b in camlGoblint_lib__ApronDomain__join_2689 () at src/cdomains/apron/apronDomain.apron.ml:568
[...]
It looks like something to do with Apron, although that could be any heap corruption that just happens to crash from allocations from Apron bindings.
At that point, some inlined functions allocate on the OCaml heap and trigger GC, which scans the top-level local roots in _vres. The first one is fine, because a proper array was already allocated for it. The second one is the problem: it still has value 0 from its initialization. caml_darken checks that this is a block and tries to dereference stuff around the NULL pointer.
If this were Val_unit, like CAMLlocal* would initialize it (before adding a local root), everything would be fine.
Long story (below) short: local GC roots (in this case in an array) should be initialized with
Val_unit
instead of0
like theCAMLlocal*
macros do it. In the case described below, while in the middle of passing multiple out arguments, GC may be triggered, which could see improperly initialized parts of the output tuple and segfault.Currently this PR is a draft because I've made the fix in only one place in camlidl code generation that mattered for the specific case (where it fixes the segfault). If my understanding of the OCaml runtime is correct and such fix is indeed needed, I can try to extend it to other parts of the code generation.This PR fixes all such places in camlidl I managed to find.Long story
In Goblint I have been fighting with spurious segfaults (https://github.com/goblint/analyzer/issues/1520#issuecomment-2194283334) for a while. I finally managed to get a backtrace from gdb:
It looks like something to do with Apron, although that could be any heap corruption that just happens to crash from allocations from Apron bindings.
The top of this stack corresponds to this "Local C roots" code in the runtime: https://github.com/ocaml/ocaml/blob/8eb41f72ded84df884c3671734c947f612091f84/runtime/roots_nat.c#L509-L517. After prodding around with gdb for a while, I realized that the problematic pointer (
0x7fffffffb728
) corresponded exactly to the second item in a two-element top-level local C root.This comes from the following camlidl-generated code (with
SEGFAULT location
marked):At that point, some inlined functions allocate on the OCaml heap and trigger GC, which scans the top-level local roots in
_vres
. The first one is fine, because a proper array was already allocated for it. The second one is the problem: it still has value 0 from its initialization.caml_darken
checks that this is a block and tries to dereference stuff around the NULL pointer.If this were
Val_unit
, likeCAMLlocal*
would initialize it (before adding a local root), everything would be fine.