xavierleroy / camlidl

Stub code generator for OCaml/C interface
Other
34 stars 8 forks source link

Initialize local roots with `Val_unit` to fix segfault #29

Closed sim642 closed 2 months ago

sim642 commented 2 months ago

Long story (below) short: local GC roots (in this case in an array) should be initialized with Val_unit instead of 0 like the CAMLlocal* macros do it. In the case described below, while in the middle of passing multiple out arguments, GC may be triggered, which could see improperly initialized parts of the output tuple and segfault.

Currently this PR is a draft because I've made the fix in only one place in camlidl code generation that mattered for the specific case (where it fixes the segfault). If my understanding of the OCaml runtime is correct and such fix is indeed needed, I can try to extend it to other parts of the code generation. This PR fixes all such places in camlidl I managed to find.

Long story

In Goblint I have been fighting with spurious segfaults (https://github.com/goblint/analyzer/issues/1520#issuecomment-2194283334) for a while. I finally managed to get a backtrace from gdb:

#0  caml_darken (v=0, p=0x7fffffffb728) at major_gc.c:285
#1  0x00005555577da51d in caml_do_local_roots_nat (f=f@entry=0x5555577dc930 <caml_darken>, bottom_of_stack=<optimized out>, last_retaddr=<optimized out>, gc_regs=<optimized out>, local_roots=<optimized out>)
    at roots_nat.c:514
#2  0x00005555577da6e5 in caml_do_roots (f=0x5555577dc930 <caml_darken>, do_globals=<optimized out>) at roots_nat.c:432
#3  0x00005555577da732 in caml_darken_all_roots_start () at roots_nat.c:357
#4  0x00005555577dd250 in start_cycle () at major_gc.c:407
#5  caml_major_collection_slice (howmuch=howmuch@entry=-1) at major_gc.c:1089
#6  0x00005555577de741 in caml_gc_dispatch () at minor_gc.c:500
#7  0x00005555577de851 in caml_check_urgent_gc (extra_root=<optimized out>, extra_root@entry=1) at minor_gc.c:575
#8  0x00005555577de8e4 in caml_alloc_small_dispatch (wosize=2, flags=flags@entry=1, nallocs=nallocs@entry=1, encoded_alloc_lens=encoded_alloc_lens@entry=0x0) at minor_gc.c:524
#9  0x00005555577e00f9 in caml_alloc_small (wosize=2, tag=tag@entry=255) at alloc.c:68
#10 0x00005555577f4645 in alloc_custom_gen (ops=0x5555587e7c00 <camlidl_apron_custom_var_ptr>, bsz=<optimized out>, mem=0, max_major=1, mem_minor=0, max_minor=1) at custom.c:50
#11 0x000055555777d5d4 in camlidl_environment_ap_environment_vars ()
#12 0x0000555556dd5a0b in camlGoblint_lib__ApronDomain__join_2689 () at src/cdomains/apron/apronDomain.apron.ml:568
[...]

It looks like something to do with Apron, although that could be any heap corruption that just happens to crash from allocations from Apron bindings.

The top of this stack corresponds to this "Local C roots" code in the runtime: https://github.com/ocaml/ocaml/blob/8eb41f72ded84df884c3671734c947f612091f84/runtime/roots_nat.c#L509-L517. After prodding around with gdb for a while, I realized that the problematic pointer (0x7fffffffb728) corresponded exactly to the second item in a two-element top-level local C root.

This comes from the following camlidl-generated code (with SEGFAULT location marked):

value camlidl_environment_ap_environment_vars(
    value _v_e)
{
  ap_environment_ptr e; /*in*/
  ap_var_t *name_of_intdim; /*out*/
  ap_var_t *name_of_realdim; /*out*/
  struct camlidl_ctx_struct _ctxs = { CAMLIDL_TRANSIENT, NULL };
  camlidl_ctx _ctx = &_ctxs;
  mlsize_t _c1;
  value _v2;
  mlsize_t _c3;
  value _v4;
  value _vresult;
  value _vres[2] = { 0, 0, };

  camlidl_ml2c_environment_ap_environment_ptr(_v_e, &e, _ctx);
  name_of_intdim = camlidl_malloc((*e).intdim * sizeof(ap_var_t ), _ctx);
  name_of_realdim = camlidl_malloc((*e).realdim * sizeof(ap_var_t ), _ctx);
  /* begin user-supplied calling sequence */

{
size_t i;
for(i=0;i<e->intdim;i++) name_of_intdim[i] = ap_var_operations->copy(e->var_of_dim[i]);
 for(i=0;i<e->realdim;i++) name_of_realdim[i] = ap_var_operations->copy(e->var_of_dim[e->intdim+i]);
 }
  /* end user-supplied calling sequence */
  Begin_roots_block(_vres, 2)
    _vres[0] = camlidl_alloc((*e).intdim, 0);
    Begin_root(_vres[0])
      for (_c1 = 0; _c1 < (*e).intdim; _c1++) {
        _v2 = camlidl_c2ml_var_ap_var_t(&name_of_intdim[_c1], _ctx); // SEGFAULT location
        caml_modify(&Field(_vres[0], _c1), _v2);
      }
    End_roots()
    _vres[1] = camlidl_alloc((*e).realdim, 0);
    Begin_root(_vres[1])
      for (_c3 = 0; _c3 < (*e).realdim; _c3++) {
        _v4 = camlidl_c2ml_var_ap_var_t(&name_of_realdim[_c3], _ctx);
        caml_modify(&Field(_vres[1], _c3), _v4);
      }
    End_roots()
    _vresult = camlidl_alloc_small(2, 0);
    Field(_vresult, 0) = _vres[0];
    Field(_vresult, 1) = _vres[1];
  End_roots()
  camlidl_free(_ctx);
  return _vresult;
}

At that point, some inlined functions allocate on the OCaml heap and trigger GC, which scans the top-level local roots in _vres. The first one is fine, because a proper array was already allocated for it. The second one is the problem: it still has value 0 from its initialization. caml_darken checks that this is a block and tries to dereference stuff around the NULL pointer.

If this were Val_unit, like CAMLlocal* would initialize it (before adding a local root), everything would be fine.