ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org

Assertion failure in `shared_heap.c` in linux-O0 CI run #13090

Open kayceesrk opened 1 month ago

kayceesrk commented 1 month ago

There is an assertion failure, unrelated to the changes in #13079, in that PR's linux-O0 CI run while testing the bytecode version of mctest.ml. See:

https://github.com/ocaml/ocaml/actions/runs/8623935626/job/23638056562?pr=13079#step:7:77

> [00] file runtime/shared_heap.c; line 1398 ### Assertion failed: local->stats.pool_live_words == pool_stats.live

Copying the relevant snippet here as the CI logs will be culled eventually:

Process 12822 got signal 6(Aborted), core dumped
Could not find core file.
 ... testing 'mctest.ml' => failed
 ... testing 'mctest.ml' with line 3 (hasunix) => passed
 ... testing 'mctest.ml' with line 5 (bytecode) => failed (Running program /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/mctest.byte without any argument: command
/home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/mctest.byte 
failed with exit code -6)
 ... testing 'mctest.ml' with line 7 (native) => passed
> Specified modules: mctest.ml
> Source modules: mctest.ml
> Running test hasunix with 1 actions
> 
> Running action 1/1 (hasunix)
> Action 1/1 (hasunix) => passed (unix library available)
> Running test bytecode with 9 actions
> 
> Running action 1/9 (setup-ocamlc.byte-build-env)
> Action 1/9 (setup-ocamlc.byte-build-env) => passed
> 
> Running action 2/9 (ocamlc.byte)
> Compiling program /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/mctest.byte from modules  mctest.ml
> Commandline: /home/runner/work/ocaml/ocaml/runtime/ocamlrund /home/runner/work/ocaml/ocaml/ocamlc -use-runtime /home/runner/work/ocaml/ocaml/runtime/ocamlrund -I /home/runner/work/ocaml/ocaml/runtime -nostdlib -I /home/runner/work/ocaml/ocaml/stdlib -I /home/runner/work/ocaml/ocaml/otherlibs/unix unix.cma -o /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/mctest.byte mctest.ml
>   Redirecting stdout to /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/ocamlc.byte.output 
>   Redirecting stderr to /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/ocamlc.byte.output 
> Action 2/9 (ocamlc.byte) => passed
> 
> Running action 3/9 (check-ocamlc.byte-output)
> Comparing compiler output /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/ocamlc.byte.output to reference /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/mctest.compilers.reference
> Action 3/9 (check-ocamlc.byte-output) => passed
> 
> Running action 4/9 (run)
> Commandline: /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/mctest.byte
>   Redirecting stdout to /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/mctest.byte.output 
>   Redirecting stderr to /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/mctest.byte.output 
> ### begin stdout ###
> [00] file runtime/shared_heap.c; line 1398 ### Assertion failed: local->stats.pool_live_words == pool_stats.live
> ### end stdout ###
> Action 4/9 (run) => failed (Running program /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/mctest.byte without any argument: command
> /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlc.byte/mctest.byte 
> failed with exit code -6)
> Running test native with 8 actions
> 
> Running action 1/8 (setup-ocamlopt.byte-build-env)
> Action 1/8 (setup-ocamlopt.byte-build-env) => passed
> 
> Running action 2/8 (ocamlopt.byte)
> Compiling program /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.byte/mctest.opt from modules  mctest.ml
> Commandline: /home/runner/work/ocaml/ocaml/runtime/ocamlrund /home/runner/work/ocaml/ocaml/ocamlopt -runtime-variant d -I /home/runner/work/ocaml/ocaml/runtime -nostdlib -I /home/runner/work/ocaml/ocaml/stdlib -I /home/runner/work/ocaml/ocaml/otherlibs/unix unix.cmxa -o /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.byte/mctest.opt mctest.ml
>   Redirecting stdout to /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.byte/ocamlopt.byte.output 
>   Redirecting stderr to /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.byte/ocamlopt.byte.output 
> Action 2/8 (ocamlopt.byte) => passed
> 
> Running action 3/8 (check-ocamlopt.byte-output)
> Comparing compiler output /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.byte/ocamlopt.byte.output to reference /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/mctest.compilers.reference
> Action 3/8 (check-ocamlopt.byte-output) => passed
> 
> Running action 4/8 (run)
> Commandline: /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.byte/mctest.opt
>   Redirecting stdout to /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.byte/mctest.opt.output 
>   Redirecting stderr to /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.byte/mctest.opt.output 
> ### begin stdout ###
> done 100000000
> done 100000000
> done 100000000
> done 100000000
> ### end stdout ###
> Action 4/8 (run) => passed
> 
> Running action 5/8 (check-program-output)
> Comparing program output /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.byte/mctest.opt.output to reference /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/mctest.reference
> Action 5/8 (check-program-output) => passed
> 
> Running action 6/8 (setup-ocamlopt.opt-build-env)
> Action 6/8 (setup-ocamlopt.opt-build-env) => passed
> 
> Running action 7/8 (ocamlopt.opt)
> Compiling program /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.opt/mctest.opt from modules  mctest.ml
> Commandline: /home/runner/work/ocaml/ocaml/ocamlopt.opt -runtime-variant d -I /home/runner/work/ocaml/ocaml/runtime -nostdlib -I /home/runner/work/ocaml/ocaml/stdlib -I /home/runner/work/ocaml/ocaml/otherlibs/unix unix.cmxa -o /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.opt/mctest.opt mctest.ml
>   Redirecting stdout to /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.opt/ocamlopt.opt.output 
>   Redirecting stderr to /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.opt/ocamlopt.opt.output 
> Action 7/8 (ocamlopt.opt) => passed
> 
> Running action 8/8 (check-ocamlopt.opt-output)
> Comparing compiler output /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/_ocamltest/tests/parallel/mctest/ocamlopt.opt/ocamlopt.opt.output to reference /home/runner/work/ocaml/ocaml/testsuite/tests/parallel/mctest.compilers.reference
> Action 8/8 (check-ocamlopt.opt-output) => passed

gasche commented 1 month ago

Test source: https://github.com/ocaml/ocaml/blob/4c6a3849022ba19c23fb1860095f65eb09da157c/testsuite/tests/parallel/mctest.ml

I wondered if this could be related to an orphaning issue. Interestingly, this benchmark spawns several domains, but it never joins any of them. (We had a discussion recently on whether that was an okay thing to do.) It looks like the worker domains never terminate either.
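
For readers unfamiliar with this kind of check: the failed assertion compares `local->stats.pool_live_words` with `pool_stats.live`, which by their names look like an incrementally maintained counter and a freshly computed total. The toy model below sketches how two such values can drift apart when pools change owner without their statistics following. Every name in it (`toy_heap`, `toy_adopt`, and so on) is hypothetical and none of it is taken from `runtime/shared_heap.c`; whether anything like this happens in the runtime is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and names are hypothetical and are NOT
   the definitions used in runtime/shared_heap.c. */
#define MAX_POOLS 8

struct toy_pool { size_t live_words; };

struct toy_heap {
  struct toy_pool pools[MAX_POOLS];
  size_t n_pools;
  size_t cached_live_words;  /* running counter, updated incrementally */
};

/* Add a pool and keep the running counter in sync with it. */
void toy_add_pool(struct toy_heap *h, size_t live_words) {
  assert(h->n_pools < MAX_POOLS);
  h->pools[h->n_pools++].live_words = live_words;
  h->cached_live_words += live_words;
}

/* Recompute the total from scratch by walking every pool. */
size_t toy_recount(const struct toy_heap *h) {
  size_t total = 0;
  for (size_t i = 0; i < h->n_pools; i++)
    total += h->pools[i].live_words;
  return total;
}

/* Same shape as the failed check: cached counter == recomputed total. */
bool toy_stats_consistent(const struct toy_heap *h) {
  return h->cached_live_words == toy_recount(h);
}

/* Move all pools from an orphaned heap into dst. With move_stats set
   to false the pools move but the counter does not, which models one
   assumed way the two values could diverge. */
void toy_adopt(struct toy_heap *dst, struct toy_heap *orphan,
               bool move_stats) {
  for (size_t i = 0; i < orphan->n_pools; i++) {
    assert(dst->n_pools < MAX_POOLS);
    dst->pools[dst->n_pools++] = orphan->pools[i];
  }
  if (move_stats)
    dst->cached_live_words += orphan->cached_live_words;
  orphan->n_pools = 0;
  orphan->cached_live_words = 0;
}
```

In this model, adopting an orphan with `move_stats` set to `true` keeps `toy_stats_consistent` true, while adopting with `false` makes it return false until the counter is repaired. That is only an illustration of the class of bug an orphaning problem would belong to, not a diagnosis.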

kayceesrk commented 1 month ago

I'm unsure what the source of this failure is. I've pinged the usual suspects at Tarides, and none of them have seen this assertion failure before. The first step would be to reproduce the failure.