ocaml-multicore / multicoretests

PBT testsuite and libraries for testing multicore OCaml
https://ocaml-multicore.github.io/multicoretests/

[ocaml5-issue] Segfault or hang on macOS in STM Gc stress test parallel #480

Open jmid opened 1 month ago

jmid commented 1 month ago

The upcoming Gc test in #469 has surfaced a macOS segfault while running STM Gc stress test parallel.

This first happened on 5.3.0+trunk: https://github.com/ocaml-multicore/multicoretests/actions/runs/11010731348/job/30573309071

random seed: 464277843
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential (generating)
[✗]    9    0    1    8 / 1000     0.0s STM Gc test sequential

[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential in child domain
[✗]    9    0    1    8 / 1000     0.0s STM Gc test sequential in child domain

[ ]    0    0    0    0 / 1000     0.0s STM Gc test parallel
[✓]    2    0    1    1 / 1000     8.9s STM Gc test parallel

File "src/gc/dune", line 4, characters 7-16:
4 |  (name stm_tests)
           ^^^^^^^^^
(cd _build/default/src/gc && ./stm_tests.exe --verbose)
Command got signal SEGV.
[ ]    0    0    0    0 / 1000     0.0s STM Gc stress test parallel

and then on 5.2.0: https://github.com/ocaml-multicore/multicoretests/actions/runs/11037096487/job/30657143459?pr=469

random seed: 377633884
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential (generating)
[✓] 1000    0    0 1000 / 1000     0.2s STM Gc test sequential

[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential in child domain
[✓] 1000    0    0 1000 / 1000     3.3s STM Gc test sequential in child domain

[ ]    0    0    0    0 / 1000     0.0s STM Gc test parallel
[✓]    1    0    1    0 / 1000    25.9s STM Gc test parallel

[ ]    0    0    0    0 / 1000     0.0s STM Gc stress test parallel
File "src/gc/dune", line 13, characters 7-16:
13 |  (name stm_tests)
            ^^^^^^^^^
(cd _build/default/src/gc && ./stm_tests.exe --verbose)
Command got signal SEGV.

jmid commented 1 week ago

The latest CI run also triggered a Gc test crash on 5.2.0 under macOS with an Intel/amd64 CPU: https://github.com/ocaml-multicore/multicoretests/actions/runs/11479668195/job/31946444432

random seed: 302956495
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential (generating)
[✓] 1000    0    0 1000 / 1000     0.4s STM Gc test sequential

[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential in child domain
[✓] 1000    0    0 1000 / 1000     0.9s STM Gc test sequential in child domain

[ ]    0    0    0    0 / 1000     0.0s STM Gc test parallel
[✓]   21    0    1   20 / 1000     7.4s STM Gc test parallel

File "src/gc/dune", line 13, characters 7-16:
13 |  (name stm_tests)
            ^^^^^^^^^
(cd _build/default/src/gc && ./stm_tests.exe --verbose)
Command got signal BUS.
[ ]    0    0    0    0 / 1000     0.0s STM Gc stress test parallel

jmid commented 1 week ago

I realized that this issue may also cause hangs / infinite loops. Here's a fresh case on macOS-ARM64 testing trunk, where the 18th repetition has taken ~1 hour, whereas each of the previous 17 repetitions completed in under 1 min: https://github.com/ocaml-multicore/multicoretests/actions/runs/11555386874/job/32160677095

Starting 18-th run
Page size: 16384

random seed: 253047476
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential (generating)
[✓] 1000    0    0 1000 / 1000     0.2s STM Gc test sequential

[ ]    0    0    0    0 / 1000     0.0s STM Gc test sequential in child domain
[✓] 1000    0    0 1000 / 1000     0.5s STM Gc test sequential in child domain

[ ]    0    0    0    0 / 1000     0.0s STM Gc test parallel
[✓]    2    0    1    1 / 1000     8.9s STM Gc test parallel

[hang]
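
Since the failure shows up as SEGV, BUS, or a hang depending on the run, a repetition loop with a per-run timeout can help tell the three outcomes apart when reproducing locally. The following is only a minimal sketch under assumptions: it is not the repetition harness used in the CI runs above, and the names `classify_run` and `repeat_runs`, as well as any timeout value, are hypothetical.

```python
import signal
import subprocess


def classify_run(cmd, timeout_s):
    """Run cmd once; return 'ok', 'exit:<n>', 'signal:<NAME>', or 'hang'."""
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,            # child is killed if it exceeds this
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    rc = proc.returncode
    if rc == 0:
        return "ok"
    if rc < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        try:
            return "signal:" + signal.Signals(-rc).name
        except ValueError:
            return "signal:%d" % -rc
    return "exit:%d" % rc


def repeat_runs(cmd, repetitions, timeout_s):
    """Repeat cmd up to `repetitions` times, stopping at the first non-ok run."""
    outcomes = []
    for _ in range(repetitions):
        outcome = classify_run(cmd, timeout_s)
        outcomes.append(outcome)
        if outcome != "ok":
            break
    return outcomes
```

For example, calling `repeat_runs(["./stm_tests.exe", "--verbose"], 20, 600)` from `_build/default/src/gc` would repeat the test binary shown in the logs; the repetition count and the 600-second limit are arbitrary choices for illustration. A crash like the ones reported above would then show up as `signal:SIGSEGV` or `signal:SIGBUS`, and a run exceeding the limit as `hang`.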