ocaml-multicore / multicoretests

PBT testsuite and libraries for testing multicore OCaml
https://ocaml-multicore.github.io/multicoretests/
BSD 2-Clause "Simplified" License

[ocaml5-issue] Assertion failure `s->running` during parallel `STM` or `Lin` tests #447

Closed by jmid 1 week ago

jmid commented 5 months ago

Merging #445 into main triggered an assertion failure and abort on Linux trunk during `STM Out_channel test parallel`: https://github.com/ocaml-multicore/multicoretests/actions/runs/8441854686/job/23121952174

random seed: 115742799
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test sequential (generating)
[✓] 1000    0    0 1000 / 1000     3.7s STM Out_channel test sequential

[02] file runtime/domain.c; line 326 ### Assertion failed: s->running
File "src/io/dune", line 40, characters 7-16:
40 |  (name stm_tests)
            ^^^^^^^^^
(cd _build/default/src/io && ./stm_tests.exe --verbose)
Command got signal ABRT.
[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test parallel
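
For readers unfamiliar with the parallel test mode, here is a rough sketch of the kind of workload involved: two domains operating concurrently on one shared `Out_channel`. It is only an illustration, not the actual `stm_tests` code and not a reproducer for the assertion failure. The function name `write_in_parallel` and the fixed write pattern are assumptions made for the example; the real test generates random command sequences.

```ocaml
(* Hypothetical sketch: two domains write to one shared Out_channel.
   This only illustrates the shape of a parallel test workload;
   it is NOT the multicoretests implementation and NOT a reproducer. *)
let write_in_parallel (n : int) : int64 =
  let path = Filename.temp_file "parallel_sketch" ".txt" in
  let oc = Out_channel.open_bin path in
  (* Each worker writes [n] copies of its own character. *)
  let worker c () =
    for _ = 1 to n do Out_channel.output_char oc c done in
  let d1 = Domain.spawn (worker 'a') in
  let d2 = Domain.spawn (worker 'b') in
  Domain.join d1;
  Domain.join d2;
  Out_channel.close oc;
  (* Read back the file size, then clean up the temporary file. *)
  let len = In_channel.with_open_bin path In_channel.length in
  Sys.remove path;
  len

let () =
  Printf.printf "bytes written: %Ld\n" (write_in_parallel 100)
```

The sketch requires OCaml 5 for `Domain`. The abort reported above comes from the runtime itself (`runtime/domain.c`), so user code of this shape is at most the trigger, not the location of the failing assertion.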

jmid commented 3 months ago

Saw this again in focused tests on #304, on Linux 5.3.0+trunk debug, this time during `STM Sys test parallel`: https://github.com/ocaml-multicore/multicoretests/actions/runs/9131253128/job/25110039250?pr=304

Starting 6-th run

random seed: 357814880
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Sys test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Sys test sequential (generating)
[✓] 1000    0    0 1000 / 1000     9.4s STM Sys test sequential

[ ]    0    0    0    0 / 2500     0.0s STM Sys test parallel
[02] file runtime/domain.c; line 325 ### Assertion failed: s->running
/usr/bin/bash: line 1: 1943510 Aborted                 (core dumped) ./focusedtest.exe -v
[ ]  559    0    0  559 / 2500    50.7s STM Sys test parallel

jmid commented 3 weeks ago

I just observed this locally on Linux running 5.2.0, while trying a run with an extreme `space_overhead` setting (`o=20`) and the debug runtime to see if it would reveal anything:

multicoretests$ OCAMLRUNPARAM="s=4096,o=20,v=0,V=1" dune build "@ci" -j1 --no-buffer --display=quiet --cache=disabled --error-reporting=twice --profile=debug-runtime src/
[...]
random seed: 446171203
generated error fail pass / total     time test name
[ ]    1    0    0    1 / 1000     0.5s Lin In_channel test with Domain (shrinking:   11.0003)[01] file runtime/domain.c; line 336 ### Assertion failed: s->running
File "src/io/dune", line 21, characters 7-23:
21 |  (name lin_tests_domain)
            ^^^^^^^^^^^^^^^^
Command got signal ABRT.
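
As a side note on the `o=20` setting: the `o` entry in `OCAMLRUNPARAM` corresponds to the `space_overhead` field of `Gc.control`, and a smaller value makes the major GC work more aggressively. A minimal sketch of applying the same setting from inside a program, assuming one only wants it for a limited scope; the helper name `with_space_overhead` is hypothetical and not part of multicoretests:

```ocaml
(* Hypothetical helper: run [f] with a temporary space_overhead,
   restoring the previous GC settings afterwards. This mirrors the
   `o=20` part of OCAMLRUNPARAM, but applied at run time. *)
let with_space_overhead (o : int) (f : unit -> 'a) : 'a =
  let saved = Gc.get () in
  Gc.set { saved with Gc.space_overhead = o };
  Fun.protect ~finally:(fun () -> Gc.set saved) f

let () =
  let observed =
    with_space_overhead 20 (fun () -> (Gc.get ()).Gc.space_overhead) in
  Printf.printf "space_overhead during run: %d\n" observed
```

Unlike the environment variable, this only takes effect once the program reaches the `Gc.set` call, so it does not cover runtime start-up. Selecting the debug runtime still has to be done at build or link time, as in the `--profile=debug-runtime` invocation above.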
jmid commented 1 week ago

Closing, as this has been fixed upstream and added in #475.