Closed jmid closed 1 year ago
On the positive side, this roughly halves the Windows CI time.
On the negative side there was still a "failed to allocate domain"
error...
In principle we could just adjust the tested property to make this acceptable behaviour.
I believe the 5.0 x86_32-bit ocaml-ci error is due to using (modes native)
- which will not be supported.
Since dune
should choose the "best mode available" simply deleting these lines should be sufficient to avoid the error.
However I'm wondering whether the native
lines were added to disable the tests on non-native platforms? :thinking:
I believe the 5.0 x86_32-bit ocaml-ci error is due to using (modes native) - which will not be supported.
I now see 5.0-ppc64 and 5.0-s390x are going to fail for the same reason. Perhaps the proper fix would be to disable the tests (not the package) on bytecode-only architectures?
Thanks for taking a look at this @jmid!
However I'm wondering whether the native lines were added to disable the tests on non-native platforms?
I think native
modes were added for performance reasons (@kayceesrk can correct me if I'm wrong) - it should be okay to let dune decide what to run. Would that fix the failures on bytecode-only platforms?
I think native modes were added for performance reasons (@kayceesrk can correct me if I'm wrong) - it should be okay to let dune decide what to run. Would that fix the failures on bytecode-only platforms?
I don't think it is useful to run performance-sensitive workloads in bytecode mode. And, domainslib is for performance. I can't imagine bytecode runs for anything using domainslib being useful.
That said, if you just want compatibility, you can remove to (modes native)
.
Would that fix the failures on bytecode-only platforms?
It would at least fix the Build failed
errors due to Error: No rule found for test/off_by_one.exe
such as the ones in https://ocaml.ci.dev/github/ocaml-multicore/domainslib/commit/480029b010decd3e913cfcb3c9c2669327a4b9b3/variant/debian-11-5.0_s390x_opam-2.1
Now, with the s390x native backend just merged yesterday, that will save us from having to roll a new Domainslib release to support this reestablished native platform when it get released with 5.1 or 5.2.
It will not fix the Failed to allocate domain
error that we are still seeing.
I suspect this might be caused by dune runtest
running parallel tests in parallel :thinking:
There is something odd to me with all tests except the QCheck ones disabled in bytecode mode. Disabling all the tests (not the package) on bytecode-only architectures strikes me as more consistent and future proof. It should be possible to do so in the opam-file.
It will not fix the Failed to allocate domain error that we are still seeing. I suspect this might be caused by dune runtest running parallel tests in parallel
Hm, possibly. What you mean is running parallel tests in parallel ends up spawning more than the limit - wondering if this could happen in native runs too?
Disabling all the tests (not the package) on bytecode-only architectures strikes me as more consistent and future proof.
I'm not sure why we should support architectures where we know the testsuite doesn't go through. Is it for the sake of completeness or am I missing something?
If there is an easy way to disable 32-bit builds, ppc64, s390x in the ocaml-ci, we should do it. s390x will land in 5.1. ppc64 sometime in the future.
If there is an easy way to disable 32-bit builds, ppc64, s390x in the ocaml-ci, we should do it.
Great, I think fewer restrictions is key to enable broader adoption :+1:
I've made 3 changes:
Failure "failed to allocate domain"
from Domain.spawn
which helps on 32-bit architectures where fewer domains are available(modes native)
annotations from the other tests as it causes ocaml-ci to signal test failuresLet's see how this fares on ocaml-ci... :popcorn:
CI summary (while I wait for the ppc64 run to complete):
test/backtrace.ml
is failing on arm32, ppc64, s390x, x86_32
test/off_by_one.ml
is failing with Failure "failed to allocate domain"
on arm32, x86_32
test/LU_decomposition_multicore.ml
is failing with with a debug runtime assertion failure on arm32:
### OCaml runtime: debug mode ###
[00] file runtime/fiber.c; line 182 ### Assertion failed: Stack_high(stack) - Stack_base(stack) == wosize || Stack_high(stack) - Stack_base(stack) == wosize + 1
The backtrace
failure seems related to using bytecode. Here's some info from the option
-returning backtrace_slots
: https://v2.ocaml.org/releases/5.0/api/Printexc.html#1_Manipulationofbacktraceinformation I would have expected the test to be compiled with debug info -g
by default under dune :thinking:
The off_by_one
failure is natural on 32-bit platforms with Max_domains = 16
. It should be easy to accept this behaviour.
The decomposition
failure sounds more serious...
The ppc64 run ended up failing with a time out, so 0637fb1 also disables the two long running PBTs on ppc64 (64 bit bytecode for the moment).
07ca1ff adjusts test/off_by_one
to accept a Failure "failed to allocate domain"
.
This leaves the backtrace
option failure and the decomposition
assertion failure.
After confirming locally I've now opened an issue for the LU_decomposition assertion failure on arm32 here: https://github.com/ocaml/ocaml/issues/12267
I would welcome input on the backtrace
(bytecode) failure, where it seems Printexc.get_raw_backtrace
returns None
.
We don't know yet if the backtrace
test failure is a compiler bug. We have an issue for the LU decomposition failure. Anyways, I think we can disable backtrace
(and open an issue if needed) and LU_decompositon
tests on bytecode-only platforms and merge this PR.
I've now dug a bit further and it seems to be a dune issue on bytecode only switches: https://github.com/ocaml/dune/issues/7845 I've also disabled the backtrace test on bytecode platforms in b10075a.
Let's see how this fares on the CI :popcorn:
The CI is happy finally! Thanks for the work, Jan. Merging this.
I noticed the CI run of #110 had 32-bit
Failure("failed to allocate domain")
in test/task_one_dep.mlI believe this is because the test does not take into account the available cores on the target machine. This PR therefore adjusts the test to do so.
I hope this is sufficient to please the (32-bit) CI so that we can avoid disabling certain platforms like #111 proposes :crossed_fingers: