Open OlivierNicole opened 4 months ago
I removed the dependency to Domainslib to use Stdlib mutexes. Since @jmid mentioned in a discussion that he wasn’t fond of the user having to wrap the test runner in a call to Util.Domain_pair.run (fun pair -> ...)
, I also tried to reuse domains only for intra-test repetitions, as he suggested. I measured the following run times (s):
src/array/lin_tests.ml |
src/array/stm_tests.ml |
|
---|---|---|
main (c65989d) | 4.796 s ± 2.306 | 2.640 s ± 5.203 |
Intra-test reuse of domains | 2.543 s ± 0.751 (speedup 1.89) | 2.272 s ± 1.925 (speedup 1.16) |
Maximal reuse of domains | 716.9 ms ± 725.0 (speedup 6.67) | 1.270 s ± 1.004 (speedup 2.08) |
@jmid also argued that spawning a pair of Domains for each pair of parallel command sequences allows to test them for a clean state. Although it’s true that it means an initially empty minor heap, I can’t think of tests where an initial empty minor heap is a better test context.
(With some advice and help from @fabbing to switch to mutexes!)
Thanks Olivier!
Quick observation: why does this cause so many Failure("failed to allocate domain")
errors?
Overall, this should allocate less Domain
s - not more. Are there conditions under which the spawned Domain
s are not joined? (that could explain why several tests are running out of them)
Strange, it should indeed spawn strictly less domains.
Overall, this should allocate less
Domain
s - not more. Are there conditions under which the spawnedDomain
s are not joined?
Normally, no, the Util.Domain_pair.run
function joins the domains before returning. I’ll try to take a look the next time I have some time on my hands.
Thanks again for this! :pray:
Well spotted, thanks!
There seems to be a lot of failures still, I’ll have a look at it on my spare time at some point…
I think I figured this one out too: the hint was STM
tests with non-trivial precond
triggering failed to allocate domain
.
Preconditions are checked before Domains running, and if the precondition fails, it is signaled to QCheck ... with an exception, thereby causing takedown
not to run :shrug:
With that in mind, it makes sense to run the precondition check first, before attempting pool init.
The opam install workflows will still fail, due to util.ml
now requiring OCaml 5.
@jmid also argued that spawning a pair of Domains for each pair of parallel command sequences allows to test them for a clean state. Although it’s true that it means an initially empty minor heap, I can’t think of tests where an initial empty minor heap is a better test context.
There are a few, e.g., tests of Weak
and Ephemeron
where the state of the heap may affect the behaviour of a test run. Another example where Domain
-reuse may bite is in tests of Domain.DLS
, where the state is attached to the Domain
itself.
I think we misunderstood eachother however: Generally, I'm fond of self-contained properties, as they are central to reproduce errors and have shrinking work well. Starting to depend on the state of reused Domains, challenges this, if the state of the underlying pool may start to influence things. From this POV it is preferable to decouple the testing of one triple from the next.
For end-users, and assuming the runtime is bug-free, Domain reuse may be less of a worry. However, as we are foremost concerned with testing and helping ensure the correctness of the OCaml 5 runtime, I'm thinking of whether Domain-reuse should be optional... :thinking:
I see, it makes sense to me too, then, to make domain re-use an option.
Given that domains are costly to spawn and join, and parallel tests in multicoretests do it many thousands of times, I wanted to look at it more closely. A quick perf profile on
src/array/lin_tests.ml
shows that more than 50 % of the time is spent inDomain.spawn
. I wanted to try spawning domains only at the start of the test sequence and reusing them. Then I realised that this is exactly what Domainslib provides with domain pools!This proof of concept PR simply replaces
Domain.spawn
withDomainslib.Task.async
andDomain.join
withDomainslib.Task.await
. The approach implies to setup a domain pool and pass it to the tests. I have editedsrc/array/{lin,stm}_tests.ml
to reflect this.The results are rather encouraging:
src/array/lin_tests.ml
is cut down from about 4.5 s to 0.3 s! (With extremely far off statistical outliers due to bad luck in interleaving search.) However, this is a favourable case as the test’s commands are very cheap to run. The same comparison on my extensive Dynarray STM parallel test gives 24 s vs 12 s for the version with domain reuse, so merely a 2x speedup.If depending on Domainslib is an issue, the mechanism can be reimplemented using atomics, mutexes and condition variables easily enough. (A reimplementation would likely squeeze some more performance out by not requiring a work-stealing queue.)
It looks like about 20 % of the time is now taken by
Sys.cpu_relax
, so further improvement may be possible still.