Measure statistical significance of Domain setup

jmid commented 2 years ago

There's one remaining use of cpu_relax, in spinning the first domain while it waits for the second domain to start up: https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/lib/lin.ml#L122-L124 Now that we have statistics in place, it would be natural to give this Domain setup a run-down to see which aspects actually influence the bug-finding ability, similar to what I did for Thread recently: https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/src/statistics/README.md?plain=1#L129-L143
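For readers who don't want to follow the link, here is a minimal sketch of the kind of spin-wait setup being discussed, assuming OCaml 5. The names `run_pair`, `f1` and `f2` are hypothetical and the linked `lin.ml` code may differ in its details.

```ocaml
(* Sketch only: [run_pair], [f1] and [f2] are hypothetical names,
   not part of the Lin library. *)
let run_pair f1 f2 =
  (* [wait] stays true until the second domain is up and running. *)
  let wait = Atomic.make true in
  let dom1 = Domain.spawn (fun () ->
    (* Busy-wait; cpu_relax hints to the CPU that we are spinning. *)
    while Atomic.get wait do Domain.cpu_relax () done;
    f1 ()) in
  let dom2 = Domain.spawn (fun () ->
    (* Release the first domain, then start working right away. *)
    Atomic.set wait false;
    f2 ()) in
  let r1 = Domain.join dom1 in
  let r2 = Domain.join dom2 in
  (r1, r2)
```

The idea is that both domains begin their real work at nearly the same moment, which should make overlapping (and hence racy) executions more likely.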

For Thread a wait loop had a significant effect. For Domain it would be nice to confirm this, and also to investigate whether there are better ways to accomplish it. In the tests for the work-stealing deque that has now been pulled out of domainslib, the spinning did not trigger issues at all on MacOSX, so I ended up going with a binary semaphore: https://github.com/jmid/multicoretests/blob/8a9a2327e06036f06ca5ef4b1321129ccff557d6/src/domainslib/ws_deque_test.ml#L131-L133 The simpler, the better. A combination of a Mutex and a Condition variable may also be sufficient.
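To make the two alternatives concrete, here is a rough sketch of both, assuming the OCaml 5 standard library (`Semaphore.Binary`, `Mutex`, `Condition`). The function names are hypothetical and this is not the code from `ws_deque_test.ml`.

```ocaml
(* Variant 1: a binary semaphore that starts out unavailable. *)
let run_pair_sem f1 f2 =
  let sem = Semaphore.Binary.make false in
  let dom1 = Domain.spawn (fun () ->
    (* Blocks until the second domain releases the semaphore. *)
    Semaphore.Binary.acquire sem;
    f1 ()) in
  let dom2 = Domain.spawn (fun () ->
    Semaphore.Binary.release sem;
    f2 ()) in
  let r1 = Domain.join dom1 in
  let r2 = Domain.join dom2 in
  (r1, r2)

(* Variant 2: a Mutex and a Condition guarding a plain boolean flag. *)
let run_pair_cond f1 f2 =
  let m = Mutex.create () in
  let c = Condition.create () in
  let started = ref false in
  let dom1 = Domain.spawn (fun () ->
    Mutex.lock m;
    (* Re-check the flag in a loop to guard against spurious wake-ups. *)
    while not !started do Condition.wait c m done;
    Mutex.unlock m;
    f1 ()) in
  let dom2 = Domain.spawn (fun () ->
    Mutex.lock m;
    started := true;
    Condition.signal c;
    Mutex.unlock m;
    f2 ()) in
  let r1 = Domain.join dom1 in
  let r2 = Domain.join dom2 in
  (r1, r2)
```

Both variants block the first domain instead of spinning, so they don't burn CPU while waiting, but the wake-up goes through the scheduler and may change how tightly the two domains overlap.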

Originally posted by @jmid in https://github.com/jmid/multicoretests/issues/43#issuecomment-1099991569

n-osborne commented 2 years ago

I've been trying to get some numbers comparing bug-triggering with cpu_relax and with a semaphore. I have some strange results that I don't understand yet (no buggy programs found in 10000 runs, while CI is happy with 1000...), but it seems that synchronization with a semaphore is a bit faster than with cpu_relax:

$ dune exec -- src/neg_tests/conclist_stm_tests.exe
random seed: 138767447
generated error  fail  pass / total     time test name
[✓] 10000     0     0 10000 / 10000   103.1s STM int64 CList with cpu_relax
[✓] 10000     0     0 10000 / 10000    78.5s STM int64 CList with semaphore
================================================================================
success (ran 2 tests)
relax : 0 / 10000
semap : 0 / 10000

Code is here: https://github.com/n-osborne/multicoretests/blob/domain-stats/src/neg_tests/conclist_stm_tests.ml#L53 and here: https://github.com/n-osborne/multicoretests/blob/domain-stats/lib/STM.ml#L391

jmid commented 2 years ago

It's indeed interesting that the Semaphore is faster than the "Atomic waiting loop" :+1: :thinking:

I had a quick look:

* When an exception is raised `mk_prop` does not increase the counter (I think it should)
* I also noticed that the stats tests are not using `repeat`. To be comparable to the CI's 1000 iterations I would try to use it here too.
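As an illustration of the first point, here is a minimal sketch of a property wrapper that counts a raised exception as a failure too. The helper `count_failures` and its `int ref` counter are hypothetical; the actual `mk_prop` on the branch is structured differently.

```ocaml
(* Hypothetical helper: wraps [prop] so that every failing run,
   including one that raises, bumps [counter]. *)
let count_failures counter prop input =
  match prop input with
  | true -> true
  | false -> incr counter; false
  | exception e ->
    (* Count the run as a failure, then let the exception propagate. *)
    incr counter;
    raise e
```

Without the `exception` case, runs that fail by raising would silently go uncounted, which could be one way to end up with a count of 0.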

n-osborne commented 2 years ago

> * When an exception is raised `mk_prop` does not increase the counter (I think it should)

Yes, it works better that way.

> * I also noticed that the stats tests are not using `repeat`. To be comparable to the CI's 1000 iterations I would try to use it here too.

That was just to have something a bit more accurate for speed.

So semaphores are indeed faster, but they spot far fewer buggy programs:

This is with `repeat 25 prop`.

$ dune exec -- src/neg_tests/conclist_stm_tests.exe
random seed: 300478220
generated error  fail  pass / total     time test name
[✓] 10000     0     0 10000 / 10000  3302.4s STM int64 CList with cpu_relax
[✓] 10000     0     0 10000 / 10000  1970.2s STM int64 CList with semaphore
================================================================================
success (ran 2 tests)
relax : 36868 / 10000
semap : 8 / 10000

jmid commented 2 years ago

Ah, that is indeed quite a difference! :open_mouth:

I'm surprised by the number 36868 though! Because of the way Util.repeat is implemented, it should stop early on the first failed property. I would thus expect it to increment the counter at most once across the 25 repetitions of each generated test, and hence reach at most 10000. :thinking:
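To spell out that reasoning, here is a small sketch of a short-circuiting `repeat` combined with a counting property. It assumes the counter is incremented inside the property on each failing run; the real `Util.repeat` and the stats code may differ, and `failures` and `always_fails` are hypothetical names.

```ocaml
(* Sketch of a short-circuiting repeat; the real Util.repeat may differ. *)
let rec repeat n prop input =
  if n < 0 then invalid_arg "repeat: negative repetition count";
  (* [&&] short-circuits, so we stop at the first failing run. *)
  n = 0 || (prop input && repeat (n - 1) prop input)

(* Hypothetical worst case: a property that fails on every run. *)
let failures = ref 0
let always_fails _ = incr failures; false

let () =
  for i = 1 to 10_000 do
    ignore (repeat 25 always_fails i)
  done;
  (* One increment per input, despite 25 allowed repetitions. *)
  Printf.printf "%d / 10000\n" !failures
```

Even in this worst case the counter ends at 10000, so a value like 36868 suggests the counter is being incremented somewhere that the early exit does not cover.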

Measure statistical significance of Domain setup #47