ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org

Fix some parallel tests that assume fairness. #13128

Closed: damiendoligez closed this 2 weeks ago

damiendoligez commented 2 weeks ago

INRIA CI is failing for Mac M1 because of tests/parallel/domain_parallel_spawn_burn_gc_set.ml. I'll explain why by looking at a very similar test that is more convenient to study because it takes less time to run.

Timings for tests/parallel/domain_parallel_spawn_burn.ml:

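| hardware | OS | 5.2 | trunk | #12579 |
|----------|--------------|-------|--------|-------|
| Mac x64 | macos 14 | 13.08 | 13.77 | 4.80 |
| PC x64 | Ubuntu 22.04 | 13.97 | 13.40 | 6.15 |
| Mac M1 | macos 14 | 60.26 | 52.90 | 4.33 |
| Mac M2 | macos 14 | 39.99 | 40.83 | 4.12 |
| RasPi 5 | Debian 12 | 75.85 | 210.01 | 16.87 |
| RasPi 4B | Debian 12 | 43.33 | 57.84 | 25.15 |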

(trunk is at commit eabbb4002a0dabdbcf708f52a1db1d7a11618119)

As you can see, #12579 is a big improvement for this test. The improvement comes from commit 2992d123f7e76252355bb299de670a82911cfcdf which implements some fairness in the runtime.

The problem is that the test assumes fairness in the scheduler, which is not guaranteed by the language. The amount of work done by the test varies according to the decisions made by the scheduler, which explains the wide variations in timings.

To show this, I modified the program to print out the number of times it has requested minor and major GCs.

Minor loops / major loops:

| hardware | OS | 5.2 | trunk | #12579 |
|----------|-----------------|-----------------|------------------|-------------|
| Mac x86 | macos 14 | 356966 / 172352 | 258478 / 125616 | 1938 / 726 |
| PC x86 | Ubuntu 22.04 | 171050 / 81554 | 163590 / 97013 | 3590 / 1704 |
| Mac M1 | macos 14 | 456924 / 289306 | 621432 / 390233 | 1644 / 672 |
| Mac M2 | macos 14 | 769771 / 297987 | 1179670 / 520992 | 1903 / 716 |
| RasPi 5 | Debian 12 64bit | 370472 / 241345 | 124102 / 83500 | 2084 / 980 |
| RasPi 4B | Debian 12 64bit | 113287 / 72835 | 227548 / 145320 | 2022 / 858 |

Note that these numbers are subject to large random variations.

The test is composed of two parts: one does a bounded amount of work (allocations and spawning/joining threads), and the other does an unbounded amount of work (allocations and calling Gc.set) and stops only when the first one has finished.
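To make that structure concrete, here is a scaled-down sketch of the pattern. It is not the code of the actual test: the names `allocate`, `bounded_part`, `unbounded_part` and `run`, and all the sizes, are made up for illustration, and the unbounded loop calls `Gc.minor` where the `gc_set` variant would call `Gc.set`. The loop counter plays the role of the instrumentation used for the numbers above.

```ocaml
(* Hypothetical, scaled-down sketch; not the code of the actual test. *)

(* Allocation-heavy helper: builds and discards a list of [n] elements. *)
let allocate n = ignore (Sys.opaque_identity (List.init n (fun i -> i)))

(* Bounded part: a fixed number of rounds of spawning and joining domains. *)
let bounded_part ~rounds ~domains =
  for _round = 1 to rounds do
    Array.init domains (fun _ -> Domain.spawn (fun () -> allocate 1_000))
    |> Array.iter Domain.join
  done

(* Unbounded part: loops until [running] is cleared and returns the number
   of iterations it completed. *)
let unbounded_part running =
  let loops = ref 0 in
  while Atomic.get running do
    allocate 1_000;
    Gc.minor ();
    incr loops
  done;
  !loops

(* Run both parts; the unbounded one stops only after the bounded one ends. *)
let run () =
  let running = Atomic.make true in
  let gc_domain = Domain.spawn (fun () -> unbounded_part running) in
  bounded_part ~rounds:5 ~domains:4;
  Atomic.set running false;
  Domain.join gc_domain

let () = Printf.printf "GC loops: %d\n" (run ())
```

The printed count is not reproducible: it depends on how much the GC loop gets to run before the bounded part finishes, which is exactly the scheduler-dependent quantity discussed above.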

To solve this problem I propose to run both parts with no bounds, and stop everything after 3 seconds. This way we don't waste our time waiting for tests to run for a random number of minutes.
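A minimal sketch of that time-bounded shape follows. It is an illustration under assumptions, not the code in this PR: `allocate`, `loop_until_stopped` and `run_for` are hypothetical names, the clock is passed in as a parameter, and the demo uses `Sys.time` (processor time) with a short duration only so that the sketch needs nothing beyond the standard library. A real test would more likely use a wall clock and sleep in the main domain rather than spin.

```ocaml
(* Hypothetical sketch of the time-bounded structure; not the PR's code. *)

(* Allocation-heavy helper: builds and discards a list of [n] elements. *)
let allocate n = ignore (Sys.opaque_identity (List.init n (fun i -> i)))

(* Repeat [work] until [running] is cleared; return the iteration count. *)
let loop_until_stopped running work =
  let loops = ref 0 in
  while Atomic.get running do
    work ();
    incr loops
  done;
  !loops

(* Run both parts without bounds and stop them once [seconds] have elapsed
   according to the clock [now]. *)
let run_for ~now ~seconds =
  let running = Atomic.make true in
  let spawner =
    Domain.spawn (fun () ->
      loop_until_stopped running (fun () ->
        Domain.join (Domain.spawn (fun () -> allocate 1_000))))
  in
  let collector =
    Domain.spawn (fun () ->
      loop_until_stopped running (fun () -> allocate 1_000; Gc.minor ()))
  in
  let deadline = now () +. seconds in
  (* Spin until the deadline; assumption: a real test would sleep instead. *)
  while now () < deadline do
    Domain.cpu_relax ()
  done;
  Atomic.set running false;
  let spawn_loops = Domain.join spawner in
  let gc_loops = Domain.join collector in
  (spawn_loops, gc_loops)

let () =
  let spawn_loops, gc_loops = run_for ~now:Sys.time ~seconds:0.2 in
  Printf.printf "spawn loops: %d, GC loops: %d\n" spawn_loops gc_loops
```

With this shape the amount of work still varies with scheduling, but the running time no longer does: neither loop waits for the other to make progress, so unfair scheduling only changes the counts.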

The following tests all have the same problem and are fixed by this PR:

gasche commented 2 weeks ago

(Of course your review of #12579 is also very nice and possibly even more impactful, thanks!)