ocurrent / solver-service

An OCluster service for solving opam dependencies
Apache License 2.0

Add --main-does-work hack #76

talex5 closed 1 month ago

talex5 commented 1 month ago

OCaml currently requires all domains to synchronise for minor GCs. Idle domains must first be woken by the OS, which is slow. As a work-around, this PR adds an option to also run jobs in the main domain, so that it doesn't become idle. This makes the service less responsive (e.g. at reporting progress messages) but increases throughput.

See https://roscidus.com/blog/blog/2024/07/22/performance-2/
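To illustrate the idea, here is a minimal sketch of a job pool in which the main domain can optionally take jobs itself instead of sitting idle while the workers run. It uses only the OCaml 5 standard library and is not the solver-service code: `run_jobs` and its labelled arguments are hypothetical names chosen to mirror the `--internal-workers` and `--main-does-work` options, and blocking in `Domain.join` stands in for the main domain waiting idle.

```ocaml
(* Hypothetical sketch, not the solver-service implementation.
   Requires OCaml 5 (Domain and Atomic from the standard library). *)

(* Run [f] on every element of [jobs] using [internal_workers] extra domains.
   If [main_does_work] is true, the calling (main) domain also takes jobs
   rather than blocking until the workers finish. *)
let run_jobs ~internal_workers ~main_does_work ~jobs f =
  if internal_workers = 0 && not main_does_work then
    invalid_arg "run_jobs: nothing would run the jobs";
  let n = Array.length jobs in
  let results = Array.make n None in
  (* Index of the next unclaimed job, shared between all domains. *)
  let next = Atomic.make 0 in
  let rec worker () =
    let i = Atomic.fetch_and_add next 1 in
    if i < n then begin
      (* Each index is claimed by exactly one domain, so writes don't overlap. *)
      results.(i) <- Some (f jobs.(i));
      worker ()
    end
  in
  let domains = List.init internal_workers (fun _ -> Domain.spawn worker) in
  (* With the flag set, the main domain stays busy instead of going idle. *)
  if main_does_work then worker ();
  (* Without the flag, the main domain just blocks here until the workers finish. *)
  List.iter Domain.join domains;
  Array.map Option.get results

let () =
  let squares =
    run_jobs ~internal_workers:2 ~main_does_work:true
      ~jobs:(Array.init 10 Fun.id) (fun x -> x * x)
  in
  Array.iter (Printf.printf "%d ") squares;
  print_newline ()
```

The trade-off from the description shows up here too: while the main domain is inside `worker ()`, it cannot do anything else, such as reporting progress, until its current job finishes.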

/cc @mtelvers

talex5 commented 1 month ago

On my 8-core machine, this is only a modest improvement:

[image]

This was generated by running, e.g.:

dune exec -- solver-service run-service \
    --cache-dir=./cache \
    --capnp-secret-key-file=server.pem \
    --capnp-listen-address=tcp:127.0.0.1:7000 \
    --cap-file=./capnp-secrets/solver.cap \
    --internal-workers=0 --main-does-work

dune exec -- ./stress/stress.exe service ./capnp-secrets/solver.cap --count=3

Oddly, the case of 0 extra domains (doing everything in the main domain) is consistently slightly slower, which I did not expect at all!

talex5 commented 1 month ago

This doesn't seem very useful, and having domain 0 respond slowly to requests from other domains is problematic. I've opened an issue to fix the root cause in OCaml instead: https://github.com/ocaml/ocaml/issues/13358

But, while looking at traces from this, I noticed that the main domain spawning git subprocesses was slowing everything down. #79 fixes that and gives a much larger speed-up!