Open padenot opened 4 years ago
Casually looking at `htop` on various machines that are part of the cluster while a build is running using `sccache`, we see a dramatic under-utilization of resources, and the `sccache` process pinning a core (and a bit more, sometimes two), which seems to be the bottleneck.
Can you capture a `perf` trace or somesuch? Are the client machines just not executing jobs to the fullest extent of their capacity, or are the jobs that are executing on them spending way too much time in `sccache` itself?
Without a single machine giving Rust priority, we might be suffering from some of the known bad parallelism/prioritization of Rust in the build.
I'll do some more tests later in the week, but I'll also note it's been ~years since I was able to do a full build in under 10 minutes, so I wonder if there's another factor here I'm not aware of.
I was able to observe under-utilization of servers in the SF cluster. We don't seem to be able to get jobs to them fast enough. I haven't pinpointed the bottleneck exactly, but the local daemon is doing a lot of work and spending a lot of time preprocessing -- https://github.com/mozilla/sccache/pull/545 should help with that. Other than that we're spending a lot of time compressing and hashing, both of which we can probably make faster.
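The compression and hashing costs are easy to reproduce in miniature. Below is a rough sketch (in Python, not sccache's actual Rust code) of the kind of per-object work the local daemon does: hash the inputs for a cache key, compress the output before shipping it. The blob contents and sizes are made up for illustration.

```python
import hashlib
import time
import zlib

# Stand-in for a compiler output: ~8 MB of moderately compressible bytes.
blob = (b"some object file section padding " * 1024) * 256

start = time.perf_counter()
digest = hashlib.sha256(blob).hexdigest()  # cache-key style hashing
hash_s = time.perf_counter() - start

start = time.perf_counter()
compressed = zlib.compress(blob, 6)  # on-the-wire style compression
compress_s = time.perf_counter() - start

print(f"{len(blob)} bytes: hash {hash_s * 1e3:.1f} ms, "
      f"compress {compress_s * 1e3:.1f} ms, "
      f"ratio {len(compressed) / len(blob):.3f}")
```

Even a few milliseconds of this per object adds up when a single daemon process funnels thousands of compile jobs, which is consistent with the one pinned core observed above.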
Limiting rust to the local machine is slower, but only by a bit. We should reduce the overhead of distribution as much as possible, but we should probably have an option to allow the local machine to take some jobs as well.
It'd be a fairly big chunk of work, but https://github.com/mozilla/sccache/issues/558 ought to improve things in terms of time spent compressing files.
I experimented with this some today and I found that, unlike `icecc`, running `sccache-dist server` and `sccache --start-server` on the machine driving the compile was basically terrible: jobs almost never got distributed.

Running `sccache --start-server` on the machine driving the compile and forcing all the jobs to go to a single remote machine performed very well. (I didn't try with two remote machines because sadly my kernel is too old on the other machine; maybe I'll try upgrading it.) Load on the local machine from sccache was about 3 or 4 CPUs' worth, which seems shockingly high. The lone remote was not quite saturated, though, so there's probably some room for improvement.
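For reference, the setup that worked well here corresponds to a client config along these lines (the shape follows sccache's distributed-compilation quickstart; the scheduler URL and token are placeholders):

```toml
# ~/.config/sccache/config on the machine driving the compile
[dist]
# Point all jobs at the remote scheduler; with no local sccache-dist
# server registered, everything goes to the remote machine(s).
scheduler_url = "http://remote-scheduler:10600"
toolchains = []

[dist.auth]
type = "token"
token = "DIST_TOKEN_PLACEHOLDER"
```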
We've set up `sccache` for building mozilla-unified in the Paris office on a few machines where we had `icecc` set up. It seems to work, but it's not optimal. We're at around 8:30 minutes for a clobber build, 3:30 when the cache is perfectly primed (a `mach build && mach clobber && mach build` type situation, but in practice this never happens; the cache hits quite rarely because the cache key is so complex). Builds of around 6 minutes are typical. For reference, a plain build on this machine (without `icecc`/`sccache`) is 7:30.

We have clobber build times for a debug non-opt build of around 4:30-5:00 with `icecc` (we have 4 machines with i9-7940X or better, plus other machines), with 100 to 150 parallel jobs (roughly the number of threads in this cluster). This is without any caching whatsoever, and is the upper bound of any build (since it does not depend on any cache hit rate, etc.).

All the caches are local on NVMe drives (roughly 3 GB/s read/write). We've tried sharing the cache using a redis instance; this makes the build roughly twice as slow.
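For anyone reproducing the redis comparison: `SCCACHE_REDIS` is the documented way to point sccache at a shared redis cache (the endpoint below is a placeholder for the shared instance):

```shell
# Select the redis cache backend instead of the local disk cache.
# Endpoint is a placeholder; sccache reads this at server startup.
export SCCACHE_REDIS="redis://cache-host:6379/0"
```

The running daemon won't pick this up, so follow it with `sccache --stop-server` and `sccache --start-server`.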