mozilla / sccache

Sccache is a ccache-like tool. It is used as a compiler wrapper and avoids compilation when possible. Sccache has the capability to utilize caching in remote storage environments, including various cloud storage options, or alternatively, in local storage.
Apache License 2.0

`sccache` underutilizes available resources #582

Open padenot opened 4 years ago

padenot commented 4 years ago

We've set up sccache for building mozilla-unified in the Paris office on a few machines where we had icecc set up. It seems to work, but it's not optimal. We're at around 8:30 minutes for a clobber build, and 3:30 when the cache is perfectly primed (a `mach build && mach clobber && mach build` type of situation). In practice this never happens: the cache hits quite rarely because the cache key is so complex. Builds of around 6 minutes are typical. For reference, a plain build on this machine (without icecc / sccache) is 7:30.

We get clobber build times of around 4:30-5:00 for a debug non-opt build with icecc (we have 4 machines with i9-7940X or better, plus other machines), with 100 to 150 parallel jobs (roughly the number of threads in this cluster). This is without any caching whatsoever, and is the upper bound for any build (since it does not depend on any cache hit rate, etc.).

Casually looking at htop on various machines that are part of the cluster while a build is running using sccache, we see a dramatic underutilization of resources, and the sccache process pinning a core (and a bit more, sometimes two), which seems to be the bottleneck.

All the caches are local on NVMe drives (roughly 3 GB/s read/write). We've tried sharing the cache using a Redis instance, but this makes the build roughly twice as slow.
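The timings quoted above are easier to compare when expressed relative to the plain build. The sketch below does only that arithmetic; the numbers are the approximate figures from this comment, and the labels and helper names are made up for illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' wall-clock string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Approximate timings as reported in the comment above.
timings = {
    "plain build (no icecc/sccache)": "7:30",
    "sccache, clobber": "8:30",
    "sccache, typical": "6:00",
    "sccache, perfectly primed cache": "3:30",
    "icecc, clobber (low end)": "4:30",
    "icecc, clobber (high end)": "5:00",
}

baseline = to_seconds(timings["plain build (no icecc/sccache)"])


def relative_to_baseline(label: str) -> float:
    """Build time as a fraction of the plain (undistributed, uncached) build."""
    return to_seconds(timings[label]) / baseline


for label in timings:
    print(f"{label:35s} {timings[label]:>5s}  {relative_to_baseline(label):.2f}x")
```

Read this way, the sccache clobber build is about 1.13x the plain build, while icecc lands at roughly 0.60x to 0.67x with no cache at all.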

froydnj commented 4 years ago

> Casually looking at htop on various machines that are part of the cluster while a build is running using sccache, we see a dramatic underutilization of resources, and the sccache process pinning a core (and a bit more, sometimes two), which seems to be the bottleneck.

Can you capture a perf trace or somesuch? Are the client machines just not executing jobs to the fullest extent of their capacity, or are the jobs executing on them spending way too much time in sccache itself?

chmanchester commented 4 years ago

Without a single machine giving Rust priority, we might be suffering from some of the known bad parallelism/prioritization of Rust in the build.

I'll do some more tests later in the week, but I'll note also it's been ~years since I was able to do a full build in under 10 minutes, so I wonder if there's another factor here I'm not aware of.

chmanchester commented 4 years ago

I was able to observe under-utilization of servers in the SF cluster. We don't seem to be able to get jobs to them fast enough. I haven't pinpointed the bottleneck exactly, but the local daemon is doing a lot of work and spending a lot of time preprocessing -- https://github.com/mozilla/sccache/pull/545 should help with that. Other than that we're spending a lot of time compressing and hashing, both of which we can probably make faster.
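One rough way to see how hashing and compression costs compare on a given machine is a micro-benchmark like the sketch below. It is only an illustration: SHA-256 and zlib are stand-ins chosen because they ship with Python, not a statement about which algorithms sccache uses, and the payload is synthetic rather than real preprocessor output.

```python
import hashlib
import os
import time
import zlib


def time_call(fn, data: bytes) -> float:
    """Return the wall-clock seconds taken by a single call to fn(data)."""
    start = time.perf_counter()
    fn(data)
    return time.perf_counter() - start


def measure(payload: bytes) -> dict:
    """Time one hash pass and one compression pass over the same payload."""
    return {
        # Stand-in hash (assumption, not necessarily what sccache uses).
        "hash_s": time_call(lambda d: hashlib.sha256(d).digest(), payload),
        # Stand-in compressor at zlib level 6 (also an assumption).
        "compress_s": time_call(lambda d: zlib.compress(d, 6), payload),
    }


# Synthetic payload: repetitive source-like text plus incompressible bytes.
payload = (b"int main(void) { return 0; }\n" * 100_000) + os.urandom(1_000_000)

result = measure(payload)
print(f"payload: {len(payload) / 1e6:.1f} MB")
print(f"hash:     {result['hash_s'] * 1000:.1f} ms")
print(f"compress: {result['compress_s'] * 1000:.1f} ms")
```

A single run like this is noisy; repeating it and using real preprocessed translation units would give a more trustworthy picture, and a perf trace of the daemon remains the better tool for locating the actual bottleneck.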

Limiting Rust to the local machine is slower, but only by a bit. We should reduce the overhead of distribution as much as possible, but we should probably have an option to allow the local machine to take some jobs as well.
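To get a feel for why letting the local machine take some jobs could matter, here is a toy model. It is not sccache's scheduler: `makespan`, its parameters, and the fixed per-job `dispatch_overhead` are all hypothetical, and real jobs vary in length and overhead. Each job is greedily given to whichever slot would finish it earliest, with remote slots paying the overhead and local slots paying none.

```python
def makespan(job_seconds, remote_slots, local_slots=0, dispatch_overhead=0.0):
    """Total wall-clock time for a greedy assignment of jobs to slots.

    Remote slots pay `dispatch_overhead` seconds per job (a stand-in for
    preprocessing, compression, and transfer); local slots pay none.
    """
    overheads = [dispatch_overhead] * remote_slots + [0.0] * local_slots
    if not overheads:
        raise ValueError("need at least one slot")
    free_at = [0.0] * len(overheads)  # time at which each slot is next idle
    for duration in job_seconds:
        # Pick the slot on which this job would complete soonest.
        best = min(
            range(len(free_at)),
            key=lambda i: free_at[i] + overheads[i] + duration,
        )
        free_at[best] += overheads[best] + duration
    return max(free_at)


jobs = [1.0] * 8  # eight identical one-second compile jobs (made-up workload)
print(makespan(jobs, remote_slots=2, dispatch_overhead=0.5))
print(makespan(jobs, remote_slots=2, local_slots=2, dispatch_overhead=0.5))
```

With these made-up numbers the remote-only run takes 6.0 seconds and adding two local slots brings it to 3.0. The model ignores the CPU the local daemon itself spends dispatching jobs, so it overstates the benefit when the local machine is already the bottleneck.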

luser commented 4 years ago

It'd be a fairly big chunk of work but https://github.com/mozilla/sccache/issues/558 ought to improve things in terms of spending time compressing files.

froydnj commented 4 years ago

I experimented with this some today and I found that, unlike icecc, running both `sccache-dist server` and `sccache --start-server` on the machine driving the compile was basically terrible: jobs almost never got distributed.

Running `sccache --start-server` on the machine driving the compile and forcing all the jobs to go to a single remote machine performed very well. (I didn't try with two remote machines because sadly my kernel is too old on the other machine; maybe I'll try upgrading it.) Load on the local machine from sccache was about 3 or 4 CPUs' worth, which seems shockingly high. The lone remote was not quite saturated, though, so there's probably some room for improvement.