mozilla / sccache

Sccache is a ccache-like tool. It is used as a compiler wrapper and avoids compilation when possible. Sccache has the capability to utilize caching in remote storage environments, including various cloud storage options, or alternatively, in local storage.
Apache License 2.0

Content-addressed cache? #964

Open nikclayton opened 3 years ago

nikclayton commented 3 years ago

One of the caveats in the documentation is:

Absolute paths to files must match to get a cache hit. This means that even if you are using a shared cache, everyone will have to build at the same absolute path (i.e. not in $HOME) in order to benefit each other.

This is a problem for single-developer workflows that use git worktree. With git worktree I can check out multiple branches of the same repository to distinct local directories. But because they're distinct, there's no cache benefit.

For example, I have a complex project with hundreds of dependencies. Just building from a clean checkout takes about 3 minutes.

So if I do:

% cd /my/repo/root
% git worktree add -b branch1 /my/worktree/branch1
% cd /my/worktree/branch1
% cargo build

This takes 3 minutes.

If I then do

% cd /my/repo/root
% git worktree add -b branch2 /my/worktree/branch2
% cd /my/worktree/branch2
% cargo build

this also takes 3 minutes, and it's re-building everything I just built for branch1.

I went poking through the code, and it looks like https://github.com/mozilla/sccache/blob/master/src/compiler/rust.rs#L1458-L1462 is where the file names (source_files) are included in the computation of the hash key.

But earlier in that function, the file contents are also included in the hash key computation. So I think the filename is redundant here.

I changed that line in a local build to ignore source_files:

let inputs = abs_externs.into_iter().chain(abs_staticlibs).collect();

and now the example I gave above takes 3 minutes for a build with a cold cache (the branch1 example), but the branch2 example now takes 1m32s, because it's able to reuse the cached build artifacts.

And all the sccache tests pass with this change.
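
To illustrate the idea, here is a minimal sketch assuming a much-simplified key function. It is not sccache's actual hashing code: `cache_key` and its parameters are hypothetical, and `DefaultHasher` stands in for whatever digest sccache really uses. It only shows that identical file contents at different absolute paths produce different keys when paths are hashed, and the same key when they are not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified cache key: hashes each source file's contents
/// and, optionally, its absolute path. Each entry is (path, contents).
fn cache_key(files: &[(&str, &str)], include_paths: bool) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (path, contents) in files {
        if include_paths {
            path.hash(&mut hasher);
        }
        contents.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    // Same source text checked out into two different worktrees.
    let branch1 = [("/my/worktree/branch1/src/lib.rs", "pub fn f() {}")];
    let branch2 = [("/my/worktree/branch2/src/lib.rs", "pub fn f() {}")];

    // With paths in the key, the second worktree misses the cache.
    assert_ne!(cache_key(&branch1, true), cache_key(&branch2, true));

    // With contents only, both worktrees map to the same key.
    assert_eq!(cache_key(&branch1, false), cache_key(&branch2, false));
}
```

The sketch says nothing about whether the cached output itself is path-independent, which is a separate question from whether the keys match.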

I imagine it can't be this simple to fix the "Absolute paths must match" restriction and use sccache as a semi-content-addressed cache, and there's some subtlety that I've missed.

Or is that all that's necessary?

I couldn't find anything else that discussed this. If this has already been covered in another forum and I missed it, there's no need to repeat the issues here; just link to them and I'll go and do some more reading...

luser commented 3 years ago

There's some discussion in #35. Mostly it comes down to whether the pathnames wind up in the output files. If debug info is enabled, then they generally do. Compiler options that allow path remapping could be used to mitigate this.
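
For Rust, one such option is rustc's `--remap-path-prefix=FROM=TO` flag. As a rough sketch of the effect path remapping has (the `remap_prefix` helper below is hypothetical and is not taken from rustc or sccache), rewriting each worktree's root to a common placeholder makes the recorded paths identical across checkouts:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: rewrite `path` so that a leading `from` prefix
/// becomes `to`. Paths that do not start with `from` are returned unchanged.
fn remap_prefix(path: &Path, from: &Path, to: &Path) -> PathBuf {
    match path.strip_prefix(from) {
        Ok(rest) => to.join(rest),
        Err(_) => path.to_path_buf(),
    }
}

fn main() {
    // Placeholder root chosen for illustration only.
    let to = Path::new("/build");

    let a = remap_prefix(
        Path::new("/my/worktree/branch1/src/lib.rs"),
        Path::new("/my/worktree/branch1"),
        to,
    );
    let b = remap_prefix(
        Path::new("/my/worktree/branch2/src/lib.rs"),
        Path::new("/my/worktree/branch2"),
        to,
    );

    // After remapping, both worktrees record the same path.
    assert_eq!(a, b);
    assert_eq!(a, PathBuf::from("/build/src/lib.rs"));
}
```

Whether sccache would then produce matching cache keys depends on how it hashes the compiler arguments, which this sketch does not cover.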