mozilla / sccache

Sccache is a ccache-like tool. It is used as a compiler wrapper and avoids compilation when possible. Sccache has the capability to utilize caching in remote storage environments, including various cloud storage options, or alternatively, in local storage.
Apache License 2.0
5.85k stars 552 forks source link

More coarse grind caching strategy on sccache #1762

Open NobodyXu opened 1 year ago

NobodyXu commented 1 year ago

Currently, sccache caches each compilation result as a separate artifacts on GHA, some might just be a few KB. However GHA is clearly designed for storing much larger artifacts and is not designed to be used like s3 object pool.

As #1756 has pointed out, sccache GHA backend is quite slow and sometimes does not yield any benefit compared to no cache at all.

I wonder if it's possible to group multiple compilation results together and upload to GHA as one artifact?

For example, sccache now uses keys like sccache/d/d/e/dde57313a931b3fd3e52afe3e6ccafd2f79e52f7fdc60bbb0db8357bc397c320 for each object.

Can we instead store any artifact with prefix sccache/d/d/e as one artifacts and give it cache key like sccache2/d/d/e/2023-05-08?

luser commented 1 year ago

This would break a bunch of assumptions around how sccache treats cache storage internally. Notably, it would need to have some sort of intermediate cache where it stores/fetches the rolled-up cache entries. But more importantly, I don't think this is a good idea! If you want this, just configure GitHub Actions to cache a directory and use that as sccache's cache. That should be functionally equivalent to what you're proposing. I don't think that's going to be great in practice, however, since you'll wind up fetching and storing lots of useless cache data for every build.

I think a better approach would be to ask GitHub if it's possible to make the GHA cache work better with small objects.

NobodyXu commented 1 year ago

I think a better approach would be to ask GitHub if it's possible to make the GHA cache work better with small objects.

Yeah improving the GHA cache will be a win for every CI running on GHA.

Fraccaman commented 1 year ago

instead of using sccache with GHA directly, would it be possible to start sccache with a local storage and then save the $SCCACHE_DIR folder under a default key (like a static cache key) ?

NobodyXu commented 1 year ago

Yes, though that would make sharing cache between builds with slightly different Cargo.lock/Cargo.toml a bit harder and would be more duplicate.

That's why I switched back to using https://github.com/Swatinem/rust-cache on GHA since this is basically what it does and it could also cache build.rs and proc-macro crates.

sylvestre commented 1 year ago

@ Xuanwo is that opendal can do ? :)

luser commented 1 year ago

instead of using sccache with GHA directly, would it be possible to start sccache with a local storage and then save the $SCCACHE_DIR folder under a default key (like a static cache key) ?

This would work but feels very inefficient. sccache is intended to work with object storage so that it only has to fetch objects for cache hits. If you store and fetch the entire cache contents for every build you're transferring a lot of unnecessary data.

Xuanwo commented 1 year ago

@ Xuanwo is that opendal can do ? :)

It's more like a logic similar to sccache that is not related to storage. We can store all the caches on local storage and then use opendal's API to upload them to GHAC.

If you store and fetch the entire cache contents for every build you're transferring a lot of unnecessary data.

Yes, I share the same concern. I'm not sure if we can find a suitable batch strategy that allows users to store cache in batches. For example, enabling us to upload 10 objects together if they always updated at the same time.