Closed huonw closed 2 years ago
Hey, yep, CI caches are not very easy to use effectively for the reasons you state. Cascading hashes are the way to squeeze something out of it, but it still doesn't solve the problem of the underlying directories (the cache values) growing without bound, until the save and restore become intolerably slow. And manually computing cache keys like that is a mess.
Really the best way to work with Pants is (as your # FIXME implies) a Pants-aware remote cache, which caches at a very fine-grained resolution (every process that runs is cached), and on the same hashes Pants already generates for every process for local caching.
Shameless plug: We at Toolchain (https://toolchain.com/), the lead sponsors of Pants, offers such a cache as a SaaS product. Happy to set you up with a trial!
Feel free to reach out to me on Pants Slack if you want to chat about this option.
Should this be closed as answered?
This tripped me up and had our system doing a bunch of pointless cache downloading and surprisingly slow builds that gave a poor early impression of pants (and took me a while to diagnose).
So, from my perspective, it'd be good to do at least one of:
key: pants-${{ hashFiles("**") }}
with restore-key: pants-
key
line specifically mentioning that the key is unlikely to result in useful behaviour and will need changing(I'd love 1, especially the 'easy' part 😄 I'm in discussion with @benjyw on Slack, thanks for the tip.)
Thanks for the update. It's definitely a good idea to have the demo repos demonstrate remote caching, since that is a strength of Pants, and we should not demonstrate obviously bad CI cache practices...
Let's leave this issue open until we resolve that.
Assigning this to @chrisjrn who is working on it.
See https://github.com/pantsbuild/example-python/pull/104 for improved comments and example.
Re-reading your initial comment @huonw , I missed the part whert the cache is never updated at all if the key doesn't change. But reading the GHA cache docs confirms this, and it makes sense that the cache should be immutable.
Addressed in #104 and https://github.com/pantsbuild/pants/pull/16503
Thanks for opening this issue!
Looks great, thanks for the explanation, that definitely clarifies things for me!
We've been setting up our pants system and cribbing off this repo, so thanks for publishing it. However, we noticed that our CI was quickly getting much slower and discovered it was due to the use of a very simple cache key:
${{ runner.os }}-
.It seems like if there's an exact match for a key, GitHub Actions will restore the cache, and never update it even if the cached files changed during the run. The "Post Run" step for
actions/cache
saysCache hit occurred on the primary key Linux-, not saving cache.
. This means that the first build with the cache will seed it, and all future builds will reuse that old one, and that cache will get less useful slowly (or not so slowly, in our case 😄 ) and builds longer.We're experimenting with an alternative that incorporates progressively more detailed hashes:
(This will result in... a lot of cache usage for a reasonably large project, but will be better than caches that are barely useful.)