pantsbuild / example-python

An example repo to demonstrate Python support in Pants
Apache License 2.0
102 stars 77 forks source link

CI's cache key results in a single snapshot being created in first build and never updated #102

Closed huonw closed 2 years ago

huonw commented 2 years ago

We've been setting up our pants system and cribbing off this repo, so thanks for publishing it. However, we noticed that our CI was quickly getting much slower and discovered it was due to the use of a very simple cache key: ${{ runner.os }}-.

It seems like if there's an exact match for a key, GitHub Actions will restore the cache, and never update it even if the cached files changed during the run. The "Post Run" step for actions/cache says Cache hit occurred on the primary key Linux-, not saving cache.. This means that the first build with the cache will seed it, and all future builds will reuse that old one, and that cache will get less useful slowly (or not so slowly, in our case 😄 ) and builds longer.

We're experimenting with an alternative that incorporates progressively more detailed hashes:

      # FIXME: use a remote cache
      - uses: actions/cache@v3
        with:
          path: |
            ~/.cache/pants/setup
            ~/.cache/pants/lmdb_store
            ~/.cache/pants/named_caches
          key: >-
            pants-${{ hashFiles('pants.toml') }}-${{ hashFiles('3rdparty/**') }}-${{ hashFiles('**/BUILD') }}-${{ hashFiles('source/**') }}
          # if we don't have an exact match, search for a similar one (the hashes are vaguely
          # ordered from slowest to fastest changing, i.e. longer prefix match ≈ more sharing)
          restore-keys: |
            pants-${{ hashFiles('pants.toml') }}-${{ hashFiles('3rdparty/**') }}-${{ hashFiles('**/BUILD') }}-
            pants-${{ hashFiles('pants.toml') }}-${{ hashFiles('3rdparty/**') }}-
            pants-${{ hashFiles('pants.toml') }}-
            pants-

(This will result in... a lot of cache usage for a reasonably large project, but will be better than caches that are barely useful.)

benjyw commented 2 years ago

Hey, yep, CI caches are not very easy to use effectively for the reasons you state. Cascading hashes are the way to squeeze something out of it, but it still doesn't solve the problem of the underlying directories (the cache values) growing without bound, until the save and restore become intolerably slow. And manually computing cache keys like that is a mess.

Really the best way to work with Pants is (as your # FIXME implies) a Pants-aware remote cache, which caches at a very fine-grained resolution (every process that runs is cached), and on the same hashes Pants already generates for every process for local caching.

Shameless plug: We at Toolchain (https://toolchain.com/), the lead sponsors of Pants, offers such a cache as a SaaS product. Happy to set you up with a trial!

benjyw commented 2 years ago

Feel free to reach out to me on Pants Slack if you want to chat about this option.

Eric-Arellano commented 2 years ago

Should this be closed as answered?

huonw commented 2 years ago

This tripped me up and had our system doing a bunch of pointless cache downloading and surprisingly slow builds that gave a poor early impression of pants (and took me a while to diagnose).

So, from my perspective, it'd be good to do at least one of:

  1. remove the caching entirely, have this repo use remote caching, once it's easy (e.g. your SaaS or https://github.com/pantsbuild/pants/issues/11149)
  2. set up the demo with cascading keys (or at least something like key: pants-${{ hashFiles("**") }} with restore-key: pants-
  3. add a comment near the key line specifically mentioning that the key is unlikely to result in useful behaviour and will need changing

(I'd love 1, especially the 'easy' part 😄 I'm in discussion with @benjyw on Slack, thanks for the tip.)

benjyw commented 2 years ago

Thanks for the update. It's definitely a good idea to have the demo repos demonstrate remote caching, since that is a strength of Pants, and we should not demonstrate obviously bad CI cache practices...

benjyw commented 2 years ago

Let's leave this issue open until we resolve that.

benjyw commented 2 years ago

Assigning this to @chrisjrn who is working on it.

benjyw commented 2 years ago

See https://github.com/pantsbuild/example-python/pull/104 for improved comments and example.

benjyw commented 2 years ago

Re-reading your initial comment @huonw , I missed the part whert the cache is never updated at all if the key doesn't change. But reading the GHA cache docs confirms this, and it makes sense that the cache should be immutable.

benjyw commented 2 years ago

Addressed in #104 and https://github.com/pantsbuild/pants/pull/16503

Thanks for opening this issue!

huonw commented 2 years ago

Looks great, thanks for the explanation, that definitely clarifies things for me!