haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal

Our CI suffers from cache thrashing #9850

Open andreasabel opened 5 months ago

andreasabel commented 5 months ago

At this point in time, GH has evicted all our CI caches of our default branch except one:

$ gh actions-cache list --branch master
Showing 1 of 1 cache entries in haskell/cabal

Linux-8.8.4-1c1230ca228cc03a9ee68166243af358ab3992fc  237.45 MB  refs/heads/master  2 hours ago

$ date
Wed Mar 27 18:12:40 CET 2024

This effectively means our caching isn't working well: if master is bare of caches, every PR has to start building all the dependencies from scratch.

Caching has to be engineered so that the latest master caches survive, given the expected number of PRs that are active in parallel (and generate their own caches).

In PRs #9845 and #9849 I have tried to do sensible cache engineering so that a new cache is only written if dependencies actually got updated. However, our main workhorse validate.yml simply creates a new cache on every run (for each OS and GHC it runs) by including the github.sha in the cache key:
https://github.com/haskell/cabal/blob/7e085faf42a0cdb3ba19e3756590bc1f952de05b/.github/workflows/validate.yml#L111
With each cache around 300MB, simple arithmetic shows that our 10GB limit is busted all the time (even 30 GB wouldn't make much of a difference).

E.g. these are the 46 caches by one currently active PR:

$ gh actions-cache list -L 100 --branch refs/pull/9718/merge
Showing 46 of 46 cache entries in haskell/cabal

Linux-8.8.4-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      257.98 MB  refs/pull/9718/merge  6 hours ago
macOS-8.6.5-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      212.56 MB  refs/pull/9718/merge  6 hours ago
macOS-9.2.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      204.08 MB  refs/pull/9718/merge  6 hours ago
macOS-8.10.7-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                     209.47 MB  refs/pull/9718/merge  6 hours ago
macOS-9.0.2-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      206.74 MB  refs/pull/9718/merge  6 hours ago
macOS-8.8.4-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      212.92 MB  refs/pull/9718/merge  6 hours ago
macOS-9.4.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      186.53 MB  refs/pull/9718/merge  6 hours ago
macOS-9.6.3-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      174.85 MB  refs/pull/9718/merge  6 hours ago
macOS-9.8.1-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      175.46 MB  refs/pull/9718/merge  6 hours ago
Windows-9.2.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                    276.43 MB  refs/pull/9718/merge  6 hours ago
Windows-9.0.2-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                    270.71 MB  refs/pull/9718/merge  6 hours ago
Windows-9.4.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                    241.21 MB  refs/pull/9718/merge  6 hours ago
Windows-9.6.3-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                    224.59 MB  refs/pull/9718/merge  6 hours ago
Windows-9.8.1-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                    223.40 MB  refs/pull/9718/merge  7 hours ago
Linux-9.2.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      229.11 MB  refs/pull/9718/merge  7 hours ago
Linux-8.10.7-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                     227.63 MB  refs/pull/9718/merge  7 hours ago
Linux-8.8.4-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      227.29 MB  refs/pull/9718/merge  7 hours ago
Linux-9.0.2-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      227.26 MB  refs/pull/9718/merge  7 hours ago
Linux-8.6.5-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      279.99 MB  refs/pull/9718/merge  7 hours ago
Linux-9.4.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      216.40 MB  refs/pull/9718/merge  7 hours ago
Linux-9.8.1-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      177.89 MB  refs/pull/9718/merge  7 hours ago
Linux-9.6.3-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      177.28 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-8.10.7-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3  279.37 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.0.2-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3   270.39 MB  refs/pull/9718/merge  7 hours ago
Linux-9.2.8-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3                      326.73 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.4.8-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3   267.49 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.2.8-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3   248.42 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.8.1-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3   201.04 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.6.4-20221115-aa5d3e49b1736ef89bfbc4f26c7875ad653264d3   202.54 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.8.1-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97   201.04 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.2.8-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97   248.41 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.6.4-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97   202.55 MB  refs/pull/9718/merge  7 hours ago
Linux-fix-whitespace-0.1                                                  3.01 MB    refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.4.8-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97   267.49 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.0.2-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97   270.39 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-8.10.7-20221115-5d6f3f3a6464c1ff23b625ac924fbb125d580a97  279.36 MB  refs/pull/9718/merge  7 hours ago
bootstrap-Linux-9.8.1-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b   201.06 MB  refs/pull/9718/merge  8 hours ago
bootstrap-Linux-9.6.4-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b   202.54 MB  refs/pull/9718/merge  8 hours ago
bootstrap-Linux-8.10.7-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b  279.37 MB  refs/pull/9718/merge  8 hours ago
bootstrap-Linux-9.2.8-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b   248.41 MB  refs/pull/9718/merge  8 hours ago
bootstrap-Linux-9.4.8-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b   267.48 MB  refs/pull/9718/merge  8 hours ago
bootstrap-Linux-9.0.2-20221115-016216c7e7e7f383ddd2c5497fa737640f0b792b   270.39 MB  refs/pull/9718/merge  8 hours ago
Linux-9.2.8-016216c7e7e7f383ddd2c5497fa737640f0b792b                      326.72 MB  refs/pull/9718/merge  a day ago
bootstrap-Linux-8.10.7-20221115-b171551765807a9594f9724ffec82dd49457491e  279.36 MB  refs/pull/9718/merge  a day ago
bootstrap-Linux-9.6.4-20221115-b171551765807a9594f9724ffec82dd49457491e   202.55 MB  refs/pull/9718/merge  a day ago
bootstrap-Linux-9.8.1-20221115-b171551765807a9594f9724ffec82dd49457491e   201.04 MB  refs/pull/9718/merge  a day ago

These 46 entries alone add up to about 10.6 GB, which busts the 10GB limit.

Including github.sha in the cache key is a nice solution to cache aging for small projects that have effectively infinite cache space (e.g. if your caches are 10MB and you have 10GB available), but for a sizeable project like cabal this isn't a solution.
For Agda I spent days on cache engineering so that the goals are met (staying within the limit even with a couple of parallel PRs).

The current caching philosophy seems to originate from https://github.com/haskell/cabal/pull/7952/files#r807953232

andreabedini commented 5 months ago

@andreasabel

The problem is that cabal does a fair bit of caching (in the form of the store) and GHA caches cannot be updated. The "trick" I mention in the comment you link is just what GitHub recommends doing to update a cache. See https://github.com/actions/cache/blob/main/tips-and-workarounds.md#update-a-cache.

The line right after the one you linked is

          key: ${{ runner.os }}-${{ matrix.ghc }}-${{ github.sha }}
          restore-keys: ${{ runner.os }}-${{ matrix.ghc }}-

If a cache with the current sha does not exist (as is expected and intentional), GHA will pick the most recent one matching the restore key and create a new one with the current sha. The older cache entries will fall into disuse and will eventually be evicted.

If a cache key is present, GHA will restore that but it won't update it or save it again.

Off the top of my head, an improvement would be to use separate caches for the store and dist-newstyle. They are different things that change at different speeds.

What are your ideas?

PS: I don't think we can protect against GitHub (or someone) deleting all the caches. We just do too many builds. We should cull them.

andreasabel commented 5 months ago

Yes, I understand the mechanism with the ${{ github.sha }}. It is the theoretically best solution, but only if it does not make you run out of cache space. Since our caches are big, we would need a cache store on the order of 100 GB or 1 TB rather than the 10 GB GitHub offers us.

I suggest the following strategy: Only write new big caches when CI runs on master. When we are running on a branch/PR, just reuse the caches from master. This way, we always have a recent enough cache to restore a significant portion of the dependencies, at least for the average PR.

In practice, we separate actions/cache/restore from actions/cache/save. The latter is only executed when the workflow runs on master. (I have never implemented this strategy, so I don't know off the top of my head what the check for master would look like, but I believe/hope this is possible.)
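A minimal sketch of what such a split might look like, assuming the `actions/cache/restore` and `actions/cache/save` sub-actions and a default branch named master. The step id `restore-cache` and the cached paths are illustrative assumptions, not copied from validate.yml.

```yaml
# Sketch only: step ids and cached paths are assumptions, not taken from validate.yml.
steps:
  - uses: actions/checkout@v4

  # Every run, including PR runs, may read the newest matching cache.
  - name: Restore cache
    id: restore-cache
    uses: actions/cache/restore@v4
    with:
      path: |
        ~/.cabal/store
        dist-newstyle
      key: ${{ runner.os }}-${{ matrix.ghc }}-${{ github.sha }}
      restore-keys: ${{ runner.os }}-${{ matrix.ghc }}-

  # ... build and test steps go here ...

  # Only runs on master write a new cache, and only if no entry
  # with exactly this key was restored above.
  - name: Save cache
    if: github.ref == 'refs/heads/master' && steps.restore-cache.outputs.cache-hit != 'true'
    uses: actions/cache/save@v4
    with:
      path: |
        ~/.cabal/store
        dist-newstyle
      key: ${{ steps.restore-cache.outputs.cache-primary-key }}
```

Since a workflow run can restore caches created on the default branch, PR runs would still pick up the latest master cache through the restore key while no longer adding entries of their own.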

andreabedini commented 4 months ago

> In practice, we separate actions/cache/restore from actions/cache/save. The latter is only executed when the workflow runs on master. (I have never implemented this strategy, so I don't know off the top of my head what the check for master would look like, but I believe/hope this is possible.)

It sounds like a good plan. I don't think we had separate actions/cache/restore and actions/cache/save at the time.

I still think store and dist-newstyle deserve separate caches. The store can follow what you propose, while dist-newstyle could be cached per branch. Would that make sense?
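One way the two caches could be kept apart, as a rough sketch. The paths, the key prefixes `store-` and `dist-`, and the use of `plan.json` as a dependency fingerprint are all assumptions. `plan.json` is produced here by a `cabal build all --dry-run` step that runs before the keys are computed.

```yaml
# Sketch only: paths, key prefixes, and the plan.json fingerprint are assumptions.
steps:
  # Compute the build plan without building anything, so that it can be hashed.
  - name: Compute build plan
    run: cabal build all --dry-run

  # The store changes only when dependencies change, so key it on the plan.
  - name: Restore cabal store
    uses: actions/cache/restore@v4
    with:
      path: ~/.cabal/store
      key: store-${{ runner.os }}-${{ matrix.ghc }}-${{ hashFiles('dist-newstyle/cache/plan.json') }}
      restore-keys: store-${{ runner.os }}-${{ matrix.ghc }}-

  # dist-newstyle changes with every commit. GitHub scopes caches by branch
  # and falls back to the default branch, so no branch name is needed in the key.
  - name: Restore dist-newstyle
    uses: actions/cache/restore@v4
    with:
      path: dist-newstyle
      key: dist-${{ runner.os }}-${{ matrix.ghc }}-${{ github.sha }}
      restore-keys: dist-${{ runner.os }}-${{ matrix.ghc }}-
```

Matching `actions/cache/save` steps would follow the build. With this layout the store would only be saved on master, as proposed above, while the policy for dist-newstyle is the part still open for discussion.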