
ccache GitHub action generates excessive caches #113858

encukou opened this issue 8 months ago (status: Open)

encukou commented 8 months ago

The ccache action speeds up *nix builds considerably (enough that we couldn't easily do without it). The issue is that every build uploads a new cache, filling up our quota very quickly. GitHub evicts caches that weren't used recently, and ccache churns through the quota aggressively enough that it crowds out smaller, useful entries. Specifically, the action that marks stale PRs (which only runs once in a while) has its state evicted before its next run.


encukou commented 8 months ago

It's not possible to change the contents of an existing cache, so for this to be useful we need to save a new one occasionally. With https://github.com/python/cpython/pull/113859 it should save only on merged commits, not in-progress PRs. Hopefully that'll be enough.
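Roughly, the change amounts to gating the action's save step on the triggering event. A sketch assuming the action's `save` input, not the literal diff from the PR:

```yaml
- name: Configure ccache
  uses: hendrikmuhs/ccache-action@v1.2
  with:
    # restore on every run, but only upload a fresh cache for push
    # events (commits merged to a branch), not for in-progress PRs
    save: ${{ github.event_name == 'push' }}
```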

hugovk commented 8 months ago

Another idea is to set a max-size, but the above PR looks like a good thing to try first 👍

https://github.com/hendrikmuhs/ccache-action/blob/4cb35b2ff44ce57e1f5f5221e26bd8fefcfe41d0/action.yml#L15-L17
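That input caps how much cache ccache is allowed to keep, and hence the size of the uploaded archive. For example (the value is illustrative, not a tested configuration):

```yaml
- uses: hendrikmuhs/ccache-action@v1.2
  with:
    max-size: 200M  # cap the compressed cache ccache keeps
```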

encukou commented 8 months ago

Some napkin math suggests that this should be enough, at least outside sprints. Which is good enough :) I don't think we want to delete the caches manually, so we'll be at GitHub's 10GiB limit regardless of max-size.
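Roughly, with assumed inputs (each merged commit saving ~200 MB of ccache across the *nix builds, and a couple dozen merges to main per day):

```
~25 merges/day × ~200 MB/merge ≈ 5 GB/day → the 10 GiB quota holds ~2 days of caches
```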

I plan to merge tomorrow morning (EU time) so I have the whole day to watch the effect (and revert if needed).

erlend-aasland commented 8 months ago

Is there more to do here? (I accidentally clicked "Close with comment" instead of "Comment", so please reopen if there is more to do 😄)

encukou commented 8 months ago

Yup, there is :)

encukou commented 8 months ago

Please keep the issue open for another possible iteration. This can't really be tested locally :)

erlend-aasland commented 8 months ago

> Please keep the issue open for another possible iteration. This can't really be tested locally :)

It's easy to forget to close it; how about re-opening if another iteration is needed?

encukou commented 8 months ago

But if I forget to close the issue, that'd mean I also forgot to check that the problem is fixed. We'll know in a day or two.

encukou commented 8 months ago

Currently, the oldest caches are from about 2 days ago and take up about 30 GiB in total. “New” ccaches (saved from main) account for 4.7 GiB, “old” ccaches for 24.8 GiB, and other caches for 1.5 GiB. On a few recent PRs, the main build has a 90+% cache hit rate.

Looks like if I backport the 2 PRs, 10 GiB will be enough for about two days of caches. We could go further by writing an Action or bot to remove old ccaches, but IMO GitHub's automatic cleanup will be enough for now.

erlend-aasland commented 8 months ago

You mean megabytes, not gigabytes, right?

encukou commented 8 months ago

Gigabytes. These are sums of all such caches, not the individual entries.

encukou commented 8 months ago

I'd like to keep this open for a few more days, to check if everything is in order.

encukou commented 7 months ago

Unfortunately, 10 GiB of caches now only covers about 12 hours. Much better than before, but not enough for a task that runs every 24 hours.

The next step is writing code that uses the GHA API to clear older ccaches. I'll keep it on my TODO list, but I'm not sure when I'll get to it.
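A sketch of what that could look like as a scheduled workflow. The cadence, the `ccache-` key prefix, and the 24-hour cutoff are all assumptions; the REST routes are the documented `GET`/`DELETE /repos/{owner}/{repo}/actions/caches` endpoints:

```yaml
name: Prune old ccaches
on:
  schedule:
    - cron: "0 */6 * * *"
permissions:
  actions: write  # required to delete caches
jobs:
  prune:
    runs-on: ubuntu-latest
    steps:
      - name: Delete ccache entries older than 24 hours
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          cutoff=$(date -u -d '24 hours ago' '+%Y-%m-%dT%H:%M:%SZ')
          # ISO-8601 timestamps compare correctly as strings
          gh api --paginate \
            "repos/${{ github.repository }}/actions/caches?sort=created_at&direction=asc" \
            --jq ".actions_caches[] | select(.key | startswith(\"ccache-\")) | select(.created_at < \"$cutoff\") | .id" \
          | xargs -r -I{} gh api -X DELETE "repos/${{ github.repository }}/actions/caches/{}"
```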

hugovk commented 7 months ago

The cache size limit is now 200 MB.

Do we need it so big? On my fork, the caches were between 32 and 75 MB.

I guess the question is whether we could shrink it: if we halved it, that might be enough for around 24 hours, but I expect we'd still get some blips where the daily cron task loses its cache and has to start afresh.

encukou commented 7 months ago

Yeah, the current approach can barely get us to where we want to be. IMO it's time to cut our losses and try the other approach: put branch names in the cache keys, and write logic that clears all ccaches but the freshest one for each branch.
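Sketched as a cleanup step (assuming cache keys shaped like `ccache-<branch>-…` with hyphen-free branch names; the current keys don't look like this, so the key scheme would need to change first):

```yaml
- name: Keep only the freshest ccache per branch
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # List ccache entries newest-first, group them by the branch name
    # embedded in the key, and delete all but the first one per branch.
    gh api --paginate \
      "repos/${{ github.repository }}/actions/caches?sort=created_at&direction=desc" \
      --jq '.actions_caches[] | select(.key | startswith("ccache-")) | "\(.key)\t\(.id)"' \
    | awk -F'\t' '{ split($1, k, "-"); if (seen[k[2]]++) print $2 }' \
    | xargs -r -I{} gh api -X DELETE "repos/${{ github.repository }}/actions/caches/{}"
```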

hugovk commented 7 months ago

Sure, sounds good.

From the other side of the fence, we could adjust the stale.yml cron to run more often than daily: https://github.com/python/cpython/blob/main/.github/workflows/stale.yml
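That knob is just the `schedule` trigger in stale.yml, e.g. (a sketch, not the exact diff that was merged):

```yaml
on:
  schedule:
    # run twice a day instead of daily, so the stale-state cache is
    # re-used (and thus kept warm) before GitHub's eviction reaches it
    - cron: "0 */12 * * *"
```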

hugovk commented 6 months ago

Please see https://github.com/python/cpython/pull/116636 to process the stale issues twice per day, which hopefully will keep its cache file fresh and prevent deletion.

hugovk commented 6 months ago

Twice per day isn't enough, let's try 4x: https://github.com/python/cpython/pull/116857.

hugovk commented 6 months ago

4x doesn't help either. The problem now isn't that the stale cache has been deleted (for example, there's ~30 hours of caches available right now), but that https://github.com/actions/stale only checks the first page of results from the cache API!

For example, the first page of https://api.github.com/repos/python/cpython/actions/caches only covers ~3 hours of caches at the moment.

Upstream bug report is at https://github.com/actions/stale/issues/1136
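The difference is easy to see with the `gh` CLI (an illustrative step; a bare request returns just the first page of entries, while `--paginate` follows the Link headers to fetch them all):

```yaml
- name: Compare first-page vs fully paginated cache listings
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # What a single request (and hence actions/stale) sees:
    gh api "repos/python/cpython/actions/caches" --jq '.actions_caches | length'
    # The full picture, following pagination:
    gh api --paginate "repos/python/cpython/actions/caches" --jq '.actions_caches[].id' | wc -l
```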