lukasz-mitka opened this issue 1 year ago
Can you show a comparison with and without this setting?
> cache gets dropped a lot, thus increasing build times.
The setting controls how the ccache directory is uploaded to and downloaded from GitHub, not ccache itself. This has nothing to do with cache hits and misses in ccache.
All your questions are answered and discussed in the relevant issues and PRs, see https://github.com/hendrikmuhs/ccache-action/pull/126 and https://github.com/hendrikmuhs/ccache-action/issues/117
As stated in https://github.com/hendrikmuhs/ccache-action/pull/126#issuecomment-1384816326 changing the default will break most existing workflows. This new setting is only useful for workflows that use their own logic for creating the cache key.
> Can you show a comparison with and without this setting?
No, sorry. Too busy to play around with it. But the scenario is: multiple builds, up to 1 GB each, multiple branches/PRs active at the same time. Doing a series of commits to a single branch creates a new cache for each push, overflowing the GitHub cache (10 GB limit). Without the timestamp, existing caches are overwritten, so the limit fills more slowly. Keep in mind each branch has a separate cache regardless of the key used.
> The setting controls how the ccache directory is uploaded to and downloaded from GitHub, not ccache itself. This has nothing to do with cache hits and misses in ccache.
Yep, I'm talking about the GitHub cache. Once over the limit, entries get removed, losing their potential benefit.
> All your questions are answered and discussed in the relevant issues and PRs, see https://github.com/hendrikmuhs/ccache-action/pull/126 and https://github.com/hendrikmuhs/ccache-action/issues/117
Thank you for linking them. I missed them somehow. So to summarize: the timestamp is the result of someone on the internet using it for no apparent reason, and you copying it? Unless there's a real justification, I suggest removing it.
> changing the default will break most existing workflows
It probably won't break anything. Anyway, that's why I suggested adding it and releasing it under a new major version.
No need to get rude.
I am always open to suggestions, but I am not willing to change something based on gut feelings. I gave you the answer; it clearly states that overwriting with the same key has a race condition:
> As stated in https://github.com/hendrikmuhs/ccache-action/pull/126#issuecomment-1384816326 changing the default will break most existing workflows. This new setting is only useful for workflows that use their own logic for creating the cache key.
The official documentation contains:
> You cannot change the contents of an existing cache. Instead, you can create a new cache with a new key.
Nevertheless I merged the option to disable time-stamping for users that want more control and know what they are doing.
Sorry, wasn't trying to be rude, just direct.
> based on gut feelings
IMO using the timestamp seems to be someone's 'gut feeling'. I'd love to be proven wrong.
Switched all my workflows to `append-timestamp: false`; I'll let you know if I run into any issues.
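For reference, a minimal sketch of that workflow change, assuming the inputs discussed in this thread (`key` and `append-timestamp`); the key value is only an illustration, so check the README of the action version you pin:

```yaml
- name: Set up ccache
  uses: hendrikmuhs/ccache-action@v1.2
  with:
    # Illustrative key; anything stable per job/branch works.
    key: ${{ github.job }}
    # Save under the fixed key instead of a new timestamped key on every run.
    append-timestamp: false
```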
> I merged the option to disable time-stamping
I appreciate that!
> You cannot change the contents of an existing cache.
Yes, it's being overwritten instead.
Final note: please include the timestamp info in the main README.md.
Happy to hear about your experience.
@vadz you indicated problems in https://github.com/hendrikmuhs/ccache-action/pull/126#issuecomment-1384816326, maybe you can share your experience, too.
I understand the documentation to say that every entry is immutable; further on, the documentation describes that in case of an exact match, no new entry is created. But we want to create a new entry on every run, see Cache Hits and Misses.
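For readers less familiar with the underlying cache action, this is roughly the plain actions/cache pattern that the timestamp suffix approximates (the key names are illustrative, not what this action generates): every run saves a fresh, immutable entry, and restore falls back to the newest entry matching the prefix.

```yaml
- uses: actions/cache@v4
  with:
    path: .ccache
    # Unique key per run, so each run can save a new (immutable) entry.
    key: ccache-${{ runner.os }}-${{ github.sha }}
    # Restore falls back to the most recently created entry matching this prefix.
    restore-keys: |
      ccache-${{ runner.os }}-
```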
Adding documentation and writing about this recurring concern is still on my list. I have not found the time for it yet.
That's very interesting, but it works a bit differently than described. The cache gets saved even after a cache hit. Log extract:
# SETUP
Run hendrikmuhs/ccache-action@fba817f3c0db4f854d7a3ee688241d6754da166e
/usr/bin/docker exec f8abbfaf3bce46645f87ab7480cb1017857e7cb8633b18ff88a9f0b54924156c sh -c "cat /etc/*release | grep ^ID"
Restore cache
Received 243269632 of 859941342 (28.3%), 230.4 MBs/sec
Received 595591168 of 859941342 (69.3%), 282.0 MBs/sec
Received 855747038 of 859941342 (99.5%), 270.6 MBs/sec
Received 859941342 of 859941342 (100.0%), 149.7 MBs/sec
Cache Size: ~820 MB (859941342 B)
/usr/bin/tar -xf /__w/_temp/7c7db733-b2e9-4c18-b01e-9f15ce7a0aa8/cache.tgz -P -C /__w/repo/repo -z
Cache restored successfully
Restored from cache key "ccache-Linux-key-".
# POST
Post job cleanup.
/usr/bin/docker exec f8abbfaf3bce46645f87ab7480cb1017857e7cb8633b18ff88a9f0b54924156c sh -c "cat /etc/*release | grep ^ID"
ccache stats
Save cache using key "ccache-Linux-key-".
/usr/bin/tar --posix -cf cache.tgz --exclude cache.tgz -P -C /__w/repo/repo --files-from manifest.txt -z
Cache Size: ~1093 MB (1146236412 B)
Cache saved successfully
Maybe something has changed since January, but saving the cache under the same name definitely failed for me back then. I'll try to retest it again but won't be able to do it immediately.
FWIW I do agree that if not using timestamps works it should be the default.
It's been a month since I started running 'no timestamp' in a couple of repos. No issues noticed and no complaints from devs.
@hendrikmuhs please consider reopening this issue and making the change.
> Maybe something has changed since January, but saving the cache under the same name definitely failed for me back then. I'll try to retest it again but won't be able to do it immediately.
I just tested turning off appending of timestamps and immediately got:
> Failed to save: Unable to reserve cache with key ..., another job may be creating this cache. More details: Cache already exists
This was the first job after turning it off, so there was no old existing cache without a timestamp. However, looking at repo/actions/caches I can see a (new) cache without a timestamp. It seems it was saved after all.
Is there some other way we can fix this - that the cache space is used up too quickly? Maybe keep using timestamps but remove old caches (same key, older timestamp) automatically? Is that doable?
Added: A final thought - could the error message be wrong? @lukasz-mitka Could you check the output of the "Post Load ccache" step? Maybe you get the error message every time, but it still works?
Thanks for letting us know! Seems the issue is still present.
> Is there some other way we can fix this - that the cache space is used up too quickly? Maybe keep using timestamps but remove old caches (same key, older timestamp) automatically? Is that doable?
I would be very surprised if GitHub's cache does not handle that internally. Usually a cache works LRU-based: older cache entries should get evicted automatically when the cache space is full and a new entry is pushed. I think we should not try to optimize on that level.
However, ccache has a lot of options besides the maximum size; e.g. it is possible to evict entries by age, change the compression, etc.
The reason I think we could do better than the default automatic eviction is that, as developers, we know more about the selected workflow. It's not always true that the oldest cache should be evicted first. In a typical PR workflow, you would like to keep the cache for the PR as long as possible, while multiple old caches for the master/main branch have no value. The point is that we are using up the space so fast that PR caches we want to preserve are evicted.
I wonder if a new `replace-cache: true` setting could work? By "replace", I mean that after the new cache is saved with the new timestamp, the older caches are deleted. I assume you can't delete the last remaining cache, for the same reason it failed to save when not appending timestamps.
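In case someone wants to experiment with that idea before it exists as a setting, a rough sketch using the documented Actions cache REST endpoints (GET and DELETE on /repos/{owner}/{repo}/actions/caches) could look like the step below; the step name, the `ccache-` key prefix and the keep-newest logic are all assumptions, and the token needs `actions: write` permission:

```yaml
- name: Prune older ccache entries for this ref
  if: always()
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # List this ref's cache entries, newest first, keep the newest and delete the rest.
    gh api "repos/${{ github.repository }}/actions/caches?ref=${{ github.ref }}&sort=created_at&direction=desc" \
      --jq '.actions_caches[] | select(.key | startswith("ccache-")) | .id' \
      | tail -n +2 \
      | while read -r id; do
          gh api -X DELETE "repos/${{ github.repository }}/actions/caches/${id}"
        done
```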
Caches are separated by branch, however the resource budget is per repository. I hoped to find the eviction logic in actions/cache, but that seems to be just the client component. So the exact eviction logic seems to be GitHub's secret (if anyone has other information, please let us know). Ideally it does not just delete the oldest entry, but takes the branch name into account.
However, we don't know, and I don't like implementing something based on vague assumptions. It would be nice if someone who faces the problem could instrument their CI, e.g. by listing the caches as described here. This would allow us to reverse-engineer the logic by seeing which caches GitHub chooses when evicting entries.
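As a starting point for such instrumentation, here is a sketch that just dumps the repository's cache entries on every run, assuming the gh CLI on the runner and the GET /repos/{owner}/{repo}/actions/caches endpoint (the step name is illustrative):

```yaml
- name: List GitHub Actions caches
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # Print key, ref, size and last access time for each cache entry so that
    # eviction behaviour can be compared across runs.
    gh api "repos/${{ github.repository }}/actions/caches?per_page=100" \
      --jq '.actions_caches[] | [.key, .ref, .size_in_bytes, .last_accessed_at] | @tsv'
```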
It still sounds wrong to me to try to fix cache eviction from the client. Once we have proof that something works wrong w.r.t. cache eviction, we can also contact GitHub to fix it server-side.
FWIW ccache itself has some interesting options as well, e.g. the cache could be pruned by age after every run to keep the individual ccaches smaller.
Just for the record: we still struggle with this - "wrong" caches are evicted when there is a lot of activity in our mono repo. I think what happens is that long-lived PRs lose their cache because a lot of other (short-lived) PRs are merged to master. We just spent our entire included minutes quota in two days. It normally takes over 20 ...
Assuming a lot of people use the PR workflow, fixing this should be useful for other people - not just us. To repeat myself: Is there some way we could avoid saving all these unnecessary caches for the main / master branch?
Added:
To be honest: I don't understand what is going on. The action loads a cache, so it's not evicted, but ccache reports 100% miss. Any clues?
@hansfn do you limit the cache size?
This might help to prevent eviction. In addition, I wonder if the `--evict-older-than` ccache option could help here. It should be possible to try this as part of your workflows and, if it turns out useful, integrate it into the action as another option.
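A sketch of trying that directly in a workflow before making it an action option; `--evict-older-than` requires ccache 4.4 or newer, and the 7-day threshold is an arbitrary example:

```yaml
- name: Prune stale ccache entries
  # Run after the build, before the cache directory is uploaded.
  run: ccache --evict-older-than 7d
```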
The eviction itself isn't controlled by the action, so I don't see how this can be fixed in this repository.
> To be honest: I don't understand what is going on. The action loads a cache, so it's not evicted, but ccache reports 100% miss. Any clues?
Is it possible that the compiler suite/binary changed due to an update of the base image?
Thanks a lot for replying, @hendrikmuhs. Yes, we limit the cache size as much as we can. (Typically 200-300 MB.)
> Is it possible that the compiler suite/binary changed due to an update of the base image?
No, we compile using our own Docker image.
Turns out for busy repos the default timestamp costs quite a lot, since the cache gets dropped a lot, thus increasing build times. Please consider setting `append-timestamp` to `false` by default. I would expect it to ship in a new major release to spare people any issues. Also, what was the purpose of the timestamp? Does it give any benefit I'm missing?