GHA caching behaviour with many branches

mlr-org / mlr3book

Online version of Bischl, B., Sonabend, R., Kotthoff, L., & Lang, M. (Eds.). (2024). "Applied Machine Learning Using mlr3 in R". CRC Press.

https://mlr3book.mlr-org.com/

MIT License


GHA caching behaviour with many branches #751

Closed sebffischer closed 1 year ago

sebffischer commented 1 year ago

The way we have set up our CI can cause the same cache to be uploaded multiple times for different PRs:

[Screenshot, 2023-10-31: Pull requests · mlr-org/mlr3book]

Once there are enough of these caches, the Linux-openml- cache from the main branch is deleted to make room for the identical copies. The OpenML data is then no longer cached, which can cause runs to time out.
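
To make the eviction effect concrete, here is a rough Python sketch. It assumes that once the repository's cache quota is exceeded, the least recently used entries are dropped first. The sizes, the 10 GB limit, the branch names and the helper `save_with_eviction` are made up for illustration and are not taken from our workflow.

```python
def save_with_eviction(entries, new_entry, limit_gb):
    """Add new_entry, then drop least recently used entries until under the limit."""
    kept = sorted(entries + [new_entry], key=lambda e: e["last_used"])
    evicted = []
    while sum(e["size_gb"] for e in kept) > limit_gb:
        evicted.append(kept.pop(0))  # oldest access goes first (assumed policy)
    return kept, evicted


# Hypothetical numbers: the main cache is saved once, then three PRs
# each upload an identical 3 GB copy on later days.
caches = [{"scope": "main", "size_gb": 3, "last_used": 0}]
evicted_scopes = []
for day, pr in enumerate(["pr-1", "pr-2", "pr-3"], start=1):
    new_entry = {"scope": pr, "size_gb": 3, "last_used": day}
    caches, evicted = save_with_eviction(caches, new_entry, limit_gb=10)
    evicted_scopes += [e["scope"] for e in evicted]

print(evicted_scopes)
print([e["scope"] for e in caches])
```

In this toy setup the third PR upload pushes the total over the limit, and the entry that gets dropped is the one from main, since it was used least recently.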

I think we should only upload the cache when there was no exact cache hit.

sebffischer commented 1 year ago

I think the problem here, once again, is that we have too many branches. In this case, the branches modified solutions.qmd, which changes the hash used for the cache. Because of this, there was no direct cache hit and one of the restore keys was used instead. The post-cache step of the GitHub Actions workflow then uploads the identical cache with the same cache-keys. I don't think we need to do anything about this, because it should happen very rarely.
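
The sequence described above can be sketched as a small Python simulation. This is only a model under assumptions: the key layout (`Linux-openml-` plus a short hash of the tracked file), the branch names, and the rule that a job sees caches from its own branch and from main are all hypothetical simplifications, not our actual workflow config.

```python
import hashlib
from itertools import count

PREFIX = "Linux-openml-"  # restore-key prefix, taken from the cache name in this issue
_clock = count()          # stand-in for creation time


def cache_key(tracked_file: str) -> str:
    """Primary key: prefix plus a short hash of the tracked file (assumed layout)."""
    return PREFIX + hashlib.sha256(tracked_file.encode()).hexdigest()[:8]


def run_job(store: list, branch: str, tracked_file: str) -> str:
    """Simulate one CI run; returns 'exact', 'fallback' or 'miss'."""
    visible = [e for e in store if e["scope"] in (branch, "main")]
    primary = cache_key(tracked_file)
    if any(e["key"] == primary for e in visible):
        return "exact"  # exact hit: the post step skips the upload
    fallback = [e for e in visible if e["key"].startswith(PREFIX)]
    if fallback:
        # restore-key match: reuse the newest matching payload unchanged
        payload = max(fallback, key=lambda e: e["created"])["payload"]
    else:
        payload = "openml data"  # nothing cached yet, data is downloaded
    # No exact hit, so the post step saves the payload under the new key.
    store.append({"scope": branch, "key": primary,
                  "payload": payload, "created": next(_clock)})
    return "fallback" if fallback else "miss"


store = []
run_job(store, "main", "solutions v1")
outcomes = [run_job(store, f"pr-{i}", f"solutions v1 + edit {i}") for i in range(3)]
payloads = [e["payload"] for e in store]
rerun = run_job(store, "pr-0", "solutions v1 + edit 0")
```

Each PR that touches the tracked file ends up with its own entry holding the same payload as main, while a rerun on an unchanged PR gets an exact hit and uploads nothing.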