operator-framework / operator-controller

Improve GitHub actions caching by taking bingo modules into account #323

Open m1kola opened 11 months ago

m1kola commented 11 months ago

We need to add the cache-dependency-path configuration option to the setup-go GitHub action to improve caching.

Background: we introduced bingo in the OLMv1 repos for managing tools, and it created a bunch of modules in the .bingo directory in each repo. It works well, but on every job we download these dependencies from the internet instead of using the cache. This happens because the setup-go GitHub action only looks in the project root for go.sum and builds the cache based on it.

cache-dependency-path needs to be added to all OLMv1 repos, and it should speed up our builds (we've seen a decrease in build time of ~3 minutes in https://github.com/operator-framework/olm-docs).

m1kola commented 11 months ago

This requires more effort than I initially anticipated because the OLMv1 repos have multiple workflows. Suppose we simply add something like this to the jobs (which is what I did in olm-docs):

cache-dependency-path: |
  go.sum
  .bingo/**.sum

If we do the above, the workflows will race to create the cache. Once the cache is created, they will all reuse it. That sounds good, but it doesn't work well in practice because each workflow installs only a subset of the bingo-managed tools. As a result, the cache will be incomplete, and only one workflow (the one that managed to create the cache) will get the full benefit of it.

We might need to do one of the following:

  1. Make sure that all workflows install all the tools. This way we will have a complete cache in all of the workflows. The downside is that the cold path becomes slower for all of them too, and the workflow definitions will contain steps which are not necessary for the workflow jobs.
  2. Use a local composite action to deal with readability (the unnecessary steps will be hidden in the composite action). But we still run the steps, so the cold path problem still exists.
  3. Consider running a small workflow to warm up the cache before running all other jobs. The workflow_run trigger might help with that. This way we can make the workflows share the same cache and keep them clean. But the other implications are not clear to me at the moment.
  4. Get rid of bingo and include the tools in something like tools.go (example). This way we will have everything in the main go.mod and, I think, all of our workflows will be downloading all the modules. This will require the Makefile to be updated. This might be a good option given that bingo does not seem to be very actively maintained.
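
For options 1 and 2, a local composite action could look roughly like the sketch below. The file path, the action name, and the `make install-tools` target are hypothetical (they are not taken from any OLMv1 repo), and the action version tag is an assumption. The calling workflow is assumed to have checked out the repo already.

```yaml
# .github/actions/setup-tools/action.yaml (hypothetical path and name)
name: setup-tools
description: Set up Go with module caching and install all bingo-managed tools
runs:
  using: composite
  steps:
    - uses: actions/setup-go@v5
      with:
        go-version-file: go.mod
        # Hash the root go.sum and the bingo sum files into the cache key
        cache-dependency-path: |
          go.sum
          .bingo/**.sum
    # Hypothetical target that builds every tool managed by bingo,
    # so that whichever workflow saves the cache saves a complete one
    - run: make install-tools
      shell: bash
```

Each workflow would then call it with `- uses: ./.github/actions/setup-tools` after `actions/checkout`. This hides the extra steps but, as noted in option 2, does not remove the cold path cost.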

The 3rd option sounds good in theory, but the implications are not very clear. I need more time to research it.
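
A minimal sketch of what a warm-up workflow for option 3 might look like, under stated assumptions: the workflow name, the `main` branch name, and the `make install-tools` target are hypothetical. It relies on the GitHub behaviour that caches saved on the default branch can be restored by workflow runs on other branches.

```yaml
# .github/workflows/warm-cache.yaml (hypothetical file name)
name: warm-cache
on:
  push:
    branches: [main]  # assumes the default branch is called main
jobs:
  warm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
          cache-dependency-path: |
            go.sum
            .bingo/**.sum
      # Hypothetical target: download and build every bingo-managed tool
      # so that the cache saved at the end of this job is complete
      - run: make install-tools
```

Other workflows could be chained after it with `on: workflow_run` (`workflows: [warm-cache]`, `types: [completed]`), but a workflow triggered that way runs in the context of the default branch rather than the pull request, which is one of the implications that would need checking. A PR that changes go.sum or a .bingo sum file would still miss the cache, since the hash of those files is part of the key.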

ncdc commented 11 months ago

Can we change the cache key per workflow?

m1kola commented 11 months ago

@ncdc we can look into this, but there is a 10 GB limit for all caches per repo and we are already nearing it. The limit seems to be soft-ish (last week I manually deleted a few older entries because we were over 10 GB), but I suspect GitHub starts garbage collecting once you are a certain amount over the limit.

If we give each workflow its own key, it will worsen the situation and we will most likely exceed the limit (our cache is 360-410 MB). So there is a chance that we will run cold more often due to cache evictions.

Also, we will likely have to manage the cache ourselves using the actions/cache action instead of relying on actions/setup-go doing most of the work for us (it uses actions/cache under the hood, but also deals with hashing the sum files, etc.). It is not a big deal, just one more step in each workflow.
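
For reference, a per-workflow cache key managed by hand might look something like the sketch below. The key layout and the use of `github.workflow` as the per-workflow component are assumptions, as is the `.bingo/*.sum` file layout; the cached paths are the usual Go module and build cache locations on Linux runners.

```yaml
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-go@v5
    with:
      go-version-file: go.mod
      cache: false  # turn off the built-in caching; actions/cache below owns it
  - uses: actions/cache@v4
    with:
      path: |
        ~/go/pkg/mod
        ~/.cache/go-build
      # One cache entry per workflow and per set of sum files
      key: ${{ runner.os }}-go-${{ github.workflow }}-${{ hashFiles('go.sum', '.bingo/*.sum') }}
      # Fall back to the newest cache saved by the same workflow
      restore-keys: |
        ${{ runner.os }}-go-${{ github.workflow }}-
```

With a cache of 360-410 MB per entry, every additional workflow key adds roughly that much again toward the 10 GB limit, which is the concern raised above.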

I'm leaning towards options 3 and 4 at the moment.

ncdc commented 11 months ago

I am fine with the amount of time the jobs currently take. Do you think it's a problem?

m1kola commented 11 months ago

It is not a problem (for now at least), but I was hoping to improve it quickly. The quick fix didn't work out, so I deprioritised this in favour of other work.

I'm still hoping to come back to this at some point, because it would be great if we could reduce the wait by a few minutes. I, for example, rely heavily on CI: I do not run all the jobs locally before pushing. I normally let GitHub Actions do it for me in a draft PR. Faster feedback is always good in this use case.