gradle / gradle-build-action

Execute your Gradle build and trigger dependency submission
https://github.com/marketplace/actions/gradle-build-action
MIT License

Add support for periodically clearing the dependencies cache. #977

Closed: yogurtearl closed this issue 9 months ago

yogurtearl commented 11 months ago

I would like to be able to clear out the dependencies cache every N days.

bigdaz commented 11 months ago

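Your current options are:

  1. Enable cache-cleanup so that any unused files are purged from Gradle User Home. This should include removing any dependency files that weren't referenced in the current workflow step.
  2. Periodically enable cache-write-only so that you start the workflow with a clean Gradle User Home.

Does either of these satisfy your requirements? If not, can you please explain your use case?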

yogurtearl commented 11 months ago

I don't think either of those will fit my use case.

If a specific dependency version gets quarantined (i.e. blocked at the maven repo proxy), I want the build to fail within some limited time frame ("N days") after it is quarantined.

Currently, if I understand how it works, a quarantined dependency could survive in the cache indefinitely and the build will never re-fetch it from the server.

One way to do this would be to include the Julian Day divided by N (integer division) in the cache key for dependencies, so that the cache key changes every N days even if nothing else has changed.
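
A minimal Python sketch of how such a key could be computed. `dependencies_cache_key`, the `base_key` argument and the `-jd` suffix are hypothetical names for illustration; the action itself did not add an option like this.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (0001-01-01 == 1)
# and the Julian Day Number (2000-01-01 == 2451545).
JDN_OFFSET = 1721425


def julian_day_number(d: date) -> int:
    """Return the integer Julian Day Number for a Gregorian date."""
    return d.toordinal() + JDN_OFFSET


def dependencies_cache_key(base_key: str, today: date, n_days: int) -> str:
    """Append a bucket that only changes once every n_days days.

    base_key is whatever key would otherwise be used (hypothetical here);
    integer division groups consecutive Julian Days into buckets of n_days.
    """
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    bucket = julian_day_number(today) // n_days
    return f"{base_key}-jd{bucket}"


if __name__ == "__main__":
    print(dependencies_cache_key("dependencies-abc123", date.today(), 7))
```

Because the buckets are aligned to multiples of N in the Julian Day count, the first window after adopting a scheme like this can be shorter than N days; every window after that is exactly N days long.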

I want the build to fail automatically on quarantined dependencies no more than N days after the dependency was first quarantined, with no additional action needed from developers.

> Enable cache-cleanup so that any unused files are purged from Gradle User Home. This should include removing any dependency files that weren't referenced in the current workflow step.

The dependency is always used, so this wouldn't cause it to be cleaned up.

> Periodically enable cache-write-only so that you start the workflow with a clean Gradle User Home

After cache-write-only was disabled again, I don't think this would prevent builds from accessing an older cache from a previous build before cache-write-only was enabled.

bigdaz commented 11 months ago

So your goal is to verify that all of the dependencies in the build are still available in the remote repository? Then I think the best solution would be to periodically execute a build using --refresh-dependencies. This is the most efficient option, since Gradle uses HTTP HEAD requests to check whether the artifact in the repository exactly matches what is in the cache, and only re-downloads it if it has changed. But the build WILL fail if the dependency no longer exists in the remote repository.
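
One way to apply this suggestion is to add `--refresh-dependencies` only on scheduled runs. A minimal sketch, assuming the workflow has an `on: schedule` trigger and that the build is launched from a Python step through the Gradle wrapper (`./gradlew`). `GITHUB_EVENT_NAME` is set by GitHub Actions and is `schedule` for cron-triggered runs; `gradle_command` is a hypothetical helper.

```python
import os
import subprocess


def gradle_command(tasks: list[str], event_name: str) -> list[str]:
    """Build the Gradle wrapper command line for this workflow run.

    Scheduled runs add --refresh-dependencies so that every dependency is
    re-checked against the remote repository; other runs use the cache as usual.
    """
    cmd = ["./gradlew", *tasks]
    if event_name == "schedule":
        cmd.append("--refresh-dependencies")
    return cmd


if __name__ == "__main__":
    cmd = gradle_command(["build"], os.environ.get("GITHUB_EVENT_NAME", ""))
    # check=True makes the step fail if Gradle fails, e.g. because a
    # dependency can no longer be resolved from the remote repository.
    subprocess.run(cmd, check=True)
```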

bigdaz commented 11 months ago

> After cache-write-only was disabled again, I don't think this would prevent builds from accessing an older cache from a previous build before cache-write-only was enabled.

When you configure a GitHub Actions Job with cache-write-only, the end result is a new cache entry that will be used by all subsequent builds for that workflow Job. The older cache entry won't be used.

yogurtearl commented 11 months ago

> When you configure a GitHub Actions Job with cache-write-only, the end result is a new cache entry that will be used by all subsequent builds for that workflow Job. The older cache entry won't be used.

Even if the build fails, it will create a new cache entry that will be used in all future jobs?

yogurtearl commented 11 months ago

> Then I think the best solution would be to periodically execute a build using --refresh-dependencies.

If any build fails because of a dependency that is no longer available, I want ALL subsequent builds to always fail for that version of the dependency.

If I run with --refresh-dependencies only every Nth day and nothing else changes, the build may fail on day N but start passing again on day N+1, when --refresh-dependencies is not used and the artifact is still in the cache.
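
The concern can be illustrated with a toy model. It assumes, as the comment does, that a failed refresh leaves the cached artifact in place, so non-refresh days keep passing; it is not a description of Gradle internals. `simulate` is a hypothetical helper, and `sticky=True` models the behaviour being asked for, where every build after the first failure also fails.

```python
def simulate(days: int, n: int, quarantined_from: int, sticky: bool = False) -> list[str]:
    """Return PASS/FAIL per day for a build that refreshes dependencies every n days.

    quarantined_from is the first day on which the artifact is missing remotely.
    Non-refresh days resolve the artifact from the (still populated) cache.
    """
    results = []
    failed = False
    for day in range(days):
        refresh_day = day % n == 0
        in_remote = day < quarantined_from
        if (refresh_day and not in_remote) or (sticky and failed):
            failed = True
            results.append("FAIL")
        else:
            results.append("PASS")
    return results


print(simulate(8, 3, quarantined_from=2))
# ['PASS', 'PASS', 'PASS', 'FAIL', 'PASS', 'PASS', 'FAIL', 'PASS']
print(simulate(8, 3, quarantined_from=2, sticky=True))
# ['PASS', 'PASS', 'PASS', 'FAIL', 'FAIL', 'FAIL', 'FAIL', 'FAIL']
```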

yogurtearl commented 11 months ago

A more general solution would probably be to put this in Gradle itself, i.e. to have an org.gradle.refreshDependenciesAfterDays=N property.

yogurtearl commented 11 months ago

I filed this: https://github.com/gradle/gradle/issues/27410

bigdaz commented 10 months ago

> > When you configure a GitHub Actions Job with cache-write-only, the end result is a new cache entry that will be used by all subsequent builds for that workflow Job. The older cache entry won't be used.

> Even if the build fails, it will create a new cache entry that will be used in all future jobs?

Yes, that's correct. The action will save the Gradle User Home state even if the build fails. Any subsequent build for the same job will prefer this new cache entry. You can read more about the way cache entries are matched here.

Now that I fully understand your requirements, here are some options you could try:

  1. Run the workflow with cache-write-only: true every N days. This will effectively run the build with a clean Gradle User Home, and all subsequent workflow runs will use the resulting Gradle User Home cache entry.
  2. Run the workflow with gradle-home-cache-excludes: caches/modules-2/files-2.1 every N days. This will delete any downloaded dependencies before saving the Gradle User Home state. This won't impact the current build, but the next build will need to re-download dependencies.
  3. Add a step to the workflow that runs every N days and purges the ~/.gradle/caches/modules-2/files-2.1 directory after the Gradle User Home has been restored but before Gradle is executed. This is a cross between 1 & 2: only the downloaded dependencies are removed, but they will need to be re-downloaded in the current workflow execution.
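
Option 3 could look roughly like the following Python step. This is a sketch under assumptions: the action is used only to set up Gradle (so there is a separate step between cache restore and the Gradle invocation), Gradle User Home comes from `GRADLE_USER_HOME` or defaults to `~/.gradle`, and the every-N-days check (`is_purge_day`, counted from the Unix epoch) is a hypothetical helper.

```python
import os
import shutil
from datetime import date
from pathlib import Path


def gradle_user_home() -> Path:
    """Resolve Gradle User Home from GRADLE_USER_HOME, falling back to ~/.gradle."""
    return Path(os.environ.get("GRADLE_USER_HOME", str(Path.home() / ".gradle")))


def is_purge_day(today: date, n_days: int) -> bool:
    """True once every n_days days, counted from 1970-01-01."""
    return (today - date(1970, 1, 1)).days % n_days == 0


def purge_downloaded_dependencies(home: Path) -> bool:
    """Delete caches/modules-2/files-2.1, leaving the rest of Gradle User Home alone.

    Returns True if the directory existed and was removed.
    """
    files_dir = home / "caches" / "modules-2" / "files-2.1"
    if files_dir.is_dir():
        shutil.rmtree(files_dir)
        return True
    return False


if __name__ == "__main__":
    if is_purge_day(date.today(), 7):
        purge_downloaded_dependencies(gradle_user_home())
```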

yogurtearl commented 10 months ago

Presumably, all 3 of those options would need to happen on a build from the default branch (e.g. main)?

bigdaz commented 10 months ago

> Presumably, all 3 of those options would need to happen on a build from the default branch (e.g. main)?

Presumably this would need to happen in every Job that can produce a cache entry, on each branch that the Job can run on. Cache entries written for a branch take precedence over cache entries written for main, but these entries are scoped to be visible only to the branch that wrote them. By understanding the way that cache entries are matched (see here) you may be able to model a system that works for you.
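
The precedence rules can be modelled with a simplified lookup. This is a sketch of GitHub Actions cache matching as generally documented (exact key first, then the most recent entry whose key starts with the key or one of the restore keys, searching the current branch before the default branch), not gradle-build-action's own code; `CacheEntry`, `restore` and the key names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    key: str
    branch: str   # branch whose workflow run saved the entry
    created: int  # creation time; larger means newer


def restore(entries: list[CacheEntry], key: str, restore_keys: list[str],
            branch: str, default_branch: str = "main") -> Optional[CacheEntry]:
    """Return the entry a job on `branch` would restore, or None on a miss."""
    # A branch only sees its own entries plus those of the default branch,
    # and its own entries are searched first.
    scopes = [branch] if branch == default_branch else [branch, default_branch]
    for scope in scopes:
        visible = [e for e in entries if e.branch == scope]
        exact = [e for e in visible if e.key == key]
        if exact:
            return max(exact, key=lambda e: e.created)
        for prefix in [key, *restore_keys]:
            matches = [e for e in visible if e.key.startswith(prefix)]
            if matches:
                # Among prefix matches, the most recently created entry wins.
                return max(matches, key=lambda e: e.created)
    return None


entries = [
    CacheEntry("gradle-home-v1-aaa", "main", created=1),
    CacheEntry("gradle-home-v1-bbb", "feature", created=2),
    CacheEntry("gradle-home-v1-ccc", "main", created=3),
]
# The feature branch prefers its own entry, even though main has a newer one.
print(restore(entries, "gradle-home-v1-new", ["gradle-home-v1-"], "feature").key)  # gradle-home-v1-bbb
```

In this model, expiring entries for every Job on every branch is what guarantees a fresh download, which is why the per-branch approach is hard to get right.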

I can see that this isn't trivial. If instead you want to brute-force the expiry of all cache entries every N days, you have a few options:

  1. Delete all cache entries for the repository every N days (using the GitHub API or GitHub CLI)
  2. As per 1), but restrict cache entry deletion to entries named dependencies-*. These are the entries that Gradle saves from caches/modules-2/files-2.1
  3. Use the undocumented GRADLE_BUILD_ACTION_CACHE_KEY_PREFIX environment variable to provide a new cache entry prefix every N days. This will effectively expire all gradle-build-action cache entries that use an older prefix.
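
For option 3, a step that runs before gradle-build-action could compute a value that changes every N days and export it through the `GITHUB_ENV` file, which GitHub Actions uses to pass environment variables to later steps in the same job. A sketch only: `GRADLE_BUILD_ACTION_CACHE_KEY_PREFIX` is undocumented, so how the action combines it with the rest of the key may change; `rotating_prefix`, `export_env` and the `v` + bucket format are hypothetical.

```python
import os
from datetime import date
from typing import Optional


def rotating_prefix(today: date, n_days: int, base: str = "v") -> str:
    """Return a prefix that changes once every n_days days (days since 1970-01-01)."""
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    return f"{base}{(today - date(1970, 1, 1)).days // n_days}"


def export_env(name: str, value: str, env_file: Optional[str] = None) -> None:
    """Append NAME=value to the GITHUB_ENV file so later steps see the variable."""
    path = env_file or os.environ["GITHUB_ENV"]
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")


if __name__ == "__main__":
    export_env("GRADLE_BUILD_ACTION_CACHE_KEY_PREFIX", rotating_prefix(date.today(), 7))
```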

bigdaz commented 9 months ago

To avoid complicating the action, we don't plan to add support for periodically clearing the cache. Instead, we suggest that the existing GitHub APIs are used to remove cache entries periodically.
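
A minimal sketch of that approach using only the Python standard library and the GitHub REST API endpoints for listing (`GET /repos/{owner}/{repo}/actions/caches`) and deleting (`DELETE /repos/{owner}/{repo}/actions/caches/{cache_id}`) Actions cache entries. It assumes a scheduled job that passes a token as `GITHUB_TOKEN` with `actions: write` permission, and it uses the `dependencies-*` pattern mentioned earlier in the thread; check the keys actually listed for your repository and adjust the pattern, since key naming may differ between action versions. The helper names are hypothetical, and `dry_run` defaults to True so nothing is deleted unless asked.

```python
import fnmatch
import json
import os
import urllib.request

API = "https://api.github.com"


def _call(url: str, token: str, method: str = "GET"):
    """Send an authenticated GitHub REST API request and decode any JSON body."""
    req = urllib.request.Request(url, method=method, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    })
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None  # DELETE returns 204 No Content


def list_caches(owner: str, repo: str, token: str) -> list[dict]:
    """Collect every Actions cache entry for the repository, 100 per page."""
    caches, page = [], 1
    while True:
        data = _call(f"{API}/repos/{owner}/{repo}/actions/caches?per_page=100&page={page}", token)
        batch = data["actions_caches"]
        caches.extend(batch)
        if len(batch) < 100:
            return caches
        page += 1


def select_entries(caches: list[dict], pattern: str = "dependencies-*") -> list[dict]:
    """Keep only entries whose key matches the glob pattern."""
    return [c for c in caches if fnmatch.fnmatchcase(c["key"], pattern)]


def delete_matching(owner: str, repo: str, token: str,
                    pattern: str = "dependencies-*", dry_run: bool = True) -> list[dict]:
    """Delete (or, with dry_run, only report) entries whose key matches pattern."""
    # List everything first so deletions don't shift the pages being read.
    targets = select_entries(list_caches(owner, repo, token), pattern)
    if not dry_run:
        for c in targets:
            _call(f"{API}/repos/{owner}/{repo}/actions/caches/{c['id']}", token, method="DELETE")
    return targets


if __name__ == "__main__":
    owner, repo = os.environ["GITHUB_REPOSITORY"].split("/")
    for entry in delete_matching(owner, repo, os.environ["GITHUB_TOKEN"], dry_run=False):
        print("deleted", entry["key"])
```

Running this from a workflow with an `on: schedule` trigger every N days gives the periodic expiry discussed above without any change to the action.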