gradle / gradle-build-action

Execute your Gradle build and trigger dependency submission
https://github.com/marketplace/actions/gradle-build-action
MIT License
671 stars, 97 forks

Is it possible to share dependencies / `caches/modules-2` between multiple jobs? #1033

Closed: scana closed this issue 8 months ago

scana commented 8 months ago

Hi! Thank you for this GitHub Action, it's been a blast using it so far 😄

I am currently running a single workflow with 4 separate jobs structured this way:

ktLint -> 
       unit-tests
       android-lint
       screenshot-test

It seems that each of those jobs publishes its own dependencies cache (and instrumentation jars):

[Screenshot, 2024-01-11 12:26:06]

Is there a way to let only one of those jobs publish those, while the rest would be able to reuse them? I tried the following:

  ktlint:
    name: KtLint
      ...
      - name: Run ktlint
        uses: gradle/gradle-build-action@v2
        with:
          gradle-home-cache-cleanup: true
          gradle-home-cache-excludes: |
            caches/jars-9
            caches/modules-2
          arguments: ktlintCheck

  unit-tests:
    name: Unit Tests
      ...
      - name: Run tests
        uses: gradle/gradle-build-action@v2
        with:
          gradle-home-cache-cleanup: true
          arguments: test

but then it seems like those cache entries are omitted / not downloaded at all:

Entry: Gradle User Home
    Requested Key : v8-gradle|Linux|checks-ktlint[37a6259cc0c1dae299a7866489dff0bd]-e17cbd2bab4030bc45ad8460cf3d2318bd6bc322
    Restored  Key : v8-gradle|Linux|checks-ktlint[37a6259cc0c1dae299a7866489dff0bd]-6ceb06cc32243f25912bd344b7d0c2b9a1455409
              Size: 293 MB (307078181 B)
              (Entry restored: partial match found)
    Saved     Key : v8-gradle|Linux|checks-ktlint[37a6259cc0c1dae299a7866489dff0bd]-e17cbd2bab4030bc45ad8460cf3d2318bd6bc322
              Size: 293 MB (307121161 B)
              (Entry saved)
---
Entry: /home/runner/.gradle/caches/8.4/generated-gradle-jars/gradle-api-8.4.jar
    Requested Key : generated-gradle-jars-1a83665c481822ee3817bdf1f75747ae
    Restored  Key : generated-gradle-jars-1a83665c481822ee3817bdf1f75747ae
              Size: 35 MB (36850579 B)
              (Entry restored: exact match found)
    Saved     Key : 
              Size: 
              (Entry not saved: contents unchanged)
---
Entry: /home/runner/.gradle/wrapper/dists/gradle-8.4-bin/1w5dpkrfk8irigvoxmyhowfim
    Requested Key : wrapper-zips-f82d94dd741812d9afc7aad8d4ab116c
    Restored  Key : wrapper-zips-f82d94dd741812d9afc7aad8d4ab116c
              Size: 124 MB (130236794 B)
              (Entry restored: exact match found)
    Saved     Key : 
              Size: 
              (Entry not saved: contents unchanged)

bigdaz commented 8 months ago

If these builds resolve exactly the same set of dependencies, then they will share the `dependencies-*` cache entry. (You can likely see this in the way they share the `wrapper-zips-*` entry.) Otherwise, these builds will have distinct cache entries. Cache entries in GitHub are immutable once written, so it's not possible to have a single dependencies entry that is appended to by each build.

There are ways this could potentially be improved, by having the action extract cache entries for common sets of shared dependencies, or separating "stable" sets of dependencies from ones that change more often. I haven't found time to work on this yet.

Presently, the only way to have these jobs share the dependencies entry is for them to share a "top-level" Gradle User Home cache entry. So you could have the ktlint job write a cache entry, and have the other jobs configured with `cache-read-only: true`. They should then reuse the Gradle User Home entry from the ktlint job, which would also mean sharing the dependencies. The downside to this setup is that any dependencies that are not downloaded by the ktlint job will need to be re-downloaded on every run of the other jobs.
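
As a rough sketch of the setup described above, the workflow might look something like the following. The `runs-on` values, the `actions/checkout` steps, and the `needs: ktlint` ordering are assumptions added only to make the fragment complete; the original workflow elides those parts. Only `cache-read-only`, `gradle-home-cache-cleanup`, and `arguments` are inputs mentioned in this thread.

```yaml
jobs:
  ktlint:
    name: KtLint
    runs-on: ubuntu-latest            # assumption: runner not shown in the thread
    steps:
      - uses: actions/checkout@v4     # assumption: checkout step not shown in the thread
      - name: Run ktlint
        uses: gradle/gradle-build-action@v2
        with:
          # This job is the only one that writes the Gradle User Home cache entry.
          gradle-home-cache-cleanup: true
          arguments: ktlintCheck

  unit-tests:
    name: Unit Tests
    needs: ktlint                     # assumption: matches the "ktLint ->" ordering above
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        uses: gradle/gradle-build-action@v2
        with:
          # Restore an existing cache entry but never save a new one.
          cache-read-only: true
          arguments: test
```

The same `cache-read-only: true` setting would presumably be repeated for the android-lint and screenshot-test jobs. Anything those jobs need that the ktlint job never resolved would still be downloaded on each run.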

May I ask what is prompting this question? Are you actually running out of space in your GitHub Actions cache, or just looking at ways to optimize the process?

scana commented 8 months ago

Thank you @bigdaz for such an extensive answer. It's much clearer to me now. Could you please point me to the place where the hash for `dependencies-*` is calculated?

> May I ask what is prompting this question? Are you actually running out of space in your GitHub Actions cache, or just looking at ways to optimize the process?

I just figured that storing multiple dependency sets will stop scaling if I add more jobs to the project. I also noticed that the dependency sets are of different sizes, which gave me the idea that the biggest one might be a superset of all the others.

bigdaz commented 8 months ago

The `dependencies-*` entry is defined here. Since the file names are sufficient to uniquely identify the dependencies they contain, we construct a hash of the file names here.

A couple of things to note: