leanprover / lean-action

GitHub action for standard CI in Lean projects
Apache License 2.0
16 stars 3 forks source link

Problem about caching strategy for Lean #64

Closed SnO2WMaN closed 4 months ago

SnO2WMaN commented 5 months ago

Note: This is based on my assumptions and might not be entirely accurate.

This issue occurs when we update our project from v4.9.0-rc1 to v4.9.0-rc2. https://github.com/iehality/lean4-logic/pull/85 And it seems to be happens same problem branch in the nightly branch of the import-graph.

In our project, we set the GitHub Action cache strategy as follows

path: .lake
key: lake-${{ runner.os }}-${{ hashFiles('lake-manifest.json') }}-${{ github.ref }}
restore-keys: |
    lake-${{ runner.os }}-${{ hashFiles('lake-manifest.json') }}-${{ github.ref }}
    lake-${{ runner.os }}-${{ hashFiles('lake-manifest.json') }}
    lake-${{ runner.os }}

The problem we encountered was with the restore-keys we specified. The issue is that when updating the Lean version itself, the previous version's cache also becomes a fallback candidate. In our project, it seems we referenced the v4.9.0-rc1 cache, which caused the build to fail.

Solution is remove fallbacks of restore-keys to lake-${{ runner.os }}-${{ hashFiles('lake-manifest.json') }}-${{ github.ref }} only.

Currently, the cache strategy of this action is similar (i.e., it doesn't avoid using the cache when the Lean version itself is updated).

https://github.com/leanprover/lean-action/blob/52906d40616a3ed5a566f5f01db39858c0f6eee2/action.yml#L93-L99

This seems to be why the nightly branch of the import-graph is broken. As a solution, specifying ${{ hashFiles('lean-toolchain') }} as the key should work.

SnO2WMaN commented 5 months ago

Maybe the cache key should be like this, specifying lean-toolchain and lake-manifest.json

key: lake-${{ runner.os }}-${{ hashFiles('lean-toolchain') }}-${{ hashFiles('lake-manifest.json') }}-${{ github.ref }}
restore-keys: lake-${{ runner.os }}-${{ hashFiles('lean-toolchain') }}-${{ hashFiles('lake-manifest.json') }}-${{ github.ref }}
digama0 commented 5 months ago

duplicate of #62

austinletson commented 5 months ago

Even though the failure is the same as #62, I think the suggestion to improve the key for the caching is still valid.

Maybe the cache key should be like this, specifying lean-toolchain and lake-manifest.json

key: lake-${{ runner.os }}-${{ hashFiles('lean-toolchain') }}-${{ hashFiles('lake-manifest.json') }}-${{ github.ref }}
restore-keys: lake-${{ runner.os }}-${{ hashFiles('lean-toolchain') }}-${{ hashFiles('lake-manifest.json') }}-${{ github.ref }}

I think this is an improvement over the existing key input for the cache

However, it seems to me that the restore-keys need to be a prefix match of whatever existing cache key is hit. If you have the same value for the key and restore-key input it seems redundant, although I could be missing something here.

For the restore-key input what about:

restore-keys: lake-${{ runner.os }}-${{ hashFiles('lean-toolchain') }}-${{ hashFiles('lake-manifest.json') }}

With this lean-action can restore caches where lean-toolchain and lake-manifest.json remain unchanged.

@SnO2WMaN what do you think?

Also for reference, the initial discussion of what key to use for the hash was here.

SnO2WMaN commented 5 months ago

Thanks for your advice.

Once considered, we added ${{ github.ref }} to prevent build errors caused by cache from different Lean versions, but now we have switch to use lean-toolchain. It might no longer necessary so we might be able to remove ${{ github.ref }} from both the key and restore-key.

austinletson commented 4 months ago

Thank you @SnO2WMaN for raising this issue and coming up with suggestions for improved caching keys! #71 implements these changes.