kevincobain2000 / cache-http

Temporary alternative to actions/cache for getting a dependency cache on GHES with self-hosted runners
https://medium.com/web-developer/github-actions-solving-actions-cache-v2-for-self-hosted-runners-on-github-enterprise-663f22caeee3
MIT License

No guards against cache corruption via multiple clients uploading the same file at the same time. #11

Open: markstos opened this issue 3 weeks ago

markstos commented 3 weeks ago

From looking at the code, nothing seems to prevent two jobs from uploading the same file in parallel, which could result in a corrupted file.

We have recently seen a couple of cases of file corruption in the cache managed by this server. We have not pinpointed the cause and don't have an exact reproduction, but nothing in the code appears to guard against this.
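One possible guard is to let the first uploader of a key atomically reserve it, so a second concurrent uploader backs off instead of writing into the same file. The Python sketch below assumes a hypothetical layout with one file per key under a cache directory, keys that are already safe file names, and a local file system where `O_CREAT | O_EXCL` is atomic. `try_upload` and the `.lock` suffix are illustrative and do not come from cache-http's code.

```python
import os
import tempfile
import threading


def try_upload(cache_dir: str, key: str, data: bytes) -> bool:
    """Store `data` under `key` unless another uploader already claimed it.

    Hypothetical helper: assumes `key` is already a safe file name and that
    cache entries are immutable once written.
    """
    lock_path = os.path.join(cache_dir, key + ".lock")
    final_path = os.path.join(cache_dir, key)
    try:
        # O_CREAT | O_EXCL: the "create only if absent" check is atomic in the
        # kernel, so exactly one concurrent caller gets past this point.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another job is uploading (or already uploaded) this key
    os.close(fd)
    try:
        with open(final_path, "wb") as f:
            f.write(data)
    except BaseException:
        # Undo the reservation and any partial file so a later job can retry.
        if os.path.exists(final_path):
            os.remove(final_path)
        os.remove(lock_path)
        raise
    return True  # the lock file is kept, so the key stays claimed


if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    key = "node_modules-abc123"
    payloads = [bytes([i]) * 1_000_000 for i in range(8)]
    results = [False] * 8
    barrier = threading.Barrier(8)

    def worker(i: int) -> None:
        barrier.wait()  # release all eight uploaders at the same moment
        results[i] = try_upload(cache_dir, key, payloads[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    winner = results.index(True)
    with open(os.path.join(cache_dir, key), "rb") as f:
        intact = f.read() == payloads[winner]
    print(results.count(True), intact)  # prints: 1 True
```

This design keeps the first completed upload and turns later uploads of the same key into no-ops, on the reasoning that a given key should always describe the same content. It does not stop a job from reading an entry while it is still being written. Writing to a temporary file and renaming it into place is the usual complement for that.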

kevincobain2000 commented 3 weeks ago

We have experienced that too.

markstos commented 3 weeks ago

The solution we are evaluating is the "Actions Cache Server" as an alternative. It is a drop-in replacement for the GitHub-hosted caching service:

https://gha-cache-server.falcondev.io

Another alternative is one that accesses the file system directly: https://github.com/corca-ai/local-cache

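Because it uses mv, there should be no race condition with it, as long as the cache is on the same file system where the code tree is hosted. Within one file system, mv performs an atomic rename. Across file systems, it falls back to copying and then deleting, which is not atomic.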
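The following Python sketch shows the write-then-rename pattern that makes mv safe. It assumes a POSIX system and uses a hypothetical `publish` helper that is not code from either project. The upload goes to a temporary file created in the destination's own directory, is flushed to disk, and is then moved into place with `os.replace`. Creating the temporary file next to the destination keeps the rename on a single file system. Across file systems, `os.replace` fails with `EXDEV` rather than silently copying.

```python
import os
import tempfile


def publish(dest_path: str, data: bytes) -> None:
    """Atomically install `data` at `dest_path` (hypothetical helper).

    The temp file lives in the destination's own directory, so os.replace()
    is a same-file-system rename. POSIX guarantees that readers then see
    either the old file or the new one, never a partial write.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the bytes durable before they become visible
        os.replace(tmp_path, dest_path)  # atomic swap of the directory entry
    except BaseException:
        # On failure, discard the half-written temp file; dest_path is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    target = os.path.join(cache_dir, "node_modules-abc123.tgz")
    publish(target, b"first upload")
    publish(target, b"second upload")  # a later upload replaces the file whole
    with open(target, "rb") as f:
        print(f.read())  # prints: b'second upload'
    # Only the final file remains; no temp files are left in the cache directory.
    print(sorted(os.listdir(cache_dir)))  # prints: ['node_modules-abc123.tgz']
```

If two jobs publish the same key at once, each writes its own temporary file and the last rename wins. The entry may be overwritten, but it is never a mix of two uploads.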

kevincobain2000 commented 3 weeks ago

Thanks for the alternatives. My solution was a quick fix for the problem a long time ago, and I have no time to maintain it. Internally we should move to an alternative too; the first one looks neat.

kevincobain2000 commented 3 weeks ago

Another reason we stopped using the cache is that we started collecting all our KPIs with https://coveritup.app/ and wanted to track the real numbers.

Something you might be interested in.

kevincobain2000 commented 3 weeks ago

Just a note: actions/cache now works normally on GHES, so no extra workarounds are needed.

markstos commented 2 weeks ago

Where is the cache located? On the self-hosted runner, or on GitHub's servers?

kevincobain2000 commented 2 weeks ago

Investigated by @gizumon

The "Not working actions/cache action on our GHE" issue has been solved in the current GHES version: https://github.com/actions/cache (an official action provided by GitHub)

The speed doesn't change much compared with the kevincobain2000/action-cache-http action we are using as a workaround (only a 3 to 5 second difference), but we get the following benefits:

- No need to host or maintain a self-hosted cache API server
- Caches can be managed in the GHE UI
- Fewer required parameters, and the action is officially supported

@markstos Yes, the cache is self-hosted. I believe it started working after a previous GHES update.

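      # Derive the cache key from the lockfile so any dependency change yields a new key
      - name: Make cache key
        run: echo "HASH_KEY=redacted-${{ hashFiles('**/yarn.lock') }}" >> "$GITHUB_ENV"

      # Restore only; outputs.cache-hit is 'true' only on an exact key match
      - name: Restore Cached node_modules
        if: ${{ inputs.action == 'deploy' }}
        id: cache-restore
        uses: actions/cache/restore@v4
        with:
          path: node_modules
          key: ${{ env.HASH_KEY }}

      # Install dependencies only when there was no exact cache hit
      - name: Node dependencies install
        if: ${{ inputs.action == 'deploy' && steps.cache-restore.outputs.cache-hit != 'true' }}
        run: npm install -g yarn && yarn install --prod --frozen-lockfile

      # Save right away with the save-only sub-action; the combined actions/cache@v4
      # would attempt another restore here and defer the save to the post-job step
      - name: Store Cache node_modules
        if: ${{ inputs.action == 'deploy' && steps.cache-restore.outputs.cache-hit != 'true' }}
        uses: actions/cache/save@v4
        with:
          path: node_modules
          key: ${{ env.HASH_KEY }}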