Process multiple targets in single action call and support S3 backend

strophy commented 8 months ago

Hi @AkihiroSuda thank you for picking up maintenance of this important action!

We have added two features on a fork over at https://github.com/dcginfra/buildkit-cache-dance and I wonder if you would be interested in PRs to add these features to v2 of the action, now that its use is recommended in the official Docker documentation. We have two main changes:

- Process multiple cache mounts in a single pass by specifying an ID for each mount
- Support AWS S3 as an alternative cache storage backend

The changes require the user's Dockerfile to be modified with cache IDs like this:

FROM ubuntu:22.04
RUN \
  --mount=type=cache,target=/var/cache/apt,sharing=locked,id=apt-cache \
  --mount=type=cache,target=/var/lib/apt,sharing=locked,id=apt-lib \
  apt-get update && apt-get install -y gcc

And the action is called something like this:

- name: inject cache mounts into docker
  uses: reproducible-containers/buildkit-cache-dance@mount-id-example
  with:
    mounts: |
      apt-cache
      apt-lib

The main change is in the Dancefile, which is generated on the fly with as many mounts and copy operations as necessary. There is no need to pass the cache-source and cache-target separately anymore because the cache is identified by its unique ID instead, like this:

- name: Prepare list of cache mounts for Dancefile
  uses: actions/github-script@v6
  id: mounts
  with:
    script: |
      // Split the "mounts" input on newlines or commas and drop empty entries
      const mountIds = `${{ inputs.mounts }}`.split(/[\r\n,]+/)
        .map((mount) => mount.trim())
        .filter((mount) => mount.length > 0);

      // One --mount flag per cache ID, each mounted under /cache-mounts/<id>
      const cacheMountArgs = mountIds.map((mount) => (
        `--mount=type=cache,sharing=shared,id=${mount},target=/cache-mounts/${mount}`
      )).join(' ');

      // One "aws s3 sync" per cache ID, copying from the bucket into the mount
      const s3commands = mountIds.map((mount) => (
        `aws s3 sync --no-follow-symlinks --quiet s3://${{inputs.bucket}}/cache-mounts/${mount} /cache-mounts/${mount}`
      )).join('\n');

      // Expose both strings to the following step
      core.setOutput('cacheMountArgs', cacheMountArgs);
      core.setOutput('s3commands', s3commands);

- name: Inject cache data into buildx context
  shell: bash
  run: |
    docker build ${{ inputs.cache-source }} --file - <<EOF
    FROM amazon/aws-cli:2.13.17
    COPY buildstamp buildstamp
    RUN ${{ steps.mounts.outputs.cacheMountArgs }} <<EOT
        echo -e '${{ steps.mounts.outputs.s3commands }}' | sh && \
        chmod 777 -R /cache-mounts || true
    EOT
    EOF

The code is currently still written in JS and is quite tightly bound to S3 (since that is what we need), but I'd love to see features like this supported in the maintained version of the action, since there has been a lot of discussion about this (as I'm sure you're aware). Thoughts?

AkihiroSuda commented 8 months ago

Thanks for the proposal, SGTM

1. How will “mounts” work with actions/cache?
2. Do we really need to execute the awscli inside the Dockerfile?
3. Composite actions such as github-script probably cannot be used: https://github.com/reproducible-containers/buildkit-cache-dance/pull/4

strophy commented 8 months ago

1. I'm not sure about this, but I think we can call the GH cache API directly? The action would therefore require two inputs:
   - list of mount IDs
   - (optional) cache backend (defaults to the GHA cache; if using S3, a bucket name is also needed)
2. Executing the cache call directly inside the Dockerfile results in a significant speedup with large caches by removing one of the copy operations, and it uses less drive space because there is no need to store the cache in an intermediate step. The copy path cache mount -> runner local storage -> external cache becomes cache mount -> external cache directly.
3. Yes, this would need to be rewritten in bash.
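The two inputs described above could be validated roughly as follows. This is only a sketch: the input names `mounts`, `backend` and `bucket`, the backend values `gha` and `s3`, and the function `resolveCacheConfig` are hypothetical and do not reflect the action's actual interface.

```javascript
// Hypothetical input handling; names and values are assumptions, not the action's API.
function resolveCacheConfig({ mounts = '', backend = 'gha', bucket = '' }) {
  // Accept newline- or comma-separated mount IDs.
  const mountIds = mounts
    .split(/[\r\n,]+/)
    .map((id) => id.trim())
    .filter((id) => id.length > 0);

  if (mountIds.length === 0) {
    throw new Error('at least one mount id is required');
  }
  if (backend !== 'gha' && backend !== 's3') {
    throw new Error(`unsupported backend: ${backend}`);
  }
  // The bucket is only required when the S3 backend is selected.
  if (backend === 's3' && bucket.trim() === '') {
    throw new Error('bucket is required when backend is s3');
  }

  return {
    mountIds,
    backend,
    bucket: backend === 's3' ? bucket.trim() : null,
  };
}

// Default backend: only the mount IDs are needed.
console.log(resolveCacheConfig({ mounts: 'apt-cache\napt-lib' }));
// S3 backend: "example-bucket" is a placeholder.
console.log(resolveCacheConfig({ mounts: 'apt-cache', backend: 's3', bucket: 'example-bucket' }));
```

Failing early on a missing bucket keeps the error out of the Docker build itself, where it would be harder to spot.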

We could probably even go a step further for point 1 and implement Apache OpenDAL as the backend, immediately adding support for a wide range of cloud storage. See https://github.com/everpcpc/actions-cache for an existing implementation of this.

AkihiroSuda commented 8 months ago

OpenDAL

What about rclone? https://github.com/rclone/rclone

strophy commented 8 months ago

rclone looks perfect!
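As a rough sketch of what swapping `aws s3 sync` for rclone might look like, the per-mount commands could be generated like this. The thread does not settle on a design, so everything here is an assumption: the remote name `cache-remote`, the path `example-bucket/cache-mounts`, and both function names are placeholders. Only the basic `rclone sync SOURCE DEST` form is used.

```javascript
// Hypothetical helpers; the remote name and remote path are placeholders
// for an rclone remote that the user would have configured beforehand.
function rcloneRestoreCommands(mountIds, remote, remotePath) {
  // External cache -> cache mount (run before the build).
  return mountIds.map(
    (id) => `rclone sync ${remote}:${remotePath}/${id} /cache-mounts/${id}`
  );
}

function rcloneSaveCommands(mountIds, remote, remotePath) {
  // Cache mount -> external cache (run after the build).
  return mountIds.map(
    (id) => `rclone sync /cache-mounts/${id} ${remote}:${remotePath}/${id}`
  );
}

const ids = ['apt-cache', 'apt-lib'];
console.log(rcloneRestoreCommands(ids, 'cache-remote', 'example-bucket/cache-mounts').join('\n'));
console.log(rcloneSaveCommands(ids, 'cache-remote', 'example-bucket/cache-mounts').join('\n'));
```

Because the storage provider is selected by the rclone remote's configuration, the generated commands stay the same whichever backend is used.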

reproducible-containers / buildkit-cache-dance

Process multiple targets in single action call and support S3 backend #10