reproducible-containers / buildkit-cache-dance

Save `RUN --mount=type=cache` caches on GitHub Actions ( Forked from https://github.com/overmindtech/buildkit-cache-dance )
Apache License 2.0
143 stars 29 forks source link

Allow setting id of mount #12

Closed alex-statsig closed 5 months ago

alex-statsig commented 11 months ago

When using docker --mount=type=cache, there is an optional argument of id. This can be used to cache the same directory under a different volume to prevent collisions. I'm not sure that it makes a ton of sense to use in this context, but when I was following a guide they provided an id for the mount. However, this action does not accept an id. This was confusing because it led to this action not working (as its injection/extraction steps dont set the id, so it wont match during the real run). I'm not sure how important it is to add, but could be nice for consistency and preventing confusion

adam-vessey commented 9 months ago

Varying the id would be useful to be able to work around things like https://github.com/docker/buildx/issues/549#issuecomment-1788297892 (adding the $TARGETARCH to the id to avoid issues sharing caches between architectures in multi-platform builds); however, unsure how things would get configured.


Though, on second thought, might be handleable w/o the needing to actually vary the id by instead segmenting the mounted directory (as per https://github.com/docker/buildx/issues/549#issuecomment-786897821), assuming --source can interpolate variables (containing the arch), such that the id can be the same?

ruffsl commented 5 months ago

@AkihiroSuda , after the recent refactor, do you this suppose this something that could more easily added? I'd be willing to take a stab at it, but my typescript chops are a bit rusty, Just wanted to first ask if this was on someone else's rode-map already.


My use case is that I'm using this action for persisting bulkit's internal mount cache for CI jobs that build and push to a image registry. Those images in turn are use by developers via cache-from when rebuilding locally. To avoid cache collisions or cross-talk between other local builds and projects that use similar target paths, a unique cache id is delegated to each project. Thus, to maximize build instruction cache preservation/sharing, both CI must utilize the same cache id as developers, rather than simply leaving it unspecified.

AkihiroSuda commented 5 months ago

cc @aminya (https://github.com/reproducible-containers/buildkit-cache-dance/pull/25) , how should the yaml look like for id?

aminya commented 5 months ago

If we want to share the same cache-id for all the paths in a single call, we can add a cache_id argument and use cache-id-<path> as the actual id for each call.

      - name: inject cache into docker
        uses: reproducible-containers/buildkit-cache-dance@v3.0.0
        with:
          cache-map: |
            {
              "var-cache-apt": "/var/cache/apt",
              "var-lib-apt": "/var/lib/apt"
            }
          cache-id: cach1
          skip-extraction: ${{ steps.cache.outputs.cache-hit }}

Potentially #21 can be reworked for this.

ruffsl commented 5 months ago

@aminya , admittedly that would probably work for fine my own use case. And if did end up using multiple cache mounts mounts, I suppose I could go back to what I did before like with reproducible-containers/buildkit-cache-dance@v2 and just invoke the action multiple times for each different cache id as I did for target paths previously.


However, given the motivation of the refactor, it might be worth thinking extending the json so folks could pass in arbitrary fields, even as buildkit itself evolves, or as complexity of the end users cache mounts increases. E.g.

      - name: inject cache into docker
        uses: reproducible-containers/buildkit-cache-dance@v3.0.0
        with:
          cache-map: |
            {
              "var-cache-apt": {"id": "foo", "target": "/var/cache/apt", "readonly": "", "sharing": "", "mode": ""},
              "alternatively": "--mount=type=cache,id=foo,target='/foo space bar/spam',mode=,uid=1000", 
              "var-lib-apt": "/var/lib/apt"
            }
          cache-id: cach1
          skip-extraction: ${{ steps.cache.outputs.cache-hit }}

Admittedly, I don't know of the full ramifications that also tweaking the readonly, from, source, mode, uid, gid options would have for CI or this action, but might as well expose the lot of them if easily subsurfaced. Could also parse the both ways simulations to keep backwards compatibility or support the current abbreviation of json elements.


Some would make less sense to expose such as readonly, but then again, some folks may only ever want to extract vs inject the buildkit cache and never accidentally write to it. This kind of touches on my preference for fine control and separate action steps for extracting vs injection, much like github actions thankfully added with the separate save/restore actions, rather than only one cache action. This make the two step much simpler to use with conditional flow control via constitutive outputs from later results, such as a test failure status or runtime errors.

aminya commented 5 months ago

The beauty of the current JSON approach I took is that it is extensible and type-safe. So we can make the cache map object more complex to support any arbitrary value, and default to the target path if it is only a string.

type TargetPath = string
type CacheOptions = TargetPath | {
  target: TargetPath
  id?: string
  readonly?: boolean
  // etc
} 
type CacheMap = Record<string, CacheOptions> 

What are the known cache options that could be used as the properties? Could you send the documentation on these properties?

Alternatively, instead of making the options type-safe, we can accept arbitrary properties, and expect the user to use valid props.

aminya commented 5 months ago

I made a pull request (#12) to support any arbitrary options for --mount including id.