ruffsl opened 6 months ago
For my use case, I am using the digest of the last layer in an image to deterministically derive a cache key for cache storage in a CI workflow. Given that the image ID and digest are non-deterministic due to timestamps in the image config or OCI labels, I rely on the digest of the final layer inside the image to determine whether a CI cache can be safely restored into the exact same image environment that was used to spawn the container that generated the initial CI cache artifacts. Although quite conservative, this helps capture any entropy in the CI build environment and avoids contaminating the CI cache with incompatible upstream changes.
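As a minimal sketch of the idea, assuming a raw OCI manifest like the one returned by `docker buildx imagetools inspect --raw` (the digests and key prefix below are placeholders), the cache key can be derived with `jq` and parameter expansion:

```shell
# Placeholder manifest; in CI this would come from `docker buildx imagetools inspect --raw`.
manifest='{"layers":[{"digest":"sha256:aaa111"},{"digest":"sha256:bbb222"}]}'

# Take the digest of the final layer as a proxy for the full image environment.
layer_digest=$(jq -r '.layers[-1].digest' <<< "$manifest")

# Strip the algorithm prefix and embed the digest in the cache key.
cache_key="overlay-v1-${layer_digest#sha256:}"
echo "$cache_key"
```

Because every layer digest transitively depends on the layers beneath it, keying on only the final layer is sufficient to detect a change anywhere in the image's filesystem history.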
For example, here is a composite action and example workflow that demonstrate this use case:
```yaml
name: "Get Layer Metadata"
description: "GitHub Action to get layer metadata from Docker Buildx Bake output result"
branding:
  icon: 'layers'
  color: 'blue'
inputs:
  metadata:
    description: 'Build result metadata'
    required: true
  load:
    description: "Load is a shorthand to use local registry"
    required: false
    default: 'false'
outputs:
  metadata:
    description: 'Layer result metadata'
    value: ${{ steps.iterate_metadata.outputs.metadata }}
runs:
  using: "composite"
  steps:
    - name: Iterate Metadata
      id: iterate_metadata
      env:
        METADATA_INPUT: ${{ inputs.metadata }}
        LOAD: ${{ inputs.load }}
      shell: bash
      run: |
        set -eo pipefail
        metadata_output=$METADATA_INPUT
        for target in $(jq -r 'keys[]' <<< "$METADATA_INPUT"); do
          data=$(jq -r ".${target}" <<< "$METADATA_INPUT")
          if [[ $LOAD == 'true' ]]; then
            image_digest=$(jq -r '."containerimage.config.digest"' <<< "$data")
            layer_digest=$(docker inspect "$image_digest" | jq -r '.[0].RootFS.Layers[-1]')
          else
            image_digest=$(jq -r '."containerimage.digest"' <<< "$data")
            image_name=$(jq -r '."image.name"' <<< "$data")
            layer_digest=$(docker buildx imagetools inspect --raw "$image_name@$image_digest" | jq -r '.layers[-1].digest')
          fi
          metadata_output=$(jq ".${target}.\"layer.digest\" = \"$layer_digest\"" <<< "$metadata_output")
        done
        {
          echo "metadata<<EOF"
          echo "$metadata_output"
          echo "EOF"
        } >> "$GITHUB_OUTPUT"
```
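The registry and daemon inspection calls require a live docker environment, but the jq bookkeeping around them can be exercised on its own. A sketch with a stubbed layer digest and hypothetical metadata (the target name, image name, and digests are made up):

```shell
# Hypothetical bake metadata for one target, shaped like bake-action's `metadata` output.
metadata='{"tooler":{"containerimage.digest":"sha256:abc","image.name":"ghcr.io/acme/repo:main-tooler"}}'

# Stub for the digest that `docker inspect` or `imagetools inspect` would return.
layer_digest="sha256:def"

# Merge the layer digest back into the per-target metadata.
metadata=$(jq --arg d "$layer_digest" '.tooler."layer.digest" = $d' <<< "$metadata")

jq -r '.tooler."layer.digest"' <<< "$metadata"
```

Passing the digest via `--arg`, rather than interpolating it into the filter string as the action above does, avoids quoting pitfalls if a value ever contains characters meaningful to jq.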
The action above is used to get the digest of the last layer in each of the targets generated by a buildkit bake action. The last layer digest of the `tooler` target is used here to derive a cache key for the `overlay-ws` cache storage. When the workflow call inputs `push` and `load` are set to `true` and `false` respectively, the workflow often completes much faster, avoiding the need to download layers when possible as well as the need to export the image to the local docker daemon. But when pushing is not possible or not intended, the workflow is always much slower, even when no layer cache miss occurs.
```yaml
name: Build and Test
on:
  workflow_call:
    inputs:
      push:
        required: false
        type: string
        default: 'false'
        description: Push resulting images to registry
      load:
        required: false
        type: string
        default: 'true'
        description: Load resulting images to local docker daemon
jobs:
  build_and_test:
    name: Build and Test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout default ref
        id: checkout_default_ref
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Docker tooler meta
        id: docker_meta_tooler
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          flavor: suffix=-tooler
          bake-target: tooler
          tags: type=raw,value=${{ github.head_ref || github.ref_name }}
      - name: Build tooler stage
        id: docker_bake_tooler
        uses: docker/bake-action@v4
        with:
          pull: true
          push: ${{ inputs.push }}
          load: ${{ inputs.load }}
          provenance: false
          no-cache: false
          targets: |
            tooler
          set: |
            *.cache-from=type=registry,ref=ghcr.io/${{ github.repository }}:${{ github.head_ref }}-tooler
            *.cache-from=type=registry,ref=ghcr.io/${{ github.repository }}:${{ github.base_ref }}-tooler
            *.cache-from=type=registry,ref=ghcr.io/${{ github.repository }}:${{ github.event.repository.default_branch }}-tooler
          files: |
            ./docker-bake.hcl
            ${{ steps.docker_meta_tooler.outputs.bake-file }}
      - name: Get Layer Metadata
        id: get_layer_metadata
        uses: ./.github/actions/get-layer-metadata@main
        with:
          metadata: ${{ steps.docker_bake_tooler.outputs.metadata }}
          load: ${{ inputs.load }}
      - name: Layer metadata
        id: layer_metadata
        run: |
          set -eo pipefail
          tooler_digest="${{ fromJSON(steps.get_layer_metadata.outputs.metadata)['tooler']['layer.digest'] }}"
          echo "tooler_digest=${tooler_digest#sha256:}" >> "$GITHUB_OUTPUT"
      - name: Cache overlay
        id: cache-overlay
        uses: actions/cache@v4
        with:
          save-always: true
          path: overlay-ws
          key: overlay-v1-${{ github.ref }}-${{ steps.layer_metadata.outputs.tooler_digest }}-${{ github.run_id }}
          restore-keys: |
            overlay-v1-${{ github.ref }}-${{ steps.layer_metadata.outputs.tooler_digest }}
            overlay-v1-refs/heads/${{ github.head_ref }}-${{ steps.layer_metadata.outputs.tooler_digest }}
            overlay-v1-refs/heads/${{ github.base_ref }}-${{ steps.layer_metadata.outputs.tooler_digest }}
      - name: Inject overlay
        uses: reproducible-containers/buildkit-cache-dance@v3.1.0
        with:
          cache-map: |
            {
              "overlay-ws": {
                "id": "tooler",
                "sharing": "private",
                "target": "/opt/overlay_ws"
              }
            }
          skip-extraction: false
      - name: Docker exporter meta
        id: docker_meta_exporter
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          flavor: suffix=-exporter
          bake-target: exporter
          tags: type=raw,value=${{ github.head_ref || github.ref_name }}
      - name: Build exporter stage
        id: docker_bake_exporter
        uses: docker/bake-action@v4
        with:
          pull: true
          push: ${{ inputs.push }}
          load: ${{ inputs.load }}
          provenance: false
          no-cache: false
          targets: exporter
          files: |
            ./docker-bake.hcl
            ${{ steps.docker_meta_exporter.outputs.bake-file }}
```
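The `Layer metadata` step in the workflow uses a GitHub expression (`fromJSON(...)['tooler']['layer.digest']`) to index into the action output; the equivalent extraction in plain bash with `jq` (the metadata JSON below is a placeholder) looks like:

```shell
# Placeholder for steps.get_layer_metadata.outputs.metadata.
metadata='{"tooler":{"layer.digest":"sha256:0123abcd"}}'

# Index into the target and strip the algorithm prefix, as the workflow step does.
tooler_digest=$(jq -r '.tooler."layer.digest"' <<< "$metadata")
echo "tooler_digest=${tooler_digest#sha256:}"
```

The `sha256:` prefix is stripped only to keep the cache key compact; either form would serve equally well as a key component.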
I'd like to be able to inspect the manifest of images built by a buildkit builder without needing to export the image to an external registry or load it into a local docker daemon, given that the former implies intent to distribute and requires write access to push the image, while the latter always incurs the overhead of both downloading remote cached layers and loading the assembled image into the local docker daemon.
If an image can be constructed entirely from cached layers available from a registry, buildkit can lazily build a target without downloading any image layers, saving time as well as bandwidth and storage resources. If pushing to the same registry, this is also quite efficient, resulting in only a few kilobytes of image manifest JSON crossing the wire. The pushed manifest can then quickly be inspected by querying the registry using the image digest provided in the metadata from the buildkit build process and client CLI (e.g. `docker buildx imagetools`).

However, if write access to the same registry is unavailable, or if redistribution of the built image is not intended, then (AFAIK) the only alternative for accessing the resulting manifest is to load the image into the docker daemon hosting the buildkit builder container in order to use the client CLI (e.g. `docker image inspect`). This necessitates that all prerequisite layers are always downloaded so that the image can be exported to the local docker engine. That can of course result in gigabytes of data transfer, all for the purpose of inspecting a few kilobytes of JSON.

Is there a better way of inspecting the manifests of images inside a buildkit builder? Perhaps this is related to: