moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

Inspect image manifest without pushing to registry or load to local docker daemon #4854

Open ruffsl opened 6 months ago

ruffsl commented 6 months ago

I'd like to be able to inspect the manifest of images built by a buildkit builder without needing to export the image to an external registry or load it into a local docker daemon. The former implies intent to distribute and requires write access to push the image, while the latter always incurs the overhead of both downloading remote cached layers and uploading any added layers to the local docker daemon.

If an image can be constructed entirely from cached layers available from a registry, buildkit can lazily build a target without downloading any image layers, saving time as well as bandwidth and storage resources. If pushing to the same registry, this is also quite efficient, resulting in only a few kilobytes of image manifest JSON crossing the wire. The pushed manifest can then quickly be inspected by querying the registry with the client CLI (e.g. docker buildx imagetools), using the image digest reported in the metadata that the buildkit build process returns.

However, if write access to the same registry is unavailable, or if redistribution of the built image is not intended, then (AFAIK) the only alternative for accessing the resulting manifest is to load the image into the docker daemon that hosts the buildkit builder container, and then use the client CLI (e.g. docker image inspect). This requires that all prerequisite layers are always downloaded so that the image can be exported to the local docker engine. That can of course result in gigabytes of data transfer, all for the purpose of inspecting a few kilobytes of JSON.
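
As a minimal sketch of the two routes described above, assuming docker, the buildx plugin, and jq are installed: the function names are hypothetical, and the registry route assumes a single-platform image manifest (no index or attestations), so that the raw JSON has a top-level layers array.

```shell
# Build a digest-pinned reference such as ghcr.io/org/app@sha256:...
# (hypothetical helper, not part of any docker CLI).
image_ref() {
  printf '%s@%s\n' "$1" "$2"
}

# Route 1 (image was pushed): fetch only the manifest JSON from the registry.
# Usage: last_layer_from_registry IMAGE_NAME IMAGE_DIGEST
last_layer_from_registry() {
  docker buildx imagetools inspect --raw "$(image_ref "$1" "$2")" |
    jq -r '.layers[-1].digest'
}

# Route 2 (image was loaded): ask the local daemon.
# RootFS.Layers lists uncompressed diff IDs, so this value is not expected
# to equal the compressed blob digest returned by route 1.
# Usage: last_layer_from_daemon IMAGE_ID_OR_NAME
last_layer_from_daemon() {
  docker image inspect "$1" | jq -r '.[0].RootFS.Layers[-1]'
}
```

Only route 1 avoids pulling layers, but it needs the image to exist in a registry first, which is the limitation this issue is about.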

Is there a better way of inspecting the manifests of images inside a buildkit builder? Perhaps this is related to:

ruffsl commented 6 months ago

For my use case, I am using the digest of the last layer in an image to deterministically derive a cache key for cache storage in a CI workflow. Given that the image ID and digest are non-deterministic due to timestamps in the image config or OCI labels, I rely on the digest of the final layer in the image to determine whether a CI cache can be safely restored into the exact same image environment that was used to spawn the container which generated the initial CI cache artifacts. Although quite conservative, this does help to capture any entropy in the CI build environment and avoids contaminating the CI cache with incompatible upstream changes.
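
A rough sketch of that derivation follows. The helper name and the ci-cache prefix are hypothetical and chosen only for illustration.

```shell
# layer_cache_key PREFIX LAYER_DIGEST
# Hypothetical helper: turns a layer digest into a cache key by dropping the
# "sha256:" algorithm prefix, so the key changes only when the layer changes.
layer_cache_key() {
  printf '%s-%s\n' "$1" "${2#sha256:}"
}

# Two builds whose final layer is identical map to the same key, even if
# their image IDs differ because of timestamps or labels.
layer_cache_key ci-cache sha256:0123abcd
layer_cache_key ci-cache sha256:0123abcd
```

Any change that alters the final layer's content produces a different digest and therefore a different key, which is what keeps a stale cache from being restored into a changed environment.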

For example, here is a composite action and example workflow that demonstrate this use case:

name: "Get Layer Metadata"
description: "GitHub Action to get layer metadata from Docker Buildx Bake output result"
branding:
  icon: 'layers'
  color: 'blue'

inputs:
  metadata:
    description: 'Build result metadata'
    required: true
  load:
    description: "Load is a shorthand to use local registry"
    required: false
    default: 'false'

outputs:
  metadata:
    description: 'Layer result metadata'
    value: ${{ steps.iterate_metadata.outputs.metadata }}

runs:
  using: "composite"
  steps:
    - name: Iterate Metadata
      id: iterate_metadata
      env:
        METADATA_INPUT: ${{ inputs.metadata }}
        LOAD: ${{ inputs.load }}
      shell: bash
      run: |
        set -eo pipefail
        metadata_output=$METADATA_INPUT
        # Iterate over each bake target in the metadata JSON.
        for target in $(jq -r 'keys[]' <<< "$METADATA_INPUT"); do
          data=$(jq --arg t "$target" '.[$t]' <<< "$METADATA_INPUT")
          if [[ $LOAD == 'true' ]]; then
            # Image was loaded: read the layers from the local docker daemon.
            image_digest=$(jq -r '."containerimage.config.digest"' <<< "$data")
            layer_digest=$(docker inspect "$image_digest" | jq -r '.[0].RootFS.Layers[-1]')
          else
            # Image was pushed: read the manifest from the registry.
            image_digest=$(jq -r '."containerimage.digest"' <<< "$data")
            image_name=$(jq -r '."image.name"' <<< "$data")
            layer_digest=$(docker buildx imagetools inspect --raw "$image_name@$image_digest" | jq -r '.layers[-1].digest')
          fi
          # Pass values with --arg so target names containing '-' or '.' still work.
          metadata_output=$(jq -c --arg t "$target" --arg d "$layer_digest" '.[$t]["layer.digest"] = $d' <<< "$metadata_output")
        done
        {
          echo "metadata<<EOF"
          echo "$metadata_output"
          echo "EOF"
        } >> "$GITHUB_OUTPUT"

The action above gets the digest of the last layer in each target generated by a buildkit bake action. In the workflow below, the last layer digest of the tooler target is used to derive a cache key for the overlay-ws cache storage. When the workflow call inputs push and load are set to true and false respectively, the workflow will often complete much faster, avoiding the need to download layers when possible, as well as the need to output the image to the local docker daemon. But when pushing is not possible or not intended, the workflow will always be much slower, even when no layer cache miss occurs.

name: Build and Test

on:
  workflow_call:
    inputs:
      push:
        required: false
        type: string
        default: 'false'
        description: Push resulting images to registry
      load:
        required: false
        type: string
        default: 'true'
        description: Load resulting images to local docker daemon

jobs:
  build_and_test:
    name: Build and Test
    runs-on: ubuntu-latest
    steps:
      - name: Checkout default ref
        id: checkout_default_ref
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Docker tooler meta
        id: docker_meta_tooler
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          flavor: suffix=-tooler
          bake-target: tooler
          tags: type=raw,value=${{ github.head_ref || github.ref_name }}
      - name: Build tooler stage
        id: docker_bake_tooler
        uses: docker/bake-action@v4
        with:
          pull: true
          push: ${{ inputs.push }}
          load: ${{ inputs.load }}
          provenance: false
          no-cache: false
          targets: |
            tooler
          set: |
            *.cache-from=type=registry,ref=ghcr.io/${{ github.repository }}:${{ github.head_ref }}-tooler
            *.cache-from=type=registry,ref=ghcr.io/${{ github.repository }}:${{ github.base_ref }}-tooler
            *.cache-from=type=registry,ref=ghcr.io/${{ github.repository }}:${{ github.event.repository.default_branch }}-tooler
          files: |
            ./docker-bake.hcl
            ${{ steps.docker_meta_tooler.outputs.bake-file }}

      - name: Get Layer Metadata
        id: get_layer_metadata
        uses: ./.github/actions/get-layer-metadata
        with:
          metadata: ${{ steps.docker_bake_tooler.outputs.metadata }}
          load: ${{ inputs.load }}
      - name: Layer metadata
        id: layer_metadata
        run: |
          set -eo pipefail
          tooler_digest="${{ fromJSON(steps.get_layer_metadata.outputs.metadata)['tooler']['layer.digest'] }}"
          echo "tooler_digest=${tooler_digest#sha256:}" >> $GITHUB_OUTPUT

      - name: Cache overlay
        id: cache-overlay
        uses: actions/cache@v4
        with:
          save-always: true
          path: overlay-ws
          key: overlay-v1-${{ github.ref }}-${{ steps.layer_metadata.outputs.tooler_digest }}-${{ github.run_id }}
          restore-keys: |
            overlay-v1-${{ github.ref }}-${{ steps.layer_metadata.outputs.tooler_digest }}
            overlay-v1-refs/heads/${{ github.head_ref }}-${{ steps.layer_metadata.outputs.tooler_digest }}
            overlay-v1-refs/heads/${{ github.base_ref }}-${{ steps.layer_metadata.outputs.tooler_digest }}
      - name: Inject overlay
        uses: reproducible-containers/buildkit-cache-dance@v3.1.0
        with:
          cache-map: |
            {
              "overlay-ws": {
                "id": "tooler",
                "sharing": "private",
                "target": "/opt/overlay_ws"
              }
            }
          skip-extraction: false

      - name: Docker exporter meta
        id: docker_meta_exporter
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          flavor: suffix=-exporter
          bake-target: exporter
          tags: type=raw,value=${{ github.head_ref || github.ref_name }}
      - name: Build exporter stage
        id: docker_bake_exporter
        uses: docker/bake-action@v4
        with:
          pull: true
          push: ${{ inputs.push }}
          load: ${{ inputs.load }}
          provenance: false
          no-cache: false
          targets: exporter
          files: |
            ./docker-bake.hcl
            ${{ steps.docker_meta_exporter.outputs.bake-file }}
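
The Cache overlay step above appends the run ID to key and lists shorter prefixes under restore-keys, so an exact hit is only expected when the same run is re-executed. Otherwise actions/cache falls back to prefix matching, trying the restore keys in order. The following is a simplified simulation of that fallback, not the action's real implementation. The function name and sample keys are hypothetical, and the key list is assumed to be ordered newest first.

```shell
# restore_match KEYS PREFIX...
# KEYS is a newline-separated list of existing cache keys, newest first.
# Prints the first key that starts with the earliest matching prefix.
restore_match() {
  keys=$1
  shift
  for prefix in "$@"; do
    while IFS= read -r key; do
      case $key in
        "$prefix"*) printf '%s\n' "$key"; return 0 ;;
      esac
    done <<EOF
$keys
EOF
  done
  return 1
}

existing='overlay-v1-refs/heads/main-aaa-101
overlay-v1-refs/heads/feature-aaa-100
overlay-v1-refs/heads/feature-bbb-99'

# The branch's own prefix wins even though the main cache is newer.
restore_match "$existing" \
  overlay-v1-refs/heads/feature-aaa \
  overlay-v1-refs/heads/main-aaa
```

Because every prefix ends in the layer digest, a cache saved against a different tooler image can never match, whichever branch it came from.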