moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

improve cache for labels changes #696

Open FernandoMiguel opened 6 years ago

FernandoMiguel commented 6 years ago

We have a use case where a dependent image gets its cache invalidated when the upstream image adds a new label.

We are wondering whether anything can be done at the BuildKit level (since so much work has gone into improving caching) to help with this use case.

We first build a PHP Docker image (using docker-compose):

FROM php:7.0-fpm-alpine AS php70

LABEL maintainer "https://github.com/FernandoMiguel"

RUN adduser -DHSu 100 nginx -s /sbin/nologin
COPY --from=composer /usr/bin/composer /usr/bin/composer

COPY ./opcache.ini /usr/local/etc/php/conf.d/opcache.ini

RUN apk add --no-cache --virtual build-dependencies \
    $PHPIZE_DEPS \
    autoconf \
    automake \
    build-base \
    cmake \
    curl-dev \
    file \
    g++ \
    gcc \
    gettext-dev \
    git \
    icu-dev \
    libc-dev \
    libmcrypt-dev \
    libpng-dev \
    libressl-dev \
    libtool \
    libxml2-dev \
    libxslt-dev \
    make \
    nasm \
    pcre-dev \
    pkgconf \
    re2c \
    sqlite-dev \
    wget \
    zlib-dev

[....]

WORKDIR /var/www

docker-compose adds the labels:

version: "3.6"
services:
  php70:
    image: xxx.dkr.ecr.eu-west-1.amazonaws.com/php:release-production-php70
    build: 
      context: ../infrastructure-as-code/Docker/php
      dockerfile: Dockerfile-7.0
      labels:
        COMMIT_ID.KIMI: ${COMMIT_ID_KIMI}
        COMMIT_ID.SELF_SUBMODULE: ${COMMIT_ID_SELF_infrastructure_as_code}

We then build images that use that image in their FROM line:

ARG PHPV=-php70
FROM XXXX.dkr.ecr.eu-west-1.amazonaws.com/php:release-production${PHPV} AS lumen-builder

LABEL maintainer "https://github.com/FernandoMiguel"

COPY . /src

RUN cd /src && \
    composer -v install --no-dev --no-interaction

Here's the bit from docker-compose

  ms-builder:
    build:
      context: ../ms-xxx
      dockerfile: ../.Docker-lumen/Dockerfile-builder
      args:
        - PHPV=${PHPV}
      labels:
        COMMIT_ID.KIMI: ${COMMIT_ID_KIMI}
        COMMIT_ID.DOCKERFILE_SUBMODULE: ${COMMIT_ID_SELF__Docker_lumen}
    image: ms-builder:release-${PHPV}

Sadly, even with no code change in either of these two code bases, if anything else changes in the repo, HEAD changes and the label gets a new value. The dependent image then busts its cache, because the upstream image has an updated label and therefore a new image ID.
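
To illustrate the mechanism described above, here is a minimal Python sketch. It assumes the image ID is a SHA-256 digest over the image config, and that the config holds both the labels and the layer identifiers (`diff_ids`). The config below is hypothetical and heavily simplified, the label values are made up, and the `image_id` helper hashes a canonical JSON encoding rather than the raw config bytes a real registry or daemon would use.

```python
import hashlib
import json


def image_id(config: dict) -> str:
    """Return a sha256 digest over a canonical JSON encoding of the config.

    Simplification: real tooling hashes the stored config bytes as-is.
    """
    blob = json.dumps(config, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# Hypothetical base image config; real configs carry many more fields.
base = {
    "config": {"Labels": {"COMMIT_ID.KIMI": "aaaa111"}},
    "rootfs": {"type": "layers", "diff_ids": ["sha256:layer1", "sha256:layer2"]},
}

# Rebuild after an unrelated commit: identical layers, new label value.
rebuilt = {
    "config": {"Labels": {"COMMIT_ID.KIMI": "bbbb222"}},
    "rootfs": base["rootfs"],
}

same_layers = base["rootfs"]["diff_ids"] == rebuilt["rootfs"]["diff_ids"]
same_id = image_id(base) == image_id(rebuilt)

print("layers identical:", same_layers)
print("image ID identical:", same_id)
```

Under these assumptions the layers compare equal while the image ID does not, which is the situation the dependent build runs into.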

tonistiigi commented 6 years ago

Are you saying that a base image in a Dockerfile changed, but the change was only in the label metadata and all the layers were still the same? Or that you expect every command (e.g. RUN) to do content-based caching the same way COPY does?

FernandoMiguel commented 6 years ago

The base image only changed metadata between consecutive builds, as everything else in its rebuild was cached. So far so good.

The issue is with the dependent images. From their point of view the base image is new (which is true), but the actual change for both base and child images was just metadata; all the source code is the same. Since the child sees a new image ID for the base image, and its first step is a FROM that now looks new, all further cache is invalidated.

I am looking for better support for these scenarios, so that Docker/BuildKit avoids invalidating the cache when the base image's significant layers are the same.

Hope that makes sense.

FernandoMiguel commented 5 years ago

Bumping this. Is there interest, short term or long term, in addressing this, or shall I look at alternatives?

tonistiigi commented 5 years ago

Sorry for inactivity on this.

One of the cache keys for an image source (at least in schema2, where this is possible) is the checksum of the layers. So if the only thing that changed in the base image was metadata, it should not invalidate the next commands (except, for example, ENV, because that would change the runtime context of the subsequent RUN commands).
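
As a rough illustration of that idea, and not BuildKit's actual implementation, the sketch below derives a key from the layer checksums plus the environment variables and leaves labels out. The `cache_key` function, the dictionary field names, and all sample values are hypothetical.

```python
import hashlib
import json


def cache_key(image: dict) -> str:
    """Hypothetical cache key: layer checksums plus env; labels are ignored."""
    payload = {
        "layers": list(image["layers"]),  # order matters for layers
        "env": sorted(image["env"]),      # env changes the context of later RUNs
    }
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


layers = ["sha256:layer1", "sha256:layer2"]

original = {"layers": layers, "env": ["PATH=/usr/bin"], "labels": {"commit": "aaaa111"}}
relabelled = {"layers": layers, "env": ["PATH=/usr/bin"], "labels": {"commit": "bbbb222"}}
env_changed = {"layers": layers, "env": ["PATH=/usr/bin", "APP_ENV=prod"], "labels": {"commit": "aaaa111"}}

label_only_hit = cache_key(original) == cache_key(relabelled)
env_change_hit = cache_key(original) == cache_key(env_changed)

print("label-only change reuses cache:", label_only_hit)
print("env change reuses cache:", env_change_hit)
```

With a key built this way, a label-only rebuild of the base image maps to the same key, while an ENV change produces a different one.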

If you think this is not the case, please post an example we could try, for example two public images that show this.

FernandoMiguel commented 5 years ago

Thanks for looking into this again. I'm away this week, but I'll try to provide an example as soon as possible.

Cheers