moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

Cache with mode=max not correctly used with Multi-stage build #1515

Open bgaillard opened 4 years ago

bgaillard commented 4 years ago

Hi, first sorry to open a new issue (I already opened one a few days ago on another subject), but I definitely do not understand the caching behavior with multi-stage builds.

I created a simple Docker multi-stage build here so you can reproduce the problem easily: https://github.com/bgaillard/moby-cache

I use the following command to build a Docker image and import/export a cache image containing all the intermediate stages.

docker run \
    --rm \
    --privileged \
    -v $(realpath ./):/tmp/src \
    -v $(realpath ./):/tmp/dockerfile \
    -v $HOME/.docker:/root/.docker \
    --entrypoint buildctl-daemonless.sh \
    moby/buildkit:master \
    build \
    --frontend dockerfile.v0 \
    --local context=/tmp/src \
    --local dockerfile=/tmp/dockerfile \
    --output type=image,name=${DOCKER_REPO}/test:1,push=true \
    --export-cache type=registry,ref=${DOCKER_REPO}/test:cache,mod=max,push=true \
    --import-cache type=registry,ref=${DOCKER_REPO}/test:cache

As described in the documentation here https://github.com/moby/buildkit#--export-cache-options, I'm expecting the cache to work because I use mode=max.

mode=max: export all the layers of all intermediate steps. Not supported for inline cache exporter.
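
For reference, a minimal sketch of the same cache options against a plain buildctl/buildkitd setup rather than the daemonless image (the registry ref is a placeholder; note the option key is mode):

# minimal buildctl invocation with registry cache in max mode, assuming a
# running buildkitd; without mode=max only the final image's layers are
# exported (min mode), not the intermediate stages
buildctl build \
    --frontend dockerfile.v0 \
    --local context=. \
    --local dockerfile=. \
    --output type=image,name=${DOCKER_REPO}/test:1,push=true \
    --export-cache type=registry,ref=${DOCKER_REPO}/test:cache,mode=max \
    --import-cache type=registry,ref=${DOCKER_REPO}/test:cache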

When I build the first time everything is built and I have 2 Docker images pushed to my Docker registry (the output image and the cache image).

If I build a second time without modifying anything, the cache import works and the cache is used correctly (all steps show CACHED in the console output).

If I update the js/file3.js file and rebuild, the behavior is correct: all steps show CACHED except the one associated with COPY js js (the last line of the Dockerfile).

But if I update the assets/js/react/file1.js or assets/js/react/sub/file2.js file and rebuild, every step is rebuilt.
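
For readers without the repo at hand, a hypothetical sketch of the kind of multi-stage layout being described; the real Dockerfile is in the linked moby-cache project, and the base images and build step here are assumptions:

# first stage: works only on the React sources under assets/js/react
FROM node:12 AS react
WORKDIR /build
COPY assets/js/react ./react
# placeholder build step standing in for the real one
RUN cat react/file1.js react/sub/file2.js > bundle.js

# final stage: copies the first stage's output, then the plain js/ directory
FROM nginx:alpine
COPY --from=react /build/bundle.js /usr/share/nginx/html/bundle.js
COPY js js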

Instead, I'm expecting only the steps whose files changed to be rebuilt.

Why is the cache not used for steps whose files have not been updated?

Thanks for your help

bgaillard commented 4 years ago

I finally found what the problem was; I don't know whether the behavior I observed is normal or not.

You'll find a detailed description of my observations in the README.md file of the https://github.com/bgaillard/moby-cache project.

So, to summarize: if I install Buildx and then create a builder with docker buildx create --name baptiste --use, the command described in the issue works correctly and exports all the cache layers of the multi-stage Dockerfile. Subsequent builds then use the cache correctly.
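
A sketch of that working workflow in full, assuming the builder uses the docker-container driver (the default for buildx create) and a placeholder registry ref:

# create a dedicated builder and make it the current one
docker buildx create --name baptiste --driver docker-container --use

# build with registry cache export/import in max mode
docker buildx build \
    --cache-to type=registry,ref=${DOCKER_REPO}/test:cache,mode=max \
    --cache-from type=registry,ref=${DOCKER_REPO}/test:cache \
    --tag ${DOCKER_REPO}/test:1 \
    --push \
    .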

If the builder is not created, the command seems to export only a subset of the cache layers (I suppose those associated with the final stage and not the ones associated with intermediate stages).

Is this behavior expected?

tonistiigi commented 4 years ago

dockerd (same as buildx with the Docker driver) currently only supports inline cache export: https://github.com/docker/buildx#--cache-tonametypetypekeyvalue . I'm surprised you got that far at all; the whole export phase definitely should not happen. In your original example you use the buildkit image directly, so no dockerd or buildx, and that is full BuildKit with mode=max support.
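
In buildx terms, the difference looks roughly like this (image refs are placeholders; the first command assumes the default docker driver, the second a docker-container builder):

# docker driver (dockerd builder): only inline cache export is supported,
# so the cache metadata is embedded in the pushed image itself
docker buildx build \
    --cache-to type=inline \
    --cache-from type=registry,ref=${DOCKER_REPO}/test:1 \
    --tag ${DOCKER_REPO}/test:1 --push .

# docker-container driver: full BuildKit, so mode=max works as documented
docker buildx create --name full-buildkit --driver docker-container --use
docker buildx build \
    --cache-to type=registry,ref=${DOCKER_REPO}/test:cache,mode=max \
    --cache-from type=registry,ref=${DOCKER_REPO}/test:cache \
    --tag ${DOCKER_REPO}/test:1 --push .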

macropin commented 3 years ago

I think I've encountered the same or a similar issue, using --cache-to/--cache-from with type=local,mode=max. I would expect to be able to reuse the cache between builds, but what I'm seeing is that the preceding build partly invalidates or evicts objects from the cache when the layers are not required for that build target.

It appears as if mode=min is actually being applied, but I suspect the real issue is that the usage/eviction policy is just counter-intuitive to what is required for multi-stage builds...

Hopefully this explains the issue and shows how using the cache with multi-stage builds is pretty broken:

# simple dockerfile with two targets

FROM x AS a
# stuff 

FROM a AS b
# stuff

(commands simplified for brevity; a runnable sketch follows below)

$ buildx build --target b   # using cache from a prior target b build: nothing is done (cached)
$ buildx build --target b   # using cache from a prior target a build: step b is rebuilt
$ buildx build --target a   # using cache from a prior target b build: nothing is done (cached), however the --cache-to export will drop the b target from the cache!
$ buildx build --target a   # using cache from a prior target a build: nothing is done (cached)

Because of the cache eviction in the third example it's not possible to blindly use --cache-to/--cache-from with multistage builds. This makes the caching functionality very difficult to use.
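
A runnable version of that sequence, as I understand it, assuming a docker-container builder and a local cache directory (paths and image names are placeholders):

# 1. build target b and seed the cache
docker buildx build --target b \
    --cache-to type=local,dest=./cache,mode=max \
    --cache-from type=local,src=./cache \
    --tag example/app:b .

# 2. build target a from the same cache: fully cached, but the re-exported
#    cache no longer contains target b's layers
docker buildx build --target a \
    --cache-to type=local,dest=./cache,mode=max \
    --cache-from type=local,src=./cache \
    --tag example/app:a .

# 3. build target b again: its steps are rebuilt even though nothing changed
docker buildx build --target b \
    --cache-to type=local,dest=./cache,mode=max \
    --cache-from type=local,src=./cache \
    --tag example/app:b .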

RunsFor commented 3 years ago

Because of the cache eviction in the third example it's not possible to blindly use --cache-to/--cache-from with multistage builds. This makes the caching functionality very difficult to use.

Recently ran into the same issue.

Intuitively, I treated the BuildKit cache the same as the Docker layer cache and was building different targets with the same --cache-to/--cache-from path.

As far as I can tell, parts of the cache may be removed after the most recent target's export. I'm thinking about using a different cache destination for each target, but that does not seem very efficient.
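
A sketch of that per-target workaround, assuming a docker-container builder and placeholder paths, with one local cache directory per target so one export cannot evict the other's layers:

# target a gets its own cache directory
docker buildx build --target a \
    --cache-to type=local,dest=./cache/a,mode=max \
    --cache-from type=local,src=./cache/a \
    --tag example/app:a .

# target b gets its own cache directory, but can still read a's cache
docker buildx build --target b \
    --cache-to type=local,dest=./cache/b,mode=max \
    --cache-from type=local,src=./cache/b \
    --cache-from type=local,src=./cache/a \
    --tag example/app:b .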

tonistiigi commented 3 years ago

If you think the cache is not being used properly, you need to provide a runnable reproducer and explain what is unexpected.

fmmoret commented 1 year ago

Probably a dead ticket, but in the pasted example you used mod=max instead of mode=max.