Not possible to mount paths that are excluded by dockerignore

thaJeztah commented 3 years ago

This may be (somewhat) expected, but thought I'd open a ticket, because I can see use-cases where this functionality would be useful.

Description

I'm trying to exclude paths in the build-context (through .dockerignore), to prevent those paths from being included in the image that is built. However, some steps make use of the excluded files, and to provide access, I'm using RUN --mount, to "overlay" the excluded files.

Prepare

mkdir excluded_mount && cd excluded_mount

mkdir -p assets src
touch assets/some-file.txt src/some-source-file.txt

cat > Dockerfile <<EOF
#syntax=docker/dockerfile:1.2

FROM busybox
WORKDIR /project
COPY . .

# Mount the assets directory, and recursively show all files in the project
# directory. Exit with a non-zero exit code, so that the results are printed.
RUN --mount=source=/assets,target=/project/assets ls -lR && exit 1
EOF

Without dockerignore

Build the Dockerfile, and notice that the assets directory is successfully mounted

$ DOCKER_BUILDKIT=1 docker build --no-cache .

[+] Building 2.7s (10/10) FINISHED
 => [internal] load build definition from Dockerfile                                     0.2s
 => => transferring dockerfile: 181B                                                     0.0s
 => [internal] load .dockerignore                                                        0.2s
 => => transferring context: 2B                                                          0.0s
 => resolve image config for docker.io/docker/dockerfile:1.2                             1.1s
 => CACHED docker-image://docker.io/docker/dockerfile:1.2@sha256:e2a8561e419ab1ba6b2f... 0.0s
 => [internal] load metadata for docker.io/library/busybox:latest                        0.0s
 => [1/4] FROM docker.io/library/busybox                                                 0.0s
 => [internal] load build context                                                        0.1s
 => => transferring context: 303B                                                        0.0s
 => CACHED [2/4] WORKDIR /project                                                        0.0s
 => [3/4] COPY . .                                                                       0.2s
 => ERROR [4/4] RUN --mount=source=/assets,target=/project/assets ls -lR && exit 1       0.5s
------
 > [4/4] RUN --mount=source=/assets,target=/project/assets ls -lR && exit 1:
#10 0.353 .:
#10 0.353 total 12
#10 0.353 -rw-r--r--    1 root     root           137 Jan 13 12:21 Dockerfile
#10 0.353 drwxr-xr-x    2 root     root          4096 Jan 13 12:20 assets
#10 0.353 drwxr-xr-x    2 root     root          4096 Jan 13 12:20 src
#10 0.353
#10 0.353 ./assets:
#10 0.353 total 0
#10 0.353 -rw-r--r--    1 root     root             0 Jan 13 12:19 some-file.txt
#10 0.353
#10 0.353 ./src:
#10 0.353 total 0
#10 0.353 -rw-r--r--    1 root     root             0 Jan 13 12:19 some-source-file.txt
------
executor failed running [/bin/sh -c ls -lR && exit 1]: exit code: 1

With a `.dockerignore`

Create a .dockerignore to exclude the assets directory from COPY:

echo "/assets/" > Dockerfile.dockerignore

Build the image again;

$ DOCKER_BUILDKIT=1 docker build --no-cache .

[+] Building 2.3s (10/10) FINISHED
 => [internal] load build definition from Dockerfile                                     0.2s
 => => transferring dockerfile: 103B                                                     0.0s
 => [internal] load .dockerignore                                                        0.2s
 => => transferring context: 2B                                                          0.0s
 => resolve image config for docker.io/docker/dockerfile:1.2                             1.2s
 => CACHED docker-image://docker.io/docker/dockerfile:1.2@sha256:e2a8561e419ab1ba6b2f... 0.0s
 => [internal] load metadata for docker.io/library/busybox:latest                        0.0s
 => [internal] load build context                                                        0.1s
 => => transferring context: 157B                                                        0.0s
 => [1/4] FROM docker.io/library/busybox                                                 0.0s
 => CACHED [2/4] WORKDIR /project                                                        0.0s
 => CANCELED [3/4] COPY . .                                                              0.3s
 => ERROR [4/4] RUN --mount=source=/assets,target=/project/assets ls -lR && exit 1       0.0s
------
 > [4/4] RUN --mount=source=/assets,target=/project/assets ls -lR && exit 1:
------
failed to compute cache key: "/assets" not found: not found

What I expected

- the .dockerignore to exclude the files when using COPY / ADD, but RUN --mount to have access to files in the build-context.
- a clearer error in case of a failure;
  - "failed to compute cache key" is confusing, and feels like an implementation detail that's not of interest to the end-user
  - "/assets" not found: not found; "not found" is included twice in the error
  - "/assets" not found: not found; the error does not mention that the /assets path is excluded by .dockerignore

thaJeztah commented 3 years ago

/cc @tonistiigi @tiborvass

thaJeztah commented 3 years ago

Probably somewhat related; https://github.com/moby/moby/issues/15771 / https://github.com/moby/moby/issues/37333

tonistiigi commented 3 years ago

Yes, this is expected. Mounts with type=bind and no from default to build context. Build context is the same as the source for COPY and applies to .dockerignore rules.

thaJeztah commented 3 years ago

> Build context is the same as the source for COPY and applies to .dockerignore rules.

Yup, I understand, and I was somewhat expecting that to be the case. The devil is in the details there;

For "classic" builder, COPY did not use a session, so the only way to prevent sending unused files/directories to the daemon was to use a .dockerignore. BuildKit uses sessions, which for many situations makes .dockerignore redundant (if your COPY / ADD instructions are specific enough).
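As a minimal sketch of the "specific enough" approach, assuming the same project layout as in the reproduction (an assets and a src directory) and no .dockerignore at all: copying only src keeps the assets files out of the image, while the bind mount can still read them from the build context. The directory name specific_copy is made up for this illustration.

```shell
# Hypothetical project directory, mirroring the layout from the reproduction.
mkdir -p specific_copy/assets specific_copy/src
touch specific_copy/assets/some-file.txt specific_copy/src/some-source-file.txt

# No .dockerignore is created here; the COPY below is specific instead.
cat > specific_copy/Dockerfile <<'EOF'
#syntax=docker/dockerfile:1.2

FROM busybox
WORKDIR /project

# Copy only what should end up in the image.
COPY src ./src

# The assets directory is still part of the build context, so it can be
# bind-mounted for this one step without its files being copied into the image.
RUN --mount=type=bind,source=/assets,target=/project/assets ls -lR
EOF
```

Running `DOCKER_BUILDKIT=1 docker build specific_copy` should then list some-file.txt during the RUN step. This only helps when the set of files to copy is easy to enumerate, which is the limitation discussed next.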

Unfortunately, there are still situations where "being specific" is either hard, or "impossible"; for example, when "most" files are needed (the whole project, except for some paths). For cases where those paths are never needed, using a .dockerignore works, but in situations where (e.g.) some stages don't need the files, but other stages do, it's difficult.

I was hoping --mount would be "smart" here, and because I explicitly picked a path that's excluded (but not the "root"), that it would use a separate context/session for that, and allow me to access those files. (I wonder whether that would be problematic, because it would also mean that the --mount could potentially use a snapshot of the build-context that was created at a different time than the build-context used for the COPY; perhaps I'm over-thinking that).

What would be the best way to address these scenarios?

- COPY --exclude (or similar); allow excluding files for individual COPY statements? Do we want these scoped for each COPY, or have some notion of "per stage excludes"? Something like;
  ```
  FROM foo AS mystage
  EXCLUDE *.foo
  EXCLUDE --ignore-file=/.dockerignore
  ```
- ignore / exclude option for --mount (possibly allow overriding .dockerignore)?
- support for multiple build-contexts (https://github.com/moby/moby/issues/37129)?
- other ideas?
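
To make the "multiple build-contexts" option more concrete, here is a rough sketch of how it could look. It assumes a Buildx version that has the `--build-context` flag and a Dockerfile frontend that supports named contexts (1.4 or later); neither is confirmed in this thread. The context name assets and the directory name named_context are hypothetical.

```shell
# Hypothetical project directory with the same layout as the reproduction.
mkdir -p named_context/assets named_context/src
touch named_context/assets/some-file.txt named_context/src/some-source-file.txt

# Exclude assets from the main build context, as in the original report.
echo "/assets/" > named_context/.dockerignore

cat > named_context/Dockerfile <<'EOF'
#syntax=docker/dockerfile:1.4

FROM busybox
WORKDIR /project

# The main context no longer contains assets because of .dockerignore.
COPY . .

# "assets" refers to a named context passed on the command line,
# not to a path inside the main build context.
RUN --mount=type=bind,from=assets,target=/project/assets ls -lR
EOF
```

The build would then be started with something like `docker buildx build --build-context assets=named_context/assets named_context`. Whether the main context's .dockerignore stays out of play for the extra context is worth verifying on your own setup before relying on it.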

tonistiigi commented 3 years ago

.dockerignore should really be used like a .gitignore, for ignoring files that are just completely unnecessary for docker tracking, not to make decisions based on target/build configuration. .dockerignore is also not applied to builds from remote sources (tar/git), which adds to the confusion if misused.

So yeah, buildkit ignores the directories that are not used anyway, even without .dockerignore. The rules are the same for the COPY path and for --mount. Internally they are exactly the same thing and that consistency also makes sense for the user.

Having more complicated exclusion filters on COPY or setting default filters in Dockerfile (per stage) is something that can be discussed (likely already an issue).

slmjy commented 1 year ago

Have just stumbled across this issue, and I would suggest that this behavior is very counterintuitive, as --mount in the context of .gitignore is often used to mount the source code inside the container, including temporary files that you may not want in the final container. This BuildKit behavior breaks that usage.

ddelange commented 1 month ago

another common use case is bind mounting .git for one RUN to determine which git tag is currently being baked (e.g. for python packages using setuptools_scm).

obviously we don't want the entire .git folder in the layer, so we have to add it to .dockerignore because we have to COPY . before the RUN

but then you can't bind mount it anymore, a catch 22 that defeats the purpose of bind mounting imo.

thaJeztah commented 1 month ago

> obviously we don't want the entire .git folder in the layer, so we have to add it to .dockerignore because we have to COPY . before the RUN

@ddelange for that last part, there's a feature being worked on to allow excluding files for a specific COPY through an --exclude option. That option is not yet in the stable dockerfile syntax (only in the labs variant), so requires you to set a syntax-directive in your Dockerfile; for example;

# syntax=docker/dockerfile:1-labs

FROM alpine
WORKDIR /example

# copy everything, except for the `.git` directory
# and files in ".dockerignore"
COPY --exclude=/.git . .

See the documentation here; https://docs.docker.com/reference/dockerfile/#copy---exclude
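
For the `.git` use case specifically, the two pieces could be combined roughly as follows. This is only a sketch: it assumes .git is not listed in .dockerignore (otherwise the bind mount fails as described at the top of this issue), that the labs syntax is acceptable, and that git is installed in the image. The directory name exclude_git and the `git describe` step are illustrative.

```shell
# Hypothetical project directory; run "git init" and commit inside it
# before building so that there is a .git directory to mount.
mkdir -p exclude_git

cat > exclude_git/Dockerfile <<'EOF'
# syntax=docker/dockerfile:1-labs

FROM alpine
RUN apk add --no-cache git
WORKDIR /example

# Copy everything except the .git directory (and files in .dockerignore).
COPY --exclude=/.git . .

# Make .git available for this single step only; it is not part of the
# files copied above.
RUN --mount=type=bind,source=.git,target=/example/.git git describe --tags --always
EOF
```

With this layout, .git stays in the build context for the mount while the COPY leaves it out, which avoids the catch-22 as long as the labs frontend is an option.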

FedericoBiccheddu commented 4 weeks ago

Another use case is caching node_modules and other caches (.pnpm-store, .terraform, etc) using --cache-from while building in multi-stage builds.

Using --mount=type=cache is not suitable as docker layers are not shared between runners in CIs like GitLab.
