moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0
8.21k stars 1.16k forks source link

COPY command changes the directory permissions to change. #3602

Open veritas9872 opened 1 year ago

veritas9872 commented 1 year ago

Description

In the latest v23.0.0 update, I found that using COPY --link FILE.txt /tmp/FILE.txt caused the permissions of the /tmp directory to change from 777 to 755. This caused problems downstream because apt no longer had write permissions to the /tmp directory. Though this can be fixed by changing the directory that the file was copied to, this is obviously a bug.

Reproduce

git clone https://github.com/cresset-template/cresset.git
cd cresset
git checkout 2fc0889
make env
make build

Expected behavior

I tried adding RUN ls -alh /tmp to the Dockerfile before and after the COPY command in line 418 of the Cresset Dockerfile and the file permissions that I found were different. The COPY command should not change the permissions of the directory.

docker version

Client: Docker Engine - Community
 Version:           23.0.0
 API version:       1.42
 Go version:        go1.19.5
 Git commit:        e92dd87
 Built:             Wed Feb  1 17:49:08 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.0
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.5
  Git commit:       d7573ab
  Built:            Wed Feb  1 17:49:08 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.16
  GitCommit:        31aa4358a36870b21a992d3ad2bef29e1d693bec
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  compose: Docker Compose (Docker Inc.)
    Version:  v2.15.1
    Path:     /home/veritas/.docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.17.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan

Server:
 Containers: 4
  Running: 3
  Paused: 0
  Stopped: 1
 Images: 19
 Server Version: 23.0.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: nvidia runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 31aa4358a36870b21a992d3ad2bef29e1d693bec
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 5.4.0-137-generic
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 80
 Total Memory: 251.6GiB
 Name: blu3-001
 ID: 5633:5QID:26UF:2LQP:2JJE:5EQO:KXYX:KNGY:5FLX:DEFQ:NWUK:LUWJ
 Docker Root Dir: /data1/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Additional Info

Before COPY:

=> [train  2/12] RUN ls -alh /tmp && sleep 20                                                                                                                    8.0s
 => => # total 8.0K
 => => # drwxrwxrwt 2 root root 4.0K Jan 26 02:06 .
 => => # drwxr-xr-x 1 root root 4.0K Feb  8 06:56 ..

After COPY:

 => [train  4/12] RUN ls -alh /tmp && sleep 20                                                                                                                    9.3s
 => => # total 12K
 => => # drwxr-xr-x 2 root root 4.0K Feb  8 05:28 .
 => => # drwxr-xr-x 1 root root 4.0K Feb  8 06:57 ..
thaJeztah commented 1 year ago

Thanks for reporting. I think this may be a combination of things, and I think some of this was discussed on https://github.com/moby/buildkit/issues/2414#issuecomment-943325055;

While that may explain what's happening, I agree this is somewhat confusing, and it definitely feels unexpected that the permissions change.

So, if the above is the reason, I wonder what the options are for this 🤔;

One option would be to copy the directory permissions of the parent layer

However, this could potentially defeat the purpose of --link, as it now becomes dependent on the parent layer?

Effectively, this would be making --parents option (not yet available) the default if --link is used. Possibly the COPY --parents option (once available) would take away that ambiguity (i.e., it would allow for the user to opt-in to have the parent directory permissions be "baked" into the layer.

thaJeztah commented 1 year ago

@jedevc @tonistiigi @crazy-max PTAL (should this be moved to the BuildKit issue tracker?)

crazy-max commented 1 year ago

BuildKit 0.11.2

FROM alpine AS base
RUN apk add --no-cache tree
RUN tree -apufi /tmp
COPY foo.txt /tmp/foo.txt
RUN tree -apufi /tmp
COPY --link foo.txt /tmp/foo.txt
RUN tree -apufi /tmp
...
moby/moby#7 [3/7] RUN tree -apufi /tmp
moby/moby#7 0.136 [drwxrwxrwt root    ]  /tmp
moby/moby#7 0.136
moby/moby#7 0.136 0 directories, 0 files
moby/moby#7 DONE 0.2s

moby/moby#8 [4/7] COPY foo.txt /tmp/foo.txt
moby/moby#8 DONE 0.1s

moby/moby#9 [5/7] RUN tree -apufi /tmp
#0 0.134 [drwxrwxrwt root    ]  /tmp
#0 0.134 [-rwxrwxrwx root    ]  /tmp/foo.txt
#0 0.134
#0 0.134 0 directories, 1 file
moby/moby#9 DONE 0.2s

moby/moby#10 [6/7] COPY --link foo.txt /tmp/foo.txt
moby/moby#10 merging 0.0s done
moby/moby#10 DONE 0.1s

moby/moby#11 [7/7] RUN tree -apufi /tmp
#0 0.125 [drwxr-xr-x root    ]  /tmp
#0 0.125 [-rwxrwxrwx root    ]  /tmp/foo.txt
#0 0.125
#0 0.125 0 directories, 1 file
moby/moby#11 DONE 0.1s

should this be moved to the BuildKit issue tracker?

Yes please

thaJeztah commented 1 year ago

I transferred the ticket to the BuildKit issue tracker 👍

tonistiigi commented 1 year ago

As @thaJeztah wrote, --link implies that the full path is linked over the existing source, it is not merged with the previous one. This is one of the reasons --link is a new opt-in flag. You need to make sure that your source is prepared so that it doesn't contain invalid permissions.

You can read the background of this at https://github.com/moby/buildkit/issues/2414 . It is possible that in some future release, we can improve this a bit by removing the parent directory record from the layer tarball when we create the image, but there will always be specific cases (symlinks, chown etc) where doing a linked copy requires these extra precaucions.

veritas9872 commented 1 year ago

This code used to function properly in previous Docker engine versions, is there any reason why it suddenly failed in the latest update?

thaJeztah commented 1 year ago

I wonder (but @tonistiigi may be able to confirm) if that's where these come into play;

If the COPY --link step in this case does not add a directory entry for /tmp, and the "parent" layer is FROM scratch then the permissions for that directory are "undefined"; versions before 23.0.0 of the daemon (see https://github.com/moby/moby/issues/44106) used 0777 and depended on umask, which resulted in undefined behavior (depending on the umask and storage driver used, this could either result in 0755 permissions or 0600).

tonistiigi commented 1 year ago

This code used to function properly in previous Docker engine versions, is there any reason why it suddenly failed in the latest update?

COPY --link was first implemented in BuildKit v0.10 that was added to Moby with v23.0 release. If you used --link in an older release, that didn't do any linking behavior then you get the previous semantics exactly as without any --link flag.

Dockerfiles that opt-in to --link are compatible with previous copy semantics. But if you just take old Dockerfile and add --link to all COPY commands you can get slightly different semantics depending on what source parameters and flags are used.

thaJeztah commented 1 year ago

Oh! Good one, I assumed it depended on the Dockerfile frontend, but didn't consider the option would be previously silently ignored.

I guess it no longer helps for this specific case (now that 23.0 is out), but do you know if warnings are still planned for cases like this? (Dockerfile uses option that's not supported by a specific version of BuildKit) also see https://github.com/moby/buildkit/issues/1952#issuecomment-764816376

Kaiyang-Chen commented 1 year ago

I also encounter some unexpected directory permission changes using 23.0.0 with moby worker in buildkit. Instead of using COPY op, what i did is File(llb.Mkdir("/var/log/xxx", 0777, llb.WithParents(true)), llb.WithCustomNamef("[internal] mkdir for xxx log" )) But after enter the container, i find the permission of the directory to be 0755, which prevent me for adding log files under the directory. Is there any possible reason or undefined behaviors that can caused this?

thaJeztah commented 1 year ago

@Kaiyang-Chen perhaps default umask (022) being applied to 0777 ?

Kaiyang-Chen commented 1 year ago

@Kaiyang-Chen perhaps default umask (022) being applied to 0777 ?

@thaJeztah In your previous reply, version before 23.0.0 used 0777 for undefined directory, and after it there will be a default umask apply to it? It seems for version before 23.0.0, the directory permission is indeed 0777 using container-based buildkit in above implementation, but when using moby worker, the permission are changed to 0755. Are there any difference between built-in moby worker from buildx and container-based buildkit in such behavious? What's more, in my case, when creating the directory, i explicitly set the permission to 0777, why there is still umask apply to it? Thanks for your help!

tonistiigi commented 1 year ago

Is there any possible reason or undefined behaviors that can caused this?

Are you sure the directory didn't exist before? Mkdir does not overwrite perms for existing directories.

The umask should not apply. If it does, then it is a bug.

The last issue is completely different from the initial report. If you have some reproducers for it, then create a new issue with all info.

veritas9872 commented 1 year ago

The /tmp directory existed before the COPY --link command was executed, I am certain because I ran ls -alh on the /tmp directory to get the outputs. It therefore seems to be a bug to me.

crazy-max commented 5 months ago

Someone reported a similar issue but without --link: https://github.com/docker/build-push-action/issues/1130#issuecomment-2163185050

tonistiigi commented 5 months ago

The /tmp directory existed before the COPY --link command was executed

This does not matter. --link does not have a dependency on destination directory (and therefore does not invalidate cache if destination changes).

Link is equivalent to:

FROM scratch AS layerX
COPY files /some/dest/
polarathene commented 2 months ago

One option would be to copy the directory permissions of the parent layer

However, this could potentially defeat the purpose of --link, as it now becomes dependent on the parent layer?

Has the findings below with --chmod made --link dependent in some way? Or by parent layer were you referring to the ownership/permissions of the source path segments?

I notice you mentioned permissions specifically, was that also applicable to ownership expectations with --chown?


Link is equivalent to

FROM scratch AS layerX
COPY files /some/dest/

What about when --chown / --chmod are used?

From this reference we have this minimal example:

FROM dest AS with-chmod-fix
COPY --link --chown=300 --chmod=null --from=src /foo/bar/baz /foo/bar/baz

Ownership and permissions of the dest stage remains intact:

Octal User Group Name
0770  200  200   /foo
0770  200  200   └── bar
0770  200  200      └── baz
0777  300  300         ├── a
0777  300  300         ├── b
0777  300  300         └── c

If the --chmod=null was removed however.. --chown=300 now also modifies parent directories ownership and the their permissions are modified to 0755:

Octal User Group Name
0755  300  300   /foo
0755  300  300   └── bar
0755  300  300      └── baz
0777  300  300         ├── a
0777  300  300         ├── b
0777  300  300         └── c

Observations:

Seems like this inconsistency is a bug? (UPDATE: Full overview documented here)


there will always be specific cases (symlinks, chown etc) where doing a linked copy requires these extra precautions.

Does the --chown or --chmod flags apply here? How can I confirm if the --link functionality is actually working still vs being treated as --link=false?