moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

Cannot copy from source with character device #1624

Open hinshun opened 4 years ago

hinshun commented 4 years ago

Reproduction steps

hinshun/whiteout-repro was created using HLB (which is valid LLB): https://gist.github.com/hinshun/5488224d10090c841a47935a5404419e However, the problem only occurs after you push the image, clear the BuildKit cache, and then build with FROM <image>.

FROM hinshun/whiteout-repro@sha256:b53fa3c4e366fb641d81656474cd56d379df7ae518a3e5a7f81045d5c50476db AS repro

FROM scratch
COPY --from=repro / /

buildctl build --frontend=dockerfile.v0 --local context=. --local dockerfile=.
 => ERROR [stage-1 1/1] COPY --from=repro / /                                                                                                                         0.0s
------
 > [stage-1 1/1] COPY --from=repro / /:
------
Dockerfile:4
--------------------
   2 |
   3 |     FROM scratch
   4 | >>> COPY --from=repro / /
   5 |
--------------------
error: failed to solve: rpc error: code = Unknown desc = failed to compute cache key: failed to walk /tmp/buildkit-mount347886175/upperdir/root/.config/configstore/nodemon.json.1809259495: lstat /tmp/buildkit-mount347886175/upperdir/root/.config/configstore/nodemon.json.1809259495: no such file or directory

Inside the buildkitd container

Mounts

# mount | grep buildkit
/dev/nvme0n1p3 on /var/lib/buildkit type ext4 (rw,relatime,errors=remount-ro,data=ordered)
overlay on /tmp/buildkit-mount212065180 type overlay (ro,relatime,lowerdir=/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/10/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/9/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/8/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/7/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/6/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/5/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/4/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/3/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/2/fs:/var/lib/buildkit/runc-overlayfs/snapshots/snapshots/1/fs)

The directory containing the whiteout for overlay (a character device)

# ls -la /var/lib/buildkit/runc-overlayfs/snapshots/snapshots/4/fs/root/.config/configstore
total 12
drwx------    2 root     root          4096 Aug  4 02:14 .
drwx------    3 root     root          4096 Aug  4 02:14 ..
-rw-------    1 root     root            31 Aug  4 02:14 nodemon.json
c---------    1 root     root        0,   0 Aug  4 04:33 nodemon.json.3163301993

Cannot interact with device:

# cd /var/lib/buildkit/runc-overlayfs/snapshots/snapshots/4/fs/root/.config/configstore
# cat nodemon.json.3163301993
cat: can't open 'nodemon.json.3163301993': No such device or address

Running ls from merged directory of overlay:

# ls -la /tmp/buildkit-mount212065180/root/.config/configstore/
ls: /tmp/buildkit-mount212065180/root/.config/configstore/nodemon.json.3163301993: No such file or directory
total 12
drwx------    2 root     root          4096 Aug  4 02:14 .
drwx------    3 root     root          4096 Aug  4 02:14 ..
-rw-------    1 root     root            31 Aug  4 02:14 nodemon.json

sipsma commented 4 years ago

I've been looking into this along with @hinshun and @tonistiigi on Slack, here's what I found:

  1. The whiteout device nodemon.json.3163301993 is showing up inside the container because of some strangeness with overlay.
    • Basically, if overlay determines that a directory is not "merged" (that is, the directory's path appears only once across all lowers+upper), then it passes syscalls such as getdents on it directly to the underlying dir. /upperdir/root/.config/configstore/ only appears in one lowerdir, so when you ls it, the whiteout device shows up as a dirent.
    • However, overlay prevents you from seeing any whiteout devices, so that's why you get ENOENT errors when trying to make any syscalls involving nodemon.json.3163301993's path.
    • Example of creating a simpler version of this situation: https://gist.github.com/sipsma/f113c578f3a8a86a93cd047f26bfa273
  2. This behavior can cause problems when, for example, you create an upper layer that has a whiteout device (because a file got deleted) and later use that upper layer as a lowerdir in another mount.
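
To make the two conditions above concrete, here is a minimal sketch (not BuildKit or kernel code) that checks them on the backing filesystem. It assumes the usual overlayfs convention that a whiteout is a character device with device number 0/0, as the ls output earlier in this issue shows. The function names and the idea of passing in a list of snapshot directories are hypothetical, chosen only for illustration.

```python
import os
import stat


def is_overlay_whiteout(mode: int, rdev: int) -> bool:
    """Overlayfs marks a deleted file with a character device numbered 0/0."""
    return stat.S_ISCHR(mode) and os.major(rdev) == 0 and os.minor(rdev) == 0


def layers_containing(layer_roots, rel_dir):
    """Return the layer roots in which rel_dir exists as a real directory."""
    found = []
    for root in layer_roots:
        candidate = os.path.join(root, rel_dir)
        # isdir() follows symlinks, so reject symlinks explicitly.
        if os.path.isdir(candidate) and not os.path.islink(candidate):
            found.append(root)
    return found


def whiteouts_in(directory):
    """List names in a backing directory (not the merged view) that are whiteouts."""
    names = []
    for entry in os.scandir(directory):
        st = entry.stat(follow_symlinks=False)
        if is_overlay_whiteout(st.st_mode, st.st_rdev):
            names.append(entry.name)
    return sorted(names)
```

With the snapshot fs directories from the mount output as layer_roots and root/.config/configstore as rel_dir, a result of length 1 from layers_containing would correspond to the "not merged" case described above, and whiteouts_in on that one layer would list the device that leaks into the directory listing.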

However, that fix obviously isn't working in hinshun's case. I'm not sure, but I wonder whether fuse-overlayfs is failing to set the expected xattr. When I pulled down hinshun's image I could reproduce the issue within a docker container and saw that the /upperdir/root/.config/configstore/ dir didn't have trusted.overlay.origin set, but instead had trusted.overlay.opaque:

root@bincastle-dev:/home/sipsma# xattr /var/lib/docker/overlay2/l/X263JCQLFIOQYTTIOXHADDPJV5/upperdir/root/.config/configstore/
trusted.overlay.opaque

When I changed it to have trusted.overlay.origin set (xattr -w trusted.overlay.origin "" /var/lib/docker/overlay2/l/X263JCQLFIOQYTTIOXHADDPJV5/upperdir/root/.config/configstore), I was no longer able to reproduce the issue and the whiteout didn't appear inside the docker container.

I'm not sure yet why trusted.overlay.opaque is being set but trusted.overlay.origin isn't; it may be worth asking the fuse-overlayfs devs if that's expected behavior.
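
For anyone who wants to run the same check without the xattr CLI, the following is a rough sketch using Python's os.listxattr (Linux only). The helper names are hypothetical. It has to run as root, because trusted.* attributes are not visible to unprivileged processes, and it must be pointed at the backing directory rather than the merged mount.

```python
import os

OVERLAY_PREFIX = "trusted.overlay."


def overlay_xattr_names(names):
    """Keep only overlayfs-related xattr names, with the prefix stripped."""
    return sorted(
        n[len(OVERLAY_PREFIX):] for n in names if n.startswith(OVERLAY_PREFIX)
    )


def overlay_xattrs(path):
    """List overlay xattrs set on path itself (symlinks are not followed)."""
    return overlay_xattr_names(os.listxattr(path, follow_symlinks=False))


def missing_origin(path):
    """True if the directory carries no trusted.overlay.origin xattr."""
    return "origin" not in overlay_xattrs(path)
```

For the directory above, overlay_xattrs would be expected to report only opaque, and missing_origin would return True, matching the xattr output shown.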

sipsma commented 4 years ago

Okay, looked some more, there's actually yet another layer to this issue (pun intended).

@hinshun pointed out this code in fuse-overlayfs where opaque gets set, which made me realize that fuse-overlayfs was probably not actually able to set xattrs at all because it was using a real overlay mount as the upperdir of the fuse overlay (files/dirs under overlay mounts don't support xattrs).

Based on that code, fuse-overlayfs most likely failed to set any xattrs at all (both opaque and origin). The reason you can still see opaque as an xattr when you pull down the docker image is most likely that fuse-overlayfs falls back to creating a .wh..wh..opq file in place of setting an opaque xattr, which containerd exporters know about and handle correctly: https://github.com/containerd/containerd/blob/0ab7f03feecaa9fc51b63dbb634e74d04d68176f/archive/tar_opts_linux.go#L40-L44
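
As an illustration of the file-based convention mentioned above, here is a small sketch that classifies layer entry names. It is not containerd's implementation. It only assumes the whiteout naming used in image layers: a .wh. prefix marks a deleted file, and .wh..wh..opq marks its parent directory as opaque. The function name and the returned tuple shape are hypothetical.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def classify_layer_entry(path):
    """Classify a layer entry name as (kind, affected_path).

    kind is "opaque" (affected_path is the directory to mark opaque),
    "whiteout" (affected_path is the deleted file), or "regular".
    """
    parent, base = posixpath.split(path.rstrip("/"))
    if base == OPAQUE_MARKER:
        return ("opaque", parent)
    if base.startswith(WHITEOUT_PREFIX):
        return ("whiteout", posixpath.join(parent, base[len(WHITEOUT_PREFIX):]))
    return ("regular", path)
```

Note that this scheme has a marker for opaque directories and for deleted files, but nothing in it corresponds to an origin xattr.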

So, in summary, I think the issue basically comes down to the fact that origin should have been set on /upperdir/root/.config/configstore/ but couldn't because the upperdir of the fuse mount didn't support xattrs.

hinshun commented 4 years ago

> So, in summary, I think the issue basically comes down to the fact that origin should have been set on /upperdir/root/.config/configstore/ but couldn't because the upperdir of the fuse mount didn't support xattrs.

So should the containerd exporter be detecting the .wh..wh..opq files and translating that to xattrs on the directory?

sipsma commented 4 years ago

> So should the containerd exporter be detecting the .wh..wh..opq files and translating that to xattrs on the directory?

Yes, and I think it does, but my understanding is that .wh..wh..opq only handles setting opaque, not origin. As far as I can see, there is no equivalent for the origin xattr, and I'm not sure why.