moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0

Rootless mode doesn't work on Google Container-Optimized OS kernel (CONFIG_SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS?) #879

Closed: AkihiroSuda closed this issue 2 years ago

AkihiroSuda commented 5 years ago
~ $ cat Dockerfile
FROM alpine
~ $ export BUILDKIT_HOST=tcp://127.0.0.1:1234
~ $ buildctl b --frontend dockerfile.v0 --local context=. --local dockerfile=.
[+] Building 0.0s (2/2) FINISHED
 => [internal] load build definition from Dockerfile  0.0s
 => => transferring dockerfile: 49B                   0.0s
 => [internal] load .dockerignore                     0.0s
 => => transferring context: 2B                       0.0s
error: failed to solve: rpc error: code = Unknown desc = failed to read dockerfile: failed to mount /home/user/.local/tmp/buildkit-mount290620720: [{Type:bind Source:/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1 Options:[rbind ro]}]: operation not permitted

But the same rbind,ro mount works inside unshare -mr 🤔

~ $ unshare -mr
buildkitd-649b4db5d4-jskbq:/home/user# mount --rbind -o ro /home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1 /home/user/.local/tmp/buildkit-mount710693070

$ kubectl get nodes -o wide
NAME                                        STATUS    ROLES     AGE       VERSION         EXTERNAL-IP      OS-IMAGE                             KERNEL-VERSION   CONTAINER-RUNTIME
*****************************************   Ready     <none>    19m       v1.12.5-gke.5   **************   Container-Optimized OS from Google   4.14.89+         docker://17.3.2
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  selector:
    matchLabels:
      app: buildkitd
  template:
    metadata:
      labels:
        app: buildkitd
      annotations:
        container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
        container.seccomp.security.alpha.kubernetes.io/buildkitd: unconfined
    spec:
      containers:
      - image: moby/buildkit:v0.4.0-rootless@sha256:3877d091e65429f59919ed5591aaeb863b1889a5314bdfdba5ff9c0dfb2f3ed0
        args:
        - --addr
        - tcp://0.0.0.0:1234
        - --oci-worker-no-process-sandbox
        name: buildkitd
        ports:
        - containerPort: 1234
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  ports:
  - port: 1234
    protocol: TCP
  selector:
    app: buildkitd
AkihiroSuda commented 5 years ago

Note: the same step (w/ --oci-worker-snapshotter=native) succeeds with the following envs:

AkihiroSuda commented 5 years ago

I wonder whether this might be related to the ChromiumOS LSM, but I'm not sure: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/HEAD/security/chromiumos

tonistiigi commented 5 years ago

@AkihiroSuda just to be clear, it does not work without setting securityContext in GKE?

AkihiroSuda commented 5 years ago

No, even privileged: true does not work with the rootless image.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: buildkitd
  name: buildkitd
spec:
  selector:
    matchLabels:
      app: buildkitd
  template:
    metadata:
      labels:
        app: buildkitd
    spec:
      containers:
      - image: moby/buildkit:v0.4.0-rootless@sha256:3877d091e65429f59919ed5591aaeb863b1889a5314bdfdba5ff9c0dfb2f3ed0
        args:
        - --addr
        - tcp://0.0.0.0:1234
        name: buildkitd
        ports:
        - containerPort: 1234
        securityContext:
          privileged: true

With the rootful image, it works (tested with both the overlay and native snapshotters).

tonistiigi commented 5 years ago

@AkihiroSuda So is this a regression in v0.4?

AkihiroSuda commented 5 years ago

No, even v0.3.0-rootless with securityContext privileged: true does not work now.

This is more likely a regression in GKE, although I have no evidence that v0.3.0-rootless ever worked on GKE.

AkihiroSuda commented 5 years ago

v0.4.0-rootless (both overlay and native; both w/ and w/o privileged) works with GKE Ubuntu nodes (kernel 4.15.0-1026-gcp #27-Ubuntu, kube v1.11.7-gke.4, Ubuntu 18.04.1, docker://17.3.2).

This seems to be an issue specific to Google COS.

AkihiroSuda commented 5 years ago

strace:

buildkit (fails) (https://github.com/containerd/containerd/pull/1373)

[pid 15561] mkdirat(AT_FDCWD, "/home/user/.local/tmp/buildkit-mount226977687", 0700) = 0
[pid 15561] mount("/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1", "/home/user/.local/tmp/buildkit-mount226977687", 0xc0001f2848, MS_RDONLY|MS_BIND|MS_REC, NULL) = 0
[pid 15561] mount("", "/home/user/.local/tmp/buildkit-mount226977687", 0xc0001f284e, MS_RDONLY|MS_REMOUNT|MS_BIND|MS_REC, NULL) = -1 EPERM (Operation not permitted)

mount -o rbind,ro (succeeds)

[pid 17658] mount("/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1", "/home/user/.local/tmp/buildkit-mount226977687", NULL, MS_RDONLY|MS_BIND|MS_REC|MS_SILENT, NULL) = 0

This is likely related to SECURITY_CHROMIUMOS_NO_UNPRIVILEGED_UNSAFE_MOUNTS:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/479f3ad5abb7fe6c95aee87a07fc2536ea6039ee/security/chromiumos/Kconfig#21
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/479f3ad5abb7fe6c95aee87a07fc2536ea6039ee/security/chromiumos/lsm.c#133

meysholdt commented 4 years ago

I just tried with the COS nodes of 1.15.4-gke.18, and the regression still seems to be there :(

JesterOrNot commented 4 years ago

Any updates on this issue?

AkihiroSuda commented 4 years ago

Needs help from Google

JesterOrNot commented 4 years ago

So can anything be done?

AkihiroSuda commented 4 years ago

Maybe https://github.com/AkihiroSuda/containerd-fuse-overlayfs can be a solution, but it is blocked by go mod hell.

1297

JesterOrNot commented 4 years ago

Can I do anything to help?

AkihiroSuda commented 4 years ago

Another way is to replace the failing mount flags with the ones used by the "unshare -rm mount" example in the top comment of this issue.

This needs more investigation and help is appreciated, thanks.

JesterOrNot commented 4 years ago

So you want to change the error? (sorry I'm new)

AkihiroSuda commented 4 years ago

The "unshare -rm mount" example doesn't produce any error, so we want to avoid the BuildKit error by using the same mount flags.

AkihiroSuda commented 4 years ago

I assumed the fuse-overlayfs snapshotter might work, but it seems not :cry:

$ buildctl --addr=kube-pod://buildkitd build --frontend dockerfile.v0 --local dockerfile=. --local context=.
[+] Building 0.2s (2/2) FINISHED
 => [internal] load build definition from Dockerfile                                                                       0.2s
 => => transferring dockerfile: 109B                                                                                       0.2s
 => [internal] load .dockerignore                                                                                          0.2s
 => => transferring context: 2B                                                                                            0.2s
error: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to read dockerfile: failed to mount /home/user/.local/tmp/buildkit-mount998042514: [{Type:bind Source:/home/user/.local/share/buildkit/runc-fuse-overlayfs/snapshots/snapshots/1/fs Options:[rbind ro]}]: operation not permitted
AkihiroSuda commented 4 years ago

The issue is not only in the snapshotter:

$ git diff
diff --git a/vendor/github.com/containerd/containerd/mount/mount_linux.go b/vendor/github.com/containerd/containerd/mount/mount_linux.go
index a7edd455..526640be 100644
--- a/vendor/github.com/containerd/containerd/mount/mount_linux.go
+++ b/vendor/github.com/containerd/containerd/mount/mount_linux.go
@@ -93,7 +93,10 @@ func (m *Mount) Mount(target string) error {
        const broflags = unix.MS_BIND | unix.MS_RDONLY
        if oflags&broflags == broflags {
                // Remount the bind to apply read only.
-               return unix.Mount("", target, "", uintptr(oflags|unix.MS_REMOUNT), "")
+               unix.Mount("", target, "", uintptr(oflags|unix.MS_REMOUNT), "")
+               // DO-NOT-MERGE:
+               // ignore err here to avoid hitting https://github.com/moby/buildkit/issues/879#issuecomment-473396544
+               // How can we ensure target to be read-only?
        }
        return nil
 }
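The patch above leaves open how to ensure the target is read-only once the remount error is ignored. One possible check, sketched here as an assumption rather than anything containerd does, is to call statfs(2) on the target after the remount and test the ST_RDONLY bit of f_flags (reported since Linux 2.6.36). `checkReadOnly` and `isReadOnlyFlags` are hypothetical names; the code assumes Linux and Go's standard `syscall` package.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// stRdonly is ST_RDONLY from the statfs(2) f_flags field (value 1 on Linux).
const stRdonly = 0x1

// isReadOnlyFlags reports whether statfs f_flags mark a mount read-only.
func isReadOnlyFlags(flags uint64) bool {
	return flags&stRdonly != 0
}

// checkReadOnly returns an error unless the mount containing path is read-only.
// Hypothetical helper: a post-remount safety check, not containerd code.
func checkReadOnly(path string) error {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return fmt.Errorf("statfs %s: %w", path, err)
	}
	if !isReadOnlyFlags(uint64(st.Flags)) {
		return fmt.Errorf("%s is not mounted read-only", path)
	}
	return nil
}

func main() {
	// The temp dir is normally writable, so this usually prints an error value.
	dir := os.TempDir()
	fmt.Println(dir, "read-only check:", checkReadOnly(dir))
}
```

A caller could run checkReadOnly(target) right after the ignored remount and fail the mount when it returns an error, so a writable snapshot is never silently handed to a build step.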

$ buildctl --addr=kube-pod://buildkitd build --frontend dockerfile.v0 --local dockerfile=. --local context=.
[+] Building 6.1s (5/6)
 => [internal] load build definition from Dockerfile                                                                       0.2s
 => => transferring dockerfile: 109B                                                                                       0.2s
 => [internal] load .dockerignore                                                                                          0.2s
 => => transferring context: 2B                                                                                            0.1s
 => [internal] load metadata for docker.io/library/alpine:latest                                                           3.3s
 => [1/3] FROM docker.io/library/alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d            2.1s
 => => resolve docker.io/library/alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d            0.0s
 => => sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d 1.64kB / 1.64kB                             0.0s
 => => sha256:ddba4d27a7ffc3f86dd6c2f92041af252a1f23a8e742c90e6e1297bfa1bc0c45 528B / 528B                                 0.0s
 => => sha256:c9b1b535fdd91a9855fb7f82348177e5f019329a58c53c47272962dd60f71fc9 2.80MB / 2.80MB                             1.2s
 => => sha256:e7d92cdc71feacf90708cb59182d0df1b911f8ae022d29e8e95d75ca6a99776a 1.51kB / 1.51kB                             0.0s
 => => unpacking docker.io/library/alpine@sha256:ab00606a42621fb68f2ed6ad3c88be54397f981a7b70a79db3d1172b11c4367d          0.1s
 => ERROR [2/3] RUN apk add --no-cache figlet                                                                              0.1s
------
 > [2/3] RUN apk add --no-cache figlet:
#5 0.084 container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/home/user/.local/share/buildkit/runc-native/executor/resolv.conf\\\" to rootfs \\\"/home/user/.local/share/buildkit/runc-native/executor/c9qbj5rmvwnjixos72ek7k7ko/rootfs\\\" at \\\"/home/user/.local/share/buildkit/runc-native/executor/c9qbj5rmvwnjixos72ek7k7ko/rootfs/etc/resolv.conf\\\" caused \\\"operation not permitted\\\"\""
------
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c apk add --no-cache figlet]: buildkit-runc did not terminate successfully
dinvlad commented 3 years ago

Any updates on this?

ei-grad commented 2 years ago

Using an idea from https://github.com/bottlerocket-os/bottlerocket/issues/1934 I added an emptyDir volume to /home/user/.local/share/buildkit and it worked.

AkihiroSuda commented 2 years ago

@ei-grad On GCOS kernel? 👀

AkihiroSuda commented 2 years ago

Isn't this VOLUME working by default? 🤔 https://github.com/moby/buildkit/blob/c9a0f4d2de095591e742d7f411d9ed36a03a1c4e/Dockerfile#L292

ei-grad commented 2 years ago

> On GCOS kernel

Yes, latest GKE with the cos_containerd image. I got a fully functional rootless buildkit using the resource definitions from examples/kubernetes, with only an added emptyDir/hostPath volumeMount for /home/user/.local/share/buildkit.

> Isn't this VOLUME working by default? 🤔

Yes, and that's the problem: default volumes are mounted with the nosuid,nodev flags, which causes a "Permission denied" error when trying to remount the volume without these flags. See the details in the excellent investigation by @bcressey in the linked bottlerocket issue.

ei-grad commented 2 years ago

> I got fully functional rootless buildkit

Just to clarify: this is with the native snapshotter. Using overlayfs/fuse-overlayfs on GCOS requires privileged: true, since the kernel is 5.10 (unprivileged overlayfs mounts need 5.11 or later) 😢.