Does it work if you specify securityContext.privileged?
We do not want to run it in privileged mode.
Asking for diagnostic purposes.
It actually worked, but the goal is still to run it without privileged mode.
One question comes to my mind: since we use the OCI worker by default, what is this containerd mount?
OCI mode still consumes containerd as a library
Are there any logs or commands I can run to help investigate further?
Run cat /proc/mounts in the buildkitd container, and compare the result with Ubuntu nodes, etc.
It worked as expected on both the Amazon Linux 2 and the Ubuntu 20.04-based EKS-optimized images. The output of /proc/mounts is:
overlay / overlay rw,context="system_u:object_r:data_t:s0:c208,c287",relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/71/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/59/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/55/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/50/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/45/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/40/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/425/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/425/work 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,size=65536k,mode=755 0 0
devpts /dev/pts devpts rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
mqueue /dev/mqueue mqueue rw,seclabel,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs ro,seclabel,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup cgroup2 ro,seclabel,nosuid,nodev,noexec,relatime 0 0
/dev/nvme1n1p1 /etc/hosts xfs rw,seclabel,nosuid,nodev,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/nvme1n1p1 /dev/termination-log xfs rw,seclabel,nosuid,nodev,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/nvme1n1p1 /etc/hostname xfs rw,seclabel,nosuid,nodev,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/nvme1n1p1 /etc/resolv.conf xfs rw,seclabel,nosuid,nodev,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
shm /dev/shm tmpfs rw,seclabel,nosuid,nodev,noexec,relatime,size=65536k 0 0
/dev/nvme1n1p1 /home/user/.local/share/buildkit xfs rw,seclabel,nosuid,nodev,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
tmpfs /run/secrets/kubernetes.io/serviceaccount tmpfs ro,seclabel,relatime,size=6931992k 0 0
proc /proc/bus proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/fs proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0
tmpfs /proc/acpi tmpfs ro,context="system_u:object_r:data_t:s0:c208,c287",relatime 0 0
tmpfs /proc/kcore tmpfs rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/keys tmpfs rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/latency_stats tmpfs rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/timer_list tmpfs rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/scsi tmpfs ro,context="system_u:object_r:data_t:s0:c208,c287",relatime 0 0
tmpfs /sys/firmware tmpfs ro,context="system_u:object_r:data_t:s0:c208,c287",relatime 0 0
Can you please take a look and provide feedback so I can open a ticket with the Bottlerocket team with the details? Thanks
Seems relevant to SELinux? Does this work?
securityContext:
seLinuxOptions:
level: s0
type: spc_t
Unfortunately, it did not work. I got the same error.
As far as I can tell this is the same error that was fixed in #3697, but at a different stage in the process.
Running mountsnoop from bcc, I can see that the initial set of bind mounts go OK:
buildkitd 210370 210738 4026533418 mount("/home/user/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/2/fs", "/home/user/.local/tmp/buildkit-mount276192057", "bind", MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOATIME|MS_BIND|MS_REC, "") = 0
buildkitd 210370 210738 4026533418 mount("", "/home/user/.local/tmp/buildkit-mount276192057", "", MS_RDONLY|MS_NOSUID|MS_NODEV|MS_REMOUNT|MS_NOATIME|MS_BIND|MS_REC, "") = 0
...
However, the operation ultimately fails in the call to overlay.WriteUpperdir:
2024-05-06T01:07:57.845201317Z stderr F time="2024-05-06T01:07:57Z" level=warning msg="failed to compute blob by overlay differ (ok=false): failed to write compressed diff: failed to mount /home/user/.local/tmp/containerd-mount1074778686: operation not permitted" span="export layers" spanID=0f5a00d506b35262 traceID=32ade31627d6b338d5e3051b59dea3e2
From the related mountsnoop output, we can see that the nosuid and nodev flags were not passed:
buildkitd 210370 210739 4026533418 mount("/home/user/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/4/fs", "/home/user/.local/tmp/containerd-mount1074778686", "bind", MS_RDONLY|MS_BIND|MS_REC, "") = 0
buildkitd 210370 210739 4026533418 mount("", "/home/user/.local/tmp/containerd-mount1074778686", "", MS_RDONLY|MS_REMOUNT|MS_BIND|MS_REC, "") = -EPERM
overlay.WriteUpperdir calls into mount.WithTempMount, which uses the containerd mount library. It looks like we end up here, and then the remount fails because it doesn't have the equivalent of the UnprivilegedMountFlags logic.
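For context, here is a minimal sketch of that kind of check (my own illustration, mirroring the approach of reading statfs flags; the function name and exact upstream signature are assumptions, not the real containerd API):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// lockedMountFlags mirrors the UnprivilegedMountFlags idea described above.
// It returns the flags on the filesystem containing path that the kernel
// "locks" for mounts created in a user namespace; a read-only remount of a
// bind mount must carry these forward or it fails with EPERM.
func lockedMountFlags(path string) (uintptr, error) {
	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		return 0, err
	}
	var flags uintptr
	for _, f := range []uintptr{unix.MS_RDONLY, unix.MS_NOSUID, unix.MS_NODEV, unix.MS_NOEXEC} {
		if uintptr(st.Flags)&f != 0 {
			flags |= f
		}
	}
	return flags, nil
}

func main() {
	// On Bottlerocket this should report nosuid|nodev for the local storage.
	flags, err := lockedMountFlags("/home/user/.local/tmp")
	if err != nil {
		panic(err)
	}
	fmt.Printf("flags to preserve on remount: %#x\n", flags)
}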
@bcressey Thanks for the analysis. Would you be interested in submitting a PR?
@bcressey if that's okay, I can work on a fix for this.
@swagatbora90 that'd be great! Let me know if I can help advise on setting up a test environment, or testing out a change when ready.
As mentioned by @bcressey, Bottlerocket mounts its local storage with the nosuid and nodev flags as a hardening step, and those flags are among the ones that must be carried forward in subsequent bind mounts.
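To illustrate the kernel behavior (a standalone sketch with illustrative paths, not buildkit code): from an unprivileged mount namespace, the initial bind mount succeeds, but the read-only remount is rejected unless the locked flags are re-specified:

package main

import "golang.org/x/sys/unix"

func main() {
	// Illustrative paths; the source lives on a filesystem mounted nosuid,nodev.
	src := "/home/user/.local/share/buildkit/snapshots/4/fs"
	dst := "/home/user/.local/tmp/containerd-mount"

	// The initial bind mount succeeds even without nosuid/nodev: a bind
	// mount implicitly inherits the source's flags.
	if err := unix.Mount(src, dst, "", unix.MS_BIND|unix.MS_REC, ""); err != nil {
		panic(err)
	}

	// Remounting read-only WITHOUT the locked flags fails with EPERM in a
	// user namespace, matching the mountsnoop output above.
	err := unix.Mount("", dst, "", unix.MS_REMOUNT|unix.MS_BIND|unix.MS_RDONLY, "")
	// err == unix.EPERM on Bottlerocket

	// Carrying the locked flags forward makes the remount succeed.
	err = unix.Mount("", dst, "", unix.MS_REMOUNT|unix.MS_BIND|unix.MS_RDONLY|unix.MS_NOSUID|unix.MS_NODEV, "")
	_ = err
}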
Here is a workaround using a persistent volume (the EBS CSI driver in EKS) instead of emptyDir (which in turn uses Bottlerocket's local storage).
Pod: fsGroup is set to 1000 so the volume is mounted with access for user 1000 and group 1000.
Pod yaml - https://github.com/vtgspk/buildkit-rootless/blob/main/pod.yml
Persistent Volume Claim - https://github.com/vtgspk/buildkit-rootless/blob/main/persistent-claim.yml
Storage class - https://github.com/vtgspk/buildkit-rootless/blob/main/storage-class.yml
This way, I am able to get the buildkitd pod up and running and build images successfully within it, using the EBS mount instead of Bottlerocket's local storage.
@bcressey @AkihiroSuda Added a PR to check and preserve the unprivileged flags before we remount a bind mount as read-only. However, the change alone was not sufficient; I also had to update the above pod spec to mount the /tmp directory from the host.

pod.spec:
apiVersion: v1
kind: Pod
metadata:
name: buildkitd
spec:
containers:
- name: buildkitd
image: public.ecr.aws/e5v3s6y4/buildkit-rootless:rootless
args:
- --addr
- tcp://0.0.0.0:1234
- --oci-worker-no-process-sandbox
- --debug
securityContext:
seccompProfile:
type: Unconfined
runAsUser: 1000
runAsGroup: 1000
volumeMounts:
# The first mount is not needed, but makes it explicit that there
# is a VOLUME here which shows up as a separate mount, which is why
# buildkit is able to find the unprivileged mount flags it needs to
# preserve.
- mountPath: /home/user/.local/share/buildkit
name: buildkitd-1
# The second mount is needed, because otherwise there's no explicit
# mount to inspect for mount options, and the underlying filesystem's
# mount flags are obscured by the overlayfs used for the container's
# rootfs.
- mountPath: /home/user/.local/tmp
name: buildkitd-2
env:
# This is required to align the temporary directory created by buildkit
# with the volume mount for that directory.
- name: XDG_RUNTIME_DIR
value: /home/user/.local/tmp
- name: runner
image: moby/buildkit:rootless
command: [ "/bin/sh", "-c", "--" ]
args: [ "while true; do sleep 30; done;" ]
env:
- name: BUILDKIT_HOST
value: tcp://localhost:1234
volumes:
- name: buildkitd-1
emptyDir: {}
- name: buildkitd-2
emptyDir: {}
Exposing the tmp dir as a bind mount in the container is required; otherwise the directory is just in the container rootfs, its actual mount flags get obscured by overlayfs, and the check for unprivileged flags no longer works. To make this work we need both: 1) the containerd mount library update to preserve the nosuid and nodev flags, and 2) the pod spec update to bind mount the /tmp dir.
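A quick sketch of the obfuscation point (illustrative, using the paths from the pod spec above): statfs inside the container rootfs sees the overlayfs mount, not the underlying Bottlerocket filesystem, so the nosuid/nodev flags only become visible on an explicit volume mount:

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	var root, tmp unix.Statfs_t

	// "/" is the container's overlayfs rootfs; the nosuid,nodev flags of the
	// underlying Bottlerocket local storage are not reflected here.
	if err := unix.Statfs("/", &root); err != nil {
		panic(err)
	}

	// The emptyDir volume at /home/user/.local/tmp is a distinct mount, so
	// its statfs flags expose what must be preserved on remount.
	if err := unix.Statfs("/home/user/.local/tmp", &tmp); err != nil {
		panic(err)
	}

	fmt.Printf("rootfs flags: %#x, tmp mount flags: %#x\n", root.Flags, tmp.Flags)
}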
Let me know if this makes sense. I am also wondering whether we still need #3697, since we are already checking for the flags downstream in containerd. I will test this out next.
Updating to buildkit v0.14.0 resolves the "failed to mount" issue when using the rootless configuration on Bottlerocket.
Hi fellows in buildkit. I know this might have been opened multiple times, like here and here, but I will try to bring it up again with more details so you may be able to help more. I am trying to run buildkit rootless in EKS using Bottlerocket; the infra information is below:
The below pod definition is used:
Note that the runner is actually a custom image that we use in our CI, but it is replaced here with the same buildkit container since it has buildctl available; the buildkit container is the same.
When we run buildctl on the runner, we get the following error:
Bottlerocket is configured with:
I'd really appreciate your help in identifying the missing piece to get this to work. Thank you.