Closed chrischdi closed 1 month ago
The leaked mounts are most probably the result of the wrong mount propagation being used.
The bug it causes could be worked around by making `GetCgroupMounts` prefer the standard paths, i.e. `/sys/fs/cgroup/$controller`. I will look into it.
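Here is a minimal sketch of that preference rule. It assumes cgroup v1 mounts have already been parsed into (mountpoint, controllers) pairs. The `cgroupMount` type and the `pickMounts` function are hypothetical and are not runc's actual `GetCgroupMounts` code. For each controller, a mount under `/sys/fs/cgroup/` wins over any other mount of the same controller. If no such mount exists, the first one seen is kept.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupMount is a hypothetical, simplified view of one cgroup v1 mount.
type cgroupMount struct {
	Mountpoint  string
	Controllers []string
}

// isStandard reports whether a mountpoint lives under /sys/fs/cgroup.
func isStandard(mp string) bool {
	return strings.HasPrefix(mp, "/sys/fs/cgroup/")
}

// pickMounts returns one mountpoint per controller, preferring the
// standard /sys/fs/cgroup/<controller> location over any other
// (possibly leaked) mount of the same hierarchy.
func pickMounts(mounts []cgroupMount) map[string]string {
	chosen := make(map[string]string)
	for _, m := range mounts {
		for _, c := range m.Controllers {
			prev, seen := chosen[c]
			if !seen || (!isStandard(prev) && isStandard(m.Mountpoint)) {
				chosen[c] = m.Mountpoint
			}
		}
	}
	return chosen
}

func main() {
	mounts := []cgroupMount{
		// Hypothetical leaked mount listed before the standard one.
		{"/run/containerd/example/rootfs/sys/fs/cgroup/memory", []string{"memory"}},
		{"/sys/fs/cgroup/memory", []string{"memory"}},
		{"/sys/fs/cgroup/cpu,cpuacct", []string{"cpu", "cpuacct"}},
	}
	chosen := pickMounts(mounts)
	for _, c := range []string{"memory", "cpu", "cpuacct"} {
		fmt.Printf("%s -> %s\n", c, chosen[c])
	}
}
```

It prints `memory -> /sys/fs/cgroup/memory`, even though the leaked memory mount is listed first. Both `cpu` and `cpuacct` resolve to `/sys/fs/cgroup/cpu,cpuacct`.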
Hi @kolyshkin, regarding the wrong mount propagation: is there something we can do about it?
We are using Kubernetes, and the problem occurs only on the CSI node plugin. It needs `Bidirectional` mount propagation for `/var/lib/kubelet` and `HostToContainer` for `/dev`.
The first one is needed because the CSI plugin performs mounts for other containers (before the other pods start). I think `/dev` is needed so that the CSI plugin can see newly attached disks.
Thank you for looking into it :-)
Another way to work around it would be to check that the `parent ID` field (the second field in `/proc/self/mountinfo`) is the same for all cgroup mounts. Alternatively, check that it equals the `mount ID` field of the `/sys/fs/cgroup` mount.
I still don't understand why `GetCgroupMounts` is not picking up the first mount. I know there is a race in the kernel when it comes to serving `/proc/self/mountinfo` (and similar files). In particular, if the next entry to be read is deleted (i.e. the mount is unmounted), the rest of mountinfo is never read. But that does not apply to this case.
When I debugged it, I saw that the entries in `/proc/self/mountinfo` were ordered differently than in the output of a simple `cat /proc/self/mountinfo`. I also don't know why the output was not the same.
In fact, we should always use `/sys/fs/cgroup`, which seems to be the de facto standard these days. It would still be interesting to see a `/proc/self/mountinfo` in which other cgroup entries precede those with a `/sys/fs/cgroup` mountpoint.
We are also experiencing the same issue with the csi-rbd plugin. I found @chrischdi's thread and was able to delete the extra mounts so that kubelet could come up. We are on CoreOS 2345.3.0 (4.19.106).
The extra mounts did not contain the string `/run/containerd/`, and they are not uniform either.
@kolyshkin We have the same problem. I found that the cgroup mounts were not the only ones leaked into the host mount namespace: all mounts in the CSI container, which uses bidirectional mount propagation, were leaked.
```
❯ cat mount|grep 7d0849b82c48
overlay on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/278/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/244/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/15/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/279/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/279/work,index=off,nfs_export=off)
overlay on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/278/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/244/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/15/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/279/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/279/work,index=off,nfs_export=off)
proc on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/proc type proc (rw,nosuid,nodev,noexec,relatime)
proc on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys type sysfs (ro,nosuid,nodev,noexec,relatime)
sysfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys type sysfs (ro,nosuid,nodev,noexec,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
/dev/vdb on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/csi type ext4 (rw,relatime)
/dev/vdb on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/csi type ext4 (rw,relatime)
udev on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev type devtmpfs (rw,nosuid,relatime,size=119520900k,nr_inodes=29880225,mode=755)
devpts on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/shm type tmpfs (rw,nosuid,nodev)
hugetlbfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/mqueue type mqueue (rw,relatime)
udev on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev type devtmpfs (rw,nosuid,relatime,size=119520900k,nr_inodes=29880225,mode=755)
devpts on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/shm type tmpfs (rw,nosuid,nodev)
hugetlbfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/mqueue type mqueue (rw,relatime)
/dev/vdb on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/resolv.conf type ext4 (rw,relatime)
/dev/vdb on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/resolv.conf type ext4 (rw,relatime)
/dev/vdb on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hosts type ext4 (rw,relatime)
/dev/vdb on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hosts type ext4 (rw,relatime)
/dev/vdb on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hostname type ext4 (rw,relatime)
/dev/vdb on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hostname type ext4 (rw,relatime)
/dev/vdb on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet type ext4 (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/5910334a-cfd1-456f-a0e8-188e12b9bd43/volumes/kubernetes.io~secret/kube-proxy-token-74vlm type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/00fe260c-76ad-45f0-add3-58e4bd3b8981/volumes/kubernetes.io~secret/csi-ebs-token-968j7 type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/89ed4967-4749-40ef-8e0c-c8dfd598bbe6/volumes/kubernetes.io~secret/csi-nas-token-qrrdb type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/0f6e4e69-7b22-49a3-a8c1-1976dd377d1c/volumes/kubernetes.io~secret/flannel-token-wc6dz type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/23c56271-a650-4b3e-82c8-8df5084812f5/volumes/kubernetes.io~secret/default-token-ftxhn type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/a3f0f8a1-fb25-4fae-b435-4038a9428cd7/volumes/kubernetes.io~secret/default-token-ftxhn type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/23c56271-a650-4b3e-82c8-8df5084812f5/volumes/kubernetes.io~empty-dir/workdir type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/a3f0f8a1-fb25-4fae-b435-4038a9428cd7/volumes/kubernetes.io~empty-dir/workdir type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/c5262037-7baf-457c-8b9c-b1f768b12b4f/volumes/kubernetes.io~secret/default-token-ftxhn type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/c5262037-7baf-457c-8b9c-b1f768b12b4f/volumes/kubernetes.io~empty-dir/workdir type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/d87fd265-3bc5-41a2-95d5-a4860115b1d2/volumes/kubernetes.io~secret/cattle-credentials type tmpfs (rw,relatime)
tmpfs on /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/d87fd265-3bc5-41a2-95d5-a4860115b1d2/volumes/kubernetes.io~secret/cattle-token-7gwj7 type tmpfs (rw,relatime)
tmpfs on /run/containerd/io
```
Maybe there are some bugs in runc, for example in the `prepareRoot` or `mountToRootfs` functions?
Here are some logs from the leaked CSI container 7d0849b82c486573d1. I observed the leak at Aug 11 14:15:59, and the container start failed at Aug 11 14:16:07. This means the leak happened before the CSI container was running, so I suspect the problem may be in runc.
```
Aug 11 14:15:56 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:56.070737230+08:00" level=info msg="CreateContainer within sandbox \"05e2bfd2393844bba91801e98c93127bcfac944c41bdb5a977ab327b27d6bee4\" for &ContainerMetadata{Name:csi-ebs-driver,Attempt:0,} returns container id \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\""
Aug 11 14:15:56 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:56.071110121+08:00" level=info msg="StartContainer for \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\""
Aug 11 14:15:59 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:59.780516679+08:00" level=warning msg="failed to cleanup rootfs mount" error="failed to unmount target /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: device or resource busy"
Aug 11 14:15:59 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:59.781017810+08:00" level=info msg="shim disconnected" id=7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb
Aug 11 14:15:59 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:59.781054748+08:00" level=warning msg="cleaning up after shim disconnected" id=7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb namespace=k8s.io
Aug 11 14:15:59 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:59.782681956+08:00" level=error msg="collecting metrics for 7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb" error="ttrpc: closed: unknown"
Aug 11 14:16:04 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:04.834477825+08:00" level=warning msg="failed to clean up after shim disconnected" error="unmount rootfs /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: failed to unmount target /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: device or resource busy" id=7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb namespace=k8s.io
Aug 11 14:16:07 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:07.313355781+08:00" level=error msg="failed to delete bundle" error="unmount rootfs /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: failed to unmount target /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: device or resource busy" id=7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb
Aug 11 14:16:07 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:07.313419607+08:00" level=error msg="failed to delete shim" error="2 errors occurred:\n\t* close wait error: context deadline exceeded\n\t* failed to delete bundle: unmount rootfs /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: failed to unmount target /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: device or resource busy\n\n" id=7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb
Aug 11 14:16:07 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:07.313587220+08:00" level=error msg="Failed to pipe stdout of container \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\"" error="reading from a closed fifo"
Aug 11 14:16:07 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:07.313647067+08:00" level=error msg="Failed to pipe stderr of container \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\"" error="reading from a closed fifo"
Aug 11 14:16:07 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:07.314940439+08:00" level=error msg="StartContainer for \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\" failed" error="failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error jailing process inside rootfs: pivot_root .: invalid argument: unknown"
Aug 11 14:16:07 ncjat34u33gu8siupfck0 kubelet[4635]: E0811 14:16:07.315198 4635 remote_runtime.go:251] StartContainer "7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb" from runtime service failed: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error jailing process inside rootfs: pivot_root .: invalid argument: unknown
Aug 11 14:16:07 ncjat34u33gu8siupfck0 kubelet[4635]: I0811 14:16:07.568652 4635 scope.go:111] [topologymanager] RemoveContainer - Container ID: 7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb
Aug 11 14:16:08 ncjat34u33gu8siupfck0 kubelet[4635]: I0811 14:16:08.587152 4635 scope.go:111] [topologymanager] RemoveContainer - Container ID: 7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb
Aug 11 14:16:08 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:08.588202901+08:00" level=info msg="RemoveContainer for \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\""
Aug 11 14:16:08 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:08.597228076+08:00" level=info msg="RemoveContainer for \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\" returns successfully"
```
```
Aug 11 14:15:59 ncjat34u33gu8siupfck0 kubelet[4635]: E0811 14:15:59.510742 4635 pod_workers.go:191] Error syncing pod 150463db-1acb-4296-90d5-1911ff99ad5d ("e9064fed-19360-22319-765df758cf-f875d_aiplay-v2(150463db-1acb-4296-90d5-1911ff99ad5d)"), skipping: failed to "StartContainer" for "e9064fed-19360-22319" with RunContainerError: "failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting \"cgroup\" to rootfs at \"/sys/fs/cgroup\": stat /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/pod150463db-1acb-4296-90d5-1911ff99ad5d/53bb057d03426ffee271042471998769259716007bf32a21c04d227e440ef40b: no such file or directory: unknown"
```
I am experiencing the same leak problem. I have spent some days trying to find the root cause but have not found it yet. However, after inspecting a host where the mount leak occurred and digging into the runc code, I found one obvious problem.
Something within the `prepareRoot(config *configs.Config)` function behaved abnormally: after `rootfsParentMountPrivate(config.Rootfs)` returned with no error, I still found shared rootfs mounts leaked on the host. After hitting the same problem four times, I found that all affected containers had a `RootPropagation` value of 1064960 (i.e. `rshared`). This makes runc configure the root mount in the new mount namespace with `rshared` propagation. By default this option is `rslave`, so I suspect the `"rootPropagation": 1064960` configuration item in config.json is what triggers the issue.
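The numeric `rootPropagation` value is a bitmask of mount(2) flags. The small decoder below hard-codes the flag values from the kernel's mount(2) constants so that it runs anywhere. `propagationName` is a hypothetical helper, not a runc function. It confirms that 1064960 = 0x104000 = MS_SHARED (0x100000) | MS_REC (0x4000), i.e. `rshared`. By contrast, the `rslave` default is MS_SLAVE | MS_REC = 540672.

```go
package main

import "fmt"

// Linux mount(2) propagation flags, as defined by the kernel.
const (
	msRec        = 0x4000   // MS_REC
	msUnbindable = 0x20000  // MS_UNBINDABLE
	msPrivate    = 0x40000  // MS_PRIVATE
	msSlave      = 0x80000  // MS_SLAVE
	msShared     = 0x100000 // MS_SHARED
)

// propagationName turns a numeric propagation bitmask into the names
// used by mount(8) options such as --make-rshared.
func propagationName(flags int) string {
	var name string
	switch {
	case flags&msShared != 0:
		name = "shared"
	case flags&msSlave != 0:
		name = "slave"
	case flags&msPrivate != 0:
		name = "private"
	case flags&msUnbindable != 0:
		name = "unbindable"
	default:
		return "unknown"
	}
	if flags&msRec != 0 {
		name = "r" + name // recursive variant
	}
	return name
}

func main() {
	fmt.Println(propagationName(1064960))        // the value reported above
	fmt.Println(propagationName(msSlave | msRec)) // the rslave default
}
```

It prints `rshared` followed by `rslave`.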
In addition, cgroup mounts are not the only ones that leak. In my environment, all mounts in config.json leaked into the host mount namespace, such as `servercertificates` coming from Kubernetes. My runc crashed at `pivot_root`, and the mounts under rootfs had already taken effect before `pivot_root`.
Here is some information from my case; I hope it is of some use:
Environment information:
- OS: Debian GNU/Linux 9
- Kernel: 5.4.210 amd64
- runc version: v1.0.2
- The leaked container rootfs mounts on the host (the second entry should not exist in the host mount namespace):
3353 84 0:1132 / /var/lib/containerd/state/io.containerd.runtime.v2.task/k8s.io/73eb4f4f0ee2e2ebb66b7db135f8f019550c9111629b898da69bf3053a40af71/rootfs rw,relatime shared:1261
9248 3353 0:1132 / /var/lib/containerd/state/io.containerd.runtime.v2.task/k8s.io/73eb4f4f0ee2e2ebb66b7db135f8f019550c9111629b898da69bf3053a40af71/rootfs rw,relatime shared:1261
(there are also hundreds of mount entries on top of the rootfs mount path, omitted)
- Logs of the crashed runc:
{"level":"error","msg":"container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:134: jailing process inside rootfs caused: pivot_root invalid argument","time":"2024-03-05T17:05:23+08:00"}
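The two host entries above can be checked mechanically. Per proc(5), field 1 of a mountinfo line is the mount ID and field 2 the parent ID, so entry 9248 is mounted on top of 3353 at the same path. A minimal Go sketch that parses such lines (tolerating the truncated form quoted here, which lacks the "-" separator and filesystem type) and flags mounts stacked on a parent with the same mount point; the helper names are hypothetical, and "stacked on the same path" is only a triage heuristic, not something runc checks:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// mountEntry keeps the mountinfo fields used in this sketch.
type mountEntry struct {
	ID, ParentID int
	MountPoint   string
	Optional     []string // optional fields such as "shared:1261"
}

// parseMountinfoLine parses one /proc/<pid>/mountinfo line (format per
// proc(5)): field 1 is the mount ID, field 2 the parent ID, field 5 the
// mount point, and optional fields follow field 6 up to the "-" separator.
func parseMountinfoLine(line string) (mountEntry, error) {
	f := strings.Fields(line)
	if len(f) < 6 {
		return mountEntry{}, fmt.Errorf("short mountinfo line: %q", line)
	}
	id, err := strconv.Atoi(f[0])
	if err != nil {
		return mountEntry{}, err
	}
	parent, err := strconv.Atoi(f[1])
	if err != nil {
		return mountEntry{}, err
	}
	e := mountEntry{ID: id, ParentID: parent, MountPoint: f[4]}
	for _, opt := range f[6:] {
		if opt == "-" {
			break
		}
		e.Optional = append(e.Optional, opt)
	}
	return e, nil
}

// stackedOnSamePath returns entries whose parent mount has the same mount
// point, like mount 9248 sitting on mount 3353 in the report.
func stackedOnSamePath(entries []mountEntry) []mountEntry {
	byID := make(map[int]mountEntry, len(entries))
	for _, e := range entries {
		byID[e.ID] = e
	}
	var out []mountEntry
	for _, e := range entries {
		if p, ok := byID[e.ParentID]; ok && p.MountPoint == e.MountPoint {
			out = append(out, e)
		}
	}
	return out
}

const leakedRootfs = "/var/lib/containerd/state/io.containerd.runtime.v2.task/k8s.io/" +
	"73eb4f4f0ee2e2ebb66b7db135f8f019550c9111629b898da69bf3053a40af71/rootfs"

// sampleLines are the two host mountinfo entries quoted in the report.
var sampleLines = []string{
	"3353 84 0:1132 / " + leakedRootfs + " rw,relatime shared:1261",
	"9248 3353 0:1132 / " + leakedRootfs + " rw,relatime shared:1261",
}

func main() {
	var entries []mountEntry
	for _, l := range sampleLines {
		e, err := parseMountinfoLine(l)
		if err != nil {
			fmt.Println("parse error:", err)
			return
		}
		entries = append(entries, e)
	}
	for _, e := range stackedOnSamePath(entries) {
		fmt.Printf("mount %d sits on parent %d at the same path (%v)\n", e.ID, e.ParentID, e.Optional)
	}
}
```

Both entries carry shared:1261, i.e. they are in the same peer group, which is how a mount made inside the container's namespace can show up on the host.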
rootfsPropagation is rshared because at least one mount is configured for bidirectional propagation.
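A simplified model of that rule, assuming the CRI runtime (containerd here) derives rootfsPropagation from the per-mount Kubernetes mountPropagation settings in roughly this way. The type and function names are hypothetical and this is not containerd's code:

```go
package main

import "fmt"

// propagation mirrors the three Kubernetes mountPropagation modes.
type propagation int

const (
	propNone            propagation = iota // "None" (private)
	propHostToContainer                    // "HostToContainer" (rslave)
	propBidirectional                      // "Bidirectional" (rshared)
)

// rootfsPropagationFor models how the rootfs propagation is chosen from a
// container's mounts: any bidirectional mount forces "rshared", otherwise
// any HostToContainer mount yields "rslave", and with neither the field
// stays empty (runc then uses its rslave default).
func rootfsPropagationFor(mounts []propagation) string {
	result := ""
	for _, p := range mounts {
		switch p {
		case propBidirectional:
			return "rshared"
		case propHostToContainer:
			result = "rslave"
		}
	}
	return result
}

func main() {
	// The CSI node plugin case from this thread: /var/lib/kubelet is
	// Bidirectional and /dev is HostToContainer.
	csi := []propagation{propBidirectional, propHostToContainer}
	fmt.Println(rootfsPropagationFor(csi))
}
```

This is why CSI node plugins, which need Bidirectional propagation, are the containers that end up with an rshared root.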
I encountered the same problem. I also think something goes wrong in the rootfsParentMountPrivate function, even though it does not return an error: rootfsParentMountPrivate does not make the rootfs private. Its implementation is not complicated. Maybe it hit a race condition?
Did you finally locate the cause?
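As far as I can tell from runc's source, rootfsParentMountPrivate only remounts the parent mount MS_PRIVATE when that mount's optional mountinfo fields contain a "shared:N" tag; if the wrong parent is looked up, the real parent is never touched and no error is reported. A minimal sketch of just that check (the helper name isSharedMount is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// isSharedMount reports whether a mount's optional mountinfo fields (the
// space-separated part before the "-" separator) contain a "shared:N" tag,
// i.e. whether the mount belongs to a shared peer group.
func isSharedMount(optionalFields string) bool {
	for _, opt := range strings.Fields(optionalFields) {
		if strings.HasPrefix(opt, "shared:") {
			return true
		}
	}
	return false
}

func main() {
	for _, opts := range []string{"shared:1261", "master:5", "shared:7 master:3", ""} {
		fmt.Printf("%-20q shared=%v\n", opts, isSharedMount(opts))
	}
}
```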
We found that getParentMount, called from rootfsParentMountPrivate, returns the wrong mount point. In the common k8s case it should return the container's overlayfs mount point, but it returned "/run", which is the parent mount of that overlayfs mount point. I suspect the root cause may be a kernel bug: sometimes the newly created overlayfs mount point cannot be observed in the new mount namespace? cc @zhaodiaoer
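To see how a single missing mountinfo entry produces exactly "/run": as far as I can tell, runc's getParentMount collects the mounts whose mount point is a prefix of the rootfs path and picks the longest one. A simplified Go sketch of that lookup (parentMount and the container ID "abc" are hypothetical; the path layout follows the kubelet log above):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// parentMount returns the deepest mount point in mountPoints that contains
// path (the path itself counts). Simplified stand-in for the longest-prefix
// lookup getParentMount performs on parsed mountinfo entries.
func parentMount(path string, mountPoints []string) string {
	path = filepath.Clean(path)
	best := ""
	for _, mp := range mountPoints {
		mp = filepath.Clean(mp)
		contains := mp == "/" || mp == path || strings.HasPrefix(path, mp+"/")
		if contains && len(mp) > len(best) {
			best = mp
		}
	}
	return best
}

func main() {
	const rootfs = "/run/containerd/io.containerd.runtime.v2.task/k8s.io/abc/rootfs"
	complete := []string{"/", "/run", rootfs}
	truncated := []string{"/", "/run"} // rootfs entry skipped while reading mountinfo
	fmt.Println(parentMount(rootfs, complete))  // the rootfs mount itself
	fmt.Println(parentMount(rootfs, truncated)) // "/run"
}
```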
I think I have found the root cause of this problem; let me explain the complete picture:
TL;DR: Currently, the mechanism provided by github.com/moby/sys/mountinfo to obtain a complete mount list has a bug in its implementation on Linux. Traversing procfs is not safe, and entries can be missing from the traversal result. I have also raised an issue for this.
Detailed version:
1. Set the propagation of the root mount in the container's new mount namespace to RootPropagation, which here is rshared;
2. Change the propagation of the parent mount of the container's rootfs directory to private.
The second step is a prerequisite for using pivot_root to switch the container's root filesystem to its own rootfs, and the problem also occurs in the second step. When runc then uses pivot_root to switch to the container's rootfs directory, the pivot_root requirement that the new root must not sit under a shared mount is not met, so runc logs an error like "error jailing process inside rootfs: pivot_root .: invalid argument" and exits, but the leaked mounts still exist on the host.
CC @LastNight1997 @fuweid
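The failure mode of step 2 can be simulated on a toy mount table. This sketch assumes every mount in the new namespace is shared (because of rshared root propagation) and that the observed mountinfo list may be missing the rootfs entry; simulateStep2 and the container ID "abc" are hypothetical. A "still shared" result is the state in which pivot_root(2) refuses to run (its man page requires the parent mount of the new root not to be shared), consistent with the "invalid argument" in the logs, and in which new mounts under rootfs can propagate to the host:

```go
package main

import (
	"fmt"
	"strings"
)

// mnt is a toy model of a mount: its mount point and whether it is shared.
type mnt struct {
	Point  string
	Shared bool
}

// deepestParent returns the index of the deepest mount in ms that contains
// path (the path itself counts), or -1 if none does.
func deepestParent(path string, ms []mnt) int {
	best := -1
	for i, m := range ms {
		contains := m.Point == "/" || m.Point == path || strings.HasPrefix(path, m.Point+"/")
		if contains && (best < 0 || len(m.Point) > len(ms[best].Point)) {
			best = i
		}
	}
	return best
}

// simulateStep2 looks up the parent mount of rootfs in the observed
// (possibly incomplete) list and makes it private if it is shared. It
// returns the mount that was made private and whether the mount that
// actually holds rootfs is still shared afterwards.
func simulateStep2(rootfs string, actual, observed []mnt) (privatized string, stillShared bool) {
	shared := make(map[string]bool, len(actual))
	for _, m := range actual {
		shared[m.Point] = m.Shared
	}
	if i := deepestParent(rootfs, observed); i >= 0 && observed[i].Shared {
		privatized = observed[i].Point
		shared[privatized] = false // the MS_PRIVATE remount hits this mount
	}
	j := deepestParent(rootfs, actual)
	if j < 0 {
		return privatized, false
	}
	return privatized, shared[actual[j].Point]
}

func main() {
	const rootfs = "/run/containerd/io.containerd.runtime.v2.task/k8s.io/abc/rootfs"
	actual := []mnt{{"/", true}, {"/run", true}, {rootfs, true}}
	for _, observed := range [][]mnt{actual, actual[:2]} {
		p, s := simulateStep2(rootfs, actual, observed)
		fmt.Printf("made private: %s, rootfs mount still shared: %v\n", p, s)
	}
}
```

With the complete table the rootfs mount itself is made private; with the truncated table "/run" is made private instead and the rootfs mount stays shared, matching both observations above.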
I am well aware of the mountinfo reading bug; in fact, I have a whole repo devoted to the issue: https://github.com/kolyshkin/procfs-test.
This is a kernel bug, which is fixed in kernel v5.8 (see the above repo for details). Distro vendors should either upgrade their kernels, or backport the relevant patch (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f6c61f96f2d97cbb5f7fa85607bc398f843ff0f).
Theoretically, we can add a retry in getParentMount. Practically, this is very bad performance-wise.
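Since the fix landed upstream in v5.8, one rough way to guess whether a host is affected is to compare the running kernel's release string. A minimal Go sketch, assuming /proc/sys/kernel/osrelease is readable; kernelAtLeast is a hypothetical helper, and because distro vendors may backport the patch, a version check is only a heuristic:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a kernel release string such as "5.4.210"
// or "6.1.0-13-amd64" is at least major.minor. It cannot detect backports.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	// Keep only the leading digits of the minor component.
	minStr := parts[1]
	end := 0
	for end < len(minStr) && minStr[end] >= '0' && minStr[end] <= '9' {
		end++
	}
	mnr, err := strconv.Atoi(minStr[:end])
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && mnr >= minor), nil
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	release := strings.TrimSpace(string(data))
	ok, err := kernelAtLeast(release, 5, 8)
	if err != nil {
		fmt.Println("cannot parse kernel release:", err)
		return
	}
	fmt.Printf("kernel %s is v5.8 or newer: %v\n", release, ok)
}
```

For the reports in this thread (5.4.210 and 4.19.106) it returns false, unless the vendor kernel carries the backport.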
@zhaodiaoer thanks for investigating that. If you can figure out a reliable way to know if/when we should re-try reading mounts in getParentMount
(so we can re-read it conditionally, not always), we can do that. But I'm opposed to always re-reading mounts.
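One candidate signal for a conditional retry, offered here only as an assumption: the container rootfs is normally already a mount point when runc starts, so if the lookup returns something shallower than rootfs itself, an entry may have been skipped and mountinfo could be re-read. A Go sketch with a hypothetical, injectable mountReader hook (in runc the real reading would go through github.com/moby/sys/mountinfo):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// mountReader is a hypothetical hook returning the mount points listed in
// /proc/self/mountinfo.
type mountReader func() ([]string, error)

// deepestMount returns the deepest mount point containing path.
func deepestMount(path string, mps []string) string {
	best := ""
	for _, mp := range mps {
		mp = filepath.Clean(mp)
		if (mp == "/" || mp == path || strings.HasPrefix(path, mp+"/")) && len(mp) > len(best) {
			best = mp
		}
	}
	return best
}

// parentMountWithRetry re-reads mountinfo only when the answer looks
// suspicious (shallower than rootfs itself). After maxRetries extra reads
// it falls back to the last answer, since a plain-directory rootfs is legal.
func parentMountWithRetry(rootfs string, read mountReader, maxRetries int) (string, error) {
	rootfs = filepath.Clean(rootfs)
	last := ""
	for attempt := 0; attempt <= maxRetries; attempt++ {
		mps, err := read()
		if err != nil {
			return "", err
		}
		last = deepestMount(rootfs, mps)
		if last == rootfs {
			return last, nil
		}
	}
	return last, nil
}

func main() {
	const rootfs = "/run/containerd/io.containerd.runtime.v2.task/k8s.io/abc/rootfs" // hypothetical ID
	reads := 0
	// Fake reader: the first read misses the rootfs entry, later reads see it.
	flaky := func() ([]string, error) {
		reads++
		if reads == 1 {
			return []string{"/", "/run"}, nil
		}
		return []string{"/", "/run", rootfs}, nil
	}
	got, err := parentMountWithRetry(rootfs, flaky, 2)
	fmt.Println(got, err, "reads:", reads)
}
```

The weakness is the one raised above: when rootfs is a plain directory rather than a mount, every call pays for maxRetries extra reads, so this is not obviously a reliable trigger.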
> Currently, the mechanism provided by github.com/moby/sys/mountinfo to obtain a complete mount list has a bug in its implementation on Linux.
Alas, this is a kernel bug, not a mountinfo package bug (otherwise we should have it fixed by now).
Can anyone who has seen this issue test the proposed patch in https://github.com/opencontainers/runc/pull/4417 and report (in that PR, not here!) if it fixes the issue?
> This is a kernel bug, which is fixed in kernel v5.8 (see the above repo for details).
Yes, with this kernel bug fix, this mount leak issue will be solved. Very important information, thanks!
> If you can figure out a reliable way to know if/when we should re-try reading mounts in getParentMount (so we can re-read it conditionally, not always), we can do that.
I am still thinking of a solution, but I haven't come up with one yet...
I came here while debugging https://github.com/kubernetes/kubernetes/issues/91023 together with containerd v1.3.4, which ships runc:
We have identified that there is some kind of leak of cgroup mounts, which results in, e.g., the following lines in /proc/self/mountinfo:
When such a leak exists, runc tries to use a wrong cgroup during prepareRootfs in libcontainer/rootfs_linux.
I was able to reproduce the bug by:
This results in the above output.
I was able to debug a bit into runc and found the following: in this case the function GetCgroupMounts(false) returns the wrong mount point for the systemd cgroup (/run/foo/rootfs/sys/fs/cgroup/systemd instead of /sys/fs/cgroup/systemd). This is because in /proc/self/mountinfo the mount /run/foo/rootfs/sys/fs/cgroup/systemd occurred before /sys/fs/cgroup/systemd (which seems weird to me, because when I look at /proc/self/mountinfo myself and process it, they are ordered the other way around).
As a POC I added the following patch to runc, which fixed it for my test case:
Of course this does not work for upstream; at the very least, to fix the original leak I would need to match on something like /run/containerd/.
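The workaround suggested earlier in the thread, making GetCgroupMounts prefer the standard /sys/fs/cgroup/$controller paths, could look roughly like this. pickCgroupMount is a hypothetical helper, not runc's API, and it is not the POC patch mentioned above (which is not shown here):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// pickCgroupMount chooses the mount point for a cgroup v1 controller when
// mountinfo lists more than one candidate (for example a leaked
// /run/foo/rootfs/sys/fs/cgroup/systemd next to /sys/fs/cgroup/systemd).
// It prefers the conventional /sys/fs/cgroup/<controller> path and falls
// back to the first candidate otherwise.
func pickCgroupMount(controller string, candidates []string) (string, bool) {
	if len(candidates) == 0 {
		return "", false
	}
	standard := filepath.Join("/sys/fs/cgroup", controller)
	for _, c := range candidates {
		if filepath.Clean(c) == standard {
			return standard, true
		}
	}
	return candidates[0], true
}

func main() {
	// Order as observed in the report: the leaked mount comes first.
	candidates := []string{"/run/foo/rootfs/sys/fs/cgroup/systemd", "/sys/fs/cgroup/systemd"}
	if mp, ok := pickCgroupMount("systemd", candidates); ok {
		fmt.Println(mp)
	}
}
```

Preferring the standard path makes the result independent of the order in which entries appear in /proc/self/mountinfo, which is exactly what went wrong here.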