Open chrischdi opened 4 years ago
The leaked mounts are most probably the result of the wrong mount propagation being used.
The bug this causes could be worked around by making `GetCgroupMounts` prefer the standard paths, i.e. `/sys/fs/cgroup/$controller`. I will look into it.
Hi @kolyshkin, regarding the wrong mount propagation: is there something we can do about it?
We are using Kubernetes and the problem occurs only on the CSI node plugin, which needs bidirectional mount propagation for `/var/lib/kubelet` and `HostToContainer` for `/dev`.
The first one is needed because the CSI plugin performs mounts for other containers (before the other pod starts), and I think `/dev` is needed for CSI to see newly attached disks.
Thank you for looking into it :-)
Another approach to work around it would be to check that the `parent ID` field (the second field in `/proc/self/mountinfo`) is the same for all mounts (or is equal to the `mount ID` field of the `/sys/fs/cgroup` mount).
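To make the parent ID idea concrete, here is a minimal sketch in Go. It is not runc code: the names `mountEntry`, `parseMountinfo` and `cgroupMountsUnder` and the sample data are made up for illustration, and it assumes mount points contain no escaped characters. It keeps only those cgroup mounts whose parent ID equals the mount ID of the `/sys/fs/cgroup` mount.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// mountEntry keeps only the mountinfo fields this sketch needs.
type mountEntry struct {
	ID, ParentID int
	MountPoint   string
	FSType       string
}

// sampleMountinfo is made-up data in /proc/self/mountinfo format.
// The third line stands in for a leaked mount.
const sampleMountinfo = `25 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw
30 25 0:25 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:9 - tmpfs tmpfs ro,mode=755
900 899 0:26 / /run/foo/rootfs/sys/fs/cgroup/systemd rw,nosuid shared:10 - cgroup cgroup rw,xattr,name=systemd
31 30 0:26 / /sys/fs/cgroup/systemd rw,nosuid shared:10 - cgroup cgroup rw,xattr,name=systemd
`

// parseMountinfo parses mountinfo text. Field 1 is the mount ID, field 2 the
// parent ID, field 5 the mount point; the filesystem type follows " - ".
func parseMountinfo(data string) ([]mountEntry, error) {
	var entries []mountEntry
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		pre, post, ok := strings.Cut(line, " - ")
		if !ok {
			return nil, fmt.Errorf("no separator in %q", line)
		}
		f, p := strings.Fields(pre), strings.Fields(post)
		if len(f) < 5 || len(p) < 1 {
			return nil, fmt.Errorf("too few fields in %q", line)
		}
		id, err := strconv.Atoi(f[0])
		if err != nil {
			return nil, err
		}
		parent, err := strconv.Atoi(f[1])
		if err != nil {
			return nil, err
		}
		entries = append(entries, mountEntry{ID: id, ParentID: parent, MountPoint: f[4], FSType: p[0]})
	}
	return entries, sc.Err()
}

// cgroupMountsUnder returns the cgroup mounts whose parent ID equals the
// mount ID of the first mount found at root.
func cgroupMountsUnder(entries []mountEntry, root string) []mountEntry {
	rootID := -1
	for _, e := range entries {
		if e.MountPoint == root {
			rootID = e.ID
			break
		}
	}
	if rootID == -1 {
		return nil
	}
	var out []mountEntry
	for _, e := range entries {
		if e.FSType == "cgroup" && e.ParentID == rootID {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	entries, err := parseMountinfo(sampleMountinfo)
	if err != nil {
		panic(err)
	}
	for _, e := range cgroupMountsUnder(entries, "/sys/fs/cgroup") {
		fmt.Println(e.MountPoint)
	}
}
```

With the sample data only `/sys/fs/cgroup/systemd` survives the filter, because the mount under `/run/foo/rootfs` has a different parent.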
I still don't understand why `GetCgroupMounts` is not picking up the first mount. I know there is a race in the kernel when it comes to serving `/proc/self/mountinfo` (and similar files): in particular, if the next entry to be read is deleted (i.e. the mount is unmounted), the rest of mountinfo is never read. But that is not applicable to this case.
When I debugged into it, I saw that the entries in `/proc/self/mountinfo` were ordered differently compared to when I did a simple `cat /proc/self/mountinfo`. But I also don't know why the output was not the same.
In fact we should always use `/sys/fs/cgroup`; this seems to be the de facto standard these days. It would still be interesting to see a `/proc/self/mountinfo` where other cgroup entries precede those with a `/sys/fs/cgroup` mountpoint.
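A rough sketch of what "prefer the standard path" could look like, in Go. This is not the actual `GetCgroupMounts` code; `preferStandard` and its parameters are hypothetical. Given all mountpoints found for one hierarchy, it returns the one under `/sys/fs/cgroup` if present and only otherwise falls back to the first one found.

```go
package main

import (
	"fmt"
	"path"
)

// cgroupRoot is the conventional cgroup v1 mount root.
const cgroupRoot = "/sys/fs/cgroup"

// preferStandard picks a mountpoint for the hierarchy whose directory name is
// name (e.g. "systemd" or "cpu,cpuacct"). It prefers cgroupRoot/name and
// falls back to the first candidate, which mimics "first match wins".
func preferStandard(name string, mountpoints []string) (string, bool) {
	want := path.Join(cgroupRoot, name)
	for _, mp := range mountpoints {
		if mp == want {
			return mp, true
		}
	}
	if len(mountpoints) > 0 {
		return mountpoints[0], true
	}
	return "", false
}

func main() {
	// The leaked mount is listed first, as in the report.
	found := []string{
		"/run/foo/rootfs/sys/fs/cgroup/systemd",
		"/sys/fs/cgroup/systemd",
	}
	mp, _ := preferStandard("systemd", found)
	fmt.Println(mp)
}
```

The fallback keeps systems with non-standard cgroup locations working, at the cost of still being order-dependent there.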
We are also experiencing the same thing with the csi-rbd plugin. I found @chrischdi's thread and was able to delete the extra mounts in order for kubelet to come up. We are on CoreOS: 4.19.106 (CoreOS 2345.3.0).
The extra mounts didn't contain the string `/run/containerd/`. They aren't uniform either.
@kolyshkin We have the same problem too, and I found that not only the cgroup mounts were leaked to the host mount namespace: all mounts in the CSI container, which uses bidirectional mount propagation, were leaked.
❯ cat mount|grep 7d0849b82c48 overlay on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs) type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/278/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/244/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/15/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/279/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/279/work,index=off,nfs_export=off) overlay on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs) type overlay (rw,relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/278/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/244/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/15/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/279/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/279/work,index=off,nfs_export=off) proc on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/proc](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/proc) type proc (rw,nosuid,nodev,noexec,relatime) proc on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/proc](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/proc) type proc (rw,nosuid,nodev,noexec,relatime) sysfs on 
/run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys) type sysfs (ro,nosuid,nodev,noexec,relatime) sysfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys) type sysfs (ro,nosuid,nodev,noexec,relatime) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup) type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup) type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=755) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/systemd](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/systemd) type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/systemd](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/systemd) type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd) cgroup on 
/run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpuset](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpuset) type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpuset](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpuset) type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpu,cpuacct](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpu,cpuacct) type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpu,cpuacct](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/cpu,cpuacct) type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/memory](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/memory) type cgroup (rw,nosuid,nodev,noexec,relatime,memory) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/memory](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/memory) type cgroup (rw,nosuid,nodev,noexec,relatime,memory) cgroup on 
/run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/devices](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/devices) type cgroup (rw,nosuid,nodev,noexec,relatime,devices) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/devices](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/devices) type cgroup (rw,nosuid,nodev,noexec,relatime,devices) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/net_cls,net_prio](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/net_cls,net_prio) type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/net_cls,net_prio](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/net_cls,net_prio) type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/hugetlb](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/hugetlb) type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/hugetlb](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/hugetlb) type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb) cgroup on 
/run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/perf_event](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/perf_event) type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/perf_event](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/perf_event) type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/blkio](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/blkio) type cgroup (rw,nosuid,nodev,noexec,relatime,blkio) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/blkio](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/blkio) type cgroup (rw,nosuid,nodev,noexec,relatime,blkio) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/freezer](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/freezer) type cgroup (rw,nosuid,nodev,noexec,relatime,freezer) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/freezer](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/freezer) type cgroup (rw,nosuid,nodev,noexec,relatime,freezer) cgroup on 
/run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/pids](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/pids) type cgroup (rw,nosuid,nodev,noexec,relatime,pids) cgroup on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/pids](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/fs/cgroup/pids) type cgroup (rw,nosuid,nodev,noexec,relatime,pids) /dev/vdb on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/csi](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/csi) type ext4 (rw,relatime) /dev/vdb on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/csi](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/csi) type ext4 (rw,relatime) udev on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev) type devtmpfs (rw,nosuid,relatime,size=119520900k,nr_inodes=29880225,mode=755) devpts on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/pts](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/pts) type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/shm](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/shm) type tmpfs 
(rw,nosuid,nodev) hugetlbfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/hugepages](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/hugepages) type hugetlbfs (rw,relatime,pagesize=2M) mqueue on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/mqueue](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/mqueue) type mqueue (rw,relatime) udev on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev) type devtmpfs (rw,nosuid,relatime,size=119520900k,nr_inodes=29880225,mode=755) devpts on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/pts](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/pts) type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/shm](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/shm) type tmpfs (rw,nosuid,nodev) hugetlbfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/hugepages](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/hugepages) type hugetlbfs (rw,relatime,pagesize=2M) mqueue on 
/run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/mqueue](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/dev/mqueue) type mqueue (rw,relatime) /dev/vdb on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/resolv.conf](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/resolv.conf) type ext4 (rw,relatime) /dev/vdb on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/resolv.conf](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/resolv.conf) type ext4 (rw,relatime) /dev/vdb on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hosts](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hosts) type ext4 (rw,relatime) /dev/vdb on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hosts](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hosts) type ext4 (rw,relatime) /dev/vdb on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hostname](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hostname) type ext4 (rw,relatime) /dev/vdb on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hostname](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/etc/hostname) type ext4 (rw,relatime) /dev/vdb on 
/run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet) type ext4 (rw,relatime) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/5910334a-cfd1-456f-a0e8-188e12b9bd43/volumes/kubernetes.io~secret/kube-proxy-token-74vlm](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/5910334a-cfd1-456f-a0e8-188e12b9bd43/volumes/kubernetes.io~secret/kube-proxy-token-74vlm) type tmpfs (rw,relatime) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/00fe260c-76ad-45f0-add3-58e4bd3b8981/volumes/kubernetes.io~secret/csi-ebs-token-968j7](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/00fe260c-76ad-45f0-add3-58e4bd3b8981/volumes/kubernetes.io~secret/csi-ebs-token-968j7) type tmpfs (rw,relatime) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/89ed4967-4749-40ef-8e0c-c8dfd598bbe6/volumes/kubernetes.io~secret/csi-nas-token-qrrdb](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/89ed4967-4749-40ef-8e0c-c8dfd598bbe6/volumes/kubernetes.io~secret/csi-nas-token-qrrdb) type tmpfs (rw,relatime) tmpfs on 
/run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/0f6e4e69-7b22-49a3-a8c1-1976dd377d1c/volumes/kubernetes.io~secret/flannel-token-wc6dz](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/0f6e4e69-7b22-49a3-a8c1-1976dd377d1c/volumes/kubernetes.io~secret/flannel-token-wc6dz) type tmpfs (rw,relatime) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/23c56271-a650-4b3e-82c8-8df5084812f5/volumes/kubernetes.io~secret/default-token-ftxhn](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/23c56271-a650-4b3e-82c8-8df5084812f5/volumes/kubernetes.io~secret/default-token-ftxhn) type tmpfs (rw,relatime) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/a3f0f8a1-fb25-4fae-b435-4038a9428cd7/volumes/kubernetes.io~secret/default-token-ftxhn](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/a3f0f8a1-fb25-4fae-b435-4038a9428cd7/volumes/kubernetes.io~secret/default-token-ftxhn) type tmpfs (rw,relatime) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/23c56271-a650-4b3e-82c8-8df5084812f5/volumes/kubernetes.io~empty-dir/workdir](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/23c56271-a650-4b3e-82c8-8df5084812f5/volumes/kubernetes.io~empty-dir/workdir) type tmpfs (rw,relatime) tmpfs on 
/run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/a3f0f8a1-fb25-4fae-b435-4038a9428cd7/volumes/kubernetes.io~empty-dir/workdir](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/a3f0f8a1-fb25-4fae-b435-4038a9428cd7/volumes/kubernetes.io~empty-dir/workdir) type tmpfs (rw,relatime) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/c5262037-7baf-457c-8b9c-b1f768b12b4f/volumes/kubernetes.io~secret/default-token-ftxhn](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/c5262037-7baf-457c-8b9c-b1f768b12b4f/volumes/kubernetes.io~secret/default-token-ftxhn) type tmpfs (rw,relatime) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/c5262037-7baf-457c-8b9c-b1f768b12b4f/volumes/kubernetes.io~empty-dir/workdir](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/c5262037-7baf-457c-8b9c-b1f768b12b4f/volumes/kubernetes.io~empty-dir/workdir) type tmpfs (rw,relatime) tmpfs on /run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/d87fd265-3bc5-41a2-95d5-a4860115b1d2/volumes/kubernetes.io~secret/cattle-credentials](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/d87fd265-3bc5-41a2-95d5-a4860115b1d2/volumes/kubernetes.io~secret/cattle-credentials) type tmpfs (rw,relatime) tmpfs on 
/run/containerd/io.containerd.runtime.v2.task/[k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/d87fd265-3bc5-41a2-95d5-a4860115b1d2/volumes/kubernetes.io~secret/cattle-token-7gwj7](http://k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/var/lib/kubelet/pods/d87fd265-3bc5-41a2-95d5-a4860115b1d2/volumes/kubernetes.io~secret/cattle-token-7gwj7) type tmpfs (rw,relatime) tmpfs on /run/containerd/io
Maybe there are some bugs in runc, such as in the `prepareRoot` or `mountToRootfs` functions?
Here are some logs of the leaked CSI container 7d0849b82c486573d1. I observed the leak at Aug 11 14:15:59, and the container start failed at Aug 11 14:16:07, which means the leak happened before the CSI container was running, so I suspect the problem may be in runc.
Aug 11 14:15:56 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:56.070737230+08:00" level=info msg="CreateContainer within sandbox \"05e2bfd2393844bba91801e98c93127bcfac944c41bdb5a977ab327b27d6bee4\" for &ContainerMetadata{Name:csi-ebs-driver,Attempt:0,} returns container id \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\""
Aug 11 14:15:56 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:56.071110121+08:00" level=info msg="StartContainer for \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\""
Aug 11 14:15:59 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:59.780516679+08:00" level=warning msg="failed to cleanup rootfs mount" error="failed to unmount target /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: device or resource busy"
Aug 11 14:15:59 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:59.781017810+08:00" level=info msg="shim disconnected" id=7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb
Aug 11 14:15:59 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:59.781054748+08:00" level=warning msg="cleaning up after shim disconnected" id=7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb namespace=k8s.io
Aug 11 14:15:59 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:15:59.782681956+08:00" level=error msg="collecting metrics for 7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb" error="ttrpc: closed: unknown"
Aug 11 14:16:04 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:04.834477825+08:00" level=warning msg="failed to clean up after shim disconnected" error="unmount rootfs /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: failed to unmount target /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: device or resource busy" id=7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb namespace=k8s.io
Aug 11 14:16:07 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:07.313355781+08:00" level=error msg="failed to delete bundle" error="unmount rootfs /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: failed to unmount target /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: device or resource busy" id=7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb
Aug 11 14:16:07 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:07.313419607+08:00" level=error msg="failed to delete shim" error="2 errors occurred:\n\t* close wait error: context deadline exceeded\n\t* failed to delete bundle: unmount rootfs /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: failed to unmount target /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs: device or resource busy\n\n" id=7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb
Aug 11 14:16:07 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:07.313587220+08:00" level=error msg="Failed to pipe stdout of container \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\"" error="reading from a closed fifo"
Aug 11 14:16:07 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:07.313647067+08:00" level=error msg="Failed to pipe stderr of container \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\"" error="reading from a closed fifo"
Aug 11 14:16:07 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:07.314940439+08:00" level=error msg="StartContainer for \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\" failed" error="failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error jailing process inside rootfs: pivot_root .: invalid argument: unknown"
Aug 11 14:16:07 ncjat34u33gu8siupfck0 kubelet[4635]: E0811 14:16:07.315198 4635 remote_runtime.go:251] StartContainer "7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb" from runtime service failed: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error jailing process inside rootfs: pivot_root .: invalid argument: unknown
Aug 11 14:16:07 ncjat34u33gu8siupfck0 kubelet[4635]: I0811 14:16:07.568652 4635 scope.go:111] [topologymanager] RemoveContainer - Container ID: 7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb
Aug 11 14:16:08 ncjat34u33gu8siupfck0 kubelet[4635]: I0811 14:16:08.587152 4635 scope.go:111] [topologymanager] RemoveContainer - Container ID: 7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb
Aug 11 14:16:08 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:08.588202901+08:00" level=info msg="RemoveContainer for \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\""
Aug 11 14:16:08 ncjat34u33gu8siupfck0 containerd[3828]: time="2023-08-11T14:16:08.597228076+08:00" level=info msg="RemoveContainer for \"7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb\" returns successfully"
Aug 11 14:15:59 ncjat34u33gu8siupfck0 kubelet[4635]: E0811 14:15:59.510742 4635 pod_workers.go:191] Error syncing pod 150463db-1acb-4296-90d5-1911ff99ad5d ("e9064fed-19360-22319-765df758cf-f875d_aiplay-v2(150463db-1acb-4296-90d5-1911ff99ad5d)"), skipping: failed to "StartContainer" for "e9064fed-19360-22319" with RunContainerError: "failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting \"cgroup\" to rootfs at \"/sys/fs/cgroup\": stat /run/containerd/io.containerd.runtime.v2.task/k8s.io/7d0849b82c486573d1648409c675bf3f67113db630639075ed25ad73a122b2bb/rootfs/sys/pod150463db-1acb-4296-90d5-1911ff99ad5d/53bb057d03426ffee271042471998769259716007bf32a21c04d227e440ef40b: no such file or directory: unknown"
I have been experiencing the same leak problem, and I have spent some days trying to find the root cause but haven't found it yet. After inspecting the host environment where the mount leak occurred and digging into the runc code, I found one obvious problem:
the `prepareRoot(config *configs.Config)` function worked abnormally, because after the `rootfsParentMountPrivate(config.Rootfs)` function returned with no error, I still found shared rootfs mounts leaked on the host. And after experiencing the same problem four times, I found that all containers with issues had `RootPropagation` configured with a value of 1064960 (aka `rshared`), which makes runc configure the root mount in the new mount namespace with the `rshared` propagation option. By default, this option is `rslave`, so I guess the `"rootPropagation": 1064960` configuration item in config.json is what triggers the issue.
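For anyone wondering where 1064960 comes from: it is `MS_SHARED | MS_REC`. A small Go sketch that decodes such a value follows. The constants are copied from the Linux `<linux/mount.h>` header so the snippet needs no external package; `describePropagation` is a made-up helper, not something from runc.

```go
package main

import "fmt"

// Mount propagation flag values as defined in <linux/mount.h>.
const (
	msRec        = 0x4000  // MS_REC
	msUnbindable = 1 << 17 // MS_UNBINDABLE
	msPrivate    = 1 << 18 // MS_PRIVATE
	msSlave      = 1 << 19 // MS_SLAVE
	msShared     = 1 << 20 // MS_SHARED
)

// describePropagation turns a numeric propagation value into the name used
// by mount(8), with an "r" prefix when MS_REC is set.
func describePropagation(flags int) string {
	var name string
	switch flags &^ msRec {
	case msShared:
		name = "shared"
	case msSlave:
		name = "slave"
	case msPrivate:
		name = "private"
	case msUnbindable:
		name = "unbindable"
	default:
		return fmt.Sprintf("unknown(%#x)", flags)
	}
	if flags&msRec != 0 {
		return "r" + name
	}
	return name
}

func main() {
	fmt.Println(describePropagation(1064960))         // rshared
	fmt.Println(describePropagation(msSlave | msRec)) // rslave
}
```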
In addition, it is not only cgroup mounts that leak: in my environment, all mounts in config.json leaked into the host mount namespace, such as `servercertificates` coming from k8s. My runc crashed at `pivot_root`, and the mounts under rootfs took effect before `pivot_root`.
Here is some information from my problem case; I hope it can be of some use:
Environment information:
The leaked container rootfs mount on the host (the second item should not exist in the host mount namespace):
3353 84 0:1132 / /var/lib/containerd/state/io.containerd.runtime.v2.task/k8s.io/73eb4f4f0ee2e2ebb66b7db135f8f019550c9111629b898da69bf3053a40af71/rootfs rw,relatime shared:1261
9248 3353 0:1132 / /var/lib/containerd/state/io.containerd.runtime.v2.task/k8s.io/73eb4f4f0ee2e2ebb66b7db135f8f019550c9111629b898da69bf3053a40af71/rootfs rw,relatime shared:1261
(there are also hundreds of mount items under the rootfs mount path, omitted)
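One way to spot entries like the second one is to look for mounts whose parent mount has the same mount point, i.e. a mount stacked directly on top of another. Below is a rough Go sketch of that check, fed with the two lines above. `stackedMounts` is a hypothetical helper, and such stacking is only a hint worth inspecting on the host, not proof of a leak.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sample holds the two entries quoted above (first seven fields only).
const sample = `3353 84 0:1132 / /var/lib/containerd/state/io.containerd.runtime.v2.task/k8s.io/73eb4f4f0ee2e2ebb66b7db135f8f019550c9111629b898da69bf3053a40af71/rootfs rw,relatime shared:1261
9248 3353 0:1132 / /var/lib/containerd/state/io.containerd.runtime.v2.task/k8s.io/73eb4f4f0ee2e2ebb66b7db135f8f019550c9111629b898da69bf3053a40af71/rootfs rw,relatime shared:1261`

// stackedMounts returns the mount IDs of entries whose parent mount has the
// same mount point. Only fields 1 (mount ID), 2 (parent ID) and 5 (mount
// point) of each mountinfo line are used.
func stackedMounts(data string) ([]int, error) {
	type entry struct {
		id, parent int
		mountPoint string
	}
	var entries []entry
	pointByID := map[int]string{}
	for _, line := range strings.Split(data, "\n") {
		f := strings.Fields(line)
		if len(f) == 0 {
			continue
		}
		if len(f) < 5 {
			return nil, fmt.Errorf("too few fields in %q", line)
		}
		id, err := strconv.Atoi(f[0])
		if err != nil {
			return nil, err
		}
		parent, err := strconv.Atoi(f[1])
		if err != nil {
			return nil, err
		}
		entries = append(entries, entry{id: id, parent: parent, mountPoint: f[4]})
		pointByID[id] = f[4]
	}
	var stacked []int
	for _, e := range entries {
		if mp, ok := pointByID[e.parent]; ok && mp == e.mountPoint {
			stacked = append(stacked, e.id)
		}
	}
	return stacked, nil
}

func main() {
	ids, err := stackedMounts(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)
}
```

For the two lines above this reports mount ID 9248, whose parent 3353 sits at the same path.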
Logs of crashed runc:
{"level":"error","msg":"container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:134: jailing process inside rootfs caused: pivot_root invalid argument","time":"2024-03-05T17:05:23+08:00"}
I'm coming over here from debugging https://github.com/kubernetes/kubernetes/issues/91023 together with containerd v1.3.4 which ships runc:
We have identified that there is some kind of leak of cgroup mounts which results in e.g. the following lines in `/proc/self/mountinfo`:
When such a leak does exist, runc tries to use a wrong cgroup during `libcontainer/rootfs_linux`'s `prepareRootfs`.
I was able to reproduce the bug by:
This results in the above output.
I was able to debug a bit into runc here and found the following:
The function `GetCgroupMounts(false)` returns in this case the wrong mountpoint for the systemd cgroup (`/run/foo/rootfs/sys/fs/cgroup/systemd` instead of `/sys/fs/cgroup/systemd`).
This is because in `/proc/self/mountinfo` the mount `/run/foo/rootfs/sys/fs/cgroup/systemd` occurred before `/sys/fs/cgroup/systemd` (which seems weird to me, because looking at `/proc/self/mountinfo` myself and processing it would order them the other way around).
As a POC I added the following patch to runc which fixed it for my test case:
Of course this does not work for upstream; at least to fix the original leak I would need to match on something like `/run/containerd/`.