Open sibidharan opened 3 months ago
Hi @sibidharan, thanks for trying Sysbox and reporting the issue.
When I launch KASM container with sysbox with a GPU by sharing --device=/dev/dri/renderD128
I don't know whether sharing the device will work; it will probably show up as nobody:nogroup inside the rootless Sysbox container. There's a long-standing GitHub issue about adding support for GPUs inside Sysbox containers, but the work is not yet complete.
Regarding the continuous unmounts: maybe there's a bug in Sysbox, but let me provide a bit of context so you can better understand what's going on (and hopefully help us root-cause the problem).
Sysbox intercepts the mount and umount syscalls inside the container. When intercepting the umount syscall, one of the things Sysbox does is check whether the unmount is targeting one of the sysbox-fs mounts inside the container (e.g., /sys/kernel, /sys/devices/virtual, and a few others). If so, it ignores the unmount for security reasons; otherwise a process inside the container could tear down the sysbox-fs mounts and likely bypass the kernel-resource namespacing that these mounts provide.
My guess is that systemd inside the Sysbox container is unmounting /run/systemd/mount-rootfs/sys/devices/virtual, which must be a sysbox-fs mount too (i.e., it must be a bind-mount of /sys/devices/virtual at a different mountpoint). You can confirm by running findmnt inside the Sysbox container. Nevertheless, Sysbox should allow that mount to be unmounted, because it's not the original /sys/devices/virtual mount but rather a bind-mount of it. My guess is that due to some bug, Sysbox is getting confused and ignoring the unmount, and that in turn is causing systemd to re-issue the unmount call (ad infinitum).
Let's start by having you post the output of findmnt inside the Sysbox container, to figure out what's going on ...
Thanks!
I don't know whether sharing the device will work; it will probably show up as nobody:nogroup inside the rootless Sysbox container. There's a long-standing GitHub issue about adding support for GPUs inside Sysbox containers, but the work is not yet complete.
Yes, for this I created a udev rule that chmods /dev/dri/renderD128 to 666, and I am not passing the actual card0 node but just the renderD128 node, which actually works inside Sysbox; I am able to use GPU acceleration very well by sharing the renderD128 node. It does indeed show up as nobody:nogroup, but with the help of the udev rule I am able to use the character device without any issue.
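A hypothetical sketch of such a rule (the rule file name 99-render-node.rules is illustrative; the match keys assume a standard DRM render node):

```shell
# Install a udev rule that makes the DRM render node world read/write, so it is
# usable even when ownership maps to nobody:nogroup inside the container.
cat <<'EOF' | sudo tee /etc/udev/rules.d/99-render-node.rules
KERNEL=="renderD128", SUBSYSTEM=="drm", MODE="0666"
EOF

# Reload the rules and re-trigger DRM device events so the mode takes effect.
sudo udevadm control --reload-rules
sudo udevadm trigger --subsystem-match=drm
```

Note that MODE="0666" makes the GPU accessible to every user on the host, which is a deliberate security trade-off.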
As for the findmnt, here is the output from inside the Sysbox container; it is not a rootless setup, and it uses /sbin/init as PID 1.
TARGET SOURCE FSTYPE OPTIONS
/ . shiftfs rw,relatime
├─/sys sysfs sysfs rw,nosuid,nodev,noexec,relatime
│ ├─/sys/firmware tmpfs tmpfs ro,relatime,uid=165536,gid=165536,inode64
│ ├─/sys/fs/cgroup tmpfs tmpfs ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,u
│ │ ├─/sys/fs/cgroup/systemd
│ │ │ systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,name=systemd
│ │ ├─/sys/fs/cgroup/net_cls,net_prio
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
│ │ ├─/sys/fs/cgroup/cpuset
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,cpuset
│ │ ├─/sys/fs/cgroup/freezer
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,freezer
│ │ ├─/sys/fs/cgroup/devices
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,devices
│ │ ├─/sys/fs/cgroup/hugetlb
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,hugetlb
│ │ ├─/sys/fs/cgroup/memory
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,memory
│ │ ├─/sys/fs/cgroup/cpu,cpuacct
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
│ │ ├─/sys/fs/cgroup/rdma
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,rdma
│ │ ├─/sys/fs/cgroup/misc
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,misc
│ │ ├─/sys/fs/cgroup/perf_event
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,perf_event
│ │ ├─/sys/fs/cgroup/blkio
│ │ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,blkio
│ │ └─/sys/fs/cgroup/pids
│ │ cgroup cgroup rw,nosuid,nodev,noexec,relatime,pids
│ ├─/sys/devices/virtual
│ │ │ sysboxfs[/sys/devices/virtual] fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permi
│ │ └─/sys/devices/virtual/powercap
│ │ tmpfs tmpfs ro,relatime,uid=165536,gid=165536,inode64
│ ├─/sys/kernel sysboxfs[/sys/kernel] fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permi
│ └─/sys/module/nf_conntrack/parameters
│ sysboxfs[/sys/module/nf_conntrack/parameters] fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permi
├─/proc proc proc rw,nosuid,nodev,noexec,relatime
│ ├─/proc/bus proc[/bus] proc ro,nosuid,nodev,noexec,relatime
│ ├─/proc/fs proc[/fs] proc ro,nosuid,nodev,noexec,relatime
│ ├─/proc/irq proc[/irq] proc ro,nosuid,nodev,noexec,relatime
│ ├─/proc/sysrq-trigger
│ │ proc[/sysrq-trigger] proc ro,nosuid,nodev,noexec,relatime
│ ├─/proc/acpi tmpfs tmpfs ro,relatime,uid=165536,gid=165536,inode64
│ ├─/proc/keys udev[/null] devtmpfs rw,nosuid,relatime,size=40632724k,nr_inodes=10158181,mode=7
│ ├─/proc/timer_list udev[/null] devtmpfs rw,nosuid,relatime,size=40632724k,nr_inodes=10158181,mode=7
│ ├─/proc/scsi tmpfs tmpfs ro,relatime,uid=165536,gid=165536,inode64
│ ├─/proc/swaps sysboxfs[/proc/swaps] fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permi
│ ├─/proc/sys sysboxfs[/proc/sys] fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permi
│ └─/proc/uptime sysboxfs[/proc/uptime] fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permi
├─/dev tmpfs tmpfs rw,nosuid,size=65536k,mode=755,uid=165536,gid=165536,inode6
│ ├─/dev/mqueue mqueue mqueue rw,nosuid,nodev,noexec,relatime
│ ├─/dev/pts devpts devpts rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666
│ ├─/dev/shm shm tmpfs rw,nosuid,nodev,noexec,relatime,size=65536k,uid=165536,gid=
│ ├─/dev/null udev[/null] devtmpfs rw,nosuid,relatime,size=40632724k,nr_inodes=10158181,mode=7
│ ├─/dev/random udev[/random] devtmpfs rw,nosuid,relatime,size=40632724k,nr_inodes=10158181,mode=7
│ ├─/dev/kmsg udev[/null] devtmpfs rw,nosuid,relatime,size=40632724k,nr_inodes=10158181,mode=7
│ ├─/dev/full udev[/full] devtmpfs rw,nosuid,relatime,size=40632724k,nr_inodes=10158181,mode=7
│ ├─/dev/tty udev[/tty] devtmpfs rw,nosuid,relatime,size=40632724k,nr_inodes=10158181,mode=7
│ ├─/dev/zero udev[/zero] devtmpfs rw,nosuid,relatime,size=40632724k,nr_inodes=10158181,mode=7
│ ├─/dev/urandom udev[/urandom] devtmpfs rw,nosuid,relatime,size=40632724k,nr_inodes=10158181,mode=7
│ └─/dev/dri/renderD128
│ udev[/dri/renderD128] devtmpfs rw,nosuid,relatime,size=40632724k,nr_inodes=10158181,mode=7
├─/run tmpfs tmpfs rw,nosuid,nodev,relatime,size=65536k,mode=755,uid=165536,gi
│ ├─/run/user/1000 tmpfs tmpfs rw,nosuid,nodev,relatime,size=419428k,nr_inodes=104857,mode
│ └─/run/lock tmpfs tmpfs rw,nosuid,nodev,noexec,relatime,size=4096k,uid=165536,gid=1
├─/var/lib/buildkit /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/buildkit/0c944f10badddd149a0c05eee3fdf96a2ac36258e51362040cca226301595fae]
│ ext4 rw,relatime
├─/var/labsstorage /dev/vdb[/storage/bc8f9027a554affb095f58827a850269]
│ xfs rw,relatime,idmapped,attr2,inode64,logbufs=8,logbsize=32k,n
├─/etc/resolv.conf /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/docker/containers/0c944f10badddd149a0c05eee3fdf96a2ac36258e51362040cca226301595fae/resolv.conf]
│ ext4 rw,relatime,idmapped
├─/etc/hostname /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/docker/containers/0c944f10badddd149a0c05eee3fdf96a2ac36258e51362040cca226301595fae/hostname]
│ ext4 rw,relatime,idmapped
├─/etc/hosts /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/docker/containers/0c944f10badddd149a0c05eee3fdf96a2ac36258e51362040cca226301595fae/hosts]
│ ext4 rw,relatime,idmapped
├─/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
│ /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/containerd/0c944f10badddd149a0c05eee3fdf96a2ac36258e51362040cca226301595fae]
│ ext4 rw,relatime
├─/var/lib/docker /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/docker/0c944f10badddd149a0c05eee3fdf96a2ac36258e51362040cca226301595fae]
│ ext4 rw,relatime
├─/var/lib/rancher/k3s
│ /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/rancher-k3s/0c944f10badddd149a0c05eee3fdf96a2ac36258e51362040cca226301595fae]
│ ext4 rw,relatime
├─/var/lib/rancher/rke2
│ /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/rancher-rke2/0c944f10badddd149a0c05eee3fdf96a2ac36258e51362040cca226301595fae]
│ ext4 rw,relatime
├─/var/lib/kubelet /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/kubelet/0c944f10badddd149a0c05eee3fdf96a2ac36258e51362040cca226301595fae]
│ ext4 rw,relatime
├─/var/lib/k0s /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/k0s/0c944f10badddd149a0c05eee3fdf96a2ac36258e51362040cca226301595fae]
│ ext4 rw,relatime
├─/usr/src/linux-headers-5.15.0-107
│ /dev/mapper/ubuntu--vg-ubuntu--lv[/usr/src/linux-headers-5.15.0-107]
│ ext4 ro,relatime,idmapped
├─/usr/src/linux-headers-5.15.0-107-generic
│ /dev/mapper/ubuntu--vg-ubuntu--lv[/usr/src/linux-headers-5.15.0-107-generic]
│ ext4 ro,relatime,idmapped
└─/usr/lib/modules/5.15.0-107-generic
/dev/mapper/ubuntu--vg-ubuntu--lv[/usr/lib/modules/5.15.0-107-generic]
ext4 ro,relatime,idmapped
Hi @sibidharan, thanks for the findmnt output, but I don't see the mount on /run/systemd/mount-rootfs/sys/devices/virtual in it. I think you need to take the findmnt output when the problem actually occurs (?).
Looking again at this, I think systemd is actually recursively bind-mounting the container's rootfs (/) to /run/systemd/mount-rootfs/, and then unmounting /run/systemd/mount-rootfs/sys/devices/virtual from it, but Sysbox is ignoring that last unmount. Not sure though (this is why the findmnt at the time the problem occurs would help).
Assuming I am right, you can try the following to manually reproduce (inside the Sysbox container):
$ mkdir -p /run/test/mount-rootfs
$ mount --rbind / /run/test/mount-rootfs # recursively bind mount the container's root (/) to /run/test/mount-rootfs
$ umount /run/test/mount-rootfs/sys/devices/virtual
After the last umount, check if the mount at /run/test/mount-rootfs/sys/devices/virtual was removed or if it remains in place.
My guess is that systemd inside the Sysbox container is unmounting /run/systemd/mount-rootfs/sys/devices/virtual, which must be a sysbox-fs mount too (i.e., it must be a bind-mount of /sys/devices/virtual at a different mountpoint).
Thank you for this valuable input; I was able to boil down the process that's issuing the infinite calls under /sbin/init, called (crub_all) - it looks non-standard and I am unable to find any info about (crub_all) on the internet, but killing this process stops the infinite unmount calls. To temporarily address this issue, I am issuing a kill command for this process shortly after the container is up, but I am not sure if that is the right approach, since it feels a bit hacky.
I think you need to take the findmnt output when the problem actually occurs (?).
I indeed took this findmnt from inside the Sysbox container. Do you want to see the host's findmnt as well?
Assuming I am right, you can try the following to manually reproduce (inside the Sysbox container):
$ mkdir -p /run/test/mount-rootfs
$ mount --rbind / /run/test/mount-rootfs # recursively bind mount the container's root (/) to /run/test/mount-rootfs
$ umount /run/test/mount-rootfs/sys/devices/virtual
After the last umount, check if the mount at /run/test/mount-rootfs/sys/devices/virtual was removed or if it remains in place.
Here is the output:
root@kali:/var/labsdata# mkdir -p /run/test/mount-rootfs
root@kali:/var/labsdata# mount --rbind / /run/test/mount-rootfs
mount: /run/test/mount-rootfs: permission denied.
dmesg(1) may have more information after failed mount system call.
root@kali:/var/labsdata# umount /run/test/mount-rootfs/sys/devices/virtual
umount: /run/test/mount-rootfs/sys/devices/virtual: no mount point specified.
root@kali:/var/labsdata#
Thank you for this valuable input; I was able to boil down the process that's issuing the infinite calls under /sbin/init, called (crub_all) - it looks non-standard and I am unable to find any info about (crub_all) on the internet, but killing this process stops the infinite unmount calls.
I see ... interesting; it must be related to systemd though, since it's trying to unmount under a systemd path (/run/systemd/mount-rootfs/sys/devices/virtual). Maybe looking at the systemd source code will yield more info about it.
I indeed took this findmnt from inside the Sysbox container. Do you want to see the host's findmnt as well?
Sorry, I meant to take the findmnt when the crub_all process is issuing the infinite umount syscalls, so I can see the mountpoint it's trying to unmount.
It may be that you need to issue findmnt in a loop too, until the findmnt output reports a mount under /run/systemd/mount-rootfs/sys/devices/virtual (?)
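One way to sketch that loop (poll_mount is a hypothetical helper; in the container you would point it at /proc/1/mountinfo, which can be read far more often than watch allows):

```shell
# Hypothetical helper: poll a mountinfo-style file for a pattern, since the
# bind mount may exist only for an instant. Example call inside the container:
#   poll_mount /proc/1/mountinfo /run/systemd/mount-rootfs/sys/devices/virtual
poll_mount() {
    file=$1 pattern=$2
    for _ in $(seq 1 20); do            # ~1 s total; tune count/interval as needed
        if grep -q "$pattern" "$file" 2>/dev/null; then
            echo "found"
            return 0
        fi
        sleep 0.05
    done
    echo "timeout"
    return 1
}
```

If it prints "found", the transient mount was caught and a full findmnt at that moment would show it.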
Yes, I ran findmnt with watch -n 1 "findmnt | grep virtual" while (crub_all) was sending infinite umount calls, but nothing much showed up in the findmnt output. Should I increase the watch frequency? I will try this and update you.
Yes, I ran findmnt with watch -n 1 "findmnt | grep virtual" while (crub_all) was sending infinite umount calls, but nothing much showed up in the findmnt output. Should I increase the watch frequency? I will try this and update you.
Got it ... yes increase it to 0.5 secs (the minimum supported I believe): watch -n 0.5 ...
Hopefully that works; if it doesn't then we need to think of another approach to debug it ...
If that doesn't work, maybe the unmount is happening in a dedicated mount namespace. In that case, from inside the Sysbox container, find the PID of the crub_all process that performs the unmount, and then try:
$ cat /proc/<pid>/mountinfo | grep "/run/systemd/mount-rootfs/sys/devices/virtual"
Or maybe:
$ nsenter -a -t <pid>
$ findmnt
$ exit
Hi @ctalledo, sorry for the delayed response.
I found more info about the (crub_all) process - it belongs to e2scrub_all.timer and e2scrub_reap.service:
root@kali:/var/labsdata# systemctl status e2scrub_reap
● e2scrub_reap.service - Remove Stale Online ext4 Metadata Check Snapshots
Loaded: loaded (/usr/lib/systemd/system/e2scrub_reap.service; enabled; preset: disabled)
Active: activating (start) since Sat 2024-06-08 11:31:03 UTC; 26min ago
Docs: man:e2scrub_all(8)
Main PID: 45746 ((crub_all))
Tasks: 1 (limit: 95232)
Memory: 236.0K ()
CGroup: /system.slice/e2scrub_reap.service
└─45746 "(crub_all)"
Jun 08 11:31:03 kali.selfmade.ninja systemd[1]: Starting e2scrub_reap.service - Remove Stale Online ext4 Metadata Check Snapshots...
and this timer starts the service automatically every Sunday, and I guess this is not needed inside the container - am I correct? Is disabling these services enough, or does this have to be fixed in Sysbox as well?
root@kali:/var/labsdata# systemctl status e2scrub_all.timer
● e2scrub_all.timer - Periodic ext4 Online Metadata Check for All Filesystems
Loaded: loaded (/usr/lib/systemd/system/e2scrub_all.timer; enabled; preset: enabled)
Active: active (waiting) since Sat 2024-06-08 11:15:58 UTC; 48min ago
Trigger: Sun 2024-06-09 03:10:26 UTC; 15h left
Triggers: ● e2scrub_all.service
Jun 08 11:15:58 kali.selfmade.ninja systemd[1]: Started e2scrub_all.timer - Periodic ext4 Online Metadata Check for All Filesystems.
Got it ... yes increase it to 0.5 secs (the minimum supported I believe):
watch -n 0.5 ...
Hopefully that works; if it doesn't then we need to think of another approach to debug it ...
However, nothing showed up under watch, so I am trying:
$ cat /proc/<pid>/mountinfo | grep "/run/systemd/mount-rootfs/sys/devices/virtual"
root@kali:/usr/lib/systemd/system# cat /proc/87684/mountinfo | grep "/run/systemd/mount-rootfs/sys/devices/virtual"
45771 45596 0:6467 /sys/devices/virtual /run/systemd/mount-rootfs/sys/devices/virtual rw,nosuid,nodev,relatime - fuse sysboxfs rw,user_id=0,group_id=0,default_permissions,allow_other
45772 45771 0:6576 / /run/systemd/mount-rootfs/sys/devices/virtual/powercap ro,relatime - tmpfs tmpfs ro,uid=165536,gid=165536,inode64
nsenter didn't work, by the way. Hope this is enough info for you; let me know if you want more.
Hi @sibidharan,
Thanks for the extra info.
Is disabling these services enough, or does this have to be fixed in Sysbox as well?
Since we are having trouble understanding the issue fully, I can't yet say if it's a Sysbox bug or not.
I think the easiest path would be to disable the e2scrub services inside the container (they are probably running at the host level anyway, so having them inside the container is likely redundant).
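Assuming the units are those shown in the systemctl output above, disabling them inside the container would look roughly like this:

```shell
# Stop and disable the e2scrub units inside the Sysbox container;
# --now also stops any currently running instance.
systemctl disable --now e2scrub_reap.service e2scrub_all.timer

# Optionally mask them, so nothing (including dependencies) can start them again.
systemctl mask e2scrub_reap.service e2scrub_all.timer
```

Masking is the stronger option: it links the units to /dev/null so even a manual start fails.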
I can confirm that disabling all e2scrub-related services fixes the issue for now.
Hey @ctalledo, Thank you for looking into that. I’m experiencing the same problem, but with a slight difference. In my case, e2scrub is not running inside the container at all, and disabling the service on the Sysbox host also doesn't resolve the issue. The CPU is still at 100% usage due to Sysbox FS. Logs are pretty much similar. Is there any solution for this?
The issue appeared out of nowhere after rebooting the host. I didn’t perform any updates or make any changes. I simply rebooted the hosts, and the problem appeared on three out of five of them. Furthermore, when I try to set up new hosts, they also exhibit this issue right from the start. This is very strange, and there seems to be some dependency on something that I can’t figure out.
Hi @bushev,
The CPU is still at 100% usage due to Sysbox FS
Mmm ... strange. Must be the workload inside the sysbox container triggering the problem.
Is there an image I can use to reproduce? Any info that allows me to reproduce this locally would help.
For context, one area that could trigger high CPU usage by sysbox-fs is when the Sysbox container executes lots of mount or umount system calls, as those are trapped by Sysbox. Another possible area is if the Sysbox container executes lots of accesses to /proc/sys/..., as that is also partially emulated by Sysbox.
We are not making many requests to /proc/sys, and we are also not performing many mount or umount operations from inside the container. However, on the host machine there is constant mounting and unmounting happening, with an average of around a thousand active mounts. These mounts are created and removed as users log in and out of the system.
Our image is over 20 GB, so it’s not easy to share it directly. I’ll need to slim it down and make sure the issue still reproduces, which will take some time. If the problem persists, I’ll get back to you.
Since this issue affected our production environment, I had to find a quick workaround to keep the service running. The only solution I’ve found so far is to restart the Sysbox service on the host immediately after starting each new container. This drops the CPU load from 80% down to just a few percent. So far, I haven’t noticed any disruptions in the operation of the containers or the applications running inside them.
I wanted to share this workaround here in case anyone else encounters the same problem and needs a temporary solution.
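On a systemd-managed host, the workaround would look roughly like the sketch below (my-image and my-container are placeholders; the service name sysbox is what the standard package installs, adjust if yours differs):

```shell
# Hypothetical wrapper for the workaround: start a container, then bounce the
# Sysbox services on the host to shed the runaway sysbox-fs CPU usage.
docker run -d --runtime=sysbox-runc --name my-container my-image
sudo systemctl restart sysbox
```

Note that restarting Sysbox while containers are running is normally discouraged, so treat this strictly as a stopgap.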
Hi, I can reproduce a similar issue using the nestybox docker image. Hope that helps :)
services:
dind-sandbox:
image: nestybox/ubuntu-noble-systemd-docker:latest
container_name: dind_sandbox
restart: unless-stopped
runtime: sysbox-runc
Note: Enabling debug logs on sysbox-fs slows it down enough that it does not use all the CPU.
sysbox-runc
edition: Community Edition (CE)
version: 0.6.4
commit: 085502643ea5281652c6984eed9797872f22698a
built at: Sat Apr 6 16:43:31 UTC 2024
built by: Cesar Talledo
oci-specs: 1.1.0+dev
sysbox-mgr
edition: Community Edition (CE)
version: 0.6.4
commit: 03f5d7bc584fdcb2319b2c1831bd58581185fc1c
built at: Sat Apr 6 16:43:43 UTC 2024
built by: Cesar Talledo
sysbox-fs
edition: Community Edition (CE)
version: 0.6.4
commit: 1a678b72ac430009739fa6596b824f29b1f7fe2e
built at: Sat Apr 6 16:43:40 UTC 2024
built by: Cesar Talledo
[EDIT]: Added sysbox infos
Hi @Kl0ven, thanks for reporting.
Yes, I am able to reproduce with the nestybox/ubuntu-noble-systemd image. There must be something in that image that is causing the high CPU usage; I'll investigate.
Thanks.
For the nestybox/ubuntu-noble-systemd image, I confirmed that it's the e2scrub_reap.service that is causing the high CPU usage inside the Sysbox container.
I need to understand what that service is doing and why it's consuming so many CPU cycles when running inside a Sysbox container. I suspect it's walking the container filesystem and causing high CPU utilization once it hits the files under /proc and /sys emulated by Sysbox, or doing many mount/unmount syscalls trapped by Sysbox.
I disabled the service and CPU utilization goes back to normal.
sudo systemctl stop e2scrub_reap.service
I opted for a quick solution of disabling the e2scrub_reap.service in the nestybox/ubuntu-noble-systemd image. The rationale is here: https://github.com/nestybox/dockerfiles/pull/38
I've pushed the new image already to Dockerhub, so you should not see the problem using that image.
A "proper fix" will require digging down a bit more to figure out what that service is doing when running inside a Sysbox container that is causing CPU utilization to be so high.
Should the related timer also be disabled?
and this timer starts the service automatically every Sunday, and I guess this is not needed inside the container - am I correct? Is disabling these services enough, or does this have to be fixed in Sysbox as well?
root@kali:/var/labsdata# systemctl status e2scrub_all.timer
● e2scrub_all.timer - Periodic ext4 Online Metadata Check for All Filesystems
Loaded: loaded (/usr/lib/systemd/system/e2scrub_all.timer; enabled; preset: enabled)
Active: active (waiting) since Sat 2024-06-08 11:15:58 UTC; 48min ago
Trigger: Sun 2024-06-09 03:10:26 UTC; 15h left
Triggers: ● e2scrub_all.service
Hi @sibidharan,
Thanks; yes, we should disable that e2scrub_all.timer; I'll do that in the nestybox/ubuntu-noble-systemd image (and derivatives of it).
When I launch a KASM container with Sysbox with a GPU by sharing --device=/dev/dri/renderD128, sysbox-fs logs go crazy and CPU usage goes high. I enabled logs and I see this. If I restart the sysbox-fs service, the issue goes away temporarily on already-deployed containers (though I am unable to docker exec into the running containers afterwards), but if I deploy a new container the issue starts again, while sharing devices or somewhere else (?).
Any reason what causes the infinite loop of /run/systemd/mount-rootfs/sys/devices/virtual unmount calls that goes away when sysbox-fs is restarted?
Log File: sysbox-fs.log
(After some research...)
I can see a lot of "Received umount syscall from pid 1092497" for different targets, and they seem to go through fine. I searched for the first occurrence of umount in the log file, tracing every umount call. From line 7379 of the log file we can see the first occurrence of an umount call to /run/systemd/mount-rootfs/sys/devices/virtual that gets ignored, and from then on it's just an infinite loop; for every container I deploy with a device this just adds up, and the log file is full of these messages. I have to turn off the debug log, or else it consumes a lot of storage. This just doesn't stop, but it only happens if I pass --device=/dev/dri/renderD128, and with the little knowledge I have, I understand that these infinite umount calls should be related to the device I passed, somehow causing an infinite loop.
I went through the code located at https://github.com/nestybox/sysbox-fs/blob/master/nsenter/utils.go - this file could potentially enter a cleanup loop that repeatedly sends unmount calls, which later get ignored by seccomp, as shown in the log, from here: https://github.com/nestybox/sysbox-fs/blob/4c2bc153f33af1bd30a227a14ecfc8174ff280d5/seccomp/umount.go#L128
Can we skip unmounting these targets that are for sure going to be ignored by seccomp, thus saving a lot of CPU? Is my understanding of what's going on correct? If so, how can we solve this issue?
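One way to watch the unmount loop directly, without enabling sysbox-fs debug logs, is to trace the unmount syscalls of the offending process from the host (a sketch; on Linux the underlying syscall is umount2, and <pid> is a placeholder for the host-side PID of the process issuing the calls, e.g. crub_all):

```shell
# Attach to the process for 10 seconds and print only its unmount syscalls;
# -f follows any children it forks. Replace <pid> with the real host PID.
sudo timeout 10 strace -f -e trace=umount2 -p <pid>
```

If the loop is active, the output should show the same umount2 target repeating many times per second.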