FFock opened 1 month ago
To answer part of my question myself: the allow-immutable-remounts=true option can be activated on the host by editing the sysbox-fs unit file
vi /lib/systemd/system/sysbox-fs.service
and changing the line ExecStart=/usr/bin/sysbox-fs to:
ExecStart=/usr/bin/sysbox-fs --allow-immutable-remounts=true
Then restart sysbox with:
systemctl stop sysbox
systemctl start sysbox
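The edit above can also be scripted. The sketch below uses a hypothetical helper, add_sysbox_fs_flag, which is not part of Sysbox. It assumes GNU sed for in-place editing and an ExecStart line that begins with /usr/bin/sysbox-fs, as described above. The helper appends the flag only when it is not already present, so running it twice is harmless.

```shell
# add_sysbox_fs_flag UNIT_FILE
# Hypothetical helper (not shipped with Sysbox): append
# --allow-immutable-remounts=true to the sysbox-fs ExecStart line in
# UNIT_FILE. Assumes GNU sed (-i) and an ExecStart line that starts
# with /usr/bin/sysbox-fs.
add_sysbox_fs_flag() {
  unit_file="$1"
  # Leave the file untouched if the flag is already set (idempotent).
  if grep -q -- '--allow-immutable-remounts=true' "$unit_file"; then
    return 0
  fi
  # Keep any existing arguments (\1) and add the flag at the end.
  sed -i 's|^ExecStart=/usr/bin/sysbox-fs\(.*\)$|ExecStart=/usr/bin/sysbox-fs\1 --allow-immutable-remounts=true|' "$unit_file"
}
```

On the host, as root, the helper would be called as add_sysbox_fs_flag /lib/systemd/system/sysbox-fs.service. Run systemctl daemon-reload afterwards so systemd picks up the edited unit, then stop and start sysbox as above. Files under /lib/systemd/system belong to the package and may be overwritten on upgrade. The usual systemd way to keep such a change is a drop-in override under /etc/systemd/system/sysbox-fs.service.d/.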
After that, the error message changes when I try to remount /sys/fs/cgroup read-write:
$ sudo mount -o remount,rw /sys/fs/cgroup
mount: /sys/fs/cgroup: mount(2) system call failed: Function not implemented.
So now the question is more general: "Is this a bug in Sysbox, or how else can I enable a read-write cgroup fs with a read-only root fs?"
Hi @FFock,
> We have the issue that root users of a sysbox-enabled container can store large files in the root (/) directory (or in self-created subdirectories) of the container. This can cause disk pressure on the underlying Kubernetes worker node and even trigger a DoS on that node (no other containers can store any data in /var/lib/ of the host node anymore).
> To prevent such a scenario, making the root filesystem read-only on Kubernetes is the preferred method.
Got it, makes sense.
To provide a bit of background on how this works:
When a container is started with --read-only, Sysbox will honor that and set all the container mounts to read-only. By default, Sysbox will prevent the container from remounting those as read-write, although I can see that it allows it for /sys (a bug) but not for other mounts (including those under /sys, such as /sys/fs/cgroup). For example:
$ docker run --runtime=sysbox-runc -it --rm --read-only nestybox/ubuntu-jammy-docker
# /sys is mounted read-only as expected
root@079cca62228a:/# findmnt | grep "|\-/sys"
|-/sys sysfs sysfs ro,nosuid,nodev,noexec,relatime
# But /sys can be remounted to read-write (a bug)
root@079cca62228a:/# mount -o remount,rw /sys
root@079cca62228a:/# findmnt | grep "|\-/sys"
|-/sys sysfs sysfs rw,nosuid,nodev,noexec,relatime
# However other mounts (e.g., /sys/fs/cgroup) can't be remounted to read-write (as expected):
root@079cca62228a:/# mount -o remount,rw,bind /sys/fs/cgroup
mount: /sys/fs/cgroup: permission denied.
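You can also check a mount's mode without grepping the whole findmnt tree by querying a single mount point. This is a minimal sketch, assuming util-linux findmnt is available inside the container. The names opts_mode and mount_mode are hypothetical helpers.

```shell
# opts_mode OPTIONS: print "ro" or "rw" for a comma-separated option
# string such as "ro,nosuid,nodev,noexec,relatime" (findmnt's OPTIONS column).
opts_mode() {
  case ",$1," in
    *,ro,*) echo ro ;;
    *,rw,*) echo rw ;;
    *) echo unknown ;;
  esac
}

# mount_mode MOUNTPOINT: print the mode of the mount at exactly MOUNTPOINT.
mount_mode() {
  opts_mode "$(findmnt -n -o OPTIONS --mountpoint "$1")"
}
```

In the example above, mount_mode /sys would report ro before the remount and rw after it. mount_mode /sys/fs/cgroup can be checked the same way after each remount attempt.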
Now, Sysbox can be configured to allow the remount by passing the --allow-immutable-remounts=true flag to sysbox-fs via its systemd service (/lib/systemd/system/sysbox-fs.service). For example, assuming that flag is set, the remount of /sys/fs/cgroup from read-only to read-write is now allowed:
$ docker run --runtime=sysbox-runc -it --rm --read-only nestybox/ubuntu-jammy-docker
root@079cca62228a:/# mount -o remount,rw,bind /sys/fs/cgroup
root@5845499df3de:/# findmnt | grep "|\-/sys"
| |-/sys/fs/cgroup cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot
> So now the question is more general: "Is this a bug in Sysbox, or how else can I enable a read-write cgroup fs with a read-only root fs?"
So based on the above, and assuming you are starting the container with --read-only (or the equivalent in K8s), you'll need to configure sysbox-fs with --allow-immutable-remounts=true. Otherwise, Sysbox won't allow the remount of /sys/fs/cgroup to read-write.
Hope that helps!
When the container is started read-only and sysbox-fs is started with --allow-immutable-remounts=true, the remount of /sys/fs/cgroup fails with the following error (as noted in my first comment on the original issue):
$ sudo mount -o remount,rw /sys/fs/cgroup
mount: /sys/fs/cgroup: mount(2) system call failed: Function not implemented.
So that is a bug, right?
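One difference between the commands in this thread may be worth ruling out. The maintainer's successful example uses mount -o remount,rw,bind, while the failing command uses mount -o remount,rw. This thread does not establish whether the missing bind option explains the "Function not implemented" error. The hypothetical helper below tries both forms in turn and reports which one, if any, succeeded. MOUNT_CMD is an assumption added only so the mount command can be swapped out, for example for a dry run.

```shell
# remount_rw_variants MOUNTPOINT
# Hypothetical helper: try the two remount forms seen in this thread,
# bind variant first, and print the first one that succeeds.
# MOUNT_CMD defaults to "mount"; it may name another command or function.
remount_rw_variants() {
  target="$1"
  mount_cmd="${MOUNT_CMD:-mount}"
  for opts in remount,rw,bind remount,rw; do
    if $mount_cmd -o "$opts" "$target"; then
      echo "ok: -o $opts"
      return 0
    fi
    # mount prints its own error message; also note which form failed.
    echo "failed: -o $opts" >&2
  done
  return 1
}
```

Run it as root inside the container, e.g. remount_rw_variants /sys/fs/cgroup. Then confirm the resulting mode with findmnt.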
Hi @ctalledo, have you had time to look into this bug?
I have not found any workaround yet. From my point of view, this is a critical security issue: the promised virtualisation of root access in Kubernetes containers without "privileged" access rights is broken if the container can create and write arbitrary files on the host/node system!
We have the issue that root users of a sysbox-enabled container can store large files in the root (/) directory (or in self-created subdirectories) of the container. This can cause disk pressure on the underlying Kubernetes worker node and even trigger a DoS on that node (no other containers can store any data in /var/lib/ of the host node anymore).
To prevent such a scenario, making the root filesystem read-only on Kubernetes is the preferred method.
Unfortunately, sysbox changes the mount mode of /sys/fs/cgroup as follows, from:
to:
You may notice that the /sys mount is read-write (rw), although it is not rw right after the container starts. I was able to remount it using
sudo mount -o remount,rw /sys
without problems. But the same approach fails for /sys/fs/cgroup, which must be "rw" to run inner Docker containers:
The sysbox documentation (see https://github.com/nestybox/sysbox/blob/d61db2575c197fc8d37b54efb25027f454b75c17/docs/user-guide/security.md?plain=1#L218) states that enabling the option allow-immutable-remounts=true allows such a remount.
My question now is: how and where do I set the option allow-immutable-remounts=true on a Kubernetes sysbox deployment?