benaryorg opened 3 years ago
Elaborating on the issue, here's (rootless) Docker as an example of an application requiring cgroups.

`dockerd` can successfully be started from within a Void container, at least when skipping the service file's attempt to modprobe anything:

```
grep -vw modprobe /etc/sv/docker/run | sudo
```

The daemon is running, but `docker run hello-world` will report an error:

```
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: mountpoint for devices not found: unknown.
ERRO[0000] error waiting for container: context canceled
```
The daemon will additionally mention:

```
mkdir: cannot create directory ‘/sys/fs/cgroup/systemd’: No such file or directory
mount: /sys/fs/cgroup/systemd: mount point does not exist.
       dmesg(1) may have more information after failed mount system call.
...
...
copy shim log error="read /proc/self/fd/14: file already closed"
```
To resolve this, /sys/fs/cgroup at least has to be made writable (Docker wants to create the systemd cgroup there), and the devices cgroup has to be mounted:

```
sudo mount -t tmpfs cgroup /sys/fs/cgroup
sudo mkdir /sys/fs/cgroup/devices
sudo mount -t cgroup -o devices none /sys/fs/cgroup/devices
```
(At startup, `dockerd` will still complain about other missing cgroups. Those can be made available too with the same mkdir & mount approach, but they aren't necessary in this example.)
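For anyone who wants to silence those remaining complaints as well, the mkdir & mount steps can be generalized. The following is a rough sketch, not a tested setup: it assumes cgroup v1 and that /sys/fs/cgroup is already a writable tmpfs as set up above, and the function name `mount_v1_controllers` (plus its arguments) is made up here purely so the logic is reusable.

```shell
#!/bin/sh
# Sketch: mount every cgroup v1 controller the kernel reports as enabled.
# $1: path to a /proc/cgroups-style file, $2: cgroup mount root.
mount_v1_controllers() {
    # /proc/cgroups: "#subsys_name hierarchy num_cgroups enabled";
    # skip the header and keep controllers whose "enabled" flag is 1.
    awk '!/^#/ && $4 == 1 { print $1 }' "$1" |
    while read -r ctrl; do
        mkdir -p "$2/$ctrl"
        # Only mount if the controller isn't mounted there already.
        mountpoint -q "$2/$ctrl" ||
            mount -t cgroup -o "$ctrl" none "$2/$ctrl"
    done
}

# Typical invocation (as root):
# mount_v1_controllers /proc/cgroups /sys/fs/cgroup
```

This mirrors the manual devices example above, just driven by whatever the kernel actually offers instead of a hard-coded controller list.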
At this point `docker run hello-world` should execute successfully.
See #103 for some more thoughts on this situation.
@kevcrumb I notice you chose to use cgroup1 inside the container. What is that decision based on? The fact that the docker service basically assumes cgroup1, or did you try cgroup2 and it caused issues? Could you try again with cgroup2 and a docker service run file with the cgroup handling removed?
Currently the cgroup hierarchy is only mounted if not running in a virtualized environment (as added via #58):
https://github.com/void-linux/void-runit/blob/42ca737148ea530dad5945af1a4eb7e471e8b637/core-services/00-pseudofs.sh#L13
However, even inside a container it is sometimes preferable to have the guest system initialize cgroups if available¹. Currently `CGROUP_MODE` can be set to none (or any string other than hybrid, legacy, or unified) to disable that behavior, but there is no option to enforce it in a virtualized environment, short of messing with files which are prone to be overwritten on every update, or duplicating the code, which incurs technical debt.

I am not involved with the topic enough to gauge the effects of enabling this in containers by default, and I expect the solution to be more involved than removing that `if`, hence an issue rather than a PR.

¹: In my case `lxc.mount.auto = cgroup:mixed:force` is not available, so the only reasonable way is for this to be handled by the code there.
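To make the request concrete, here is a hypothetical sketch of what such an escape hatch could look like. This is not actual void-runit code: the `setup_cgroups` function, the `:force` suffix syntax, and the `mount_cgroups` helper are all invented for illustration, with the helper standing in for the existing mounting logic in 00-pseudofs.sh.

```shell
#!/bin/sh
# Hypothetical sketch: honour CGROUP_MODE even when virtualization is
# detected, via an explicit suffix such as CGROUP_MODE=unified:force.
# mount_cgroups is an assumed helper for the actual mounting logic.
setup_cgroups() {
    mode=$1 virt=$2
    case "$mode" in
        # "force" strips the suffix and ignores detected virtualization.
        *:force) virt='' mode=${mode%:force} ;;
    esac
    # Without "force", keep today's behaviour: skip inside containers/VMs.
    [ -n "$virt" ] && return 0
    case "$mode" in
        legacy|hybrid|unified) mount_cgroups "$mode" ;;
    esac
}
```

With something like this, a container could opt in via CGROUP_MODE=unified:force while the default behavior stays unchanged for everyone else.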