Open maleadt opened 1 year ago
I vaguely remember that this depends on the kernel version, and that some kernels (mistakenly) denied this mount.
Two possible solutions are:
I am not sure what the implications of bind-mounting the host `/sys` are, so I would not recommend doing that (without doing some security analysis first, that is).
Now,
Based on these two points, I am closing this as not-a-bug.
Let me know if you feel differently.
There's nothing runc can do about this (there's no easy workaround, and bind-mounting /sys is questionable)
But `crun` manages fine? I'm unfamiliar with the exact logic taking care of mounting sysfs, but this seems to indicate that there is a way to deal with this from the runtime's side.
Also, I'm happy to upgrade my kernel, but I'm using 5.15 -- the latest LTS -- which isn't exactly ancient. It's still what e.g. Ubuntu 22.04 is using/supporting for the next 5 years or so.
Also, this reproduces on kernel 6.0.10 (Arch Linux)...
OK, please tell us how to repro this (what is your environment and the steps to repro) and we'll take a look.
> OK, please tell us how to repro this (what is your environment and the steps to repro) and we'll take a look.
There's not much more to it than what I've reported here:
./runc.amd64 run test
ERRO[0000] runc run failed: unable to start container process: error during container init: error mounting "sysfs" to rootfs at "/sys": mount sysfs:/sys (via /proc/self/fd/7), flags: 0xe: operation not permitted
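For reference, the `flags: 0xe` value in the error above can be decoded with the standard Linux mount flag bits from `<sys/mount.h>`. A minimal sketch follows; the helper name `decode_mount_flags` is made up for illustration and is not part of runc.

```python
# Linux mount(2) flag bits, as defined in <sys/mount.h>.
MOUNT_FLAGS = {
    0x1: "ro",       # MS_RDONLY
    0x2: "nosuid",   # MS_NOSUID
    0x4: "nodev",    # MS_NODEV
    0x8: "noexec",   # MS_NOEXEC
    0x1000: "bind",  # MS_BIND
    0x4000: "rec",   # MS_REC
}


def decode_mount_flags(flags):
    """Return the option names whose bits are set in `flags` (hypothetical helper)."""
    return [name for bit, name in sorted(MOUNT_FLAGS.items()) if flags & bit]


print(decode_mount_flags(0xE))  # ['nosuid', 'nodev', 'noexec']
```

So the failing call is a plain sysfs mount with exactly the `nosuid`, `noexec`, and `nodev` options from the config, with no bind flag involved.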
This is not a runc bug; the kernel denied this mount, and that is the correct behavior.
Why can crun mount sysfs?
Because when running in a user namespace, crun bind-mounts /sys instead of mounting sysfs.
I see; thanks!
https://github.com/containers/crun/commit/6785cefbdf982c97a5552c9ce7017b0e8309c291
We should do the same for runc I guess
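A rough sketch of what "doing the same" could look like, assuming the fallback is triggered by the sysfs mount failing with EPERM. This is not the actual crun or runc code: `mount_sys` and the `mount(source, target, fstype, options)` callable are hypothetical stand-ins for the runtime's real mount wrapper.

```python
import errno


def mount_sys(mount, target="/sys"):
    """Try a fresh sysfs mount; on EPERM fall back to bind-mounting the host /sys.

    `mount` is a hypothetical callable that raises OSError on failure.
    Returns "sysfs" or "bind" depending on which mount succeeded.
    """
    options = ["nosuid", "noexec", "nodev"]
    try:
        mount("sysfs", target, "sysfs", options)
        return "sysfs"
    except OSError as err:
        if err.errno != errno.EPERM:
            raise  # only "operation not permitted" triggers the fallback
    # Assumed fallback: recursive bind mount of the host /sys.
    mount("/sys", target, None, ["rbind"] + options)
    return "bind"


def fake_mount(source, target, fstype, options):
    """Test double that denies sysfs mounts, mimicking the kernel behavior reported here."""
    if fstype == "sysfs":
        raise OSError(errno.EPERM, "operation not permitted")


print(mount_sys(fake_mount))  # bind
```

Any error other than EPERM is re-raised, so the fallback does not mask unrelated mount failures.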
Note that `runc spec --rootless` generates a spec which has `/sys` as a bind mount. I guess that is why we never saw this error. The code was added by #744 (specifically, commit d04cbc49d2ae4488a566eab86102c398522aaf14).
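To apply the same change to an existing config by hand, the sysfs entry can be rewritten as a bind mount. The sketch below is illustrative only: `bind_mount_sys` is a made-up helper, and the option list (`rbind` plus read-only) is an assumption modeled on a rootless spec, not something confirmed in this thread.

```python
import copy


def bind_mount_sys(config):
    """Return a copy of an OCI config with the sysfs mount at /sys replaced by a bind mount."""
    new = copy.deepcopy(config)
    for i, m in enumerate(new.get("mounts", [])):
        if m.get("destination") == "/sys" and m.get("type") == "sysfs":
            new["mounts"][i] = {
                "destination": "/sys",
                "type": "none",
                "source": "/sys",
                # Assumed option list; adjust to your needs.
                "options": ["rbind", "nosuid", "noexec", "nodev", "ro"],
            }
    return new


config = {
    "mounts": [
        {"destination": "/proc", "type": "proc", "source": "proc"},
        {"destination": "/sys", "type": "sysfs", "source": "sysfs",
         "options": ["nosuid", "noexec", "nodev"]},
    ]
}
print(bind_mount_sys(config)["mounts"][1]["source"])  # /sys
```

The input config is left untouched, so the original and the rewritten variant can be compared side by side.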
I think we still have to support replacing a proper `/sys` mount with a bind mount because crun does it.
I'm trying out `runc` to get a simple unprivileged containerized execution, but am having issues mounting `sysfs`:
Meanwhile, `crun` manages fine:
Full config
```json
{
  "ociVersion": "1.0.1",
  "platform": { "os": "linux", "arch": "amd64" },
  "root": {
    "path": "/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f",
    "readonly": true
  },
  "mounts": [
    { "destination": "/proc", "type": "proc", "source": "proc" },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": [ "nosuid", "strictatime", "mode=755", "size=65536k" ]
    },
    {
      "destination": "/dev/pts",
      "type": "devpts",
      "source": "devpts",
      "options": [ "nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620" ]
    },
    {
      "destination": "/dev/shm",
      "type": "tmpfs",
      "source": "shm",
      "options": [ "nosuid", "noexec", "nodev", "mode=1777", "size=65536k" ]
    },
    {
      "destination": "/dev/mqueue",
      "type": "mqueue",
      "source": "mqueue",
      "options": [ "nosuid", "noexec", "nodev" ]
    },
    {
      "destination": "/sys",
      "type": "sysfs",
      "source": "sysfs",
      "options": [ "nosuid", "noexec", "nodev" ]
    },
    {
      "destination": "/sys/fs/cgroup",
      "type": "cgroup",
      "source": "cgroup",
      "options": [ "nosuid", "noexec", "nodev", "relatime", "ro" ]
    }
  ],
  "process": {
    "terminal": true,
    "cwd": "/root",
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm"
    ],
    "args": [ "/bin/bash", "--login" ],
    "rlimits": [ { "type": "RLIMIT_NOFILE", "hard": 1024, "soft": 1024 } ],
    "capabilities": {
      "bounding": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE" ],
      "permitted": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE" ],
      "inheritable": [ "CAP_AUDIT_WRITE", "CAP_KILL", "CAP_NET_BIND_SERVICE" ],
      "effective": [ "CAP_AUDIT_WRITE", "CAP_KILL" ],
      "ambient": [ "CAP_NET_BIND_SERVICE" ]
    },
    "noNewPrivileges": true
  },
  "user": { "uid": 0, "gid": 0 },
  "hostname": "test",
  "linux": {
    "resources": {
      "devices": [ { "allow": false, "access": "rwm" } ]
    },
    "namespaces": [
      { "type": "pid" },
      { "type": "ipc" },
      { "type": "uts" },
      { "type": "mount" },
      { "type": "user" },
      { "type": "cgroup" }
    ],
    "uidMappings": [ { "containerID": 0, "hostID": 1000, "size": 1 } ],
    "gidMappings": [ { "containerID": 0, "hostID": 1000, "size": 1 } ],
    "devices": null
  }
}
```
Binding `sys` instead works around the issue: