opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0
11.74k stars 2.09k forks source link

Default linux cgroup devices.allow rules too general #1618

Open dasizeman opened 6 years ago

dasizeman commented 6 years ago

In runc/libcontainer/specconv/spec_linux.go, there are two default rules, c *:* m and b *:* m, commented "allow mknod for any device". I have a use case where I am scheduling Docker containers using the SLURM resource manager, and specifically the character device rule is causing issues.

When SLURM restricts access to GPU devices (which are character devices), it writes rwm rules to devices.deny. When we run the a Docker job through SLURM, we are using Docker's --cgroup-parent option to set SLURM's cgroup restrictions as the parent of Docker's policies, ensuring that the SLURM restrictions are propagated.

This is where the issue come's in. When running the container, runc is trying to set the aforementioned c *:* m rule for Docker, but this is in conflict with one or more of the parent GPU deny rules (set by SLURM), so the container fails to run.

I have tried building a custom version of runc that gets rid of the c *:* m rule, but it seems like this causes all container runs to completely break (hang).

It seems like runc should not be setting device rules that are this global by default, since it can cause conflicts with other cgroup policies like I am observing. This could also cause issues when users try to set explicit device rules with moby's --device-cgroup-rule flag (introduced here.).

Similarly, the b *:* m rule could cause issues with block device cgroup rules defined in the parent policy.

Does anyone have insight into what devices actually need mknod device permissions in order for containers to run properly? I think these default rules should be restricted unless there is some explicit reason for them to be so global.

cyphar commented 6 years ago

runc creates several devices during startup (like /dev/full, /dev/null, /dev/random, /dev/urandom, /dev/zero and a few others depending on how the TTY was set up) -- so disabling creation of all devices will likely break things. Though I'm not quite sure why it's hanging rather than providing an error.

On the other hand, I agree that it's unnecessary for us to enable mknod of all character and block devices. This is not a security issue per-se, but you're right that we shouldn't need those permissions...

/cc @crosbymichael -- I think this came from libcontainer no?

dasizeman commented 6 years ago

There are already explicit allow rules for the devices you mentioned and some others, so I'm wondering what device requires mknod access that is being denied when I remove the global rules. Any insight or suggestions into debugging this?

cyphar commented 6 years ago

@dasizeman If you can run runc under strace -f you'd be able to figure out what mknod is failing.

Just a quick note: there's a well-known kernel bug that means when you try to strace -f a multi-threaded process that does an exec, strace may stall indefinitely when exec is executed. If that happens, just kill strace and then try again -- it'll work eventually since the bug is a race condition related to the kernel's signal handling when a process is ptrace attached.

dasizeman commented 6 years ago

It turns out there must have been some sort of issue with my build environment when I was seeing the hanging issues. I have removed the global mknod rules and I am unable to see any negative effects in a development build. I have created a PR to remove the rules. Please verify/advise if there is anything you need from my end.