opencontainers / runc

CLI tool for spawning and running containers according to the OCI specification
https://www.opencontainers.org/
Apache License 2.0

rootfsPropagation=shared does not work #1755

Open alban opened 6 years ago

alban commented 6 years ago

Tested with runc from git today (git describe = v1.0.0-rc5-17-g9facb87f).

How to test with rootfsPropagation=shared:


oci-runtime-tool generate --privileged --linux-rootfs-propagation=shared --process-terminal=true --rootfs-path=/home/alban/distro-trees/f26/ > config.json
sudo strace -f -e unshare,clone,mount,pivot_root,chdir,fchdir runc run c1

[pid  1602] mount("", "/", 0xc42009520c, MS_SHARED, NULL) = 0
[pid  1602] mount("", "/home", 0xc4200957d6, MS_PRIVATE, NULL) = 0
[pid  1602] mount("/home/alban/distro-trees/f26/", "/home/alban/distro-trees/f26/", 0xc4200957d7, MS_BIND|MS_REC, NULL) = 0
[pid  1602] mount("proc", "/home/alban/distro-trees/f26/proc", "proc", 0, NULL) = 0
[pid  1602] mount("tmpfs", "/home/alban/distro-trees/f26/dev", "tmpfs", MS_NOSUID|MS_STRICTATIME, "mode=755,size=65536k") = 0
[pid  1602] mount("devpts", "/home/alban/distro-trees/f26/dev/pts", "devpts", MS_NOSUID|MS_NOEXEC, "newinstance,ptmxmode=0666,mode=0"...) = 0
[pid  1602] mount("shm", "/home/alban/distro-trees/f26/dev/shm", "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, "mode=1777,size=65536k") = 0
[pid  1602] mount("mqueue", "/home/alban/distro-trees/f26/dev/mqueue", "mqueue", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = 0
[pid  1602] mount("sysfs", "/home/alban/distro-trees/f26/sys", "sysfs", MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = 0
[pid  1602] chdir("/home/alban/distro-trees/f26/") = 0
[pid  1602] fchdir(9)                   = 0
[pid  1602] pivot_root(".", ".")        = 0
[pid  1602] fchdir(8)                   = 0
[pid  1602] mount("", ".", 0xc420095a64, MS_REC|MS_SLAVE, NULL) = 0
[pid  1602] chdir("/")                  = 0
[pid  1602] mount("/dev/pts/0", "/dev/console", 0xc420095ac8, MS_BIND, NULL) = 0
[pid  1602] chdir("/")                  = 0

How to test with rootfsPropagation=private:

oci-runtime-tool generate --privileged --linux-rootfs-propagation=private --process-terminal=true --rootfs-path=/home/alban/distro-trees/f26/ > config.json
sudo strace -f -e unshare,clone,mount,pivot_root,chdir,fchdir runc run c1

[pid  3878] mount("", "/", 0xc4200f6ebc, MS_PRIVATE, NULL) = 0
[pid  3878] mount("", "/home", 0xc4200f7476, MS_PRIVATE, NULL) = 0
[pid  3878] mount("/home/alban/distro-trees/f26/", "/home/alban/distro-trees/f26/", 0xc4200f7477, MS_BIND|MS_REC, NULL) = 0
[pid  3878] mount("proc", "/home/alban/distro-trees/f26/proc", "proc", 0, NULL) = 0
[pid  3878] mount("tmpfs", "/home/alban/distro-trees/f26/dev", "tmpfs", MS_NOSUID|MS_STRICTATIME, "mode=755,size=65536k") = 0
[pid  3878] mount("devpts", "/home/alban/distro-trees/f26/dev/pts", "devpts", MS_NOSUID|MS_NOEXEC, "newinstance,ptmxmode=0666,mode=0"...) = 0
[pid  3878] mount("shm", "/home/alban/distro-trees/f26/dev/shm", "tmpfs", MS_NOSUID|MS_NODEV|MS_NOEXEC, "mode=1777,size=65536k") = 0
[pid  3878] mount("mqueue", "/home/alban/distro-trees/f26/dev/mqueue", "mqueue", MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = 0
[pid  3878] mount("sysfs", "/home/alban/distro-trees/f26/sys", "sysfs", MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOEXEC, NULL) = 0
[pid  3878] chdir("/home/alban/distro-trees/f26/") = 0
[pid  3878] fchdir(9)                   = 0
[pid  3878] pivot_root(".", ".")        = 0
[pid  3878] fchdir(8)                   = 0
[pid  3878] mount("", ".", 0xc4200f7704, MS_REC|MS_SLAVE, NULL) = 0
[pid  3878] chdir("/")                  = 0
[pid  3878] mount("/dev/pts/0", "/dev/console", 0xc4200f7768, MS_BIND, NULL) = 0
[pid  3878] chdir("/")                  = 0

At first glance, changing rootfsPropagation appears to do the correct thing: the first line of the strace log uses MS_SHARED or MS_PRIVATE depending on the rootfsPropagation value.

However, cat /proc/self/mountinfo in the container shows that it does not work. I have to run mount --make-shared / manually in the container to make it shared.
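For anyone reproducing the mountinfo check, here is a minimal Go sketch of how the propagation type can be read from a single line of /proc/self/mountinfo. The function name propagation and the sample lines are hypothetical; the only thing taken from the kernel documentation is the layout of the optional fields (shared:N, master:N, unbindable) that sit between the mount options and the "-" separator.

```go
package main

import (
	"fmt"
	"strings"
)

// propagation reports the mount point and the propagation type encoded in one
// line of /proc/self/mountinfo. A mount with no optional fields is private.
// (Hypothetical helper for illustration; not part of runc.)
func propagation(line string) (mountPoint, prop string, err error) {
	fields := strings.Fields(line)
	if len(fields) < 7 {
		return "", "", fmt.Errorf("short mountinfo line: %q", line)
	}
	mountPoint = fields[4]

	var tags []string
	sawSeparator := false
	// Optional fields start at index 6 and end at the "-" separator.
	for _, f := range fields[6:] {
		if f == "-" {
			sawSeparator = true
			break
		}
		switch {
		case strings.HasPrefix(f, "shared:"):
			tags = append(tags, "shared")
		case strings.HasPrefix(f, "master:"):
			tags = append(tags, "slave")
		case f == "unbindable":
			tags = append(tags, "unbindable")
		}
	}
	if !sawSeparator {
		return "", "", fmt.Errorf("no separator in mountinfo line: %q", line)
	}
	if len(tags) == 0 {
		return mountPoint, "private", nil
	}
	return mountPoint, strings.Join(tags, ","), nil
}

func main() {
	// Made-up sample lines; real IDs and devices will differ.
	samples := []string{
		"100 99 8:1 / / rw,relatime - ext4 /dev/sda1 rw",
		"100 99 8:1 / / rw,relatime shared:42 - ext4 /dev/sda1 rw",
	}
	for _, l := range samples {
		mp, p, err := propagation(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s is %s\n", mp, p)
	}
}
```

In the failing case described here, the line for / inside the container would carry no shared:N tag until mount --make-shared / is run manually.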

The mount call for rootfsPropagation is made not on the rootfs but on the oldrootfs, before the pivot_root. Since it is not recursive, it has no effect on the container rootfs. I then tried --linux-rootfs-propagation=rshared, but it still does not work. I wonder if the mount("", ".", ... MS_REC|MS_SLAVE after the pivot_root reverts the effect.
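To make the recursive versus non-recursive distinction concrete, here is a small Go sketch of how propagation names could map to mount(2) flags. The table and function names are hypothetical and this is not runc's actual code; the numeric flag values are the Linux ones, defined locally so the sketch does not depend on a platform-specific package. It matches the strace log above, where "shared" yields MS_SHARED without MS_REC.

```go
package main

import "fmt"

// Linux mount(2) propagation flag values, defined locally for portability.
const (
	msRec        = 1 << 14
	msUnbindable = 1 << 17
	msPrivate    = 1 << 18
	msSlave      = 1 << 19
	msShared     = 1 << 20
)

// propagationTable maps a propagation name to mount flags.
// The "r" variants add MS_REC so the change applies to the whole subtree.
// (Hypothetical table for illustration.)
var propagationTable = map[string]int{
	"private":     msPrivate,
	"rprivate":    msPrivate | msRec,
	"slave":       msSlave,
	"rslave":      msSlave | msRec,
	"shared":      msShared,
	"rshared":     msShared | msRec,
	"unbindable":  msUnbindable,
	"runbindable": msUnbindable | msRec,
}

// lookupPropagation returns the flags for name and whether they are recursive.
func lookupPropagation(name string) (flags int, recursive bool, err error) {
	flags, ok := propagationTable[name]
	if !ok {
		return 0, false, fmt.Errorf("unknown propagation %q", name)
	}
	return flags, flags&msRec != 0, nil
}

func main() {
	for _, name := range []string{"shared", "rshared"} {
		flags, rec, err := lookupPropagation(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: flags=%#x recursive=%v\n", name, flags, rec)
	}
}
```

A non-recursive flag applied to "/" before pivot_root only changes that one mount, which is why the bind-mounted container rootfs underneath it is unaffected.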

cyphar commented 6 years ago

I wonder if the mount("", ".", ... MS_REC|MS_SLAVE after the pivot_root reverts the effect.

It shouldn't. In that context, "." is the oldroot, not the new one. (Or at least it was when I wrote this comment.) But you're right that this line is quite suspect:

mount("", "/", 0xc4200f6ebc, MS_PRIVATE, NULL) = 0

Hmmm...

dongsupark commented 6 years ago

I can reproduce this failure, exactly as @alban described.

First of all, the slave mount after pivot_root is already correct. It should remain there as @cyphar said. Without that part, mount propagation affects the host, and the host gets broken.

The main problem is that in prepareRoot() the host rootfs is mounted with the given flag such as MS_SHARED, and after that, the container rootfs is mounted with MS_PRIVATE. As a result, the original flag is overridden. This is not an issue if the given flag is MS_PRIVATE or MS_SLAVE, but it is definitely an issue when MS_SHARED is given.

The tricky part is that we need to call rootfsParentMountPrivate() to prevent pivot_root from failing. So I came up with an approach: check the mount flags before calling rootfsParentMountPrivate(), and when MS_SHARED is requested, do chroot() instead of pivotRoot(). I'll create a PR.
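A rough Go sketch of that decision, under stated assumptions: the type rootPlan and the function planRoot are hypothetical names, and this is only one reading of the approach described above, not the code of the eventual PR. It only models the choice; it performs no mounts.

```go
package main

import "fmt"

// Linux mount(2) propagation flag values, defined locally for portability.
const (
	msRec     = 1 << 14
	msPrivate = 1 << 18
	msSlave   = 1 << 19
	msShared  = 1 << 20
)

// rootPlan describes how the container root would be entered.
// (Hypothetical type for illustration.)
type rootPlan struct {
	makeParentPrivate bool // call something like rootfsParentMountPrivate()
	usePivotRoot      bool // pivot_root if true, chroot otherwise
}

// planRoot picks a strategy from the requested rootfs propagation flags.
// When MS_SHARED is requested, making the parent mount private would
// override it, so this sketch skips that step and falls back to chroot.
func planRoot(rootfsPropagation int) rootPlan {
	if rootfsPropagation&msShared != 0 {
		return rootPlan{makeParentPrivate: false, usePivotRoot: false}
	}
	// MS_PRIVATE and MS_SLAVE are unaffected by the private parent mount.
	return rootPlan{makeParentPrivate: true, usePivotRoot: true}
}

func main() {
	fmt.Printf("shared:  %+v\n", planRoot(msShared))
	fmt.Printf("rshared: %+v\n", planRoot(msShared|msRec))
	fmt.Printf("private: %+v\n", planRoot(msPrivate))
	fmt.Printf("slave:   %+v\n", planRoot(msSlave))
}
```

The trade-off in this sketch is that the shared case gives up pivot_root, so whether that is acceptable depends on how the real change handles the old root.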