Closed (rodnymolina closed this issue 4 years ago)
Just spent some time testing this one on Fedora-30. Unfortunately, we are reproducing the same issue there:
[root@fedora-30 overlayfs]# mount -t overlay overlay -o lowerdir=/home/rodny/overlayfs/lower,upperdir=/home/rodny/overlayfs/upper,workdir=/home/rodny/overlayfs/work /home/rodny/overlayfs/merged
mount: /home/rodny/overlayfs/merged: permission denied.
[root@fedora-30 overlayfs]#
[rodny@fedora-30 ~]$ uname -a
Linux fedora-30 5.0.16-300.fc30.x86_64 #1 SMP Tue May 14 19:33:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[rodny@fedora-30 ~]$
We recently added mount syscall interception to sysbox system containers.
We can leverage this feature to solve this issue, by having sysbox intercept mounts of overlayfs by processes in the sys container and perform those on behalf of the sys container.
This would bypass the permission problems, since sysbox is true root on the host.
More importantly, it would make Sysbox less dependent on Ubuntu, opening the door to supporting other distros.
Note, however, that mount syscall interception relies on very recent Linux kernels (the seccomp-notify mechanism plus seccomp-notify "continue").
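A quick way to check whether a given kernel is new enough is sketched below. It assumes the upstream version numbers (seccomp user notification landed in Linux 5.0, the "continue" flag in 5.5); distro kernels may backport either feature, so treat the result as a hint only. The function names are made up for illustration.

```shell
#!/bin/sh
# Succeeds if kernel release $1 (e.g. "5.0.16-300.fc30.x86_64")
# is at least version $2.$3.
kernel_at_least() {
    release=$1
    want_major=$2
    want_minor=$3
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%.*}             # text before the next dot
    minor=${minor%%[!0-9]*}       # drop any non-numeric suffix such as "-rc1"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Prints which seccomp-notify features the upstream kernel version implies.
# Assumption: 5.0 for seccomp-notify, 5.5 for the "continue" flag.
seccomp_notify_support() {
    if kernel_at_least "$1" 5 5; then
        echo "notify+continue"
    elif kernel_at_least "$1" 5 0; then
        echo "notify-only"
    else
        echo "none"
    fi
}
```

To check the running host, call seccomp_notify_support "$(uname -r)".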
NOTE: This issue, combined with issue #160, means that support for system containers on ext4 will require Ubuntu Disco (Linux kernel 5.0).
As expected, the problem is easily reproduced on CentOS 8 too:
[root@centos-8-vm ~]# uname -a
Linux centos-8-vm 4.18.0-193.6.3.el8_2.x86_64 #1 SMP Wed Jun 10 11:09:32 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@centos-8-vm ~]# mkdir lower upper work merged
[root@centos-8-vm ~]# ls -lrt
total 16
drwxr-xr-x. 2 root root 6 Jul 12 04:37 upper
drwxr-xr-x. 2 root root 6 Jul 12 04:37 lower
drwxr-xr-x. 2 root root 6 Jul 12 04:37 work
drwxr-xr-x. 2 root root 6 Jul 12 04:37 merged
[root@centos-8-vm ~]#
[root@centos-8-vm ~]# unshare -i -m -n -p -u -U -C -f --mount-proc -r /bin/bash
[root@centos-8-vm ~]# ls -lrt
total 16
drwxr-xr-x. 2 root root 6 Jul 12 04:37 upper
drwxr-xr-x. 2 root root 6 Jul 12 04:37 lower
drwxr-xr-x. 2 root root 6 Jul 12 04:37 work
drwxr-xr-x. 2 root root 6 Jul 12 04:37 merged
[root@centos-8-vm ~]#
[root@centos-8-vm ~]# pwd
/root
[root@centos-8-vm ~]# mount -t overlay overlay -o lowerdir=/root/lower,upperdir=/root/upper,workdir=/root/work /root/merged
mount: /root/merged: permission denied.
[root@centos-8-vm ~]#
Most non-Debian-based distros cannot mount overlayfs from within unprivileged user-namespaces. They seem to rely on the fuse-overlayfs tool to work around this issue.
In Red Hat's case, their kernel allows fuse-overlayfs starting with 4.18, and they are even considering backporting fuse-overlayfs support to the 3.10 kernel. On the other hand, they are fully aware of the runtime penalty of running this feature in user-space. See more details here:
https://indico.cern.ch/event/757415/contributions/3421994/attachments/1855302/3047064/Podman_Rootless_Containers.pdf
https://www.redhat.com/sysadmin/behind-scenes-podman
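For comparison, a fuse-overlayfs mount takes the same lowerdir/upperdir/workdir options as the kernel overlayfs mount. The sketch below is only an illustration of that workaround, not something sysbox does; the function names and the directory layout (lower, upper, work, merged under one base directory) are assumptions carried over from the tests above.

```shell
#!/bin/sh
# Prints the fuse-overlayfs command line for the directories under base dir $1.
fuse_overlay_cmd() {
    base=$1
    echo "fuse-overlayfs -o lowerdir=$base/lower,upperdir=$base/upper,workdir=$base/work $base/merged"
}

# Creates the directories and attempts the mount, but only if the
# fuse-overlayfs binary is installed. Returns non-zero otherwise.
try_fuse_overlay() {
    base=$1
    if ! command -v fuse-overlayfs >/dev/null 2>&1; then
        echo "fuse-overlayfs not installed" >&2
        return 1
    fi
    mkdir -p "$base/lower" "$base/upper" "$base/work" "$base/merged" || return 1
    fuse-overlayfs -o "lowerdir=$base/lower,upperdir=$base/upper,workdir=$base/work" "$base/merged"
}
```

For example, try_fuse_overlay /root/overlay-test from inside the user-namespace; undo it with fusermount -u /root/overlay-test/merged.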
As part of this task, we should also investigate what other distros have this problem.
Just got a working implementation of an overlayfs-mount handler by making use of our syscall-trapping infrastructure. There is still one loose end to take care of (i.e., docker nesting is not working yet), but at least we can now successfully mount overlayfs within an unprivileged user-namespace context.
root@test-1:~# cd /var/lib/docker
root@test-1:/var/lib/docker#
root@test-1:/var/lib/docker# mkdir rodny
root@test-1:/var/lib/docker# cd rodny/
root@test-1:/var/lib/docker/rodny# mkdir lower upper work merged
root@test-1:/var/lib/docker/rodny#
<-- Before changes ...
root@test-1:/var/lib/docker/rodny# mount -t overlay overlay -olowerdir=/var/lib/docker/rodny/lower,upperdir=/var/lib/docker/rodny/upper,workdir=/var/lib/docker/rodny/work /var/lib/docker/rodny/merged
mount: /var/lib/docker/rodny/merged: permission denied.
root@test-1:/var/lib/docker/rodny#
<-- After ...
root@test-1:/var/lib/docker/rodny# mount -t overlay overlay -olowerdir=/var/lib/docker/rodny/lower,upperdir=/var/lib/docker/rodny/upper,workdir=/var/lib/docker/rodny/work /var/lib/docker/rodny/merged
root@test-1:/var/lib/docker/rodny# findmnt
TARGET SOURCE FSTYPE OPTIONS
...
|-/var/lib/docker /dev/vda1[/var/lib/sysbox/docker/baseVol/63e6efda3700689e415b795304167c364190800f21c62b9bd7b915a6154d86d4]
| xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota
| `-/var/lib/docker/rodny/merged overlay overlay
rw,relatime,seclabel,lowerdir=/var/lib/docker/rodny/lower,upperdir=/var/lib/docker/rodny/upper,workdir=/var/lib/docker/rodny/work
Fixed by PR https://github.com/nestybox/sysbox-fs/pull/10. Closing.
In the mainline Linux kernel, it's not possible to mount overlayfs from within a container (or more accurately, from outside the initial user-namespace). Doing so causes a "permission denied" response.
To reproduce the issue, simply enter a user-namespace and then mount overlayfs.
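A minimal reproduction along the lines of the transcripts in this thread could look like the sketch below. The function names are made up for illustration, and the base directory is assumed to be an absolute path. On an affected kernel the mount is expected to fail with "permission denied".

```shell
#!/bin/sh
# Builds the overlayfs mount options for the directories under base dir $1.
overlay_opts() {
    base=$1
    echo "lowerdir=$base/lower,upperdir=$base/upper,workdir=$base/work"
}

# Creates the four directories, then enters new user and mount namespaces
# (mapped to root via -r) and attempts the overlayfs mount there.
# The exit status is that of the mount attempt.
repro_overlay_userns() {
    base=$1
    mkdir -p "$base/lower" "$base/upper" "$base/work" "$base/merged" || return 1
    unshare -U -m -r \
        mount -t overlay overlay -o "$(overlay_opts "$base")" "$base/merged"
}
```

For example: repro_overlay_userns /tmp/overlayfs-test. Since the mount lives in the new mount namespace, nothing needs to be unmounted afterwards.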
It's not clear to me why this restriction exists; it may be related to the security issue described in this lwn.net article.
This is a problem because it prevents us from running system containers on ext4: if an inner docker daemon is launched, it will try to mount overlayfs for the container images, and this operation will fail.
Fortunately the problem does not occur on Ubuntu. There appears to be a patch from Ubuntu that allows this. As described here:
"Ubuntu carries a patch that allows overlayfs mounting inside of an unprivileged user namespace, so we were carrying the fix mentioned above as a delta against the upstream Linux kernel since the issue didn't affect upstream overlayfs."
Note that the problem does not affect system containers on btrfs, because in that case overlayfs is not used by the inner docker; it uses btrfs subvolumes.
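To tell which case a given host falls into, one can look at the filesystem backing the docker data directory. The sketch below simply encodes the note above (btrfs is not affected, everything else is assumed to end up on overlayfs); the function names are hypothetical and the rule is a simplification, since the inner docker's storage driver can also be configured explicitly.

```shell
#!/bin/sh
# Prints the filesystem type backing directory $1 (e.g. "ext4", "xfs", "btrfs").
backing_fstype() {
    findmnt -n -o FSTYPE -T "$1"
}

# Succeeds if an inner docker on filesystem type $1 is assumed to need
# overlayfs mounts, and would therefore hit the restriction.
overlay_restriction_applies() {
    case $1 in
        btrfs) return 1 ;;   # inner docker uses btrfs subvolumes instead
        *)     return 0 ;;   # assumed to fall back to overlayfs
    esac
}
```

For example: overlay_restriction_applies "$(backing_fstype /var/lib/docker)" && echo "affected".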
(Ref #62)