nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Cannot mount VM shared folders inside docker system container #716

Open Lucky1313 opened 1 year ago

Lucky1313 commented 1 year ago

I guess I'm not 100% sure this is actually a supported use case, but I couldn't find evidence either way.

My use case is to run a VM (with Vagrant) in which Sysbox runs system containers. I then use those system containers to test my application, which needs to use Docker and systemd directly. I'm currently running on an Ubuntu host, but the idea behind the VM is to support macOS hosts as well.

Everything seems to work except setting the correct IDs on bind mounts (from the VM filesystem into the inner Docker container's filesystem). Looking at the sysbox-mgr logs, it seems that none of the ID-mapping methods for mounts is working:

Jul 11 20:09:09 ubuntu2204.localdomain systemd[1]: Starting sysbox-mgr (part of the Sysbox container runtime)...
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="Starting ..."
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="Sysbox data root: /var/lib/sysbox"
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="Shiftfs module found in kernel: yes"
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="Shiftfs works properly: no"
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="Shiftfs-on-overlayfs works properly: no"
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="ID-mapped mounts supported by kernel: yes"
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="Overlayfs on ID-mapped mounts supported by kernel: no"
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="Operating in system container mode."
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="Inner container image preloading enabled."
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="Listening on /run/sysbox/sysmgr.sock"
Jul 11 20:09:09 ubuntu2204.localdomain systemd[1]: Started sysbox-mgr (part of the Sysbox container runtime).
Jul 11 20:09:09 ubuntu2204.localdomain sysbox-mgr[9948]: time="2023-07-11 20:09:09" level=info msg="Ready ..."

Tested with both the VirtualBox and libvirt (KVM) providers for Vagrant.

Host Machine

Ubuntu 22.04 with 5.17 kernel

VM Machine

Ubuntu 22.04 with 5.15 kernel + shiftfs installed
Sysbox version 0.6.1
Docker version 24.0.4

Recreate

# Using virtualbox
$ vagrant up
# Using libvirt
$ vagrant up --provider=libvirt
$ vagrant ssh
# Inside VM
$ docker run --runtime=sysbox-runc -it -v /home/vagrant:/root/ws ubuntu:focal /bin/bash
# Inside Container
$ ls -la /root/
# Should see
total 24
drwx------ 1 root   root    4096 Jul 11 20:41 .
drwxr-xr-x 1 root   root    4096 Jul 11 20:41 ..
-rw-r--r-- 1 root   root    3106 Dec  5  2019 .bashrc
-rw-r--r-- 1 root   root     161 Dec  5  2019 .profile
drwxr-x--- 6 nobody nogroup 4096 Jul 11 20:11 ws

This happens both when mounting folders from the VM disk into the Docker container and when mounting synced VM folders (through either VirtualBox shared folders or libvirt virtio-fs).
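Some background that may help when reading the `nobody nogroup` owner above: Sysbox always runs system containers inside a Linux user namespace, so container UID 0 corresponds to an unprivileged UID range in the VM, and a file whose owner falls outside that range is reported with the kernel's overflow UID (65534, shown as `nobody`). Sysbox normally hides this for bind mounts by using shiftfs or ID-mapped mounts, which are the features the sysbox-mgr log lines above check for. The hypothetical helper `host_uid_to_ns_uid` below is a minimal sketch that applies the `/proc/<pid>/uid_map` format (`inside outside length`) to show which UID a VM-side owner would appear as inside the container. It assumes the default overflow UID of 65534.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the UID that a process inside a user namespace
# sees for a file owned by UID $2 in the parent namespace (here, the VM).
# Each uid_map line is: <first ID inside ns> <first ID outside ns> <length>.
host_uid_to_ns_uid() {
  local map_file=$1 host_uid=$2
  awk -v h="$host_uid" '
    BEGIN { h = h + 0 }
    # Translate h when it falls inside the outside range of this line.
    h >= $2 && h < $2 + $3 { print $1 + (h - $2); found = 1; exit }
    # No line matched: the kernel reports the overflow UID (assumed 65534).
    END { if (!found) print 65534 }
  ' "$map_file"
}

# Example, run inside the container, where /proc/self/uid_map describes the
# mapping to the VM: which UID does the VM user with UID 1000 become?
# host_uid_to_ns_uid /proc/self/uid_map 1000
```

For instance, with an illustrative map line `0 165536 65536` (not taken from this report), the helper prints `65534` for VM UID 1000 and `1000` for VM UID 166536. Seeing `65534` as the owner of a bind-mounted path is what you would expect if neither shiftfs nor an ID-mapped mount is in effect on that path.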

Vagrantfile (libvirt)
```ruby
$script = <<-SCRIPT
set -euxo pipefail
export DEBIAN_FRONTEND=noninteractive

echo "Installing docker"
apt-get update
apt-get install -y ca-certificates curl gnupg
mkdir -p /etc/apt/keyrings/
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --batch --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" > \
  /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
usermod -a -G docker vagrant

# Install shiftfs
apt-get install -y make dkms git wget
git clone -b k5.16 https://github.com/toby63/shiftfs-dkms.git shiftfs-k516
pushd shiftfs-k516
./update1
make -f Makefile.dkms
modinfo shiftfs
popd

echo "Installing sysbox"
# From https://github.com/nestybox/sysbox/blob/master/docs/user-guide/install-package.md
wget -q https://downloads.nestybox.com/sysbox/releases/v0.6.1/sysbox-ce_0.6.1-0.linux_amd64.deb
apt-get install -y jq
apt-get install -y ./sysbox-ce_0.6.1-0.linux_amd64.deb
rm sysbox-ce_0.6.1-0.linux_amd64.deb
echo "Done"
SCRIPT

Vagrant.configure("2") do |config|
  config.vm.box = "generic/ubuntu2204"
  config.vm.provision "shell", inline: $script
  config.vm.synced_folder "./", "/home/vagrant/ws/", type: "virtiofs"
  config.ssh.forward_agent = true
  config.vm.provider :libvirt do |libvirt|
    libvirt.cpus = 4
    libvirt.memory = 8192
    libvirt.memorybacking :access, :mode => "shared"
  end
end
```
Vagrantfile (virtualbox)
```ruby
$script = <<-SCRIPT
set -euxo pipefail
export DEBIAN_FRONTEND=noninteractive

echo "Installing docker"
apt-get update
apt-get install -y ca-certificates curl gnupg
mkdir -p /etc/apt/keyrings/
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --batch --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" > \
  /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
usermod -a -G docker vagrant

# Install shiftfs
apt-get install -y make dkms git wget
git clone -b k5.16 https://github.com/toby63/shiftfs-dkms.git shiftfs-k516
pushd shiftfs-k516
./update1
make -f Makefile.dkms
modinfo shiftfs
popd

echo "Installing sysbox"
# From https://github.com/nestybox/sysbox/blob/master/docs/user-guide/install-package.md
wget -q https://downloads.nestybox.com/sysbox/releases/v0.6.1/sysbox-ce_0.6.1-0.linux_amd64.deb
apt-get install -y jq
apt-get install -y ./sysbox-ce_0.6.1-0.linux_amd64.deb
rm sysbox-ce_0.6.1-0.linux_amd64.deb
echo "Done"
SCRIPT

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/jammy64"
  config.vm.provision "shell", inline: $script
  config.vm.synced_folder "./", "/home/vagrant/ws/"
  config.ssh.forward_agent = true
  config.vm.provider "virtualbox" do |v|
    v.memory = 4096
    v.cpus = 8
  end
end
```

rodnymolina commented 1 year ago

@Lucky1313, thanks for reporting this one. I haven't looked at it in detail yet, but yes, this scenario (Vagrant) is fully supported; it's actually what I use most of the time.

Before we spend any time on this, could you please install the latest Sysbox release (v0.6.2)? It includes important enhancements in this area.

Lucky1313 commented 1 year ago

Sorry about using the wrong version. I've confirmed the issue is still present on v0.6.2, although the detection of host capabilities does look different:

Jul 11 22:33:16 ubuntu-jammy systemd[1]: Starting sysbox-mgr (part of the Sysbox container runtime)...
Jul 11 22:33:16 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:16" level=info msg="Starting ..."
Jul 11 22:33:16 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:16" level=info msg="Sysbox data root: /var/lib/sysbox"
Jul 11 22:33:17 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:17" level=info msg="Shiftfs module found in kernel: yes"
Jul 11 22:33:17 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:17" level=info msg="Shiftfs works properly: yes"
Jul 11 22:33:17 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:17" level=info msg="Shiftfs-on-overlayfs works properly: yes"
Jul 11 22:33:17 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:17" level=info msg="ID-mapped mounts supported by kernel: yes"
Jul 11 22:33:17 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:17" level=info msg="Overlayfs on ID-mapped mounts supported by kernel: no"
Jul 11 22:33:17 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:17" level=info msg="Operating in system container mode."
Jul 11 22:33:17 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:17" level=info msg="Inner container image preloading enabled."
Jul 11 22:33:17 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:17" level=info msg="Listening on /run/sysbox/sysmgr.sock"
Jul 11 22:33:17 ubuntu-jammy sysbox-mgr[5359]: time="2023-07-11 22:33:17" level=info msg="Ready ..."
Jul 11 22:33:17 ubuntu-jammy systemd[1]: Started sysbox-mgr (part of the Sysbox container runtime).

However, the bind-mounted folder still has nobody:nogroup IDs.

Sysbox versions:

Jul 11 22:33:17 ubuntu-jammy systemd[1]: Started Sysbox container runtime.
Jul 11 22:33:17 ubuntu-jammy sh[5382]: sysbox-runc
Jul 11 22:33:17 ubuntu-jammy sh[5382]:         edition:         Community Edition (CE)
Jul 11 22:33:17 ubuntu-jammy sh[5382]:         version:         0.6.2
Jul 11 22:33:17 ubuntu-jammy sh[5382]:         commit:         60ca93c783b19c63581e34aa183421ce0b9b26b7
Jul 11 22:33:17 ubuntu-jammy sh[5382]:         built at:         Mon Jun 12 03:49:19 UTC 2023
Jul 11 22:33:17 ubuntu-jammy sh[5382]:         built by:         Cesar Talledo
Jul 11 22:33:17 ubuntu-jammy sh[5382]:         oci-specs:         1.0.2-dev
Jul 11 22:33:17 ubuntu-jammy sh[5389]: sysbox-mgr
Jul 11 22:33:17 ubuntu-jammy sh[5389]:         edition:         Community Edition (CE)
Jul 11 22:33:17 ubuntu-jammy sh[5389]:         version:         0.6.2
Jul 11 22:33:17 ubuntu-jammy sh[5389]:         commit:         4b5fb1def9abe6a256cfe62bacaf2a7d333d81d2
Jul 11 22:33:17 ubuntu-jammy sh[5389]:         built at:         Mon Jun 12 03:49:55 UTC 2023
Jul 11 22:33:17 ubuntu-jammy sh[5389]:         built by:         Cesar Talledo
Jul 11 22:33:17 ubuntu-jammy sh[5394]: sysbox-fs
Jul 11 22:33:17 ubuntu-jammy sh[5394]:         edition:         Community Edition (CE)
Jul 11 22:33:17 ubuntu-jammy sh[5394]:         version:         0.6.2
Jul 11 22:33:17 ubuntu-jammy sh[5394]:         commit:         30fd49edbd51048fed8b2ad0af327598d30b29eb
Jul 11 22:33:17 ubuntu-jammy sh[5394]:         built at:         Mon Jun 12 03:49:46 UTC 2023
Jul 11 22:33:17 ubuntu-jammy sh[5394]:         built by:         Cesar Talledo
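The two sysbox-mgr logs above differ mainly in the shiftfs checks (`Shiftfs works properly: no` on v0.6.1 versus `yes` on v0.6.2). A small filter such as the hypothetical `sysbox_caps` function below keeps only the capability lines that end in `: yes` or `: no`, which makes that comparison easy to repeat after a kernel or Sysbox upgrade. It is a sketch that assumes the log format shown in this thread, and the usage line assumes sysbox-mgr runs as a systemd unit whose output goes to journald.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read sysbox-mgr log lines on stdin and print only the
# capability checks that end in "yes" or "no", e.g.
#   msg="Shiftfs works properly: yes"  ->  Shiftfs works properly: yes
sysbox_caps() {
  sed -nE 's/.*msg="([^"]*): (yes|no)".*/\1: \2/p'
}

# Example (assumes journald holds the sysbox-mgr logs for the current boot):
# journalctl -u sysbox-mgr -b | sysbox_caps
```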
rodnymolina commented 1 year ago

@Lucky1313, sorry for the delay. The sysbox-mgr logs above indicate that shiftfs is being detected properly and that no operational issues were found during initialization.

For that reason, I suspect the problem is not with your container's root filesystem, where shiftfs is probably working fine. The UID mismatch you're observing is probably specific to the bind-mounted resources. I'm not sure why; maybe it's because the underlying filesystem is virtiofs (?).

To confirm the above and help us narrow down the issue, please do the following:

Lucky1313 commented 1 year ago

I verified that the synced folder is what causes the issue, regardless of whether it's a VirtualBox or a virtiofs folder. Interestingly, it looks like just having a synced folder anywhere in the directory tree of the folder mounted into the Docker container causes the entire tree to have improperly mapped IDs. For example, even though the synced folder in the VM is /home/vagrant/ws, all of /home/vagrant gets the bad IDs when it's mounted inside the container. Paths that contain no synced folders at all are mounted inside the container with the correct IDs.

I also tried upgrading the VM kernel to 5.19.0-50 and to 6.2.0-26 with both the VirtualBox and libvirt providers; neither changed anything on either system.

So it looks like Vagrant synced folders aren't supported at this time?

`findmnt` on VirtualBox
```bash
vagrant@ubuntu-jammy:~$ findmnt
TARGET                                        SOURCE                 FSTYPE      OPTIONS
/                                             /dev/sda1              ext4        rw,relatime,discard,errors=remount-ro
├─/sys                                        sysfs                  sysfs       rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/security                      securityfs             securityfs  rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/cgroup                            cgroup2                cgroup2     rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot
│ ├─/sys/fs/pstore                            pstore                 pstore      rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/bpf                               bpf                    bpf         rw,nosuid,nodev,noexec,relatime,mode=700
│ ├─/sys/kernel/debug                         debugfs                debugfs     rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/tracing                       tracefs                tracefs     rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/config                        configfs               configfs    rw,nosuid,nodev,noexec,relatime
│ └─/sys/fs/fuse/connections                  fusectl                fusectl     rw,nosuid,nodev,noexec,relatime
├─/proc                                       proc                   proc        rw,nosuid,nodev,noexec,relatime
│ └─/proc/sys/fs/binfmt_misc                  systemd-1              autofs      rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=251
│   └─/proc/sys/fs/binfmt_misc                binfmt_misc            binfmt_misc rw,nosuid,nodev,noexec,relatime
├─/dev                                        udev                   devtmpfs    rw,nosuid,relatime,size=1980944k,nr_inodes=495236,mode=755,inode64
│ ├─/dev/pts                                  devpts                 devpts      rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000
│ ├─/dev/shm                                  tmpfs                  tmpfs       rw,nosuid,nodev,inode64
│ ├─/dev/hugepages                            hugetlbfs              hugetlbfs   rw,relatime,pagesize=2M
│ └─/dev/mqueue                               mqueue                 mqueue      rw,nosuid,nodev,noexec,relatime
├─/run                                        tmpfs                  tmpfs       rw,nosuid,nodev,noexec,relatime,size=400392k,mode=755,inode64
│ ├─/run/lock                                 tmpfs                  tmpfs       rw,nosuid,nodev,noexec,relatime,size=5120k,inode64
│ ├─/run/credentials/systemd-sysusers.service ramfs                  ramfs       ro,nosuid,nodev,noexec,relatime,mode=700
│ ├─/run/user/1000                            tmpfs                  tmpfs       rw,nosuid,nodev,relatime,size=400388k,nr_inodes=100097,mode=700,uid=1000,gid=1000,inode64
│ └─/run/snapd/ns                             tmpfs[/snapd/ns]       tmpfs       rw,nosuid,nodev,noexec,relatime,size=400392k,mode=755,inode64
│   └─/run/snapd/ns/lxd.mnt                   nsfs[mnt:[4026532195]] nsfs        rw
├─/snap/core20/1891                           /dev/loop0             squashfs    ro,nodev,relatime,errors=continue
├─/snap/lxd/24322                             /dev/loop1             squashfs    ro,nodev,relatime,errors=continue
├─/snap/snapd/19361                           /dev/loop2             squashfs    ro,nodev,relatime,errors=continue
├─/vagrant                                    vagrant                vboxsf      rw,relatime
│ └─/vagrant                                  vagrant                vboxsf      rw,relatime
└─/home/vagrant/ws                            home_vagrant_ws_       vboxsf      rw,relatime
  └─/home/vagrant/ws                          home_vagrant_ws_       vboxsf      rw,relatime
```
`findmnt` on libvirt
```bash
TARGET                                        SOURCE                            FSTYPE      OPTIONS
/                                             /dev/mapper/ubuntu--vg-ubuntu--lv ext4        rw,relatime
├─/sys                                        sysfs                             sysfs       rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/security                      securityfs                        securityfs  rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/cgroup                            cgroup2                           cgroup2     rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot
│ ├─/sys/fs/pstore                            pstore                            pstore      rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/bpf                               bpf                               bpf         rw,nosuid,nodev,noexec,relatime,mode=700
│ ├─/sys/kernel/tracing                       tracefs                           tracefs     rw,nosuid,nodev,noexec,relatime
│ ├─/sys/kernel/debug                         debugfs                           debugfs     rw,nosuid,nodev,noexec,relatime
│ ├─/sys/fs/fuse/connections                  fusectl                           fusectl     rw,nosuid,nodev,noexec,relatime
│ └─/sys/kernel/config                        configfs                          configfs    rw,nosuid,nodev,noexec,relatime
├─/proc                                       proc                              proc        rw,nosuid,nodev,noexec,relatime
│ └─/proc/sys/fs/binfmt_misc                  systemd-1                         autofs      rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=18851
│   └─/proc/sys/fs/binfmt_misc                binfmt_misc                       binfmt_misc rw,nosuid,nodev,noexec,relatime
├─/dev                                        udev                              devtmpfs    rw,nosuid,relatime,size=4011824k,nr_inodes=1002956,mode=755,inode64
│ ├─/dev/pts                                  devpts                            devpts      rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000
│ ├─/dev/shm                                  tmpfs                             tmpfs       rw,nosuid,nodev,inode64
│ ├─/dev/hugepages                            hugetlbfs                         hugetlbfs   rw,relatime,pagesize=2M
│ └─/dev/mqueue                               mqueue                            mqueue      rw,nosuid,nodev,noexec,relatime
├─/run                                        tmpfs                             tmpfs       rw,nosuid,nodev,noexec,relatime,size=814028k,mode=755,inode64
│ ├─/run/lock                                 tmpfs                             tmpfs       rw,nosuid,nodev,noexec,relatime,size=5120k,inode64
│ ├─/run/credentials/systemd-sysusers.service none                              ramfs       ro,nosuid,nodev,noexec,relatime,mode=700
│ ├─/run/snapd/ns                             tmpfs[/snapd/ns]                  tmpfs       rw,nosuid,nodev,noexec,relatime,size=814028k,mode=755,inode64
│ │ └─/run/snapd/ns/lxd.mnt                   nsfs[mnt:[4026532381]]            nsfs        rw
│ └─/run/user/1000                            tmpfs                             tmpfs       rw,nosuid,nodev,relatime,size=814028k,nr_inodes=203507,mode=700,uid=1000,gid=1000,inode64
├─/snap/lxd/24322                             /dev/loop0                        squashfs    ro,nodev,relatime,errors=continue
├─/snap/core20/1822                           /dev/loop1                        squashfs    ro,nodev,relatime,errors=continue
├─/snap/snapd/18357                           /dev/loop2                        squashfs    ro,nodev,relatime,errors=continue
├─/boot                                       /dev/vda2                         ext4        rw,relatime
└─/home/vagrant/ws                            d3fa989972cdf06a7ed8de28edaa950   virtiofs    rw,relatime
```
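Given the observation that a synced folder anywhere under the bind-mounted directory affects the whole tree, it can help to check what is mounted below a path before passing it to `docker run -v`. The hypothetical `mounts_under` function below is a minimal sketch that scans `/proc/self/mounts` (whose second and third fields are the mount point and filesystem type) and flags `vboxsf` and `virtiofs` entries like the `/home/vagrant/ws` mounts in the `findmnt` listings above. As an assumption of the sketch, it does not decode the octal escapes (such as `\040` for a space) that the kernel uses for unusual characters in mount points.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every mount at or below a directory and flag
# synced-folder filesystems (vboxsf, virtiofs).
# Usage: mounts_under DIR [MOUNTS_FILE]   (MOUNTS_FILE defaults to /proc/self/mounts)
mounts_under() {
  local dir=${1%/} mounts_file=${2:-/proc/self/mounts}
  awk -v d="$dir" '
    # Field 2 is the mount point, field 3 the filesystem type.
    $2 == d || index($2, d "/") == 1 {
      flag = ($3 == "vboxsf" || $3 == "virtiofs") ? "  <-- synced folder" : ""
      print $2, $3 flag
    }
  ' "$mounts_file"
}

# Example inside the VM: which mounts sit under the directory passed to -v?
# mounts_under /home/vagrant
```

On the VirtualBox VM above, this would be expected to report the `/home/vagrant/ws` vboxsf mounts for `/home/vagrant`, and nothing for a path with no submounts.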