nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Unable to Start Docker Daemon in Sysbox Container with NFS Mount #811

Open bushev opened 5 months ago

bushev commented 5 months ago

Hi, the Docker daemon fails to start within a Sysbox container when the Docker directory is mounted from an NFS server. The issue appears to be related to permissions on the NFS-mounted directory.

On NFS Server

cat /etc/exports

/var/cs/users 10.0.105.0/24(rw,no_subtree_check,all_squash,anonuid=166537,anongid=165536)

Here, 166537 is 165536 + 1001 (1001 is the UID of the user inside the Docker container).
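The arithmetic above can be written out as a small sketch. It assumes the container's user namespace maps container ID 0 to host ID 165536 (the base visible in the listings) and that the mapped range is 65536 IDs long. The range length and the name `to_host_id` are illustrative assumptions, not values read from the Sysbox or Docker configuration.

```python
SUBID_BASE = 165536   # host ID that container ID 0 maps to (taken from the listings)
SUBID_COUNT = 65536   # assumed length of the mapped range


def to_host_id(container_id: int, base: int = SUBID_BASE, count: int = SUBID_COUNT) -> int:
    """Translate an ID seen inside the container to the ID stored on the host or NFS server."""
    if not 0 <= container_id < count:
        raise ValueError(f"container ID {container_id} is outside the mapped range")
    return base + container_id


print(to_host_id(1001))  # 166537, the anonuid used in /etc/exports
print(to_host_id(0))     # 165536, the anongid used in /etc/exports
```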

ls -lah /var/cs/users/952/u-9524/.ide

total 12K
drwxr-xr-x  5 166537 165536  113 Jun  9 02:05  .
drwxr-xr-x 22 166537 165536 4.0K Jun 13 14:21  ..
drwx--x--- 12 166537 165536  214 Jun 12 18:58  docker
-rw-r--r--  1 166537 165536   35 Jun  9 02:05  info.json

sudo ls -lah /var/cs/users/952/u-9524/.ide/docker

total 4.0K
drwx--x--- 12 166537 165536 214 Jun 12 18:58 .
drwxr-xr-x  5 166537 165536 113 Jun  9 02:05 ..
drwx--x--x  4 166537 165536 170 Jun  9 02:05 buildkit
drwx--x---  2 166537 165536  10 Jun 12 18:58 containers
-rw-------  1 166537 165536  36 Jun  9 02:05 engine-id
drwx------  3 166537 165536  25 Jun  9 02:05 image
drwxr-x---  3 166537 165536  27 Jun  9 02:05 network
drwx------  4 166537 165536  44 Jun  9 02:05 plugins
drwx------  2 166537 165536  10 Jun 12 18:58 runtimes
drwx------  2 166537 165536  10 Jun  9 02:05 swarm
drwx------  2 166537 165536  10 Jun 12 18:58 tmp
drwx--x---  3 166537 165536  25 Jun  9 02:06 vfs
drwx-----x  2 166537 165536  33 Jun  9 02:05 volumes

On Container's Host

cat /etc/docker/daemon.json

{
"userns-remap": "sysbox",
"runtimes": {
"sysbox-runc": {
"path": "/usr/bin/sysbox-runc"
}
},
"bip": "172.20.0.1/16",
"default-address-pools": [
{
"base": "172.25.0.0/16",
"size": 24
}
],
"insecure-registries": [
"10.0.200.37:5000"
]
}

sudo ls -lah /mnt/nfs/users/952/u-9524/.ide

total 12K
drwxr-xr-x  5 166537 165536  113 Jun  9 02:05  .
drwxr-xr-x 22 166537 165536 4.0K Jun 13 14:46  ..
drwx--x--- 12 166537 165536  214 Jun 12 18:58  docker
-rw-r--r--  1 166537 165536   35 Jun  9 02:05  info.json

sudo ls -lah /mnt/nfs/users/952/u-9524/.ide/docker

total 4.0K
drwx--x--- 12 166537 165536 214 Jun 12 18:58 .
drwxr-xr-x  5 166537 165536 113 Jun  9 02:05 ..
drwx--x--x  4 166537 165536 170 Jun  9 02:05 buildkit
drwx--x---  2 166537 165536  10 Jun 12 18:58 containers
-rw-------  1 166537 165536  36 Jun  9 02:05 engine-id
drwx------  3 166537 165536  25 Jun  9 02:05 image
drwxr-x---  3 166537 165536  27 Jun  9 02:05 network
drwx------  4 166537 165536  44 Jun  9 02:05 plugins
drwx------  2 166537 165536  10 Jun 12 18:58 runtimes
drwx------  2 166537 165536  10 Jun  9 02:05 swarm
drwx------  2 166537 165536  10 Jun 12 18:58 tmp
drwx--x---  3 166537 165536  25 Jun  9 02:06 vfs
drwx-----x  2 166537 165536  33 Jun  9 02:05 volumes

Inside a container

ls -lah /home/user/.ide

total 16K
drwxr-xr-x  5 user root  113 Jun  9 02:05  .
drwxr-xr-x 22 user root 4.0K Jun 13 14:41  ..
drwx--x--- 12 user root  214 Jun 12 18:58  docker
-rw-r--r--  1 user root   35 Jun  9 02:05  info.json

sudo ls -lah /home/user/.ide/docker

total 4.0K
drwx--x--- 12 user root 214 Jun 12 18:58 .
drwxr-xr-x  5 user root 113 Jun  9 02:05 ..
drwx--x--x  4 user root 170 Jun  9 02:05 buildkit
drwx--x---  2 user root  10 Jun 12 18:58 containers
-rw-------  1 user root  36 Jun  9 02:05 engine-id
drwx------  3 user root  25 Jun  9 02:05 image
drwxr-x---  3 user root  27 Jun  9 02:05 network
drwx------  4 user root  44 Jun  9 02:05 plugins
drwx------  2 user root  10 Jun 12 18:58 runtimes
drwx------  2 user root  10 Jun  9 02:05 swarm
drwx------  2 user root  10 Jun 12 18:58 tmp
drwx--x---  3 user root  25 Jun  9 02:06 vfs
drwx-----x  2 user root  33 Jun  9 02:05 volumes

sudo systemctl restart docker || journalctl -u docker


Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xeu docker.service" for details.
Jun 13 14:45:05 3c5adbffe1a5 systemd[1]: Starting Docker Application Container Engine...
Jun 13 14:45:05 3c5adbffe1a5 dockerd[104066]: time="2024-06-13T14:45:05.339870626Z" level=info msg="Starting up"
Jun 13 14:45:05 3c5adbffe1a5 dockerd[104066]: could not create or set daemon root permissions: /home/user/.ide/docker: chown /home/user/.ide/docker: operation not permitted
Jun 13 14:45:05 3c5adbffe1a5 systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jun 13 14:45:05 3c5adbffe1a5 systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 13 14:45:05 3c5adbffe1a5 systemd[1]: Failed to start Docker Application Container Engine.
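The failing step in the log is a chown on the daemon root directory. A minimal sketch of a probe that attempts the same kind of call and reports whether it is refused; the helper name `can_chown` is hypothetical, and the demonstration only touches a scratch directory.

```python
import os
import tempfile


def can_chown(path: str, uid: int, gid: int) -> bool:
    """Attempt os.chown(path, uid, gid); return False if it is refused with EPERM/EACCES.

    Note: when the call succeeds, the ownership of `path` really is changed.
    """
    try:
        os.chown(path, uid, gid)
    except PermissionError:
        return False
    return True


# Harmless local demonstration: chown a fresh temp directory to its current owner.
with tempfile.TemporaryDirectory() as scratch:
    st = os.stat(scratch)
    print(can_chown(scratch, st.st_uid, st.st_gid))
```

Run as root inside the container, `can_chown("/home/user/.ide/docker", 0, 0)` would be the analogous probe for the path named in the log. With `all_squash` on the export, every client identity is squashed to `anonuid`/`anongid` on the server, which may be why an ownership change is refused there; the thread does not confirm this.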

Container created with mount

{ Source: '/mnt/nfs/users/952/u-9524', Target: '/home/user', Type: 'bind', ReadOnly: false, BindOptions: { Propagation: 'rprivate' } }



Just to give you a little context: we're using an NFS share to store Docker data from the container. This way, we can start our containers quickly and have a shared storage system.
bushev commented 5 months ago

I've also attempted to use separate NFS shares like so:

cat /etc/exports

/var/cs/home   10.0.105.0/24(rw,no_subtree_check,all_squash,anonuid=166537,anongid=165536) # user:root
/var/cs/docker 10.0.105.0/24(rw,no_subtree_check,all_squash,anonuid=165536,anongid=165536) # root:root
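To check that the two exports squash to the identities named in the comments (user:root and root:root), the `anonuid`/`anongid` options can be translated back into container IDs. This is a sketch that assumes the same base of 165536 as in the first comment; the function names are hypothetical.

```python
import re

SUBID_BASE = 165536  # host ID that container ID 0 maps to (assumed from the listings)


def anon_ids(export_line: str) -> tuple[int, int]:
    """Return (anonuid, anongid) from one /etc/exports line."""
    uid = re.search(r"anonuid=(\d+)", export_line)
    gid = re.search(r"anongid=(\d+)", export_line)
    if uid is None or gid is None:
        raise ValueError("export line has no anonuid/anongid options")
    return int(uid.group(1)), int(gid.group(1))


def as_container_ids(export_line: str, base: int = SUBID_BASE) -> tuple[int, int]:
    """Express the squash target as the (uid, gid) a process inside the container would see."""
    uid, gid = anon_ids(export_line)
    return uid - base, gid - base


home = "/var/cs/home   10.0.105.0/24(rw,no_subtree_check,all_squash,anonuid=166537,anongid=165536)"
docker = "/var/cs/docker 10.0.105.0/24(rw,no_subtree_check,all_squash,anonuid=165536,anongid=165536)"
print(as_container_ids(home))    # (1001, 0)
print(as_container_ids(docker))  # (0, 0)
```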

Next, I tried to mount them separately:

{
  Source: '/mnt/nfs/home/952/u-9524',
  Target: '/home/user',
  Type: 'bind' as MountType,
  ReadOnly: false,
  BindOptions: {
    Propagation: 'rprivate' as MountPropagation
  }
},
{
  Source: '/mnt/nfs/docker/952/u-9524',
  Target: '/var/lib/docker',
  Type: 'bind' as MountType,
  ReadOnly: false,
  BindOptions: {
    Propagation: 'rprivate' as MountPropagation
  }
}

However, this resulted in a container start failure with an error message:

(HTTP code 500) server error - failed to create task for container: failed to create shim task: OCI runtime create failed: error in the container spec: invalid mount config: failed to request mount source preps from sysbox-mgr: failed to invoke PrepMounts via grpc: rpc error: code = Unknown desc = failed to shift uids via chown for mount source at /mnt/nfs/docker/952/u-9524: failed to shift ACL for /mnt/nfs/docker/952/u-9524: failed to get ACL for /mnt/nfs/docker/952/u-9524: operation not supported: unknown
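The error chain above ends in "failed to get ACL ... operation not supported". A rough way to see whether a path exposes POSIX ACLs at all is to read the `system.posix_acl_access` extended attribute, which is how POSIX ACLs are exposed on Linux. Whether sysbox-mgr performs exactly this lookup is an assumption, and the helper name `posix_acl_status` is hypothetical.

```python
import errno
import os

ACL_XATTR = "system.posix_acl_access"


def posix_acl_status(path: str) -> str:
    """Classify ACL support for `path` as 'present', 'none-set', 'unsupported', or 'error'."""
    if not hasattr(os, "getxattr"):
        return "error"  # os.getxattr is only available on Linux
    try:
        os.getxattr(path, ACL_XATTR)
        return "present"
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            return "none-set"       # filesystem supports ACLs, none set on this path
        if exc.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
            return "unsupported"    # filesystem refuses the ACL attribute entirely
        return "error"


print(posix_acl_status("."))
```

On the Sysbox host, `posix_acl_status("/mnt/nfs/docker/952/u-9524")` returning "unsupported" would be consistent with the message above.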

Then I decided to mount the share at a different directory, Target: '/var/lib/my-docker', and point Docker's data directory there as well:

/etc/docker/daemon.json

{"data-root": "/var/lib/my-docker"}

The container started successfully and the shares seemed fine. Yet, when I tried to pull an image inside the container with docker pull ubuntu, I encountered this:

docker pull ubuntu

Using default tag: latest
latest: Pulling from library/ubuntu
00d679a470c4: Extracting [==================================================>]  28.87MB/28.87MB
failed to register layer: failed to Lchown "/etc/gshadow" for UID 0, GID 42: lchown /etc/gshadow: operation not permitted

So, as you can see, I'm pretty much stuck right now :-)


findmnt

TARGET                                SOURCE                                               FSTYPE   OPTIONS
/                                     overlay                                              overlay  rw,relatime,lowerdir=/var/lib/docker/165536.165536/overlay2/l/QIMHXFAT44O2GT5IYCFJTAXF3S:/var/lib/docker/165536.165536/overlay2/l/O5POPDBVEQFUKY7ZDZXGQTVN4Z:/var/lib/docker/165536.165536/overlay2/l/CDE7PAY527
|-/sys                                sysfs                                                sysfs    rw,nosuid,nodev,noexec,relatime
| |-/sys/firmware                     tmpfs                                                tmpfs    ro,relatime,uid=165536,gid=165536,inode64
| |-/sys/fs/cgroup                    tmpfs                                                tmpfs    ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,uid=165536,gid=165536,inode64
| | |-/sys/fs/cgroup/systemd          systemd                                              cgroup   rw,nosuid,nodev,noexec,relatime,xattr,name=systemd
| | |-/sys/fs/cgroup/pids             cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,pids
| | |-/sys/fs/cgroup/perf_event       cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,perf_event
| | |-/sys/fs/cgroup/cpuset           cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,cpuset
| | |-/sys/fs/cgroup/blkio            cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,blkio
| | |-/sys/fs/cgroup/devices          cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,devices
| | |-/sys/fs/cgroup/net_cls,net_prio cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
| | |-/sys/fs/cgroup/cpu,cpuacct      cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
| | |-/sys/fs/cgroup/freezer          cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,freezer
| | |-/sys/fs/cgroup/memory           cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,memory
| | |-/sys/fs/cgroup/rdma             cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,rdma
| | |-/sys/fs/cgroup/misc             cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,misc
| | `-/sys/fs/cgroup/hugetlb          cgroup                                               cgroup   rw,nosuid,nodev,noexec,relatime,hugetlb
| |-/sys/devices/virtual              sysboxfs[/sys/devices/virtual]                       fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
| | `-/sys/devices/virtual/powercap   tmpfs                                                tmpfs    ro,relatime,uid=165536,gid=165536,inode64
| |-/sys/kernel                       sysboxfs[/sys/kernel]                                fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
| `-/sys/module/nf_conntrack/parameters
|                                     sysboxfs[/sys/module/nf_conntrack/parameters]        fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
|-/proc                               proc                                                 proc     rw,nosuid,nodev,noexec,relatime
| |-/proc/bus                         proc[/bus]                                           proc     ro,nosuid,nodev,noexec,relatime
| |-/proc/fs                          proc[/fs]                                            proc     ro,nosuid,nodev,noexec,relatime
| |-/proc/irq                         proc[/irq]                                           proc     ro,nosuid,nodev,noexec,relatime
| |-/proc/sysrq-trigger               proc[/sysrq-trigger]                                 proc     ro,nosuid,nodev,noexec,relatime
| |-/proc/acpi                        tmpfs                                                tmpfs    ro,relatime,uid=165536,gid=165536,inode64
| |-/proc/keys                        udev[/null]                                          devtmpfs rw,nosuid,relatime,size=8145820k,nr_inodes=2036455,mode=755,inode64
| |-/proc/timer_list                  udev[/null]                                          devtmpfs rw,nosuid,relatime,size=8145820k,nr_inodes=2036455,mode=755,inode64
| |-/proc/scsi                        tmpfs                                                tmpfs    ro,relatime,uid=165536,gid=165536,inode64
| |-/proc/swaps                       sysboxfs[/proc/swaps]                                fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
| |-/proc/sys                         sysboxfs[/proc/sys]                                  fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
| `-/proc/uptime                      sysboxfs[/proc/uptime]                               fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
|-/dev                                tmpfs                                                tmpfs    rw,nosuid,size=65536k,mode=755,uid=165536,gid=165536,inode64
| |-/dev/mqueue                       mqueue                                               mqueue   rw,nosuid,nodev,noexec,relatime
| |-/dev/pts                          devpts                                               devpts   rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666
| |-/dev/shm                          shm                                                  tmpfs    rw,nosuid,nodev,noexec,relatime,size=65536k,uid=165536,gid=165536,inode64
| |-/dev/null                         udev[/null]                                          devtmpfs rw,nosuid,relatime,size=8145820k,nr_inodes=2036455,mode=755,inode64
| |-/dev/random                       udev[/random]                                        devtmpfs rw,nosuid,relatime,size=8145820k,nr_inodes=2036455,mode=755,inode64
| |-/dev/kmsg                         udev[/null]                                          devtmpfs rw,nosuid,relatime,size=8145820k,nr_inodes=2036455,mode=755,inode64
| |-/dev/full                         udev[/full]                                          devtmpfs rw,nosuid,relatime,size=8145820k,nr_inodes=2036455,mode=755,inode64
| |-/dev/tty                          udev[/tty]                                           devtmpfs rw,nosuid,relatime,size=8145820k,nr_inodes=2036455,mode=755,inode64
| |-/dev/zero                         udev[/zero]                                          devtmpfs rw,nosuid,relatime,size=8145820k,nr_inodes=2036455,mode=755,inode64
| `-/dev/urandom                      udev[/urandom]                                       devtmpfs rw,nosuid,relatime,size=8145820k,nr_inodes=2036455,mode=755,inode64
|-/run                                tmpfs                                                tmpfs    rw,nosuid,nodev,relatime,size=65536k,mode=755,uid=165536,gid=165536,inode64
| |-/run/user/1001                    tmpfs                                                tmpfs    rw,nosuid,nodev,relatime,size=419428k,nr_inodes=104857,mode=700,uid=166537,gid=165536,inode64
| `-/run/lock                         tmpfs                                                tmpfs    rw,nosuid,nodev,noexec,relatime,size=4096k,uid=165536,gid=165536,inode64
|-/home/user                          10.0.200.70:/var/cs/home/952/u-9524                  nfs4     rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.105.5,local_lock=none,addr=10.0.200.70
|-/cs-sockets                         /dev/sda1[/etc/nginx/sockets/u9524]                  ext4     rw,relatime,idmapped,discard,errors=remount-ro
|-/etc/resolv.conf                    /dev/sdc[/165536.165536/containers/72f3997457a639da4038fbce18d0de30707bc4c9ddc0cb066702174ce5a658f7/resolv.conf]
|                                                                                          xfs      rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,usrquota,prjquota,grpquota
|-/etc/hostname                       /dev/sdc[/165536.165536/containers/72f3997457a639da4038fbce18d0de30707bc4c9ddc0cb066702174ce5a658f7/hostname]
|                                                                                          xfs      rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,usrquota,prjquota,grpquota
|-/etc/hosts                          /dev/sdc[/165536.165536/containers/72f3997457a639da4038fbce18d0de30707bc4c9ddc0cb066702174ce5a658f7/hosts]
|                                                                                          xfs      rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,usrquota,prjquota,grpquota
|-/var/lib/my-docker                  10.0.200.70:/var/cs/docker/952/u-9524                nfs4     rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.105.5,local_lock=none,addr=10.0.200.70
|-/var/lib/kubelet                    /dev/sda1[/var/lib/sysbox/kubelet/72f3997457a639da4038fbce18d0de30707bc4c9ddc0cb066702174ce5a658f7]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/k0s                        /dev/sda1[/var/lib/sysbox/k0s/72f3997457a639da4038fbce18d0de30707bc4c9ddc0cb066702174ce5a658f7]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/buildkit                   /dev/sda1[/var/lib/sysbox/buildkit/72f3997457a639da4038fbce18d0de30707bc4c9ddc0cb066702174ce5a658f7]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
|                                     /dev/sda1[/var/lib/sysbox/containerd/72f3997457a639da4038fbce18d0de30707bc4c9ddc0cb066702174ce5a658f7]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/docker                     /dev/sda1[/var/lib/sysbox/docker/72f3997457a639da4038fbce18d0de30707bc4c9ddc0cb066702174ce5a658f7]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/rancher/k3s                /dev/sda1[/var/lib/sysbox/rancher-k3s/72f3997457a639da4038fbce18d0de30707bc4c9ddc0cb066702174ce5a658f7]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/rancher/rke2               /dev/sda1[/var/lib/sysbox/rancher-rke2/72f3997457a639da4038fbce18d0de30707bc4c9ddc0cb066702174ce5a658f7]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/usr/src/linux-headers-5.15.0-112   /dev/sda1[/usr/src/linux-headers-5.15.0-112]         ext4     ro,relatime,idmapped,discard,errors=remount-ro
|-/usr/src/linux-headers-5.15.0-112-generic
|                                     /dev/sda1[/usr/src/linux-headers-5.15.0-112-generic] ext4     ro,relatime,idmapped,discard,errors=remount-ro
`-/usr/lib/modules/5.15.0-112-generic /dev/sda1[/usr/lib/modules/5.15.0-112-generic]       ext4     ro,relatime,idmapped,discard,errors=remount-ro
bushev commented 5 months ago

I've tried once more to make things work, but unfortunately, I haven't been successful.

I realized that I had incorrectly configured NFS sharing; the idmapping for NFS wasn't functioning as it should, and I was using an older kernel that didn't support "Overlayfs on ID-mapped mounts". I've corrected all of these issues, but I still can't run the command sudo chown root /home/user/text.txt within the container. Oddly enough, I can execute any command from the Sysbox host (on the NFS client side).

Here are the configurations on my NFS Server:

cat /etc/exports

/var/cs/home 10.0.105.0/24(rw,no_subtree_check,no_root_squash)

cat /etc/idmapd.conf

[General]

Verbosity = 0
# set your own domain here, if it differs from FQDN minus hostname
# Domain = localdomain
Domain = lan

[Mapping]

Nobody-User = nobody
Nobody-Group = nogroup

In addition to the above, I ran this command: sudo echo N > /sys/module/nfsd/parameters/nfs4_disable_idmapping.
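One thing worth double-checking here: in `sudo echo N > file`, the `>` redirection is performed by the calling shell rather than by `sudo`, so the write only succeeds if that shell can already write to the file. A minimal sketch for confirming the value afterwards; the helper name `read_module_param` is hypothetical, and the path is the one from the command above.

```python
from pathlib import Path
from typing import Optional

PARAM = Path("/sys/module/nfsd/parameters/nfs4_disable_idmapping")


def read_module_param(path: Path = PARAM) -> Optional[str]:
    """Return the parameter's current value, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


# Prints the current value on an NFS server, or None where the nfsd module is not loaded.
print(read_module_param())
```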

The command ls -lah /var/cs/home/952/u-9524/ provides the following:

-rw-rw-r--   1 cs-user cs-root    0 Jun 22 15:13 text.txt

(For easier reading, I added the cs-root and cs-user accounts with sudo useradd -u 165536 cs-root and sudo useradd -u 166537 cs-user.)

On the NFS client (Sysbox host) side, I have this setup:

The mount command is

sudo mount 10.0.200.70:/var/cs/home /mnt/nfs/home

The output of ls -lah /mnt/nfs/home/952/u-9524 on the host is

-rw-rw-r--  1 cs-user 165536    0 Jun 22 15:13 text.txt

The command sudo chown cs-ubuntu /mnt/nfs/home/952/u-9524/text.txt works just fine on the host. But when I try a chown inside a container, it fails:

docker run --rm --name tmp -it --runtime sysbox-runc -v /mnt/nfs/home/952/u-9524:/home/user 44c062a02c99 /bin/sh -c "ls -lah /home/user && chown root /home/user/text.txt"

The result is

-rw-rw-r-- 1 user root 0 Jun 22 15:13 /home/user/text.txt
chown: changing ownership of '/home/user/text.txt': Operation not permitted

ID mapping seems to be working, judging by journalctl --no-pager -u sysbox-mgr:

Jun 21 21:33:38 worker-5 systemd[1]: Starting sysbox-mgr (part of the Sysbox container runtime)...
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Starting ..."
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Sysbox data root: /var/lib/sysbox"
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Shiftfs module found in kernel: no"
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Shiftfs works properly: no"
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Shiftfs-on-overlayfs works properly: no"
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="ID-mapped mounts supported by kernel: yes"
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Overlayfs on ID-mapped mounts supported by kernel: yes"
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Operating in system container mode."
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Inner container image preloading enabled."
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Listening on /run/sysbox/sysmgr.sock"
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Ready ..."
Jun 21 21:33:38 worker-5 systemd[1]: Started sysbox-mgr (part of the Sysbox container runtime).
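The yes/no feature lines in this log are easy to miss among the startup messages. A small sketch that pulls them into a dictionary; it assumes the `msg="name: yes|no"` shape seen above, and `parse_features` is a hypothetical helper, not part of Sysbox.

```python
import re

MSG_RE = re.compile(r'msg="([^"]*)"')


def parse_features(log_text: str) -> dict[str, bool]:
    """Collect 'name: yes/no' messages from sysbox-mgr log lines into a dict."""
    features = {}
    for line in log_text.splitlines():
        match = MSG_RE.search(line)
        if not match:
            continue
        name, sep, value = match.group(1).rpartition(": ")
        if sep and value in ("yes", "no"):
            features[name] = (value == "yes")
    return features


SAMPLE = '''\
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Sysbox data root: /var/lib/sysbox"
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Shiftfs module found in kernel: no"
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="ID-mapped mounts supported by kernel: yes"
Jun 21 21:33:38 worker-5 sysbox-mgr[1006]: time="2024-06-21 21:33:38" level=info msg="Overlayfs on ID-mapped mounts supported by kernel: yes"
'''

for name, enabled in parse_features(SAMPLE).items():
    print(f"{name}: {enabled}")
```

Note that these lines describe what the kernel supports in general. They do not say whether a particular bind-mount source, such as the NFS share, was actually ID-mapped.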

cat /etc/idmapd.conf

[General]

Verbosity = 0
# set your own domain here, if it differs from FQDN minus hostname
# Domain = localdomain
Domain = lan

[Mapping]

Nobody-User = nobody
Nobody-Group = nogroup

Here's what my /etc/docker/daemon.json file looks like:

{
"userns-remap": "sysbox",
"runtimes": {
"sysbox-runc": {
"path": "/usr/bin/sysbox-runc"
}
},
"bip": "172.20.0.1/16",
"default-address-pools": [
{
"base": "172.25.0.0/16",
"size": 24
}
],
"insecure-registries": [
"10.0.200.37:5000"
]
}

findmnt

docker run --rm --name tmp -it --runtime sysbox-runc -v /mnt/nfs/home/952/u-9524:/home/user 44c062a02c99 findmnt
TARGET                                  SOURCE                                             FSTYPE   OPTIONS
/                                       overlay                                            overlay  rw,relatime,lowerdir=/var/lib/docker/165536.165536/overlay2/l/ZJ3XHL4OSA67GW46RG7OWZIW63:/var/lib/docker/165536.165536/overlay2/l/HNIO4YVNNN34FFILPGJXBNEGXE:/var/lib/docker/165536.165536/overlay2/l/IV34VEYV47FU3UC7S6FETFPAE5:/var/lib
|-/sys                                  sysfs                                              sysfs    rw,nosuid,nodev,noexec,relatime
| |-/sys/firmware                       tmpfs                                              tmpfs    ro,relatime,uid=165536,gid=165536,inode64
| |-/sys/fs/cgroup                      tmpfs                                              tmpfs    rw,nosuid,nodev,noexec,relatime,mode=755,uid=165536,gid=165536,inode64
| | |-/sys/fs/cgroup/systemd            systemd                                            cgroup   rw,nosuid,nodev,noexec,relatime,xattr,name=systemd
| | |-/sys/fs/cgroup/perf_event         cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,perf_event
| | |-/sys/fs/cgroup/memory             cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,memory
| | |-/sys/fs/cgroup/blkio              cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,blkio
| | |-/sys/fs/cgroup/net_cls,net_prio   cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
| | |-/sys/fs/cgroup/misc               cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,misc
| | |-/sys/fs/cgroup/cpuset             cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,cpuset
| | |-/sys/fs/cgroup/freezer            cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,freezer
| | |-/sys/fs/cgroup/cpu,cpuacct        cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
| | |-/sys/fs/cgroup/hugetlb            cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,hugetlb
| | |-/sys/fs/cgroup/pids               cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,pids
| | |-/sys/fs/cgroup/rdma               cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,rdma
| | `-/sys/fs/cgroup/devices            cgroup                                             cgroup   rw,nosuid,nodev,noexec,relatime,devices
| |-/sys/devices/virtual                sysboxfs[/sys/devices/virtual]                     fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
| |-/sys/kernel                         sysboxfs[/sys/kernel]                              fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
| `-/sys/module/nf_conntrack/parameters sysboxfs[/sys/module/nf_conntrack/parameters]      fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
|-/proc                                 proc                                               proc     rw,nosuid,nodev,noexec,relatime
| |-/proc/bus                           proc[/bus]                                         proc     ro,nosuid,nodev,noexec,relatime
| |-/proc/fs                            proc[/fs]                                          proc     ro,nosuid,nodev,noexec,relatime
| |-/proc/irq                           proc[/irq]                                         proc     ro,nosuid,nodev,noexec,relatime
| |-/proc/sysrq-trigger                 proc[/sysrq-trigger]                               proc     ro,nosuid,nodev,noexec,relatime
| |-/proc/asound                        tmpfs                                              tmpfs    ro,relatime,uid=165536,gid=165536,inode64
| |-/proc/acpi                          tmpfs                                              tmpfs    ro,relatime,uid=165536,gid=165536,inode64
| |-/proc/keys                          udev[/null]                                        devtmpfs rw,nosuid,relatime,size=8097384k,nr_inodes=2024346,mode=755,inode64
| |-/proc/timer_list                    udev[/null]                                        devtmpfs rw,nosuid,relatime,size=8097384k,nr_inodes=2024346,mode=755,inode64
| |-/proc/scsi                          tmpfs                                              tmpfs    ro,relatime,uid=165536,gid=165536,inode64
| |-/proc/swaps                         sysboxfs[/proc/swaps]                              fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
| |-/proc/sys                           sysboxfs[/proc/sys]                                fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
| `-/proc/uptime                        sysboxfs[/proc/uptime]                             fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
|-/dev                                  tmpfs                                              tmpfs    rw,nosuid,size=65536k,mode=755,uid=165536,gid=165536,inode64
| |-/dev/console                        devpts[/0]                                         devpts   rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666
| |-/dev/mqueue                         mqueue                                             mqueue   rw,nosuid,nodev,noexec,relatime
| |-/dev/pts                            devpts                                             devpts   rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666
| |-/dev/shm                            shm                                                tmpfs    rw,nosuid,nodev,noexec,relatime,size=65536k,uid=165536,gid=165536,inode64
| |-/dev/null                           udev[/null]                                        devtmpfs rw,nosuid,relatime,size=8097384k,nr_inodes=2024346,mode=755,inode64
| |-/dev/random                         udev[/random]                                      devtmpfs rw,nosuid,relatime,size=8097384k,nr_inodes=2024346,mode=755,inode64
| |-/dev/kmsg                           udev[/null]                                        devtmpfs rw,nosuid,relatime,size=8097384k,nr_inodes=2024346,mode=755,inode64
| |-/dev/full                           udev[/full]                                        devtmpfs rw,nosuid,relatime,size=8097384k,nr_inodes=2024346,mode=755,inode64
| |-/dev/tty                            udev[/tty]                                         devtmpfs rw,nosuid,relatime,size=8097384k,nr_inodes=2024346,mode=755,inode64
| |-/dev/zero                           udev[/zero]                                        devtmpfs rw,nosuid,relatime,size=8097384k,nr_inodes=2024346,mode=755,inode64
| `-/dev/urandom                        udev[/urandom]                                     devtmpfs rw,nosuid,relatime,size=8097384k,nr_inodes=2024346,mode=755,inode64
|-/home/user                            10.0.200.70:/var/cs/home/952/u-9524                nfs4     rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.105.5,local_lock=none,addr=10.0.200.70
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/etc/resolv.conf                      /dev/sdc[/165536.165536/containers/10390c0e6738870f816f816328e1d81764c9d2ff0cb4113f1faf2f82546c6ade/resolv.conf]
|                                                                                          xfs      rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,usrquota,prjquota,grpquota
|-/etc/hostname                         /dev/sdc[/165536.165536/containers/10390c0e6738870f816f816328e1d81764c9d2ff0cb4113f1faf2f82546c6ade/hostname]
|                                                                                          xfs      rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,usrquota,prjquota,grpquota
|-/etc/hosts                            /dev/sdc[/165536.165536/containers/10390c0e6738870f816f816328e1d81764c9d2ff0cb4113f1faf2f82546c6ade/hosts]
|                                                                                          xfs      rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,usrquota,prjquota,grpquota
|-/var/lib/kubelet                      /dev/sda1[/var/lib/sysbox/kubelet/10390c0e6738870f816f816328e1d81764c9d2ff0cb4113f1faf2f82546c6ade]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/k0s                          /dev/sda1[/var/lib/sysbox/k0s/10390c0e6738870f816f816328e1d81764c9d2ff0cb4113f1faf2f82546c6ade]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/buildkit                     /dev/sda1[/var/lib/sysbox/buildkit/10390c0e6738870f816f816328e1d81764c9d2ff0cb4113f1faf2f82546c6ade]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
|                                       /dev/sda1[/var/lib/sysbox/containerd/10390c0e6738870f816f816328e1d81764c9d2ff0cb4113f1faf2f82546c6ade]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/rancher/k3s                  /dev/sda1[/var/lib/sysbox/rancher-k3s/10390c0e6738870f816f816328e1d81764c9d2ff0cb4113f1faf2f82546c6ade]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/var/lib/rancher/rke2                 /dev/sda1[/var/lib/sysbox/rancher-rke2/10390c0e6738870f816f816328e1d81764c9d2ff0cb4113f1faf2f82546c6ade]
|                                                                                          ext4     rw,relatime,discard,errors=remount-ro
|-/usr/src/linux-headers-6.5.0-41-generic
|                                       /dev/sda1[/usr/src/linux-headers-6.5.0-41-generic] ext4     ro,relatime,idmapped,discard,errors=remount-ro
|-/usr/src/linux-hwe-6.5-headers-6.5.0-41
|                                       /dev/sda1[/usr/src/linux-hwe-6.5-headers-6.5.0-41] ext4     ro,relatime,idmapped,discard,errors=remount-ro
`-/usr/lib/modules/6.5.0-41-generic     /dev/sda1[/usr/lib/modules/6.5.0-41-generic]       ext4     ro,relatime,idmapped,discard,errors=remount-ro

At this point, I'm close to giving up...

nhoefer2 commented 1 month ago

I'm not sure if this will be helpful but try taking a look at this: https://github.com/nestybox/sysbox/issues/849

bushev commented 1 month ago

Thanks @nhoefer2

That looks like it could really help. I’ll give it a try next week and let you know how it goes. By the way, I’m using the XFS file system, but I’m not sure if ACL was enabled or not.

bushev commented 1 month ago

Unfortunately, it does not help me. I decided to give it another shot at attaching an NFS volume to a Docker container. Below, I’ve listed all the steps I took to reproduce the issue (there weren’t that many).

Description:

When attempting to mount NFS volumes on a Docker container, the Docker daemon fails to start inside the container. This issue occurs on a setup involving two hosts running Ubuntu 24.04, with one acting as an NFS server and the other as an NFS client.

Environment:

Steps to Reproduce:

  1. Configure NFS Server (Host 1):

    • Install NFS server:
      sudo apt -y install nfs-kernel-server nfs4-acl-tools
    • Configure /etc/idmapd.conf:
      Domain = ide.lan
    • Update /etc/exports:
      /data 10.0.105.0/24(rw,no_root_squash)
    • Apply changes:
      sudo systemctl restart nfs-server
  2. Configure NFS Client (Host 2):

    • Install NFS client:
      sudo apt -y install nfs-common nfs4-acl-tools
      sudo nano /etc/idmapd.conf  # Domain = ide.lan
      sudo nano /etc/fstab  # nfs-server.ide.lan:/data /mnt/nfs_share nfs defaults 0 0
    • Mount NFS share:
      sudo mkdir /mnt/nfs_share
      sudo mount -a
  3. Verify ACL (Access Control List):

    • On the NFS server:
      sudo setfacl -m g:root:rwx /data/docker
    • On the NFS client:
      getfacl /mnt/nfs_share/docker
  4. Run Docker Container:

    docker run --runtime sysbox-runc --name nfs-poc --rm -it -v /mnt/nfs_share/docker:/var/lib/docker nestybox/ubuntu-noble-systemd-docker:latest
  5. Observe Docker Service Failure in Logs:

    • Inside the container, the following error appears:
      chmod /var/lib/docker: operation not permitted

Error Logs:

Oct 13 19:43:54 f9d74a2ab68a systemd[1]: Failed to start docker.service - Docker Application Container Engine.
Oct 13 19:43:56 f9d74a2ab68a dockerd[1383]: chmod /var/lib/docker: operation not permitted

Additional Behavior:

When restarting the container multiple times, sysbox-mgr shows warnings that the NFS share is already mounted in another container:

systemd[1]: Starting sysbox-mgr.service - sysbox-mgr (part of the Sysbox container runtime)...
time="2024-10-13 19:42:45" level=info msg="Starting ..."
time="2024-10-13 19:42:45" level=info msg="Sysbox data root: /var/lib/sysbox"
time="2024-10-13 19:42:45" level=info msg="Shiftfs module found in kernel: no"
time="2024-10-13 19:42:45" level=info msg="Shiftfs works properly: no"
time="2024-10-13 19:42:45" level=info msg="Shiftfs-on-overlayfs works properly: no"
time="2024-10-13 19:42:45" level=info msg="ID-mapped mounts supported by kernel: yes"
time="2024-10-13 19:42:45" level=info msg="Overlayfs on ID-mapped mounts supported by kernel: yes"
time="2024-10-13 19:42:45" level=info msg="Operating in system container mode."
time="2024-10-13 19:42:45" level=info msg="Inner container image preloading enabled."
time="2024-10-13 19:42:45" level=info msg="Listening on /run/sysbox/sysmgr.sock"
time="2024-10-13 19:42:45" level=info msg="Ready ..."
sysbox-mgr[939]: mount source at /mnt/nfs_share/docker should be mounted in one container only, but is already mounted in containers [f9d74a2ab68a...]

Expected Behavior:

Docker should start successfully inside the container, and NFS shares should be correctly mounted without permission issues or overlapping mounts.

Actual Behavior:

The Docker daemon fails to start due to permission issues on the NFS-mounted directory. Additionally, sysbox-mgr reports that the same NFS mount source is being reused across multiple containers, leading to errors.

ctalledo commented 1 month ago

Hi @bushev, thanks for trying Sysbox, hope you find it useful.

I suspect the problem you are having is that Sysbox uses shiftfs or ID-mapped mounts on host directories mounted into the container, and I don't believe either of these mechanisms works on top of NFS mounts (unfortunately).

For example, when you do

docker run --runtime sysbox-runc --name nfs-poc --rm -it -v /mnt/nfs_share/docker:/var/lib/docker nestybox/ubuntu-noble-systemd-docker:latest

how does ls -l /var/lib/docker look from inside the container?
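
To confirm which filesystem actually backs the bind-mount source on the host, a small check along these lines may help. It is only a sketch: it assumes util-linux `findmnt` is installed on the host, and `is_nfs_fstype` and `check_mount_source` are hypothetical helper names, not part of Sysbox or Docker.

```shell
#!/bin/sh
# Report the filesystem type behind a path and flag NFS-backed sources.
# Assumption: util-linux `findmnt` is available on the host.

is_nfs_fstype() {
    # findmnt reports NFS mounts as "nfs" or "nfs4" (the mount listing above shows nfs4).
    case "$1" in
        nfs|nfs4) return 0 ;;
        *)        return 1 ;;
    esac
}

check_mount_source() {
    # -n: no header, -o FSTYPE: print only the type,
    # -T: look up the mount that contains the given path.
    fstype=$(findmnt -n -o FSTYPE -T "$1") || return 2
    if is_nfs_fstype "$fstype"; then
        echo "$1: $fstype (NFS-backed)"
    else
        echo "$1: $fstype"
    fi
}

# Example, run on the container's host:
#   check_mount_source /mnt/nfs_share/docker
```

If the source turns out to be NFS-backed, the ownership seen by `ls -l` inside the container is the first thing worth comparing against the host view.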

bushev commented 1 month ago

Hey Cesar, thanks for looking into that!

I just tried what you suggested, and strangely enough, the previous error seems to have disappeared. I can now confirm that Docker is starting within the container. This might be related to the fact that I rebooted the servers several times and enabled ACL with different parameters afterward. I can’t fully explain why, but it started working, and it seems to be functional for now.

However, when I attempted to pull an image, for example, for MySQL, I encountered an error at the end stating that it couldn’t create a symbolic link. I believe this might be due to a limitation related to NFS and how it’s mounted inside the Sysbox container, but this is clearly a separate issue. Hopefully, this will be the last problem preventing full NFS compatibility.

ctalledo commented 1 month ago

Hi @bushev, that's progress, thanks.

However, I don't know what could be causing the latest error you see when the image gets pulled by Docker inside the Sysbox container. Does it occur with other images? For example, does docker run -it --rm alpine work?

bushev commented 1 month ago

Hmm, no, this doesn’t work either, but the error is somehow related to a symlink as before.

user@8c397c02138d:~$ docker run -it --rm alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
43c4264eed91: Extracting [==================================================>]  3.624MB/3.624MB
docker: failed to register layer: failed to Lchown "/etc/shadow" for UID 0, GID 42: lchown /etc/shadow: operation not permitted.

ctalledo commented 1 month ago

I suspect the issue you are facing is not so much related to Sysbox as it is to placing /var/lib/docker on an NFS mount. I am pretty sure that if you do the same without Sysbox (e.g., by simply configuring the Docker engine's data-root to an NFS-backed directory), you'll see the same error.
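
A rough way to run that comparison is sketched below. It assumes a test host running plain Docker without Sysbox; `print_data_root_config` is a hypothetical helper and the directory in the example is a placeholder. The `data-root` key is the daemon.json setting for Docker's data directory.

```shell
#!/bin/sh
# Print a minimal daemon.json fragment that points Docker's data-root at a directory.
# Assumption: the key is merged by hand into /etc/docker/daemon.json on a test host;
# the script deliberately does not overwrite any existing configuration.

print_data_root_config() {
    printf '{\n  "data-root": "%s"\n}\n' "$1"
}

# Example (the path is a placeholder for a directory on the NFS share):
#   print_data_root_config /mnt/nfs_share/docker-plain
#   # after merging the key into /etc/docker/daemon.json:
#   sudo systemctl restart docker
#   docker run -it --rm alpine
```

If the same lchown error shows up with this setup, that would point at NFS rather than Sysbox.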

Now, as to why it fails, I don't know. But it's probably due to limitations on NFS. Figuring that out would require a deeper investigation.
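
As a possible starting point for that investigation, the ownership change can be attempted directly on the mounted directory, outside of the image-extraction path. This is a sketch under assumptions: `try_chown` is a hypothetical helper, and the UID/GID in the example are simply the ones from the error message above.

```shell
#!/bin/sh
# Attempt the same kind of ownership change that dockerd performs while extracting a layer.
# try_chown is a hypothetical helper, not part of Docker or Sysbox.

try_chown() {
    dir=$1 uid=$2 gid=$3
    # Create a throwaway file inside the directory under test.
    probe=$(mktemp "$dir/chown-probe.XXXXXX") || return 2
    # -h changes the entry itself rather than a symlink target (lchown semantics).
    if chown -h "$uid:$gid" "$probe" 2>/dev/null; then
        echo "chown to $uid:$gid succeeded in $dir"
        status=0
    else
        echo "chown to $uid:$gid was refused in $dir"
        status=1
    fi
    rm -f "$probe"
    return "$status"
}

# Example, run as root inside the container:
#   try_chown /var/lib/docker 0 42
```

Running the same probe on a directory that is not NFS-backed gives a baseline for comparison.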

If I am incorrect and you believe the problem is specific to running Docker engine on Sysbox, then we can dig further to see why that is. But I don't see any indication of this; the problem appears to be related to NFS more than anything else.