nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

umount2 fails with ENOENT (No such file or directory) #854

Open gabrielrussoc opened 3 weeks ago

gabrielrussoc commented 3 weeks ago

repro

Step 1: mount and work with FUSE inside sysbox containers:

docker run --cap-add SYS_ADMIN --security-opt=apparmor=unconfined --device /dev/fuse --runtime=sysbox-runc -it gabrielrc/sysbox-umount2

Inside the container, I have a sample Python script that creates a simple passthrough FUSE filesystem:

mkdir passthrough
./sample_fuse.py /etc ./passthrough
ls -l ./passthrough/
...
drwxr-xr-x 2 root root    4096 Oct 19 02:52 ImageMagick-6
drwxr-xr-x 1 root root    4096 Oct 19 02:52 X11
-rw-r--r-- 1 root root    3040 May 25  2023 adduser.conf

Step 2: Try to unmount the filesystem and see it fail with "No such file or directory":

fusermount -u ./passthrough
fusermount: failed to unmount /tmp/passthrough: No such file or directory

sanity check

The exact same steps DO NOT reproduce the failure when running with --runtime=runc, like this:

docker run --cap-add SYS_ADMIN --security-opt=apparmor=unconfined --device /dev/fuse --runtime=runc -it gabrielrc/sysbox-umount2

With the default runc, umount2 succeeds and the filesystem unmounts just fine.

some initial investigation

If we run fusermount with strace, we can see the syscall failing is umount2:

strace fusermount -u ./passthrough
...
umount2("passthrough", 0)               = -1 ENOENT (No such file or directory)
...
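
To take fusermount out of the picture, the same syscall can be issued directly. The following is a minimal sketch, not part of the original repro: it calls umount2(2) through ctypes and reports the errno in strace's style. The helper names `try_umount2` and `describe` are made up for this example.

```python
import ctypes
import errno
import os

# Load the C library already linked into the interpreter. use_errno=True
# lets us read errno after a failed call.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.umount2.argtypes = [ctypes.c_char_p, ctypes.c_int]
_libc.umount2.restype = ctypes.c_int


def try_umount2(target, flags=0):
    """Call umount2(2) directly; return 0 on success or the errno on failure."""
    if _libc.umount2(os.fsencode(target), flags) == 0:
        return 0
    return ctypes.get_errno()


def describe(err):
    """Render an errno roughly as strace does, e.g. 'ENOENT (No such file or directory)'."""
    if err == 0:
        return "ok"
    return f"{errno.errorcode.get(err, err)} ({os.strerror(err)})"
```

Run from /tmp inside the container, `describe(try_umount2("passthrough"))` mirrors the relative-path call seen in the strace output, while `describe(try_umount2("/tmp/passthrough"))` tries the absolute path. Whether the two behave differently under sysbox-runc is not established in this thread; comparing them is only one way to narrow things down. Note that a successful call really does unmount the filesystem.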

It seems that umount2 is always intercepted by sysbox, per this comment: https://github.com/nestybox/sysbox-runc/blob/1b440ff266841f3d2d296e664122a9e29ceb9fd8/libsysbox/syscont/syscalls.go#L371-L384

And indeed, sysbox-fs performs some file access checks on the intercepted call: https://github.com/nestybox/sysbox-fs/blob/da7644ded8f9430b9cc5c4afc50d9edd115da0f7/seccomp/tracer.go#L713-L718

It's unclear to me what to do from here -- I'm really not familiar with either of these code paths.

version

Distributor ID: Ubuntu
Description:    Ubuntu 24.04.1 LTS
Release:    24.04
Codename:   noble
Linux redacted 6.8.0-1016-aws #17-Ubuntu SMP Mon Sep  2 13:48:07 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
sysbox-runc
        edition:         Community Edition (CE)
        version:         0.6.4
        commit:         085502643ea5281652c6984eed9797872f22698a
        built at:         Sat Apr  6 16:43:31 UTC 2024
        built by:         Cesar Talledo
        oci-specs:         1.1.0+dev
sysbox-mgr
        edition:         Community Edition (CE)
        version:         0.6.4
        commit:         03f5d7bc584fdcb2319b2c1831bd58581185fc1c
        built at:         Sat Apr  6 16:43:43 UTC 2024
        built by:         Cesar Talledo
sysbox-fs
        edition:         Community Edition (CE)
        version:         0.6.4
        commit:         1a678b72ac430009739fa6596b824f29b1f7fe2e
        built at:         Sat Apr  6 16:43:40 UTC 2024
        built by:         Cesar Talledo

ctalledo commented 2 weeks ago

Hi @gabrielrussoc, thanks for trying Sysbox and reporting the issue.

Can you provide the output of findmnt inside the Sysbox container, right before you hit the error in step 2?

gabrielrussoc commented 2 weeks ago

@ctalledo here you go

TARGET                            SOURCE                                           FSTYPE   OPTIONS
/                                 overlay                                          overlay  rw,relatime,lowerdir=/var/lib/docker/overlay2/l/UNB
├─/tmp/passthrough                SimpleFS                                         fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0
├─/sys                            sysfs                                            sysfs    rw,nosuid,nodev,noexec,relatime
│ ├─/sys/firmware                 tmpfs                                            tmpfs    ro,relatime,uid=165536,gid=165536,inode64
│ ├─/sys/fs/cgroup                cgroup                                           cgroup2  rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_r
│ ├─/sys/devices/virtual          sysboxfs[/sys/devices/virtual]                   fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,defau
│ ├─/sys/kernel                   sysboxfs[/sys/kernel]                            fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,defau
│ └─/sys/module/nf_conntrack/parameters
│                                 sysboxfs[/sys/module/nf_conntrack/parameters]    fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,defau
├─/proc                           proc                                             proc     rw,nosuid,nodev,noexec,relatime
│ ├─/proc/bus                     proc[/bus]                                       proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/fs                      proc[/fs]                                        proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/irq                     proc[/irq]                                       proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/sysrq-trigger           proc[/sysrq-trigger]                             proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/acpi                    tmpfs                                            tmpfs    ro,relatime,uid=165536,gid=165536,inode64
│ ├─/proc/keys                    devtmpfs[/null]                                  devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
│ ├─/proc/latency_stats           devtmpfs[/null]                                  devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
│ ├─/proc/timer_list              devtmpfs[/null]                                  devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
│ ├─/proc/scsi                    tmpfs                                            tmpfs    ro,relatime,uid=165536,gid=165536,inode64
│ ├─/proc/swaps                   sysboxfs[/proc/swaps]                            fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,defau
│ ├─/proc/sys                     sysboxfs[/proc/sys]                              fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,defau
│ └─/proc/uptime                  sysboxfs[/proc/uptime]                           fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,defau
├─/dev                            tmpfs                                            tmpfs    rw,nosuid,size=65536k,mode=755,uid=165536,gid=16553
│ ├─/dev/console                  devpts[/0]                                       devpts   rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxm
│ ├─/dev/mqueue                   mqueue                                           mqueue   rw,nosuid,nodev,noexec,relatime
│ ├─/dev/pts                      devpts                                           devpts   rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxm
│ ├─/dev/shm                      shm                                              tmpfs    rw,nosuid,nodev,noexec,relatime,size=65536k,uid=165
│ ├─/dev/null                     devtmpfs[/null]                                  devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
│ ├─/dev/random                   devtmpfs[/random]                                devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
│ ├─/dev/kmsg                     devtmpfs[/null]                                  devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
│ ├─/dev/full                     devtmpfs[/full]                                  devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
│ ├─/dev/tty                      devtmpfs[/tty]                                   devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
│ ├─/dev/zero                     devtmpfs[/zero]                                  devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
│ ├─/dev/urandom                  devtmpfs[/urandom]                               devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
│ └─/dev/fuse                     devtmpfs[/fuse]                                  devtmpfs rw,nosuid,noexec,relatime,size=1995376k,nr_inodes=4
├─/etc/resolv.conf                /dev/root[/var/lib/docker/containers/e90873360b5cc9123f43021a4d73e56639ea6b956046c5abb0d82a5d281e7bc2/resolv.conf]
│                                                                                  ext4     rw,relatime,idmapped,discard,errors=remount-ro,comm
├─/etc/hostname                   /dev/root[/var/lib/docker/containers/e90873360b5cc9123f43021a4d73e56639ea6b956046c5abb0d82a5d281e7bc2/hostname]
│                                                                                  ext4     rw,relatime,idmapped,discard,errors=remount-ro,comm
├─/etc/hosts                      /dev/root[/var/lib/docker/containers/e90873360b5cc9123f43021a4d73e56639ea6b956046c5abb0d82a5d281e7bc2/hosts]
│                                                                                  ext4     rw,relatime,idmapped,discard,errors=remount-ro,comm
├─/var/lib/k0s                    /dev/root[/var/lib/sysbox/k0s/e90873360b5cc9123f43021a4d73e56639ea6b956046c5abb0d82a5d281e7bc2]
│                                                                                  ext4     rw,relatime,idmapped,discard,errors=remount-ro,comm
├─/var/lib/buildkit               /dev/root[/var/lib/sysbox/buildkit/e90873360b5cc9123f43021a4d73e56639ea6b956046c5abb0d82a5d281e7bc2]
│                                                                                  ext4     rw,relatime,idmapped,discard,errors=remount-ro,comm
├─/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
│                                 /dev/root[/var/lib/sysbox/containerd/e90873360b5cc9123f43021a4d73e56639ea6b956046c5abb0d82a5d281e7bc2]
│                                                                                  ext4     rw,relatime,idmapped,discard,errors=remount-ro,comm
├─/var/lib/docker                 /dev/root[/var/lib/sysbox/docker/e90873360b5cc9123f43021a4d73e56639ea6b956046c5abb0d82a5d281e7bc2]
│                                                                                  ext4     rw,relatime,idmapped,discard,errors=remount-ro,comm
├─/var/lib/rancher/k3s            /dev/root[/var/lib/sysbox/rancher-k3s/e90873360b5cc9123f43021a4d73e56639ea6b956046c5abb0d82a5d281e7bc2]
│                                                                                  ext4     rw,relatime,idmapped,discard,errors=remount-ro,comm
├─/var/lib/rancher/rke2           /dev/root[/var/lib/sysbox/rancher-rke2/e90873360b5cc9123f43021a4d73e56639ea6b956046c5abb0d82a5d281e7bc2]
│                                                                                  ext4     rw,relatime,idmapped,discard,errors=remount-ro,comm
├─/var/lib/kubelet                /dev/root[/var/lib/sysbox/kubelet/e90873360b5cc9123f43021a4d73e56639ea6b956046c5abb0d82a5d281e7bc2]
│                                                                                  ext4     rw,relatime,idmapped,discard,errors=remount-ro,comm
├─/usr/src/linux-headers-6.8.0-1016-aws
│                                 /dev/root[/usr/src/linux-headers-6.8.0-1016-aws] ext4     ro,relatime,idmapped,discard,errors=remount-ro,comm
├─/usr/src/linux-aws-headers-6.8.0-1016
│                                 /dev/root[/usr/src/linux-aws-headers-6.8.0-1016] ext4     ro,relatime,idmapped,discard,errors=remount-ro,comm
└─/usr/lib/modules/6.8.0-1016-aws /dev/root[/usr/lib/modules/6.8.0-1016-aws]       ext4     ro,relatime,idmapped,discard,errors=remount-ro,comm

gabrielrussoc commented 2 weeks ago

also just for clarity, I published the image from the repro on docker hub: https://hub.docker.com/r/gabrielrc/sysbox-umount2

So you can peek at what the script does and hopefully even reproduce it yourself, if that helps (I appreciate the security concern of running an admin container, but I'm using some cheap AWS VMs to repro this).

Here's the Dockerfile:

FROM python:3.12

RUN pip install fusepy
RUN apt update
RUN apt install -y libfuse-dev fuse strace
COPY ./sample_fuse.py /tmp/sample_fuse.py
RUN chmod +x /tmp/sample_fuse.py

WORKDIR /tmp
ENTRYPOINT /bin/bash

And here's the Python script sample_fuse.py:

#!/usr/bin/env python3

import os
import sys
import errno
from fuse import FUSE, FuseOSError, Operations

class SimpleFS(Operations):
    def __init__(self, root):
        self.root = root

    def _full_path(self, partial):
        if partial.startswith("/"):
            partial = partial[1:]
        return os.path.join(self.root, partial)

    def getattr(self, path, fh=None):
        full_path = self._full_path(path)
        st = os.lstat(full_path)
        return dict((key, getattr(st, key)) for key in ('st_atime', 'st_ctime',
                    'st_gid', 'st_mode', 'st_mtime', 'st_nlink', 'st_size', 'st_uid'))

    def readdir(self, path, fh):
        full_path = self._full_path(path)
        dirents = ['.', '..']
        if os.path.isdir(full_path):
            dirents.extend(os.listdir(full_path))
        for r in dirents:
            yield r

    def read(self, path, size, offset, fh):
        with open(self._full_path(path), 'rb') as f:
            f.seek(offset)
            return f.read(size)

    def write(self, path, buf, offset, fh):
        with open(self._full_path(path), 'r+b') as f:
            f.seek(offset)
            f.write(buf)
        return len(buf)

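def main(mountpoint, root):
    # Mount in the background (foreground=False) with a single-threaded loop.
    FUSE(SimpleFS(root), mountpoint, nothreads=True, foreground=False)

if __name__ == '__main__':
    # Usage: ./sample_fuse.py <root> <mountpoint>
    main(sys.argv[2], sys.argv[1])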