nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Cannot create container with linux 6.2 on ubuntu 22.04 #727

Closed rmillet-rs closed 10 months ago

rmillet-rs commented 1 year ago

Hello, with the 6.2.0 kernel, container fails to start (was working with 5.15.0).

docker run -it --rm --runtime sysbox-runc docker.io/library/almalinux:9
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: container_linux.go:424: starting container process caused: process_linux.go:607: container init caused: rootfs_linux.go:123: adding rootfs state caused: rootfs_linux.go:1438: unable to create fsEntry /usr/src/kernels caused: mkdir /usr/src/kernels: read-only file system: unknown.
ERRO[0001] error waiting for container:

This is related to the image, because docker.io/nestybox/alpine-docker:latest starts.

ondh commented 1 year ago

My error:

Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:424: starting container process caused: process_linux.go:607: container init caused: process_linux.go:578: handleReqOp caused: rootfs_init_linux.go:286: re-mounting procfs caused: operation not permitted: unknown

frulio commented 1 year ago

Hi @rmillet-rs,

I tried a from-scratch installation and it worked for me:

image

I installed everything fresh in a VM (VirtualBox) and followed the documentation, except for shiftfs, for which I used the k6.1 branch:

image

It's not much, but I hope this helps.

ctalledo commented 1 year ago

Hi @rmillet-rs,

Thanks for reporting; I was unable to repro on my Ubuntu-22.04 host with Sysbox 0.6.2 and Docker 24.0.5. However, I am not using the HWE kernel, so I suspect the problem you are facing is due to that (although @frulio 's comment above seems to suggest it works with HWE too).

The error you are getting is a bit strange too:

docker: Error response from daemon ... : mkdir /usr/src/kernels: read-only file system: unknown.

What's happening here is that during container start, Sysbox was trying to create the /usr/src/kernels dir inside the container, and somehow the mkdir operation fails with "read-only filesystem". Since the container's root filesystem is normally read-write, it's a strange error.
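
One way to narrow down a "read-only file system" error like this is to check which mount actually covers the failing path, since a read-only bind mount can sit on top of an otherwise read-write root. The following is a minimal Python sketch, not Sysbox code: the helper names and the sample mount table are made up for illustration, and it assumes the standard `/proc/<pid>/mountinfo` field layout (mount point in field 5, per-mount options in field 6).

```python
def parse_mountinfo(text):
    """Return (mount_point, options) pairs from mountinfo-formatted text.

    Assumes the standard layout: field 5 is the mount point and field 6 the
    per-mount options. Escaped characters in paths (e.g. \\040) are not decoded.
    """
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        mounts.append((fields[4], set(fields[5].split(","))))
    return mounts


def covering_mount(path, mounts):
    """Return the mount with the longest mount point that contains `path`."""
    best = None
    for mount_point, options in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount stacked on the same point win.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, options)
    return best


def is_read_only(path, mounts):
    """True if the mount covering `path` carries the 'ro' option."""
    hit = covering_mount(path, mounts)
    return hit is not None and "ro" in hit[1]


# Hypothetical mount table: a read-write root with a read-only bind mount
# on /usr/src. The IDs and device numbers are invented for the example.
SAMPLE = """\
100 90 0:50 / / rw,relatime - overlay overlay rw
101 100 253:0 /usr/src /usr/src ro,relatime - ext4 /dev/mapper/example rw
"""
```

Inside a container, the same helpers could be fed the contents of `/proc/self/mountinfo` and asked about `/usr/src/kernels`.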

Can you try @frulio 's steps (particularly the shiftfs installation) to see if that works?

Also, you indicated image docker.io/nestybox/alpine-docker:latest works fine. Can you create a container with that image and post the output of findmnt and ls -l /usr/src ?

Thanks!

ctalledo commented 1 year ago

Hi @ondh, thanks for reporting.

Do you get that error for all Sysbox containers, or just for some images?

Can you please try with docker 24.0.6 (see @frulio 's comment above)?

Thanks!

rmillet-rs commented 1 year ago

Hello,

For info, I initially hit the problem on my host and then rolled back the kernel. I reproduced the issue in a VM installed from scratch before opening the ticket, so everything is now done in the VM.

Since the last time, I upgraded the system:

In docker.io/nestybox/alpine-docker:latest:

# findmnt
TARGET                          SOURCE                                                                     FSTYPE   OPTIONS
/                               overlay                                                                    overlay  rw,relatime,lowerdir=/var/lib/docker/overlay2/l/GWSUOWRLOMDAWTWT4DSW5RR44A:/var/lib/docker/overlay2/l/OSSB4QT2S3YAHWV7WYG
├─/sys                          sysfs                                                                      sysfs    rw,nosuid,nodev,noexec,relatime
│ ├─/sys/firmware               tmpfs                                                                      tmpfs    ro,relatime,uid=165536,gid=165536,inode64
│ ├─/sys/fs/cgroup              cgroup                                                                     cgroup2  rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot
│ ├─/sys/devices/virtual        sysboxfs[/sys/devices/virtual]                                             fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
│ ├─/sys/kernel                 sysboxfs[/sys/kernel]                                                      fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
│ └─/sys/module/nf_conntrack/parameters
│                               sysboxfs[/sys/module/nf_conntrack/parameters]                              fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
├─/proc                         proc                                                                       proc     rw,nosuid,nodev,noexec,relatime
│ ├─/proc/bus                   proc[/bus]                                                                 proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/fs                    proc[/fs]                                                                  proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/irq                   proc[/irq]                                                                 proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/sysrq-trigger         proc[/sysrq-trigger]                                                       proc     ro,nosuid,nodev,noexec,relatime
│ ├─/proc/asound                tmpfs                                                                      tmpfs    ro,relatime,uid=165536,gid=165536,inode64
│ ├─/proc/acpi                  tmpfs                                                                      tmpfs    ro,relatime,uid=165536,gid=165536,inode64
│ ├─/proc/keys                  udev[/null]                                                                devtmpfs rw,nosuid,relatime,size=939672k,nr_inodes=234918,mode=755,inode64
│ ├─/proc/timer_list            udev[/null]                                                                devtmpfs rw,nosuid,relatime,size=939672k,nr_inodes=234918,mode=755,inode64
│ ├─/proc/scsi                  tmpfs                                                                      tmpfs    ro,relatime,uid=165536,gid=165536,inode64
│ ├─/proc/swaps                 sysboxfs[/proc/swaps]                                                      fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
│ ├─/proc/sys                   sysboxfs[/proc/sys]                                                        fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
│ └─/proc/uptime                sysboxfs[/proc/uptime]                                                     fuse     rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other
├─/dev                          tmpfs                                                                      tmpfs    rw,nosuid,size=65536k,mode=755,uid=165536,gid=165536,inode64
│ ├─/dev/console                devpts[/0]                                                                 devpts   rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666
│ ├─/dev/mqueue                 mqueue                                                                     mqueue   rw,nosuid,nodev,noexec,relatime
│ ├─/dev/pts                    devpts                                                                     devpts   rw,nosuid,noexec,relatime,gid=165541,mode=620,ptmxmode=666
│ ├─/dev/shm                    shm                                                                        tmpfs    rw,nosuid,nodev,noexec,relatime,size=65536k,uid=165536,gid=165536,inode64
│ ├─/dev/null                   udev[/null]                                                                devtmpfs rw,nosuid,relatime,size=939672k,nr_inodes=234918,mode=755,inode64
│ ├─/dev/random                 udev[/random]                                                              devtmpfs rw,nosuid,relatime,size=939672k,nr_inodes=234918,mode=755,inode64
│ ├─/dev/kmsg                   udev[/null]                                                                devtmpfs rw,nosuid,relatime,size=939672k,nr_inodes=234918,mode=755,inode64
│ ├─/dev/full                   udev[/full]                                                                devtmpfs rw,nosuid,relatime,size=939672k,nr_inodes=234918,mode=755,inode64
│ ├─/dev/tty                    udev[/tty]                                                                 devtmpfs rw,nosuid,relatime,size=939672k,nr_inodes=234918,mode=755,inode64
│ ├─/dev/zero                   udev[/zero]                                                                devtmpfs rw,nosuid,relatime,size=939672k,nr_inodes=234918,mode=755,inode64
│ └─/dev/urandom                udev[/urandom]                                                             devtmpfs rw,nosuid,relatime,size=939672k,nr_inodes=234918,mode=755,inode64
├─/etc/resolv.conf              /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/docker/containers/4cfc9e5cc6369a9da178ded24bea0d7dce5caf963312dfd0e075c7c2d1d29584/resolv.conf]
│                                                                                                          ext4     rw,relatime,idmapped
├─/etc/hostname                 /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/docker/containers/4cfc9e5cc6369a9da178ded24bea0d7dce5caf963312dfd0e075c7c2d1d29584/hostname]
│                                                                                                          ext4     rw,relatime,idmapped
├─/etc/hosts                    /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/docker/containers/4cfc9e5cc6369a9da178ded24bea0d7dce5caf963312dfd0e075c7c2d1d29584/hosts]
│                                                                                                          ext4     rw,relatime,idmapped
├─/var/lib/docker               /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/docker/4cfc9e5cc6369a9da178ded24bea0d7dce5caf963312dfd0e075c7c2d1d29584]
│                                                                                                          ext4     rw,relatime,idmapped
├─/var/lib/rancher/k3s          /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/rancher-k3s/4cfc9e5cc6369a9da178ded24bea0d7dce5caf963312dfd0e075c7c2d1d29584]
│                                                                                                          ext4     rw,relatime,idmapped
├─/var/lib/rancher/rke2         /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/rancher-rke2/4cfc9e5cc6369a9da178ded24bea0d7dce5caf963312dfd0e075c7c2d1d29584]
│                                                                                                          ext4     rw,relatime,idmapped
├─/var/lib/kubelet              /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/kubelet/4cfc9e5cc6369a9da178ded24bea0d7dce5caf963312dfd0e075c7c2d1d29584]
│                                                                                                          ext4     rw,relatime,idmapped
├─/var/lib/k0s                  /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/k0s/4cfc9e5cc6369a9da178ded24bea0d7dce5caf963312dfd0e075c7c2d1d29584]
│                                                                                                          ext4     rw,relatime,idmapped
├─/var/lib/buildkit             /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/buildkit/4cfc9e5cc6369a9da178ded24bea0d7dce5caf963312dfd0e075c7c2d1d29584]
│                                                                                                          ext4     rw,relatime,idmapped
├─/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs
│                               /dev/mapper/ubuntu--vg-ubuntu--lv[/var/lib/sysbox/containerd/4cfc9e5cc6369a9da178ded24bea0d7dce5caf963312dfd0e075c7c2d1d29584]
│                                                                                                          ext4     rw,relatime,idmapped
├─/usr/src                      /dev/mapper/ubuntu--vg-ubuntu--lv[/usr/src]                                ext4     ro,relatime,idmapped
│ └─/usr/src/linux-headers-6.2.0-36-generic
│                               /dev/mapper/ubuntu--vg-ubuntu--lv[/usr/src/linux-headers-6.2.0-36-generic] ext4     ro,relatime,idmapped
└─/lib/modules/6.2.0-36-generic /dev/mapper/ubuntu--vg-ubuntu--lv[/usr/lib/modules/6.2.0-36-generic]       ext4     ro,relatime,idmapped

# ls -l /usr/src
total 32
drwxr-xr-x   25 root     root          4096 Aug 17 15:04 linux-headers-5.15.0-79
drwxr-xr-x    7 root     root          4096 Aug 17 15:04 linux-headers-5.15.0-79-generic
drwxr-xr-x   25 root     root          4096 Nov  2 09:36 linux-headers-5.15.0-88
drwxr-xr-x    7 root     root          4096 Nov  2 09:38 linux-headers-5.15.0-88-generic
drwxr-xr-x    7 root     root          4096 Aug 17 15:21 linux-headers-6.2.0-26-generic
drwxr-xr-x    7 root     root          4096 Nov  2 09:42 linux-headers-6.2.0-36-generic
drwxr-xr-x   26 root     root          4096 Aug 17 15:18 linux-hwe-6.2-headers-6.2.0-26
drwxr-xr-x   26 root     root          4096 Nov  2 09:39 linux-hwe-6.2-headers-6.2.0-36

I then installed shiftfs (+reboot), but I still get the error with the alma 8 image:

# modinfo shiftfs
filename:       /lib/modules/6.2.0-36-generic/updates/dkms/shiftfs.ko
license:        GPL v2
description:    id shifting filesystem
author:         Christian Brauner <christian.brauner@ubuntu.com>
author:         Seth Forshee <seth.forshee@canonical.com>
author:         James Bottomley
alias:          fs-shiftfs
srcversion:     7D0BB9C7E9F4D9CF3829D8D
depends:        
retpoline:      Y
name:           shiftfs
vermagic:       6.2.0-36-generic SMP preempt mod_unload modversions 

I tried a few images, but only almalinux produces the bug.

If I am the only one impacted, do not waste your time on it. (I don't need to upgrade my kernel; I only did so because I had some issues with my hardware, but since it changed nothing I don't need to stay on this kernel.)

Thanks

ctalledo commented 1 year ago

Hi @rmillet-rs, I was able to reproduce the problem on a freshly provisioned Ubuntu-Jammy host.

It turns out to be a bug in the way Sysbox finds the kernel headers on the host machine so that it can bind-mount them (read-only) into containers (since apps running inside Sysbox containers often need them).

The bug manifests itself when the host's kernel headers are laid out like this:

ls -l /usr/src/linux-headers-6.2.0-35-generic/
total 1984
drwxr-xr-x 3 root root    4096 Oct 20 17:59 arch
lrwxrwxrwx 1 root root      39 Oct  6 02:29 block -> ../linux-hwe-6.2-headers-6.2.0-35/block
lrwxrwxrwx 1 root root      39 Oct  6 02:29 certs -> ../linux-hwe-6.2-headers-6.2.0-35/certs
lrwxrwxrwx 1 root root      40 Oct  6 02:29 crypto -> ../linux-hwe-6.2-headers-6.2.0-35/crypto
lrwxrwxrwx 1 root root      47 Oct  6 02:29 Documentation -> ../linux-hwe-6.2-headers-6.2.0-35/Documentation
lrwxrwxrwx 1 root root      41 Oct  6 02:29 drivers -> ../linux-hwe-6.2-headers-6.2.0-35/drivers
lrwxrwxrwx 1 root root      36 Oct  6 02:29 fs -> ../linux-hwe-6.2-headers-6.2.0-35/fs
drwxr-xr-x 4 root root    4096 Oct 20 17:59 include
lrwxrwxrwx 1 root root      38 Oct  6 02:29 init -> ../linux-hwe-6.2-headers-6.2.0-35/init
lrwxrwxrwx 1 root root      42 Oct  6 02:29 io_uring -> ../linux-hwe-6.2-headers-6.2.0-35/io_uring
lrwxrwxrwx 1 root root      37 Oct  6 02:29 ipc -> ../linux-hwe-6.2-headers-6.2.0-35/ipc
lrwxrwxrwx 1 root root      40 Oct  6 02:29 Kbuild -> ../linux-hwe-6.2-headers-6.2.0-35/Kbuild
lrwxrwxrwx 1 root root      41 Oct  6 02:29 Kconfig -> ../linux-hwe-6.2-headers-6.2.0-35/Kconfig
drwxr-xr-x 2 root root    4096 Oct 20 17:59 kernel
lrwxrwxrwx 1 root root      37 Oct  6 02:29 lib -> ../linux-hwe-6.2-headers-6.2.0-35/lib
-rw-r--r-- 1 root root   72092 Oct  6 02:29 Makefile
lrwxrwxrwx 1 root root      36 Oct  6 02:29 mm -> ../linux-hwe-6.2-headers-6.2.0-35/mm
-rw-r--r-- 1 root root 1925700 Oct  6 02:29 Module.symvers
lrwxrwxrwx 1 root root      37 Oct  6 02:29 net -> ../linux-hwe-6.2-headers-6.2.0-35/net
lrwxrwxrwx 1 root root      47 Oct  6 02:29 rust -> ../linux-hwe-6.2-lib-rust-6.2.0-35-generic/rust
lrwxrwxrwx 1 root root      41 Oct  6 02:29 samples -> ../linux-hwe-6.2-headers-6.2.0-35/samples
drwxr-xr-x 7 root root   12288 Oct 20 17:59 scripts
lrwxrwxrwx 1 root root      42 Oct  6 02:29 security -> ../linux-hwe-6.2-headers-6.2.0-35/security
lrwxrwxrwx 1 root root      39 Oct  6 02:29 sound -> ../linux-hwe-6.2-headers-6.2.0-35/sound
drwxr-xr-x 4 root root    4096 Oct 20 17:59 tools
lrwxrwxrwx 1 root root      40 Oct  6 02:29 ubuntu -> ../linux-hwe-6.2-headers-6.2.0-35/ubuntu
lrwxrwxrwx 1 root root      37 Oct  6 02:29 usr -> ../linux-hwe-6.2-headers-6.2.0-35/usr
lrwxrwxrwx 1 root root      38 Oct  6 02:29 virt -> ../linux-hwe-6.2-headers-6.2.0-35/virt

Notice how the rust headers are located in a different path than the rest. That, together with running a container with a Fedora image (or similar), triggers the bug in Sysbox.
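
To see whether a given host has this layout, the symlinks in a headers directory can be grouped by the directory they point into. The following is a minimal Python sketch, not Sysbox's actual header-discovery logic; the function name `symlink_target_dirs` is made up for illustration.

```python
import os


def symlink_target_dirs(headers_dir):
    """Return the sorted list of directories that symlinks in `headers_dir`
    point into. Relative links are resolved against `headers_dir`; the
    targets do not need to exist."""
    targets = set()
    for entry in os.scandir(headers_dir):
        if not entry.is_symlink():
            continue  # regular files and real directories are ignored
        link = os.readlink(entry.path)
        resolved = os.path.normpath(os.path.join(headers_dir, link))
        targets.add(os.path.dirname(resolved))
    return sorted(targets)
```

On a host laid out like the listing above, calling it on `/usr/src/linux-headers-6.2.0-35-generic` would be expected to return two directories, one for the `linux-hwe-6.2-headers-...` tree and one for the `linux-hwe-6.2-lib-rust-...` tree, rather than a single one.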

I'll fix it soon; thanks again for reporting.

michiboo commented 1 year ago

@ctalledo I created a PR at https://github.com/nestybox/sysbox-runc/pull/91 to try to address this issue.

ctalledo commented 10 months ago

Fixed in upcoming v0.6.3 release. Closing.