nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Problem running Sysbox on Fedora 35 with SELinux enabled #510

Open fhaefemeier opened 2 years ago

fhaefemeier commented 2 years ago

I found Sysbox and was happy, because it supports my use case: setting up a CI environment without the limitations and security implications of DinD or similar setups. Thanks for this project and your commitment.

I started reading the documentation and tried to set up Sysbox on my host (Fedora-based). I found a few minor issues during the source build (separate issues will follow). Now I have started to install and configure it.

But I got stuck at the point of configuring Docker and the system, because of the change to enable Docker's userns-remap. It has a major impact on my running host system and its installed containers. I will try to install shiftfs from source, but I would like to ask when "... soon ..." will be reached.

In the documentation I found the sentence "In the near future (kernels 5.12+), shiftfs is expected to be replaced ... Sysbox will soon have support for this." Are there any plans (a time schedule) to enable this feature? It would (hopefully) help a lot to reduce the changes needed on the host system.

ctalledo commented 2 years ago

Hi @fhaefemeier, thanks for finding Sysbox and giving it a shot.

Your timing is really good, we are just a few days away from the Sysbox v0.5.0 release which includes ID-mapped mounts support. However, the release will only have the .deb packages (for Ubuntu / Debian) and we won't have an RPM package for it for a few more weeks (we are actively working on this too).

The Sysbox upstream code already includes ID-mapped mounts support, so while we work on the RPM package, you can try Sysbox on Fedora (with kernel >= 5.12) by building it from source (it's pretty easy), but let us know if you need assistance.

fhaefemeier commented 2 years ago

Hi @ctalledo, that is good news.

> The Sysbox upstream code already includes ID-mapped mounts support, so while we work on the RPM package, you can try Sysbox on Fedora (with kernel >= 5.12) by building it from source (it's pretty easy), but let us know if you need assistance.

You mean, it is already functional in the latest code available? I built Sysbox on Fedora yesterday following your guide, so I now have a version that provides the new feature. Do I have to do something special or will it be enough to enable sysbox-runc without 'userns-remap' in docker daemon?

ctalledo commented 2 years ago

Hi @fhaefemeier,

> You mean, it is already functional in the latest code available?

Yes that's right.

> Do I have to do something special or will it be enough to enable sysbox-runc without 'userns-remap' in docker daemon?

Simply enable sysbox-runc without userns-remap.

Assuming you have a kernel >= 5.12, when you launch a container with Docker + Sysbox, it should work and you will see some ID-mapped mounts. E.g.:

$ docker run --runtime=sysbox-runc -it --rm ubuntu

root@62adea865594:/# findmnt | grep idmap 
|-/etc/resolv.conf                                           /dev/nvme1n1p1[/tmp/sysbox-test-var-lib/docker/containers/62adea8655942f1f382f441ba38c07f2d67ae29187fca9a3e99e53b1a3e6fb75/resolv.conf] ext4    rw,relatime,idmapped,errors=remount-ro
|-/etc/hostname                                              /dev/nvme1n1p1[/tmp/sysbox-test-var-lib/docker/containers/62adea8655942f1f382f441ba38c07f2d67ae29187fca9a3e99e53b1a3e6fb75/hostname]    ext4    rw,relatime,idmapped,errors=remount-ro
|-/etc/hosts                                                 /dev/nvme1n1p1[/tmp/sysbox-test-var-lib/docker/containers/62adea8655942f1f382f441ba38c07f2d67ae29187fca9a3e99e53b1a3e6fb75/hosts]       ext4    rw,relatime,idmapped,errors=remount-ro
|-/usr/src/linux-headers-5.13.0-1017-aws                     /dev/root[/usr/src/linux-headers-5.13.0-1017-aws]                                                                                       ext4    ro,relatime,idmapped,discard,errors=remount-ro
|-/usr/src/linux-aws-headers-5.13.0-1017                     /dev/root[/usr/src/linux-aws-headers-5.13.0-1017]                                                                                       ext4    ro,relatime,idmapped,discard,errors=remount-ro
`-/usr/lib/modules/5.13.0-1017-aws                           /dev/root[/usr/lib/modules/5.13.0-1017-aws]                                                                                             ext4    ro,relatime,idmapped,discard,errors=remount-ro

Let me know if you hit any issues please.
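
The kernel >= 5.12 requirement mentioned above can be checked before launching a container. The following is only a minimal sketch, assuming `uname -r` prints a version that starts with `major.minor`; `kernel_at_least` is a hypothetical helper name and not part of Sysbox.

```shell
# Succeeds if the kernel version string in $1 (e.g. "5.14.10-300.fc35.x86_64")
# is at least major $2, minor $3. Hypothetical helper, not part of Sysbox.
kernel_at_least() (
    ver=$1 want_major=$2 want_minor=$3
    major=${ver%%.*}          # text before the first dot
    rest=${ver#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}    # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
)

if kernel_at_least "$(uname -r)" 5 12; then
    echo "kernel $(uname -r): new enough for ID-mapped mounts"
else
    echo "kernel $(uname -r): older than 5.12"
fi
```

The check only compares the major and minor numbers, which is all the 5.12 threshold needs; it says nothing about whether a given filesystem supports ID-mapped mounts.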

fhaefemeier commented 2 years ago

> Let me know if you hit any issues please.

I installed Sysbox on Fedora 35 with idmapped_mount (see also #513) and started a system container with docker run --runtime=sysbox-runc --rm -it --hostname my_cont debian:latest. It started successfully. But I see strange behaviour, and I am not sure whether it is related to ID-mapped mounts or something else (I can open another ticket if you want).

Starting a system container only succeeds after several failed attempts. Each failure ends with the message:

docker: Error response from daemon: OCI runtime create failed: error in the container spec: failed to request rootfs cloning from sysbox-mgr: failed to invoke ReqCloneRootfs via grpc: rpc error: code = Unknown desc = failed to mount clone for container 489a56472eb8: failed to set up bottom ovfs mount: failed to mount overlayfs on /srv/container/sysbox/rootfs/489a56472eb8f43a157939a69443ee115f87be5d8324fc07358142dd4afbaa7f/bottom/merged: invalid argument: unknown.

I installed sysbox-mgr and sysbox-fs with default parameters. Only data-root is changed to /srv/container/sysbox instead of /var/lib/sysbox. After starting both daemons, the log shows:

Mär 20 20:01:05 homecloud systemd[1]: Starting sysbox-fs (part of the Sysbox container runtime)...
Mär 20 20:01:05 homecloud sysbox-fs[1391169]: {"level":"info","msg":"Initiating sysbox-fs ...","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud sysbox-fs[1391169]: {"level":"info","msg":"Initializing with 'allow-immutable-remounts' knob disabled (default)","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud sysbox-fs[1391169]: {"level":"info","msg":"Initializing with 'allow-immutable-unmounts' knob enabled (default)","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud sysbox-fs[1391169]: {"level":"info","msg":"FUSE dir = /srv/container/sysboxfs","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud sysbox-fs[1391169]: {"level":"info","msg":"IOvec memParser elected","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud sysbox-fs[1391169]: {"level":"info","msg":"Listening on /run/sysbox/sysfs.sock","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud sysbox-fs[1391169]: {"level":"info","msg":"Ready ...","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud systemd[1]: Started sysbox-fs (part of the Sysbox container runtime).
Mär 20 20:01:05 homecloud systemd[1]: Starting sysbox-mgr (part of the Sysbox container runtime)...
Mär 20 20:01:05 homecloud sysbox-mgr[1391149]: {"level":"info","msg":"Starting ...","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud sysbox-mgr[1391149]: {"level":"info","msg":"Sysbox data root: /srv/container/sysbox","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud sysbox-mgr[1391149]: {"level":"warning","msg":"failed to cleanup /srv/container/sysbox: unlinkat /srv/container/sysbox: device or resource busy","time":"2022-03>
Mär 20 20:01:05 homecloud sysbox-mgr[1391149]: {"level":"info","msg":"Listening on /run/sysbox/sysmgr.sock","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud sysbox-mgr[1391149]: {"level":"info","msg":"Ready ...","time":"2022-03-20 20:01:05"}
Mär 20 20:01:05 homecloud systemd[1]: Started sysbox-mgr (part of the Sysbox container runtime).

Let me know how I can help.

fhaefemeier commented 2 years ago

Maybe related: I found this in the kernel log:

Mär 20 21:18:53 homecloud kernel: overlayfs: unrecognized mount option "c693"" or missing value

ctalledo commented 2 years ago

Thanks @fhaefemeier.

> Only data-root is changed to /srv/container/sysbox instead of /var/lib/sysbox

Could you please try with the default data-root (i.e., /var/lib/sysbox)? It should work either way, but I wonder if we have a bug that we've not caught.

ctalledo commented 2 years ago

Modified the title to make it a bit more specific to this issue.

fhaefemeier commented 2 years ago

I changed the config to use the default data-root (sysbox-mgr) and mountpoint (sysbox-fs), and I see the same behaviour. Before retesting I updated to the latest git version (master branch). I forgot to mention in my original comment that I use systemd unit files to start sysbox-mgr and sysbox-fs. I took the unit files from the Debian package as an example.

Log entry sysbox-mgr

Mär 25 23:58:51 homecloud sysbox-mgr[676730]: {"level":"info","msg":"registered new container 9ada1c4eff78","time":"2022-03-25 23:58:51"}
Mär 25 23:58:51 homecloud sysbox-mgr[676730]: {"level":"info","msg":"unregistered container 9ada1c4eff78","time":"2022-03-25 23:58:51"}
Mär 25 23:58:51 homecloud sysbox-mgr[676730]: {"level":"warning","msg":"failed to unbind cloned rootfs for container 9ada1c4eff78: failed to unmount clone for container 9ada1c4eff78: failed to remove top mount: invalid argument","time":"2022-03-25 23:58:51"}
Mär 25 23:58:51 homecloud sysbox-mgr[676730]: {"level":"info","msg":"released resources for container 9ada1c4eff78","time":"2022-03-25 23:58:51"}

Log entry sysbox-fs

Mär 25 23:58:51 homecloud sysbox-fs[676750]: {"level":"info","msg":"Container pre-registration completed: id = 9ada1c4eff78","time":"2022-03-25 23:58:51"}
Mär 25 23:58:51 homecloud sysbox-fs[676750]: {"level":"info","msg":"Container unregistration completed: id = 9ada1c4eff78","time":"2022-03-25 23:58:51"}
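
Since the Sysbox daemons log one JSON object per line, the warnings can be separated from the info-level noise with a small filter. This is a rough sketch only: `sysbox_warnings` is a hypothetical helper, and it assumes every entry has the `"level"`, `"msg"`, `"time"` field order seen in the logs above.

```shell
# Read log lines on stdin and print the "msg" text of every entry whose
# JSON payload carries "level":"warning". Hypothetical helper.
sysbox_warnings() {
    grep '"level":"warning"' | sed 's/.*"msg":"\(.*\)","time".*/\1/'
}

# Two payloads copied from the sysbox-mgr log above.
sysbox_warnings <<'EOF'
{"level":"info","msg":"registered new container 9ada1c4eff78","time":"2022-03-25 23:58:51"}
{"level":"warning","msg":"failed to unbind cloned rootfs for container 9ada1c4eff78: failed to unmount clone for container 9ada1c4eff78: failed to remove top mount: invalid argument","time":"2022-03-25 23:58:51"}
EOF
```

On a host where the daemons run under systemd, the same filter could be fed from the journal, for example `journalctl -u sysbox-mgr | sysbox_warnings`, assuming the unit is named `sysbox-mgr` as the log lines suggest.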

I will (for different reasons) reboot my server tomorrow and will have a clean setup. I will report my experience.

fhaefemeier commented 2 years ago

No news after server reboot. Still same behaviour. How can I support you?

ctalledo commented 2 years ago

Hi @fhaefemeier, let me take a look in the next couple of days so we can get to the bottom of this.

ctalledo commented 2 years ago

Hi @fhaefemeier, finally got a chance to take a closer look.

I created a Fedora 35 VM on Google Compute Engine (GCE), then cloned the Sysbox GitHub repo and did a make test-shell to get a shell inside the Sysbox test container. After this I was able to create containers without problem.

I did hit a few setup issues (which you probably ran into as well):

1) I had to create a Sysbox test container Dockerfile for Fedora 35 (see this commit).

2) The Sysbox Makefile had a dependency on lsb_release, which required me to install the redhat-lsb-core package on the host machine. I committed a change to remove this requirement going forward.

3) The Fedora-35 VM instance that I used as my host came with / mounted on a disk formatted with btrfs (rather than ext4). This is an issue because we don't officially support btrfs yet (we've not done much testing on it), and ID-mapped mounts don't yet work on btrfs (a Linux kernel limitation). To overcome this, I added an ext4 disk to the VM and pointed both the Docker data-root and the Sysbox data-root to that disk.
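
To see which filesystem backs the Docker and Sysbox data-roots (the concern in item 3), something like the following sketch can help. It assumes `findmnt` from util-linux is installed and uses the default paths /var/lib/docker and /var/lib/sysbox; `fs_type_of` and `idmap_note` are hypothetical helper names, and the notes only restate what is reported in this thread.

```shell
# Print the filesystem type that backs a path (util-linux findmnt).
fs_type_of() {
    findmnt -n -o FSTYPE --target "$1"
}

# Map a filesystem type to what this thread reports about it.
idmap_note() {
    case "$1" in
        ext4)  echo "ext4: the setup that worked in this thread" ;;
        btrfs) echo "btrfs: reported above as unsupported for ID-mapped mounts" ;;
        *)     echo "$1: not covered by the report above" ;;
    esac
}

# Default data-roots; adjust if they were moved to another disk.
for dir in /var/lib/docker /var/lib/sysbox; do
    [ -d "$dir" ] || continue
    echo "$dir -> $(idmap_note "$(fs_type_of "$dir")")"
done
```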

After this, I was able to do a make test-shell which gets me a shell inside the test container, and from then was able to deploy containers with Docker + Sysbox without problem. This gives me confidence Sysbox works well on Fedora 35.

Then I took the next step and installed Sysbox on the Fedora 35 VM directly, as follows:

1) Install the fuse package on the host (Sysbox requires it).

$ dnf install -y fuse

2) Build the Sysbox binaries and install them:

$ make sysbox && make install

3) Start Sysbox:

$ ./scr/sysbox

4) Configure Docker with Sysbox (for this I used the convenience script in the Sysbox repo which does the config in /etc/docker/daemon.json and restarts Docker):

$ ./scr/docker-cfg --sysbox-runtime=enable

After this, I was able to run Sysbox containers directly on the Fedora 35 host as follows:

[vagrant@instance-1 sysbox]$ cat /etc/os-release | egrep "^NAME|^VERSION"
NAME="Fedora Linux"
VERSION="35 (Cloud Edition)"
VERSION_ID=35
[vagrant@instance-1 sysbox]$ docker run --runtime=sysbox-runc -it --rm nestybox/ubuntu-focal-systemd-docker

Welcome to Ubuntu 20.04.2 LTS!                 

[  OK  ] Created slice system-getty.slice.                              
[  OK  ] Created slice system-modprobe.slice.       
[  OK  ] Created slice User and Session Slice. 
...
[  OK  ] Reached target Graphical Interface.
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Finished Update UTMP about System Runlevel Changes.

Ubuntu 20.04.2 LTS 030f7f2a79bf console

030f7f2a79bf login: admin
Password: 
Welcome to Ubuntu 20.04.2 LTS (GNU/Linux 5.14.10-300.fc35.x86_64 x86_64)

...

admin@030f7f2a79bf:~$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

admin@030f7f2a79bf:~$ docker run -it alpine
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
40e059520d19: Pull complete 
Digest: sha256:f22945d45ee2eb4dd463ed5a431d9f04fcd80ca768bb1acf898d91ce51f7bf04
Status: Downloaded newer image for alpine:latest
/ # 

As you can see, I am not hitting the same error you got (not sure why). Please try to follow the steps above and let me know if this fixes it or not. Also, feel free to join the Sysbox slack channel, as that may be a better forum to get to the bottom of the issue you are facing.

fhaefemeier commented 2 years ago

@ctalledo Thanks for providing your test scenario. I will check it, but it will take a little time. I will keep you informed. I use XFS as the filesystem (md RAID/LVM), in case that is important...

One question: did you create Sysbox containers repeatedly (in quick succession)? In my case, as a test, I called docker run several times one after another, and approximately three out of ten attempts work...
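
An intermittent failure like this is easier to report with a measured success rate. A minimal sketch, where `count_successes` is a hypothetical helper that runs any command N times and counts how often it exits successfully:

```shell
# Run a command N times and report how many runs exited with status 0.
count_successes() (
    runs=$1; shift
    ok=0 i=0
    while [ "$i" -lt "$runs" ]; do
        if "$@" >/dev/null 2>&1; then ok=$((ok + 1)); fi
        i=$((i + 1))
    done
    echo "$ok of $runs runs succeeded"
)

# Stand-in command so the sketch runs anywhere. For the case described above:
#   count_successes 10 docker run --runtime=sysbox-runc --rm debian:latest true
count_successes 3 true
```

Running the containers with a trivial command such as `true` keeps each attempt short, so the count reflects container creation rather than the workload.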

ctalledo commented 2 years ago

Hi @fhaefemeier,

> I use XFS as the filesystem (md RAID/LVM), in case that is important...

That could matter; in my VM I have btrfs or ext4.

> did you create Sysbox containers repeatedly (in quick succession)? In my case, as a test, I called docker run several times one after another, and approximately three out of ten attempts work...

Yes, I see no problem:

[vagrant@instance-1 sysbox]$ for i in $(seq 1 10); do docker run --runtime=sysbox-runc -d --rm ghcr.io/nestybox/ubuntu-focal-systemd-docker; done
50eef94fac681c353d60dc3903ef449cbc165827d8faad32ed34d42e2b2df3bc
50a8b01326c0d0adc29bccaab1605c02980b9aaf1a108ef90153b20162d4228b
a53e04092089e342c9edd5c3e37580990295e4157418bca1649032d9f0567d55
59f99b09f5170b10a986eeadfe8dc1bab8d1e0889a6f882ebe0c3b1c24978eb0
244cb65d8bc3932628b4b5a820a2a5655a6362ca55a3a486779cf543402822da
b874a9870c16bf9b6db3533a7036befb69ce55df085b54dc442bbcf4982091b9
bfbaefaca8c6c0489b73e656a07350c71c369ae971ea429f4b718ba18accf03b
1d565603b68dae93681b2adf4e60d0a7ab069eb3f9f83571dacea778b756424e
538fc3ae59ae309142f4846b445ba005d83dfcfb5d1f8466b31e1b25ff55506d
ad3e89012b3c9f9e6d38b723a84f1f24846ffae6d52d9e876567be727610604c

[vagrant@instance-1 sysbox]$ docker ps
CONTAINER ID   IMAGE                                          COMMAND                  CREATED          STATUS          PORTS     NAMES
ad3e89012b3c   ghcr.io/nestybox/ubuntu-focal-systemd-docker   "/sbin/init --log-le…"   4 seconds ago    Up 2 seconds    22/tcp    sleepy_keldysh
538fc3ae59ae   ghcr.io/nestybox/ubuntu-focal-systemd-docker   "/sbin/init --log-le…"   6 seconds ago    Up 4 seconds    22/tcp    ecstatic_greider
1d565603b68d   ghcr.io/nestybox/ubuntu-focal-systemd-docker   "/sbin/init --log-le…"   8 seconds ago    Up 6 seconds    22/tcp    nifty_fermi
bfbaefaca8c6   ghcr.io/nestybox/ubuntu-focal-systemd-docker   "/sbin/init --log-le…"   10 seconds ago   Up 8 seconds    22/tcp    quizzical_bartik
b874a9870c16   ghcr.io/nestybox/ubuntu-focal-systemd-docker   "/sbin/init --log-le…"   13 seconds ago   Up 9 seconds    22/tcp    boring_hawking
244cb65d8bc3   ghcr.io/nestybox/ubuntu-focal-systemd-docker   "/sbin/init --log-le…"   16 seconds ago   Up 13 seconds   22/tcp    thirsty_driscoll
59f99b09f517   ghcr.io/nestybox/ubuntu-focal-systemd-docker   "/sbin/init --log-le…"   18 seconds ago   Up 15 seconds   22/tcp    inspiring_brahmagupta
a53e04092089   ghcr.io/nestybox/ubuntu-focal-systemd-docker   "/sbin/init --log-le…"   19 seconds ago   Up 17 seconds   22/tcp    peaceful_burnell
50a8b01326c0   ghcr.io/nestybox/ubuntu-focal-systemd-docker   "/sbin/init --log-le…"   21 seconds ago   Up 19 seconds   22/tcp    fervent_austin
50eef94fac68   ghcr.io/nestybox/ubuntu-focal-systemd-docker   "/sbin/init --log-le…"   22 seconds ago   Up 20 seconds   22/tcp    charming_khorana

fhaefemeier commented 2 years ago

@ctalledo I had the chance to test it in a separate environment (Fedora 35 installed in a QEMU VM). make test-shell was successful, and I installed Sysbox inside the VM following your steps (1-4). The installation uses XFS as the filesystem, and everything (Docker, Sysbox) uses default parameters (e.g. data-root for Docker and Sysbox).

[sysbox@fedora sysbox]$ cat /etc/os-release | egrep "^NAME|^VERSION"
NAME="Fedora Linux"
VERSION="35 (Server Edition)"
VERSION_ID=35

Creating a Sysbox container with docker run --runtime=sysbox-runc -it --rm nestybox/ubuntu-focal-systemd-docker was successful, without errors, and repeatable. But there is one important difference from my original host system: there, SELinux is enabled in the Docker daemon. If I enable it in the VM, I get the same or similar errors. In the end, system containers can't be created.

[sysbox@fedora sysbox]$ docker run --runtime=sysbox-runc -it --rm nestybox/ubuntu-focal-systemd-docker
docker: Error response from daemon: failed to create shim: OCI runtime create failed: error in the container spec: failed to request rootfs cloning from sysbox-mgr: failed to invoke ReqCloneRootfs via grpc: rpc error: code = Unknown desc = failed to mount clone for container eb2df2642d49: failed to set up bottom ovfs mount: failed to mount overlayfs on /var/lib/sysbox/rootfs/eb2df2642d49f1f2f937fc53f381e77a1ed241d42746dab5d76c84e4810e75ea/bottom/merged: invalid argument: unknown.

Or a different error:

[sysbox@fedora sysbox]$ docker run --runtime=sysbox-runc -it --rm nestybox/ubuntu-focal-systemd-docker
Welcome to Ubuntu 20.04.2 LTS!

Failed to create /init.scope control group: Permission denied
Failed to allocate manager object: Permission denied
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

ctalledo commented 2 years ago

Hi @fhaefemeier,

Thanks for the update.

> But there is one important difference from my original host system: there, SELinux is enabled in the Docker daemon. If I enable it in the VM, I get the same or similar errors. In the end, system containers can't be created.

How exactly did you enable SELinux in the VM? I want to see if I can repro on my Fedora 35 host.

Thanks.

fhaefemeier commented 2 years ago

If you have done a standard Fedora installation, SELinux is running in enforcing mode (the default). You can check this in /etc/selinux/config. Additionally, the Docker daemon is configured with:

{
  "log-driver": "journald",
  "selinux-enabled": true,
  "runtimes": {
     "sysbox-runc": {
        "path": "/usr/local/bin/sysbox-runc"
     }
  }
}
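
Both settings can be read back from the files mentioned above. This is a minimal sketch, assuming the usual `SELINUX=<mode>` line in /etc/selinux/config and a daemon.json at /etc/docker/daemon.json; `selinux_mode_from` and `docker_selinux_enabled` are hypothetical helper names. The pattern match is deliberately simple and is not a real JSON parser.

```shell
# Print the mode from the SELINUX= line of a file in /etc/selinux/config format.
selinux_mode_from() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$1"
}

# Succeed if a daemon.json-style file sets "selinux-enabled" to true.
docker_selinux_enabled() {
    grep -Eq '"selinux-enabled"[[:space:]]*:[[:space:]]*true' "$1"
}

if [ -r /etc/selinux/config ]; then
    echo "configured SELinux mode: $(selinux_mode_from /etc/selinux/config)"
fi
if [ -r /etc/docker/daemon.json ] && docker_selinux_enabled /etc/docker/daemon.json; then
    echo "Docker daemon: selinux-enabled is true"
fi
```

The config file gives the mode applied at boot; `getenforce` reports the mode currently in effect, which can differ if it was changed at runtime.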

You can check the SELinux labels with ls -lZ. There are different labels. The main labels (known and used by me) are

But there are more, for sure. An older post on Stack Overflow gives some hints.
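
One possible reading of the kernel log line reported earlier (unrecognized mount option "c693""), offered purely as an assumption that nothing in this thread confirms: SELinux labels can end in a category pair such as `s0:c12,c693`, and if such a label is passed inside a quoted `context="..."` mount option that then gets split on every comma, the tail `c693"` is left over as a bogus option. The sketch below only illustrates that splitting effect; the option string and label values are made up, and `naive_split` is a hypothetical helper.

```shell
# Split a mount-option string on every comma, ignoring quotes,
# and print one option per line.
naive_split() (
    IFS=,
    set -f                      # keep the pieces from being glob-expanded
    for opt in $1; do
        printf '%s\n' "$opt"
    done
)

# Made-up option string, for illustration only.
opts='lowerdir=/l,upperdir=/u,workdir=/w,context="system_u:object_r:container_file_t:s0:c12,c693"'
naive_split "$opts"
```

The last line printed is `c693"`, a fragment with the same shape as the one in the kernel message. Whether this is what actually happens between Sysbox and overlayfs would need to be confirmed by the maintainers.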

fhaefemeier commented 2 years ago

Any news?

ctalledo commented 2 years ago

Hi @fhaefemeier, my apologies, did not get a chance to check the SELinux part yet; will try to get to it next week.