nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Failed to create shim task #557

Open matthewparkinsondes opened 2 years ago

matthewparkinsondes commented 2 years ago

os=ubuntu 20.04, kernel=5.18.2, docker=20.10.16, sysbox=0.5.2 EE, containerd=1.6.4, runc=1.1.1

docker run --runtime=sysbox-runc -m 1g hello-world

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:425: starting container process caused: process_linux.go:368: applying cgroup configuration for process caused: failed to write "1": write /sys/fs/cgroup/memory/docker/fd5011d5720ec36fee184eb963833ae2ad0a8e70f13cddf7798efdf31e5bd586/memory.kmem.limit_in_bytes: operation not supported: unknown.

ERRO[0000] error waiting for container: context canceled

rodnymolina commented 2 years ago

Thanks for filing this, @matthewparkinsondes. As previously discussed, I suspect that we are dealing with a kernel-specific issue here, as I'm unable to reproduce it.

$ docker run --runtime=sysbox-runc -m 1g hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
2db29710123e: Pull complete
Digest: sha256:80f31da1ac7b312ba29d65080fddf797dd76acfb870e677f390d5acba9741b17
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.
...
For more examples and ideas, visit:
 https://docs.docker.com/get-started/

$
$ docker version | grep -i version
 Version:           20.10.16
  Version:          20.10.16

$ runc --version
runc version 1.1.1
commit: v1.1.1-0-g52de29d
spec: 1.0.2-dev
go: go1.17.9
libseccomp: 2.5.1

$ uname -r
5.13.0-44-generic
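
For anyone comparing a failing host against a working one, the version checks above can be gathered in one pass. This is only a minimal sketch: the function name `report_env` is hypothetical, and it assumes each tool accepts `--version` (as `runc` does above).

```shell
#!/bin/sh
# Print one "name=value" line per component so that the output from two
# hosts can be compared with diff. Missing tools are reported, not fatal.
report_env() {
    printf 'kernel=%s\n' "$(uname -r)"
    for tool in docker runc sysbox-runc; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Collapse multi-line version output into a single line.
            version=$("$tool" --version 2>/dev/null | tr -s '[:space:]' ' ')
            printf '%s=%s\n' "$tool" "$version"
        else
            printf '%s=not-installed\n' "$tool"
        fi
    done
}

report_env
```

Saving the output on both machines and running `diff` on the two files shows quickly whether only the kernel differs.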

matheusmbar commented 2 years ago

Try updating Sysbox on the host machine to v0.5.2; it includes a fix required for newer Docker versions (>= 20.10.15).

More information at: https://github.com/nestybox/sysbox/issues/544

ctalledo commented 2 years ago

> Try updating Sysbox on host machine to v0.5.2, it includes a fix required for newer Docker versions >=20.10.15.

Not sure I follow @matheusmbar: per above, this issue occurs with Docker 20.10.16 and Sysbox v0.5.2.

matheusmbar commented 2 years ago

> Not sure I follow @matheusmbar: per https://github.com/nestybox/sysbox/issues/557#issue-1262436548, this issue occurs with Docker 20.10.16 and Sysbox v0.5.2.

I didn't see the Sysbox version indicated there.

pgrobelniak commented 2 years ago

Same here


ubuntu@ubuntu-VirtualBox:~$ uname -r
5.15.0-50-generic
ubuntu@ubuntu-VirtualBox:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:        20.04
Codename:       focal
ubuntu@ubuntu-VirtualBox:~$ sysbox-runc -v
sysbox-runc
        edition:        Community Edition (CE)
        version:        0.5.2
        commit:         d91c42c2125fd7aaf46f66307eb5c2a025f30289
        built at:       Wed May 18 19:49:04 UTC 2022
        built by:       Rodny Molina
        oci-specs:      1.0.2-dev
ubuntu@ubuntu-VirtualBox:~$ sudo docker run --runtime=sysbox-runc hello-world
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: container_linux.go:425: starting container process caused: process_linux.go:607: container init caused: process_linux.go:578: handleReqOp caused: rootfs_init_linux.go:366: failed to mkdirall /var/lib/sysbox/shiftfs/5b3304c9-68db-4942-add6-d6af9c2e8de2/var/lib/docker: mkdir /var/lib/sysbox/shiftfs/5b3304c9-68db-4942-add6-d6af9c2e8de2/var: value too large for defined data type caused: mkdir /var/lib/sysbox/shiftfs/5b3304c9-68db-4942-add6-d6af9c2e8de2/var: value too large for defined data type: unknown.

ctalledo commented 2 years ago

Hi @pgrobelniak, that issue is caused by a bug in the 5.15.0-50-generic kernel (the kernel is missing an Ubuntu-specific patch that enables shiftfs to work on it).

One solution is to upgrade the kernel (e.g., to 5.19).

Another is to tell Sysbox to avoid shiftfs. You can do this by simply unloading the shiftfs module in the kernel (e.g., via rmmod) or by passing the "--disable-shiftfs" option to the sysbox-mgr daemon (via the sysbox-mgr systemd service; see this section in the sysbox user guide for more info).
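
Before and after trying either workaround, it can help to confirm whether the shiftfs module is actually loaded. The following is a minimal sketch that reads `/proc/modules`. The function name `module_loaded` and its optional second argument (a file to read instead of `/proc/modules`) are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Return 0 if the named module appears in a /proc/modules-style listing.
# $1: module name; $2: listing to read (defaults to /proc/modules).
module_loaded() {
    name=$1
    listing=${2:-/proc/modules}
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$listing"
}

if module_loaded shiftfs 2>/dev/null; then
    echo "shiftfs is loaded"
else
    echo "shiftfs is not loaded"
fi
```

Note that a later comment in this thread reports the module reappearing after Sysbox starts, so a clean result right after rmmod may not last.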

pgrobelniak commented 2 years ago

@ctalledo thanks, turns out switching to 5.15.0-46-generic works too
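
The kernel reports in this thread can be summarized as a quick lookup. This is only an illustrative sketch: the function name `shiftfs_kernel_status` is hypothetical, the list covers nothing beyond the releases mentioned in this issue (5.15.0-52-generic is included because a later comment shows the same error on it), and it is certainly incomplete.

```shell
#!/bin/sh
# Classify a kernel release string using only the reports in this thread.
shiftfs_kernel_status() {
    case $1 in
        5.15.0-50-generic|5.15.0-52-generic) echo "reported-broken" ;;
        5.15.0-46-generic)                   echo "reported-working" ;;
        *)                                   echo "unknown" ;;
    esac
}

shiftfs_kernel_status "$(uname -r)"
```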

josvazg commented 1 year ago

--disable-shiftfs is not working for me as a workaround:

$ uname -r
5.15.0-52-generic

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:    22.04
Codename:   jammy

$ sudo docker run --runtime=sysbox-runc hello-world
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:425: starting container process caused: process_linux.go:607: container init caused: process_linux.go:578: handleReqOp caused: rootfs_init_linux.go:366: failed to mkdirall /var/lib/sysbox/shiftfs/6a2422e9-444c-4c4b-a0af-5de90593ef39/var/lib/docker: mkdir /var/lib/sysbox/shiftfs/6a2422e9-444c-4c4b-a0af-5de90593ef39/var: value too large for defined data type caused: mkdir /var/lib/sysbox/shiftfs/6a2422e9-444c-4c4b-a0af-5de90593ef39/var: value too large for defined data type: unknown.
ERRO[0003] error waiting for container: context canceled 

$ systemctl status sysbox -n60
● sysbox.service - Sysbox container runtime
     Loaded: loaded (/lib/systemd/system/sysbox.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-11-18 19:04:50 CET; 2min 15s ago
       Docs: https://github.com/nestybox/sysbox
   Main PID: 3829654 (sh)
      Tasks: 2 (limit: 37934)
     Memory: 388.0K
        CPU: 53ms
     CGroup: /system.slice/sysbox.service
             ├─3829654 /bin/sh -c "/usr/bin/sysbox-runc --version && /usr/bin/sysbox-mgr --version --disable-shiftfs --log-level debug && /usr/bin/sysbox-fs --version && /bin/sleep infinity"
             └─3829672 /bin/sleep infinity
...

$ sysbox-runc -v
sysbox-runc
    edition:    Community Edition (CE)
    version:    0.5.2
    commit:     d91c42c2125fd7aaf46f66307eb5c2a025f30289
    built at:   Wed May 18 19:49:04 UTC 2022
    built by:   Rodny Molina
    oci-specs:  1.0.2-dev

Any advice?

ctalledo commented 1 year ago

Hi @josvazg,

docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:425: starting container process caused: process_linux.go:607: container init caused: process_linux.go:578: handleReqOp caused: rootfs_init_linux.go:366: failed to mkdirall /var/lib/sysbox/shiftfs/6a2422e9-444c-4c4b-a0af-5de90593ef39/var/lib/docker: mkdir /var/lib/sysbox/shiftfs/6a2422e9-444c-4c4b-a0af-5de90593ef39/var: value too large for defined data type caused: mkdir /var/lib/sysbox/shiftfs/6a2422e9-444c-4c4b-a0af-5de90593ef39/var: value too large for defined data type: unknown.

Looks like shiftfs is still enabled (i.e., sysbox-runc is still mounting it).

Can you check whether the sysbox-mgr log (journalctl -u sysbox-mgr) shows a message indicating that shiftfs is in fact disabled? sysbox-mgr writes such a message to the log when it starts.
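
That log check can be scripted. A minimal sketch, assuming the start-up message reads `Shiftfs usage disabled.` as it does in the log output posted at the end of this thread; the function name `shiftfs_disabled_in_log` is hypothetical.

```shell
#!/bin/sh
# Read sysbox-mgr log lines on stdin and succeed only if the
# "Shiftfs usage disabled." start-up message is present.
shiftfs_disabled_in_log() {
    grep -q 'msg="Shiftfs usage disabled\."'
}

# Typical use on a systemd host (not run here):
#   journalctl -u sysbox-mgr --no-pager | shiftfs_disabled_in_log \
#       && echo "shiftfs disabled" || echo "no disable message found"
```

If the wording of the message differs in another Sysbox version, the pattern would need adjusting.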

josvazg commented 1 year ago

Thanks for the reply @ctalledo

The logs do not show it:

... systemd[1]: Starting sysbox-mgr (part of the Sysbox container runtime)...
... sysbox-mgr[274492]: time="2022-11-19 09:47:43" level=info msg="Starting ..."
... sysbox-mgr[274492]: time="2022-11-19 09:47:43" level=info msg="Sysbox data root: /var/lib/sysbox"
... sysbox-mgr[274492]: time="2022-11-19 09:47:43" level=info msg="Listening on /run/sysbox/sysmgr.sock"
... sysbox-mgr[274492]: time="2022-11-19 09:47:43" level=info msg="Ready ..."
... systemd[1]: Started sysbox-mgr (part of the Sysbox container runtime).

But I know that it did NOT disable it, because I explicitly ran rmmod shiftfs, checked that it was gone from lsmod, and it came back again when I started sysbox.

My question is rather: if /usr/bin/sysbox-mgr --version --disable-shiftfs does NOT disable shiftfs in sysbox, what do I need to do to accomplish that? Maybe I could force the kernel itself NOT to load the module, but I'm not sure what else shiftfs is used for, so I might break some other programs that way. Or is there a newer kernel available that does not suffer from this problem?

ctalledo commented 1 year ago

Hi @josvazg,

I see the issue: you must add the --disable-shiftfs flag to the systemd service unit for sysbox-mgr (sysbox-mgr.service), not to the service unit for Sysbox. The latter is simply a wrapper that calls the systemd service units for sysbox-mgr and sysbox-fs and, in addition, displays the version info (which is why it calls /usr/bin/sysbox-mgr --version).

Please modify the sysbox-mgr.service, add the --disable-shiftfs flag in there, reload the systemd units (systemctl daemon-reload) and restart sysbox (systemctl restart sysbox). That will work (we test disable-shiftfs in our CI).
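
As an alternative to editing the packaged unit file directly, the same flag can be supplied through a systemd drop-in. This is a sketch under stated assumptions, not the documented Sysbox procedure: it assumes the packaged unit starts `/usr/bin/sysbox-mgr` with no other flags (check the existing ExecStart line first), and the function name `write_sysbox_mgr_dropin` is hypothetical.

```shell
#!/bin/sh
# Write a systemd drop-in that replaces ExecStart for sysbox-mgr.
# $1 is the drop-in directory; on a real host this would be
# /etc/systemd/system/sysbox-mgr.service.d (standard systemd layout).
write_sysbox_mgr_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    # The empty ExecStart= clears the packaged command before the new one is set.
    # ASSUMPTION: the packaged unit runs /usr/bin/sysbox-mgr without other flags.
    cat > "$dir/override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/sysbox-mgr --disable-shiftfs
EOF
}

# On a real host (as root), not run here:
#   write_sysbox_mgr_dropin /etc/systemd/system/sysbox-mgr.service.d
#   systemctl daemon-reload
#   systemctl restart sysbox
```

A drop-in survives package upgrades that overwrite the unit file under /lib/systemd/system, which is the main reason to prefer it over an in-place edit.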

More info in the sysbox docs.

Let me know if that works please.

josvazg commented 1 year ago

Yeah, that worked! I was editing the wrong systemd unit all this time. Now:

$ sudo docker run --runtime=sysbox-runc hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.
...
$ sudo systemctl status sysbox-mgr
● sysbox-mgr.service - sysbox-mgr (part of the Sysbox container runtime)
     Loaded: loaded (/lib/systemd/system/sysbox-mgr.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-11-21 18:40:36 CET; 10s ago
   Main PID: 452414 (sysbox-mgr)
      Tasks: 11 (limit: 37934)
     Memory: 7.2M
        CPU: 97ms
     CGroup: /system.slice/sysbox-mgr.service
             └─452414 /usr/bin/sysbox-mgr --disable-shiftfs

... systemd[1]: Starting sysbox-mgr (part of the Sysbox container runtime)...
... sysbox-mgr[452414]: time="2022-11-21 18:40:36" level=info msg="Starting ..."
... sysbox-mgr[452414]: time="2022-11-21 18:40:36" level=info msg="Sysbox data root: /var/lib/sysbox"
... sysbox-mgr[452414]: time="2022-11-21 18:40:36" level=info msg="Shiftfs usage disabled."

^disabled as expected.

Thanks a lot @ctalledo!