nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Replacing oci-runc with sysbox-runc results in EOF errors #765

Open concourse-sysbox opened 5 months ago

concourse-sysbox commented 5 months ago

I am attempting to add sysbox-runc to a Concourse CI worker. The Concourse deployment is a tarball of binaries including, but not limited to, containerd, the containerd shims, init, ctr, runc, and the concourse binaries. The binaries refer to one another by relative paths, meaning they do not rely on a package manager or systemd. I mention this because there is no Docker, and containerd is not installed on the system as a service.

Ultimately I am attempting to enable Concourse to run docker-in-container workflows without passing in a privileged flag (one of Sysbox's use cases).

Concourse allows using one of three container managers (guardian, containerd, and houdini). For the purposes of this bug, concourse is configured to use containerd.

I attempted to symlink the runc binary to /bin/sysbox-runc, and I also attempted to set a configuration file on containerd so that it would recognize sysbox-runc as its default runtime. In both cases sysbox failed to launch containers.

Error:

initializing
initializing check: image
selected worker: worker1
run check: find or create container on worker worker1: starting task: new task: failed to create shim task: OCI runtime create failed: container_linux.go:427: starting container process caused: process_linux.go:405: getting the final child's pid from pipe caused: EOF: unknown

Expected output: The container should be created and the job run in the container

System information: Linux 5.19.0-45-generic #46~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 7 15:06:04 UTC 20 x86_64 x86_64 x86_64 GNU/Linux

Concourse packaged runc information:

runc version 1.1.3
commit: v1.1.3-0-g6724737f
spec: 1.0.2-dev
go: go1.17.10
libseccomp: 2.5.4

Because the spec is 1.0.2-dev I used sysbox-ce_0.6.2:

sysbox-runc
    edition:    Community Edition (CE)
    version:    0.6.2
    commit:     60ca93c783b19c63581e34aa183421ce0b9b26b7
    built at:   Mon Jun 12 03:49:19 UTC 2023
    built by:   Cesar Talledo
    oci-specs:  1.0.2-dev

I was able to call sysbox-runc directly by creating a rootfs and running sudo sysbox-runc run foobar. Could the error have to do with how pipes, user mappings, or something else is managed between containerd, containerd-shim-runc-v2, and sysbox-runc?

ctalledo commented 5 months ago

Hi @concourse-sysbox,

Thanks for giving Sysbox a shot for this use-case.

Sysbox is made up of 3 components: sysbox-runc, sysbox-fs, and sysbox-mgr. The latter two must be running before launching a container. When containerd starts the container, it talks to sysbox-runc, which then communicates with sysbox-mgr and sysbox-fs to set up the container.

Question: did you start sysbox-mgr and sysbox-fs?

There's a script here that can start the sysbox components on a host with or without systemd. I encourage you to leverage that if possible, or at least take a look at how it starts Sysbox.
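As a quick pre-flight check before launching a container, something like the following Python sketch could confirm that both daemons are up. It is only an illustration and not part of Sysbox: the function names are hypothetical, and it assumes a Linux host where process names can be read from /proc/<pid>/comm.

```python
from pathlib import Path

# The two Sysbox daemons that must be running before sysbox-runc is invoked.
REQUIRED = ("sysbox-mgr", "sysbox-fs")


def running_process_names(proc_root="/proc"):
    """Collect the command names of all processes listed under proc_root."""
    names = set()
    for comm in Path(proc_root).glob("[0-9]*/comm"):
        try:
            names.add(comm.read_text().strip())
        except OSError:
            # The process may have exited between listing and reading.
            continue
    return names


def missing_components(names, required=REQUIRED):
    """Return the required daemons that do not appear in names."""
    return [name for name in required if name not in names]


if __name__ == "__main__":
    missing = missing_components(running_process_names())
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result from missing_components only shows that processes with those names exist; it does not prove they are healthy or reachable by sysbox-runc.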

Let me know if this helps.

Thanks again!

ctalledo commented 5 months ago

BTW, this error:

starting container process caused: process_linux.go:405: getting the final child's pid from pipe caused: EOF: unknown

means that as soon as sysbox-runc started the container process in its namespaces, the process died (so it never communicated back to sysbox-runc to say "I am ready to go").

That could happen for many reasons, but if sysbox-fs and sysbox-mgr are not running, there's no point in debugging it.
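The failure mode described above can be reproduced in miniature. The sketch below is not runc or sysbox-runc code; it only assumes the general pattern of a parent waiting on a pipe for a report from a child, and the function name is hypothetical. When the child dies before writing anything, the parent's read returns no data, which is the EOF condition.

```python
import os


def read_report_from_dead_child():
    """Fork a child that exits without writing, then read the pipe in the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: simulate a process that dies before reporting back.
        os.close(read_fd)
        os._exit(1)
    # Parent: close our copy of the write end so EOF can be observed.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()  # returns b"" once every write end is closed
    _, status = os.waitpid(pid, 0)
    return data, os.WEXITSTATUS(status)


if __name__ == "__main__":
    data, exit_code = read_report_from_dead_child()
    print("bytes received:", len(data), "child exit code:", exit_code)
```

The empty read says nothing about why the child died, which is why the first step is ruling out the basics, such as the Sysbox daemons not running.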

concourse-sysbox commented 5 months ago

Question: did you start sysbox-mgr and sysbox-fs?

No I did not. I assumed (incorrectly) that sysbox-runc was all that was needed. I will attempt to run these and get back.

concourse-sysbox commented 5 months ago

I can confirm that sysbox-fs and sysbox-mgr are running (both in /usr/bin). I now get different errors than before:

run check: find or create container on worker worker1: starting task: new task: failed to create shim task: OCI runtime create failed: error in the container spec: invalid user/group ID config: host ID mappings are non-contiguous: [{ContainerID:0 HostID:4294967294 Size:1} {ContainerID:1 HostID:1 Size:4294967293}]: unknown

It looks to me that the hostID for containerID 0 is maxint or -1 (unsigned).

This is the error I get if I symlink my oci-runc for 1.0.2-dev over to sysbox-runc (also 1.0.2-dev spec). Namely, it's using the same shim binaries, calling conventions, patterns except for instead of calling into oci-runc, the shims are calling into sysbox-runc.
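To make the reported mapping easier to reason about, here is a small Python sketch of one plausible contiguity check. The thread does not show how Sysbox actually validates mappings, so the rule used here (each mapping's host range must start where the previous one ends) is an assumption, and IDMap and host_ranges_contiguous are hypothetical names. Note that 4294967294 equals 2**32 - 2, one less than the value that -1 would take as an unsigned 32-bit integer.

```python
from typing import NamedTuple


class IDMap(NamedTuple):
    container_id: int
    host_id: int
    size: int


def host_ranges_contiguous(mappings):
    """True if, in listed order, each host range starts where the previous ends."""
    for prev, cur in zip(mappings, mappings[1:]):
        if cur.host_id != prev.host_id + prev.size:
            return False
    return True


# The mapping from the error message above.
reported = [IDMap(0, 4294967294, 1), IDMap(1, 1, 4294967293)]

if __name__ == "__main__":
    print("contiguous:", host_ranges_contiguous(reported))
    print("host ID equals 2**32 - 2:", reported[0].host_id == 2**32 - 2)
```

Under this rule the reported mapping fails because the second host range starts at 1 rather than directly after the first range.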

concourse-sysbox commented 5 months ago

@ctalledo Thank you for the help so far.

I have more information. Rather than get this to work with Concourse, I figured I would first check that everything works with Docker and containerd. It does not.

/etc/docker/daemon.json

{
    "default-runtime": "sysbox-runc",
    "runtimes": {
        "sysbox-runc": {
            "path": "/usr/bin/sysbox-runc"
        }
    },
    "bip": "172.20.0.1/16",
    "default-address-pools": [
        {
            "base": "172.25.0.0/16",
            "size": 24
        }
    ]
}
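A configuration like the one above can be sanity-checked with a short Python sketch that confirms the default runtime is actually defined under "runtimes". This is only an illustration: default_runtime_path is a hypothetical helper, not a Docker API, and it checks the JSON structure only, not whether the binary exists.

```python
import json


def default_runtime_path(config_text):
    """Return the binary path configured for the default runtime, or None."""
    config = json.loads(config_text)
    default = config.get("default-runtime")
    runtimes = config.get("runtimes", {})
    entry = runtimes.get(default)
    if entry is None:
        # Either no default is set or it is not listed under "runtimes".
        return None
    return entry.get("path")


SAMPLE = """
{
    "default-runtime": "sysbox-runc",
    "runtimes": {
        "sysbox-runc": {
            "path": "/usr/bin/sysbox-runc"
        }
    }
}
"""

if __name__ == "__main__":
    print(default_runtime_path(SAMPLE))
```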

All services have been restarted, etc.

docker info | grep runtime

 Runtimes: runc sysbox-runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: sysbox-runc

This looks like it may work. Indeed I can start various containers such as "alpine" and "busybox".

However, I then tried to run a container that accesses /sys/fs/cgroup:

shell:path$ docker pull amidos/dcind:latest
latest: Pulling from amidos/dcind
21c83c524219: Pull complete 
32da89ce47f3: Pull complete 
6a00f327f2ca: Pull complete 
2fc1a4220b34: Pull complete 
Digest: sha256:0d88764c64cc3e2209c65f9298a0f60bbb104b8cdc510deec8c501bde01028f2
Status: Downloaded newer image for amidos/dcind:latest
docker.io/amidos/dcind:latest

shell:path$ docker run -it amidos/dcind
Starting Docker...
mount: /sys/fs/cgroup: permission denied.

I think it clearly makes sense to get this working before attempting to get it working in concourse.

concourse-sysbox commented 5 months ago

Update here that I am able to get sysbox-runc working with docker. Using docker:dind image confirms that docker-in-docker can run without "--privileged". However when I start concourse it directly talks to containerd, and I am presuming it is using a regular runc at this point.

ctalledo commented 5 months ago

Hi @concourse-sysbox,

host ID mappings are non-contiguous: [{ContainerID:0 HostID:4294967294 Size:1}

That error sounds like the /etc/subuid and /etc/subgid files may not be configured properly. How do these look?

Update here that I am able to get sysbox-runc working with docker. Using docker:dind image confirms that docker-in-docker can run without "--privileged".

Ok cool!

However when I start concourse it directly talks to containerd, and I am presuming it is using a regular runc at this point.

I am not familiar with concourse, but is there a flag similar to Docker's --runtime flag, so you can tell it to use sysbox-runc (or more accurately, tell it to tell containerd to use sysbox-runc)?

$ docker run --runtime=sysbox-runc -it --rm amidos/dcind:latest
Starting Docker...
mount: /sys/fs/cgroup: permission denied.

That error makes sense to me: the mount is blocked because the container is unprivileged; if we were to allow it, the container would gain access to the cgroups for the entire system (including other containers, etc.)

I wonder though, why is the amidos/dcind:latest image trying to mount /sys/fs/cgroup, if the container already has a fully functional /sys/fs/cgroup setup?

$ docker run --runtime=sysbox-runc -it --rm alpine

/ # mount | grep cgroup
cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)

concourse-sysbox commented 5 months ago

The contents of /etc/sub{u|g}id look the same and are as follows:

user1:100000:65536
user2:165536:65536
user3:231072:65536
user4:296608:65536
user5:362144:65536
user6:427680:65536
sysbox:493216:65536

I generally don't know anything about these files.

ctalledo commented 5 months ago

The contents of /etc/sub{u|g}id look the same and are as follows:

...
sysbox:493216:65536

OK that looks good: it's saying that user sysbox in the host can assign UID/GIDs in the range [493216, (493216+65536-1)] to an unprivileged container.
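The range arithmetic can be worked through with a short Python sketch. It assumes the usual name:start:count layout of these files and, as a simplification, one entry per user; parse_subid is a hypothetical helper name.

```python
def parse_subid(text):
    """Map each user name to its inclusive (first_id, last_id) subordinate range."""
    ranges = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, start, count = line.split(":")
        first = int(start)
        # The last usable ID is start + count - 1.
        ranges[name] = (first, first + int(count) - 1)
    return ranges


# Two of the entries quoted in this thread.
SAMPLE = """\
user6:427680:65536
sysbox:493216:65536
"""

if __name__ == "__main__":
    for user, (first, last) in parse_subid(SAMPLE).items():
        print(f"{user}: {first}..{last}")
```

For the sysbox entry this gives the inclusive range 493216 to 558751, which begins right after the range assigned to user6.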

That means the "host ID mappings are non-contiguous: [{ContainerID:0 HostID:4294967294 Size:1}" error has a different root cause.

But it sounds like we don't want to focus on this error yet, correct?

concourse-sysbox commented 5 months ago

is there a flag similar to Docker's --runtime flag, so you can tell it to use sysbox-runc?

Not that I'm aware of. There is a way to pass a config.toml to containerd. However I have not found a way to reliably get containerd to honor a crafted config.toml file in a manner that gets it to use sysbox-runc. I am hoping since Docker uses containerd and somehow gets containerd to use sysbox-runc, that there's a way to get concourse to also do it...

concourse-sysbox commented 5 months ago

What I know right now is:

concourse-sysbox commented 5 months ago

I'm symlinking my otherwise OCI-spec-compatible runc binary to point to sysbox-runc, and that's when I get the error. My understanding is that the host ID issue is unexpected. Can you confirm that?

I think if that is unexpected from the project, that's something I could use help debugging.

ctalledo commented 5 months ago

I'm symlinking my otherwise OCI-spec-compatible runc binary to point to sysbox-runc, and that's when I get the error.

Oh I see; that should work though. On my host, I symlinked runc to sysbox-runc and things worked fine (with Docker):

root@sysbox-test:/usr/bin# ls -l | grep runc
-rwxr-xr-x 1 root root   10061976 Jan 12 20:30 containerd-shim-runc-v1
-rwxr-xr-x 1 root root   10087000 Jan 12 20:30 containerd-shim-runc-v2
lrwxrwxrwx 1 root root         20 Jan 18 01:42 runc -> /usr/bin/sysbox-runc
-rwxr-xr-x 1 root root    9717064 Jan 12 20:30 runc.bak
-rwxr-xr-x 1 root root   21581904 Jan 18 01:41 sysbox-runc

root@sysbox-test:/usr/bin# docker run --runtime=runc -it --rm alpine
/ # cat /proc/self/uid_map
         0     165536      65536         <<<< this confirms Sysbox created the container
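The same check can be scripted. The Python sketch below parses uid_map text and reports whether container UID 0 maps to a non-zero host UID, which is the signal used above. The helper names are hypothetical; the only fact relied on is that a process in the initial user namespace sees the identity mapping "0 0 4294967295".

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_id, outside_id, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        entries.append(tuple(int(field) for field in fields))
    return entries


def root_is_remapped(entries):
    """True if ID 0 inside the namespace maps to a non-zero ID outside it."""
    for inside, outside, length in entries:
        if inside <= 0 < inside + length:
            return outside - inside != 0
    return False  # ID 0 is not mapped at all


if __name__ == "__main__":
    with open("/proc/self/uid_map") as handle:
        print("root remapped:", root_is_remapped(parse_id_map(handle.read())))
```

A remapped root only shows that a user namespace is in use; other runtimes can also be configured that way, so this is a hint rather than proof that Sysbox created the container.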

My understanding is that the host ID issue is unexpected. Can you confirm that?

Correct, it's unexpected. As you mentioned, it seems like a -1 (unsigned) bug somewhere, but I've never seen it.

I have not been able to configure concourse/containerd to use sysbox-runc directly

If concourse does not have a flag to select the runtime, then there's a way to configure the "default runtime" in containerd via the /etc/containerd/config.toml file. Alternatively the symlinking should have worked.

There's also a tool called crictl that allows you to talk to containerd directly (i.e., you can use it to pull images, start containers, etc.). You could use it to test that containerd + sysbox works on its own, and then bring Concourse into the picture.

If containerd + sysbox fails with the HostID error above, then it's definitely a Sysbox bug or a misconfig somewhere, and I'm happy to help you debug it.