Open concourse-sysbox opened 5 months ago
Hi @concourse-sysbox,
Thanks for giving Sysbox a shot for this use-case.
Sysbox is made up of 3 components: sysbox-runc, sysbox-fs, and sysbox-mgr. The latter two must be running before launching a container. Once containerd starts the container, it will talk to sysbox-runc, which will then communicate with sysbox-mgr and sysbox-fs to set up the container.
Question: did you start sysbox-mgr and sysbox-fs?
There's a script here that can start the sysbox components on a host with or without systemd. I encourage you to leverage that if possible, or at least take a look at how it starts Sysbox.
Let me know if this helps.
Thanks again!
BTW, this error:
starting container process caused: process_linux.go:405: getting the final child's pid from pipe caused: EOF: unknown
means that as soon as sysbox-runc started the container process in its namespaces, the process died (so it never communicated back with sysbox-runc to say "I am ready to go").
That could happen for many reasons, but if sysbox-fs and sysbox-mgr are not running, there's no point in debugging it.
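As a quick way to answer the "are they running?" question, here is a minimal sketch that scans /proc for the two daemon names. It assumes a Linux host and that the processes appear under the names sysbox-mgr and sysbox-fs; the helper names are hypothetical and not part of Sysbox.

```python
import os

# Daemon names as given in this thread; assumed to match the process names.
REQUIRED = ("sysbox-mgr", "sysbox-fs")


def running_process_names(proc_root="/proc"):
    """Collect the short command name of every process listed under proc_root."""
    names = set()
    if not os.path.isdir(proc_root):
        return names  # not a Linux host, or /proc is unavailable
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are PIDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                names.add(f.read().strip())
        except OSError:
            continue  # process exited or is not readable
    return names


def missing_daemons(proc_root="/proc", required=REQUIRED):
    """Return the required daemons that are not currently running."""
    running = running_process_names(proc_root)
    return [name for name in required if name not in running]


if __name__ == "__main__":
    missing = missing_daemons()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only says the processes exist; it does not show that sysbox-runc can actually reach them.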
Question: did you start sysbox-mgr and sysbox-fs?
No I did not. I assumed (incorrectly) that sysbox-runc was all that was needed. I will attempt to run these and get back.
I can confirm that sysbox-fs and sysbox-mgr are running (both in /usr/bin). I now get different errors than before:
run check: find or create container on worker worker1: starting task: new task: failed to create shim task: OCI runtime create failed: error in the container spec: invalid user/group ID config: host ID mappings are non-contiguous: [{ContainerID:0 HostID:4294967294 Size:1} {ContainerID:1 HostID:1 Size:4294967293}]: unknown
It looks to me like the HostID for ContainerID 0 is maxint or -1 (unsigned).
This is the error I get if I symlink my oci-runc (1.0.2-dev spec) over to sysbox-runc (also 1.0.2-dev spec). Namely, it's using the same shim binaries, calling conventions, and patterns, except that instead of calling into oci-runc, the shims are calling into sysbox-runc.
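As a quick check on the numbers in that error, the sketch below reproduces the two reported mappings and tests them against one plausible reading of "contiguous": each entry's host range starts where the previous one ends. That definition is an assumption for illustration and is not taken from the Sysbox source; IDMap and host_ranges_contiguous are hypothetical names. It also shows that 4294967294 is 2^32 - 2, one below the unsigned 32-bit maximum, rather than the maximum itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IDMap:
    container_id: int
    host_id: int
    size: int


def host_ranges_contiguous(mappings):
    """Assumed rule (illustration only): ordered by container ID, each
    entry's host range must begin exactly where the previous one ends."""
    ordered = sorted(mappings, key=lambda m: m.container_id)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.host_id != prev.host_id + prev.size:
            return False
    return True


# The two mappings quoted in the error message above.
reported = [IDMap(0, 4294967294, 1), IDMap(1, 1, 4294967293)]

UINT32_MAX = 2**32 - 1
print(reported[0].host_id == UINT32_MAX - 1)  # True: one below the uint32 maximum
print(host_ranges_contiguous(reported))       # False: host ID 1 does not follow 4294967294
print(host_ranges_contiguous([IDMap(0, 165536, 65536)]))  # True: a single range
```

Under this reading, the reported spec maps container root to a very high host ID and everything else one-to-one, which no single contiguous host range can express.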
@ctalledo Thank you for the help so far.
I have more information. Rather than get this to work with concourse, I figured I would first check that everything is working with docker and containerd. It is not.
/etc/docker/daemon.json
{
  "default-runtime": "sysbox-runc",
  "runtimes": {
    "sysbox-runc": {
      "path": "/usr/bin/sysbox-runc"
    }
  },
  "bip": "172.20.0.1/16",
  "default-address-pools": [
    {
      "base": "172.25.0.0/16",
      "size": 24
    }
  ]
}
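For anyone reproducing this, a small sketch that reads a daemon.json like the one above and reports which runtime binary Docker would use by default. Only the keys shown in this thread ("default-runtime", "runtimes", "path") are used; the function name default_runtime is hypothetical.

```python
import json


def default_runtime(daemon_json_text):
    """Return (runtime name, binary path or None) for the default runtime."""
    cfg = json.loads(daemon_json_text)
    # Docker falls back to "runc" when no default runtime is configured.
    name = cfg.get("default-runtime", "runc")
    path = cfg.get("runtimes", {}).get(name, {}).get("path")
    return name, path


# Trimmed copy of the configuration posted above.
sample = """
{
  "default-runtime": "sysbox-runc",
  "runtimes": {
    "sysbox-runc": {"path": "/usr/bin/sysbox-runc"}
  }
}
"""

print(default_runtime(sample))  # ('sysbox-runc', '/usr/bin/sysbox-runc')
```

A path of None for a non-runc default would suggest the "runtimes" entry is missing or misnamed.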
All services have been restarted, etc.
docker info | grep -i runtime
Runtimes: runc sysbox-runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
Default Runtime: sysbox-runc
This looks like it may work. Indeed I can start various containers such as "alpine" and "busybox".
However, I then try to run a container that accesses /sys/fs/cgroup.
shell:path$ docker pull amidos/dcind:latest
latest: Pulling from amidos/dcind
21c83c524219: Pull complete
32da89ce47f3: Pull complete
6a00f327f2ca: Pull complete
2fc1a4220b34: Pull complete
Digest: sha256:0d88764c64cc3e2209c65f9298a0f60bbb104b8cdc510deec8c501bde01028f2
Status: Downloaded newer image for amidos/dcind:latest
docker.io/amidos/dcind:latest
shell:path$ docker run -it amidos/dcind
Starting Docker...
mount: /sys/fs/cgroup: permission denied.
I think it clearly makes sense to get this working before attempting to get it working in concourse.
Update here that I am able to get sysbox-runc working with docker. Using docker:dind image confirms that docker-in-docker can run without "--privileged". However when I start concourse it directly talks to containerd, and I am presuming it is using a regular runc at this point.
Hi @concourse-sysbox,
host ID mappings are non-contiguous: [{ContainerID:0 HostID:4294967294 Size:1}
That error sounds like the /etc/subuid and /etc/subgid files may not be configured properly. How do these look?
Update here that I am able to get sysbox-runc working with docker. Using docker:dind image confirms that docker-in-docker can run without "--privileged".
Ok cool!
However when I start concourse it directly talks to containerd, and I am presuming it is using a regular runc at this point.
I am not familiar with concourse, but is there a flag similar to Docker's --runtime flag, so you can tell it to use sysbox-runc (or more accurately, tell it to tell containerd to use sysbox-runc)?
$ docker run --runtime=sysbox-runc -it --rm amidos/dcind:latest
Starting Docker...
mount: /sys/fs/cgroup: permission denied.
That error makes sense to me: the mount is blocked because the container is unprivileged; if we were to allow it, the container would gain access to the cgroups for the entire system (including other containers, etc.).
I wonder though, why is the amidos/dcind:latest image trying to mount /sys/fs/cgroup, if the container already has a fully functional /sys/fs/cgroup setup?
$ docker run --runtime=sysbox-runc -it --rm alpine
/ # mount | grep cgroup
cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
The contents of /etc/sub{u|g}id look the same, as follows:
user1:100000:65536
user2:165536:65536
user3:231072:65536
user4:296608:65536
user5:362144:65536
user6:427680:65536
sysbox:493216:65536
I generally don't know anything about these files.
The contents of /etc/sub{u|g}id looks the same and as follows: ... sysbox:493216:65536
OK that looks good: it's saying that user sysbox on the host can assign UID/GIDs in the range [493216, (493216+65536-1)] to an unprivileged container.
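To make the range arithmetic concrete, here is a minimal sketch that parses name:start:count lines like the ones posted above and computes each inclusive range. The helper names parse_subid and ranges_overlap are hypothetical, and the overlap check is an extra sanity test added for illustration, not something Sysbox is known to require.

```python
def parse_subid(text):
    """Parse subuid/subgid-style lines into (name, first_id, last_id) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, start, count = line.split(":")
        first = int(start)
        entries.append((name, first, first + int(count) - 1))
    return entries


def ranges_overlap(entries):
    """Return True if any two ID ranges share at least one ID."""
    ordered = sorted(entries, key=lambda e: e[1])
    return any(cur[1] <= prev[2] for prev, cur in zip(ordered, ordered[1:]))


# The file contents posted earlier in this thread.
sample = """\
user1:100000:65536
user2:165536:65536
user3:231072:65536
user4:296608:65536
user5:362144:65536
user6:427680:65536
sysbox:493216:65536
"""

entries = parse_subid(sample)
print(entries[-1])              # ('sysbox', 493216, 558751)
print(ranges_overlap(entries))  # False: the ranges sit back to back
```

So the sysbox range ends at 558751, and none of the listed ranges collide.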
That means the "host ID mappings are non-contiguous: [{ContainerID:0 HostID:4294967294 Size:1}" error has a different root cause.
But it sounds like we don't want to focus on this error yet, correct?
is there a flag similar to Docker's --runtime flag, so you can tell it to use sysbox-runc?
Not that I'm aware of. There is a way to pass a config.toml to containerd. However, I have not found a way to reliably get containerd to honor a crafted config.toml file in a manner that gets it to use sysbox-runc. I am hoping that, since Docker uses containerd and somehow gets containerd to use sysbox-runc, there's a way to get concourse to do it too...
What I know right now is:
Docker can use sysbox-runc via /etc/docker/daemon.json and also by passing --runtime
I'm symlinking my otherwise OCI-spec-compatible runc binary to point to sysbox-runc, and that's when I'm getting the error. My understanding is that the host ID issue is unexpected. Can you confirm that?
I think if that is unexpected from the project, that's something I could use help debugging.
I'm symlinking my otherwise oci spec compatible runc binary to point over to sysbox-runc, and that's when I'm getting the error.
Oh I see; that should work though. On my host, I symlinked runc to sysbox-runc and things worked fine (with Docker):
root@sysbox-test:/usr/bin# ls -l | grep runc
-rwxr-xr-x 1 root root 10061976 Jan 12 20:30 containerd-shim-runc-v1
-rwxr-xr-x 1 root root 10087000 Jan 12 20:30 containerd-shim-runc-v2
lrwxrwxrwx 1 root root 20 Jan 18 01:42 runc -> /usr/bin/sysbox-runc
-rwxr-xr-x 1 root root 9717064 Jan 12 20:30 runc.bak
-rwxr-xr-x 1 root root 21581904 Jan 18 01:41 sysbox-runc
root@sysbox-test:/usr/bin# docker run --runtime=runc -it --rm alpine
/ # cat /proc/self/uid_map
0 165536 65536 <<<< this confirms Sysbox created the container
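The uid_map check above can be scripted. The sketch below parses uid_map-style text (one "inside outside length" triple per line) and reports whether ID 0 inside the namespace maps to a non-zero ID outside it. A non-zero mapping is consistent with a user-namespaced container; on its own it does not identify which runtime created it. The function names are hypothetical.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_id, outside_id, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or annotated lines
        inside, outside, length = (int(f) for f in fields)
        entries.append((inside, outside, length))
    return entries


def root_is_remapped(entries):
    """True if ID 0 inside the namespace maps to a non-zero ID outside it."""
    for inside, outside, length in entries:
        if inside <= 0 < inside + length:
            return outside - inside != 0
    return False  # ID 0 is not mapped at all


# Mapping from the session above, then an identity mapping for comparison.
print(root_is_remapped(parse_id_map("0 165536 65536")))  # True
print(root_is_remapped(parse_id_map("0 0 4294967295")))  # False
```

Inside a container, the input would come from reading /proc/self/uid_map.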
My understanding is that the host ID issue is unexpected. Can you confirm that?
Correct, it's unexpected. As you mentioned, it seems like a -1 (unsigned) bug somewhere, but I've never seen it.
I have not been able to configure concourse/containerd to use sysbox-runc directly
If concourse does not have a flag to select the runtime, then there's a way to configure the "default runtime" in containerd via the /etc/containerd/config.toml file. Alternatively the symlinking should have worked.
There's also a tool called crictl that allows you to talk to containerd directly (i.e., you can use it to pull images, start containers, etc.). You could try it as a way of testing just containerd + sysbox works, and then bring concourse into the picture.
If containerd + sysbox fails with the HostID error above, then it's definitely a Sysbox bug or a misconfig somewhere, and I'm happy to help you debug it.
I am attempting to add sysbox-runc to a concourse ci worker. The concourse deployment is a tarball of binaries including but not limited to containerd, containerd-shims, init, ctr, runc, and concourse binaries. The binaries refer to one another via relative paths, meaning they do not rely on a package manager or systemd. I am mentioning this because there is no docker, and containerd is not installed on the system as a service.
Ultimately I am attempting to enable concourse to run docker-in-container workflows without passing in a privileged flag (one of Sysbox's use cases).
Concourse allows using one of three container managers (guardian, containerd, and houdini). For the purposes of this bug, concourse is configured to use containerd.
I attempted to symlink the runc binary to /bin/sysbox-runc, and I also attempted to set a configuration file on containerd so that it would recognize sysbox-runc as its default runtime. In both cases sysbox failed to launch containers.
Error:
Expected output: The container should be created and the job run in the container
System information:
Linux 5.19.0-45-generic #46~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 7 15:06:04 UTC 20 x86_64 x86_64 x86_64 GNU/Linux
Concourse packaged runc information:
Because the spec is 1.0.2-dev I used sysbox-ce_0.6.2:
I was able to directly call sysbox-runc by creating a rootfs and calling sudo sysbox-runc run foobar. Hypothetically, I think the error may have to do with how pipes, user mappings, or something else is managed between containerd, containerd-shim-runc-v2, and sysbox-runc.