Closed: pmalek closed this issue 9 months ago.
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.
@BenTheElder @AkihiroSuda ^^^
EDIT: updating this early comment to note that Colima is fixed via https://github.com/kubernetes-sigs/kind/issues/3277#issuecomment-1807235030; just upgrade to Colima v0.6.0.
This is an issue with the host environment, presumably with --cgroupns=private.
colima is @abiosoft
I still don't recommend alpine / openrc for container hosts vs essentially any distro with systemd.
It's unfortunate that we can't even start the container with these options.
You could probably work around this more immediately by using lima with an Ubuntu guest VM.
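Something along these lines should do it (rough sketch only; the exact socket path and context name are printed by limactl start itself, so treat the values below as illustrative):
limactl start template://docker          # Ubuntu guest running dockerd
docker context create lima-docker --docker "host=unix://$HOME/.lima/docker/sock/docker.sock"
docker context use lima-docker
kind create cluster                      # kind now talks to the Ubuntu guest's dockerd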
Oh, I'm having the same problem. My environment is a GitHub Actions workflow that uses colima to start Docker on a macOS runner.
https://github.com/kubernetes-sigs/kwok/actions/runs/5279627795/jobs/9551621894?pr=654#step:14:95
@BenTheElder I've tried with the Ubuntu layer (colima has the --layer flag to use it) and I'm getting this:
$ colima ssh cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.04
DISTRIB_CODENAME=lunar
DISTRIB_DESCRIPTION="Ubuntu 23.04"
$colima ssh -- uname -a
Linux colima 6.1.29-0-virt #1-Alpine SMP Wed, 17 May 2023 14:22:15 +0000 aarch64 aarch64 aarch64 GNU/Linux
$ docker run --name colima-control-plane --hostname colima-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=colima --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:54688:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.2@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72
9cc1f3da207bb97b37630eb842cc5137ac52c714ff20b6fecfc1e824e5d0d0b6
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.
$ docker info
Client:
Version: 24.0.2
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.10.5
Path: /usr/local/lib/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.18.1
Path: /usr/local/lib/docker/cli-plugins/docker-compose
dev: Docker Dev Environments (Docker Inc.)
Version: v0.1.0
Path: /usr/local/lib/docker/cli-plugins/docker-dev
extension: Manages Docker extensions (Docker Inc.)
Version: v0.2.19
Path: /usr/local/lib/docker/cli-plugins/docker-extension
init: Creates Docker-related starter files for your project (Docker Inc.)
Version: v0.1.0-beta.4
Path: /usr/local/lib/docker/cli-plugins/docker-init
sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
Version: 0.6.0
Path: /usr/local/lib/docker/cli-plugins/docker-sbom
scan: Docker Scan (Docker Inc.)
Version: v0.26.0
Path: /usr/local/lib/docker/cli-plugins/docker-scan
scout: Command line tool for Docker Scout (Docker Inc.)
Version: v0.12.0
Path: /usr/local/lib/docker/cli-plugins/docker-scout
Server:
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 1
Server Version: 23.0.6
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 1fbd70374134b891f97ce19c70b6e50c7b9f4e0d
runc version: 860f061b76bb4fc671f0f9e900f7d80ff93d4eb7
init version:
Security Options:
seccomp
Profile: builtin
Kernel Version: 6.1.29-0-virt
Operating System: Alpine Linux v3.18
OSType: linux
Architecture: aarch64
CPUs: 6
Total Memory: 7.754GiB
Name: colima
ID: b3c96bfd-b99b-44bc-b950-9b9109012530
Docker Root Dir: /var/lib/docker
Debug Mode: false
Username: USER
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
These are the cgroup mounts inside the VM:
mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,inode64)
openrc on /sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc)
cpuset on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cpu on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cpuacct on /sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
blkio on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
net_cls on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
perf_event on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
net_prio on /sys/fs/cgroup/net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio)
hugetlb on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup_root on /host/sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755,inode64)
openrc on /host/sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc)
none on /host/sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cpuset on /host/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cpu on /host/sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cpuacct on /host/sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
blkio on /host/sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /host/sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /host/sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /host/sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
net_cls on /host/sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
perf_event on /host/sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
net_prio on /host/sys/fs/cgroup/net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio)
hugetlb on /host/sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /host/sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
tmpfs on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,size=4096k,nr_inodes=1024,mode=755,inode64)
openrc on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc)
cpuset on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cpu on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cpuacct on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
blkio on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
net_cls on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
perf_event on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
net_prio on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio)
hugetlb on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
none on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
tmpfs on /sys/fs/cgroup/systemd type tmpfs (rw,nosuid,nodev,noexec,relatime,inode64)
uname is still showing the Alpine kernel and openrc is still showing up even though Ubuntu doesn't use it, so I don't think that flag changes the guest VM.
From the lima FAQ I think it only provides an Ubuntu userspace environment and doesn't allow customizing the underlying Guest OS / kernel / ... https://github.com/abiosoft/colima/blob/main/docs/FAQ.md#is-another-distro-supported
So I think colima will always be alpine / openrc unfortunately and subject to bugs like this.
See also past discussion https://github.com/abiosoft/colima/issues/291#issuecomment-1130470008 https://github.com/abiosoft/colima/issues/163 ...
I think https://github.com/lima-vm/lima/blob/master/examples/docker-rootful.yaml would be an Ubuntu + typical docker host env on lima.
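For example (untested sketch; limactl also accepts a local copy of the template, and paths/names may differ between lima releases):
curl -fLO https://raw.githubusercontent.com/lima-vm/lima/master/examples/docker-rootful.yaml
limactl start ./docker-rootful.yaml      # rootful dockerd inside an Ubuntu guest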
I'd also strongly recommend moving to a guest environment that uses cgroup v2 sooner than later, as the ecosystem is poised to drop v1 (I'd guess in the next year or so) and we can't do much about that.
Ubuntu, Debian, Docker Desktop, Fedora, ... most Linux environments switched quite some time ago.
If we can't get this resolved with some patch to colima that makes cgroupns=private containers work, we can consider reverting to not require cgroupns=private, but that adds back a third, much more broken cgroups nesting environment (cgroup v1 with host cgroupns) that we'd otherwise planned to phase out, given that docker has supported cgroupns=private for a few years now and podman likewise (it's also the default on cgroups v2).
From the lima FAQ I think it only provides an Ubuntu userspace environment and doesn't allow customizing the underlying Guest OS / kernel / ...
typo: s/lima/colima/
as the ecosystem is poised to drop v1 (I'd guess in the next year or so)
The ecosystem of runc, containerd, etc. isn't likely to drop v1 before 2029 (EL8 EOL).
typo: s/lima/colima/
sorry, yes!
same comment suggests lima with ubuntu / docker guest
The ecosystem of runc, containerd, etc. isn't likely to drop v1 before 2029 (EL8 EOL).
Kubernetes has been discussing it already, and I believe systemd has as well, but it's good to know some of the others won't.
Kubernetes has been discussing it already
Is there a KEP?
We also have a lot of DNS issues with Lima due to the use of Alpine. I really wish they would move away from a musl based operating system.
We also have a lot of DNS issues with Lima due to the use of Alpine. I really wish they would move away from a musl based operating system.
Lima defaults to Ubuntu...
limactl start template://docker
Using Alpine is a choice by downstream, mostly for size reasons. I don't know of an apk distro using systemd/glibc instead of openrc/musl, but I suppose it is possible (or maybe use Debian, it is also smaller)
I remember spending a lot of hours with lima due to network issues.
For instance, trying to figure out if I can use lima now instead of colima: I create the VM from one of the examples that contain docker (https://github.com/lima-vm/lima/tree/master/examples) or via the above mentioned limactl start template://docker.
This works and I can create a kind cluster when the docker socket is forwarded to the host.
For full context: I use metallb for LoadBalancer services (with some custom route and iptables commands so that host traffic is forwarded to the VM and then to kind's node).
Now, I'm not sure why (I haven't found the place in code that would explain the difference between lima and colima), but when I create VMs with colima and then create the kind cluster inside it, I can see the kind network created, and the underlying network interface br-58c6efc26188 using the 172.18.0.1/16 network (this can then be used by metallb to allocate IPs and I'll get traffic routed to the desired service):
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:55:55:38:aa:84 brd ff:ff:ff:ff:ff:ff
inet 192.168.5.15/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::5055:55ff:fe38:aa84/64 scope link
valid_lft forever preferred_lft forever
3: col0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:55:55:e7:7d:6d brd ff:ff:ff:ff:ff:ff
inet 192.168.106.2/24 scope global col0
valid_lft forever preferred_lft forever
inet6 fd63:1468:4f87:231a:5055:55ff:fee7:7d6d/64 scope global dynamic flags 100
valid_lft 2590839sec preferred_lft 603639sec
inet6 fe80::5055:55ff:fee7:7d6d/64 scope link
valid_lft forever preferred_lft forever
4: br-58c6efc26188: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 02:42:37:28:dd:56 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-58c6efc26188
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793::1/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::42:37ff:fe28:dd56/64 scope link
valid_lft forever preferred_lft forever
inet6 fe80::1/64 scope link
valid_lft forever preferred_lft forever
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:41:5a:79:67 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
7: veth471fc84@if6: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue master br-58c6efc26188 state UP
link/ether 6e:e3:f6:39:c8:05 brd ff:ff:ff:ff:ff:ff
inet6 fe80::6ce3:f6ff:fe39:c805/64 scope link
valid_lft forever preferred_lft forever
With lima I don't get that interface even though the kind network is created in exactly the same way, so I can't get the traffic into the cluster using the 172.18.0.1 network.
EDIT: the reason for this is most likely docker in the lima Ubuntu VM using cgroup v2, which causes the kind network to land in a separate net namespace (but that's a guess). I'm not sure how I could then make the traffic get routed inside kind's network (and then to its container).
$ sudo lsns --type=net
NS TYPE NPROCS PID USER NETNSID NSFS COMMAND
4026531840 net 118 1 root unassigned /sbin/init
4026532237 net 12 3820 lima unassigned /proc/self/exe --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=bu
4026532314 net 30 4404 lima unassigned /sbin/init
4026532406 net 1 5492 lima unassigned registry serve /etc/docker/registry/config.yml
4026532472 net 1 5628 lima unassigned registry serve /etc/docker/registry/config.yml
4026532543 net 2 6176 165534 unassigned /pause
4026532602 net 2 6144 165534 unassigned /pause
4026532665 net 2 6216 165533 unassigned /pause
4026532724 net 2 6215 165534 unassigned /pause
$ sudo nsenter -n --target 3820 ip a s br-ae7cbfeb3d9b
4: br-ae7cbfeb3d9b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:e8:51:b5:1f brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-ae7cbfeb3d9b
valid_lft forever preferred_lft forever
inet6 fc00:f853:ccd:e793::1/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::42:e8ff:fe51:b51f/64 scope link
valid_lft forever preferred_lft forever
inet6 fe80::1/64 scope link
valid_lft forever preferred_lft forever
As for the issue at hand:
I understand that with #3241 the ship might have already sailed, but perhaps we could still consider using the provider info Cgroup2 field and set the --cgroupns flag only when cgroup v2 is available?
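(Just to illustrate the datum such a check would key off, not kind's actual code; the cgroup version is easy to probe:)
docker info --format '{{.CgroupVersion}}'   # prints 1 or 2
stat -fc %T /sys/fs/cgroup                  # run inside the Linux VM: cgroup2fs means the unified (v2) hierarchy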
Same error happens with Rancher Desktop, which uses lima under the hood.
Experiencing the same on Rancher Desktop. Downgrading to kind 0.19.0 fixes the issue for now.
Would be great to get a fix for 0.20.0.
The issue I see on Rancher Desktop using Kind 0.20.0 is the following:
$ kind create cluster --name test-cluster --image kindest/node:v1.27.3
Boostrapping cluster…
Creating cluster "test-cluster" ...
✓ Ensuring node image (kindest/node:v1.27.3) 🖼
✗ Preparing nodes 📦
Deleted nodes: ["eks-cluster-control-plane"]
ERROR: failed to create cluster: command "docker run --name test-cluster-control-plane --hostname test-cluster-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=test-cluster --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:50566:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.3" failed with error: exit status 125
Command Output: 82623b67d511c7e10ed075323e621ec66befa9047e3c7b56647ca99fd78e0db6
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.
The inability to create a container with this docker 20.10.0 feature (released 2020-12-08) is still considered a bug in colima / rancher desktop. I'd like to hear a response from those projects before we revert anything. Ensuring a private cgroupns is a big benefit for the project.
I understand that with https://github.com/kubernetes-sigs/kind/pull/3241 the ship might have already sailed but perhaps we might still consider using the provider info Cgroup2 field and set the --cgroupns flag only when cgroupv2 is available?
The point of setting this flag is to ensure that this is set on cgroupv1 hosts. cgroupv2 hosts already default to this.
cgroupv1 hosts are the problem. On hosts other than alpine/colima/rancher desktop this works great. Alpine and colima / rancher desktop use an unusual init system that doesn't seem to set this up properly.
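To illustrate the difference (rough example; assumes a busybox image is available):
# with a private cgroup namespace the container sees itself at the root of the hierarchy,
# with the host namespace it sees the full host-side cgroup path
docker run --rm --cgroupns=private busybox cat /proc/self/cgroup
docker run --rm --cgroupns=host busybox cat /proc/self/cgroup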
the reason for this is most likely docker in lima ubuntu VM using cgroup v2, which causes kind network to land in a separate net namespace (but that's a guess).
You may have some eBPF component in the path (attached to cgroup2) which, without unsharing cgroup2, will attach bits to your host namespace that were meant to go on the nodes, thus creating incidental routability. I had a similar issue forwarding ports in kind with Cilium.
Yeah, same issue here. brew install doesn't support kind@0.19.0, so I had to install it through the go approach. Running go install sigs.k8s.io/kind@v0.19.0 seems to have temporarily fixed the issue.
Yup did same.
I can confirm it works on kind@0.19.0 and fails to work on kind@0.20.0 when using colima.
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.
Switching to an Ubuntu image with regular lima instead of colima worked for me:
limactl start template://docker
FYI, same error when using rancher-desktop
$ docker info
Client:
Context: rancher-desktop
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc., v0.11.0)
compose: Docker Compose (Docker Inc., v2.19.0)
$ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.27.3) 🖼
✗ Preparing nodes 📦
Deleted nodes: ["kind-control-plane"]
ERROR: failed to create cluster: command "docker run --name kind-control-plane --hostname kind-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=kind --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:64634:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.3@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72" failed with error: exit status 125
Command Output: d27129e82d852cf6a2e43132ed42b147e5a7a47a518a6bb528f53f7194bbc659
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.
Yes, this is known. The root issue is that Alpine Linux, used by Colima and Rancher Desktop, appears to have broken cgroups, which is not overly surprising given the unusual init system. https://github.com/kubernetes-sigs/kind/issues/3277#issuecomment-1632333425
This issue doesn't appear to be limited to kind; similar errors are happening with buildx. I remain hopeful that Colima, Rancher Desktop, or Alpine will fix this, as it doesn't appear to be an issue on other hosts except a few with very, very old kernels (RHEL 7).
Alpine is unlikely to start using systemd, but maybe they can find a way to still support cgroups v2 (somehow)
Colima and rancher desktop should also reconsider alpine for the purposes of running containers. See also: DNS issues with the simple muslc resolver. I have brought this up with at least Colima already
But switching to systemd isn't necessary if the existing init is fixed. We're not depending on anything systemd specific, just working cgroupns. However systemd is the best tested and would be my recommendation.
I think I will stick with Ubuntu LTS for the default kubeadm template (k8s.yaml), even if Debian is also a possibility.
I read somewhere that Rancher might switch to some other distribution for the VM, maybe OpenSUSE.
@BenTheElder what are the current options on Mac given that colima and rancher-desktop are based on Alpine and don't support cgroup v2? is it just pinning kind to v0.19.0 and waiting for one of these projects to fix the issue?
@BenTheElder what are the current options on Mac given that colima and rancher-desktop are based on Alpine and don't support cgroup v2? is it just pinning kind to v0.19.0 and waiting for one of these projects to fix the issue?
The tool both colima and rancher-desktop are built on, lima, supports other distros / templates, and should work fine. That's aside from e.g. Docker Desktop, or running docker in other VM tools that are not pinned to Alpine. Podman Desktop also supports kind, though kind still needs some improvements around podman.
limactl start template://docker should work: https://github.com/kubernetes-sigs/kind/issues/3277#issuecomment-1680876276
Sticking to kind 0.19 is also reasonable in the short term, and we'll want an answer here before 0.21. EDIT: Currently I'd recommend using lima instead.
The most desirable outcome is a fix in rancher desktop / colima so we can continue to roll forward. Enabling cgroupns helps us deal with issues like https://github.com/kubernetes-sigs/kind/issues/3223 / keeping compatibility between the layered container runtimes. cgroups v2 is an even stronger fix but we have no immediate plans to require that as v1 + cgroupns gets us most of the way there.
If we can't get a fix in rancher desktop / colima, we are considering a fallback to no cgroupns when cgroup v1 + cgroupns container create fails, with a warning because this won't be well tested / supported and may leave other difficult to resolve issues like lingering problems with https://github.com/kubernetes-sigs/kind/issues/3223.
@aojea and I are very aware of this problem; for the k8s 1.28 release we made new images available to both kind 0.19 and 0.20 as a small stopgap related to this issue (see the updated release notes, also announced in #kind on slack.k8s.io).
Lima has support for running containerd, and Docker, and Podman, and Kubernetes out-of-the-box...
It was deemed unnecessary to have an all-in-one example of kind (or k3d), in addition to kubeadm (and k3s).
But that is also possible, if you want to run kind but don't have access to Docker Engine or Podman Engine:
It was deemed unnecessary to have an all-in-one example of kind (or k3d), in addition to kubeadm (and k3s).
Right, colima and rancher desktop don't have or need kind specific examples either to my knowledge.
kind just needs docker (or podman), so just the example for running docker with a functioning VM guest distro is sufficient.
The standard docker template currently uses ubuntu and is reported to work fine in an earlier comment https://github.com/kubernetes-sigs/kind/issues/3277#issuecomment-1680876276, as I understand it
Depending on your use case, it may make sense to use the kubeadm or K3s templates instead, but that's a little out of scope here.
limactl start template://docker
is briefly mentioned in https://github.com/lima-vm/lima#advanced-usage, and the output of that command will give info on how to use docker CLI with it, which is all kind needs. https://github.com/lima-vm/lima/blob/7b7b84a7983a7c26138660ad2db6ca9269963894/examples/docker.yaml#L80-L85
P.S. Thanks for your contributions, lima is a cool project :-)
the output of that command will give info on how to use docker CLI with it, which is all kind needs.
You can use the docker.lima (or podman.lima, or kubectl.lima) wrappers to do all the setup for you.
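For example (assuming the wrappers ended up on your PATH; availability depends on how lima was installed):
docker.lima run --rm hello-world   # runs against the lima instance's dockerd
Note that kind itself shells out to the plain docker CLI, so for kind you'd still point docker at the same daemon, e.g. via a docker context or DOCKER_HOST.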
Colima and rancher desktop should also reconsider alpine for the purposes of running containers. See also: DNS issues with the simple muslc resolver. I have brought this up with at least Colima already
DNS over TCP was added to musl and shipped in Alpine 3.18, and it supposedly involved a lot of convincing work upstream. I'm sure the "alpine bad" sentiment will survive it by at least half a decade though.
https://www.openwall.com/lists/musl/2023/05/02/1 https://www.alpinelinux.org/posts/Alpine-3.18.0-released.html
I didn't say "alpine bad"; I am not recommending it for running containers. I'm sure it's an interesting choice for other purposes.
It remains a non-recommended distro for running containers. Kubernetes, podman, docker, runc, crun, and the rest of the ecosystem can only afford to run and maintain so much CI; Alpine and its unusual choices are not included, and as evidenced by this thread it remains broken for this purpose while other distros are not.
EDIT: A working cgroups environment is a hard requirement for KIND, and the responsibility of the distro/kernel/init.
lima + other popular distros (Ubuntu, Debian, Fedora, ...) provide this. We generally haven't seen people using alpine to host container workloads until rancher desktop / colima became popular, and there have been good reasons not to choose it for this task.
cgroupns=private is something we've been working around for years; with the recent runc skew issues we were forced to re-evaluate this, and requiring it improves reliability and makes kind more maintainable on every other distro[^1].
[^1]: RHEL 7 is also broken by way of being too out of date, but RHEL 8 works and has been out for a while and we probably won't be supporting RHEL with the recent changes there anyhow.
I've been able to switch Alpine to use the unified cgroups v2 layout, which seems to fix the buildkitd issue. And it fixes the initial problem with kind as well, but it fails with a different problem right after:
$ docker logs kind-control-plane
INFO: ensuring we can execute mount/umount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: detected cgroup v2
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: faking /sys/class/dmi/id/product_name to be "kind"
INFO: setting iptables to detected mode: legacy
INFO: detected IPv4 address: 172.18.0.2
INFO: detected IPv6 address: fc00:f853:ccd:e793::2
INFO: starting init
systemd 247.3-7+deb11u2 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization docker.
Detected architecture x86-64.
Welcome to Debian GNU/Linux 11 (bullseye)!
Set hostname to <kind-control-plane>.
Failed to create /init.scope control group: Operation not supported
Failed to allocate manager object: Operation not supported
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
INFO: ensuring we can execute mount/umount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: detected cgroup v2
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: faking /sys/class/dmi/id/product_name to be "kind"
INFO: setting iptables to detected mode: legacy
INFO: detected IPv4 address: 172.18.0.2
INFO: detected old IPv4 address: 172.18.0.2
INFO: detected IPv6 address: fc00:f853:ccd:e793::2
INFO: detected old IPv6 address: fc00:f853:ccd:e793::2
INFO: starting init
systemd 247.3-7+deb11u2 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization docker.
Detected architecture x86-64.
Welcome to Debian GNU/Linux 11 (bullseye)!
Set hostname to <kind-control-plane>.
Failed to create /init.scope control group: Operation not supported
Failed to allocate manager object: Operation not supported
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
I guess the issue is that cgroups are not writable inside the container.
I guess the issue is that cgroups are not writable inside the container.
yeah, that would be a problem. kind supports cgroups v2 but must be able to write to cgroups. --privileged should be ensuring that, and with cgroupns the cgroups should appear to be at the root of the hierarchy inside the container while actually being nested under the node container from the host side.
v2 always has cgroupns enabled in docker/podman AFAIK. We'd love to see v2 become the norm, as the unified hierarchy is a lot less confusing for "nested" setups and also eliminates the runc awareness-of-controllers skew issue entirely.
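A quick way to probe that from inside the VM, if anyone wants to check (hedged sketch; assumes a busybox image is available):
# on a working cgroup v2 host a privileged container with a private cgroup namespace
# should be able to create and remove a child cgroup
docker run --rm --privileged --cgroupns=private busybox \
  sh -c 'mkdir /sys/fs/cgroup/kind-probe && rmdir /sys/fs/cgroup/kind-probe && echo writable'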
I've been able to switch Alpine to use the unified cgroups v2 layout, which seems to fix the buildkitd issue. [...]
When we can't even docker run a container because it fails while setting up the cgroups or similar, I'm going to punt to the distro/kernel/init/..., but failing in the entrypoint script is another matter; at that point kind is doing funky things and may need patching.
If this becomes readily runnable somewhere, we can try to investigate.
If this becomes readily runnable somewhere, we can try to investigate.
You can edit /etc/rc.conf and set rc_cgroup_mode="unified", and then reboot the VM. Afterwards you should have the v2 layout.
On Rancher Desktop you can run
rdctl shell sudo sed -E -i 's/#(rc_cgroup_mode).*/\1="unified"/' /etc/rc.conf
And then restart Rancher Desktop and verify the layout
$ rdctl shell ls /sys/fs/cgroup
acpid cgroup.subtree_control docker
cgroup.controllers cgroup.threads io.stat
cgroup.max.depth cpu.stat lima-guestagent
cgroup.max.descendants cpuset.cpus.effective memory.reclaim
cgroup.procs cpuset.mems.effective memory.stat
cgroup.stat crond sshd
The same should be true for lima and colima, but I haven't tested it.
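An untested sketch of the colima equivalent (assuming the same Alpine /etc/rc.conf layout inside the colima VM):
colima ssh                        # then, inside the VM:
sudo sed -E -i 's/#(rc_cgroup_mode).*/\1="unified"/' /etc/rc.conf
exit
colima restart
colima ssh -- ls /sys/fs/cgroup   # cgroup.controllers etc. mean the v2 layout is active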
colima seems to be broken too ( https://github.com/abiosoft/colima/issues/792 )
I have tried the rc_cgroup_mode fix on colima but this didn't fix it. I am still getting the following error when it is trying to build an image:
runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/6), flags: 0xf, data: openrc: invalid argument
The full log:
mac-jan:my-question-generator jan$ make all
docker-compose -f docker-compose.yml -p my-qg up -d --build
[+] Building 1.6s (7/8)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 736B 0.0s
=> [internal] load metadata for docker.io/library/python:3.10-slim 1.3s
=> [auth] library/python:pull token for registry-1.docker.io 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> CACHED [1/3] FROM docker.io/library/python:3.10-slim@sha256:cc91315c3561d0b87d0525cb814d430cfbc70f10ca54577def184da80e87c1db 0.0s
=> => resolve docker.io/library/python:3.10-slim@sha256:cc91315c3561d0b87d0525cb814d430cfbc70f10ca54577def184da80e87c1db 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 140B 0.0s
=> ERROR [2/3] RUN apt-get update -y && apt-get install -y git nano wget && pip install --upgrade pip 0.2s
------
> [2/3] RUN apt-get update -y && apt-get install -y git nano wget && pip install --upgrade pip:
#0 0.149 runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/6), flags: 0xf, data: openrc: invalid argument
------
failed to solve: process "/bin/sh -c apt-get update -y && apt-get install -y git nano wget && pip install --upgrade pip" did not complete successfully: exit code: 1
make: *** [all] Error 17
mac-jan:my-question-generator jan$
FYI my rc_cgroup_mode settings (note that I did restart colima after making the changes).
mac-jan:my-question-generator jan$ colima ssh
colima:/Users/jan/Documents/15_iot/nuc/my-question-generator$ grep cgroup_mode /etc/rc.conf
#rc_cgroup_mode="hybrid"
rc_cgroup_mode="unified"
colima:/Users/jan/Documents/15_iot/nuc/my-question-generator$
@janvda Please run ls /sys/fs/cgroup after restarting colima to verify that you have the cgroup 2 layout now. It is possible that something else in the image is overriding the rc.conf setting.
If this becomes readily runnable somewhere, we can try to investigate.
@BenTheElder Have you been able to replicate the setup using Rancher Desktop or do you need more information from me?
@janvda Please run ls /sys/fs/cgroup after restarting colima to verify that you have the cgroup 2 layout now. It is possible that something else in the image is overriding the rc.conf setting.
mac-jan:my-question-generator jan$ colima ssh ls /sys/fs/group
ls: /sys/fs/group: No such file or directory
FATA[0000] exit status 1
mac-jan:my-question-generator jan$ colima ssh
colima:/Users/jan/Documents/15_iot/nuc/my-question-generator$ ls -l /sys/fs
total 0
dr-xr-xr-x 2 root root 0 Aug 31 07:21 bpf
drwxr-xr-x 23 root root 460 Aug 31 06:52 cgroup
drwxr-xr-x 4 root root 0 Aug 31 07:21 ext4
drwxr-xr-x 3 root root 0 Aug 31 07:21 fuse
drwxr-x--- 2 root root 0 Aug 31 06:52 pstore
colima:/Users/jan/Documents/15_iot/nuc/my-question-generator$
mac-jan:my-question-generator jan$ colima ssh ls /sys/fs/group ls: /sys/fs/group: No such file or directory
@janvda The directory is called cgroup, not group.
@BenTheElder Have you been able to replicate the setup using Rancher Desktop or do you need more information from me?
Thanks, I'm able to replicate it but haven't had time to root-cause it yet. At a glance nothing kind is doing jumps out, and the cgroup mount appears rw, but systemd fails to create cgroups.
We have kind working on other cgroups v2 hosts, but none of them are using OpenRC.
What happened:
After updating to v0.20.0 I cannot create a cluster anymore.
I'm using a Mac with colima.
What you expected to happen:
No error, and the cluster is created successfully.
How to reproduce it (as minimally and precisely as possible):
Environment:
- kind version: (use kind version): v0.20.0
- Runtime info: (use docker info or podman info):
- OS (e.g. from /etc/os-release): Mac OS with colima VM. /etc/os-release from within the VM that hosts the docker daemon: