kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

Rancher-Desktop [Alpine] can't create cluster with v0.20.0 [Previously Also Colima] #3277

Closed pmalek closed 9 months ago

pmalek commented 1 year ago

What happened:

After updating to v0.20.0 I cannot create a cluster anymore.

I'm using Mac with colima

Creating cluster "colima" ...
 βœ“ Ensuring node image (kindest/node:v1.27.2) πŸ–Ό
 βœ— Preparing nodes πŸ“¦
Deleted nodes: ["colima-control-plane"]
ERROR: failed to create cluster: command "docker run --name colima-control-plane --hostname colima-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=colima --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:52490:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.2@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72" failed with error: exit status 125
Command Output: 3236752928bc442ebdaf6bd3b6b164643987d45b1a120ec3cd20ca14cc7f5dd7
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.

What you expected to happen:

No error and the cluster is created successfully.

How to reproduce it (as minimally and precisely as possible):

  1. Try to create a cluster with kind v0.20.0 (a minimal sketch follows below)
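A minimal reproduction sketch, assuming the docker CLI is pointed at the colima VM; the cluster name is taken from the error output above and is otherwise arbitrary:

```
# with kind v0.20.0 installed and colima providing the docker daemon
kind create cluster --name colima
```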

Environment:

aojea commented 1 year ago

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.

@BenTheElder @AkihiroSuda ^^^

BenTheElder commented 1 year ago

EDIT: updating this early comment to note that Colima is fixed via https://github.com/kubernetes-sigs/kind/issues/3277#issuecomment-1807235030; just upgrade to colima v0.6.0


This is an issue with the host environment, presumably with --cgroupns=private.

colima is @abiosoft

BenTheElder commented 1 year ago

I still don't recommend alpine / openrc for container hosts vs essentially any distro with systemd.

It's unfortunate that we can't even start the container with these options.

You could probably work around this more immediately by using lima with an Ubuntu guest VM.
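A rough sketch of that workaround, assuming lima's stock docker template (later comments in this thread report it working):

```
# start an Ubuntu guest VM running dockerd via lima's docker template
limactl start template://docker
# point the docker CLI at it as described by the template's output, then:
kind create cluster
```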

wzshiming commented 1 year ago

Oh, I'm having the same problem; my environment is GitHub Actions, using colima to start Docker on a macOS runner.

https://github.com/kubernetes-sigs/kwok/actions/runs/5279627795/jobs/9551621894?pr=654#step:14:95

pmalek commented 1 year ago

@BenTheElder I've tried with the Ubuntu layer (colima has a --layer flag for it) and I'm getting this:

$ colima ssh cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=23.04
DISTRIB_CODENAME=lunar
DISTRIB_DESCRIPTION="Ubuntu 23.04"
$ colima ssh -- uname -a
Linux colima 6.1.29-0-virt #1-Alpine SMP Wed, 17 May 2023 14:22:15 +0000 aarch64 aarch64 aarch64 GNU/Linux
$ docker run --name colima-control-plane --hostname colima-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=colima --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:54688:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.2@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72
9cc1f3da207bb97b37630eb842cc5137ac52c714ff20b6fecfc1e824e5d0d0b6
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.
$ docker info
Client:
 Version:    24.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.19
    Path:     /usr/local/lib/docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.4
    Path:     /usr/local/lib/docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-scan
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  v0.12.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-scout

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 1
 Server Version: 23.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1fbd70374134b891f97ce19c70b6e50c7b9f4e0d
 runc version: 860f061b76bb4fc671f0f9e900f7d80ff93d4eb7
 init version: 
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 6.1.29-0-virt
 Operating System: Alpine Linux v3.18
 OSType: linux
 Architecture: aarch64
 CPUs: 6
 Total Memory: 7.754GiB
 Name: colima
 ID: b3c96bfd-b99b-44bc-b950-9b9109012530
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: USER
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

These are the cgroup mounts inside the VM:

mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,inode64)
openrc on /sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc)
cpuset on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cpu on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cpuacct on /sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
blkio on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
net_cls on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
perf_event on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
net_prio on /sys/fs/cgroup/net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio)
hugetlb on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup_root on /host/sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755,inode64)
openrc on /host/sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc)
none on /host/sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cpuset on /host/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cpu on /host/sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cpuacct on /host/sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
blkio on /host/sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /host/sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /host/sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /host/sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
net_cls on /host/sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
perf_event on /host/sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
net_prio on /host/sys/fs/cgroup/net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio)
hugetlb on /host/sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /host/sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
tmpfs on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,size=4096k,nr_inodes=1024,mode=755,inode64)
openrc on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc)
cpuset on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cpu on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cpuacct on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
blkio on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
net_cls on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
perf_event on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
net_prio on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio)
hugetlb on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /host/run/containerd/io.containerd.runtime.v2.task/colima/2b274e7b947011e0f0513278d0245b6644c1760edc6cd81af8a72f172b2c4652/rootfs/sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
none on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
tmpfs on /sys/fs/cgroup/systemd type tmpfs (rw,nosuid,nodev,noexec,relatime,inode64)

BenTheElder commented 1 year ago

uname is still showing the Alpine kernel, and openrc is still showing up even though Ubuntu doesn't use it; I don't think that flag is changing the guest VM.

BenTheElder commented 1 year ago

From the lima FAQ I think it only provides an Ubuntu userspace environment and doesn't allow customizing the underlying Guest OS / kernel / ... https://github.com/abiosoft/colima/blob/main/docs/FAQ.md#is-another-distro-supported

So I think colima will always be alpine / openrc unfortunately and subject to bugs like this.

See also past discussion https://github.com/abiosoft/colima/issues/291#issuecomment-1130470008 https://github.com/abiosoft/colima/issues/163 ...

I think https://github.com/lima-vm/lima/blob/master/examples/docker-rootful.yaml would be an Ubuntu + typical docker host env on lima.

BenTheElder commented 1 year ago

I'd also strongly recommend moving to a guest environment that uses cgroup v2 sooner rather than later, as the ecosystem is poised to drop v1 (I'd guess in the next year or so) and we can't do much about that.

Ubuntu, Debian, Docker desktop, Fedora, ... most linux environments have switched for some time now.

If we can't get this resolved with some patch to colima to enable working cgroupns=private containers, we can consider reverting to not require cgroupns=private, but that adds back a third, much more broken cgroups nesting environment (cgroup v1, host cgroupns) that we'd otherwise planned to phase out, now that docker has supported cgroupns=private for a few years and podman likewise (it's also the default on cgroups v2).
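If you're unsure which cgroup version your guest VM is running, a couple of hedged checks (run inside the VM; on busybox the stat format flags may differ):

```
stat -fc %T /sys/fs/cgroup/   # "cgroup2fs" => unified v2, "tmpfs" => v1/hybrid
docker info --format 'cgroup driver: {{.CgroupDriver}}, cgroup version: {{.CgroupVersion}}'
```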

AkihiroSuda commented 1 year ago

From the lima FAQ I think it only provides an Ubuntu userspace environment and doesn't allow customizing the underlying Guest OS / kernel / ...

typo: s/lima/colima/ πŸ™‚

as the ecosystem is poised to drop v1 (I'd guess in the next year or so)

The ecosystem of runc, containerd, etc. isn't likely to drop v1 before 2029 (EL8 EOL).

BenTheElder commented 1 year ago

typo: s/lima/colima/ πŸ™‚

sorry, yes!

same comment suggests lima with ubuntu / docker guest πŸ˜…

The ecosystem of runc, containerd, etc. isn't likely to drop v1 before 2029 (EL8 EOL).

Kubernetes has been discussing it already, and I believe systemd has as well, but it's good to know some of the others won't. πŸ˜…

AkihiroSuda commented 1 year ago

Kubernetes has been discussing it already

Is there a KEP?

ryancurrah commented 1 year ago

We also have a lot of DNS issues with Lima due to its use of Alpine. I really wish they would move away from a musl-based operating system.

afbjorklund commented 1 year ago

We also have a lot of DNS issues with Lima due to its use of Alpine. I really wish they would move away from a musl-based operating system.

Lima defaults to Ubuntu...

limactl start template://docker

Using Alpine is a choice by downstream, mostly for size reasons. I don't know of an apk distro using systemd/glibc instead of openrc/musl, but I suppose it is possible (or maybe use Debian, it is also smaller)

pmalek commented 1 year ago

I remember spending a lot of hours with lima due to network issues.

For instance, trying to figure out if I can use lima now instead of colima: I create the VM from one of the examples that contain docker (https://github.com/lima-vm/lima/tree/master/examples) or via the above-mentioned limactl start template://docker.

This works, and I can create a kind cluster when the docker socket is forwarded to the host.

For full context: I use metallb for LoadBalancer services (with some custom route and iptables commands so that host traffic is forwarded to the VM and then to kind's node).
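As a hedged illustration of the custom routing mentioned here (not part of the original comment): on macOS this might be a static route for the kind docker subnet via the colima VM's address. The 192.168.106.2 address is taken from the col0 interface shown further below; both values are environment-specific.

```
# route the kind bridge network (172.18.0.0/16) through the colima VM
sudo route -n add -net 172.18.0.0/16 192.168.106.2
```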

Now, I'm not sure why (I haven't found the place in the code that would explain the difference between lima and colima), but when I create VMs with colima and then create the kind cluster inside, I can see the kind network created:

details...

```
# uname -a
Linux colima 6.1.29-0-virt #1-Alpine SMP Wed, 17 May 2023 14:22:15 +0000 aarch64 aarch64 aarch64 GNU/Linux
$ docker inspect kind
[
    {
        "Name": "kind",
        "Id": "58c6efc261888b451fbf9bfbf0c53da9bd4f6bb48c74a45f8ffdfa56946da376",
        "Created": "2023-06-17T10:50:00.781737055Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": true,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.18.0.0/16",
                    "Gateway": "172.18.0.1"
                },
                {
                    "Subnet": "fc00:f853:ccd:e793::/64",
                    "Gateway": "fc00:f853:ccd:e793::1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "7d7ac41ea6b906f18b4fd2fcc49caed4c541abc30012094718ab3e1886d9c8f9": {
                "Name": "test-control-plane",
                "EndpointID": "9b603f5e6fcd776515e6eacafb2a87c9cafd0d3e81d73a28d7497283833c11cf",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": "fc00:f853:ccd:e793::2/64"
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]
```

and the underlying network interface br-58c6efc26188 using the 172.18.0.1/16 network (this can then be used by metallb to allocate IPs, and I'll get traffic routed to the desired service):

ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:55:55:38:aa:84 brd ff:ff:ff:ff:ff:ff
    inet 192.168.5.15/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5055:55ff:fe38:aa84/64 scope link
       valid_lft forever preferred_lft forever
3: col0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:55:55:e7:7d:6d brd ff:ff:ff:ff:ff:ff
    inet 192.168.106.2/24 scope global col0
       valid_lft forever preferred_lft forever
    inet6 fd63:1468:4f87:231a:5055:55ff:fee7:7d6d/64 scope global dynamic flags 100
       valid_lft 2590839sec preferred_lft 603639sec
    inet6 fe80::5055:55ff:fee7:7d6d/64 scope link
       valid_lft forever preferred_lft forever
4: br-58c6efc26188: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 02:42:37:28:dd:56 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-58c6efc26188
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::42:37ff:fe28:dd56/64 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::1/64 scope link
       valid_lft forever preferred_lft forever
5: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:41:5a:79:67 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
7: veth471fc84@if6: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue master br-58c6efc26188 state UP
    link/ether 6e:e3:f6:39:c8:05 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::6ce3:f6ff:fe39:c805/64 scope link
       valid_lft forever preferred_lft forever

With lima I don't get that interface even though the kind network is created exactly the same way:

details...

```
$ uname -a   # I've tried with ubuntu 23.04 using kernel 6.2 as well and the same result
Linux lima-docker 5.15.0-72-generic #79-Ubuntu SMP Tue Apr 18 16:53:43 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
$ docker inspect kind
[
    {
        "Name": "kind",
        "Id": "199d499b093a18902d1cba537d7a30f6f83fbd9d3bf6c79f07b25a72c6d1d969",
        "Created": "2023-06-17T12:07:55.999693706Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": true,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.18.0.0/16",
                    "Gateway": "172.18.0.1"
                },
                {
                    "Subnet": "fc00:f853:ccd:e793::/64"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "a1b869e75ea64adc53e59195c2f773f6fb08c2dee7cb01ce9e7981a76476a1fa": {
                "Name": "kong-test-control-plane",
                "EndpointID": "bc061de959a24e50bb8abbeac116b0f55472f8b682f37de9f19f688cff67e695",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": "fc00:f853:ccd:e793::2/64"
            }
        },
        "Options": {
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]
$ ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:55:55:9a:a5:90 brd ff:ff:ff:ff:ff:ff
    altname enp0s2
    inet 192.168.5.15/24 metric 100 brd 192.168.5.255 scope global dynamic eth0
       valid_lft 85593sec preferred_lft 85593sec
    inet6 fec0::5055:55ff:fe9a:a590/64 scope site dynamic mngtmpaddr noprefixroute
       valid_lft 86322sec preferred_lft 14322sec
    inet6 fe80::5055:55ff:fe9a:a590/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:2f:ab:dc:63 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
```

This way I can't get traffic into the cluster using the 172.18.0.1 network.

EDIT: the reason for this is most likely docker in the lima Ubuntu VM using cgroup v2, which causes the kind network to land in a separate net namespace (but that's a guess). I'm not sure how I could then make the traffic get routed into kind's network (and then its container).

$ sudo lsns --type=net
        NS TYPE NPROCS   PID USER      NETNSID NSFS COMMAND
4026531840 net     118     1 root   unassigned      /sbin/init
4026532237 net      12  3820 lima   unassigned      /proc/self/exe --net=slirp4netns --mtu=65520 --slirp4netns-sandbox=auto --slirp4netns-seccomp=auto --disable-host-loopback --port-driver=bu
4026532314 net      30  4404 lima   unassigned      /sbin/init
4026532406 net       1  5492 lima   unassigned      registry serve /etc/docker/registry/config.yml
4026532472 net       1  5628 lima   unassigned      registry serve /etc/docker/registry/config.yml
4026532543 net       2  6176 165534 unassigned      /pause
4026532602 net       2  6144 165534 unassigned      /pause
4026532665 net       2  6216 165533 unassigned      /pause
4026532724 net       2  6215 165534 unassigned      /pause
$ sudo nsenter -n --target 3820 ip a s br-ae7cbfeb3d9b
4: br-ae7cbfeb3d9b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:e8:51:b5:1f brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-ae7cbfeb3d9b
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::42:e8ff:fe51:b51f/64 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::1/64 scope link
       valid_lft forever preferred_lft forever
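A hedged way to confirm what the slirp4netns entry above suggests, namely that this lima instance runs a rootless docker daemon (that is an assumption of this check, not something stated in the thread):

```
docker info --format '{{.SecurityOptions}}'   # look for "name=rootless"
docker context ls                             # shows which endpoint/socket the CLI is using
```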

pmalek commented 1 year ago

As for the issue at hand:

I understand that with #3241 the ship might have already sailed, but perhaps we could still consider using the provider info Cgroup2 field and setting the --cgroupns flag only when cgroup v2 is available?

matteosilv commented 1 year ago

The same error happens with Rancher Desktop, which uses lima under the hood.

marcofranssen commented 1 year ago

Experiencing the same on Rancher Desktop. Downgrading to kind 0.19.0 fixes the issue for now.

Would be great to get a fix for 0.20.0.

The issue I see on Rancher Desktop using Kind 0.20.0 is the following:

$ kind create cluster --name test-cluster --image kindest/node:v1.27.3
Boostrapping cluster…
Creating cluster "test-cluster" ...
 βœ“ Ensuring node image (kindest/node:v1.27.3) πŸ–Ό
 βœ— Preparing nodes πŸ“¦  
Deleted nodes: ["eks-cluster-control-plane"]
ERROR: failed to create cluster: command "docker run --name test-cluster-control-plane --hostname test-cluster-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=test-cluster --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:50566:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.3" failed with error: exit status 125
Command Output: 82623b67d511c7e10ed075323e621ec66befa9047e3c7b56647ca99fd78e0db6
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.

BenTheElder commented 1 year ago

The inability to create a container with this docker 20.10.0 feature (released 2020-12-08) is still considered a bug in colima / rancher desktop. I'd like to hear a response from those projects before we revert anything. Ensuring a private cgroupns is a big benefit for the project.

BenTheElder commented 1 year ago

I understand that with https://github.com/kubernetes-sigs/kind/pull/3241 the ship might have already sailed but perhaps we might still consider using the provider info Cgroup2 field and set the --cgroupns flag only when cgroupv2 is available?

The point of setting this flag is to ensure that it is set on cgroup v1 hosts; cgroup v2 hosts already default to this.

cgroup v1 hosts are the problem. On hosts other than alpine/colima/rancher desktop this works great. Alpine and colima / rancher desktop use an unusual init system that doesn't seem to set this up properly.
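A small illustration of that difference, assuming a working docker CLI (controller names and paths vary by host): with a host cgroup namespace the container sees host-relative cgroup paths, while with a private namespace they appear rooted at /.

```
docker run --rm --cgroupns=host    busybox cat /proc/self/cgroup
docker run --rm --cgroupns=private busybox cat /proc/self/cgroup
```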

acuteaura commented 1 year ago

the reason for this is most likely docker in lima ubuntu VM using cgroup v2, which causes kind network to land in a separate net namespace (but that's a guess).

You may have some eBPF component in the path (attached to cgroup2) which, without unsharing cgroup2, will attach bits to your host namespace that were meant to go on the nodes, thus creating incidental routability. I had a similar issue forwarding ports in kind with Cilium.
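If you suspect cgroup-attached eBPF programs (e.g. from Cilium) are behind that incidental routability, a hedged way to list them, assuming bpftool is installed and you have root inside the VM:

```
sudo bpftool cgroup tree /sys/fs/cgroup
```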

williamokano-dh commented 1 year ago

Yeah, same issue here. brew install doesn't support kind@0.19.0 so I had to install it through the go approach. Running go install sigs.k8s.io/kind@v0.19.0 seems to have temporarily fixed the issue.

marcofranssen commented 1 year ago

Yup did same.

newtondev commented 1 year ago

I can confirm it works on kind@0.19.0 and fails to work on kind@0.20.0 when using colima.

docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.
benmoss commented 1 year ago

Switching to an Ubuntu image with regular lima instead of colima worked for me:

limactl start template://docker

the-gigi commented 1 year ago

FYI, same error when using rancher-desktop

$ docker info
Client:
 Context:    rancher-desktop
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.11.0)
  compose: Docker Compose (Docker Inc., v2.19.0)
$ kind create cluster
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.27.3) πŸ–Ό
 βœ— Preparing nodes πŸ“¦
Deleted nodes: ["kind-control-plane"]
ERROR: failed to create cluster: command "docker run --name kind-control-plane --hostname kind-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=kind --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:64634:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.3@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72" failed with error: exit status 125
Command Output: d27129e82d852cf6a2e43132ed42b147e5a7a47a518a6bb528f53f7194bbc659
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.

BenTheElder commented 1 year ago

Yes, this is known. The root issue is that the Alpine Linux used by Colima and Rancher Desktop appears to have broken cgroups, which is not overly surprising given the unusual init system. https://github.com/kubernetes-sigs/kind/issues/3277#issuecomment-1632333425

This issue doesn't appear to be limited to kind; similar errors are happening with buildx. I remain hopeful that Colima, Rancher Desktop, or Alpine will fix this, as it doesn't appear to be an issue on other hosts except a few with very old kernels (RHEL7).

BenTheElder commented 1 year ago

https://github.com/rancher-sandbox/rancher-desktop/issues/5363

afbjorklund commented 1 year ago

Alpine is unlikely to start using systemd, but maybe they can find a way to still support cgroups v2 (somehow)

BenTheElder commented 1 year ago

Colima and rancher desktop should also reconsider alpine for the purposes of running containers. See also: DNS issues with the simple muslc resolver. I have brought this up with at least Colima already

BenTheElder commented 1 year ago

But switching to systemd isn't necessary if the existing init is fixed. We're not depending on anything systemd specific, just working cgroupns. However systemd is the best tested and would be my recommendation.

afbjorklund commented 1 year ago

I think I will stick with Ubuntu LTS for the default kubeadm template (k8s.yaml), even if Debian is also a possibility.

ryancurrah commented 1 year ago

I read somewhere that Rancher might switch to some other distribution for the VM, maybe OpenSUSE.

the-gigi commented 1 year ago

@BenTheElder what are the current options on Mac given that colima and rancher-desktop are based on Alpine and don't support cgroup v2? is it just pinning kind to v0.19.0 and waiting for one of these projects to fix the issue?

BenTheElder commented 1 year ago

@BenTheElder what are the current options on Mac given that colima and rancher-desktop are based on Alpine and don't support cgroup v2? is it just pinning kind to v0.19.0 and waiting for one of these projects to fix the issue?

The tool both colima and rancher-desktop are built on, lima, supports other distros / templates, and should work fine. That's aside from e.g. Docker Desktop, or running docker in other VM tools that are not pinned to Alpine. Podman Desktop also supports kind, though kind needs some improvements around podman still.

limactl start template://docker should work https://github.com/kubernetes-sigs/kind/issues/3277#issuecomment-1680876276

BenTheElder commented 1 year ago

Sticking to kind 0.19 is also reasonable in the short term, and we'll want an answer here before 0.21. EDIT: Currently I'd recommend using lima instead.

The most desirable outcome is a fix in rancher desktop / colima so we can continue to roll forward. Enabling cgroupns helps us deal with issues like https://github.com/kubernetes-sigs/kind/issues/3223 / keeping compatibility between the layered container runtimes. cgroups v2 is an even stronger fix but we have no immediate plans to require that as v1 + cgroupns gets us most of the way there.

If we can't get a fix in rancher desktop / colima, we are considering a fallback to no cgroupns when cgroup v1 + cgroupns container create fails, with a warning because this won't be well tested / supported and may leave other difficult to resolve issues like lingering problems with https://github.com/kubernetes-sigs/kind/issues/3223.

@aojea and I are very aware of this problem; for the k8s 1.28 release we made new images available to both kind 0.19 and 0.20 as a small stopgap related to this issue (see the updated release notes, also announced in #kind on slack.k8s.io).

afbjorklund commented 1 year ago

Lima has support for running containerd, Docker, Podman, and Kubernetes out of the box...

It was deemed unnecessary to have an all-in-one example of kind (or k3d), in addition to kubeadm (and k3s).

But that is also possible, if you want to run kind but don't have access to Docker Engine or Podman Engine.

BenTheElder commented 1 year ago

It was deemed unnecessary to have a all-in-one example of kind (or k3d), in addition to kubeadm (and k3s).

Right, colima and rancher desktop don't have or need kind-specific examples either, to my knowledge.

kind just needs docker (or podman), so just the example for running docker with a functioning VM guest distro is sufficient.

The standard docker template currently uses ubuntu and is reported to work fine in an earlier comment https://github.com/kubernetes-sigs/kind/issues/3277#issuecomment-1680876276, as I understand it

Depending on your use case, it may make sense to use the kubeadm or K3s templates instead, but that's a little out of scope here πŸ˜…

limactl start template://docker is briefly mentioned in https://github.com/lima-vm/lima#advanced-usage, and the output of that command will give info on how to use docker CLI with it, which is all kind needs. https://github.com/lima-vm/lima/blob/7b7b84a7983a7c26138660ad2db6ca9269963894/examples/docker.yaml#L80-L85
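For completeness, a sketch of what that follow-up usually looks like; the socket path below is a placeholder of the kind limactl prints and may differ per lima version and instance name:

```
docker context create lima-docker --docker "host=unix://$HOME/.lima/docker/sock/docker.sock"
docker context use lima-docker
kind create cluster
```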

P.S. Thanks for your contributions, lima is a cool project :-)

afbjorklund commented 1 year ago

the output of that command will give info on how to use docker CLI with it, which is all kind needs.

You can use the docker.lima (or podman.lima, or kubectl.lima) wrappers to do all the setup for you.

acuteaura commented 1 year ago

Colima and rancher desktop should also reconsider alpine for the purposes of running containers. See also: DNS issues with the simple muslc resolver. I have brought this up with at least Colima already

DNS over TCP was added to musl and shipped in Alpine 3.18, and it supposedly involved a lot of convincing work upstream. I'm sure the "alpine bad" sentiment will survive it by at least half a decade though.

https://www.openwall.com/lists/musl/2023/05/02/1 https://www.alpinelinux.org/posts/Alpine-3.18.0-released.html

BenTheElder commented 1 year ago

I didn't say "alpine bad" πŸ™„, I am not recommending it for running containers. I'm sure it's an interesting choice for other purposes.

It remains a non-recommended distro for running containers. Kubernetes, podman, docker, runc, crun, and the rest of the ecosystem can only afford to run and maintain so much CI; alpine and its unusual choices are not included and, as evidenced by this thread, remain broken for this purpose while other distros are not.

EDIT: A working cgroups environment is a hard requirement for KIND, and the responsibility of the distro/kernel/init.

lima + other popular distros (Ubuntu, Debian, Fedora, ...) provide this. We generally haven't seen people using alpine to host container workloads until rancher desktop / colima became popular, and there have been good reasons not to choose it for this task.

cgroupns=private is something we'd been working around for years; the recent runc skew issues forced us to re-evaluate this, and requiring it improves reliability and makes kind more maintainable on every other distro[^1].

[^1]: RHEL 7 is also broken by way of being too out of date, but RHEL 8 works and has been out for a while and we probably won't be supporting RHEL with the recent changes there anyhow.

jandubois commented 1 year ago

I've been able to switch Alpine to use the unified cgroups v2 layout, which seems to fix the buildkitd issue.

And it fixes the initial problem with kind as well, but fails with a different problem right after:

$ docker logs kind-control-plane
INFO: ensuring we can execute mount/umount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: detected cgroup v2
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: faking /sys/class/dmi/id/product_name to be "kind"
INFO: setting iptables to detected mode: legacy
INFO: detected IPv4 address: 172.18.0.2
INFO: detected IPv6 address: fc00:f853:ccd:e793::2
INFO: starting init
systemd 247.3-7+deb11u2 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization docker.
Detected architecture x86-64.

Welcome to Debian GNU/Linux 11 (bullseye)!

Set hostname to <kind-control-plane>.
Failed to create /init.scope control group: Operation not supported
Failed to allocate manager object: Operation not supported
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...
INFO: ensuring we can execute mount/umount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: detected cgroup v2
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: faking /sys/class/dmi/id/product_name to be "kind"
INFO: setting iptables to detected mode: legacy
INFO: detected IPv4 address: 172.18.0.2
INFO: detected old IPv4 address: 172.18.0.2
INFO: detected IPv6 address: fc00:f853:ccd:e793::2
INFO: detected old IPv6 address: fc00:f853:ccd:e793::2
INFO: starting init
systemd 247.3-7+deb11u2 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization docker.
Detected architecture x86-64.

Welcome to Debian GNU/Linux 11 (bullseye)!

Set hostname to <kind-control-plane>.
Failed to create /init.scope control group: Operation not supported
Failed to allocate manager object: Operation not supported
[!!!!!!] Failed to allocate manager object.
Exiting PID 1...

I guess the issue is that cgroups are not writable inside the container.

BenTheElder commented 1 year ago

I guess the issue is that cgroups are not writable inside the container.

Yeah, that would be a problem. kind supports cgroups v2 but must be able to write to cgroups. --privileged should be ensuring that, and with cgroupns the cgroups should appear as if they are the root but actually be nested under the node container from the host side.

v2 always has cgroupns enabled in docker/podman AFAIK. We'd love to see v2 become the norm as the unified hierarchy is a lot less confusing for "nested" and also eliminates the runc awareness-of-controllers skew issue entirely.
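A rough, unofficial probe (not a kind diagnostic) of whether a privileged container with a private cgroup namespace can create cgroups on a given host; it is mainly meaningful on cgroup v2, where the mkdir creates a child cgroup:

```
docker run --rm --privileged --cgroupns=private debian:bullseye \
  sh -c 'mkdir /sys/fs/cgroup/kind-probe && echo writable && rmdir /sys/fs/cgroup/kind-probe'
```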

BenTheElder commented 1 year ago

I've been able to switch Alpine to use the unified cgroups v2 layout, which seems to fix the buildkitd issue. [...]

When we can't even docker run a container because it fails while setting up the cgroups or similar, I'm going to punt to the distro/kernel/init/...; but failing in the entrypoint script is another matter: at that point kind is doing funky things and may need patching.

If this becomes readily runnable somewhere, we can try to investigate.

jandubois commented 1 year ago

If this becomes readily runnable somewhere, we can try to investigate.

You can edit /etc/rc.conf and set rc_cgroup_mode="unified", and then reboot the VM. Afterwards you should have the v2 layout.

On Rancher Desktop you can run

rdctl shell sudo sed -E -i 's/#(rc_cgroup_mode).*/\1="unified"/' /etc/rc.conf

And then restart Rancher Desktop and verify the layout

$ rdctl shell ls /sys/fs/cgroup
acpid                   cgroup.subtree_control  docker
cgroup.controllers      cgroup.threads          io.stat
cgroup.max.depth        cpu.stat                lima-guestagent
cgroup.max.descendants  cpuset.cpus.effective   memory.reclaim
cgroup.procs            cpuset.mems.effective   memory.stat
cgroup.stat             crond                   sshd

The same should be true for lima and colima, but I haven't tested it.
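An untested sketch of the colima equivalent, based on the Rancher Desktop command above (it assumes colima's stock Alpine VM; verify with ls /sys/fs/cgroup afterwards):

```
colima ssh -- sudo sed -E -i 's/#(rc_cgroup_mode).*/\1="unified"/' /etc/rc.conf
colima restart
colima ssh -- ls /sys/fs/cgroup   # should now show the unified (v2) layout
```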

janvda commented 1 year ago

colima seems to be broken too ( https://github.com/abiosoft/colima/issues/792 )

I have tried the rc_cgroup_mode fix on colima but this didn't fix it. I am still getting the following error when it is trying to build an image:

runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/6), flags: 0xf, data: openrc: invalid argument

The full log:

mac-jan:my-question-generator jan$ make all
docker-compose -f docker-compose.yml -p my-qg up -d --build
[+] Building 1.6s (7/8)                                                                                                                         
 => [internal] load build definition from Dockerfile                                                                                       0.0s
 => => transferring dockerfile: 736B                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/python:3.10-slim                                                                        1.3s
 => [auth] library/python:pull token for registry-1.docker.io                                                                              0.0s
 => [internal] load .dockerignore                                                                                                          0.0s
 => => transferring context: 2B                                                                                                            0.0s
 => CACHED [1/3] FROM docker.io/library/python:3.10-slim@sha256:cc91315c3561d0b87d0525cb814d430cfbc70f10ca54577def184da80e87c1db           0.0s
 => => resolve docker.io/library/python:3.10-slim@sha256:cc91315c3561d0b87d0525cb814d430cfbc70f10ca54577def184da80e87c1db                  0.0s
 => [internal] load build context                                                                                                          0.0s
 => => transferring context: 140B                                                                                                          0.0s
 => ERROR [2/3] RUN apt-get update -y &&     apt-get install -y git nano wget &&     pip install --upgrade pip                             0.2s
------
 > [2/3] RUN apt-get update -y &&     apt-get install -y git nano wget &&     pip install --upgrade pip:
#0 0.149 runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/6), flags: 0xf, data: openrc: invalid argument
------
failed to solve: process "/bin/sh -c apt-get update -y &&     apt-get install -y git nano wget &&     pip install --upgrade pip" did not complete successfully: exit code: 1
make: *** [all] Error 17
mac-jan:my-question-generator jan$ 

FYI my rc_cgroup_mode settings (note that I did restart colima after making the changes).

mac-jan:my-question-generator jan$ colima ssh
colima:/Users/jan/Documents/15_iot/nuc/my-question-generator$ grep cgroup_mode /etc/rc.conf
#rc_cgroup_mode="hybrid"
rc_cgroup_mode="unified"
colima:/Users/jan/Documents/15_iot/nuc/my-question-generator$ 

jandubois commented 1 year ago

@janvda Please run ls /sys/fs/cgroup after restarting colima to verify that you have the cgroup 2 layout now. It is possible that something else in the image is overriding the rc.conf setting.

jandubois commented 1 year ago

If this becomes readily runnable somewhere, we can try to investigate.

@BenTheElder Have you been able to replicate the setup using Rancher Desktop or do you need more information from me?

janvda commented 1 year ago

@janvda Please run ls /sys/fs/cgroup after restarting colima to verify that you have the cgroup 2 layout now. It is possible that something else in the image is overriding the rc.conf setting.

mac-jan:my-question-generator jan$ colima ssh ls /sys/fs/group
ls: /sys/fs/group: No such file or directory
FATA[0000] exit status 1                                
mac-jan:my-question-generator jan$ colima ssh
colima:/Users/jan/Documents/15_iot/nuc/my-question-generator$ ls -l /sys/fs
total 0
dr-xr-xr-x    2 root     root             0 Aug 31 07:21 bpf
drwxr-xr-x   23 root     root           460 Aug 31 06:52 cgroup
drwxr-xr-x    4 root     root             0 Aug 31 07:21 ext4
drwxr-xr-x    3 root     root             0 Aug 31 07:21 fuse
drwxr-x---    2 root     root             0 Aug 31 06:52 pstore
colima:/Users/jan/Documents/15_iot/nuc/my-question-generator$ 

jandubois commented 1 year ago

mac-jan:my-question-generator jan$ colima ssh ls /sys/fs/group ls: /sys/fs/group: No such file or directory

@janvda The directory is called cgroup, not group

BenTheElder commented 1 year ago

@BenTheElder Have you been able to replicate the setup using Rancher Desktop or do you need more information from me?

Thanks, I'm able to replicate it but haven't had time to root-cause it yet. At a glance nothing kind is doing jumps out, and the cgroup mount appears rw, but systemd fails to create cgroups.

We have kind working on other cgroups v2 hosts, but none of them use OpenRC.