kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0
13.45k stars, 1.56k forks

Rancher-Desktop [Alpine] can't create cluster with v0.20.0 [Previously Also Colima] #3277

Closed pmalek closed 9 months ago

pmalek commented 1 year ago

What happened:

After updating to v0.20.0 I cannot create a cluster anymore.

I'm using a Mac with colima.

Creating cluster "colima" ...
 ✓ Ensuring node image (kindest/node:v1.27.2) 🖼
 ✗ Preparing nodes 📦
Deleted nodes: ["colima-control-plane"]
ERROR: failed to create cluster: command "docker run --name colima-control-plane --hostname colima-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=colima --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:52490:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.2@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72" failed with error: exit status 125
Command Output: 3236752928bc442ebdaf6bd3b6b164643987d45b1a120ec3cd20ca14cc7f5dd7
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.

What you expected to happen:

No error, and the cluster is created successfully.

How to reproduce it (as minimally and precisely as possible):

  1. Try to create cluster with kind v0.20.0

Environment:

janvda commented 1 year ago

> mac-jan:my-question-generator jan$ colima ssh ls /sys/fs/group
> ls: /sys/fs/group: No such file or directory

> @janvda The directory is called cgroup, not group

Sorry - my fault.

I have re-executed command using correct directory:

mac-jan:my-question-generator jan$ colima ssh
colima:/Users/jan/Documents/15_iot/nuc/my-question-generator$ ls /sys/fs/cgroup
acpid            cpuacct          docker           lima-guestagent  net_prio         perf_event       sshd
blkio            cpuset           freezer          memory           networking       pids             udev-postmount
cpu              devices          hugetlb          net_cls          openrc           qemu-binfmt      unified
colima:/Users/jan/Documents/15_iot/nuc/my-question-generator$ 

jandubois commented 1 year ago

That is still the "hybrid" layout. Not sure what colima is doing that breaks this. Is there a /etc/conf.d/cgroups file, and if yes, what is the content?
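For anyone comparing their own VM against the listing above, here is a minimal sketch of how the layout could be classified from the directory contents. It assumes the usual conventions: a cgroup v2 ("unified") mount exposes a cgroup.controllers file at its root, while a hybrid setup keeps the v1 controller directories next to a unified/ subdirectory. The function name cgroup_layout is made up for illustration.

```shell
#!/bin/sh
# Classify a cgroup mount point as "unified", "hybrid", or "legacy".
# Usage: cgroup_layout [root]   (root defaults to /sys/fs/cgroup)
# Hypothetical helper, not part of kind, colima, or lima.
cgroup_layout() {
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        # cgroup v2 is mounted directly at the root
        echo unified
    elif [ -d "$root/unified" ]; then
        # v1 controller directories plus a v2 tree under unified/
        echo hybrid
    elif [ -d "$root" ]; then
        # only v1 controller directories
        echo legacy
    else
        echo "no cgroup directory at $root" >&2
        return 1
    fi
}
```

Inside the VM (for example after colima ssh), calling cgroup_layout with no argument inspects /sys/fs/cgroup. A listing like the one above, with a unified directory alongside cpu, memory and openrc, would be reported as hybrid.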

janvda commented 1 year ago

> That is still the "hybrid" layout. Not sure what colima is doing that breaks this. Is there a /etc/conf.d/cgroups file, and if yes, what is the content?

No, there is no such file.

colima:/etc/conf.d$ ls
bootmisc      devfs         fsck          killprocs     logrotate     net-online    rdate         swap          udev-settle
consolefont   dmesg         hwclock       klogd         modloop       netmount      seedrng       swclock       udev-trigger
containerd    docker        ip6tables     loadkmap      modules       ntpd          sshd          syslog        watchdog
crond         ebtables      iptables      localmount    mtab          qemu-binfmt   staticroute   udev
colima:/etc/conf.d$ 

janvda commented 1 year ago

I discovered that it is possible to use Ubuntu for colima (see the colima FAQ) with the following command: colima start --layer=true.

The command colima ssh more /etc/os-release shows that it is indeed running Ubuntu. Here is the output of that command:

PRETTY_NAME="Ubuntu 23.04"
NAME="Ubuntu"
VERSION_ID="23.04"
VERSION="23.04 (Lunar Lobster)"
VERSION_CODENAME=lunar
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=lunar
LOGO=ubuntu-logo

But this didn't fix the problem. I am still getting the same error when building:

 => ERROR [2/3] RUN apt-get update -y &&     apt-get install -y git nano wget &&     pip install --upgrade pip                             0.2s
------
 > [2/3] RUN apt-get update -y &&     apt-get install -y git nano wget &&     pip install --upgrade pip:
#0 0.204 runc run failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/6), flags: 0xf, data: openrc: invalid argument

I am actually wondering whether we are looking in the right place. The build happens in the buildkit container (image moby/buildkit:buildx-stable-1), so I would think that the problem is in this container. I also think the problem started when a new version of this image was pulled from Docker Hub. Maybe this container image is broken or incompatible?

BenTheElder commented 1 year ago

> I discovered that it is possible to use Ubuntu for colima (see the colima FAQ) with the following command: colima start --layer=true.

That's a userspace image on top of the VM, not the VM OS. You can see from your error message that you're still on openrc on the underlying cgroups.

You can start an ubuntu VM with https://github.com/lima-vm/lima instead (which colima is built on), please see previous comments https://github.com/kubernetes-sigs/kind/issues/3277#issuecomment-1692178393.

janvda commented 1 year ago

> You can start an ubuntu VM with https://github.com/lima-vm/lima instead (which colima is built on), please see previous comments #3277 (comment).

Thanks, switching to limactl start template://docker fixed my issue. I am now again able to build docker images without errors.

ivankatliarchuk commented 1 year ago

I do not want to open a duplicate issue, so I am adding my report here. Running on macOS Ventura 13.5.1.

Kind version

⚠️  kind --version
> kind version 0.20.0

 $ kind create cluster --config=config/kind/main.yaml
>
Creating cluster "kind-local" ...
 ✓ Ensuring node image (kindest/node:v1.27.3) 🖼
 ✗ Preparing nodes 📦 📦
Deleted nodes: ["kind-local-control-plane" "kind-local-worker"]
ERROR: failed to create cluster: command "docker run --name kind-local-control-plane --hostname kind-local-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=kind-local --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=0.0.0.0:30070:30080/TCP --publish=127.0.0.1:62681:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.3@sha256:3966ac761ae0136263ffdb6cfd4db23ef8a83cba8a463690e98317add2c9ba72" failed with error: exit status 125
Command Output: a7174e21d76791171c521a8b7fd09e4fd2122f8f602d0735204f58073478078f
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown.

Docker info

⚠️  docker info
Client:
 Version:    24.0.2-rd
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.0
    Path:     /Users/ik/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.19.0
    Path:     /Users/ik/.docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 22
 Server Version: 23.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1fbd70374134b891f97ce19c70b6e50c7b9f4e0d
 runc version: 860f061b76bb4fc671f0f9e900f7d80ff93d4eb7
 init version:
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 6.1.32-0-virt
 Operating System: Alpine Linux v3.18
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 5.798GiB
 Name: lima-rancher-desktop
 ID: JL2Y:IUE7:SXIV:CD7T:LS7D:PUWN:PAUE:TB6O:ELJP:7JVT:K67A:OSBM
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
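
The fields that matter for this issue are the cgroup driver, the cgroup version, and the operating system of the VM. Below is a small sketch that pulls just those three out of the docker info text output. summarize_docker_info is a hypothetical helper name, and it only assumes the "Key: value" layout shown above.

```shell
#!/bin/sh
# Read `docker info` text output on stdin and print the fields relevant
# to the cgroup mount failure on a single line.
# Hypothetical helper; it parses text only and never calls docker itself.
summarize_docker_info() {
    awk -F': ' '
        /^ *Cgroup Driver:/    { driver = $2 }
        /^ *Cgroup Version:/   { version = $2 }
        /^ *Operating System:/ { os = $2 }
        END {
            printf "os=%s cgroup_version=%s cgroup_driver=%s\n", os, version, driver
        }
    '
}
```

Usage would be docker info 2>/dev/null | summarize_docker_info. For the output pasted above this prints os=Alpine Linux v3.18 cgroup_version=1 cgroup_driver=cgroupfs, which is the combination reported as failing in this thread.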

Rollback to 0.19

$ go install sigs.k8s.io/kind@v0.19.0
$ kind --version
> kind version 0.19.0
$ kind create cluster --config=config/kind/main.yaml
> Creating cluster "kind-local" ...
 ✓ Ensuring node image (kindest/node:v1.27.1) 🖼
 ✓ Preparing nodes 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-kind-local"
You can now use your cluster with:

kubectl cluster-info --context kind-kind-local

abiosoft commented 11 months ago

Colima v0.6.0 supports kind https://github.com/abiosoft/colima/releases/tag/v0.6.0

BenTheElder commented 11 months ago

Thanks @abiosoft!

marcofranssen commented 11 months ago

@abiosoft does this mean it now also works with latest Rancher Desktop?

jandubois commented 11 months ago

@marcofranssen No, it does not. colima switched from Alpine to Ubuntu to avoid the issue, but Rancher Desktop still uses Alpine.

The best you can do on Rancher Desktop right now is to use k3d instead of kind. It should provide very similar functionality, but uses k3s instead of kubeadm internally.

AkihiroSuda commented 11 months ago

> The best you can do on Rancher Desktop right now is to use k3d instead of kind. It should provide very similar functionality, but uses k3s instead of kubeadm internally.

Off-topic question, but why not use Rancher Desktop's Kubernetes? 😄 What is missing in Rancher Desktop's Kubernetes? (Setting custom feature gates, etc.?)

jandubois commented 11 months ago

> Off-topic question, but why not use Rancher Desktop's Kubernetes? 😄

For me the only reason to use k3d is when I want to have a multi-node cluster to play around with pod placement strategies like taints and affinity, to make sure the manifests work as expected.

Eventually there should be a config setting in Rancher Desktop to allow multiple nodes. Personally I've also wanted a mixed-architecture cluster with both amd64 and arm64 nodes, but that is more for fun than actual need... 😄

BenTheElder commented 11 months ago

Multi-node is one of the common reasons I see for choosing kind over the bundled k8s in containers-in-a-VM solutions; the other is more control over the k8s version used.
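
For readers who have not used the multi-node feature, here is a minimal sketch of generating a kind config with one control-plane node and a chosen number of workers. The kind: Cluster / apiVersion: kind.x-k8s.io/v1alpha4 / nodes: layout is kind's documented config format; the function name kind_multinode_config and the file name in the usage note are made up for illustration.

```shell
#!/bin/sh
# Print a kind cluster config with one control-plane node and N workers.
# Usage: kind_multinode_config [workers]   (defaults to 1 worker)
# Hypothetical helper; it only writes YAML to stdout.
kind_multinode_config() {
    workers="${1:-1}"
    case "$workers" in
        ''|*[!0-9]*)
            echo "worker count must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf '%s\n' \
        'kind: Cluster' \
        'apiVersion: kind.x-k8s.io/v1alpha4' \
        'nodes:' \
        '- role: control-plane'
    i=0
    while [ "$i" -lt "$workers" ]; do
        printf '%s\n' '- role: worker'
        i=$((i + 1))
    done
}
```

A possible way to use it: kind_multinode_config 2 > multi-node.yaml, followed by kind create cluster --config=multi-node.yaml.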

jandubois commented 11 months ago

> the other is more control over the k8s version used.

You can pick any k8s (k3s) version you want in Rancher Desktop and you can also upgrade to any new version and see how it affects your deployed workloads:

[screenshot: CleanShot 2023-11-17 at 10 23 06@2x]

I'm not actually sure if versions prior to 1.19 still work properly, but all the more recent releases should be fully functional.

mattfarina commented 11 months ago

To add one more data point to the issues with Alpine (under Rancher Desktop), this is the output that I get from kind after it fails to work...

INFO: ensuring we can execute mount/umount even with userns-remap
INFO: remounting /sys read-only
INFO: making mounts shared
INFO: detected cgroup v1
INFO: detected cgroupns
INFO: clearing and regenerating /etc/machine-id
Initializing machine ID from random generator.
INFO: faking /sys/class/dmi/id/product_name to be "kind"
INFO: faking /sys/class/dmi/id/product_uuid to be random
INFO: faking /sys/devices/virtual/dmi/id/product_uuid as well
INFO: setting iptables to detected mode: legacy
INFO: detected IPv4 address: 172.18.0.2
INFO: detected IPv6 address: fc00:f853:ccd:e793::2
INFO: starting init
Inserted module 'autofs4'
Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted
[!!!!!!] Failed to mount API filesystems.
Exiting PID 1...

BenTheElder commented 11 months ago

Right, there's discussion of this above. We should have permission to mount at /sys/fs/cgroup in this privileged container, so something is odd or broken in that environment.

I can't run Rancher Desktop at work (VM policy), so I'd appreciate it if others who use Rancher Desktop could debug this issue.

BenTheElder commented 10 months ago

Er, and to clarify: we have code specifically to ensure things run smoothly on non-systemd hosts:

https://github.com/kubernetes-sigs/kind/blob/5549e9178ed153788958b8b45053e8fa7d9d9d4d/images/base/files/usr/local/bin/entrypoint#L223

However, on these particular Alpine-based hosts we seem to be unable to make mounts, which doesn't make sense. With cgroupns enabled we get our own view of cgroups, and with privileged we should have permission to make mounts (see e.g. the remount of /sys as read-only earlier in the logs). It's possible that we can't make this mount in any environment, and that on other hosts we only receive it because systemd is running on the host. This requires more root-cause debugging.

I still haven't had time to dig into this myself, currently focused on some follow-ups around https://kubernetes.io/blog/2023/08/31/legacy-package-repository-deprecation/, and this is somewhat outside of @aojea's usual wheelhouse.

In the meantime I recommend lima w/ ubuntu docker profile or colima as free alternatives to docker desktop that work with kind.

I would appreciate help in investigating this bug.

cgroupns will be the default on cgroup v2 hosts under all major container runtimes and is enabled for good reasons, so simply reverting the cgroupns change in an attempt to unbreak Alpine isn't a very good option (note: Rancher Desktop is on v2 with cgroupns enabled by default now anyhow). I'd love to see other suggested fixes or debugging work from anyone else invested in this support.

jandubois commented 9 months ago

Just wanted to give a quick heads-up that the issue seems to be fixed by Alpine 3.19 (most likely due to the update to OpenRC 0.51+, which has fixed the "unified" cgroups layout):

$ kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.27.3) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋

$ k get no
NAME                 STATUS     ROLES           AGE   VERSION
kind-control-plane   NotReady   control-plane   11s   v1.27.3

So this issue can probably be closed, unless you want to wait until a version of Rancher Desktop with Alpine 3.19 is out for verification. That is probably not going to happen until early March though.
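
To see whether a given VM is already on an Alpine release at or past 3.19, something like the following sketch could be run inside the VM. It assumes the standard /etc/os-release fields ID and VERSION_ID; the function name alpine_at_least_3_19 is hypothetical, and a version check alone does not prove that the cgroup mount works.

```shell
#!/bin/sh
# Report whether an os-release file describes Alpine 3.19 or newer.
# Usage: alpine_at_least_3_19 [file]   (defaults to /etc/os-release)
# Returns 0 for Alpine >= 3.19, 1 for older Alpine, 2 for other distros.
# Hypothetical helper for illustration only.
alpine_at_least_3_19() {
    file="${1:-/etc/os-release}"
    id=$(sed -n 's/^ID=//p' "$file" | tr -d '"')
    ver=$(sed -n 's/^VERSION_ID=//p' "$file" | tr -d '"')
    [ "$id" = "alpine" ] || { echo "not Alpine (ID=$id)"; return 2; }
    major=${ver%%.*}    # text before the first dot
    rest=${ver#*.}      # text after the first dot
    minor=${rest%%.*}   # second version component
    if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 19 ]; }; then
        echo "Alpine $ver: 3.19 or newer"
        return 0
    fi
    echo "Alpine $ver: older than 3.19"
    return 1
}
```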

aojea commented 9 months ago

> So this issue can probably be closed, unless you want to wait until a version of Rancher Desktop with Alpine 3.19

/close

Let's close it here; there isn't anything else we can do, and you provided a solution.

k8s-ci-robot commented 9 months ago

@aojea: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kind/issues/3277#issuecomment-1915685624):

>> So this issue can probably be closed, unless you want to wait until a version of Rancher Desktop with Alpine 3.19
>
>/close
>
>let's close it here, is not anything else we can do and you provided a solution

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

marcindulak commented 8 months ago

This issue is closed, but there is still an open issue in rancher desktop - it's hidden in the collapsed comments, so linking it here again https://github.com/rancher-sandbox/rancher-desktop/issues/5092

BenTheElder commented 2 months ago

Circling back, we have reports of rancher desktop + kind v0.23 working in https://kubernetes.slack.com/archives/CEKK1KTN2/p1723583621985329?thread_ts=1723579586.749849&cid=CEKK1KTN2

FYI @jandubois 🎉

NOTE: you may still run into issues from https://kind.sigs.k8s.io/docs/user/known-issues/, in this case with many clusters, tuning inotify limits was required https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files

(it might? be reasonable to bump the defaults in rancher desktop 😅)
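
For anyone hitting the inotify problem mentioned above, here is a minimal sketch that compares the current limits with the values suggested on the kind known-issues page (fs.inotify.max_user_watches=524288 and fs.inotify.max_user_instances=512). It is meant to run on the Linux host or inside the VM that runs the node containers; the function name check_inotify_limits is made up for illustration.

```shell
#!/bin/sh
# Compare inotify limits with the values suggested in kind's known issues.
# Usage: check_inotify_limits [dir]   (defaults to /proc/sys/fs/inotify)
# Returns 0 if both limits are high enough, 1 if any is too low,
# 2 if a limit file cannot be read. Hypothetical helper.
check_inotify_limits() {
    dir="${1:-/proc/sys/fs/inotify}"
    rc=0
    for pair in max_user_watches:524288 max_user_instances:512; do
        name=${pair%%:*}
        want=${pair#*:}
        have=$(cat "$dir/$name") || return 2
        if [ "$have" -lt "$want" ]; then
            echo "fs.inotify.$name=$have is below the suggested $want"
            rc=1
        fi
    done
    return "$rc"
}
```

If a limit is reported as too low, it can be raised with sysctl, for example sudo sysctl fs.inotify.max_user_watches=524288, as described on the known-issues page.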