kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

cannot create cluster on M1 with amd64 image #2993

Closed qrtt1 closed 1 year ago

qrtt1 commented 1 year ago

What happened:

$ kind create cluster
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό
 βœ— Preparing nodes πŸ“¦
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"

What you expected to happen:

The cluster is created without errors.

How to reproduce it (as minimally and precisely as possible):

Run kind create cluster on an M1 Mac.

Anything else we need to know?:

I found that the error message expects cgroup v1, but my Docker uses cgroup v2. Is cgroup v2 supported now?

Environment:

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 62
 Server Version: 20.10.14
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc version: v1.0.3-0-gf46b6ba
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.104-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 9.952GiB
 Name: docker-desktop
 ID: ZWVT:2FQD:VLYG:LJNP:FTMD:43CI:TZGO:ZQMC:RAES:OXO2:6JFH:POXZ
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false


- OS (e.g. from `/etc/os-release`):

$ sw_vers
ProductName:    macOS
ProductVersion: 12.1
BuildVersion:   21C52

qrtt1 commented 1 year ago

Found a workaround in the Docker Desktop release notes: https://docs.docker.com/desktop/release-notes/#bug-fixes-and-minor-changes-20

Added a deprecated option to settings.json: "deprecatedCgroupv1": true, which switches the Linux environment back to cgroups v1. If your software requires cgroups v1, you should update it to be compatible with cgroups v2. Although cgroups v1 should continue to work, it is likely that some future features will depend on cgroups v2. It is also possible that some Linux kernel bugs will only be fixed with cgroups v2.

$ docker info|grep Cg
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
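
For anyone else trying this: the option goes into Docker Desktop's settings.json, and Docker Desktop has to be restarted afterwards. A minimal sketch, assuming the default macOS location of that file (adjust the path if yours differs):

$ grep -i cgroup ~/Library/Group\ Containers/group.com.docker/settings.json
  "deprecatedCgroupv1": true,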
BenTheElder commented 1 year ago

KIND works with cgroups v2, but only for Kubernetes 1.19+, because Kubernetes doesn't support cgroups v2 before then.

The error message means kind is waiting for the node startup to become ready. If you can run again in the broken mode with kind create cluster --retain to prevent cleanup and then share the files from kind export logs, that would be helpful (after which you can kind delete cluster to clean up).
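
Roughly this sequence, as a sketch (if I remember correctly, kind export logs also accepts an output directory if you want to control where the files land):

$ kind create cluster --retain
$ kind export logs ./kind-logs
$ kind delete cluster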

BenTheElder commented 1 year ago

Since you're on M1, is it possible this was #2718?
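
A quick way to check is whether that environment variable is set in the shell you run kind from, e.g.:

$ env | grep DOCKER_DEFAULT_PLATFORM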

qrtt1 commented 1 year ago

I tried --retain and export logs:

./kind-darwin-arm64_v0.17.0 create cluster --retain -v 6
Creating cluster "kind" ...
DEBUG: docker/images.go:58] Image: kindest/node:v1.25.3@sha256:f52781bc0d7a19fb6c405c2af83abfeb311f130707a0e219175677e366cc45d1 present locally
 βœ“ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό
 βœ— Preparing nodes πŸ“¦
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
Stack Trace:
sigs.k8s.io/kind/pkg/errors.Errorf
    sigs.k8s.io/kind/pkg/errors/errors.go:41
sigs.k8s.io/kind/pkg/cluster/internal/providers/common.WaitUntilLogRegexpMatches
    sigs.k8s.io/kind/pkg/cluster/internal/providers/common/cgroups.go:84
sigs.k8s.io/kind/pkg/cluster/internal/providers/docker.createContainerWithWaitUntilSystemdReachesMultiUserSystem
    sigs.k8s.io/kind/pkg/cluster/internal/providers/docker/provision.go:407
sigs.k8s.io/kind/pkg/cluster/internal/providers/docker.planCreation.func2
    sigs.k8s.io/kind/pkg/cluster/internal/providers/docker/provision.go:115
sigs.k8s.io/kind/pkg/errors.UntilErrorConcurrent.func1
    sigs.k8s.io/kind/pkg/errors/concurrent.go:30
runtime.goexit
    runtime/asm_arm64.s:1172

But the container is dead:

docker ps -a
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                      PORTS     NAMES
043314367c2e   kindest/node:v1.25.3   "/usr/local/bin/entr…"   54 seconds ago   Exited (1) 51 seconds ago             kind-control-plane

It is not possible to exec into it:

kind export logs
Exporting logs for cluster "kind" to:
/private/var/folders/m7/4k8xr3ls28l7tvszc7ss9y7c0000gp/T/3766952669
ERROR: [command "docker exec --privileged kind-control-plane sh -c 'tar --hard-dereference -C /var/log/ -chf - . || (r=$?; [ $r -eq 1 ] || exit $r)'" failed with error: exit status 1, [command "docker exec --privileged kind-control-plane journalctl --no-pager -u kubelet.service" failed with error: exit status 1, command "docker exec --privileged kind-control-plane cat /kind/version" failed with error: exit status 1, command "docker exec --privileged kind-control-plane journalctl --no-pager -u containerd.service" failed with error: exit status 1, command "docker exec --privileged kind-control-plane crictl images" failed with error: exit status 1, command "docker exec --privileged kind-control-plane journalctl --no-pager" failed with error: exit status 1]]
qrtt1 commented 1 year ago

Since you're on M1, is it possible this was #2718?

I tried

$ unset DOCKER_DEFAULT_PLATFORM
$ ./kind-darwin-arm64_v0.17.0 create cluster --retain -v 6
Creating cluster "kind" ...
DEBUG: docker/images.go:58] Image: kindest/node:v1.25.3@sha256:f52781bc0d7a19fb6c405c2af83abfeb311f130707a0e219175677e366cc45d1 present locally
 βœ“ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό
 βœ— Preparing nodes πŸ“¦
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
Stack Trace:
sigs.k8s.io/kind/pkg/errors.Errorf
    sigs.k8s.io/kind/pkg/errors/errors.go:41
sigs.k8s.io/kind/pkg/cluster/internal/providers/common.WaitUntilLogRegexpMatches
    sigs.k8s.io/kind/pkg/cluster/internal/providers/common/cgroups.go:84
sigs.k8s.io/kind/pkg/cluster/internal/providers/docker.createContainerWithWaitUntilSystemdReachesMultiUserSystem
    sigs.k8s.io/kind/pkg/cluster/internal/providers/docker/provision.go:407
sigs.k8s.io/kind/pkg/cluster/internal/providers/docker.planCreation.func2
    sigs.k8s.io/kind/pkg/cluster/internal/providers/docker/provision.go:115
sigs.k8s.io/kind/pkg/errors.UntilErrorConcurrent.func1
    sigs.k8s.io/kind/pkg/errors/concurrent.go:30
runtime.goexit
    runtime/asm_arm64.s:1172
qrtt1 commented 1 year ago

I have done tests from v0.12.0 to v0.17.0. It appears to be broken starting with v0.15.0 (v0.14.0 and earlier work).

(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.17.0 create cluster
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό
 βœ— Preparing nodes πŸ“¦
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.17.0 version
kind v0.17.0 go1.19.2 darwin/arm64
(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.16.0 create cluster
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.25.2) πŸ–Ό
 βœ— Preparing nodes πŸ“¦
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.16.0 version
kind v0.16.0 go1.19.1 darwin/arm64
(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.15.0 create cluster
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.25.0) πŸ–Ό
 βœ— Preparing nodes πŸ“¦
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.15.0 version
kind v0.15.0 go1.19 darwin/arm64
./kind-darwin-arm64_v0.14.0 create cluster
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.24.0) πŸ–Ό
 βœ“ Preparing nodes πŸ“¦
 βœ“ Writing configuration πŸ“œ
 βœ“ Starting control-plane πŸ•ΉοΈ
 βœ“ Installing CNI πŸ”Œ
 βœ“ Installing StorageClass πŸ’Ύ
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Thanks for using kind! 😊

(⎈ |kind-kind:default)(base) ➜  bin ./kind-darwin-arm64_v0.14.0 version
kind v0.14.0 go1.18.2 darwin/arm64
(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.13.0 create cluster
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.24.0) πŸ–Ό
 βœ“ Preparing nodes πŸ“¦
 βœ“ Writing configuration πŸ“œ
 βœ“ Starting control-plane πŸ•ΉοΈ
 βœ“ Installing CNI πŸ”Œ
 βœ“ Installing StorageClass πŸ’Ύ
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Not sure what to do next? πŸ˜…  Check out https://kind.sigs.k8s.io/docs/user/quick-start/
(⎈ |kind-kind:default)(base) ➜  bin ./kind-darwin-arm64_v0.13.0 version
kind v0.13.0 go1.18 darwin/arm64
(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.12.0 create cluster
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.23.4) πŸ–Ό
 βœ“ Preparing nodes πŸ“¦
 βœ“ Writing configuration πŸ“œ
 βœ“ Starting control-plane πŸ•ΉοΈ
 βœ“ Installing CNI πŸ”Œ
 βœ“ Installing StorageClass πŸ’Ύ
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Thanks for using kind! 😊
(⎈ |kind-kind:default)(base) ➜  bin ./kind-darwin-arm64_v0.12.0 version
kind v0.12.0 go1.17.8 darwin/arm64
(⎈ |kind-kind:default)(base) ➜  bin
BenTheElder commented 1 year ago

It is not possible to exec into it:

It should still produce valuable logs
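
Also, since the node container exited you can't docker exec into it, but its console output is still available from Docker itself, e.g.:

$ docker logs kind-control-plane

If I recall correctly, kind export logs captures roughly the same thing as the node's serial log.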

BenTheElder commented 1 year ago

v0.14.0..v0.15.0 doesn't contain much related to node bring-up: there are some changes for ZFS, some for rebooting existing nodes, and changes unrelated to starting clusters (like optimizing kind load ...).
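
For anyone who wants to double-check that range, something like this in a checkout of the kind repo works (the path filter just narrows it to cluster-creation code):

$ git log --oneline v0.14.0..v0.15.0 -- pkg/cluster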

qrtt1 commented 1 year ago

It is not possible to exec into it:

It should still produce valuable logs

Do you have any suggestions for producing useful logs?

aojea commented 1 year ago

kind export logs

qrtt1 commented 1 year ago

kind export logs

I tried, but got nothing:

(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.17.0 create cluster --retain
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό
 βœ— Preparing nodes πŸ“¦
ERROR: failed to create cluster: could not find a log line that matches "Reached target .*Multi-User System.*|detected cgroup v1"
(⎈ |N/A:default)(base) ➜  bin docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.17.0 export logs
Exporting logs for cluster "kind" to:
/private/var/folders/m7/4k8xr3ls28l7tvszc7ss9y7c0000gp/T/3251996515
ERROR: [command "docker exec --privileged kind-control-plane sh -c 'tar --hard-dereference -C /var/log/ -chf - . || (r=$?; [ $r -eq 1 ] || exit $r)'" failed with error: exit status 1, [command "docker exec --privileged kind-control-plane cat /kind/version" failed with error: exit status 1, command "docker exec --privileged kind-control-plane journalctl --no-pager" failed with error: exit status 1, command "docker exec --privileged kind-control-plane journalctl --no-pager -u containerd.service" failed with error: exit status 1, command "docker exec --privileged kind-control-plane crictl images" failed with error: exit status 1, command "docker exec --privileged kind-control-plane journalctl --no-pager -u kubelet.service" failed with error: exit status 1]]
(⎈ |N/A:default)(base) ➜  bin docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
(⎈ |N/A:default)(base) ➜  bin docker ps -a
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                      PORTS     NAMES
d4d310349650   kindest/node:v1.25.3   "/usr/local/bin/entr…"   33 seconds ago   Exited (1) 30 seconds ago             kind-control-plane
BenTheElder commented 1 year ago

Even though it prints an error, it should have collected the other logs. Check the directory printed.
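
Listing that directory should show a per-node subdirectory; from memory the node's console output lands in something like kind-control-plane/serial.log:

$ ls /private/var/folders/m7/4k8xr3ls28l7tvszc7ss9y7c0000gp/T/3251996515
$ cat /private/var/folders/m7/4k8xr3ls28l7tvszc7ss9y7c0000gp/T/3251996515/kind-control-plane/serial.log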

BenTheElder commented 1 year ago

Exporting logs for cluster "kind" to: /private/var/folders/m7/4k8xr3ls28l7tvszc7ss9y7c0000gp/T/3251996515

that part, that path,

qrtt1 commented 1 year ago

Exporting logs for cluster "kind" to: /private/var/folders/m7/4k8xr3ls28l7tvszc7ss9y7c0000gp/T/3251996515

that part, that path,

The path is coming :D 3251996515.zip

BenTheElder commented 1 year ago

Sorry, yesterday was also the Kubernetes 1.26 code freeze.

So from the node serial log:

INFO: setting iptables to detected mode: legacy
iptables-save v1.8.7 (legacy): Cannot initialize: iptables who? (do you need to insmod?)

We usually see this when the container is of the wrong architecture ...
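
A quick way to confirm is to compare the image's architecture with the Docker server's, e.g.:

$ docker image inspect --format '{{.Architecture}}' kindest/node:v1.25.3
$ docker info --format '{{.Architecture}}'

On an M1 machine both should report arm64/aarch64; if the image says amd64, that's this problem.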

BenTheElder commented 1 year ago

$ unset DOCKER_DEFAULT_PLATFORM
$ ./kind-darwin-arm64_v0.17.0 create cluster --retain -v 6
Creating cluster "kind" ...
DEBUG: docker/images.go:58] Image: kindest/node:v1.25.3@sha256:f52781bc0d7a19fb6c405c2af83abfeb311f130707a0e219175677e366cc45d1 present locally

In this case the image was already pulled from a previous attempt, which would have resolved it to the architecture selected at that time IIRC; if DOCKER_DEFAULT_PLATFORM was set then, that may still have left it in a broken state.

Can you delete the image, or better yet pull it again like docker pull --platform=linux/arm64 kindest/node:v1.25.3@sha256:f52781bc0d7a19fb6c405c2af83abfeb311f130707a0e219175677e366cc45d1, then run create cluster with DOCKER_DEFAULT_PLATFORM not set?
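
That is, roughly (using the plain tag here as shorthand; the digest-pinned pull above is the safer form):

$ docker rmi kindest/node:v1.25.3
$ docker pull --platform=linux/arm64 kindest/node:v1.25.3
$ kind create cluster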

BenTheElder commented 1 year ago

Or better yet, kind create cluster --image=kindest/node@sha256:af315b64716448015172000226416dfc6eb7eb66efa5451b4fce198a22b39c90 for specifically the arm64 version of kindest/node:v1.25.3@sha256:f52781bc0d7a19fb6c405c2af83abfeb311f130707a0e219175677e366cc45d1.

qrtt1 commented 1 year ago

./kind-darwin-arm64_v0.17.0 create cluster --retain -v 6

It works !!!

(⎈ |N/A:default)(base) ➜  bin ./kind-darwin-arm64_v0.17.0 create cluster --retain
Creating cluster "kind" ...
 βœ“ Ensuring node image (kindest/node:v1.25.3) πŸ–Ό
 βœ“ Preparing nodes πŸ“¦
 βœ“ Writing configuration πŸ“œ
 βœ“ Starting control-plane πŸ•ΉοΈ
 βœ“ Installing CNI πŸ”Œ
 βœ“ Installing StorageClass πŸ’Ύ
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community πŸ™‚
(⎈ |kind-kind:default)(base) ➜  bin k get node
NAME                 STATUS     ROLES           AGE   VERSION
kind-control-plane   NotReady   control-plane   12s   v1.25.3
(⎈ |kind-kind:default)(base) ➜  bin
BenTheElder commented 1 year ago

OK, so this is probably a variant on https://github.com/kubernetes-sigs/kind/issues/2718, and cgroupsv2 is a red herring πŸ˜…

BenTheElder commented 1 year ago

We'll roll this up into #2718 for further discussion on which approach to take for mitigating, or at least warning about, this.