@caniko can you provide a little more detail or context? Kind should work fine with cgroupv2.
There were some older releases that did not. Can you provide the version (kind version) you're using?
cgroupv2 is supported, but Kubernetes does not support it before v1.19
https://kind.sigs.k8s.io/docs/user/known-issues/#failure-to-create-cluster-with-cgroups-v2
KIND has had cgroupv2 support since v0.10.0 / January 2021, not long after Kubernetes, with some fixes since then: https://github.com/kubernetes-sigs/kind/releases/tag/v0.10.0
We test cgroupsv2 on every PR.
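If you want to double-check which cgroup version your host is actually on, here is a minimal sketch (assuming a typical systemd host):
# Prints "cgroup2fs" on a cgroup v2 (unified) host, "tmpfs" on a cgroup v1 host
stat -fc %T /sys/fs/cgroup/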
Oh, it mentions cgroup v2 issues on the website. I see now that I interpreted that wrongly.
kind version: kind v0.14.0 go1.18.2 linux/amd64
I am on Arch Linux, and behind a proxy. Docker and K8s run fine. I successfully ran minikube in kvm2 mode. Could the proxy be blocking the health check URL?
Stack trace:
$ make kc
kind create cluster && make td
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.24.0)
 ✓ Preparing nodes
 ✓ Writing configuration
 ✗ Starting control-plane
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged kind-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
Command Output: I0802 15:13:00.477226 142 initconfiguration.go:255] loading configuration from "/kind/kubeadm.conf"
W0802 15:13:00.477755 142 initconfiguration.go:332] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
[init] Using Kubernetes version: v1.24.0
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0802 15:13:00.482701 142 certs.go:112] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
I0802 15:13:00.592328 142 certs.go:522] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kind-control-plane kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost] and IPs [10.96.0.1 172.18.0.2 127.0.0.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0802 15:13:00.894601 142 certs.go:112] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I0802 15:13:01.034046 142 certs.go:522] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I0802 15:13:01.081085 142 certs.go:112] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I0802 15:13:01.134989 142 certs.go:522] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.18.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kind-control-plane localhost] and IPs [172.18.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0802 15:13:01.665251 142 certs.go:78] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I0802 15:13:01.871422 142 kubeconfig.go:103] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I0802 15:13:02.242946 142 kubeconfig.go:103] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0802 15:13:02.475033 142 kubeconfig.go:103] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0802 15:13:02.570362 142 kubeconfig.go:103] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
I0802 15:13:02.640364 142 kubelet.go:65] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I0802 15:13:02.738776 142 manifests.go:99] [control-plane] getting StaticPodSpecs
I0802 15:13:02.739201 142 certs.go:522] validating certificate period for CA certificate
I0802 15:13:02.739245 142 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0802 15:13:02.739250 142 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0802 15:13:02.739254 142 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0802 15:13:02.739258 142 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0802 15:13:02.739261 142 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0802 15:13:02.740793 142 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
I0802 15:13:02.740801 142 manifests.go:99] [control-plane] getting StaticPodSpecs
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I0802 15:13:02.740905 142 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0802 15:13:02.740911 142 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0802 15:13:02.740915 142 manifests.go:125] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0802 15:13:02.740919 142 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0802 15:13:02.740923 142 manifests.go:125] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0802 15:13:02.740927 142 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0802 15:13:02.740931 142 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
I0802 15:13:02.741438 142 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
I0802 15:13:02.741445 142 manifests.go:99] [control-plane] getting StaticPodSpecs
[control-plane] Creating static Pod manifest for "kube-scheduler"
I0802 15:13:02.741550 142 manifests.go:125] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0802 15:13:02.741873 142 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0802 15:13:02.742778 142 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0802 15:13:02.742793 142 waitcontrolplane.go:83] [wait-control-plane] Waiting for the API server to be healthy
I0802 15:13:02.743155 142 loader.go:372] Config loaded from file: /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I0802 15:13:02.743871 142 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0 milliseconds
I0802 15:13:03.244407 142 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0
...
I0802 15:17:00.744949 142 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0 milliseconds
I0802 15:17:01.244899 142 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0 milliseconds
I0802 15:17:01.744759 142 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0 milliseconds
I0802 15:17:02.244476 142 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0 milliseconds
I0802 15:17:02.745332 142 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0 milliseconds
I0802 15:17:02.745615 142 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0 milliseconds
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:108
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
cmd/kubeadm/app/cmd/init.go:153
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:856
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:974
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:250
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1571
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
cmd/kubeadm/app/cmd/init.go:153
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:856
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:974
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:250
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1571
make: *** [Makefile:6: kc] Error 1
Can you share the logs from kind create cluster --retain; kind export logs; kind delete cluster?
It's possible it's the proxy, but we should be setting a sufficient no_proxy; we can check what the component logs show.
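For the proxy theory, it can also help to compare the host environment with what kind actually injected into the node. A rough sketch (the container name assumes the default cluster name):
# Proxy-related variables in the host environment
env | grep -i _proxy
# Proxy configuration as seen inside the kind node; kind extends NO_PROXY
# with cluster-internal addresses, which is what we'd want to verify here
docker exec kind-control-plane sh -c 'env | grep -i proxy'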
The stack trace is too long. Should I have saved it to a file?
Here is the tail:
I0802 15:50:20.389803 142 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0 milliseconds
I0802 15:50:20.390123 142 round_trippers.go:553] GET https://kind-control-plane:6443/healthz?timeout=10s in 0 milliseconds
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:108
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
cmd/kubeadm/app/cmd/init.go:153
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:856
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:974
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:250
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1571
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
cmd/kubeadm/app/cmd/phases/workflow/runner.go:421
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdInit.func1
cmd/kubeadm/app/cmd/init.go:153
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
vendor/github.com/spf13/cobra/command.go:856
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
vendor/github.com/spf13/cobra/command.go:974
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
vendor/github.com/spf13/cobra/command.go:902
k8s.io/kubernetes/cmd/kubeadm/app.Run
cmd/kubeadm/app/kubeadm.go:50
main.main
cmd/kubeadm/kubeadm.go:25
runtime.main
/usr/local/go/src/runtime/proc.go:250
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1571
Exporting logs for cluster "kind" to:
/tmp/2561045446
Deleting cluster "kind" ...
The stack trace does not contain all the node logs, just the kind/kubeadm output, but https://github.com/kubernetes-sigs/kind/issues/2852#issuecomment-1202835966 would export many more log files to inspect.
The error/trace tells us the kubelet didn't become healthy, but to find out why, we need to dig deeper into the system.
There were no log files. Could you be more explicit about what I should provide? I made sure to run the commands you provided.
Maybe you should break the commands out and run them one at a time, as sketched below:
- kind create cluster --retain: first, to attempt to create the cluster without deleting it on failure.
- kind export logs: this should give you all the log files that should (hopefully) provide the clues needed to figure out what is happening.
- kind delete cluster: can be run once the logs are successfully captured.
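Roughly (a sketch; kind export logs also accepts an explicit output directory instead of picking a temp dir):
kind create cluster --retain     # keep the node container around even if creation fails
kind export logs ./kind-logs     # collect kubelet, containerd, journal, etc. into ./kind-logs
kind delete cluster              # clean up once the logs are safely captured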
I just noticed one line where it said exported to /tmp/<random number>. Got everything!
Root of logs: kind-version.txt, docker-info.txt
Control-plane logs: serial.log, kubernetes-version.txt, kubelet.log, images.log, containerd.log, alternatives.log, journal.log
Could you upload a zip or tarball with the full directory? There are some other useful files, and GitHub will accept a zip or tarball.
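Something like this would package it, assuming the export directory from the run above (the /tmp path changes every run):
# Bundle the exported logs into a single archive that can be attached to the issue
tar -czf kind-logs.tar.gz -C /tmp 2561045446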
Two things jump out from what we have so far:
And then there's this in the kubelet logs:
Aug 02 15:46:20 kind-control-plane kubelet[212]: E0802 15:46:20.638146 212 kubelet.go:1378] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubelet kubepods] doesn't exist"
Aug 02 15:46:21 kind-control-plane kubelet[305]: E0802 15:46:21.939139 305 cgroup_manager_linux.go:473] cgroup manager.Set failed: openat2 /sys/fs/cgroup/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/cpu.weight: no such file or directory
(similar logs in containerd)
Does not look proxy related.
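To see whether the missing controller is visible from inside the node at all, a quick sketch (assumes the default cluster name and that the node was created with --retain and is still running):
# Which cgroup v2 controllers the kind node sees on its root cgroup;
# the cpu controller must be present for cpu.weight to exist anywhere below it
docker exec kind-control-plane cat /sys/fs/cgroup/cgroup.controllers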
I did not know that the kernel was an important component for K8s! I am using linux-lqx-bmq, which is downstream from zen-kernel. Switching back to zen fixed the issue.
Garuda is Arch with some desktop environment sugar. I made the switch from zen to lqx; it is my fault.
It's possible the custom kernel lacks some cgroups support (CPU weight?); IIRC, previous problems with Arch kernels were that sort of thing.
Since the kind "nodes" run on the shared host kernel, the host kernel must meet Kubernetes's requirements in terms of e.g. supported cgroup controllers.
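A quick way to see which cgroup v2 controllers the host kernel actually exposes (a sketch for a unified-hierarchy host; Kubernetes expects at least the cpu, memory, and pids controllers):
# Controllers available at the root of the unified hierarchy
cat /sys/fs/cgroup/cgroup.controllers
# Controllers delegated to child cgroups (systemd manages this on most distros)
cat /sys/fs/cgroup/cgroup.subtree_control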
This is a known limitation in the kernel https://github.com/damentz/liquorix-package/issues/101#issuecomment-1208165488
Is cgroups v2 a hard requirement? Could we create the kind cluster with cgroups v1?
It isn't yet, though some parts of the ecosystem may eventually drop cgroup v1 support.
Cgroups of either v1 or v2 are a hard requirement.
Does it mean that some systems which support only cgroups v1 can use kind?
Yes, but the kernel has to support all of the cgroup controllers required by the kubelet; some platforms, like old Raspberry Pi models, don't support them all, for example.
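On a cgroups v1 host the equivalent check is roughly this (a sketch; the last column of /proc/cgroups shows whether each controller is enabled):
# Lists every v1 controller known to the kernel and whether it is enabled;
# on old Raspberry Pi images, for example, the memory controller is often off by default
cat /proc/cgroups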
This issue is now resolved with Liquorix by implementing the missing stubs that kind and many other container orchestration platforms blindly depend on.
Relevant links:
Why is this needed: ~cgroupv2 is supported by k8s, so it should also be supported by kind.~
Revised issue: I am having an issue setting up a kind cluster.