meilihao closed this issue 3 years ago
Last try, with full logs: kk.log, systemd.log
According to these logs, there is something wrong with the kubelet:
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
You can follow the instructions in that log output for troubleshooting.
Tried again: kubelet is running, but no container started.
# root@chen-aliyun:~# crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
# root@chen-aliyun:~# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Sat 2021-09-04 09:21:29 CST; 9min ago
Docs: http://kubernetes.io/docs/
Main PID: 58297 (kubelet)
Tasks: 13 (limit: 4482)
Memory: 57.9M
CGroup: /system.slice/kubelet.service
└─58297 /usr/local/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet>
Sep 04 09:30:33 chen-aliyun kubelet[58297]: E0904 09:30:33.501649 58297 kubelet.go:2407] "Error getting node" err="node \"chen-aliyun\" not found"
Sep 04 09:30:33 chen-aliyun kubelet[58297]: E0904 09:30:33.602309 58297 kubelet.go:2407] "Error getting node" err="node \"chen-aliyun\" not found"
Sep 04 09:30:33 chen-aliyun kubelet[58297]: E0904 09:30:33.631627 58297 certificate_manager.go:471] kubernetes.io/kube-apiserver-client-kubelet: Failed while requesting a >
Sep 04 09:30:33 chen-aliyun kubelet[58297]: E0904 09:30:33.702851 58297 kubelet.go:2407] "Error getting node" err="node \"chen-aliyun\" not found"
Sep 04 09:30:33 chen-aliyun kubelet[58297]: E0904 09:30:33.803263 58297 kubelet.go:2407] "Error getting node" err="node \"chen-aliyun\" not found"
Sep 04 09:30:33 chen-aliyun kubelet[58297]: E0904 09:30:33.903889 58297 kubelet.go:2407] "Error getting node" err="node \"chen-aliyun\" not found"
'journalctl -xeu kubelet' log: kubelet.log
I got containerd error log with debug level:
Sep 04 10:28:04 chen-aliyun containerd[58865]: time="2021-09-04T10:28:04.520051030+08:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-apiserver-chen-aliyun,Uid:a80ce4480dd54dda09001653c9dce4df,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp 108.177.97.82:443: i/o timeout"
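Given that pull timeout, one stop-gap is to pre-seed the pause image from a reachable mirror and retag it under the name the CRI plugin expects, so the sandbox no longer needs to reach k8s.gcr.io. A sketch, assuming docker.io/kubesphere/pause:3.2 is a faithful mirror of k8s.gcr.io/pause:3.2 and containerd's default k8s.io namespace:

```shell
# Workaround sketch: pull the pause image from a reachable mirror and
# retag it as k8s.gcr.io/pause:3.2, the name the CRI plugin asks for.
# RUN defaults to echo, so the commands are only printed; run with
# RUN= (empty) on a real node to actually execute them.
RUN=${RUN-echo}
$RUN ctr -n k8s.io images pull docker.io/kubesphere/pause:3.2
$RUN ctr -n k8s.io images tag docker.io/kubesphere/pause:3.2 k8s.gcr.io/pause:3.2
```

After seeding, the sandbox starts from the local image without contacting k8s.gcr.io, though fixing the sandbox_image config is still the proper solution.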
containerd conf:
# containerd --version
containerd containerd.io 1.4.9 e25210fe30a0a703442421b0f60afac609f950a3
# cat /etc/containerd/config.toml |grep san
sandbox_image = "kubesphere/pause:3.2"
# systemctl restart containerd.service
# containerd config dump |grep san
sandbox_image = "k8s.gcr.io/pause:3.2"
# crictl images |grep pause
docker.io/kubesphere/pause 3.2 80d28bedfe5de 299kB
docker.io/kubesphere/pause 3.5 ed210e3e4a5ba 301kB
The containerd config is not taking effect.
I upgraded containerd from 1.4.9 to 1.5.5. It still did not work, but now there was a warning:
# containerd config dump |grep san
WARN[0000] deprecated version : `1`, please switch to version `2`
sandbox_image = "k8s.gcr.io/pause:3.5"
So I added version = 2 to /etc/containerd/config.toml and restarted containerd. Now it works:
# containerd config dump |grep san
sandbox_image = "kubesphere/pause:3.5"
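For reference, the working file looked roughly like this (a minimal sketch showing only the relevant lines; the quoted CRI plugin key path is the containerd v2 config schema, which is why the version = 2 line matters):

```toml
# /etc/containerd/config.toml (v2 schema)
version = 2

[plugins."io.containerd.grpc.v1.cri"]
  # Override the default k8s.gcr.io pause image with a reachable mirror
  sandbox_image = "kubesphere/pause:3.5"
```

Without the version = 2 line, containerd 1.5+ parses the file as a v1 config and the override under the v2 key path is silently ignored.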
I don't think simply switching the config version to 2 solves this problem. I had almost the same log output and configuration as the issue raiser.
pvesc:~# containerd config dump | grep -Ei "version|san"
version = 2
sandbox_image = "k8s.gcr.io/pause:3.5"
Trimmed config-sample.yaml down to the simplest setup with only 2 nodes (1 etcd, 1 master, 2 workers); still cannot bootstrap the kubelet with those default confs.
root@pvesc:~# arch
x86_64
root@pvesc:~# runc --version
runc version 1.1.1
commit: v1.1.0-20-g52de29d7
spec: 1.0.2-dev
go: go1.17.6
libseccomp: 2.5.3
root@pvesc:~# containerd --version
containerd github.com/containerd/containerd v1.6.4 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
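To confirm whether an override in the file actually reached the running CRI plugin, it helps to extract the sandbox_image value from both the on-disk TOML and the output of containerd config dump and compare them; a mismatch means containerd is ignoring the file (wrong schema version or key path). A small sketch (the helper name extract_sandbox_image is mine):

```shell
# extract_sandbox_image: print the sandbox_image value from containerd
# config text on stdin (works for the on-disk TOML and the config dump).
extract_sandbox_image() {
  sed -n 's/^[[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p'
}

# On a live node, one would compare (assuming the default config path):
#   containerd config dump | extract_sandbox_image
#   extract_sandbox_image < /etc/containerd/config.toml
# Demo on a captured line:
printf '    sandbox_image = "kubesphere/pause:3.5"\n' | extract_sandbox_image
```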
Which version of KubeKey has the issue?
1.2.0-alpha.3
What is your OS environment?
Ubuntu 20.04.3 LTS
KubeKey config file
A clear and concise description of what happened.
Can't deploy KubeSphere.
I tried KubeKey v1.1.1 with k8s v1.20.6 and KubeKey v1.2.0-alpha.3 with k8s v1.22.1, and got the same problem.
Relevant log output
Additional information
env: