labring / sealos

Sealos is a production-ready Kubernetes distribution. You can run any Docker image on Sealos, start high-availability databases such as MySQL, PostgreSQL, Redis and MongoDB, and develop applications in any programming language.
https://cloud.sealos.io
Apache License 2.0

God help me!!! On a Kylin SP3 ARM machine, deploying k8s hangs while kubelet starts the static pods in /etc/kubernetes/manifests, and kubelet reports the error "getting node xxx not found" #4871

Closed: SupRenekton closed this issue 3 months ago

SupRenekton commented 4 months ago

Sealos Version

sealos_4.3.7_linux_arm64.tar.gz

How to reproduce the bug?

- Operating system: Kylin V10 (SP3) / (Lance)-aarch64-Build23/20230324
- Kernel version: 4.19.90-52.22.v2207.ky10.aarch64
- Sealos version: 4.3.7_linux_arm64 (other versions were tried as well)
- Kubernetes versions tried: 1.23.17, 1.24.2, 1.25.5, 1.25.6, 1.25.16 and 1.29.6

What is the expected behavior?

It hangs at "Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"". None of the static pods (kube-apiserver, kube-controller-manager, kube-scheduler) come up. kubelet is running but reports that the static pod containers cannot be started, with the error "getting node xxx not found"; containerd is running but reports "failed to get sandbox container task: no running task found".

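What is strange is that I previously brought up a k8s cluster without problems using sealos 4.3.7 on Kylin V10 (SP2). The only issue there was a cgroup problem that kept CoreDNS from starting: I had to change containerd's cgroup configuration so that it no longer matched kubelet's cgroup setting. That felt risky to me, so following advice I found online I switched to Kylin SP3 and kept everything else the same. Now the cluster will not come up at all!!! Can an expert please help? I have been stuck on this for two days.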
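The SP2 workaround described above leaves containerd and kubelet on different cgroup drivers, which is the mismatch the reporter was worried about. A minimal Python sketch for spotting it, assuming containerd's runc option SystemdCgroup in /etc/containerd/config.toml and the cgroupDriver field of kubelet's KubeletConfiguration (on kubeadm-based installs usually /var/lib/kubelet/config.yaml). It scans with regular expressions instead of real TOML/YAML parsers, treats a missing key as cgroupfs, and all function names are hypothetical.

```python
import re

# Assumption: containerd selects the systemd driver via SystemdCgroup under the runc
# runtime options; a missing (or commented-out) key is treated as cgroupfs.
SYSTEMD_CGROUP = re.compile(r"^\s*SystemdCgroup\s*=\s*(true|false)\s*$", re.MULTILINE)
# Assumption: kubelet's KubeletConfiguration uses cgroupDriver, defaulting to cgroupfs.
CGROUP_DRIVER = re.compile(r"^\s*cgroupDriver:\s*[\"']?(\w+)[\"']?\s*$", re.MULTILINE)


def containerd_cgroup_driver(config_toml: str) -> str:
    """Return 'systemd' or 'cgroupfs' based on the first SystemdCgroup line found."""
    match = SYSTEMD_CGROUP.search(config_toml)
    return "systemd" if match and match.group(1) == "true" else "cgroupfs"


def kubelet_cgroup_driver(kubelet_yaml: str) -> str:
    """Return the cgroupDriver value, or 'cgroupfs' if the field is absent."""
    match = CGROUP_DRIVER.search(kubelet_yaml)
    return match.group(1) if match else "cgroupfs"


def cgroup_drivers_match(config_toml: str, kubelet_yaml: str) -> bool:
    """True when containerd and kubelet agree on the cgroup driver."""
    return containerd_cgroup_driver(config_toml) == kubelet_cgroup_driver(kubelet_yaml)


# The SP2-style workaround: containerd on cgroupfs, kubelet on systemd.
print(cgroup_drivers_match("SystemdCgroup = false", "cgroupDriver: systemd"))  # False
```

Reading the two files with open(...).read() and passing their contents in is enough to run the check on a node; a False result means the drivers disagree.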

What do you see instead?

No response

Operating environment

- Sealos version: 4.3.7
- Docker version:
- Kubernetes version: 1.25.16
- Operating system:
- Runtime environment:
- Cluster size: 3 masters
- Additional information:

Additional information

No response

SupRenekton commented 4 months ago

[Screenshot: WeChat image_20240709161958]

SupRenekton commented 4 months ago

[Screenshots: WeChat image_202407091619581, _202407091619582, _202407091619587, _202407091619588, _202407091619584, _202407091619585, _202407091619586]

SupRenekton commented 4 months ago

[Screenshot: WeChat image_20240709165155]

bxy4543 commented 4 months ago

Run crictl ps -a and crictl logs to look at the container logs.
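The two commands suggested above can be run over every container in one pass. A minimal Python sketch, assuming crictl is on PATH and already configured for the containerd socket; run_cmd and collect_container_logs are hypothetical helper names. The runner is passed in as a parameter so the logic can be exercised without a real node.

```python
import subprocess
from typing import Callable, Dict, List, Tuple

# A runner takes an argv list and returns (exit code, combined stdout + stderr).
Runner = Callable[[List[str]], Tuple[int, str]]


def run_cmd(args: List[str]) -> Tuple[int, str]:
    """Run a command locally, keeping stderr so crictl error messages are not lost."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


def collect_container_logs(run: Runner = run_cmd, tail: int = 50) -> Dict[str, str]:
    """Map every container ID (running or exited) to the last `tail` lines of its log."""
    code, out = run(["crictl", "ps", "-a", "-q"])  # -q prints only container IDs
    if code != 0:
        raise RuntimeError(f"crictl ps failed: {out.strip()}")
    logs: Dict[str, str] = {}
    for container_id in out.split():
        # A non-zero exit is kept as text: the error message is the useful part here.
        _, log = run(["crictl", "logs", "--tail", str(tail), container_id])
        logs[container_id] = log
    return logs
```

On the node, collect_container_logs() returns a dict of container ID to log text. Because stderr is kept, an error such as a missing log path shows up in the result instead of being dropped, and an empty dict means crictl ps -a listed no containers at all.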

SupRenekton commented 4 months ago

"Run crictl ps -a and crictl logs to look at the container logs."

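[Screenshot: WeChat image_20240709171140] The error says there is no log path (the symbolic link to it could not be resolved). The container was probably never created in the first place, so there are no container logs yet.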

bxy4543 commented 4 months ago

Can you check the containerd logs?
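To follow up on the containerd logs, the journal can be filtered down to error-level entries. A minimal Python sketch, assuming containerd logs in its default logfmt-style text format and that its journal is read with journalctl -u containerd --no-pager; parse_logfmt and containerd_errors are hypothetical helper names, and the regex is a simplification rather than a full logfmt parser.

```python
import re
from typing import Dict, List

# containerd writes logfmt-style lines: time="..." level=error msg="..." error="..."
# A value is either a double-quoted string (with \" escapes) or a bare token.
FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line: str) -> Dict[str, str]:
    """Extract key=value pairs from one log line; journald prefixes are ignored."""
    fields: Dict[str, str] = {}
    for key, value in FIELD.findall(line):
        if value.startswith('"'):
            value = value[1:-1].replace('\\"', '"')
        fields[key] = value
    return fields


def containerd_errors(lines: List[str]) -> List[str]:
    """Return 'msg: error' summaries for every error- or fatal-level line."""
    summaries: List[str] = []
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") not in ("error", "fatal"):
            continue
        msg = fields.get("msg", "")
        detail = fields.get("error")
        summaries.append(f"{msg}: {detail}" if detail else msg)
    return summaries
```

One way to feed it is subprocess.run(["journalctl", "-u", "containerd", "--no-pager"], capture_output=True, text=True).stdout.splitlines(); errors like the "no running task found" message from this issue should then appear in the returned list.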

SupRenekton commented 4 months ago

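Solved it myself. It looks like a runc problem: after replacing the /usr/local/bin/runc that came with the Kylin SP3 system with the runc from SP2, everything works normally.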
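Since the fix came down to which runc binary sits at /usr/local/bin/runc, it can help to record the versions of the SP3 and SP2 binaries before swapping them. A minimal Python sketch, assuming runc --version prints a first line of the form "runc version X.Y.Z", as upstream runc releases do; the helper names are hypothetical, and the sketch only reports versions, it does not replace anything.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_runc_version(output: str) -> Optional[Tuple[int, ...]]:
    """Pull the numeric version out of `runc --version` text, e.g. 'runc version 1.1.12'."""
    match = re.search(r"runc version (\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def runc_version(binary: str) -> Optional[Tuple[int, ...]]:
    """Run `<binary> --version`; returns None if the output has no recognizable version."""
    result = subprocess.run([binary, "--version"], capture_output=True, text=True)
    return parse_runc_version(result.stdout)
```

Comparing runc_version("/usr/local/bin/runc") with runc_version on the saved SP2 copy (whatever path it was saved under) shows whether the two builds differ; a binary that fails to execute on this kernel will raise an OSError instead. The thread does not say which runc versions SP2 and SP3 shipped.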
