labring / sealos

Sealos is a production-ready Kubernetes distribution. You can run any Docker image on Sealos, start high-availability databases such as MySQL, PostgreSQL, Redis, and MongoDB, and develop applications in any programming language.
https://cloud.sealos.io
Apache License 2.0

BUG: Cluster upgrade fails #4591

Closed by cnfzh66 5 months ago

cnfzh66 commented 7 months ago

Sealos Version

v4.3.7

How to reproduce the bug?

Upgrading from 1.24 to 1.25 works normally, but the subsequent upgrade from 1.25 to 1.26 fails. The kubelet reports this error:

Mar 14 14:16:39 k8s-node01 kubelet[97846]: W0314 14:16:39.290204   97846 feature_gate.go:241] Setting GA feature gate EphemeralContainers=true. It will be removed in a future release.
Mar 14 14:16:39 k8s-node01 kubelet[97846]: W0314 14:16:39.290557   97846 feature_gate.go:241] Setting GA feature gate EphemeralContainers=true. It will be removed in a future release.
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.298115   97846 server.go:412] "Kubelet version" kubeletVersion="v1.26.14"
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.298220   97846 server.go:414] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Mar 14 14:16:39 k8s-node01 kubelet[97846]: W0314 14:16:39.298496   97846 feature_gate.go:241] Setting GA feature gate EphemeralContainers=true. It will be removed in a future release.
Mar 14 14:16:39 k8s-node01 kubelet[97846]: W0314 14:16:39.298918   97846 feature_gate.go:241] Setting GA feature gate EphemeralContainers=true. It will be removed in a future release.
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.299684   97846 server.go:836] "Client rotation is on, will bootstrap in background"
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.312526   97846 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.318997   97846 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.351549   97846 server.go:659] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.353833   97846 container_manager_linux.go:266] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.354266   97846 container_manager_linux.go:271] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>} {Signal:memory.available Operator:LessThan Value:{Quantity:1Gi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] ExperimentalTopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]}
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.354365   97846 topology_manager.go:134] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.354488   97846 container_manager_linux.go:307] "Creating device plugin manager"
Mar 14 14:16:39 k8s-node01 kubelet[97846]: I0314 14:16:39.354626   97846 state_mem.go:36] "Initialized new in-memory state store"
Mar 14 14:16:39 k8s-node01 kubelet[97846]: E0314 14:16:39.358875   97846 run.go:74] "command failed" err="failed to run Kubelet: validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/cri-dockerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
Mar 14 14:16:39 k8s-node01 systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Mar 14 14:16:39 k8s-node01 kubelet-post-stop.sh[97861]: Thu Mar 14 14:16:39 CST 2024
Mar 14 14:16:39 k8s-node01 systemd[1]: Unit kubelet.service entered failed state.
Mar 14 14:16:39 k8s-node01 systemd[1]: kubelet.service failed.
Mar 14 14:16:49 k8s-node01 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Mar 14 14:16:49 k8s-node01 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished shutting down
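
The only error-level entry in the log above is the `run.go:74` line, and it is easy to miss among the info and warning entries. A minimal sketch of filtering journal output in this shape down to klog `E`/`F` entries follows; the regular expression and the function name `klog_errors` are illustrative assumptions, not part of sealos or kubelet.

```python
import re

# klog header as it appears in the journal lines above: a severity letter
# (I/W/E/F), MMDD, a timestamp, the PID, then "file:line]" and the message.
KLOG_RE = re.compile(
    r"kubelet\[\d+\]: (?P<sev>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ (?P<src>[\w.]+:\d+)\] (?P<msg>.*)"
)


def klog_errors(lines):
    """Return (source, message) pairs for error- and fatal-level kubelet entries."""
    found = []
    for line in lines:
        m = KLOG_RE.search(line)
        # Lines from systemd or other units do not match and are skipped.
        if m and m.group("sev") in ("E", "F"):
            found.append((m.group("src"), m.group("msg")))
    return found
```

Fed the journal lines from this report, the function would be expected to return only the `run.go:74` "command failed" entry.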

What is the expected behavior?

No response

What do you see instead?

No response

Operating environment

- Sealos version: v4.3.7
- Docker version: 25.0.4
- Kubernetes version: 1.26
- Operating system: CentOS 7
- Runtime environment:
- Cluster size: 3 masters, 1 worker
- Additional information:

Additional information

No response

cnfzh66 commented 7 months ago

The command I ran directly: sealos run docker.io/labring/kubernetes-docker:v1.24.16-4.3.7

cnfzh66 commented 7 months ago

I tested with containerd and it worked without problems.

willzhang commented 6 months ago

Judging from the log, this looks like the cause:

"command failed" err="failed to run Kubelet: validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/cri-dockerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"

Similar issue: https://github.com/labring/sealos/issues/2669
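
The quoted error says the kubelet asked the endpoint for `runtime.v1.RuntimeService` and the endpoint answered `Unimplemented`. To my understanding, kubelet 1.26 and later speak only CRI v1 (v1alpha2 support was removed in 1.26), which would explain why 1.24 to 1.25 worked and 1.25 to 1.26 did not when the installed cri-dockerd serves only the older API. The sketch below encodes that reasoning; the 1.26 cutoff is stated as an assumption, and the names `CRI_V1_ERR`, `kubelet_requires_cri_v1`, and `diagnose` are hypothetical helpers, not sealos or kubelet APIs.

```python
import re

# Matches the kubelet error text, with or without the backslash-escaped
# quotes that appear when the message is embedded in a journal line.
CRI_V1_ERR = re.compile(
    r'CRI v1 runtime API is not implemented for endpoint \\?"(?P<endpoint>[^"\\]+)\\?"'
)


def kubelet_requires_cri_v1(version):
    """True for kubelet 1.26+ (assumption: CRI v1alpha2 was dropped in 1.26)."""
    major, minor = (int(p) for p in version.lstrip("v").split(".")[:2])
    return (major, minor) >= (1, 26)


def diagnose(kubelet_version, err_text):
    """Return a one-line diagnosis, or None if the error is not the CRI v1 one."""
    m = CRI_V1_ERR.search(err_text)
    if m is None:
        return None
    endpoint = m.group("endpoint")
    if kubelet_requires_cri_v1(kubelet_version):
        return (
            f"kubelet {kubelet_version} needs CRI v1, but {endpoint} "
            "does not serve runtime.v1.RuntimeService"
        )
    return f"{endpoint} does not serve runtime.v1.RuntimeService"
```

Called with `"v1.26.14"` and the error text from this report, `diagnose` names `unix:///var/run/cri-dockerd.sock` as the endpoint lacking CRI v1. That is consistent with the containerd-based image working, but it does not by itself say which cri-dockerd version to install.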