opsnull / follow-me-install-kubernetes-cluster

Deploy a Kubernetes cluster step by step with me

kubelet fails during installation with "Failed to start ContainerManager failed to get rootfs info: unable to find data in memory cache" #620

Closed thinkin closed 3 years ago

thinkin commented 3 years ago

Documentation version: master, v1.16.6

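Symptoms

[System]
Linux 4.14.0_1-0-0-31 x86_64 GNU/Linux
CentOS release 7.5 (Final)
Docker Version: 18.09.6
CNI: flannel
Machine load is normal and the swap partition is disabled.
The master components (apiserver, controller-manager, scheduler, etc.) are installed, and all of them pass the verification steps in the documentation.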

[Problem] When I try to start kubelet with

systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet

the node stays stuck in the NotReady state, and describing the node shows the following error:

container runtime status check may not have completed yet, missing node capacity for resources: ephemeral-storage

The kubelet startup log contains the following entries.

Fatal log:

63454 kubelet.go:1380] Failed to start ContainerManager failed to get rootfs info: unable to find data in memory cache

Warning log:

Failed to update stats for container "/system.slice/docker.service": failure - /sys/fs/cgroup/cpuacct/system.slice/docker.service/cpuacct.stat is expected to have 4 fields, continuing to push stats
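The warning above says that four whitespace-separated fields are expected in cpuacct.stat. A minimal sketch for checking how many fields the file actually contains on the affected node; the helper name count_stat_fields is hypothetical and not part of kubelet or cAdvisor:

```shell
#!/bin/sh
# count_stat_fields FILE
# Hypothetical helper: print the total number of whitespace-separated
# fields in FILE. The warning above expects 4 for cpuacct.stat.
count_stat_fields() {
    awk '{ n += NF } END { print n + 0 }' "$1"
}

# Example (path copied from the warning above):
# count_stat_fields /sys/fs/cgroup/cpuacct/system.slice/docker.service/cpuacct.stat
```

A result other than 4 would be consistent with the warning, although it does not by itself explain the fatal rootfs error.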

kubelet also keeps restarting.

The unit file /etc/systemd/system/kubelet.service is as follows:

[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Requires=docker.service

[Service]
WorkingDirectory=/home/disk3/apps/k8s/kubelet
ExecStart=/opt/kube/bin/kubelet \
  --bootstrap-kubeconfig=/home/disk3/apps/k8s/kubelet-bootstrap.kubeconfig \
  --cert-dir=/home/disk3/apps/k8s/cert \
  --root-dir=/home/disk3/apps/k8s/kubelet \
  --kubeconfig=/home/disk3/apps/k8s/kubelet.kubeconfig \
  --config=/home/disk3/apps/k8s/kubelet-config.yaml \
  --hostname-override=dev05 \
  --image-pull-progress-deadline=15m \
  --volume-plugin-dir=/home/disk3/apps/k8s/kubelet/kubelet-plugins/volume/exec/ \
  --logtostderr=true \
  --v=2
Restart=always
RestartSec=5
StartLimitInterval=0

[Install]
WantedBy=multi-user.target

Any help would be appreciated.

thinkin commented 3 years ago

@opsnull

thinkin commented 3 years ago

Solved.

This was caused by cAdvisor failing to read the rootfs information. Add the following to the kubelet startup configuration:

featureGates:
  LocalStorageCapacityIsolation: false
  SupportNodePidsLimit: false
  SupportPodPidsLimit: false

That is all that is needed.
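A minimal sketch of applying this change from the shell. It assumes the "kubelet startup configuration" is the YAML file passed via --config in the unit file above, which the comment does not state explicitly. The helper name add_feature_gates is hypothetical.

```shell
#!/bin/sh
# add_feature_gates FILE
# Hypothetical helper: append the featureGates block from the comment above
# to a kubelet config YAML file. Refuses to touch a file that already has a
# top-level featureGates key, since that case needs a manual merge.
add_feature_gates() {
    cfg="$1"
    if grep -q '^featureGates:' "$cfg"; then
        echo "featureGates already present in $cfg; merge by hand" >&2
        return 1
    fi
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$cfg")" ]; then
        echo >> "$cfg"
    fi
    cat >> "$cfg" <<'EOF'
featureGates:
  LocalStorageCapacityIsolation: false
  SupportNodePidsLimit: false
  SupportPodPidsLimit: false
EOF
}
```

With the paths from the unit file above, usage might look like `add_feature_gates /home/disk3/apps/k8s/kubelet-config.yaml && systemctl restart kubelet`. Back up the file first; the helper does not validate the YAML.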

ikingye commented 3 years ago

vi /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf

Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml --feature-gates=\"LocalStorageCapacityIsolation=false,SupportNodePidsLimit=false,SupportPodPidsLimit=false\""

ikingye commented 3 years ago

# Monitor kubelet for fatal errors
journalctl -xefu kubelet | egrep ": [F][0-9]" -B 1
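
The same filter can be wrapped in a small function so it can be reused on saved logs as well as on the live journal. This is a sketch using `grep -E`, the non-deprecated spelling of `egrep`; the function name filter_fatal is hypothetical.

```shell
#!/bin/sh
# filter_fatal
# Hypothetical helper: read log lines on stdin and print every line whose
# message starts with a klog fatal marker (": F" followed by a digit),
# plus the one line before it for context.
filter_fatal() {
    grep -E -B 1 ': F[0-9]'
}

# Examples:
# journalctl -xefu kubelet | filter_fatal
# filter_fatal < saved-kubelet.log
```

Like the original pipeline, the function exits with a non-zero status when no fatal line is found.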

pacoxu commented 1 year ago

LocalStorageCapacityIsolation: false
SupportNodePidsLimit: false
SupportPodPidsLimit: false

LocalStorageCapacityIsolation has been GA since v1.25. SupportPodPidsLimit and SupportNodePidsLimit went GA in v1.20 and were removed in v1.25.

A feature gate that has reached GA cannot be set to false.
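
Taken at face value, the version numbers above mean the three-gate workaround from earlier in this thread can only be applied as written on kubelet versions below v1.20. A rough sketch of that check, based solely on the versions quoted in this comment and not on any Kubernetes API; the function name workaround_applies is hypothetical.

```shell
#!/bin/sh
# workaround_applies VERSION
# Hypothetical helper: succeed (exit 0) if VERSION, e.g. "v1.16.6", is a
# 1.x release older than 1.20, the release in which the two PID-limit
# gates went GA according to the comment above.
workaround_applies() {
    v="${1#v}"            # strip an optional leading "v"
    major="${v%%.*}"      # text before the first dot
    rest="${v#*.}"        # text after the first dot
    minor="${rest%%.*}"   # text before the next dot
    [ "$major" -eq 1 ] && [ "$minor" -lt 20 ]
}

# Example:
# workaround_applies v1.16.6 && echo "all three gates can still be disabled"
```

On v1.20 and later, at least the two PID-limit gates would have to be dropped from the configuration, and from v1.25 on none of the three can be set to false.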