kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

kubeadm init failed #3038

Closed: hillbun closed this issue 6 months ago

hillbun commented 6 months ago

Versions

kubeadm version (use kubeadm version):

kubeadm version: &version.Info{Major:"1", Minor:"28", GitVersion:"v1.28.2", GitCommit:"89a4ea3e1e4ddd7f7572286090359983e0387b2f", GitTreeState:"clean", BuildDate:"2023-09-13T09:34:32Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}

Environment:

Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Linux node3 5.15.0-101-generic #111-Ubuntu SMP Tue Mar 5 20:16:58 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

containerd -version
containerd github.com/containerd/containerd v1.7.14 dcf2847247e18caba8dce86522029642f60fe96b

What happened?

kubeadm init --control-plane-endpoint=192.168.56.102 --apiserver-advertise-address=192.168.56.102 --pod-network-cidr=10.244.0.0/16 --image-repository registry.aliyuncs.com/google_containers

kubeadm init cannot finish the installation; it times out while waiting for the control plane:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a 
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID              POD

journalctl -xeu kubelet

Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.400844    4352 server.go:467] "Kubelet version" kubeletVersion="v1.28.0"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.400930    4352 server.go:469] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.401298    4352 server.go:630] "Standalone mode, no API client"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.411649    4352 server.go:518] "No api server defined - no events will be sent to API server"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.411678    4352 server.go:725] "--cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.411904    4352 container_manager_linux.go:265] "Container manager verified user specified cgroup-root exists" cgroupRoot=[]
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.412052    4352 container_manager_linux.go:270] "Creating Container Manager object based on Node Config" nodeConfig={"RuntimeCgroupsName":"","SystemCgroupsName":"","KubeletCgroupsName":"","KubeletOOMScoreAdj":-999,"ContainerRuntime":"","CgroupsPerQOS":true,"CgroupRoot":"/","CgroupDriver":"cgroupfs","KubeletRootDir":"/var/lib/kubelet","ProtectKernelDefaults":false,"KubeReservedCgroupName":"","SystemReservedCgroupName":"","ReservedSystemCPUs":{},"EnforceNodeAllocatable":{"pods":{}},"KubeReserved":null,"SystemReserved":null,"HardEvictionThresholds":[],"QOSReserved":{},"CPUManagerPolicy":"none","CPUManagerPolicyOptions":null,"TopologyManagerScope":"container","CPUManagerReconcilePeriod":10000000000,"ExperimentalMemoryManagerPolicy":"None","ExperimentalMemoryManagerReservedMemory":null,"PodPidsLimit":-1,"EnforceCPULimits":true,"CPUCFSQuotaPeriod":100000000,"TopologyManagerPolicy":"none","TopologyManagerPolicyOptions":null}
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.412084    4352 topology_manager.go:138] "Creating topology manager with none policy"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.412093    4352 container_manager_linux.go:301] "Creating device plugin manager"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.412131    4352 state_mem.go:36] "Initialized new in-memory state store"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.412188    4352 kubelet.go:399] "Kubelet is running in standalone mode, will skip API server sync"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.412721    4352 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="containerd" version="v1.7.14" apiVersion="v1"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.413152    4352 volume_host.go:74] "KubeClient is nil. Skip initialization of CSIDriverLister"
Mar 27 07:43:05 node3 kubelet[4352]: W0327 07:43:05.413471    4352 csi_plugin.go:189] kubernetes.io/csi: kubeclient not set, assuming standalone kubelet
Mar 27 07:43:05 node3 kubelet[4352]: W0327 07:43:05.413560    4352 csi_plugin.go:266] Skipping CSINode initialization, kubelet running in standalone mode
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.413919    4352 server.go:1232] "Started kubelet"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.414082    4352 server.go:162] "Starting to listen" address="0.0.0.0" port=10250
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.414102    4352 kubelet.go:1579] "No API server defined - no node status update will be sent"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.414142    4352 server.go:194] "Starting to listen read-only" address="0.0.0.0" port=10255
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.416827    4352 server.go:462] "Adding debug handlers to kubelet server"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.414281    4352 ratelimit.go:65] "Setting rate limiting for podresources endpoint" qps=100 burstTokens=10
Mar 27 07:43:05 node3 kubelet[4352]: E0327 07:43:05.414778    4352 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.417066    4352 fs_resource_analyzer.go:67] "Starting FS ResourceAnalyzer"
Mar 27 07:43:05 node3 kubelet[4352]: E0327 07:43:05.420382    4352 kubelet.go:1431] "Image garbage collection failed once. Stats initialization may not have completed yet" err="invalid capacity 0 on image filesystem"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.421253    4352 volume_manager.go:291] "Starting Kubelet Volume Manager"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.421437    4352 desired_state_of_world_populator.go:151] "Desired state populator starts to run"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.421525    4352 reconciler_new.go:29] "Reconciler: start to sync state"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.421806    4352 server.go:233] "Starting to serve the podresources API" endpoint="unix:/var/lib/kubelet/pod-resources/kubelet.sock"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.455426    4352 kubelet_network_linux.go:50] "Initialized iptables rules." protocol="IPv4"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.457843    4352 kubelet_network_linux.go:50] "Initialized iptables rules." protocol="IPv6"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.457941    4352 status_manager.go:213] "Kubernetes client is nil, not starting status manager"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.457961    4352 kubelet.go:2303] "Starting kubelet main sync loop"
Mar 27 07:43:05 node3 kubelet[4352]: E0327 07:43:05.458067    4352 kubelet.go:2327] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.462269    4352 cpu_manager.go:214] "Starting CPU manager" policy="none"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.462452    4352 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.462704    4352 state_mem.go:36] "Initialized new in-memory state store"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.463055    4352 state_mem.go:88] "Updated default CPUSet" cpuSet=""
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.463221    4352 state_mem.go:96] "Updated CPUSet assignments" assignments={}
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.463333    4352 policy_none.go:49] "None policy: Start"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.464364    4352 memory_manager.go:169] "Starting memorymanager" policy="None"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.464401    4352 state_mem.go:35] "Initializing new in-memory state store"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.464618    4352 state_mem.go:75] "Updated machine memory state"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.465336    4352 manager.go:471] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.465674    4352 plugin_manager.go:118] "Starting Kubelet Plugin Manager"
Mar 27 07:43:05 node3 kubelet[4352]: I0327 07:43:05.528405    4352 desired_state_of_world_populator.go:159] "Finished populating initial desired state of world"

What you expected to happen?

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

neolit123 commented 6 months ago

you'd have to check deeper in the logs for Exxx and Fxxx entries, your logs only show Ixxx.
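a minimal sketch of that filtering, assuming the usual journald line layout seen in the logs above (`kubelet[PID]: ` followed by the klog severity letter and date). the function name `kubelet_errors` is made up for illustration:

```shell
# Keep only klog Error (E) and Fatal (F) entries from kubelet journal output.
# klog starts each message with a severity letter followed by MMDD,
# e.g. "E0327 07:43:05.420382".
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} '
}

# Usage on the node (not executed here):
#   journalctl -u kubelet --no-pager | kubelet_errors
```

reading the whole journal with `--no-pager` instead of `-xe` avoids looking only at the tail of the log, which is where the informational startup lines tend to dominate.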

one of the common causes is cgroup driver problems and not following the steps at: https://kubernetes.io/docs/setup/production-environment/container-runtimes/
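a rough way to compare the two cgroup driver settings, assuming the common default paths `/etc/containerd/config.toml` and `/var/lib/kubelet/config.yaml` (both are assumptions and may differ on a given node; the helper name `show_cgroup_drivers` is hypothetical):

```shell
# Print the cgroup driver lines found in a containerd config and a kubelet config.
# Arguments are optional; the defaults are assumed paths, not guaranteed ones.
show_cgroup_drivers() {
  containerd_cfg="${1:-/etc/containerd/config.toml}"
  kubelet_cfg="${2:-/var/lib/kubelet/config.yaml}"
  # containerd's runc options use "SystemdCgroup = true" for the systemd driver.
  echo "containerd: $(grep -E '^[[:space:]]*SystemdCgroup' "$containerd_cfg" 2>/dev/null || echo 'SystemdCgroup not found')"
  # KubeletConfiguration uses a top-level "cgroupDriver:" field.
  echo "kubelet: $(grep -E '^cgroupDriver:' "$kubelet_cfg" 2>/dev/null || echo 'cgroupDriver not found')"
}

# Usage on the node (not executed here):
#   show_cgroup_drivers
```

for reference, the kubelet log in the report shows "CgroupDriver":"cgroupfs" in its node config, so this is one of the first things worth comparing against the container runtime.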

note that we don't provide support on GitHub; see the message from the bot below.

/support

github-actions[bot] commented 6 months ago

Hello, @hillbun :robot: :wave:

You seem to have troubles using Kubernetes and kubeadm. Note that our issue trackers should not be used for providing support to users. There are special channels for that purpose.

Please see:

neolit123 commented 6 months ago

"Kubelet is running in standalone mode, will skip API server sync"

this may hint that something is broken in the kubelet setup and that it's not a standard setup. the kubelet should be running in non-standalone mode, i.e. it has --kubeconfig and talks to the kube-apiserver: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/kubelet-integration/#the-kubelet-drop-in-file-for-systemd
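one possible way to check this, as a sketch: look for a `--kubeconfig` flag in the kubelet's systemd unit and drop-ins. the helper name `has_kubeconfig_flag` is hypothetical, and the check only inspects text piped into it:

```shell
# Succeed if the text on stdin contains a --kubeconfig flag.
# "--bootstrap-kubeconfig" does not match, since only one dash precedes "kubeconfig" there.
has_kubeconfig_flag() {
  grep -q -e '--kubeconfig[= ]'
}

# Usage on the node (not executed here):
#   systemctl cat kubelet | has_kubeconfig_flag \
#     && echo "kubelet unit passes --kubeconfig" \
#     || echo "no --kubeconfig found, kubelet may start in standalone mode"
```

if the flag is missing, the kubeadm drop-in file described at the link above is probably not installed or is being overridden by another unit file.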

this is the only up-to-date guide for setting up kubeadm clusters: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/