kubernetes / kubeadm

Aggregator for issues filed against kubeadm

kubelet won't restart after reboot - Unable to register node with API server: connection refused #1026

Closed PierrickLozach closed 6 years ago

PierrickLozach commented 6 years ago

Is this a request for help?

It is, but I have searched StackOverflow and Googled many times without finding a solution. This also seems to affect other people.

What keywords did you search in kubeadm issues before filing this one?

The error messages I see in journalctl

Is this a BUG REPORT or FEATURE REQUEST?

Bug report

Versions

kubeadm version: kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

- **OS**: `CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"`

- **Kernel**: `Linux kubernetes 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux`

## What happened?
Kubelet service does not start

## What you expected to happen?
Kubelet service should start

## How to reproduce it (as minimally and precisely as possible)?
* Used kubeadm to deploy kubernetes
* Deployed multiple services and could confirm that everything was working fine
* Rebooted
* Kubelet service no longer starts

## Anything else we need to know?

Journalctl logs:

Jul 27 14:46:17 kubernetes systemd[1]: Starting kubelet: The Kubernetes Node Agent... -- Subject: Unit kubelet.service has begun start-up -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit kubelet.service has begun starting up. Jul 27 14:46:17 kubernetes kubelet[1619]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more informatio n. Jul 27 14:46:17 kubernetes kubelet[1619]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.608612 1619 server.go:408] Version: v1.11.1 Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.609679 1619 plugins.go:97] No cloud provider specified. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.613651 1619 certificate_store.go:131] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem". Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.709720 1619 server.go:648] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to / Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710299 1619 container_manager_linux.go:243] container manager verified user specified cgroup-root exists: [] Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710322 1619 container_manager_linux.go:248] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.available Operator:LessThan Value:{Quantity: Percentage:0.1} GracePeriod:0s MinReclaim:} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity: Percentage:0.05} GracePeriod:0s MinReclaim:} {Signal:imagefs.available Operator:LessThan Value:{Quantity: Percentage:0.15} GracePeriod:0s MinReclaim:} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true} Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710457 1619 container_manager_linux.go:267] Creating device plugin manager: true Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710515 1619 state_mem.go:36] [cpumanager] initializing new in-memory state store Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710600 1619 state_mem.go:84] [cpumanager] updated default cpuset: "" Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710617 1619 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]" Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710751 1619 kubelet.go:274] Adding pod path: /etc/kubernetes/manifests Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710814 1619 kubelet.go:299] Watching apiserver Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.711655 1619 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list v1.Service: Get https://192.168.1.19:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.711661 1619 
reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list v1.Node: Get https://192.168.1.19:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.711752 1619 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.1.19:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkubernetes&limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.717242 1619 client.go:75] Connecting to docker on unix:///var/run/docker.sock Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.717277 1619 client.go:104] Start docker client with request timeout=2m0s Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.718726 1619 docker_service.go:545] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth" Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.718756 1619 docker_service.go:238] Hairpin mode set to "hairpin-veth" Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.721656 1619 hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection cleanup. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.721975 1619 docker_service.go:253] Docker cri networking managed by cni Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.733083 1619 docker_service.go:258] Docker Info: &{ID:V36L:ETJO:IECX:PJF4:G3GB:JHA6:LGCF:VQBJ:D2GY:PVFO:567O:545Y Containers:66 ContainersRunning:0 ContainersPaused:0 ContainersStopped:66 Images:21 Driver:overlay2 DriverStatus:[[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host macvlan null overlay] Authorization:[] Log:[]} MemoryLimit:true SwapLimit:true KernelMemory:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:15 OomKillDisable:true NGoroutines:22 SystemTime:2018-07-27T14:46:17.727178862+02:00 LoggingDriver:journald CgroupDriver:systemd NEventsListener:0 KernelVersion:3.10.0-862.9.1.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc420ebd110 NCPU:12 MemTotal:33386934272 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:kubernetes Labels:[] ExperimentalBuild:false ServerVersion:1.13.1 ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:docker-runc Args:[]} docker-runc:{Path:/usr/libexec/docker/docker-runc-current Args:[]}] DefaultRuntime:docker-runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:0xc421016140} LiveRestoreEnabled:false Isolation: InitBinary:/usr/libexec/docker/docker-init-current ContainerdCommit:{ID: Expected:aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1} RuncCommit:{ID:5eda6f6fd0c2884c2c8e78a6e7119e8d0ecedb77 Expected:9df8b306d01f59d3a8029be411de015b7304dd8f} InitCommit:{ID:fec3683b971d9c3ef73f284f176672c44b448662 Expected:949e6facb77383876aeff8a6944dde66b3089574} SecurityOptions:[name=seccomp,profile=/etc/docker/seccomp.json name=selinux]} Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.733181 1619 docker_service.go:271] Setting cgroupDriver to systemd 
Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.825381 1619 kuberuntime_manager.go:186] Container runtime docker initialized, version: 1.13.1, apiVersion: 1.26.0 Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.839306 1619 csi_plugin.go:111] kubernetes.io/csi: plugin initializing... Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.840955 1619 server.go:129] Starting to listen on 0.0.0.0:10250 Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841036 1619 server.go:986] Started kubelet Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841423 1619 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841448 1619 status_manager.go:152] Starting to sync pod status with apiserver Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841462 1619 kubelet.go:1758] Starting kubelet main sync loop. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841479 1619 kubelet.go:1775] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s] Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841710 1619 volume_manager.go:247] Starting Kubelet Volume Manager Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841754 1619 desired_state_of_world_populator.go:130] Desired state populator starts to run Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.842653 1619 server.go:302] Adding debug handlers to kubelet server. Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.868316 1619 kubelet.go:1261] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data for container / Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.872508 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-hostnamed.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.872925 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-journal-flush.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.873312 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-logind.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.873703 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-remount-fs.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.874064 1619 container.go:393] Failed to create summary reader for "/system.slice/rsyslog.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.874452 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-readahead-collect.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.874765 1619 container.go:393] Failed to create summary reader for "/system.slice": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.875097 1619 container.go:393] Failed to create summary reader for "/system.slice/kmod-static-nodes.service": none of the resources are being tracked. 
Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.875392 1619 container.go:393] Failed to create summary reader for "/system.slice/irqbalance.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.875679 1619 container.go:393] Failed to create summary reader for "/system.slice/rhel-dmesg.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.876007 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-readahead-replay.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.876289 1619 container.go:393] Failed to create summary reader for "/system.slice/NetworkManager.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.876567 1619 container.go:393] Failed to create summary reader for "/system.slice/auditd.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.876913 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-udev-trigger.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.877200 1619 container.go:393] Failed to create summary reader for "/system.slice/kubelet.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.877503 1619 container.go:393] Failed to create summary reader for "/system.slice/network.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.877792 1619 container.go:393] Failed to create summary reader for "/system.slice/system-getty.slice": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.878118 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-journald.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.878486 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-user-sessions.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.878912 1619 container.go:393] Failed to create summary reader for "/system.slice/polkit.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.879312 1619 container.go:393] Failed to create summary reader for "/system.slice/rhel-domainname.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.879802 1619 container.go:393] Failed to create summary reader for "/system.slice/lvm2-monitor.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.880172 1619 container.go:393] Failed to create summary reader for "/system.slice/tuned.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.880491 1619 container.go:393] Failed to create summary reader for "/system.slice/dbus.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.880788 1619 container.go:393] Failed to create summary reader for "/system.slice/docker.service": none of the resources are being tracked. 
Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.881112 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-udevd.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.881402 1619 container.go:393] Failed to create summary reader for "/system.slice/kdump.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.881710 1619 container.go:393] Failed to create summary reader for "/system.slice/rhel-import-state.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.882166 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-random-seed.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.882509 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-tmpfiles-setup-dev.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.882806 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-tmpfiles-setup.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.883115 1619 container.go:393] Failed to create summary reader for "/system.slice/rhel-readonly.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.883420 1619 container.go:393] Failed to create summary reader for "/system.slice/NetworkManager-dispatcher.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.883704 1619 container.go:393] Failed to create summary reader for "/system.slice/NetworkManager-wait-online.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.884005 1619 container.go:393] Failed to create summary reader for "/system.slice/crond.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.884329 1619 container.go:393] Failed to create summary reader for "/system.slice/system-selinux\x2dpolicy\x2dmigrate\x2dlocal\x2dchanges.slice": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.884617 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-sysctl.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.884907 1619 container.go:393] Failed to create summary reader for "/system.slice/k8s-self-hosted-recover.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.885213 1619 container.go:393] Failed to create summary reader for "/system.slice/lvm2-lvmetad.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.885466 1619 container.go:393] Failed to create summary reader for "/user.slice": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.885730 1619 container.go:393] Failed to create summary reader for "/system.slice/sshd.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.886098 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-update-utmp.service": none of the resources are being tracked. 
Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.886384 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-vconsole-setup.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.913789 1619 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.917905 1619 cpu_manager.go:155] [cpumanager] starting with none policy Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.917923 1619 cpu_manager.go:156] [cpumanager] reconciling every 10s Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.917935 1619 policy_none.go:42] [cpumanager] none policy: Start Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.926164 1619 event.go:212] Unable to write event: 'Post https://192.168.1.19:6443/api/v1/namespaces/default/events: dial tcp 192.168.1.19:6443: connect: connection refused' (may retry after sleeping) Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.932356 1619 container.go:393] Failed to create summary reader for "/libcontainer_1619_systemd_test_default.slice": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.941592 1619 kubelet.go:1775] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s] Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.941762 1619 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.944471 1619 kubelet_node_status.go:79] Attempting to register node kubernetes Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.944714 1619 kubelet_node_status.go:103] Unable to register node "kubernetes" with API server: Post https://192.168.1.19:6443/api/v1/nodes: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:17 kubernetes kubelet[1619]: Starting Device Plugin manager Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.986308 1619 eviction_manager.go:243] eviction manager: failed to get get summary stats: failed to get node info: node "kubernetes" not found Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.986668 1619 container_manager_linux.go:792] CPUAccounting not enabled for pid: 998 Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.986680 1619 container_manager_linux.go:795] MemoryAccounting not enabled for pid: 998 Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.986749 1619 container_manager_linux.go:792] CPUAccounting not enabled for pid: 1619 Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.986755 1619 container_manager_linux.go:795] MemoryAccounting not enabled for pid: 1619 Jul 27 14:46:18 kubernetes kubelet[1619]: I0727 14:46:18.144855 1619 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach Jul 27 14:46:18 kubernetes kubelet[1619]: I0727 14:46:18.148528 1619 kubelet_node_status.go:79] Attempting to register node kubernetes Jul 27 14:46:18 kubernetes kubelet[1619]: E0727 14:46:18.148933 1619 kubelet_node_status.go:103] Unable to register node "kubernetes" with API server: Post https://192.168.1.19:6443/api/v1/nodes: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:18 kubernetes kubelet[1619]: W0727 14:46:18.158503 1619 docker_sandbox.go:372] failed to read pod IP from plugin/docker: NetworkPlugin cni failed 
on the status hook for pod "rook-ceph-mon0-4txgr_rook-ceph": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "5b910771d1fd895b3b8d2feabdeb564cc57b213ae712416bdffec4a414dc4747" Jul 27 14:46:18 kubernetes kubelet[1619]: W0727 14:46:18.300596 1619 pod_container_deletor.go:75] Container "5b910771d1fd895b3b8d2feabdeb564cc57b213ae712416bdffec4a414dc4747" not found in pod's containers Jul 27 14:46:18 kubernetes kubelet[1619]: W0727 14:46:18.323729 1619 docker_sandbox.go:372] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "rook-ceph-osd-id-0-54d59fc64b-c5tw4_rook-ceph": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "a73305551840113b16cedd206109a837f57c6c3b2c8b1864ed5afab8b40b186d" Jul 27 14:46:18 kubernetes kubelet[1619]: W0727 14:46:18.516802 1619 pod_container_deletor.go:75] Container "a73305551840113b16cedd206109a837f57c6c3b2c8b1864ed5afab8b40b186d" not found in pod's containers Jul 27 14:46:18 kubernetes kubelet[1619]: I0727 14:46:18.549067 1619 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach Jul 27 14:46:18 kubernetes kubelet[1619]: I0727 14:46:18.552841 1619 kubelet_node_status.go:79] Attempting to register node kubernetes Jul 27 14:46:18 kubernetes kubelet[1619]: E0727 14:46:18.553299 1619 kubelet_node_status.go:103] Unable to register node "kubernetes" with API server: Post https://192.168.1.19:6443/api/v1/nodes: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:18 kubernetes kubelet[1619]: W0727 14:46:18.674143 1619 pod_container_deletor.go:75] Container "96b85439f089170cf6161f5410f8970de67f0609d469105dff4e3d5ec2d10351" not found in pod's containers Jul 27 14:46:18 kubernetes kubelet[1619]: E0727 14:46:18.712440 1619 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list v1.Service: Get https://192.168.1.19:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:18 kubernetes kubelet[1619]: E0727 14:46:18.713284 1619 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list v1.Node: Get https://192.168.1.19:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:18 kubernetes kubelet[1619]: E0727 14:46:18.714397 1619 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list v1.Pod: Get https://192.168.1.19:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkubernetes&limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:19 kubernetes kubelet[1619]: W0727 14:46:19.139032 1619 pod_container_deletor.go:75] Container "7b9757b85bc8ee4ce6ac954acf0bcd5c06b2ceb815aee802a8f53f9de18d967f" not found in pod's containers Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.932356 1619 container.go:393] Failed to create summary reader for "/libcontainer_1619_systemd_test_default.slice": none of the resources are being tracked. 


And it goes on and on about not being able to register `kubernetes` (that's my host name) and failing to list Kubernetes resources.

From the start, I applied the self-hosted-recover script (https://github.com/xetys/k8s-self-hosted-recovery) so that a reboot would not affect the cluster. Here are its logs:

Jul 27 14:46:09 kubernetes systemd[1]: Starting Recovers self-hosted k8s after reboot...
Jul 27 14:46:09 kubernetes k8s-self-hosted-recover[1001]: [k8s-self-hosted-recover] Restoring old plane...
Jul 27 14:46:12 kubernetes k8s-self-hosted-recover[1001]: [controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
Jul 27 14:46:12 kubernetes k8s-self-hosted-recover[1001]: [controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
Jul 27 14:46:12 kubernetes k8s-self-hosted-recover[1001]: [controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
Jul 27 14:46:17 kubernetes k8s-self-hosted-recover[1001]: [k8s-self-hosted-recover] Waiting while the api server is back..



I am running out of ideas and would welcome any help you can bring.
neolit123 commented 4 years ago

@cjbottaro could it be that your kubelet client certificates have expired?

see the second warning here: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#check-certificate-expiration

On nodes created with kubeadm init, prior to kubeadm version 1.17...
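For reference, a minimal way to check this on the node (assuming the default kubeadm layout, matching the paths shown in the logs above):

```bash
# Expiration of the certificates kubeadm manages for the control plane
# (on older releases the subcommand is `kubeadm alpha certs check-expiration`).
kubeadm certs check-expiration

# The kubelet client certificate is stored separately; inspect its expiry directly.
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate
```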

DanielIvaylov commented 4 years ago

same problem running Ubuntu 16 on VMWare.

I am running the cluster in VMware too. What resolved your problem? Thanks

voodoonofx commented 4 years ago

Had the same issue today after editing the service-cidr settings on my new kube cluster. For me, the kube-apiserver docker container was flapping. Looking at its logs with `docker logs`, I saw: `Error: error determining service IP ranges for primary service cidr: The service cluster IP range must be at least 8 IP addresses.` I had thought I could provision a smaller CIDR of /30 for services, but needed to widen it to a /29.
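A hedged sketch of the sizing constraint (the 10.96.0.0 range below is only illustrative): a /29 yields 8 addresses, which is the minimum the apiserver accepts, while a /30 only yields 4.

```bash
# Example kubeadm init invocation with a service CIDR that satisfies the
# "at least 8 IP addresses" check; adjust the range to your environment.
kubeadm init --service-cidr=10.96.0.0/29
```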

neolit123 commented 4 years ago

Not sure if it's related, but note that 1.17.0 has CIDR-related bugs, so hopefully you are running a more recent patch release of 1.17.

fishhead2zju commented 4 years ago

You may need to change the iptables rules. Run `iptables -L --line-numbers` to find the `reject-with icmp-host-prohibited` rule, then delete it with `iptables -D INPUT 153` (153 being the line number found in the previous step). Finally, restart the kubelet.
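Roughly, that workaround looks like this (the rule number 153 is specific to that host; use whatever number the first command reports):

```bash
# Locate the REJECT rule and note its line number.
iptables -L INPUT --line-numbers | grep "icmp-host-prohibited"

# Delete the rule by its line number (153 on that particular host).
iptables -D INPUT 153

# Restart the kubelet afterwards.
systemctl restart kubelet
```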

Faceless28 commented 4 years ago

I have a similar problem with a kubeadm cluster. I just ran `docker restart $(docker ps -qa)` twice.
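Spelled out, that workaround is just the following (the `-qa` flags list the IDs of all containers, including exited ones):

```bash
# Restart every container on the node, running or exited; repeated twice as described above.
docker restart $(docker ps -qa)
docker restart $(docker ps -qa)
```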

ChoppinBlockParty commented 4 years ago

Have the same problem that @PierrickI3 does. After the reboot the node's control plane is down. The kubelet service is running and keeps trying to connect to the non-running apiserver. etcd is running. There are no CNI network interfaces, only loopback, ethernet, and docker.

There is no particular error to be seen anywhere; it just does not start, though it worked fine for a long time before the reboot. I have tried everything mentioned in this thread as well as in many others.

I am utterly confused about what starts what here, which makes it hard to investigate further. Does the kubelet service start the CNI network and then the control plane pods (e.g. apiserver, scheduler, etc.)? I have checked `docker ps -a`: none of the control plane containers has even been attempted, and there is no restart policy on them either. So why does kubelet try to talk to the API server when it has not started it?
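One way to see what the kubelet is supposed to start on its own is to look at the static pod manifests; a minimal inspection sketch, assuming the default kubeadm paths:

```bash
# kubeadm writes the control-plane manifests (apiserver, controller-manager,
# scheduler, etcd) here; the kubelet creates these pods directly from disk,
# without talking to the API server first.
ls /etc/kubernetes/manifests/

# Confirm the kubelet is configured to watch that directory.
grep staticPodPath /var/lib/kubelet/config.yaml

# Follow the kubelet's attempts to create the static pods.
journalctl -u kubelet -f
```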

kandyp commented 3 years ago

Had a similar issue; it's not resolved for me yet, but it occurred because docker got upgraded to an incompatible version. Just check whether your docker service is running or not.
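A couple of quick checks for that, as a sketch:

```bash
# Is the container runtime up, and what version is it running?
systemctl status docker
docker version

# Recent runtime logs often show why containers cannot be started.
journalctl -u docker --since "1 hour ago" | tail -n 50
```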

cr6588 commented 3 years ago

@neolit123 '@cjbottaro' could it be that your kubelet client certificates have expired?

see the second warning here: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#check-certificate-expiration

On nodes created with kubeadm init, prior to kubeadm version 1.17...

Thanks. My machine has been unable to connect after restarting. After checking the certificates, I found that 3 of them had expired.

[root@localhost home]# kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

W0128 14:02:50.815166   21689 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 17, 2021 06:56 UTC   232d                                    no      
apiserver                  Sep 17, 2021 06:55 UTC   232d            ca                      no      
apiserver-etcd-client      Sep 17, 2021 06:55 UTC   232d            etcd-ca                 no      
apiserver-kubelet-client   Sep 17, 2021 06:55 UTC   232d            ca                      no      
controller-manager.conf    Sep 17, 2021 06:56 UTC   232d                                    no      
etcd-healthcheck-client    Dec 19, 2020 07:56 UTC   <invalid>       etcd-ca                 no      
etcd-peer                  Dec 19, 2020 07:56 UTC   <invalid>       etcd-ca                 no      
etcd-server                Dec 19, 2020 07:56 UTC   <invalid>       etcd-ca                 no      
front-proxy-client         Sep 17, 2021 06:55 UTC   232d            front-proxy-ca          no      
scheduler.conf             Sep 17, 2021 06:56 UTC   232d                                    no   

So renew the certificates:

kubeadm alpha certs renew etcd-healthcheck-client
kubeadm alpha certs renew etcd-peer
kubeadm alpha certs renew etcd-server

Restart the services:

systemctl daemon-reload
systemctl restart kubelet
systemctl restart docker

It works fine

rrana2208 commented 3 years ago

Hi, please check whether you have swap enabled on the master and worker nodes. Disable it and then restart the service.
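A common way to do that (the /etc/fstab edit is an assumption about how swap is configured on your hosts):

```bash
# Turn swap off right away...
swapoff -a

# ...and comment out swap entries in /etc/fstab so it stays off after reboots.
sed -i.bak '/ swap / s/^/#/' /etc/fstab

# Then restart the kubelet.
systemctl restart kubelet
```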

neelshah1617 commented 3 years ago

In my case, kubelet could not find the node because the /etc/hostname file had been edited, which changed the reported hostname to one that kube-apiserver could not resolve. I had to correct the node hostname with hostnamectl set-hostname <correct-hostname-fqdn>. After that, I restarted the kubelet and docker services, and all the nodes went back into the Ready state.
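As a sketch, the recovery steps were roughly the following (the FQDN is a placeholder for whatever name the node was originally registered with):

```bash
# Restore the original node hostname.
hostnamectl set-hostname <correct-hostname-fqdn>

# Restart the runtime and the kubelet so the node re-registers under that name.
systemctl restart docker
systemctl restart kubelet
```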

88plug commented 2 years ago

In my case, kubelet could not find the node because the /etc/hostname file had been edited, which changed the reported hostname to one that kube-apiserver could not resolve. I had to correct the node hostname with hostnamectl set-hostname <correct-hostname-fqdn>. After that, I restarted the kubelet and docker services, and all the nodes went back into the Ready state.

This works! I had the problem after kubespray and was able to start node1 again with hostnamectl set-hostname node1.

adarshvn commented 2 years ago

I have a similar issue. My setup has 2 master nodes, 2 worker nodes, and an HAProxy load balancer. The VMs got rebooted, and since then the kubelet service is not able to communicate with the API server. I tried swapoff -a, restarting the services, and stopping iptables to recover. The certificates have not expired either.

eviction_manager.go:254] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"cn-manager1\" not found"
kubelet[4948]: E0505 02:15:16.075428 4948 kubelet.go:2422] "Error getting node" err="node \"cn-manager1\" not found"

HAProxy service is up and running:

    systemctl status haproxy
    ● haproxy.service - HAProxy Load Balancer
       Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
       Active: active (running) since Thu 2022-05-05 04:37:58 EDT; 35min ago
     Main PID: 10946 (haproxy-systemd)
        Tasks: 3
       CGroup: /system.slice/haproxy.service
               ├─10946 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
               ├─10948 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
               └─10949 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

The relevant haproxy.cfg sections:

    #---------------------------------------------------------------------
    # apiserver frontend which proxys to the control plane nodes
    #---------------------------------------------------------------------
    frontend Ingress
        bind 192.168.56.14:443
        mode tcp
        option tcplog
        default_backend apiserver

    #---------------------------------------------------------------------
    # round robin balancing for apiserver
    #---------------------------------------------------------------------
    backend apiserver
        option httpchk GET /healthz
        http-check expect status 200
        mode tcp
        option ssl-hello-chk
        balance roundrobin
        server CN-manager1 192.168.56.10:6443 check
        server CN-manager2 192.168.56.11:6443 check
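To narrow down whether the load balancer or the apiservers themselves are at fault, a couple of probes against the addresses from the config above (`-k` skips certificate verification; /healthz is the same endpoint HAProxy's health check uses):

```bash
# Through the HAProxy frontend.
curl -k https://192.168.56.14:443/healthz

# Directly against each control-plane apiserver.
curl -k https://192.168.56.10:6443/healthz
curl -k https://192.168.56.11:6443/healthz
```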

adarshvn commented 2 years ago

Have the same problem that @PierrickI3 does. After the reboot the node's control plane is down. The kubelet service is running and keeps trying to connect to the non-running apiserver. etcd is running. There are no CNI network interfaces, only loopback, ethernet, and docker.

There is no particular error to be seen anywhere; it just does not start, though it worked fine for a long time before the reboot. I have tried everything mentioned in this thread as well as in many others.

I am utterly confused about what starts what here, which makes it hard to investigate further. Does the kubelet service start the CNI network and then the control plane pods (e.g. apiserver, scheduler, etc.)? I have checked `docker ps -a`: none of the control plane containers has even been attempted, and there is no restart policy on them either. So why does kubelet try to talk to the API server when it has not started it?

Hi, this looks similar to the issue we have. Do you have an RCA for your issue?

punitporwal07 commented 2 years ago

I have the same issue: the kube-apiserver container is not stable and exits with the following error:

F0508 21:29:01.915056 1 storage_decorator.go:57] Unable to create storage backend: config (&{ /registry [https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt true true 1000 0xc00011d8c0 <nil> 5m0s 1m0s}), err (context deadline exceeded)

After checking the etcd container, it is being terminated with the following error:

2022-05-08 21:30:03.127369 N | pkg/osutil: received terminated signal, shutting down... WARNING: 2022/05/08 21:30:03 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused"; Reconnecting to {127.0.0.1:2379 0 <nil>}

Appreciate help on this; ta.

punitporwal07 commented 2 years ago

This is resolved now, after I renewed my etcd certificates, which had expired.
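For anyone else landing here, a sketch of the check-and-renew cycle, assuming the default kubeadm paths (on older releases the renew subcommand lives under `kubeadm alpha certs renew ...`):

```bash
# When does the etcd serving certificate expire?
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -enddate

# Renew the etcd-related certificates.
kubeadm certs renew etcd-server
kubeadm certs renew etcd-peer
kubeadm certs renew etcd-healthcheck-client
kubeadm certs renew apiserver-etcd-client

# Restart the runtime and the kubelet so etcd and the apiserver pick up the renewed certificates.
systemctl restart docker
systemctl restart kubelet
```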

harveyjing commented 3 months ago

Same issue. After I forcibly shut down my control node, all containers are Exited, so the api-server is not up. I have been blocked for more than two days.
