kubernetes / kubeadm

Aggregator for issues filed against kubeadm

kubelet won't restart after reboot - Unable to register node with API server: connection refused #1026

Closed PierrickLozach closed 6 years ago

PierrickLozach commented 6 years ago

Is this a request for help?

It is, but I have searched StackOverflow and Googled many times without finding a solution. This also seems to affect other people.

What keywords did you search in kubeadm issues before filing this one?

The error messages I see in journalctl

Is this a BUG REPORT or FEATURE REQUEST?

Bug report

Versions

kubeadm version: kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

- **OS**: `CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"`

- **Kernel**: `Linux kubernetes 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux`

## What happened?
Kubelet service does not start

## What you expected to happen?
Kubelet service should start

## How to reproduce it (as minimally and precisely as possible)?
* Used kubeadm to deploy kubernetes
* Deployed multiple services and could confirm that everything was working fine
* Rebooted
* Kubelet service no longer starts

## Anything else we need to know?

Journalctl logs:

Jul 27 14:46:17 kubernetes systemd[1]: Starting kubelet: The Kubernetes Node Agent... -- Subject: Unit kubelet.service has begun start-up -- Defined-By: systemd -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit kubelet.service has begun starting up. Jul 27 14:46:17 kubernetes kubelet[1619]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more informatio n. Jul 27 14:46:17 kubernetes kubelet[1619]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.608612 1619 server.go:408] Version: v1.11.1 Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.609679 1619 plugins.go:97] No cloud provider specified. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.613651 1619 certificate_store.go:131] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem". Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.709720 1619 server.go:648] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to / Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710299 1619 container_manager_linux.go:243] container manager verified user specified cgroup-root exists: [] Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710322 1619 container_manager_linux.go:248] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.available Operator:LessThan Value:{Quantity: Percentage:0.1} GracePeriod:0s MinReclaim:} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity: Percentage:0.05} GracePeriod:0s MinReclaim:} {Signal:imagefs.available Operator:LessThan Value:{Quantity: Percentage:0.15} GracePeriod:0s MinReclaim:} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true} Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710457 1619 container_manager_linux.go:267] Creating device plugin manager: true Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710515 1619 state_mem.go:36] [cpumanager] initializing new in-memory state store Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710600 1619 state_mem.go:84] [cpumanager] updated default cpuset: "" Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710617 1619 state_mem.go:92] [cpumanager] updated cpuset assignments: "map[]" Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710751 1619 kubelet.go:274] Adding pod path: /etc/kubernetes/manifests Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.710814 1619 kubelet.go:299] Watching apiserver Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.711655 1619 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list v1.Service: Get https://192.168.1.19:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.711661 1619 
reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list v1.Node: Get https://192.168.1.19:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.711752 1619 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.1.19:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkubernetes&limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.717242 1619 client.go:75] Connecting to docker on unix:///var/run/docker.sock Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.717277 1619 client.go:104] Start docker client with request timeout=2m0s Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.718726 1619 docker_service.go:545] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth" Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.718756 1619 docker_service.go:238] Hairpin mode set to "hairpin-veth" Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.721656 1619 hostport_manager.go:68] The binary conntrack is not installed, this can cause failures in network connection cleanup. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.721975 1619 docker_service.go:253] Docker cri networking managed by cni Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.733083 1619 docker_service.go:258] Docker Info: &{ID:V36L:ETJO:IECX:PJF4:G3GB:JHA6:LGCF:VQBJ:D2GY:PVFO:567O:545Y Containers:66 ContainersRunning:0 ContainersPaused:0 ContainersStopped:66 Images:21 Driver:overlay2 DriverStatus:[[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host macvlan null overlay] Authorization:[] Log:[]} MemoryLimit:true SwapLimit:true KernelMemory:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:15 OomKillDisable:true NGoroutines:22 SystemTime:2018-07-27T14:46:17.727178862+02:00 LoggingDriver:journald CgroupDriver:systemd NEventsListener:0 KernelVersion:3.10.0-862.9.1.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc420ebd110 NCPU:12 MemTotal:33386934272 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:kubernetes Labels:[] ExperimentalBuild:false ServerVersion:1.13.1 ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:docker-runc Args:[]} docker-runc:{Path:/usr/libexec/docker/docker-runc-current Args:[]}] DefaultRuntime:docker-runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:0xc421016140} LiveRestoreEnabled:false Isolation: InitBinary:/usr/libexec/docker/docker-init-current ContainerdCommit:{ID: Expected:aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1} RuncCommit:{ID:5eda6f6fd0c2884c2c8e78a6e7119e8d0ecedb77 Expected:9df8b306d01f59d3a8029be411de015b7304dd8f} InitCommit:{ID:fec3683b971d9c3ef73f284f176672c44b448662 Expected:949e6facb77383876aeff8a6944dde66b3089574} SecurityOptions:[name=seccomp,profile=/etc/docker/seccomp.json name=selinux]} Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.733181 1619 docker_service.go:271] Setting cgroupDriver to systemd 
Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.825381 1619 kuberuntime_manager.go:186] Container runtime docker initialized, version: 1.13.1, apiVersion: 1.26.0 Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.839306 1619 csi_plugin.go:111] kubernetes.io/csi: plugin initializing... Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.840955 1619 server.go:129] Starting to listen on 0.0.0.0:10250 Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841036 1619 server.go:986] Started kubelet Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841423 1619 fs_resource_analyzer.go:66] Starting FS ResourceAnalyzer Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841448 1619 status_manager.go:152] Starting to sync pod status with apiserver Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841462 1619 kubelet.go:1758] Starting kubelet main sync loop. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841479 1619 kubelet.go:1775] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s] Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841710 1619 volume_manager.go:247] Starting Kubelet Volume Manager Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.841754 1619 desired_state_of_world_populator.go:130] Desired state populator starts to run Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.842653 1619 server.go:302] Adding debug handlers to kubelet server. Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.868316 1619 kubelet.go:1261] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data for container / Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.872508 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-hostnamed.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.872925 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-journal-flush.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.873312 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-logind.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.873703 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-remount-fs.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.874064 1619 container.go:393] Failed to create summary reader for "/system.slice/rsyslog.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.874452 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-readahead-collect.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.874765 1619 container.go:393] Failed to create summary reader for "/system.slice": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.875097 1619 container.go:393] Failed to create summary reader for "/system.slice/kmod-static-nodes.service": none of the resources are being tracked. 
Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.875392 1619 container.go:393] Failed to create summary reader for "/system.slice/irqbalance.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.875679 1619 container.go:393] Failed to create summary reader for "/system.slice/rhel-dmesg.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.876007 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-readahead-replay.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.876289 1619 container.go:393] Failed to create summary reader for "/system.slice/NetworkManager.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.876567 1619 container.go:393] Failed to create summary reader for "/system.slice/auditd.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.876913 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-udev-trigger.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.877200 1619 container.go:393] Failed to create summary reader for "/system.slice/kubelet.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.877503 1619 container.go:393] Failed to create summary reader for "/system.slice/network.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.877792 1619 container.go:393] Failed to create summary reader for "/system.slice/system-getty.slice": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.878118 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-journald.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.878486 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-user-sessions.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.878912 1619 container.go:393] Failed to create summary reader for "/system.slice/polkit.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.879312 1619 container.go:393] Failed to create summary reader for "/system.slice/rhel-domainname.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.879802 1619 container.go:393] Failed to create summary reader for "/system.slice/lvm2-monitor.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.880172 1619 container.go:393] Failed to create summary reader for "/system.slice/tuned.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.880491 1619 container.go:393] Failed to create summary reader for "/system.slice/dbus.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.880788 1619 container.go:393] Failed to create summary reader for "/system.slice/docker.service": none of the resources are being tracked. 
Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.881112 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-udevd.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.881402 1619 container.go:393] Failed to create summary reader for "/system.slice/kdump.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.881710 1619 container.go:393] Failed to create summary reader for "/system.slice/rhel-import-state.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.882166 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-random-seed.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.882509 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-tmpfiles-setup-dev.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.882806 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-tmpfiles-setup.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.883115 1619 container.go:393] Failed to create summary reader for "/system.slice/rhel-readonly.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.883420 1619 container.go:393] Failed to create summary reader for "/system.slice/NetworkManager-dispatcher.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.883704 1619 container.go:393] Failed to create summary reader for "/system.slice/NetworkManager-wait-online.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.884005 1619 container.go:393] Failed to create summary reader for "/system.slice/crond.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.884329 1619 container.go:393] Failed to create summary reader for "/system.slice/system-selinux\x2dpolicy\x2dmigrate\x2dlocal\x2dchanges.slice": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.884617 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-sysctl.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.884907 1619 container.go:393] Failed to create summary reader for "/system.slice/k8s-self-hosted-recover.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.885213 1619 container.go:393] Failed to create summary reader for "/system.slice/lvm2-lvmetad.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.885466 1619 container.go:393] Failed to create summary reader for "/user.slice": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.885730 1619 container.go:393] Failed to create summary reader for "/system.slice/sshd.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.886098 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-update-utmp.service": none of the resources are being tracked. 
Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.886384 1619 container.go:393] Failed to create summary reader for "/system.slice/systemd-vconsole-setup.service": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.913789 1619 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.917905 1619 cpu_manager.go:155] [cpumanager] starting with none policy Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.917923 1619 cpu_manager.go:156] [cpumanager] reconciling every 10s Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.917935 1619 policy_none.go:42] [cpumanager] none policy: Start Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.926164 1619 event.go:212] Unable to write event: 'Post https://192.168.1.19:6443/api/v1/namespaces/default/events: dial tcp 192.168.1.19:6443: connect: connection refused' (may retry after sleeping) Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.932356 1619 container.go:393] Failed to create summary reader for "/libcontainer_1619_systemd_test_default.slice": none of the resources are being tracked. Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.941592 1619 kubelet.go:1775] skipping pod synchronization - [container runtime is down PLEG is not healthy: pleg was last seen active 2562047h47m16.854775807s ago; threshold is 3m0s] Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.941762 1619 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach Jul 27 14:46:17 kubernetes kubelet[1619]: I0727 14:46:17.944471 1619 kubelet_node_status.go:79] Attempting to register node kubernetes Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.944714 1619 kubelet_node_status.go:103] Unable to register node "kubernetes" with API server: Post https://192.168.1.19:6443/api/v1/nodes: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:17 kubernetes kubelet[1619]: Starting Device Plugin manager Jul 27 14:46:17 kubernetes kubelet[1619]: E0727 14:46:17.986308 1619 eviction_manager.go:243] eviction manager: failed to get get summary stats: failed to get node info: node "kubernetes" not found Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.986668 1619 container_manager_linux.go:792] CPUAccounting not enabled for pid: 998 Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.986680 1619 container_manager_linux.go:795] MemoryAccounting not enabled for pid: 998 Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.986749 1619 container_manager_linux.go:792] CPUAccounting not enabled for pid: 1619 Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.986755 1619 container_manager_linux.go:795] MemoryAccounting not enabled for pid: 1619 Jul 27 14:46:18 kubernetes kubelet[1619]: I0727 14:46:18.144855 1619 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach Jul 27 14:46:18 kubernetes kubelet[1619]: I0727 14:46:18.148528 1619 kubelet_node_status.go:79] Attempting to register node kubernetes Jul 27 14:46:18 kubernetes kubelet[1619]: E0727 14:46:18.148933 1619 kubelet_node_status.go:103] Unable to register node "kubernetes" with API server: Post https://192.168.1.19:6443/api/v1/nodes: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:18 kubernetes kubelet[1619]: W0727 14:46:18.158503 1619 docker_sandbox.go:372] failed to read pod IP from plugin/docker: NetworkPlugin cni failed 
on the status hook for pod "rook-ceph-mon0-4txgr_rook-ceph": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "5b910771d1fd895b3b8d2feabdeb564cc57b213ae712416bdffec4a414dc4747" Jul 27 14:46:18 kubernetes kubelet[1619]: W0727 14:46:18.300596 1619 pod_container_deletor.go:75] Container "5b910771d1fd895b3b8d2feabdeb564cc57b213ae712416bdffec4a414dc4747" not found in pod's containers Jul 27 14:46:18 kubernetes kubelet[1619]: W0727 14:46:18.323729 1619 docker_sandbox.go:372] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "rook-ceph-osd-id-0-54d59fc64b-c5tw4_rook-ceph": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "a73305551840113b16cedd206109a837f57c6c3b2c8b1864ed5afab8b40b186d" Jul 27 14:46:18 kubernetes kubelet[1619]: W0727 14:46:18.516802 1619 pod_container_deletor.go:75] Container "a73305551840113b16cedd206109a837f57c6c3b2c8b1864ed5afab8b40b186d" not found in pod's containers Jul 27 14:46:18 kubernetes kubelet[1619]: I0727 14:46:18.549067 1619 kubelet_node_status.go:269] Setting node annotation to enable volume controller attach/detach Jul 27 14:46:18 kubernetes kubelet[1619]: I0727 14:46:18.552841 1619 kubelet_node_status.go:79] Attempting to register node kubernetes Jul 27 14:46:18 kubernetes kubelet[1619]: E0727 14:46:18.553299 1619 kubelet_node_status.go:103] Unable to register node "kubernetes" with API server: Post https://192.168.1.19:6443/api/v1/nodes: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:18 kubernetes kubelet[1619]: W0727 14:46:18.674143 1619 pod_container_deletor.go:75] Container "96b85439f089170cf6161f5410f8970de67f0609d469105dff4e3d5ec2d10351" not found in pod's containers Jul 27 14:46:18 kubernetes kubelet[1619]: E0727 14:46:18.712440 1619 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:455: Failed to list v1.Service: Get https://192.168.1.19:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:18 kubernetes kubelet[1619]: E0727 14:46:18.713284 1619 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:464: Failed to list v1.Node: Get https://192.168.1.19:6443/api/v1/nodes?fieldSelector=metadata.name%3Dkubernetes&limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:18 kubernetes kubelet[1619]: E0727 14:46:18.714397 1619 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list v1.Pod: Get https://192.168.1.19:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dkubernetes&limit=500&resourceVersion=0: dial tcp 192.168.1.19:6443: connect: connection refused Jul 27 14:46:19 kubernetes kubelet[1619]: W0727 14:46:19.139032 1619 pod_container_deletor.go:75] Container "7b9757b85bc8ee4ce6ac954acf0bcd5c06b2ceb815aee802a8f53f9de18d967f" not found in pod's containers Jul 27 14:46:17 kubernetes kubelet[1619]: W0727 14:46:17.932356 1619 container.go:393] Failed to create summary reader for "/libcontainer_1619_systemd_test_default.slice": none of the resources are being tracked. 


And it goes on and on about not being able to register `kubernetes` (that's my host name) and failing to list Kubernetes resources.

From the start, I applied the self-hosted-recover script (https://github.com/xetys/k8s-self-hosted-recovery) so that a reboot would not affect the cluster. Here are its logs:

Jul 27 14:46:09 kubernetes systemd[1]: Starting Recovers self-hosted k8s after reboot...
Jul 27 14:46:09 kubernetes k8s-self-hosted-recover[1001]: [k8s-self-hosted-recover] Restoring old plane...
Jul 27 14:46:12 kubernetes k8s-self-hosted-recover[1001]: [controlplane] wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
Jul 27 14:46:12 kubernetes k8s-self-hosted-recover[1001]: [controlplane] wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
Jul 27 14:46:12 kubernetes k8s-self-hosted-recover[1001]: [controlplane] wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
Jul 27 14:46:17 kubernetes k8s-self-hosted-recover[1001]: [k8s-self-hosted-recover] Waiting while the api server is back..



I am running out of ideas and would welcome any help you can bring.
neolit123 commented 4 years ago

@cjbottaro could it be that your kubelet client certificates have expired?

see the second warning here: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#check-certificate-expiration

On nodes created with kubeadm init, prior to kubeadm version 1.17...
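For reference, a minimal way to check this on the node (assuming the default kubeadm layout, matching the paths shown in the logs above):

```bash
# Expiration of the certificates kubeadm manages for the control plane
# (on older releases the subcommand is `kubeadm alpha certs check-expiration`).
kubeadm certs check-expiration

# The kubelet client certificate is stored separately; inspect its expiry directly.
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate
```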

DanielIvaylov commented 4 years ago

same problem running Ubuntu 16 on VMWare.

I am running the cluster in VMware too. What resolved your problem? Thanks

voodoonofx commented 4 years ago

Had the same issue today after editing the service-cidr settings on my new kube cluster. For me, the kube-apiserver docker container was flapping. Looking at its logs with `docker logs`, I saw: `Error: error determining service IP ranges for primary service cidr: The service cluster IP range must be at least 8 IP addresses.` I had thought I could provision a smaller CIDR of /30 for services, but needed to widen it to a /29.
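A hedged sketch of the sizing constraint (the 10.96.0.0 range below is only illustrative): a /29 yields 8 addresses, which is the minimum the apiserver accepts, while a /30 only yields 4.

```bash
# Example kubeadm init invocation with a service CIDR that satisfies the
# "at least 8 IP addresses" check; adjust the range to your environment.
kubeadm init --service-cidr=10.96.0.0/29
```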

neolit123 commented 4 years ago

Not sure if it's related, but note that 1.17.0 has CIDR-related bugs, so hopefully you are running a more recent patch release of 1.17.

fishhead2zju commented 4 years ago

You may need to change the iptables rules. Run `iptables -L --line-numbers` to find the `reject-with icmp-host-prohibited` rule, then delete it with `iptables -D INPUT 153` (153 being the line number found in the previous step). Finally, restart the kubelet.
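Roughly, that workaround looks like this (the rule number 153 is specific to that host; use whatever number the first command reports):

```bash
# Locate the REJECT rule and note its line number.
iptables -L INPUT --line-numbers | grep "icmp-host-prohibited"

# Delete the rule by its line number (153 on that particular host).
iptables -D INPUT 153

# Restart the kubelet afterwards.
systemctl restart kubelet
```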

Faceless28 commented 4 years ago

I have a similar problem with a kubeadm cluster. I just ran `docker restart $(docker ps -qa)` twice.
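Spelled out, that workaround is just the following (the `-qa` flags list the IDs of all containers, including exited ones):

```bash
# Restart every container on the node, running or exited; repeated twice as described above.
docker restart $(docker ps -qa)
docker restart $(docker ps -qa)
```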

ChoppinBlockParty commented 4 years ago

Have the same problem that @PierrickI3 does. After the reboot the node's control plane is down. The kubelet service is running and keeps trying to connect to the non-running apiserver. etcd is running. There are no CNI network interfaces, only loopback, ethernet, and docker.

There is no particular error to be seen anywhere; it just does not start, though it worked fine for a long time before the reboot. I have tried everything mentioned in this thread as well as in many others.

I am utterly confused about what starts what here, which makes it hard to investigate further. Does the kubelet service start the CNI network and then the control plane pods (e.g. apiserver, scheduler, etc.)? I have checked `docker ps -a`: none of the control plane containers has even been attempted, and there is no restart policy on them either. So why does kubelet try to talk to the API server when it has not started it?
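One way to see what the kubelet is supposed to start on its own is to look at the static pod manifests; a minimal inspection sketch, assuming the default kubeadm paths:

```bash
# kubeadm writes the control-plane manifests (apiserver, controller-manager,
# scheduler, etcd) here; the kubelet creates these pods directly from disk,
# without talking to the API server first.
ls /etc/kubernetes/manifests/

# Confirm the kubelet is configured to watch that directory.
grep staticPodPath /var/lib/kubelet/config.yaml

# Follow the kubelet's attempts to create the static pods.
journalctl -u kubelet -f
```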

kandyp commented 3 years ago

Had a similar issue; it's not resolved for me yet, but it occurred because docker got upgraded to an incompatible version. Just check whether your docker service is running or not.
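A couple of quick checks for that, as a sketch:

```bash
# Is the container runtime up, and what version is it running?
systemctl status docker
docker version

# Recent runtime logs often show why containers cannot be started.
journalctl -u docker --since "1 hour ago" | tail -n 50
```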

cr6588 commented 3 years ago

@neolit123 '@cjbottaro' could it be that your kubelet client certificates have expired?

see the second warning here: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#check-certificate-expiration

On nodes created with kubeadm init, prior to kubeadm version 1.17...

Thanks. My machine has been unable to connect after restarting. After checking the certificates, I found that 3 of them had expired.

[root@localhost home]# kubeadm alpha certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

W0128 14:02:50.815166   21689 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Sep 17, 2021 06:56 UTC   232d                                    no      
apiserver                  Sep 17, 2021 06:55 UTC   232d            ca                      no      
apiserver-etcd-client      Sep 17, 2021 06:55 UTC   232d            etcd-ca                 no      
apiserver-kubelet-client   Sep 17, 2021 06:55 UTC   232d            ca                      no      
controller-manager.conf    Sep 17, 2021 06:56 UTC   232d                                    no      
etcd-healthcheck-client    Dec 19, 2020 07:56 UTC   <invalid>       etcd-ca                 no      
etcd-peer                  Dec 19, 2020 07:56 UTC   <invalid>       etcd-ca                 no      
etcd-server                Dec 19, 2020 07:56 UTC   <invalid>       etcd-ca                 no      
front-proxy-client         Sep 17, 2021 06:55 UTC   232d            front-proxy-ca          no      
scheduler.conf             Sep 17, 2021 06:56 UTC   232d                                    no   

So renew the certificates:

kubeadm alpha certs renew etcd-healthcheck-client
kubeadm alpha certs renew etcd-peer
kubeadm alpha certs renew etcd-server

Restart the services:

systemctl daemon-reload
systemctl restart kubelet
systemctl restart docker

It works fine

rrana2208 commented 3 years ago

Hi, please check whether you have swap enabled on the master and worker nodes. Disable it and then restart the service.
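A common way to do that (the /etc/fstab edit is an assumption about how swap is configured on your hosts):

```bash
# Turn swap off right away...
swapoff -a

# ...and comment out swap entries in /etc/fstab so it stays off after reboots.
sed -i.bak '/ swap / s/^/#/' /etc/fstab

# Then restart the kubelet.
systemctl restart kubelet
```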

neelshah1617 commented 3 years ago

In my case, kubelet could not find the node because the /etc/hostname file had been edited, which changed the reported hostname to one that kube-apiserver could not resolve. I had to correct the node hostname with hostnamectl set-hostname <correct-hostname-fqdn>. After that, I restarted the kubelet and docker services, and all the nodes went back into the Ready state.
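As a sketch, the recovery steps were roughly the following (the FQDN is a placeholder for whatever name the node was originally registered with):

```bash
# Restore the original node hostname.
hostnamectl set-hostname <correct-hostname-fqdn>

# Restart the runtime and the kubelet so the node re-registers under that name.
systemctl restart docker
systemctl restart kubelet
```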

88plug commented 2 years ago

In my case, kubelet could not find the node because the /etc/hostname file had been edited, which changed the reported hostname to one that kube-apiserver could not resolve. I had to correct the node hostname with hostnamectl set-hostname <correct-hostname-fqdn>. After that, I restarted the kubelet and docker services, and all the nodes went back into the Ready state.

This works! I had the problem after kubespray and was able to start node1 again with hostnamectl set-hostname node1.

adarshvn commented 2 years ago

I have a similar issue. My setup has 2 master nodes, 2 worker nodes, and an HAProxy load balancer. The VMs got rebooted, and since then the kubelet service is not able to communicate with the API server. I tried swapoff -a, restarting the services, and stopping iptables to recover. The certificates have not expired either.

eviction_manager.go:254] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"cn-manager1\" not found"
kubelet[4948]: E0505 02:15:16.075428 4948 kubelet.go:2422] "Error getting node" err="node \"cn-manager1\" not found"

HAProxy service is up and running:

    systemctl status haproxy
    ● haproxy.service - HAProxy Load Balancer
       Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; vendor preset: disabled)
       Active: active (running) since Thu 2022-05-05 04:37:58 EDT; 35min ago
     Main PID: 10946 (haproxy-systemd)
        Tasks: 3
       CGroup: /system.slice/haproxy.service
               ├─10946 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
               ├─10948 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
               └─10949 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

The relevant haproxy.cfg sections:

    #---------------------------------------------------------------------
    # apiserver frontend which proxys to the control plane nodes
    #---------------------------------------------------------------------
    frontend Ingress
        bind 192.168.56.14:443
        mode tcp
        option tcplog
        default_backend apiserver

    #---------------------------------------------------------------------
    # round robin balancing for apiserver
    #---------------------------------------------------------------------
    backend apiserver
        option httpchk GET /healthz
        http-check expect status 200
        mode tcp
        option ssl-hello-chk
        balance roundrobin
        server CN-manager1 192.168.56.10:6443 check
        server CN-manager2 192.168.56.11:6443 check
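To narrow down whether the load balancer or the apiservers themselves are at fault, a couple of probes against the addresses from the config above (`-k` skips certificate verification; /healthz is the same endpoint HAProxy's health check uses):

```bash
# Through the HAProxy frontend.
curl -k https://192.168.56.14:443/healthz

# Directly against each control-plane apiserver.
curl -k https://192.168.56.10:6443/healthz
curl -k https://192.168.56.11:6443/healthz
```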

adarshvn commented 2 years ago

Have the same problem that @PierrickI3 does. After the reboot the node's control plane is down. The kubelet service is running and keeps trying to connect to the non-running apiserver. etcd is running. There are no CNI network interfaces, only loopback, ethernet, and docker.

There is no particular error to be seen anywhere; it just does not start, though it worked fine for a long time before the reboot. I have tried everything mentioned in this thread as well as in many others.

I am utterly confused about what starts what here, which makes it hard to investigate further. Does the kubelet service start the CNI network and then the control plane pods (e.g. apiserver, scheduler, etc.)? I have checked `docker ps -a`: none of the control plane containers has even been attempted, and there is no restart policy on them either. So why does kubelet try to talk to the API server when it has not started it?

Hi, this looks similar to the issue we have. Do you have an RCA for your issue?

punitporwal07 commented 2 years ago

I have the same issue: the kube-apiserver container is not stable and exits with the following error:

F0508 21:29:01.915056 1 storage_decorator.go:57] Unable to create storage backend: config (&{ /registry [https://127.0.0.1:2379] /etc/kubernetes/pki/apiserver-etcd-client.key /etc/kubernetes/pki/apiserver-etcd-client.crt /etc/kubernetes/pki/etcd/ca.crt true true 1000 0xc00011d8c0 <nil> 5m0s 1m0s}), err (context deadline exceeded)

After checking the etcd container, it is being terminated with the following error:

2022-05-08 21:30:03.127369 N | pkg/osutil: received terminated signal, shutting down... WARNING: 2022/05/08 21:30:03 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused"; Reconnecting to {127.0.0.1:2379 0 <nil>}

Appreciate help on this; ta.

punitporwal07 commented 2 years ago

This is resolved now, after I renewed my etcd certificates, which had expired.
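For anyone else landing here, a sketch of the check-and-renew cycle, assuming the default kubeadm paths (on older releases the renew subcommand lives under `kubeadm alpha certs renew ...`):

```bash
# When does the etcd serving certificate expire?
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -enddate

# Renew the etcd-related certificates.
kubeadm certs renew etcd-server
kubeadm certs renew etcd-peer
kubeadm certs renew etcd-healthcheck-client
kubeadm certs renew apiserver-etcd-client

# Restart the runtime and the kubelet so etcd and the apiserver pick up the renewed certificates.
systemctl restart docker
systemctl restart kubelet
```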

harveyjing commented 3 months ago

Same issue. After I forcibly shut down my control node, all containers are Exited, so the api-server is not up. I have been blocked for more than two days.
