Closed GuillaumeDorschner closed 1 year ago
Hello @jiaqiluo, could you help me? I'm currently stuck.
I got the same issue. With internet access, RKE runs without errors. Without internet, I have the full list of Docker images, but it always gets stuck at the rke-network-plugin job.
Hi @GuillaumeDorschner, if I understand your needs correctly, you need to configure a private registry for RKE to work in an air-gapped env. See the RKE docs.
Regarding the rke-network-plugin-deploy-job error: when RKE returns the error, there should already be a kubeconfig.yaml file in your working directory. Can you use it with kubectl to check the error message on the job or the pod?
If there is no kubeconfig file or it does not work, can you try to SSH into the control plane node and check the corresponding containers and their status/errors? (Or they may be on the worker nodes; it will be easier if you can make a new cluster with a single node holding all roles.)
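For reference, a minimal `private_registries` fragment for `cluster.yml` might look like this (the registry URL and credentials below are placeholders):

```yaml
# cluster.yml fragment - placeholder registry URL and credentials
private_registries:
  - url: registry.example.com:5000   # your air-gapped registry
    user: admin                      # omit if the registry allows anonymous pulls
    password: secret
    is_default: true                 # prefix all system images with this registry
```

With `is_default: true`, RKE rewrites the system image names to pull from this registry instead of Docker Hub.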
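Concretely, the checks above might look like this (assuming the kubeconfig RKE writes to the working directory, `kube_config_cluster.yml`):

```shell
# Point kubectl at the kubeconfig RKE left in the working directory
export KUBECONFIG=./kube_config_cluster.yml

# Job-level status and events
kubectl -n kube-system describe job rke-network-plugin-deploy-job

# Pods created by the job, plus their logs and events
kubectl -n kube-system get pods -l job-name=rke-network-plugin-deploy-job
kubectl -n kube-system logs -l job-name=rke-network-plugin-deploy-job --tail=50
kubectl -n kube-system describe pod -l job-name=rke-network-plugin-deploy-job
```

These commands need a reachable API server, so they only help once RKE has gotten that far.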
cc @beanbao22
@jiaqiluo I run on only one node with all roles. I have the private registry defined in cluster.yml, but it always gets stuck at the rke-network-plugin job.
I downloaded the image list from the command **rke config --system-images**.
If I try with internet access, it works without any issue.
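(For anyone following the air-gap flow: the image list from `rke config --system-images` is usually mirrored into the private registry roughly like this; the registry name below is a placeholder:)

```shell
# Mirror RKE system images into a private registry (placeholder name)
REGISTRY=registry.example.com:5000

# On a machine with internet access
rke config --system-images 2>/dev/null > system-images.txt

while read -r image; do
  docker pull "${image}"
  docker tag  "${image}" "${REGISTRY}/${image}"
  docker push "${REGISTRY}/${image}"
done < system-images.txt
```

The nodes then pull everything from the private registry once `private_registries` is set in `cluster.yml`.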
Update: I checked the logs from pod rke-network-plugin-deploy-job-xxxx:
unable to ensure pod container exists: failed to create container for [kubepods besteffort pod7bda81ff-7df8-4b75-b22c-7cf662c272bd] : unable to start unit "kubepods-besteffort-pod7bda81ff_7df8_4b75_b22c_7cf662c272bd.slice" (properties [{Name:Description Value:"libcontainer container kubepods-besteffort-pod7bda81ff_7df8_4b75_b22c_7cf662c272bd.slice"} {Name:Wants Value:["kubepods-besteffort.slice"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Unit kubepods-besteffort.slice not found.
@beanbao22, it looks like you ran into https://github.com/rancher/rke/issues/3160, which mentions the `Unit kubepods-besteffort.slice not found` error.
I think it's not the real issue, because with internet access I can install without any issue. Same OS, Docker, RKE version, images...
@jiaqiluo I have the kube_config_cluster.yml and I looked into the cluster.
Cluster Information
➜ rke-setup kubectl get nodes
NAME STATUS ROLES AGE VERSION
master NotReady controlplane,etcd 18m v1.26.6
worker1 NotReady worker 18m v1.26.6
➜ rke-setup kubectl get componentstatuses
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
➜ rke-setup kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
rke-network-plugin-deploy-job-b7f6n 0/1 Error 0 13m
rke-network-plugin-deploy-job-bwcxg 0/1 Error 0 7m49s
rke-network-plugin-deploy-job-gp7vq 0/1 Error 0 105s
rke-network-plugin-deploy-job-htz4h 0/1 Error 0 18m
rke-network-plugin-deploy-job-l4f9b 0/1 Error 0 18m
rke-network-plugin-deploy-job-pc5dj 0/1 Error 0 18m
rke-network-plugin-deploy-job-rtpdp 0/1 Error 0 18m
rke-network-plugin-deploy-job-sxpqj 0/1 Error 0 15m
rke-network-plugin-deploy-job-xx28t 0/1 Error 0 17m
Describe
➜ rke-setup kubectl describe pod rke-network-plugin-deploy-job-b7f6n -n kube-system
Name: rke-network-plugin-deploy-job-b7f6n
Namespace: kube-system
Priority: 0
Service Account: rke-job-deployer
Node: master/
Start Time: Wed, 30 Aug 2023 09:54:24 +0200
Labels: controller-uid=b675948d-42a8-4a29-a838-3055a3b57b63
job-name=rke-network-plugin-deploy-job
Annotations: <none>
Status: Failed
IP:
IPs: <none>
Controlled By: Job/rke-network-plugin-deploy-job
Containers:
rke-network-plugin-pod:
Container ID: docker://3832335a3e9ea74ebe08fce60205fa7e903ef62fec017ae7afd97c3daa4ff5e0
Image: repo.labo.bi:8082/rancher/hyperkube:v1.26.6-rancher1
Image ID: docker-pullable://rancher/hyperkube@sha256:45e9e5a04b65afa3a291bbc458f0076cfb1d725bd550d7c71b9dbaa25387091e
Port: <none>
Host Port: <none>
Command:
kubectl
apply
-f
/etc/config/rke-network-plugin.yaml
State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 30 Aug 2023 09:54:25 +0200
Finished: Wed, 30 Aug 2023 09:54:25 +0200
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-q9rn7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rke-network-plugin
Optional: false
kube-api-access-q9rn7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 11m kubelet Container image "repo.labo.bi:8082/rancher/hyperkube:v1.26.6-rancher1" already present on machine
Normal Created 11m kubelet Created container rke-network-plugin-pod
Normal Started 11m kubelet Started container rke-network-plugin-pod
Logs of all the pods
➜ rke-setup kubectl logs rke-network-plugin-deploy-job-b7f6n -n kube-system
kubectl logs rke-network-plugin-deploy-job-bwcxg -n kube-system
kubectl logs rke-network-plugin-deploy-job-gp7vq -n kube-system
kubectl logs rke-network-plugin-deploy-job-htz4h -n kube-system
kubectl logs rke-network-plugin-deploy-job-l4f9b -n kube-system
kubectl logs rke-network-plugin-deploy-job-pc5dj -n kube-system
kubectl logs rke-network-plugin-deploy-job-px6vm -n kube-system
kubectl logs rke-network-plugin-deploy-job-rtpdp -n kube-system
kubectl logs rke-network-plugin-deploy-job-sxpqj -n kube-system
kubectl logs rke-network-plugin-deploy-job-xx28t -n kube-system
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
@jiaqiluo, I've tried running RKE again, but this time with the eth interface enabled for internet access. I didn't encounter any errors this time, as you can see from the attached screenshot. However, this setup is only good for testing. My final cluster won't have internet access, so I'm at an impasse.
@GuillaumeDorschner
The error `Error from server: no preferred addresses found; known addresses: []` is returned by K8s when it tries to allocate the pod to the specific node (the `master` node in your case) but can't get any address (the `external address` in your case).
This is confirmed by 1) the `master` node being NotReady and 2) the value `Node: master/` in the output of `kubectl describe pod rke-network-plugin-deploy-job-b7f6n -n kube-system`; there is supposed to be an address after the slash (`/`).
Now, the problem becomes why the nodes are in the NotReady status.
There could be a variety of reasons, and you can try the following to diagnose it:
- `kubectl describe node master` - look in the Conditions section for any failed k8s components.
- `rke -d up cluster.yml` - again, we are looking for any error or warning that might tell us why the node is not ready.
- `kubectl get pods -A` - any failed pod besides the `rke-network-plugin-xxx` one?
@jiaqiluo
Okay, I don't see anything suspicious in the logs or in the output of describe node master.
Logs:
➜ rke-setup ./rke -d up cluster.yml
DEBU[0000] Loglevel set to [debug]
INFO[0000] Running RKE version: v1.4.7
DEBU[0000] audit log policy found in cluster.yml
INFO[0000] Initiating Kubernetes cluster
DEBU[0000] Loading data.json from local source
DEBU[0000] data.json SHA256 checksum: 5f13312d74be24e7121c58dc299f16a728dac04d4635644e7655db30bdc76f68
DEBU[0000] No DNS provider configured, setting default based on cluster version [1.26.6-rancher1-1]
DEBU[0000] DNS provider set to [coredns]
DEBU[0000] Checking if cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0000] Cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0000] Enabling kube-api audit log for cluster version [v1.26.6-rancher1-1]
DEBU[0000] No input provided for maxUnavailableWorker, setting it to default value of 10 percent
DEBU[0000] No input provided for maxUnavailableControlplane, setting it to default value of 1
DEBU[0000] Checking ingress default backend for cluster version [v1.26.6-rancher1-1]
DEBU[0000] Cluster version [v1.26.6-rancher1-1] needs to have ingress default backend disabled
DEBU[0000] Host: 192.168.137.65 has role: controlplane
DEBU[0000] Host: 192.168.137.65 has role: etcd
DEBU[0000] Host: 192.168.137.62 has role: worker
DEBU[0000] Checking cri-dockerd for cluster version [v1.26.6-rancher1-1]
DEBU[0000] cri-dockerd is enabled for cluster version [v1.26.6-rancher1-1]
DEBU[0000] Checking PodSecurityPolicy for cluster version [v1.26.6-rancher1-1]
DEBU[0000] Checking PodSecurity for cluster version [v1.26.6-rancher1-1]
DEBU[0000] [state] previous state not found, possible legacy cluster
INFO[0000] [dialer] Setup tunnel for host [192.168.137.62]
INFO[0000] [dialer] Setup tunnel for host [192.168.137.65]
DEBU[0000] Connecting to Docker API for host [192.168.137.62]
DEBU[0000] Connecting to Docker API for host [192.168.137.65]
DEBU[0000] Docker Info found for host [192.168.137.65]: types.Info{ID:"d03eca6f-6468-4370-9f1b-008bdb123e0a", Containers:19, ContainersRunning:2, ContainersPaused:0, ContainersStopped:17, Images:28, Driver:"overlay2", DriverStatus:[][2]string{[2]string{"Backing Filesystem", "xfs"}, [2]string{"Supports d_type", "true"}, [2]string{"Using metacopy", "false"}, [2]string{"Native Overlay Diff", "true"}, [2]string{"userxattr", "false"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string{"local"}, Network:[]string{"bridge", "host", "ipvlan", "macvlan", "null", "overlay"}, Authorization:[]string(nil), Log:[]string{"awslogs", "fluentd", "gcplogs", "gelf", "journald", "json-file", "local", "logentries", "splunk", "syslog"}}, MemoryLimit:true, SwapLimit:true, KernelMemory:true, KernelMemoryTCP:true, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:true, CPUSet:true, PidsLimit:true, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:37, OomKillDisable:true, NGoroutines:43, SystemTime:"2023-08-30T23:30:12.206004187+02:00", LoggingDriver:"json-file", CgroupDriver:"cgroupfs", CgroupVersion:"1", NEventsListener:0, KernelVersion:"4.18.0-372.9.1.el8.x86_64", OperatingSystem:"AlmaLinux 8.6 (Sky Tiger)", OSVersion:"8.6", OSType:"linux", Architecture:"x86_64", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc0001b4150), NCPU:32, MemTotal:269458673664, GenericResources:[]swarm.GenericResource(nil), DockerRootDir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"localhost.localdomain", Labels:[]string{}, ExperimentalBuild:false, ServerVersion:"23.0.6", ClusterStore:"", ClusterAdvertise:"", Runtimes:map[string]types.Runtime{"io.containerd.runc.v2":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "runc":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}}, DefaultRuntime:"runc", Swarm:swarm.Info{NodeID:"", NodeAddr:"", 
LocalNodeState:"inactive", ControlAvailable:false, Error:"", RemoteManagers:[]swarm.Peer(nil), Nodes:0, Managers:0, Cluster:(*swarm.ClusterInfo)(nil), Warnings:[]string(nil)}, LiveRestoreEnabled:false, Isolation:"", InitBinary:"docker-init", ContainerdCommit:types.Commit{ID:"8165feabfdfe38c65b599c4993d227328c231fca", Expected:"8165feabfdfe38c65b599c4993d227328c231fca"}, RuncCommit:types.Commit{ID:"v1.1.8-0-g82f18fe", Expected:"v1.1.8-0-g82f18fe"}, InitCommit:types.Commit{ID:"de40ad0", Expected:"de40ad0"}, SecurityOptions:[]string{"name=seccomp,profile=builtin"}, ProductLicense:"", DefaultAddressPools:[]types.NetworkAddressPool(nil), Warnings:[]string(nil)}
DEBU[0000] Docker Info found for host [192.168.137.62]: types.Info{ID:"03725a2b-a240-48ad-9a37-fc4b724cc262", Containers:37, ContainersRunning:20, ContainersPaused:0, ContainersStopped:17, Images:32, Driver:"overlay2", DriverStatus:[][2]string{[2]string{"Backing Filesystem", "xfs"}, [2]string{"Supports d_type", "true"}, [2]string{"Using metacopy", "false"}, [2]string{"Native Overlay Diff", "true"}, [2]string{"userxattr", "false"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string{"local"}, Network:[]string{"bridge", "host", "ipvlan", "macvlan", "null", "overlay"}, Authorization:[]string(nil), Log:[]string{"awslogs", "fluentd", "gcplogs", "gelf", "journald", "json-file", "local", "logentries", "splunk", "syslog"}}, MemoryLimit:true, SwapLimit:true, KernelMemory:true, KernelMemoryTCP:true, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:true, CPUSet:true, PidsLimit:true, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:135, OomKillDisable:true, NGoroutines:115, SystemTime:"2023-08-30T23:30:09.477999904+02:00", LoggingDriver:"json-file", CgroupDriver:"cgroupfs", CgroupVersion:"1", NEventsListener:0, KernelVersion:"4.18.0-372.9.1.el8.x86_64", OperatingSystem:"AlmaLinux 8.6 (Sky Tiger)", OSVersion:"8.6", OSType:"linux", Architecture:"x86_64", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc0001ac000), NCPU:64, MemTotal:269855334400, GenericResources:[]swarm.GenericResource(nil), DockerRootDir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"localhost.localdomain", Labels:[]string{}, ExperimentalBuild:false, ServerVersion:"23.0.6", ClusterStore:"", ClusterAdvertise:"", Runtimes:map[string]types.Runtime{"io.containerd.runc.v2":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "runc":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}}, DefaultRuntime:"runc", Swarm:swarm.Info{NodeID:"", NodeAddr:"", 
LocalNodeState:"inactive", ControlAvailable:false, Error:"", RemoteManagers:[]swarm.Peer(nil), Nodes:0, Managers:0, Cluster:(*swarm.ClusterInfo)(nil), Warnings:[]string(nil)}, LiveRestoreEnabled:false, Isolation:"", InitBinary:"docker-init", ContainerdCommit:types.Commit{ID:"8165feabfdfe38c65b599c4993d227328c231fca", Expected:"8165feabfdfe38c65b599c4993d227328c231fca"}, RuncCommit:types.Commit{ID:"v1.1.8-0-g82f18fe", Expected:"v1.1.8-0-g82f18fe"}, InitCommit:types.Commit{ID:"de40ad0", Expected:"de40ad0"}, SecurityOptions:[]string{"name=seccomp,profile=builtin"}, ProductLicense:"", DefaultAddressPools:[]types.NetworkAddressPool(nil), Warnings:[]string(nil)}
DEBU[0000] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0000] Finding container [cluster-state-deployer] on host [192.168.137.65], try #1
DEBU[0000] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0000] Finding container [cluster-state-deployer] on host [192.168.137.62], try #1
INFO[0000] [certificates] Generating CA kubernetes certificates
INFO[0000] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates
INFO[0001] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
INFO[0001] [certificates] Generating Kubernetes API server certificates
INFO[0001] [certificates] Generating Service account token key
INFO[0001] [certificates] Generating Kube Controller certificates
INFO[0001] [certificates] Generating Kube Scheduler certificates
INFO[0002] [certificates] Generating Kube Proxy certificates
INFO[0002] [certificates] Generating Node certificate
INFO[0002] [certificates] Generating admin certificates and kubeconfig
INFO[0003] [certificates] Generating Kubernetes API server proxy client certificates
INFO[0003] [certificates] Generating kube-etcd-192-168-137-65 certificate and key
INFO[0004] Successfully Deployed state file at [./cluster.rkestate]
DEBU[0004] Checking if cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0004] Cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0004] Enabling kube-api audit log for cluster version [v1.26.6-rancher1-1]
DEBU[0004] Host: 192.168.137.65 has role: controlplane
DEBU[0004] Host: 192.168.137.65 has role: etcd
DEBU[0004] Host: 192.168.137.62 has role: worker
DEBU[0004] Checking cri-dockerd for cluster version [v1.26.6-rancher1-1]
DEBU[0004] cri-dockerd is enabled for cluster version [v1.26.6-rancher1-1]
DEBU[0004] Checking PodSecurityPolicy for cluster version [v1.26.6-rancher1-1]
DEBU[0004] Checking PodSecurity for cluster version [v1.26.6-rancher1-1]
INFO[0004] Building Kubernetes cluster
INFO[0004] [dialer] Setup tunnel for host [192.168.137.62]
INFO[0004] [dialer] Setup tunnel for host [192.168.137.65]
DEBU[0004] Connecting to Docker API for host [192.168.137.65]
DEBU[0004] Connecting to Docker API for host [192.168.137.62]
DEBU[0004] Docker Info found for host [192.168.137.65]: types.Info{ID:"d03eca6f-6468-4370-9f1b-008bdb123e0a", Containers:19, ContainersRunning:2, ContainersPaused:0, ContainersStopped:17, Images:28, Driver:"overlay2", DriverStatus:[][2]string{[2]string{"Backing Filesystem", "xfs"}, [2]string{"Supports d_type", "true"}, [2]string{"Using metacopy", "false"}, [2]string{"Native Overlay Diff", "true"}, [2]string{"userxattr", "false"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string{"local"}, Network:[]string{"bridge", "host", "ipvlan", "macvlan", "null", "overlay"}, Authorization:[]string(nil), Log:[]string{"awslogs", "fluentd", "gcplogs", "gelf", "journald", "json-file", "local", "logentries", "splunk", "syslog"}}, MemoryLimit:true, SwapLimit:true, KernelMemory:true, KernelMemoryTCP:true, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:true, CPUSet:true, PidsLimit:true, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:38, OomKillDisable:true, NGoroutines:44, SystemTime:"2023-08-30T23:30:16.056844444+02:00", LoggingDriver:"json-file", CgroupDriver:"cgroupfs", CgroupVersion:"1", NEventsListener:0, KernelVersion:"4.18.0-372.9.1.el8.x86_64", OperatingSystem:"AlmaLinux 8.6 (Sky Tiger)", OSVersion:"8.6", OSType:"linux", Architecture:"x86_64", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc0001ac000), NCPU:32, MemTotal:269458673664, GenericResources:[]swarm.GenericResource(nil), DockerRootDir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"localhost.localdomain", Labels:[]string{}, ExperimentalBuild:false, ServerVersion:"23.0.6", ClusterStore:"", ClusterAdvertise:"", Runtimes:map[string]types.Runtime{"io.containerd.runc.v2":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "runc":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}}, DefaultRuntime:"runc", Swarm:swarm.Info{NodeID:"", NodeAddr:"", 
LocalNodeState:"inactive", ControlAvailable:false, Error:"", RemoteManagers:[]swarm.Peer(nil), Nodes:0, Managers:0, Cluster:(*swarm.ClusterInfo)(nil), Warnings:[]string(nil)}, LiveRestoreEnabled:false, Isolation:"", InitBinary:"docker-init", ContainerdCommit:types.Commit{ID:"8165feabfdfe38c65b599c4993d227328c231fca", Expected:"8165feabfdfe38c65b599c4993d227328c231fca"}, RuncCommit:types.Commit{ID:"v1.1.8-0-g82f18fe", Expected:"v1.1.8-0-g82f18fe"}, InitCommit:types.Commit{ID:"de40ad0", Expected:"de40ad0"}, SecurityOptions:[]string{"name=seccomp,profile=builtin"}, ProductLicense:"", DefaultAddressPools:[]types.NetworkAddressPool(nil), Warnings:[]string(nil)}
DEBU[0004] Docker Info found for host [192.168.137.62]: types.Info{ID:"03725a2b-a240-48ad-9a37-fc4b724cc262", Containers:37, ContainersRunning:20, ContainersPaused:0, ContainersStopped:17, Images:32, Driver:"overlay2", DriverStatus:[][2]string{[2]string{"Backing Filesystem", "xfs"}, [2]string{"Supports d_type", "true"}, [2]string{"Using metacopy", "false"}, [2]string{"Native Overlay Diff", "true"}, [2]string{"userxattr", "false"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string{"local"}, Network:[]string{"bridge", "host", "ipvlan", "macvlan", "null", "overlay"}, Authorization:[]string(nil), Log:[]string{"awslogs", "fluentd", "gcplogs", "gelf", "journald", "json-file", "local", "logentries", "splunk", "syslog"}}, MemoryLimit:true, SwapLimit:true, KernelMemory:true, KernelMemoryTCP:true, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:true, CPUSet:true, PidsLimit:true, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:136, OomKillDisable:true, NGoroutines:116, SystemTime:"2023-08-30T23:30:13.267904342+02:00", LoggingDriver:"json-file", CgroupDriver:"cgroupfs", CgroupVersion:"1", NEventsListener:0, KernelVersion:"4.18.0-372.9.1.el8.x86_64", OperatingSystem:"AlmaLinux 8.6 (Sky Tiger)", OSVersion:"8.6", OSType:"linux", Architecture:"x86_64", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc0001ac1c0), NCPU:64, MemTotal:269855334400, GenericResources:[]swarm.GenericResource(nil), DockerRootDir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"localhost.localdomain", Labels:[]string{}, ExperimentalBuild:false, ServerVersion:"23.0.6", ClusterStore:"", ClusterAdvertise:"", Runtimes:map[string]types.Runtime{"io.containerd.runc.v2":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "runc":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}}, DefaultRuntime:"runc", Swarm:swarm.Info{NodeID:"", NodeAddr:"", 
LocalNodeState:"inactive", ControlAvailable:false, Error:"", RemoteManagers:[]swarm.Peer(nil), Nodes:0, Managers:0, Cluster:(*swarm.ClusterInfo)(nil), Warnings:[]string(nil)}, LiveRestoreEnabled:false, Isolation:"", InitBinary:"docker-init", ContainerdCommit:types.Commit{ID:"8165feabfdfe38c65b599c4993d227328c231fca", Expected:"8165feabfdfe38c65b599c4993d227328c231fca"}, RuncCommit:types.Commit{ID:"v1.1.8-0-g82f18fe", Expected:"v1.1.8-0-g82f18fe"}, InitCommit:types.Commit{ID:"de40ad0", Expected:"de40ad0"}, SecurityOptions:[]string{"name=seccomp,profile=builtin"}, ProductLicense:"", DefaultAddressPools:[]types.NetworkAddressPool(nil), Warnings:[]string(nil)}
INFO[0004] [network] Deploying port listener containers
DEBU[0004] [network] Starting deployListener [rke-etcd-port-listener] on host [192.168.137.65]
DEBU[0004] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0004] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0004] Starting container [rke-etcd-port-listener] on host [192.168.137.65], try #1
INFO[0004] [network] Successfully started [rke-etcd-port-listener] container on host [192.168.137.65]
DEBU[0004] [network] Starting deployListener [rke-cp-port-listener] on host [192.168.137.65]
DEBU[0004] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0004] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0004] Starting container [rke-cp-port-listener] on host [192.168.137.65], try #1
INFO[0005] [network] Successfully started [rke-cp-port-listener] container on host [192.168.137.65]
DEBU[0005] [network] Starting deployListener [rke-worker-port-listener] on host [192.168.137.62]
DEBU[0005] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0005] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0005] Starting container [rke-worker-port-listener] on host [192.168.137.62], try #1
INFO[0005] [network] Successfully started [rke-worker-port-listener] container on host [192.168.137.62]
INFO[0005] [network] Port listener containers deployed successfully
INFO[0005] [network] Running control plane -> etcd port checks
INFO[0005] [network] Checking if host [192.168.137.65] can connect to host(s) [192.168.137.65] on port(s) [2379], try #1
DEBU[0005] [remove/rke-port-checker] Checking if container is running on host [192.168.137.65]
DEBU[0005] [remove/rke-port-checker] Container doesn't exist on host [192.168.137.65]
DEBU[0005] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0005] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0006] Starting container [rke-port-checker] on host [192.168.137.65], try #1
INFO[0006] [network] Successfully started [rke-port-checker] container on host [192.168.137.65]
DEBU[0006] [network] containerLog [] on host: 192.168.137.65
INFO[0006] Removing container [rke-port-checker] on host [192.168.137.65], try #1
DEBU[0006] [network] Length of containerLog is [0] on host: 192.168.137.65
INFO[0006] [network] Running control plane -> worker port checks
INFO[0006] [network] Checking if host [192.168.137.65] can connect to host(s) [192.168.137.62] on port(s) [10250], try #1
DEBU[0006] [remove/rke-port-checker] Checking if container is running on host [192.168.137.65]
DEBU[0006] [remove/rke-port-checker] Container doesn't exist on host [192.168.137.65]
DEBU[0006] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0006] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0006] Starting container [rke-port-checker] on host [192.168.137.65], try #1
INFO[0006] [network] Successfully started [rke-port-checker] container on host [192.168.137.65]
DEBU[0006] [network] containerLog [] on host: 192.168.137.65
INFO[0006] Removing container [rke-port-checker] on host [192.168.137.65], try #1
DEBU[0006] [network] Length of containerLog is [0] on host: 192.168.137.65
INFO[0006] [network] Running workers -> control plane port checks
INFO[0006] [network] Checking if host [192.168.137.62] can connect to host(s) [192.168.137.65] on port(s) [6443], try #1
DEBU[0006] [remove/rke-port-checker] Checking if container is running on host [192.168.137.62]
DEBU[0006] [remove/rke-port-checker] Container doesn't exist on host [192.168.137.62]
DEBU[0006] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0006] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0006] Starting container [rke-port-checker] on host [192.168.137.62], try #1
INFO[0007] [network] Successfully started [rke-port-checker] container on host [192.168.137.62]
DEBU[0007] [network] containerLog [] on host: 192.168.137.62
INFO[0007] Removing container [rke-port-checker] on host [192.168.137.62], try #1
DEBU[0007] [network] Length of containerLog is [0] on host: 192.168.137.62
INFO[0007] [network] Checking KubeAPI port Control Plane hosts
DEBU[0007] [network] Checking KubeAPI port [6443] on host: 192.168.137.65
INFO[0007] [network] Removing port listener containers
DEBU[0007] [remove/rke-etcd-port-listener] Checking if container is running on host [192.168.137.65]
DEBU[0007] [remove/rke-etcd-port-listener] Removing container on host [192.168.137.65]
INFO[0007] Removing container [rke-etcd-port-listener] on host [192.168.137.65], try #1
INFO[0007] [remove/rke-etcd-port-listener] Successfully removed container on host [192.168.137.65]
DEBU[0007] [remove/rke-cp-port-listener] Checking if container is running on host [192.168.137.65]
DEBU[0007] [remove/rke-cp-port-listener] Removing container on host [192.168.137.65]
INFO[0007] Removing container [rke-cp-port-listener] on host [192.168.137.65], try #1
INFO[0007] [remove/rke-cp-port-listener] Successfully removed container on host [192.168.137.65]
DEBU[0007] [remove/rke-worker-port-listener] Checking if container is running on host [192.168.137.62]
DEBU[0007] [remove/rke-worker-port-listener] Removing container on host [192.168.137.62]
INFO[0007] Removing container [rke-worker-port-listener] on host [192.168.137.62], try #1
INFO[0008] [remove/rke-worker-port-listener] Successfully removed container on host [192.168.137.62]
INFO[0008] [network] Port listener containers removed successfully
DEBU[0008] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0008] [certificates] Deploying kubernetes certificates to Cluster nodes
INFO[0008] Finding container [cert-deployer] on host [192.168.137.62], try #1
INFO[0008] Finding container [cert-deployer] on host [192.168.137.65], try #1
DEBU[0008] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0008] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
DEBU[0008] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0008] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0008] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
DEBU[0008] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0008] Starting container [cert-deployer] on host [192.168.137.65], try #1
INFO[0008] Starting container [cert-deployer] on host [192.168.137.62], try #1
DEBU[0008] [certificates] Successfully started Certificate deployer container: cert-deployer
INFO[0008] Finding container [cert-deployer] on host [192.168.137.65], try #1
DEBU[0008] [certificates] Successfully started Certificate deployer container: cert-deployer
INFO[0008] Finding container [cert-deployer] on host [192.168.137.62], try #1
INFO[0013] Finding container [cert-deployer] on host [192.168.137.65], try #1
INFO[0013] Removing container [cert-deployer] on host [192.168.137.65], try #1
INFO[0013] Finding container [cert-deployer] on host [192.168.137.62], try #1
INFO[0013] Removing container [cert-deployer] on host [192.168.137.62], try #1
INFO[0013] [reconcile] Rebuilding and updating local kube config
DEBU[0013] [reconcile] Rebuilding and updating local kube config, creating new kubeconfig
DEBU[0013] Deploying admin Kubeconfig locally at [./kube_config_cluster.yml]
INFO[0013] Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]
DEBU[0013] [version] Using ./kube_config_cluster.yml to connect to Kubernetes cluster..
DEBU[0013] [version] Getting Kubernetes server version..
WARN[0013] [reconcile] host [192.168.137.65] is a control plane node without reachable Kubernetes API endpoint in the cluster
WARN[0013] [reconcile] no control plane node with reachable Kubernetes API endpoint in the cluster found
INFO[0013] [certificates] Successfully deployed kubernetes certificates to Cluster nodes
DEBU[0013] using the default EventRateLimit configuration
DEBU[0013] using the PodSecurity configuration [privileged]
INFO[0013] [file-deploy] Deploying file [/etc/kubernetes/admission.yaml] to node [192.168.137.65]
DEBU[0013] [remove/file-deployer] Checking if container is running on host [192.168.137.65]
DEBU[0013] [remove/file-deployer] Container doesn't exist on host [192.168.137.65]
DEBU[0013] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0013] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0013] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0013] Starting container [file-deployer] on host [192.168.137.65], try #1
INFO[0014] Successfully started [file-deployer] container on host [192.168.137.65]
INFO[0014] Waiting for [file-deployer] container to exit on host [192.168.137.65]
INFO[0014] Waiting for [file-deployer] container to exit on host [192.168.137.65]
INFO[0014] Container [file-deployer] is still running on host [192.168.137.65]: stderr: [], stdout: []
DEBU[0015] Exit code for [file-deployer] container on host [192.168.137.65] is [0]
DEBU[0015] [remove/file-deployer] Checking if container is running on host [192.168.137.65]
DEBU[0015] [remove/file-deployer] Removing container on host [192.168.137.65]
INFO[0015] Removing container [file-deployer] on host [192.168.137.65], try #1
INFO[0015] [remove/file-deployer] Successfully removed container on host [192.168.137.65]
DEBU[0015] [file-deploy] Successfully deployed file [/etc/kubernetes/admission.yaml] on node [192.168.137.65]
INFO[0015] [/etc/kubernetes/admission.yaml] Successfully deployed admission control config to Cluster control nodes
INFO[0015] [file-deploy] Deploying file [/etc/kubernetes/audit-policy.yaml] to node [192.168.137.65]
DEBU[0015] [remove/file-deployer] Checking if container is running on host [192.168.137.65]
DEBU[0015] [remove/file-deployer] Container doesn't exist on host [192.168.137.65]
DEBU[0015] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0015] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0015] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0015] Starting container [file-deployer] on host [192.168.137.65], try #1
INFO[0015] Successfully started [file-deployer] container on host [192.168.137.65]
INFO[0015] Waiting for [file-deployer] container to exit on host [192.168.137.65]
INFO[0015] Waiting for [file-deployer] container to exit on host [192.168.137.65]
INFO[0015] Container [file-deployer] is still running on host [192.168.137.65]: stderr: [], stdout: []
DEBU[0016] Exit code for [file-deployer] container on host [192.168.137.65] is [0]
DEBU[0016] [remove/file-deployer] Checking if container is running on host [192.168.137.65]
DEBU[0016] [remove/file-deployer] Removing container on host [192.168.137.65]
INFO[0016] Removing container [file-deployer] on host [192.168.137.65], try #1
INFO[0016] [remove/file-deployer] Successfully removed container on host [192.168.137.65]
DEBU[0016] [file-deploy] Successfully deployed file [/etc/kubernetes/audit-policy.yaml] on node [192.168.137.65]
INFO[0016] [/etc/kubernetes/audit-policy.yaml] Successfully deployed audit policy file to Cluster control nodes
INFO[0016] [reconcile] Reconciling cluster state
INFO[0016] [reconcile] This is newly generated cluster
DEBU[0016] Encryption is disabled in both current and new spec; no action is required
INFO[0016] Pre-pulling kubernetes images
DEBU[0016] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
DEBU[0016] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62], try #1
INFO[0016] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0016] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62]
INFO[0016] Kubernetes images pulled successfully
DEBU[0016] getDefaultKubernetesServicesOptions: getting serviceOptions for cluster version [v1.26.6-rancher1-1]
DEBU[0016] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0016] getDefaultKubernetesServicesOptions: serviceOptions found for cluster major version [v1.26]
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0016] Extracted version [v3.5.6] from image [rancher/mirrored-coreos-etcd:v3.5.6]
DEBU[0016] etcd version [3.5.6] is higher than max version [3.4.3-rancher99] for advertising port 4001, not going to advertise port 4001
DEBU[0016] etcd version [3.5.6] is higher than max version [3.4.14-rancher99] for adding stricter TLS cipher suites, going to add stricter TLS cipher suites arguments to etcd
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] Version [3.5.6] is equal or higher than version [3.2.99]
INFO[0016] [etcd] Building up etcd plane..
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0016] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0016] Starting container [etcd-fix-perm] on host [192.168.137.65], try #1
INFO[0016] Successfully started [etcd-fix-perm] container on host [192.168.137.65]
INFO[0016] Waiting for [etcd-fix-perm] container to exit on host [192.168.137.65]
INFO[0016] Waiting for [etcd-fix-perm] container to exit on host [192.168.137.65]
INFO[0016] Container [etcd-fix-perm] is still running on host [192.168.137.65]: stderr: [], stdout: []
DEBU[0017] Exit code for [etcd-fix-perm] container on host [192.168.137.65] is [0]
DEBU[0017] [remove/etcd-fix-perm] Checking if container is running on host [192.168.137.65]
DEBU[0017] [remove/etcd-fix-perm] Removing container on host [192.168.137.65]
INFO[0017] Removing container [etcd-fix-perm] on host [192.168.137.65], try #1
INFO[0017] [remove/etcd-fix-perm] Successfully removed container on host [192.168.137.65]
DEBU[0017] Checking if image [rancher/mirrored-coreos-etcd:v3.5.6] exists on host [192.168.137.65], try #1
INFO[0017] Image [rancher/mirrored-coreos-etcd:v3.5.6] exists on host [192.168.137.65]
INFO[0018] Starting container [etcd] on host [192.168.137.65], try #1
INFO[0018] [etcd] Successfully started [etcd] container on host [192.168.137.65]
DEBU[0018] Extracted version [v0.1.89] from image [rancher/rke-tools:v0.1.89]
DEBU[0018] Extracted version [v0.1.89] from image [rancher/rke-tools:v0.1.89]
DEBU[0018] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0018] [etcd] Running rolling snapshot container [etcd-snapshot-once] on host [192.168.137.65]
DEBU[0018] [etcd] Using command [/opt/rke-tools/rke-etcd-backup etcd-backup save --cacert /etc/kubernetes/ssl/kube-ca.pem --cert /etc/kubernetes/ssl/kube-node.pem --key /etc/kubernetes/ssl/kube-node-key.pem --name etcd-rolling-snapshots --endpoints=192.168.137.65:2379 --retention=24h --creation=6h] for rolling snapshot container [etcd-rolling-snapshots] on host [192.168.137.65]
DEBU[0018] [remove/etcd-rolling-snapshots] Checking if container is running on host [192.168.137.65]
DEBU[0018] [remove/etcd-rolling-snapshots] Container doesn't exist on host [192.168.137.65]
DEBU[0018] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0018] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0018] Starting container [etcd-rolling-snapshots] on host [192.168.137.65], try #1
INFO[0018] [etcd] Successfully started [etcd-rolling-snapshots] container on host [192.168.137.65]
DEBU[0023] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0023] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0023] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0023] Starting container [rke-bundle-cert] on host [192.168.137.65], try #1
INFO[0023] [certificates] Successfully started [rke-bundle-cert] container on host [192.168.137.65]
INFO[0023] Waiting for [rke-bundle-cert] container to exit on host [192.168.137.65]
INFO[0023] Container [rke-bundle-cert] is still running on host [192.168.137.65]: stderr: [], stdout: []
DEBU[0024] Exit code for [rke-bundle-cert] container on host [192.168.137.65] is [0]
INFO[0024] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.137.65]
INFO[0024] Removing container [rke-bundle-cert] on host [192.168.137.65], try #1
DEBU[0024] [etcd] Creating log link for Container [etcd-rolling-snapshots] on host [192.168.137.65]
DEBU[0024] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0024] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0024] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0024] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0025] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0025] [etcd] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0025] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0025] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0025] [etcd] Successfully created log link for Container [etcd-rolling-snapshots] on host [192.168.137.65]
DEBU[0025] [etcd] Creating log link for Container [etcd] on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0025] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0025] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0025] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0025] [etcd] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0025] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0026] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0026] [etcd] Successfully created log link for Container [etcd] on host [192.168.137.65]
INFO[0026] [etcd] Successfully started etcd plane.. Checking etcd cluster health
DEBU[0026] [etcd] check etcd cluster health on host [192.168.137.65]
INFO[0026] [etcd] etcd host [192.168.137.65] reported healthy=true
DEBU[0026] getDefaultKubernetesServicesOptions: getting serviceOptions for cluster version [v1.26.6-rancher1-1]
DEBU[0026] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0026] getDefaultKubernetesServicesOptions: serviceOptions found for cluster major version [v1.26]
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0026] Extracted version [v3.5.6] from image [rancher/mirrored-coreos-etcd:v3.5.6]
DEBU[0026] etcd version [3.5.6] is higher than max version [3.4.3-rancher99] for advertising port 4001, not going to advertise port 4001
DEBU[0026] etcd version [3.5.6] is higher than max version [3.4.14-rancher99] for adding stricter TLS cipher suites, going to add stricter TLS cipher suites arguments to etcd
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] Version [3.5.6] is equal or higher than version [3.2.99]
INFO[0026] [controlplane] Building up Controller Plane..
INFO[0026] Finding container [service-sidekick] on host [192.168.137.65], try #1
DEBU[0026] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0026] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
DEBU[0026] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
INFO[0026] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0026] Starting container [kube-apiserver] on host [192.168.137.65], try #1
INFO[0026] [controlplane] Successfully started [kube-apiserver] container on host [192.168.137.65]
INFO[0026] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.137.65]
DEBU[0028] [healthcheck] Service [kube-apiserver] is not healthy on host [192.168.137.65]. Response code: [403], response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"kube-apiserver\" cannot get path \"/healthz\"","reason":"Forbidden","details":{},"code":403}
, try #1
INFO[0033] [healthcheck] service [kube-apiserver] on host [192.168.137.65] is healthy
DEBU[0033] [controlplane] Creating log link for Container [kube-apiserver] on host [192.168.137.65]
DEBU[0033] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0033] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0033] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0033] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0033] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0034] [controlplane] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0034] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0034] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0034] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0034] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0034] [controlplane] Successfully created log link for Container [kube-apiserver] on host [192.168.137.65]
DEBU[0034] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
INFO[0034] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0034] Starting container [kube-controller-manager] on host [192.168.137.65], try #1
INFO[0034] [controlplane] Successfully started [kube-controller-manager] container on host [192.168.137.65]
INFO[0034] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.137.65]
DEBU[0034] [healthcheck] Failed to check https://localhost:10257/healthz for service [kube-controller-manager] on host [192.168.137.65]: Get "https://localhost:10257/healthz": Unable to access the service on localhost:10257. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #1
INFO[0039] [healthcheck] service [kube-controller-manager] on host [192.168.137.65] is healthy
DEBU[0039] [controlplane] Creating log link for Container [kube-controller-manager] on host [192.168.137.65]
DEBU[0039] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0039] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0039] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0039] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0039] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0040] [controlplane] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0040] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0040] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0040] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0040] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0040] [controlplane] Successfully created log link for Container [kube-controller-manager] on host [192.168.137.65]
DEBU[0040] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
INFO[0040] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0040] Starting container [kube-scheduler] on host [192.168.137.65], try #1
INFO[0040] [controlplane] Successfully started [kube-scheduler] container on host [192.168.137.65]
INFO[0040] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.137.65]
DEBU[0040] [healthcheck] Failed to check https://localhost:10259/healthz for service [kube-scheduler] on host [192.168.137.65]: Get "https://localhost:10259/healthz": Unable to access the service on localhost:10259. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #1
INFO[0045] [healthcheck] service [kube-scheduler] on host [192.168.137.65] is healthy
DEBU[0045] [controlplane] Creating log link for Container [kube-scheduler] on host [192.168.137.65]
DEBU[0045] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0045] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0045] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0045] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0045] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0046] [controlplane] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0046] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0046] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0046] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0046] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0046] [controlplane] Successfully created log link for Container [kube-scheduler] on host [192.168.137.65]
INFO[0046] [controlplane] Successfully started Controller Plane..
DEBU[0046] Checking if cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0046] Cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0046] Enabling kube-api audit log for cluster version [v1.26.6-rancher1-1]
DEBU[0046] Host: 192.168.137.65 has role: controlplane
DEBU[0046] Host: 192.168.137.65 has role: etcd
DEBU[0046] Host: 192.168.137.62 has role: worker
DEBU[0046] Checking cri-dockerd for cluster version [v1.26.6-rancher1-1]
DEBU[0046] cri-dockerd is enabled for cluster version [v1.26.6-rancher1-1]
DEBU[0046] Checking PodSecurityPolicy for cluster version [v1.26.6-rancher1-1]
DEBU[0046] Checking PodSecurity for cluster version [v1.26.6-rancher1-1]
INFO[0046] [authz] Creating rke-job-deployer ServiceAccount
INFO[0046] [authz] rke-job-deployer ServiceAccount created successfully
INFO[0046] [authz] Creating system:node ClusterRoleBinding
INFO[0046] [authz] system:node ClusterRoleBinding created successfully
INFO[0046] [authz] Creating kube-apiserver proxy ClusterRole and ClusterRoleBinding
INFO[0046] [authz] kube-apiserver proxy ClusterRole and ClusterRoleBinding created successfully
INFO[0046] Successfully Deployed state file at [./cluster.rkestate]
INFO[0046] [state] Saving full cluster state to Kubernetes
INFO[0046] [state] Successfully Saved full cluster state to Kubernetes ConfigMap: full-cluster-state
DEBU[0046] getDefaultKubernetesServicesOptions: getting serviceOptions for cluster version [v1.26.6-rancher1-1]
DEBU[0046] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0046] getDefaultKubernetesServicesOptions: serviceOptions found for cluster major version [v1.26]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0046] Extracted version [v3.5.6] from image [rancher/mirrored-coreos-etcd:v3.5.6]
DEBU[0046] etcd version [3.5.6] is higher than max version [3.4.3-rancher99] for advertising port 4001, not going to advertise port 4001
DEBU[0046] etcd version [3.5.6] is higher than max version [3.4.14-rancher99] for adding stricter TLS cipher suites, going to add stricter TLS cipher suites arguments to etcd
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] Version [3.5.6] is equal or higher than version [3.2.99]
DEBU[0046] getDefaultKubernetesServicesOptions: getting serviceOptions for cluster version [v1.26.6-rancher1-1]
DEBU[0046] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0046] getDefaultKubernetesServicesOptions: serviceOptions found for cluster major version [v1.26]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0046] [worker] Building up Worker Plane..
INFO[0046] Finding container [service-sidekick] on host [192.168.137.65], try #1
DEBU[0046] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0046] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
DEBU[0046] [sidekick] Checking if container [service-sidekick] is eligible for upgrade on host [192.168.137.65]
DEBU[0046] [sidekick] Container [service-sidekick] is not eligible for upgrade on host [192.168.137.65]
INFO[0046] [sidekick] Sidekick container already created on host [192.168.137.65]
DEBU[0046] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
INFO[0046] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0046] Starting container [kubelet] on host [192.168.137.65], try #1
INFO[0046] Starting container [nginx-proxy] on host [192.168.137.62], try #1
INFO[0046] [worker] Successfully started [kubelet] container on host [192.168.137.65]
INFO[0046] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.137.65]
DEBU[0046] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.65]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #1
INFO[0046] [worker] Successfully started [nginx-proxy] container on host [192.168.137.62]
DEBU[0046] [worker] Creating log link for Container [nginx-proxy] on host [192.168.137.62]
DEBU[0046] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0046] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.62]
DEBU[0046] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0046] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0046] Starting container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0047] [worker] Successfully started [rke-log-linker] container on host [192.168.137.62]
DEBU[0047] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0047] [remove/rke-log-linker] Removing container on host [192.168.137.62]
INFO[0047] Removing container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0047] [remove/rke-log-linker] Successfully removed container on host [192.168.137.62]
DEBU[0047] [worker] Successfully created log link for Container [nginx-proxy] on host [192.168.137.62]
INFO[0047] Finding container [service-sidekick] on host [192.168.137.62], try #1
DEBU[0047] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0047] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
DEBU[0047] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62], try #1
INFO[0047] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62]
INFO[0047] Starting container [kubelet] on host [192.168.137.62], try #1
INFO[0047] [worker] Successfully started [kubelet] container on host [192.168.137.62]
INFO[0047] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.137.62]
DEBU[0048] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.62]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #1
DEBU[0051] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.65]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #2
DEBU[0053] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.62]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #2
DEBU[0056] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.65]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #3
DEBU[0058] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.62]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #3
INFO[0062] [healthcheck] service [kubelet] on host [192.168.137.65] is healthy
DEBU[0062] [worker] Creating log link for Container [kubelet] on host [192.168.137.65]
DEBU[0062] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0062] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0062] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0062] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0062] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0062] [worker] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0062] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0062] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0062] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0062] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0062] [worker] Successfully created log link for Container [kubelet] on host [192.168.137.65]
DEBU[0062] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
INFO[0062] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0062] Starting container [kube-proxy] on host [192.168.137.65], try #1
INFO[0062] [worker] Successfully started [kube-proxy] container on host [192.168.137.65]
INFO[0062] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.137.65]
DEBU[0062] [healthcheck] Failed to check http://localhost:10256/healthz for service [kube-proxy] on host [192.168.137.65]: Get "http://localhost:10256/healthz": Unable to access the service on localhost:10256. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #1
INFO[0063] [healthcheck] service [kubelet] on host [192.168.137.62] is healthy
DEBU[0063] [worker] Creating log link for Container [kubelet] on host [192.168.137.62]
DEBU[0063] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0063] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.62]
DEBU[0063] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0063] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0063] Starting container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0064] [worker] Successfully started [rke-log-linker] container on host [192.168.137.62]
DEBU[0064] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0064] [remove/rke-log-linker] Removing container on host [192.168.137.62]
INFO[0064] Removing container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0064] [remove/rke-log-linker] Successfully removed container on host [192.168.137.62]
DEBU[0064] [worker] Successfully created log link for Container [kubelet] on host [192.168.137.62]
DEBU[0064] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62], try #1
INFO[0064] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62]
INFO[0064] Starting container [kube-proxy] on host [192.168.137.62], try #1
INFO[0064] [worker] Successfully started [kube-proxy] container on host [192.168.137.62]
INFO[0064] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.137.62]
INFO[0064] [healthcheck] service [kube-proxy] on host [192.168.137.62] is healthy
DEBU[0064] [worker] Creating log link for Container [kube-proxy] on host [192.168.137.62]
DEBU[0064] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0064] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.62]
DEBU[0064] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0064] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0064] Starting container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0065] [worker] Successfully started [rke-log-linker] container on host [192.168.137.62]
DEBU[0065] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0065] [remove/rke-log-linker] Removing container on host [192.168.137.62]
INFO[0065] Removing container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0065] [remove/rke-log-linker] Successfully removed container on host [192.168.137.62]
DEBU[0065] [worker] Successfully created log link for Container [kube-proxy] on host [192.168.137.62]
INFO[0068] [healthcheck] service [kube-proxy] on host [192.168.137.65] is healthy
DEBU[0068] [worker] Creating log link for Container [kube-proxy] on host [192.168.137.65]
DEBU[0068] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0068] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0068] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0068] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0068] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0068] [worker] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0068] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0068] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0068] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0068] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0068] [worker] Successfully created log link for Container [kube-proxy] on host [192.168.137.65]
INFO[0068] [worker] Successfully started Worker Plane..
DEBU[0068] [cleanup] Starting log link cleanup on host [192.168.137.62]
DEBU[0068] [remove/rke-log-cleaner] Checking if container is running on host [192.168.137.62]
DEBU[0068] [cleanup] Starting log link cleanup on host [192.168.137.65]
DEBU[0068] [remove/rke-log-cleaner] Checking if container is running on host [192.168.137.65]
DEBU[0068] [remove/rke-log-cleaner] Container doesn't exist on host [192.168.137.62]
DEBU[0068] [remove/rke-log-cleaner] Container doesn't exist on host [192.168.137.65]
DEBU[0068] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
DEBU[0068] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0068] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0068] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0068] Starting container [rke-log-cleaner] on host [192.168.137.65], try #1
INFO[0068] Starting container [rke-log-cleaner] on host [192.168.137.62], try #1
INFO[0069] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.137.65]
DEBU[0069] [remove/rke-log-cleaner] Checking if container is running on host [192.168.137.65]
DEBU[0069] [remove/rke-log-cleaner] Removing container on host [192.168.137.65]
INFO[0069] Removing container [rke-log-cleaner] on host [192.168.137.65], try #1
INFO[0069] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.137.62]
DEBU[0069] [remove/rke-log-cleaner] Checking if container is running on host [192.168.137.62]
DEBU[0069] [remove/rke-log-cleaner] Removing container on host [192.168.137.62]
INFO[0069] Removing container [rke-log-cleaner] on host [192.168.137.62], try #1
INFO[0069] [remove/rke-log-cleaner] Successfully removed container on host [192.168.137.65]
DEBU[0069] [cleanup] Successfully cleaned up log links on host [192.168.137.65]
INFO[0069] [remove/rke-log-cleaner] Successfully removed container on host [192.168.137.62]
DEBU[0069] [cleanup] Successfully cleaned up log links on host [192.168.137.62]
INFO[0069] [sync] Syncing nodes Labels and Taints
DEBU[0069] worker [9] starting sync for node [master]
DEBU[0069] Checking node list for node [master], try #1
DEBU[0069] worker [4] starting sync for node [worker1]
DEBU[0069] Checking node list for node [worker1], try #1
INFO[0069] [sync] Successfully synced nodes Labels and Taints
DEBU[0069] Checking if cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0069] Cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0069] Enabling kube-api audit log for cluster version [v1.26.6-rancher1-1]
DEBU[0069] Host: 192.168.137.65 has role: controlplane
DEBU[0069] Host: 192.168.137.65 has role: etcd
DEBU[0069] Host: 192.168.137.62 has role: worker
DEBU[0069] Checking cri-dockerd for cluster version [v1.26.6-rancher1-1]
DEBU[0069] cri-dockerd is enabled for cluster version [v1.26.6-rancher1-1]
DEBU[0069] Checking PodSecurityPolicy for cluster version [v1.26.6-rancher1-1]
DEBU[0069] Checking PodSecurity for cluster version [v1.26.6-rancher1-1]
INFO[0069] [network] Setting up network plugin: calico
INFO[0069] [addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0069] [addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0069] [addons] Executing deploy job rke-network-plugin
DEBU[0069] Checking node list for node [master], try #1
DEBU[0069] Checking addon job OS label for cluster version [v1.26.6-rancher1-1]
DEBU[0069] Cluster version [v1.26.6-rancher1-1] needs to use new OS label
DEBU[0069] [k8s] waiting for job rke-network-plugin-deploy-job to complete..
FATA[0114] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
Describe Master:
➜ rke-setup kubectl describe node master
Name: master
Roles: controlplane,etcd
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=master
kubernetes.io/os=linux
node-role.kubernetes.io/controlplane=true
node-role.kubernetes.io/etcd=true
Annotations: node.alpha.kubernetes.io/ttl: 0
rke.cattle.io/external-ip: 192.168.137.65
rke.cattle.io/internal-ip: 192.168.137.65
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 30 Aug 2023 23:31:10 +0200
Taints: node-role.kubernetes.io/etcd=true:NoExecute
node-role.kubernetes.io/controlplane=true:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: master
AcquireTime: <unset>
RenewTime: Wed, 30 Aug 2023 23:34:13 +0200
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 30 Aug 2023 23:31:20 +0200 Wed, 30 Aug 2023 23:31:10 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 30 Aug 2023 23:31:20 +0200 Wed, 30 Aug 2023 23:31:10 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 30 Aug 2023 23:31:20 +0200 Wed, 30 Aug 2023 23:31:10 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 30 Aug 2023 23:31:20 +0200 Wed, 30 Aug 2023 23:31:10 +0200 KubeletNotReady container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
Capacity:
cpu: 32
ephemeral-storage: 71645Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 263143236Ki
pods: 110
Allocatable:
cpu: 32
ephemeral-storage: 71645Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 263143236Ki
pods: 110
System Info:
Machine ID: 50ca20960ea94552bd5ef84a20ce7e47
System UUID: a1c62d2a-ceb1-11e6-b9b8-0894ef355672
Boot ID: a6ee322d-f8cf-478d-a150-f001906aabc4
Kernel Version: 4.18.0-372.9.1.el8.x86_64
OS Image: AlmaLinux 8.6 (Sky Tiger)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://23.0.6
Kubelet Version: v1.26.6
Kube-Proxy Version: v1.26.6
PodCIDR: 10.42.0.0/24
PodCIDRs: 10.42.0.0/24
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 81s kube-proxy
Normal Starting 85s kubelet Starting kubelet.
Normal NodeHasSufficientMemory 85s (x2 over 85s) kubelet Node master status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 85s (x2 over 85s) kubelet Node master status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 85s (x2 over 85s) kubelet Node master status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 85s kubelet Updated Node Allocatable limit across pods
Normal RegisteredNode 81s node-controller Node master event: Registered Node master in Controller
And finally:
➜ rke-setup kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system rke-network-plugin-deploy-job-d2fmz 0/1 Error 0 3m6s
kube-system rke-network-plugin-deploy-job-klfxp 0/1 Error 0 3m29s
kube-system rke-network-plugin-deploy-job-lbzwm 0/1 Error 0 2m23s
kube-system rke-network-plugin-deploy-job-ltnrk 0/1 Error 0 60s
kube-system rke-network-plugin-deploy-job-rsj56 0/1 Error 0 3m47s
kube-system rke-network-plugin-deploy-job-v2fpm 0/1 Error 0 3m42s
@GuillaumeDorschner, I also cannot find any suspicious message that could explain the issue in the information you collected.
Let's try this:
- `docker ps` - check if any container is in an error state or keeps restarting
- `docker logs` - check the logs of the k8s core containers, like `kube-proxy`, `kubelet`, `kube-scheduler`, `kube-controller-manager`, `kube-apiserver` - any suspicious message relevant to the node not being ready?

@jiaqiluo I have restarted the cluster, but for the past 5 minutes I don't see the pods restarting.
[root@localhost ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5eff84fb67a0 rancher/hyperkube:v1.26.6-rancher1 "/opt/rke-tools/entr…" 5 minutes ago Up 5 minutes kube-proxy
0b17ad56e242 rancher/hyperkube:v1.26.6-rancher1 "/opt/rke-tools/entr…" 5 minutes ago Up 5 minutes kubelet
178289f286e0 rancher/hyperkube:v1.26.6-rancher1 "/opt/rke-tools/entr…" 5 minutes ago Up 5 minutes kube-scheduler
47a6939b9da7 rancher/hyperkube:v1.26.6-rancher1 "/opt/rke-tools/entr…" 6 minutes ago Up 6 minutes kube-controller-manager
154e2eb107be rancher/hyperkube:v1.26.6-rancher1 "/opt/rke-tools/entr…" 6 minutes ago Up 6 minutes kube-apiserver
1d911d9a6739 rancher/rke-tools:v0.1.89 "/docker-entrypoint.…" 6 minutes ago Up 6 minutes etcd-rolling-snapshots
1a202b775ea1 rancher/mirrored-coreos-etcd:v3.5.6 "/usr/local/bin/etcd…" 6 minutes ago Up 6 minutes etcd
The logs:
- kube-scheduler
I0830 22:08:14.206075 1 server.go:152] "Starting Kubernetes Scheduler" version="v1.26.6" I0830 22:08:14.206089 1 server.go:154] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK="" I0830 22:08:14.209292 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController I0830 22:08:14.209325 1 tlsconfig.go:200] "Loaded serving cert" certName="Generated self signed cert" certDetail="\"localhost@1693433293\" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer=\"localhost-ca@1693433293\" (2023-08-30 21:08:13 +0000 UTC to 2024-08-29 21:08:13 +0000 UTC (now=2023-08-30 22:08:14.209282731 +0000 UTC))" I0830 22:08:14.209332 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file" I0830 22:08:14.209378 1 shared_informer.go:270] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file I0830 22:08:14.209378 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" I0830 22:08:14.209421 1 shared_informer.go:270] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file I0830 22:08:14.209330 1 shared_informer.go:270] Waiting for caches to sync for RequestHeaderAuthRequestController I0830 22:08:14.209583 1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1693433294\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1693433294\" (2023-08-30 21:08:13 +0000 UTC to 2024-08-29 21:08:13 +0000 UTC (now=2023-08-30 22:08:14.209540415 +0000 UTC))" I0830 22:08:14.209618 1 secure_serving.go:210] Serving securely on [::]:10259 I0830 22:08:14.209685 1 tlsconfig.go:240] "Starting DynamicServingCertificateController" I0830 22:08:14.310276 1 shared_informer.go:277] Caches 
are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file I0830 22:08:14.310337 1 leaderelection.go:248] attempting to acquire leader lease kube-system/kube-scheduler... I0830 22:08:14.310383 1 shared_informer.go:277] Caches are synced for RequestHeaderAuthRequestController I0830 22:08:14.310492 1 shared_informer.go:277] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file I0830 22:08:14.310718 1 tlsconfig.go:178] "Loaded client CA" index=0 certName="client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" certDetail="\"kube-ca\" [] issuer=\"
\" (2023-08-30 22:05:52 +0000 UTC to 2033-08-27 22:05:52 +0000 UTC (now=2023-08-30 22:08:14.310662214 +0000 UTC))" I0830 22:08:14.310922 1 tlsconfig.go:200] "Loaded serving cert" certName="Generated self signed cert" certDetail="\"localhost@1693433293\" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer=\"localhost-ca@1693433293\" (2023-08-30 21:08:13 +0000 UTC to 2024-08-29 21:08:13 +0000 UTC (now=2023-08-30 22:08:14.310897152 +0000 UTC))" I0830 22:08:14.311079 1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1693433294\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1693433294\" (2023-08-30 21:08:13 +0000 UTC to 2024-08-29 21:08:13 +0000 UTC (now=2023-08-30 22:08:14.311061128 +0000 UTC))" I0830 22:08:14.311152 1 tlsconfig.go:178] "Loaded client CA" index=0 certName="client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" certDetail="\"kube-ca\" [] issuer=\" \" (2023-08-30 22:05:52 +0000 UTC to 2033-08-27 22:05:52 +0000 UTC (now=2023-08-30 22:08:14.311132642 +0000 UTC))" I0830 22:08:14.311191 1 tlsconfig.go:178] "Loaded client CA" index=1 certName="client-ca::kube-system::extension-apiserver-authentication::client-ca-file,client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file" certDetail="\"kube-apiserver-requestheader-ca\" [] issuer=\" \" (2023-08-30 22:05:52 +0000 UTC to 2033-08-27 22:05:52 +0000 UTC (now=2023-08-30 22:08:14.311164164 +0000 UTC))" I0830 22:08:14.311338 1 tlsconfig.go:200] "Loaded serving cert" certName="Generated self signed cert" certDetail="\"localhost@1693433293\" [serving] validServingFor=[127.0.0.1,localhost,localhost] issuer=\"localhost-ca@1693433293\" (2023-08-30 21:08:13 +0000 UTC to 2024-08-29 21:08:13 +0000 UTC (now=2023-08-30 22:08:14.311319494 +0000 UTC))" 
I0830 22:08:14.311524 1 named_certificates.go:53] "Loaded SNI cert" index=0 certName="self-signed loopback" certDetail="\"apiserver-loopback-client@1693433294\" [serving] validServingFor=[apiserver-loopback-client] issuer=\"apiserver-loopback-client-ca@1693433294\" (2023-08-30 21:08:13 +0000 UTC to 2024-08-29 21:08:13 +0000 UTC (now=2023-08-30 22:08:14.311503663 +0000 UTC))" I0830 22:08:14.315539 1 leaderelection.go:258] successfully acquired lease kube-system/kube-scheduler I0830 22:08:31.363852 1 node_tree.go:65] "Added node in listed group to NodeTree" node="master" zone="" I0830 22:08:32.869630 1 node_tree.go:65] "Added node in listed group to NodeTree" node="worker1" zone=""
- kube-controller-manager
- kube-apiserver
@GuillaumeDorschner,
The following line from the kubelet logs could indicate the reason for the failure: the default route on the host is missing.
E0830 22:08:33.046106 846612 kubelet_node_status.go:701] "Failed to set some node status fields" err="can't get ip address of node master. error: no default routes found in \"/proc/net/route\" or \"/proc/net/ipv6_route\"" node="master"
You can try the following:
- Run `ip route` to check if there is a line starting with `default via ...` - I expect to see that it does not exist.
- Run `sudo ip route add default via 192.168.100.1` to add the missing rule.
Once you finish the above, you may need to run `rke up` again to trigger a reconciliation, and I expect the cluster to work then.
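The check can also be scripted. Per the error message above, kubelet looks for a default route in `/proc/net/route`, where the default route is the row whose Destination column is `00000000` (the gateway address `192.168.100.1` is just the example value from this thread):

```shell
# Reproduce kubelet's check: look for a default route (Destination 00000000)
# in /proc/net/route, skipping the header line.
if awk 'NR > 1 && $2 == "00000000" { found = 1 } END { exit !found }' /proc/net/route; then
    echo "default route present"
    ip route show default
else
    echo "default route missing - add one, e.g.:"
    echo "  sudo ip route add default via 192.168.100.1   # gateway from the example above"
fi
```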
ref:
As you suggested, I checked my internet connection and noticed a default IP route was already in place.
[root@localhost ~]# ip route
default via 192.168.1.1 dev eno1 proto dhcp metric 101
However, no default route was present for the on-premise network, so I ran:
sudo ip route add default via 192.168.137.1
This successfully resolved the issue, allowing me to build my Kubernetes cluster without any problems:
INFO[0086] [addons] Executing deploy job rke-network-plugin
INFO[0096] [addons] Setting up coredns
...
INFO[0126] [addons] User addons deployed successfully
INFO[0126] Finished building Kubernetes cluster successfully
Thank you, @jiaqiluo !
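One caveat not covered in the thread: a route added with `ip route add` does not survive a reboot. On AlmaLinux 8 (the OS shown in the `kubectl describe` output above), one way to persist it is through NetworkManager; the connection name `eno2` below is hypothetical, so substitute the one shown by `nmcli connection show`:

```shell
# Hypothetical connection name "eno2" - list the real names with: nmcli connection show
sudo nmcli connection modify eno2 ipv4.gateway 192.168.137.1
sudo nmcli connection up eno2
```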
I am encountering a problem with my RKE setup, which I believe is related to DNS or network configurations. Here's an outline of the problem:
RKE version:
Docker version: (`docker version`, `docker info` preferred)
Operating system and kernel: (`cat /etc/os-release`, `uname -r` preferred)
Type/provider of hosts (VirtualBox/Bare-metal/AWS/GCE/DO): Bare-metal
cluster.yml file:
My cluster needs to operate in an air-gap environment, so I must resolve this issue. My air-gapped network is in a VLAN; maybe that is the problem?
Failed cluster creation output:
I don't know how to fix this. I would appreciate some advice or help in resolving this issue.