rancher / rke

Rancher Kubernetes Engine (RKE), an extremely simple, lightning fast Kubernetes distribution that runs entirely within containers.
Apache License 2.0

RKE Cluster DNS/Network Issue in Air-Gap Environment #3347

Closed: GuillaumeDorschner closed this issue 1 year ago

GuillaumeDorschner commented 1 year ago

I am encountering a problem with my RKE setup, which I believe is related to DNS or network configurations. Here's an outline of the problem:

RKE version:

rke version v1.4.7

Docker version: (docker version,docker info preferred)

Client: Docker Engine - Community
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.6
 Git commit:        ced0996
 Built:             Fri Jul 21 20:36:32 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:35:32 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

NAME="AlmaLinux"
VERSION="8.8 (Sapphire Caracal)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="AlmaLinux 8.8 (Sapphire Caracal)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:8::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-8"
ALMALINUX_MANTISBT_PROJECT_VERSION="8.8"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.8"

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) Bare-metal

cluster.yml file:

nodes:
  - address: 192.168.137.65
    hostname_override: master
    user: k8s
    role: [controlplane, etcd]
  - address: 192.168.137.62
    hostname_override: worker1
    user: k8s
    role: [worker]

services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h

network:
  plugin: calico

ingress:
  provider: nginx
  network_mode: none

addons: |-
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: ingress-nginx-external
    namespace: ingress-nginx
  spec:
    ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: 443
    selector:
      app: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
    sessionAffinity: None
    type: LoadBalancer

My cluster needs to operate in an air-gapped environment, so I must resolve this issue. The air gap is implemented inside a VLAN; could that be the problem?

Cluster creation output (failed):

# Output when isolated within VLAN
➜  rke-setup ./rke up --config cluster.yml
...
FATA[0117] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
➜  rke-setup

I don't know how to fix this. I would appreciate some advice or help in resolving this issue.

GuillaumeDorschner commented 1 year ago

Hello @jiaqiluo, could you help me? I'm currently stuck.

beanbao22 commented 1 year ago

I got the same issue. With internet access, rke runs without errors. Without internet, I have the full list of Docker images available, but it always gets stuck at the rke-network-plugin job.

jiaqiluo commented 1 year ago

Hi @GuillaumeDorschner, if I understand your needs correctly, you need to configure a private registry for RKE to work in an air-gapped environment. See the RKE docs.
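For reference, a minimal private registry section in cluster.yml looks roughly like the following (the URL and credentials are placeholders, not values from this thread):

private_registries:
  - url: repo.example.com:8082   # placeholder: your air-gapped registry
    user: registry-user          # placeholder
    password: registry-password  # placeholder
    is_default: true             # use this registry for all system images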

Regarding the rke-network-plugin-deploy-job error: when RKE returns that error, there should already be a kubeconfig file in your working directory. Can you use it with kubectl to check the error message on the job or the pod? If there is no kubeconfig file or it does not work, can you try to SSH into the control plane node and check the corresponding containers and their status/errors? (It may also be on the worker nodes; it will be easier if you can create a new cluster with only one node holding all roles.) cc @beanbao22
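As a rough sketch of that check, assuming the default kube_config_cluster.yml that rke up writes next to cluster.yml:

export KUBECONFIG=$PWD/kube_config_cluster.yml

# List the deploy job and its pods in kube-system
kubectl -n kube-system get jobs,pods

# Look at the job's events and the pods' logs for the actual error
kubectl -n kube-system describe job rke-network-plugin-deploy-job
kubectl -n kube-system logs -l job-name=rke-network-plugin-deploy-job --tail=50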

beanbao22 commented 1 year ago

@jiaqiluo I run on only one node with all roles, and I have a private registry defined in cluster.yml, but it always gets stuck at the rke-network-plugin job.

I downloaded the image list with the command rke config --system-images.
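For completeness, that image list is usually mirrored into the private registry from a machine that still has internet access, roughly like this (REGISTRY is a placeholder, not a value from this thread):

REGISTRY=repo.example.com:8082
./rke config --system-images > system-images.txt

# Pull each system image, retag it for the private registry, and push it
while read -r image; do
  docker pull "$image"
  docker tag "$image" "$REGISTRY/$image"
  docker push "$REGISTRY/$image"
done < system-images.txt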

If I try with internet access, it works without any issue.

Update: I checked the logs from pod rke-network-plugin-deploy-job-xxxx:

unable to ensure pod container exists: failed to create container for [kubepods besteffort pod7bda81ff-7df8-4b75-b22c-7cf662c272bd] : unable to start unit "kubepods-besteffort-pod7bda81ff_7df8_4b75_b22c_7cf662c272bd.slice" (properties [{Name:Description Value:"libcontainer container kubepods-besteffort-pod7bda81ff_7df8_4b75_b22c_7cf662c272bd.slice"} {Name:Wants Value:["kubepods-besteffort.slice"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Unit kubepods-besteffort.slice not found.

jiaqiluo commented 1 year ago

@beanbao22, it looks like you ran into https://github.com/rancher/rke/issues/3160 which mentions the Unit kubepods-besteffort.slice not found error.
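One generic way to inspect the cgroup setup on the node reporting that error (standard docker and systemctl commands; whether a cgroup driver mismatch is the root cause here is only an assumption):

# Show which cgroup driver and cgroup version Docker reports on the node
docker info --format 'CgroupDriver={{.CgroupDriver}} CgroupVersion={{.CgroupVersion}}'

# Check whether the kubepods slices exist on the host
systemctl list-units --type=slice | grep -i kubepods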

beanbao22 commented 1 year ago

@beanbao22, it looks like you ran into #3160 which mentions the Unit kubepods-besteffort.slice not found error.

I don't think that's the real issue, because with internet access available I can install without any issue, on the same OS, Docker, RKE version, and images.

GuillaumeDorschner commented 1 year ago

@jiaqiluo I have the kube_config_cluster.yml and I looked in the cluster.

Cluster Information

➜  rke-setup kubectl get nodes

NAME      STATUS     ROLES               AGE   VERSION
master    NotReady   controlplane,etcd   18m   v1.26.6
worker1   NotReady   worker              18m   v1.26.6
➜  rke-setup kubectl get componentstatuses

Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE                         ERROR
scheduler            Healthy   ok
controller-manager   Healthy   ok
etcd-0               Healthy   {"health":"true","reason":""}
➜  rke-setup kubectl get pods -n kube-system

NAME                                  READY   STATUS   RESTARTS   AGE
rke-network-plugin-deploy-job-b7f6n   0/1     Error    0          13m
rke-network-plugin-deploy-job-bwcxg   0/1     Error    0          7m49s
rke-network-plugin-deploy-job-gp7vq   0/1     Error    0          105s
rke-network-plugin-deploy-job-htz4h   0/1     Error    0          18m
rke-network-plugin-deploy-job-l4f9b   0/1     Error    0          18m
rke-network-plugin-deploy-job-pc5dj   0/1     Error    0          18m
rke-network-plugin-deploy-job-rtpdp   0/1     Error    0          18m
rke-network-plugin-deploy-job-sxpqj   0/1     Error    0          15m
rke-network-plugin-deploy-job-xx28t   0/1     Error    0          17m

Describe

➜  rke-setup kubectl describe pod rke-network-plugin-deploy-job-b7f6n -n kube-system

Name:             rke-network-plugin-deploy-job-b7f6n
Namespace:        kube-system
Priority:         0
Service Account:  rke-job-deployer
Node:             master/
Start Time:       Wed, 30 Aug 2023 09:54:24 +0200
Labels:           controller-uid=b675948d-42a8-4a29-a838-3055a3b57b63
                  job-name=rke-network-plugin-deploy-job
Annotations:      <none>
Status:           Failed
IP:
IPs:              <none>
Controlled By:    Job/rke-network-plugin-deploy-job
Containers:
  rke-network-plugin-pod:
    Container ID:  docker://3832335a3e9ea74ebe08fce60205fa7e903ef62fec017ae7afd97c3daa4ff5e0
    Image:         repo.labo.bi:8082/rancher/hyperkube:v1.26.6-rancher1
    Image ID:      docker-pullable://rancher/hyperkube@sha256:45e9e5a04b65afa3a291bbc458f0076cfb1d725bd550d7c71b9dbaa25387091e
    Port:          <none>
    Host Port:     <none>
    Command:
      kubectl
      apply
      -f
      /etc/config/rke-network-plugin.yaml
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 30 Aug 2023 09:54:25 +0200
      Finished:     Wed, 30 Aug 2023 09:54:25 +0200
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-q9rn7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rke-network-plugin
    Optional:  false
  kube-api-access-q9rn7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 op=Exists
Events:
  Type    Reason   Age   From     Message
  ----    ------   ----  ----     -------
  Normal  Pulled   11m   kubelet  Container image "repo.labo.bi:8082/rancher/hyperkube:v1.26.6-rancher1" already present on machine
  Normal  Created  11m   kubelet  Created container rke-network-plugin-pod
  Normal  Started  11m   kubelet  Started container rke-network-plugin-pod

Logs of all the pods

➜  rke-setup kubectl logs rke-network-plugin-deploy-job-b7f6n -n kube-system
kubectl logs rke-network-plugin-deploy-job-bwcxg -n kube-system
kubectl logs rke-network-plugin-deploy-job-gp7vq -n kube-system
kubectl logs rke-network-plugin-deploy-job-htz4h -n kube-system
kubectl logs rke-network-plugin-deploy-job-l4f9b -n kube-system
kubectl logs rke-network-plugin-deploy-job-pc5dj -n kube-system
kubectl logs rke-network-plugin-deploy-job-px6vm -n kube-system
kubectl logs rke-network-plugin-deploy-job-rtpdp -n kube-system
kubectl logs rke-network-plugin-deploy-job-sxpqj -n kube-system
kubectl logs rke-network-plugin-deploy-job-xx28t -n kube-system

Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
Error from server: no preferred addresses found; known addresses: []
GuillaumeDorschner commented 1 year ago

@jiaqiluo, I've tried running RKE again, but this time with the eth interface enabled for internet access. I didn't encounter any errors this time, as you can see from the attached screenshot. However, this setup is only good for testing. My final cluster won't have internet access, so I'm at an impasse.

Screenshot 2023-08-30 at 16 21 32

jiaqiluo commented 1 year ago

@GuillaumeDorschner

The error Error from server: no preferred addresses found; known addresses: [] is returned by Kubernetes when it tries to reach the pod on the specific node (the master node in your case) but cannot get any addresses for it (an external address in your case).
This can be confirmed by 1) the master node being NotReady and 2) the value Node: master/ in the output of kubectl describe pod rke-network-plugin-deploy-job-b7f6n -n kube-system, where the node's address is supposed to appear after the slash (/).

Now the problem becomes why the nodes are in the NotReady status.

There could be a variety of reasons, and you can try the following to diagnose it:
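A generic sketch of the usual checks for a NotReady node in an RKE cluster (standard kubectl and docker commands, assumed rather than quoted from this thread):

# Check the node conditions, reported addresses, and recent events
kubectl get nodes -o wide
kubectl describe node master

# In RKE, kubelet and kube-proxy run as Docker containers on each node;
# their logs usually explain why a node stays NotReady
docker logs --tail=100 kubelet
docker logs --tail=100 kube-proxy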

GuillaumeDorschner commented 1 year ago

@jiaqiluo

Okay, I don't see anything suspicious in the logs or in the output of describe node master.

Logs:

➜  rke-setup ./rke -d up cluster.yml
DEBU[0000] Loglevel set to [debug]
INFO[0000] Running RKE version: v1.4.7
DEBU[0000] audit log policy found in cluster.yml
INFO[0000] Initiating Kubernetes cluster
DEBU[0000] Loading data.json from local source
DEBU[0000] data.json SHA256 checksum: 5f13312d74be24e7121c58dc299f16a728dac04d4635644e7655db30bdc76f68
DEBU[0000] No DNS provider configured, setting default based on cluster version [1.26.6-rancher1-1]
DEBU[0000] DNS provider set to [coredns]
DEBU[0000] Checking if cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0000] Cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0000] Enabling kube-api audit log for cluster version [v1.26.6-rancher1-1]
DEBU[0000] No input provided for maxUnavailableWorker, setting it to default value of 10 percent
DEBU[0000] No input provided for maxUnavailableControlplane, setting it to default value of 1
DEBU[0000] Checking ingress default backend for cluster version [v1.26.6-rancher1-1]
DEBU[0000] Cluster version [v1.26.6-rancher1-1] needs to have ingress default backend disabled
DEBU[0000] Host: 192.168.137.65 has role: controlplane
DEBU[0000] Host: 192.168.137.65 has role: etcd
DEBU[0000] Host: 192.168.137.62 has role: worker
DEBU[0000] Checking cri-dockerd for cluster version [v1.26.6-rancher1-1]
DEBU[0000] cri-dockerd is enabled for cluster version [v1.26.6-rancher1-1]
DEBU[0000] Checking PodSecurityPolicy for cluster version [v1.26.6-rancher1-1]
DEBU[0000] Checking PodSecurity for cluster version [v1.26.6-rancher1-1]
DEBU[0000] [state] previous state not found, possible legacy cluster
INFO[0000] [dialer] Setup tunnel for host [192.168.137.62]
INFO[0000] [dialer] Setup tunnel for host [192.168.137.65]
DEBU[0000] Connecting to Docker API for host [192.168.137.62]
DEBU[0000] Connecting to Docker API for host [192.168.137.65]
DEBU[0000] Docker Info found for host [192.168.137.65]: types.Info{ID:"d03eca6f-6468-4370-9f1b-008bdb123e0a", Containers:19, ContainersRunning:2, ContainersPaused:0, ContainersStopped:17, Images:28, Driver:"overlay2", DriverStatus:[][2]string{[2]string{"Backing Filesystem", "xfs"}, [2]string{"Supports d_type", "true"}, [2]string{"Using metacopy", "false"}, [2]string{"Native Overlay Diff", "true"}, [2]string{"userxattr", "false"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string{"local"}, Network:[]string{"bridge", "host", "ipvlan", "macvlan", "null", "overlay"}, Authorization:[]string(nil), Log:[]string{"awslogs", "fluentd", "gcplogs", "gelf", "journald", "json-file", "local", "logentries", "splunk", "syslog"}}, MemoryLimit:true, SwapLimit:true, KernelMemory:true, KernelMemoryTCP:true, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:true, CPUSet:true, PidsLimit:true, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:37, OomKillDisable:true, NGoroutines:43, SystemTime:"2023-08-30T23:30:12.206004187+02:00", LoggingDriver:"json-file", CgroupDriver:"cgroupfs", CgroupVersion:"1", NEventsListener:0, KernelVersion:"4.18.0-372.9.1.el8.x86_64", OperatingSystem:"AlmaLinux 8.6 (Sky Tiger)", OSVersion:"8.6", OSType:"linux", Architecture:"x86_64", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc0001b4150), NCPU:32, MemTotal:269458673664, GenericResources:[]swarm.GenericResource(nil), DockerRootDir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"localhost.localdomain", Labels:[]string{}, ExperimentalBuild:false, ServerVersion:"23.0.6", ClusterStore:"", ClusterAdvertise:"", Runtimes:map[string]types.Runtime{"io.containerd.runc.v2":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "runc":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}}, DefaultRuntime:"runc", Swarm:swarm.Info{NodeID:"", NodeAddr:"", LocalNodeState:"inactive", ControlAvailable:false, Error:"", RemoteManagers:[]swarm.Peer(nil), Nodes:0, Managers:0, Cluster:(*swarm.ClusterInfo)(nil), Warnings:[]string(nil)}, LiveRestoreEnabled:false, Isolation:"", InitBinary:"docker-init", ContainerdCommit:types.Commit{ID:"8165feabfdfe38c65b599c4993d227328c231fca", Expected:"8165feabfdfe38c65b599c4993d227328c231fca"}, RuncCommit:types.Commit{ID:"v1.1.8-0-g82f18fe", Expected:"v1.1.8-0-g82f18fe"}, InitCommit:types.Commit{ID:"de40ad0", Expected:"de40ad0"}, SecurityOptions:[]string{"name=seccomp,profile=builtin"}, ProductLicense:"", DefaultAddressPools:[]types.NetworkAddressPool(nil), Warnings:[]string(nil)}
DEBU[0000] Docker Info found for host [192.168.137.62]: types.Info{ID:"03725a2b-a240-48ad-9a37-fc4b724cc262", Containers:37, ContainersRunning:20, ContainersPaused:0, ContainersStopped:17, Images:32, Driver:"overlay2", DriverStatus:[][2]string{[2]string{"Backing Filesystem", "xfs"}, [2]string{"Supports d_type", "true"}, [2]string{"Using metacopy", "false"}, [2]string{"Native Overlay Diff", "true"}, [2]string{"userxattr", "false"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string{"local"}, Network:[]string{"bridge", "host", "ipvlan", "macvlan", "null", "overlay"}, Authorization:[]string(nil), Log:[]string{"awslogs", "fluentd", "gcplogs", "gelf", "journald", "json-file", "local", "logentries", "splunk", "syslog"}}, MemoryLimit:true, SwapLimit:true, KernelMemory:true, KernelMemoryTCP:true, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:true, CPUSet:true, PidsLimit:true, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:135, OomKillDisable:true, NGoroutines:115, SystemTime:"2023-08-30T23:30:09.477999904+02:00", LoggingDriver:"json-file", CgroupDriver:"cgroupfs", CgroupVersion:"1", NEventsListener:0, KernelVersion:"4.18.0-372.9.1.el8.x86_64", OperatingSystem:"AlmaLinux 8.6 (Sky Tiger)", OSVersion:"8.6", OSType:"linux", Architecture:"x86_64", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc0001ac000), NCPU:64, MemTotal:269855334400, GenericResources:[]swarm.GenericResource(nil), DockerRootDir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"localhost.localdomain", Labels:[]string{}, ExperimentalBuild:false, ServerVersion:"23.0.6", ClusterStore:"", ClusterAdvertise:"", Runtimes:map[string]types.Runtime{"io.containerd.runc.v2":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "runc":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}}, DefaultRuntime:"runc", Swarm:swarm.Info{NodeID:"", NodeAddr:"", LocalNodeState:"inactive", ControlAvailable:false, Error:"", RemoteManagers:[]swarm.Peer(nil), Nodes:0, Managers:0, Cluster:(*swarm.ClusterInfo)(nil), Warnings:[]string(nil)}, LiveRestoreEnabled:false, Isolation:"", InitBinary:"docker-init", ContainerdCommit:types.Commit{ID:"8165feabfdfe38c65b599c4993d227328c231fca", Expected:"8165feabfdfe38c65b599c4993d227328c231fca"}, RuncCommit:types.Commit{ID:"v1.1.8-0-g82f18fe", Expected:"v1.1.8-0-g82f18fe"}, InitCommit:types.Commit{ID:"de40ad0", Expected:"de40ad0"}, SecurityOptions:[]string{"name=seccomp,profile=builtin"}, ProductLicense:"", DefaultAddressPools:[]types.NetworkAddressPool(nil), Warnings:[]string(nil)}
DEBU[0000] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0000] Finding container [cluster-state-deployer] on host [192.168.137.65], try #1
DEBU[0000] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0000] Finding container [cluster-state-deployer] on host [192.168.137.62], try #1
INFO[0000] [certificates] Generating CA kubernetes certificates
INFO[0000] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates
INFO[0001] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
INFO[0001] [certificates] Generating Kubernetes API server certificates
INFO[0001] [certificates] Generating Service account token key
INFO[0001] [certificates] Generating Kube Controller certificates
INFO[0001] [certificates] Generating Kube Scheduler certificates
INFO[0002] [certificates] Generating Kube Proxy certificates
INFO[0002] [certificates] Generating Node certificate
INFO[0002] [certificates] Generating admin certificates and kubeconfig
INFO[0003] [certificates] Generating Kubernetes API server proxy client certificates
INFO[0003] [certificates] Generating kube-etcd-192-168-137-65 certificate and key
INFO[0004] Successfully Deployed state file at [./cluster.rkestate]
DEBU[0004] Checking if cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0004] Cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0004] Enabling kube-api audit log for cluster version [v1.26.6-rancher1-1]
DEBU[0004] Host: 192.168.137.65 has role: controlplane
DEBU[0004] Host: 192.168.137.65 has role: etcd
DEBU[0004] Host: 192.168.137.62 has role: worker
DEBU[0004] Checking cri-dockerd for cluster version [v1.26.6-rancher1-1]
DEBU[0004] cri-dockerd is enabled for cluster version [v1.26.6-rancher1-1]
DEBU[0004] Checking PodSecurityPolicy for cluster version [v1.26.6-rancher1-1]
DEBU[0004] Checking PodSecurity for cluster version [v1.26.6-rancher1-1]
INFO[0004] Building Kubernetes cluster
INFO[0004] [dialer] Setup tunnel for host [192.168.137.62]
INFO[0004] [dialer] Setup tunnel for host [192.168.137.65]
DEBU[0004] Connecting to Docker API for host [192.168.137.65]
DEBU[0004] Connecting to Docker API for host [192.168.137.62]
DEBU[0004] Docker Info found for host [192.168.137.65]: types.Info{ID:"d03eca6f-6468-4370-9f1b-008bdb123e0a", Containers:19, ContainersRunning:2, ContainersPaused:0, ContainersStopped:17, Images:28, Driver:"overlay2", DriverStatus:[][2]string{[2]string{"Backing Filesystem", "xfs"}, [2]string{"Supports d_type", "true"}, [2]string{"Using metacopy", "false"}, [2]string{"Native Overlay Diff", "true"}, [2]string{"userxattr", "false"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string{"local"}, Network:[]string{"bridge", "host", "ipvlan", "macvlan", "null", "overlay"}, Authorization:[]string(nil), Log:[]string{"awslogs", "fluentd", "gcplogs", "gelf", "journald", "json-file", "local", "logentries", "splunk", "syslog"}}, MemoryLimit:true, SwapLimit:true, KernelMemory:true, KernelMemoryTCP:true, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:true, CPUSet:true, PidsLimit:true, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:38, OomKillDisable:true, NGoroutines:44, SystemTime:"2023-08-30T23:30:16.056844444+02:00", LoggingDriver:"json-file", CgroupDriver:"cgroupfs", CgroupVersion:"1", NEventsListener:0, KernelVersion:"4.18.0-372.9.1.el8.x86_64", OperatingSystem:"AlmaLinux 8.6 (Sky Tiger)", OSVersion:"8.6", OSType:"linux", Architecture:"x86_64", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc0001ac000), NCPU:32, MemTotal:269458673664, GenericResources:[]swarm.GenericResource(nil), DockerRootDir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"localhost.localdomain", Labels:[]string{}, ExperimentalBuild:false, ServerVersion:"23.0.6", ClusterStore:"", ClusterAdvertise:"", Runtimes:map[string]types.Runtime{"io.containerd.runc.v2":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "runc":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}}, DefaultRuntime:"runc", Swarm:swarm.Info{NodeID:"", NodeAddr:"", LocalNodeState:"inactive", ControlAvailable:false, Error:"", RemoteManagers:[]swarm.Peer(nil), Nodes:0, Managers:0, Cluster:(*swarm.ClusterInfo)(nil), Warnings:[]string(nil)}, LiveRestoreEnabled:false, Isolation:"", InitBinary:"docker-init", ContainerdCommit:types.Commit{ID:"8165feabfdfe38c65b599c4993d227328c231fca", Expected:"8165feabfdfe38c65b599c4993d227328c231fca"}, RuncCommit:types.Commit{ID:"v1.1.8-0-g82f18fe", Expected:"v1.1.8-0-g82f18fe"}, InitCommit:types.Commit{ID:"de40ad0", Expected:"de40ad0"}, SecurityOptions:[]string{"name=seccomp,profile=builtin"}, ProductLicense:"", DefaultAddressPools:[]types.NetworkAddressPool(nil), Warnings:[]string(nil)}
DEBU[0004] Docker Info found for host [192.168.137.62]: types.Info{ID:"03725a2b-a240-48ad-9a37-fc4b724cc262", Containers:37, ContainersRunning:20, ContainersPaused:0, ContainersStopped:17, Images:32, Driver:"overlay2", DriverStatus:[][2]string{[2]string{"Backing Filesystem", "xfs"}, [2]string{"Supports d_type", "true"}, [2]string{"Using metacopy", "false"}, [2]string{"Native Overlay Diff", "true"}, [2]string{"userxattr", "false"}}, SystemStatus:[][2]string(nil), Plugins:types.PluginsInfo{Volume:[]string{"local"}, Network:[]string{"bridge", "host", "ipvlan", "macvlan", "null", "overlay"}, Authorization:[]string(nil), Log:[]string{"awslogs", "fluentd", "gcplogs", "gelf", "journald", "json-file", "local", "logentries", "splunk", "syslog"}}, MemoryLimit:true, SwapLimit:true, KernelMemory:true, KernelMemoryTCP:true, CPUCfsPeriod:true, CPUCfsQuota:true, CPUShares:true, CPUSet:true, PidsLimit:true, IPv4Forwarding:true, BridgeNfIptables:true, BridgeNfIP6tables:true, Debug:false, NFd:136, OomKillDisable:true, NGoroutines:116, SystemTime:"2023-08-30T23:30:13.267904342+02:00", LoggingDriver:"json-file", CgroupDriver:"cgroupfs", CgroupVersion:"1", NEventsListener:0, KernelVersion:"4.18.0-372.9.1.el8.x86_64", OperatingSystem:"AlmaLinux 8.6 (Sky Tiger)", OSVersion:"8.6", OSType:"linux", Architecture:"x86_64", IndexServerAddress:"https://index.docker.io/v1/", RegistryConfig:(*registry.ServiceConfig)(0xc0001ac1c0), NCPU:64, MemTotal:269855334400, GenericResources:[]swarm.GenericResource(nil), DockerRootDir:"/var/lib/docker", HTTPProxy:"", HTTPSProxy:"", NoProxy:"", Name:"localhost.localdomain", Labels:[]string{}, ExperimentalBuild:false, ServerVersion:"23.0.6", ClusterStore:"", ClusterAdvertise:"", Runtimes:map[string]types.Runtime{"io.containerd.runc.v2":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}, "runc":types.Runtime{Path:"runc", Args:[]string(nil), Shim:(*types.ShimConfig)(nil)}}, DefaultRuntime:"runc", Swarm:swarm.Info{NodeID:"", NodeAddr:"", LocalNodeState:"inactive", ControlAvailable:false, Error:"", RemoteManagers:[]swarm.Peer(nil), Nodes:0, Managers:0, Cluster:(*swarm.ClusterInfo)(nil), Warnings:[]string(nil)}, LiveRestoreEnabled:false, Isolation:"", InitBinary:"docker-init", ContainerdCommit:types.Commit{ID:"8165feabfdfe38c65b599c4993d227328c231fca", Expected:"8165feabfdfe38c65b599c4993d227328c231fca"}, RuncCommit:types.Commit{ID:"v1.1.8-0-g82f18fe", Expected:"v1.1.8-0-g82f18fe"}, InitCommit:types.Commit{ID:"de40ad0", Expected:"de40ad0"}, SecurityOptions:[]string{"name=seccomp,profile=builtin"}, ProductLicense:"", DefaultAddressPools:[]types.NetworkAddressPool(nil), Warnings:[]string(nil)}
INFO[0004] [network] Deploying port listener containers
DEBU[0004] [network] Starting deployListener [rke-etcd-port-listener] on host [192.168.137.65]
DEBU[0004] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0004] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0004] Starting container [rke-etcd-port-listener] on host [192.168.137.65], try #1
INFO[0004] [network] Successfully started [rke-etcd-port-listener] container on host [192.168.137.65]
DEBU[0004] [network] Starting deployListener [rke-cp-port-listener] on host [192.168.137.65]
DEBU[0004] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0004] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0004] Starting container [rke-cp-port-listener] on host [192.168.137.65], try #1
INFO[0005] [network] Successfully started [rke-cp-port-listener] container on host [192.168.137.65]
DEBU[0005] [network] Starting deployListener [rke-worker-port-listener] on host [192.168.137.62]
DEBU[0005] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0005] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0005] Starting container [rke-worker-port-listener] on host [192.168.137.62], try #1
INFO[0005] [network] Successfully started [rke-worker-port-listener] container on host [192.168.137.62]
INFO[0005] [network] Port listener containers deployed successfully
INFO[0005] [network] Running control plane -> etcd port checks
INFO[0005] [network] Checking if host [192.168.137.65] can connect to host(s) [192.168.137.65] on port(s) [2379], try #1
DEBU[0005] [remove/rke-port-checker] Checking if container is running on host [192.168.137.65]
DEBU[0005] [remove/rke-port-checker] Container doesn't exist on host [192.168.137.65]
DEBU[0005] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0005] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0006] Starting container [rke-port-checker] on host [192.168.137.65], try #1
INFO[0006] [network] Successfully started [rke-port-checker] container on host [192.168.137.65]
DEBU[0006] [network] containerLog [] on host: 192.168.137.65
INFO[0006] Removing container [rke-port-checker] on host [192.168.137.65], try #1
DEBU[0006] [network] Length of containerLog is [0] on host: 192.168.137.65
INFO[0006] [network] Running control plane -> worker port checks
INFO[0006] [network] Checking if host [192.168.137.65] can connect to host(s) [192.168.137.62] on port(s) [10250], try #1
DEBU[0006] [remove/rke-port-checker] Checking if container is running on host [192.168.137.65]
DEBU[0006] [remove/rke-port-checker] Container doesn't exist on host [192.168.137.65]
DEBU[0006] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0006] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0006] Starting container [rke-port-checker] on host [192.168.137.65], try #1
INFO[0006] [network] Successfully started [rke-port-checker] container on host [192.168.137.65]
DEBU[0006] [network] containerLog [] on host: 192.168.137.65
INFO[0006] Removing container [rke-port-checker] on host [192.168.137.65], try #1
DEBU[0006] [network] Length of containerLog is [0] on host: 192.168.137.65
INFO[0006] [network] Running workers -> control plane port checks
INFO[0006] [network] Checking if host [192.168.137.62] can connect to host(s) [192.168.137.65] on port(s) [6443], try #1
DEBU[0006] [remove/rke-port-checker] Checking if container is running on host [192.168.137.62]
DEBU[0006] [remove/rke-port-checker] Container doesn't exist on host [192.168.137.62]
DEBU[0006] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0006] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0006] Starting container [rke-port-checker] on host [192.168.137.62], try #1
INFO[0007] [network] Successfully started [rke-port-checker] container on host [192.168.137.62]
DEBU[0007] [network] containerLog [] on host: 192.168.137.62
INFO[0007] Removing container [rke-port-checker] on host [192.168.137.62], try #1
DEBU[0007] [network] Length of containerLog is [0] on host: 192.168.137.62
INFO[0007] [network] Checking KubeAPI port Control Plane hosts
DEBU[0007] [network] Checking KubeAPI port [6443] on host: 192.168.137.65
INFO[0007] [network] Removing port listener containers
DEBU[0007] [remove/rke-etcd-port-listener] Checking if container is running on host [192.168.137.65]
DEBU[0007] [remove/rke-etcd-port-listener] Removing container on host [192.168.137.65]
INFO[0007] Removing container [rke-etcd-port-listener] on host [192.168.137.65], try #1
INFO[0007] [remove/rke-etcd-port-listener] Successfully removed container on host [192.168.137.65]
DEBU[0007] [remove/rke-cp-port-listener] Checking if container is running on host [192.168.137.65]
DEBU[0007] [remove/rke-cp-port-listener] Removing container on host [192.168.137.65]
INFO[0007] Removing container [rke-cp-port-listener] on host [192.168.137.65], try #1
INFO[0007] [remove/rke-cp-port-listener] Successfully removed container on host [192.168.137.65]
DEBU[0007] [remove/rke-worker-port-listener] Checking if container is running on host [192.168.137.62]
DEBU[0007] [remove/rke-worker-port-listener] Removing container on host [192.168.137.62]
INFO[0007] Removing container [rke-worker-port-listener] on host [192.168.137.62], try #1
INFO[0008] [remove/rke-worker-port-listener] Successfully removed container on host [192.168.137.62]
INFO[0008] [network] Port listener containers removed successfully
DEBU[0008] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0008] [certificates] Deploying kubernetes certificates to Cluster nodes
INFO[0008] Finding container [cert-deployer] on host [192.168.137.62], try #1
INFO[0008] Finding container [cert-deployer] on host [192.168.137.65], try #1
DEBU[0008] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0008] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
DEBU[0008] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0008] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0008] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
DEBU[0008] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0008] Starting container [cert-deployer] on host [192.168.137.65], try #1
INFO[0008] Starting container [cert-deployer] on host [192.168.137.62], try #1
DEBU[0008] [certificates] Successfully started Certificate deployer container: cert-deployer
INFO[0008] Finding container [cert-deployer] on host [192.168.137.65], try #1
DEBU[0008] [certificates] Successfully started Certificate deployer container: cert-deployer
INFO[0008] Finding container [cert-deployer] on host [192.168.137.62], try #1
INFO[0013] Finding container [cert-deployer] on host [192.168.137.65], try #1
INFO[0013] Removing container [cert-deployer] on host [192.168.137.65], try #1
INFO[0013] Finding container [cert-deployer] on host [192.168.137.62], try #1
INFO[0013] Removing container [cert-deployer] on host [192.168.137.62], try #1
INFO[0013] [reconcile] Rebuilding and updating local kube config
DEBU[0013] [reconcile] Rebuilding and updating local kube config, creating new kubeconfig
DEBU[0013] Deploying admin Kubeconfig locally at [./kube_config_cluster.yml]
INFO[0013] Successfully Deployed local admin kubeconfig at [./kube_config_cluster.yml]
DEBU[0013] [version] Using ./kube_config_cluster.yml to connect to Kubernetes cluster..
DEBU[0013] [version] Getting Kubernetes server version..
WARN[0013] [reconcile] host [192.168.137.65] is a control plane node without reachable Kubernetes API endpoint in the cluster
WARN[0013] [reconcile] no control plane node with reachable Kubernetes API endpoint in the cluster found
INFO[0013] [certificates] Successfully deployed kubernetes certificates to Cluster nodes
DEBU[0013] using the default EventRateLimit configuration
DEBU[0013] using the PodSecurity configuration [privileged]
INFO[0013] [file-deploy] Deploying file [/etc/kubernetes/admission.yaml] to node [192.168.137.65]
DEBU[0013] [remove/file-deployer] Checking if container is running on host [192.168.137.65]
DEBU[0013] [remove/file-deployer] Container doesn't exist on host [192.168.137.65]
DEBU[0013] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0013] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0013] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0013] Starting container [file-deployer] on host [192.168.137.65], try #1
INFO[0014] Successfully started [file-deployer] container on host [192.168.137.65]
INFO[0014] Waiting for [file-deployer] container to exit on host [192.168.137.65]
INFO[0014] Waiting for [file-deployer] container to exit on host [192.168.137.65]
INFO[0014] Container [file-deployer] is still running on host [192.168.137.65]: stderr: [], stdout: []
DEBU[0015] Exit code for [file-deployer] container on host [192.168.137.65] is [0]
DEBU[0015] [remove/file-deployer] Checking if container is running on host [192.168.137.65]
DEBU[0015] [remove/file-deployer] Removing container on host [192.168.137.65]
INFO[0015] Removing container [file-deployer] on host [192.168.137.65], try #1
INFO[0015] [remove/file-deployer] Successfully removed container on host [192.168.137.65]
DEBU[0015] [file-deploy] Successfully deployed file [/etc/kubernetes/admission.yaml] on node [192.168.137.65]
INFO[0015] [/etc/kubernetes/admission.yaml] Successfully deployed admission control config to Cluster control nodes
INFO[0015] [file-deploy] Deploying file [/etc/kubernetes/audit-policy.yaml] to node [192.168.137.65]
DEBU[0015] [remove/file-deployer] Checking if container is running on host [192.168.137.65]
DEBU[0015] [remove/file-deployer] Container doesn't exist on host [192.168.137.65]
DEBU[0015] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0015] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0015] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0015] Starting container [file-deployer] on host [192.168.137.65], try #1
INFO[0015] Successfully started [file-deployer] container on host [192.168.137.65]
INFO[0015] Waiting for [file-deployer] container to exit on host [192.168.137.65]
INFO[0015] Waiting for [file-deployer] container to exit on host [192.168.137.65]
INFO[0015] Container [file-deployer] is still running on host [192.168.137.65]: stderr: [], stdout: []
DEBU[0016] Exit code for [file-deployer] container on host [192.168.137.65] is [0]
DEBU[0016] [remove/file-deployer] Checking if container is running on host [192.168.137.65]
DEBU[0016] [remove/file-deployer] Removing container on host [192.168.137.65]
INFO[0016] Removing container [file-deployer] on host [192.168.137.65], try #1
INFO[0016] [remove/file-deployer] Successfully removed container on host [192.168.137.65]
DEBU[0016] [file-deploy] Successfully deployed file [/etc/kubernetes/audit-policy.yaml] on node [192.168.137.65]
INFO[0016] [/etc/kubernetes/audit-policy.yaml] Successfully deployed audit policy file to Cluster control nodes
INFO[0016] [reconcile] Reconciling cluster state
INFO[0016] [reconcile] This is newly generated cluster
DEBU[0016] Encryption is disabled in both current and new spec; no action is required
INFO[0016] Pre-pulling kubernetes images
DEBU[0016] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
DEBU[0016] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62], try #1
INFO[0016] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0016] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62]
INFO[0016] Kubernetes images pulled successfully
DEBU[0016] getDefaultKubernetesServicesOptions: getting serviceOptions for cluster version [v1.26.6-rancher1-1]
DEBU[0016] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0016] getDefaultKubernetesServicesOptions: serviceOptions found for cluster major version [v1.26]
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0016] Extracted version [v3.5.6] from image [rancher/mirrored-coreos-etcd:v3.5.6]
DEBU[0016] etcd version [3.5.6] is higher than max version [3.4.3-rancher99] for advertising port 4001, not going to advertise port 4001
DEBU[0016] etcd version [3.5.6] is higher than max version [3.4.14-rancher99] for adding stricter TLS cipher suites, going to add stricter TLS cipher suites arguments to etcd
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] Version [3.5.6] is equal or higher than version [3.2.99]
INFO[0016] [etcd] Building up etcd plane..
DEBU[0016] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0016] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0016] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0016] Starting container [etcd-fix-perm] on host [192.168.137.65], try #1
INFO[0016] Successfully started [etcd-fix-perm] container on host [192.168.137.65]
INFO[0016] Waiting for [etcd-fix-perm] container to exit on host [192.168.137.65]
INFO[0016] Waiting for [etcd-fix-perm] container to exit on host [192.168.137.65]
INFO[0016] Container [etcd-fix-perm] is still running on host [192.168.137.65]: stderr: [], stdout: []
DEBU[0017] Exit code for [etcd-fix-perm] container on host [192.168.137.65] is [0]
DEBU[0017] [remove/etcd-fix-perm] Checking if container is running on host [192.168.137.65]
DEBU[0017] [remove/etcd-fix-perm] Removing container on host [192.168.137.65]
INFO[0017] Removing container [etcd-fix-perm] on host [192.168.137.65], try #1
INFO[0017] [remove/etcd-fix-perm] Successfully removed container on host [192.168.137.65]
DEBU[0017] Checking if image [rancher/mirrored-coreos-etcd:v3.5.6] exists on host [192.168.137.65], try #1
INFO[0017] Image [rancher/mirrored-coreos-etcd:v3.5.6] exists on host [192.168.137.65]
INFO[0018] Starting container [etcd] on host [192.168.137.65], try #1
INFO[0018] [etcd] Successfully started [etcd] container on host [192.168.137.65]
DEBU[0018] Extracted version [v0.1.89] from image [rancher/rke-tools:v0.1.89]
DEBU[0018] Extracted version [v0.1.89] from image [rancher/rke-tools:v0.1.89]
DEBU[0018] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0018] [etcd] Running rolling snapshot container [etcd-snapshot-once] on host [192.168.137.65]
DEBU[0018] [etcd] Using command [/opt/rke-tools/rke-etcd-backup etcd-backup save --cacert /etc/kubernetes/ssl/kube-ca.pem --cert /etc/kubernetes/ssl/kube-node.pem --key /etc/kubernetes/ssl/kube-node-key.pem --name etcd-rolling-snapshots --endpoints=192.168.137.65:2379 --retention=24h --creation=6h] for rolling snapshot container [etcd-rolling-snapshots] on host [192.168.137.65]
DEBU[0018] [remove/etcd-rolling-snapshots] Checking if container is running on host [192.168.137.65]
DEBU[0018] [remove/etcd-rolling-snapshots] Container doesn't exist on host [192.168.137.65]
DEBU[0018] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0018] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0018] Starting container [etcd-rolling-snapshots] on host [192.168.137.65], try #1
INFO[0018] [etcd] Successfully started [etcd-rolling-snapshots] container on host [192.168.137.65]
DEBU[0023] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0023] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0023] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0023] Starting container [rke-bundle-cert] on host [192.168.137.65], try #1
INFO[0023] [certificates] Successfully started [rke-bundle-cert] container on host [192.168.137.65]
INFO[0023] Waiting for [rke-bundle-cert] container to exit on host [192.168.137.65]
INFO[0023] Container [rke-bundle-cert] is still running on host [192.168.137.65]: stderr: [], stdout: []
DEBU[0024] Exit code for [rke-bundle-cert] container on host [192.168.137.65] is [0]
INFO[0024] [certificates] successfully saved certificate bundle [/opt/rke/etcd-snapshots//pki.bundle.tar.gz] on host [192.168.137.65]
INFO[0024] Removing container [rke-bundle-cert] on host [192.168.137.65], try #1
DEBU[0024] [etcd] Creating log link for Container [etcd-rolling-snapshots] on host [192.168.137.65]
DEBU[0024] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0024] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0024] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0024] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0025] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0025] [etcd] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0025] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0025] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0025] [etcd] Successfully created log link for Container [etcd-rolling-snapshots] on host [192.168.137.65]
DEBU[0025] [etcd] Creating log link for Container [etcd] on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0025] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0025] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0025] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0025] [etcd] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0025] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0025] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0026] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0026] [etcd] Successfully created log link for Container [etcd] on host [192.168.137.65]
INFO[0026] [etcd] Successfully started etcd plane.. Checking etcd cluster health
DEBU[0026] [etcd] check etcd cluster health on host [192.168.137.65]
INFO[0026] [etcd] etcd host [192.168.137.65] reported healthy=true
DEBU[0026] getDefaultKubernetesServicesOptions: getting serviceOptions for cluster version [v1.26.6-rancher1-1]
DEBU[0026] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0026] getDefaultKubernetesServicesOptions: serviceOptions found for cluster major version [v1.26]
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0026] Extracted version [v3.5.6] from image [rancher/mirrored-coreos-etcd:v3.5.6]
DEBU[0026] etcd version [3.5.6] is higher than max version [3.4.3-rancher99] for advertising port 4001, not going to advertise port 4001
DEBU[0026] etcd version [3.5.6] is higher than max version [3.4.14-rancher99] for adding stricter TLS cipher suites, going to add stricter TLS cipher suites arguments to etcd
DEBU[0026] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0026] Version [3.5.6] is equal or higher than version [3.2.99]
INFO[0026] [controlplane] Building up Controller Plane..
INFO[0026] Finding container [service-sidekick] on host [192.168.137.65], try #1
DEBU[0026] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0026] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
DEBU[0026] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
INFO[0026] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0026] Starting container [kube-apiserver] on host [192.168.137.65], try #1
INFO[0026] [controlplane] Successfully started [kube-apiserver] container on host [192.168.137.65]
INFO[0026] [healthcheck] Start Healthcheck on service [kube-apiserver] on host [192.168.137.65]
DEBU[0028] [healthcheck] Service [kube-apiserver] is not healthy on host [192.168.137.65]. Response code: [403], response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"kube-apiserver\" cannot get path \"/healthz\"","reason":"Forbidden","details":{},"code":403}
, try #1
INFO[0033] [healthcheck] service [kube-apiserver] on host [192.168.137.65] is healthy
DEBU[0033] [controlplane] Creating log link for Container [kube-apiserver] on host [192.168.137.65]
DEBU[0033] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0033] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0033] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0033] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0033] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0034] [controlplane] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0034] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0034] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0034] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0034] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0034] [controlplane] Successfully created log link for Container [kube-apiserver] on host [192.168.137.65]
DEBU[0034] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
INFO[0034] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0034] Starting container [kube-controller-manager] on host [192.168.137.65], try #1
INFO[0034] [controlplane] Successfully started [kube-controller-manager] container on host [192.168.137.65]
INFO[0034] [healthcheck] Start Healthcheck on service [kube-controller-manager] on host [192.168.137.65]
DEBU[0034] [healthcheck] Failed to check https://localhost:10257/healthz for service [kube-controller-manager] on host [192.168.137.65]: Get "https://localhost:10257/healthz": Unable to access the service on localhost:10257. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #1
INFO[0039] [healthcheck] service [kube-controller-manager] on host [192.168.137.65] is healthy
DEBU[0039] [controlplane] Creating log link for Container [kube-controller-manager] on host [192.168.137.65]
DEBU[0039] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0039] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0039] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0039] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0039] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0040] [controlplane] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0040] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0040] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0040] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0040] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0040] [controlplane] Successfully created log link for Container [kube-controller-manager] on host [192.168.137.65]
DEBU[0040] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
INFO[0040] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0040] Starting container [kube-scheduler] on host [192.168.137.65], try #1
INFO[0040] [controlplane] Successfully started [kube-scheduler] container on host [192.168.137.65]
INFO[0040] [healthcheck] Start Healthcheck on service [kube-scheduler] on host [192.168.137.65]
DEBU[0040] [healthcheck] Failed to check https://localhost:10259/healthz for service [kube-scheduler] on host [192.168.137.65]: Get "https://localhost:10259/healthz": Unable to access the service on localhost:10259. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #1
INFO[0045] [healthcheck] service [kube-scheduler] on host [192.168.137.65] is healthy
DEBU[0045] [controlplane] Creating log link for Container [kube-scheduler] on host [192.168.137.65]
DEBU[0045] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0045] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0045] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0045] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0045] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0046] [controlplane] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0046] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0046] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0046] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0046] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0046] [controlplane] Successfully created log link for Container [kube-scheduler] on host [192.168.137.65]
INFO[0046] [controlplane] Successfully started Controller Plane..
DEBU[0046] Checking if cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0046] Cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0046] Enabling kube-api audit log for cluster version [v1.26.6-rancher1-1]
DEBU[0046] Host: 192.168.137.65 has role: controlplane
DEBU[0046] Host: 192.168.137.65 has role: etcd
DEBU[0046] Host: 192.168.137.62 has role: worker
DEBU[0046] Checking cri-dockerd for cluster version [v1.26.6-rancher1-1]
DEBU[0046] cri-dockerd is enabled for cluster version [v1.26.6-rancher1-1]
DEBU[0046] Checking PodSecurityPolicy for cluster version [v1.26.6-rancher1-1]
DEBU[0046] Checking PodSecurity for cluster version [v1.26.6-rancher1-1]
INFO[0046] [authz] Creating rke-job-deployer ServiceAccount
INFO[0046] [authz] rke-job-deployer ServiceAccount created successfully
INFO[0046] [authz] Creating system:node ClusterRoleBinding
INFO[0046] [authz] system:node ClusterRoleBinding created successfully
INFO[0046] [authz] Creating kube-apiserver proxy ClusterRole and ClusterRoleBinding
INFO[0046] [authz] kube-apiserver proxy ClusterRole and ClusterRoleBinding created successfully
INFO[0046] Successfully Deployed state file at [./cluster.rkestate]
INFO[0046] [state] Saving full cluster state to Kubernetes
INFO[0046] [state] Successfully Saved full cluster state to Kubernetes ConfigMap: full-cluster-state
DEBU[0046] getDefaultKubernetesServicesOptions: getting serviceOptions for cluster version [v1.26.6-rancher1-1]
DEBU[0046] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0046] getDefaultKubernetesServicesOptions: serviceOptions found for cluster major version [v1.26]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0046] Extracted version [v3.5.6] from image [rancher/mirrored-coreos-etcd:v3.5.6]
DEBU[0046] etcd version [3.5.6] is higher than max version [3.4.3-rancher99] for advertising port 4001, not going to advertise port 4001
DEBU[0046] etcd version [3.5.6] is higher than max version [3.4.14-rancher99] for adding stricter TLS cipher suites, going to add stricter TLS cipher suites arguments to etcd
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] Version [3.5.6] is equal or higher than version [3.2.99]
DEBU[0046] getDefaultKubernetesServicesOptions: getting serviceOptions for cluster version [v1.26.6-rancher1-1]
DEBU[0046] Extracted version [v1.26.6-rancher1] from image [rancher/hyperkube:v1.26.6-rancher1]
DEBU[0046] getDefaultKubernetesServicesOptions: serviceOptions found for cluster major version [v1.26]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
DEBU[0046] SemVerMatchRange: Cluster version [v1.26.6-rancher1-1] matches range [>=1.22.0-rancher0]
INFO[0046] [worker] Building up Worker Plane..
INFO[0046] Finding container [service-sidekick] on host [192.168.137.65], try #1
DEBU[0046] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0046] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
DEBU[0046] [sidekick] Checking if container [service-sidekick] is eligible for upgrade on host [192.168.137.65]
DEBU[0046] [sidekick] Container [service-sidekick] is not eligible for upgrade on host [192.168.137.65]
INFO[0046] [sidekick] Sidekick container already created on host [192.168.137.65]
DEBU[0046] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
INFO[0046] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0046] Starting container [kubelet] on host [192.168.137.65], try #1
INFO[0046] Starting container [nginx-proxy] on host [192.168.137.62], try #1
INFO[0046] [worker] Successfully started [kubelet] container on host [192.168.137.65]
INFO[0046] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.137.65]
DEBU[0046] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.65]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #1
INFO[0046] [worker] Successfully started [nginx-proxy] container on host [192.168.137.62]
DEBU[0046] [worker] Creating log link for Container [nginx-proxy] on host [192.168.137.62]
DEBU[0046] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0046] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.62]
DEBU[0046] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0046] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0046] Starting container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0047] [worker] Successfully started [rke-log-linker] container on host [192.168.137.62]
DEBU[0047] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0047] [remove/rke-log-linker] Removing container on host [192.168.137.62]
INFO[0047] Removing container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0047] [remove/rke-log-linker] Successfully removed container on host [192.168.137.62]
DEBU[0047] [worker] Successfully created log link for Container [nginx-proxy] on host [192.168.137.62]
INFO[0047] Finding container [service-sidekick] on host [192.168.137.62], try #1
DEBU[0047] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0047] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
DEBU[0047] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62], try #1
INFO[0047] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62]
INFO[0047] Starting container [kubelet] on host [192.168.137.62], try #1
INFO[0047] [worker] Successfully started [kubelet] container on host [192.168.137.62]
INFO[0047] [healthcheck] Start Healthcheck on service [kubelet] on host [192.168.137.62]
DEBU[0048] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.62]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #1
DEBU[0051] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.65]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #2
DEBU[0053] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.62]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #2
DEBU[0056] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.65]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #3
DEBU[0058] [healthcheck] Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.137.62]: Get "http://localhost:10248/healthz": Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #3
INFO[0062] [healthcheck] service [kubelet] on host [192.168.137.65] is healthy
DEBU[0062] [worker] Creating log link for Container [kubelet] on host [192.168.137.65]
DEBU[0062] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0062] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0062] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0062] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0062] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0062] [worker] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0062] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0062] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0062] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0062] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0062] [worker] Successfully created log link for Container [kubelet] on host [192.168.137.65]
DEBU[0062] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65], try #1
INFO[0062] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.65]
INFO[0062] Starting container [kube-proxy] on host [192.168.137.65], try #1
INFO[0062] [worker] Successfully started [kube-proxy] container on host [192.168.137.65]
INFO[0062] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.137.65]
DEBU[0062] [healthcheck] Failed to check http://localhost:10256/healthz for service [kube-proxy] on host [192.168.137.65]: Get "http://localhost:10256/healthz": Unable to access the service on localhost:10256. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), try #1
INFO[0063] [healthcheck] service [kubelet] on host [192.168.137.62] is healthy
DEBU[0063] [worker] Creating log link for Container [kubelet] on host [192.168.137.62]
DEBU[0063] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0063] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.62]
DEBU[0063] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0063] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0063] Starting container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0064] [worker] Successfully started [rke-log-linker] container on host [192.168.137.62]
DEBU[0064] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0064] [remove/rke-log-linker] Removing container on host [192.168.137.62]
INFO[0064] Removing container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0064] [remove/rke-log-linker] Successfully removed container on host [192.168.137.62]
DEBU[0064] [worker] Successfully created log link for Container [kubelet] on host [192.168.137.62]
DEBU[0064] Checking if image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62], try #1
INFO[0064] Image [rancher/hyperkube:v1.26.6-rancher1] exists on host [192.168.137.62]
INFO[0064] Starting container [kube-proxy] on host [192.168.137.62], try #1
INFO[0064] [worker] Successfully started [kube-proxy] container on host [192.168.137.62]
INFO[0064] [healthcheck] Start Healthcheck on service [kube-proxy] on host [192.168.137.62]
INFO[0064] [healthcheck] service [kube-proxy] on host [192.168.137.62] is healthy
DEBU[0064] [worker] Creating log link for Container [kube-proxy] on host [192.168.137.62]
DEBU[0064] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0064] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.62]
DEBU[0064] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0064] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0064] Starting container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0065] [worker] Successfully started [rke-log-linker] container on host [192.168.137.62]
DEBU[0065] [remove/rke-log-linker] Checking if container is running on host [192.168.137.62]
DEBU[0065] [remove/rke-log-linker] Removing container on host [192.168.137.62]
INFO[0065] Removing container [rke-log-linker] on host [192.168.137.62], try #1
INFO[0065] [remove/rke-log-linker] Successfully removed container on host [192.168.137.62]
DEBU[0065] [worker] Successfully created log link for Container [kube-proxy] on host [192.168.137.62]
INFO[0068] [healthcheck] service [kube-proxy] on host [192.168.137.65] is healthy
DEBU[0068] [worker] Creating log link for Container [kube-proxy] on host [192.168.137.65]
DEBU[0068] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0068] [remove/rke-log-linker] Container doesn't exist on host [192.168.137.65]
DEBU[0068] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
INFO[0068] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0068] Starting container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0068] [worker] Successfully started [rke-log-linker] container on host [192.168.137.65]
DEBU[0068] [remove/rke-log-linker] Checking if container is running on host [192.168.137.65]
DEBU[0068] [remove/rke-log-linker] Removing container on host [192.168.137.65]
INFO[0068] Removing container [rke-log-linker] on host [192.168.137.65], try #1
INFO[0068] [remove/rke-log-linker] Successfully removed container on host [192.168.137.65]
DEBU[0068] [worker] Successfully created log link for Container [kube-proxy] on host [192.168.137.65]
INFO[0068] [worker] Successfully started Worker Plane..
DEBU[0068] [cleanup] Starting log link cleanup on host [192.168.137.62]
DEBU[0068] [remove/rke-log-cleaner] Checking if container is running on host [192.168.137.62]
DEBU[0068] [cleanup] Starting log link cleanup on host [192.168.137.65]
DEBU[0068] [remove/rke-log-cleaner] Checking if container is running on host [192.168.137.65]
DEBU[0068] [remove/rke-log-cleaner] Container doesn't exist on host [192.168.137.62]
DEBU[0068] [remove/rke-log-cleaner] Container doesn't exist on host [192.168.137.65]
DEBU[0068] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65], try #1
DEBU[0068] Checking if image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62], try #1
INFO[0068] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.65]
INFO[0068] Image [rancher/rke-tools:v0.1.89] exists on host [192.168.137.62]
INFO[0068] Starting container [rke-log-cleaner] on host [192.168.137.65], try #1
INFO[0068] Starting container [rke-log-cleaner] on host [192.168.137.62], try #1
INFO[0069] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.137.65]
DEBU[0069] [remove/rke-log-cleaner] Checking if container is running on host [192.168.137.65]
DEBU[0069] [remove/rke-log-cleaner] Removing container on host [192.168.137.65]
INFO[0069] Removing container [rke-log-cleaner] on host [192.168.137.65], try #1
INFO[0069] [cleanup] Successfully started [rke-log-cleaner] container on host [192.168.137.62]
DEBU[0069] [remove/rke-log-cleaner] Checking if container is running on host [192.168.137.62]
DEBU[0069] [remove/rke-log-cleaner] Removing container on host [192.168.137.62]
INFO[0069] Removing container [rke-log-cleaner] on host [192.168.137.62], try #1
INFO[0069] [remove/rke-log-cleaner] Successfully removed container on host [192.168.137.65]
DEBU[0069] [cleanup] Successfully cleaned up log links on host [192.168.137.65]
INFO[0069] [remove/rke-log-cleaner] Successfully removed container on host [192.168.137.62]
DEBU[0069] [cleanup] Successfully cleaned up log links on host [192.168.137.62]
INFO[0069] [sync] Syncing nodes Labels and Taints
DEBU[0069] worker [9] starting sync for node [master]
DEBU[0069] Checking node list for node [master], try #1
DEBU[0069] worker [4] starting sync for node [worker1]
DEBU[0069] Checking node list for node [worker1], try #1
INFO[0069] [sync] Successfully synced nodes Labels and Taints
DEBU[0069] Checking if cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0069] Cluster version [1.26.6-rancher1-1] needs to have kube-api audit log enabled
DEBU[0069] Enabling kube-api audit log for cluster version [v1.26.6-rancher1-1]
DEBU[0069] Host: 192.168.137.65 has role: controlplane
DEBU[0069] Host: 192.168.137.65 has role: etcd
DEBU[0069] Host: 192.168.137.62 has role: worker
DEBU[0069] Checking cri-dockerd for cluster version [v1.26.6-rancher1-1]
DEBU[0069] cri-dockerd is enabled for cluster version [v1.26.6-rancher1-1]
DEBU[0069] Checking PodSecurityPolicy for cluster version [v1.26.6-rancher1-1]
DEBU[0069] Checking PodSecurity for cluster version [v1.26.6-rancher1-1]
INFO[0069] [network] Setting up network plugin: calico
INFO[0069] [addons] Saving ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0069] [addons] Successfully saved ConfigMap for addon rke-network-plugin to Kubernetes
INFO[0069] [addons] Executing deploy job rke-network-plugin
DEBU[0069] Checking node list for node [master], try #1
DEBU[0069] Checking addon job OS label for cluster version [v1.26.6-rancher1-1]
DEBU[0069] Cluster version [v1.26.6-rancher1-1] needs to use new OS label
DEBU[0069] [k8s] waiting for job rke-network-plugin-deploy-job to complete..
FATA[0114] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

Describe Master:

➜  rke-setup kubectl describe node master
Name:               master
Roles:              controlplane,etcd
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=master
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/controlplane=true
                    node-role.kubernetes.io/etcd=true
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    rke.cattle.io/external-ip: 192.168.137.65
                    rke.cattle.io/internal-ip: 192.168.137.65
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 30 Aug 2023 23:31:10 +0200
Taints:             node-role.kubernetes.io/etcd=true:NoExecute
                    node-role.kubernetes.io/controlplane=true:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  master
  AcquireTime:     <unset>
  RenewTime:       Wed, 30 Aug 2023 23:34:13 +0200
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 30 Aug 2023 23:31:20 +0200   Wed, 30 Aug 2023 23:31:10 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 30 Aug 2023 23:31:20 +0200   Wed, 30 Aug 2023 23:31:10 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 30 Aug 2023 23:31:20 +0200   Wed, 30 Aug 2023 23:31:10 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 30 Aug 2023 23:31:20 +0200   Wed, 30 Aug 2023 23:31:10 +0200   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Addresses:
Capacity:
  cpu:                32
  ephemeral-storage:  71645Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             263143236Ki
  pods:               110
Allocatable:
  cpu:                32
  ephemeral-storage:  71645Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             263143236Ki
  pods:               110
System Info:
  Machine ID:                 50ca20960ea94552bd5ef84a20ce7e47
  System UUID:                a1c62d2a-ceb1-11e6-b9b8-0894ef355672
  Boot ID:                    a6ee322d-f8cf-478d-a150-f001906aabc4
  Kernel Version:             4.18.0-372.9.1.el8.x86_64
  OS Image:                   AlmaLinux 8.6 (Sky Tiger)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://23.0.6
  Kubelet Version:            v1.26.6
  Kube-Proxy Version:         v1.26.6
PodCIDR:                      10.42.0.0/24
PodCIDRs:                     10.42.0.0/24
Non-terminated Pods:          (0 in total)
  Namespace                   Name    CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----    ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests  Limits
  --------           --------  ------
  cpu                0 (0%)    0 (0%)
  memory             0 (0%)    0 (0%)
  ephemeral-storage  0 (0%)    0 (0%)
  hugepages-1Gi      0 (0%)    0 (0%)
  hugepages-2Mi      0 (0%)    0 (0%)
Events:
  Type    Reason                   Age                From             Message
  ----    ------                   ----               ----             -------
  Normal  Starting                 81s                kube-proxy
  Normal  Starting                 85s                kubelet          Starting kubelet.
  Normal  NodeHasSufficientMemory  85s (x2 over 85s)  kubelet          Node master status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    85s (x2 over 85s)  kubelet          Node master status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     85s (x2 over 85s)  kubelet          Node master status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  85s                kubelet          Updated Node Allocatable limit across pods
  Normal  RegisteredNode           81s                node-controller  Node master event: Registered Node master in Controller

And finally:

➜  rke-setup kubectl get pods -A
NAMESPACE     NAME                                  READY   STATUS   RESTARTS   AGE
kube-system   rke-network-plugin-deploy-job-d2fmz   0/1     Error    0          3m6s
kube-system   rke-network-plugin-deploy-job-klfxp   0/1     Error    0          3m29s
kube-system   rke-network-plugin-deploy-job-lbzwm   0/1     Error    0          2m23s
kube-system   rke-network-plugin-deploy-job-ltnrk   0/1     Error    0          60s
kube-system   rke-network-plugin-deploy-job-rsj56   0/1     Error    0          3m47s
kube-system   rke-network-plugin-deploy-job-v2fpm   0/1     Error    0          3m42s
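The error from these job pods can also be pulled directly, for example (pod name taken from the list above):

# Show why the deploy-job pods fail (pod name taken from the list above)
kubectl -n kube-system logs rke-network-plugin-deploy-job-rsj56
# Pod-level events and status
kubectl -n kube-system describe pod rke-network-plugin-deploy-job-rsj56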
jiaqiluo commented 1 year ago

@GuillaumeDorschner, I also cannot find any suspicious message that explains the issue in the information you collected.

let's try this:

GuillaumeDorschner commented 1 year ago

@jiaqiluo

I have restarted the cluster, but for the past 5 minutes I haven't seen the pods restarting.

docker ps output:

[root@localhost ~]# docker ps
CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS         PORTS     NAMES
5eff84fb67a0   rancher/hyperkube:v1.26.6-rancher1    "/opt/rke-tools/entr…"   5 minutes ago   Up 5 minutes             kube-proxy
0b17ad56e242   rancher/hyperkube:v1.26.6-rancher1    "/opt/rke-tools/entr…"   5 minutes ago   Up 5 minutes             kubelet
178289f286e0   rancher/hyperkube:v1.26.6-rancher1    "/opt/rke-tools/entr…"   5 minutes ago   Up 5 minutes             kube-scheduler
47a6939b9da7   rancher/hyperkube:v1.26.6-rancher1    "/opt/rke-tools/entr…"   6 minutes ago   Up 6 minutes             kube-controller-manager
154e2eb107be   rancher/hyperkube:v1.26.6-rancher1    "/opt/rke-tools/entr…"   6 minutes ago   Up 6 minutes             kube-apiserver
1d911d9a6739   rancher/rke-tools:v0.1.89             "/docker-entrypoint.…"   6 minutes ago   Up 6 minutes             etcd-rolling-snapshots
1a202b775ea1   rancher/mirrored-coreos-etcd:v3.5.6   "/usr/local/bin/etcd…"   6 minutes ago   Up 6 minutes             etcd

the logs:

- kube-scheduler
- kube-controller-manager
- kube-apiserver
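In case it is useful, the same component logs can be pulled straight from the control-plane node with docker logs, since RKE runs every component as a plain Docker container (container names are the ones shown in the docker ps output above):

# Run on the control-plane node (192.168.137.65); dumps the last 200 lines of each component
for c in kube-apiserver kube-controller-manager kube-scheduler kubelet kube-proxy; do
  echo "===== $c ====="
  docker logs --tail 200 "$c" 2>&1
done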
jiaqiluo commented 1 year ago

@GuillaumeDorschner ,

The following line from the kubelet logs could indicate the reason for the failure: the default route on the host is missing.

E0830 22:08:33.046106  846612 kubelet_node_status.go:701] "Failed to set some node status fields" err="can't get ip address of node master. error: no default routes found in \"/proc/net/route\" or \"/proc/net/ipv6_route\"" node="master"
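Kubelet derives the node IP from the interface that owns the default route, which is why the error above mentions /proc/net/route. A quick way to confirm what the host reports (plain iproute2 commands, nothing RKE-specific):

# An empty result here matches the "no default routes found" error from the kubelet log
ip route show default
ip -6 route show default
# The raw routing table that kubelet parses
cat /proc/net/route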

You can try the following:

Once you finish the above, you may need to run rke up again to trigger a reconciliation, and I expect the cluster to work now.

ref:

GuillaumeDorschner commented 1 year ago

As you suggested, I checked my network configuration and saw that a default IP route was already in place:

[root@localhost ~]# ip route
default via 192.168.1.1 dev eno1 proto dhcp metric 101

However, there was no default route for the on-premise network, so I added one: sudo ip route add default via 192.168.137.1

This successfully resolved the issue, allowing me to build my Kubernetes cluster without any problems:

INFO[0086] [addons] Executing deploy job rke-network-plugin
INFO[0096] [addons] Setting up coredns
...
INFO[0126] [addons] User addons deployed successfully
INFO[0126] Finished building Kubernetes cluster successfully
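
One note for anyone hitting the same thing: ip route add only changes the running kernel table and is lost on reboot, so the gateway should probably also be persisted through NetworkManager, along these lines (the connection name below is a placeholder; the real one comes from nmcli con show):

# Persist the on-premise gateway so the extra default route survives reboots.
# "vlan-137" is a placeholder connection name -- list the real ones with: nmcli con show
nmcli connection modify vlan-137 ipv4.gateway 192.168.137.1
nmcli connection up vlan-137
ip route show default   # verify both default routes are present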

Thank you, @jiaqiluo !