rancher / rke

Rancher Kubernetes Engine (RKE), an extremely simple, lightning fast Kubernetes distribution that runs entirely within containers.
Apache License 2.0

Behavioral change in rke util get-state-file #3668

Open paddy-hack opened 3 weeks ago

paddy-hack commented 3 weeks ago

RKE version:

Docker version: (docker version,docker info preferred)

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)

cluster.yml file:

nodes:
    - address: 1.2.3.4
      user: rancher
      role:
        - controlplane
        - etcd
        - worker

Steps to Reproduce:

rke up --ssh-agent-auth
mv cluster.rkestate{,.bak}
mv kube_config_cluster.yml{,.bak}
rke util get-state-file --ssh-agent-auth

Results:

INFO[0000] Retrieving state file from cluster           
INFO[0000] Unable to connect to server using kubeconfig, trying to get state from Control Plane node(s), error: [state] Failed to create Kubernetes Client: stat ./kube_config_cluster.yml: no such file or directory 
INFO[0000] [dialer] Setup tunnel for host [1.2.3.4] 
INFO[0000] Image [rancher/hyperkube:v1.28.10-rancher1] exists on host [1.2.3.4] 
INFO[0000] Starting container [extract-statefile-configmap] on host [1.2.3.4], try #1 
INFO[0001] Successfully started [extract-statefile-configmap] container on host [1.2.3.4] 
INFO[0001] Waiting for [extract-statefile-configmap] container to exit on host [1.2.3.4] 
INFO[0001] Waiting for [extract-statefile-configmap] container to exit on host [1.2.3.4] 
INFO[0001] Container [extract-statefile-configmap] is still running on host [1.2.3.4]: stderr: [], stdout: [] 
INFO[0002] Removing container [extract-statefile-configmap] on host [1.2.3.4], try #1 
INFO[0002] [remove/extract-statefile-configmap] Successfully removed container on host [1.2.3.4] 
INFO[0002] Could not get ConfigMap with cluster state from host [1.2.3.4] 
FATA[0002] [state] Unable to get ConfigMap with cluster state from any Control Plane host 

With v1.5.9 the rke util get-state-file command succeeds. The test results (per scenario 2) for the issue that added the rke util commands indicate that the command is intended to succeed.

Some further testing shows that with v1.5.10 the command succeeds if kube_config_cluster.yml is present.

This behavioral change broke my CI/CD setup :sob:

The change was introduced by the "fix" for CVE-2023-32191. The release notes mention it, but that did not ring a bell for me, and I spent the morning figuring out what had happened :tired_face:

Thinking of provisioning my CI/CD jobs with a copy of kube_config_cluster.yml to make them work again. Obviously, that file cannot be added to the git repository I use to maintain my clusters.
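
A minimal sketch of what such a CI step could look like, assuming the kubeconfig is stored base64-encoded in a CI secret. The variable name KUBE_CONFIG_CLUSTER_B64 and the helper restore_kubeconfig are hypothetical, made up for illustration; they are not part of rke:

```shell
#!/bin/sh
# Sketch of a CI step: restore kube_config_cluster.yml from a secret before
# asking rke for the state file.
# ASSUMPTION: KUBE_CONFIG_CLUSTER_B64 is a hypothetical CI secret holding the
# base64-encoded kubeconfig. It is not an rke feature.

restore_kubeconfig() {
    # $1: base64-encoded kubeconfig, $2: destination path
    # The subshell keeps the restrictive umask from leaking into the caller.
    (
        umask 077    # new file is created readable by the owner only
        printf '%s' "$1" | base64 -d > "$2"
    )
}

# Only act when the secret is actually provided by the CI environment.
if [ -n "${KUBE_CONFIG_CLUSTER_B64:-}" ]; then
    restore_kubeconfig "$KUBE_CONFIG_CLUSTER_B64" kube_config_cluster.yml
    # Same command as in the steps to reproduce above.
    rke util get-state-file --ssh-agent-auth
fi
```

The secret would be created once from a working copy, for example by storing the output of `base64 < kube_config_cluster.yml` in the CI system's secret store, so the file itself never lands in the git repository.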

Submitting this in the hope it helps someone running into the same :bow:

paddy-hack commented 3 weeks ago

> With v1.5.9 the rke util get-state-file command succeeds.

Clarification: This holds for a cluster deployed with v1.5.9. Running v1.5.9 against a cluster deployed with v1.5.10, rke util get-state-file fails. This is because the full-cluster-state ConfigMap that the command looks for is no longer present.