rancher / rke2

https://docs.rke2.io/
Apache License 2.0

secrets-encrypt rotate-keys is not working since the metrics server output is not as expected #5535

Closed aganesh-suse closed 5 months ago

aganesh-suse commented 6 months ago

Issue found on master branch with version v1.29.2-rc3+rke2r1

Environment Details

Infrastructure

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 1 etcd, 2 cp, 1 agent node

Config.yaml:

ETCD server config:

token: xxxx
disable-apiserver: true
disable-controller-manager: true
disable-scheduler: true
write-kubeconfig-mode: "0644"
secrets-encryption: true
node-external-ip: 1.1.1.1
debug: true

CP only node configs:

token: xxxx
server: https://1.1.1.1:9345
disable-etcd: true
write-kubeconfig-mode: "0644"
secrets-encryption: true
node-external-ip: 1.2.3.4
debug: true

Steps to reproduce:

  1. Copy config.yaml
    $ sudo mkdir -p /etc/rancher/rke2 && sudo cp config.yaml /etc/rancher/rke2
  2. Install RKE2
    curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_VERSION='v1.29.2-rc3+rke2r1' INSTALL_RKE2_TYPE='server' INSTALL_RKE2_METHOD=tar sh -
  3. Start the RKE2 service
    $ sudo systemctl enable --now rke2-server
    or 
    $ sudo systemctl enable --now rke2-agent
  4. Verify Cluster Status:
    kubectl get nodes -o wide
    kubectl get pods -A
  5. Run secrets-encrypt rotate-keys:
    sudo rke2 secrets-encrypt rotate-keys
  6. Run metrics server command:
    kubectl get --raw /metrics | grep apiserver_encryption

Reproducing Results/Observations:

$ kubectl get nodes
NAME               STATUS   ROLES                  AGE     VERSION
ip-172-31-16-219   Ready    etcd                   6m15s   v1.29.2+rke2r1
ip-172-31-16-91    Ready    <none>                 4m14s   v1.29.2+rke2r1
ip-172-31-28-150   Ready    control-plane,master   6m21s   v1.29.2+rke2r1
ip-172-31-29-121   Ready    control-plane,master   5m8s    v1.29.2+rke2r1
$ sudo rke2 secrets-encrypt rotate-keys
FATA[0061] see server log for details: https://127.0.0.1:9345/v1-rke2/encrypt/config: 400 Bad Request secrets-encrypt error ID 66168

The file /var/lib/rancher/rke2/server/cred/encryption-config.json seems to get out of sync with the datastore.
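One rough way to check whether the file differs between servers is to compare a digest of it on each node. The sketch below is for illustration only: `config_digest` is a hypothetical helper, and a plain SHA-256 of the file is not necessarily how rke2 computes the hashes it reports.

```shell
# Hypothetical helper: print the SHA-256 digest of a file (digest only, no filename).
# Assumption: sha256sum from GNU coreutils is available, as on Ubuntu.
config_digest() {
  sha256sum "$1" | awk '{ print $1 }'
}
```

Run it from a root shell on each server node against the file above and compare the printed values; differing digests would suggest the nodes hold different copies.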

The metrics server does not produce the right result and hence the rotate-keys operation never completes:

$ kubectl get --raw /metrics | grep apiserver_encryption
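To make the empty result explicit, the metrics check can be wrapped so that it prints a count instead of nothing. This is a minimal sketch; `count_encryption_metrics` is a hypothetical helper name, and it only counts sample lines (it skips `# HELP` and `# TYPE` comment lines).

```shell
# Hypothetical helper: read Prometheus-style metrics text on stdin and print
# how many non-comment lines mention apiserver_encryption.
count_encryption_metrics() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the function's exit status at 0 in that case.
  grep -v '^#' | grep -c 'apiserver_encryption' || true
}
```

For example, `kubectl get --raw /metrics | count_encryption_metrics` would print `0` in the failing case described here.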

P.S: Another file to keep an eye on: /var/lib/rancher/rke2/server/cred/encryption-state.json

Expected behavior:

$ sudo rke2 secrets-encrypt rotate-keys 
keys rotated, reencryption started

After the command completes successfully, re-running sudo rke2 secrets-encrypt status a few seconds later should report the reencrypt_finished stage. Reboot the nodes in order (etcd nodes first, then cp nodes), and all hashes should match.
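The "retry after a few seconds" step could be scripted roughly as below. This is a sketch under assumptions: `rotation_stage` and `wait_for_stage` are hypothetical helper names, and the parsing relies on the status output containing a line of the form `Current Rotation Stage: <stage>`.

```shell
# Hypothetical helper: read status output on stdin and print the rotation stage.
rotation_stage() {
  awk -F': ' '/^Current Rotation Stage:/ { print $2 }'
}

# Hypothetical helper: poll a status command until it reports the wanted stage.
# Usage: wait_for_stage <stage> <max_attempts> <delay_seconds> <command...>
# Returns 0 once the stage is seen, 1 if the attempts run out.
wait_for_stage() {
  want=$1
  attempts=$2
  delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    stage=$("$@" 2>/dev/null | rotation_stage)
    if [ "$stage" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_stage reencrypt_finished 30 5 sudo rke2 secrets-encrypt status` would retry every 5 seconds, up to 30 times.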

aganesh-suse commented 5 months ago

Validated on branch with commit eb2d438a2fe6b426ecd00cb8e829ddc728a246b7

Environment Details

Infrastructure

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 1 etcd, 2 cp, 1 agent node

Config.yaml:

ETCD server config:

token: xxxx
disable-apiserver: true
disable-controller-manager: true
disable-scheduler: true
write-kubeconfig-mode: "0644"
secrets-encryption: true
node-external-ip: 1.1.1.1
debug: true

CP only node configs:

token: xxxx
server: https://1.1.1.1:9345
disable-etcd: true
write-kubeconfig-mode: "0644"
secrets-encryption: true
node-external-ip: 1.2.3.4
debug: true

Steps to reproduce:

  1. Copy config.yaml
    $ sudo mkdir -p /etc/rancher/rke2 && sudo cp config.yaml /etc/rancher/rke2
  2. Install RKE2
    curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_COMMIT='eb2d438a2fe6b426ecd00cb8e829ddc728a246b7' INSTALL_RKE2_TYPE='server' INSTALL_RKE2_METHOD=tar sh -
  3. Start the RKE2 service
    $ sudo systemctl enable --now rke2-server
    or 
    $ sudo systemctl enable --now rke2-agent
  4. Verify Cluster Status:
    kubectl get nodes -o wide
    kubectl get pods -A
  5. Run secrets-encrypt rotate-keys:
    sudo rke2 secrets-encrypt rotate-keys
  6. Restart rke2 services on all nodes and get status:
    sudo rke2 secrets-encrypt status

Validation Results:

$ kubectl get nodes
NAME               STATUS   ROLES                  AGE   VERSION
ip-172-31-17-195   Ready    control-plane,master   22m   v1.29.3+rke2r1
ip-172-31-19-236   Ready    etcd                   22m   v1.29.3+rke2r1
ip-172-31-25-125   Ready    control-plane,master   20m   v1.29.3+rke2r1
ip-172-31-28-204   Ready    <none>                 20m   v1.29.3+rke2r1

Rotate-keys:

$ sudo rke2 secrets-encrypt rotate-keys
keys rotated, reencryption started

Restart rke2 services and get status:

$ sudo rke2 secrets-encrypt status
Encryption Status: Enabled
Current Rotation Stage: reencrypt_finished
Server Encryption Hashes: All hashes match

Active  Key Type  Name
------  --------  ----
 *      AES-CBC   aescbckey-2024-04-08T21:19:13Z
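The validation above could be checked mechanically along these lines. This is a sketch that assumes the status output keeps the exact layout shown in this comment; `status_ok` and `active_key` are hypothetical helper names.

```shell
# Hypothetical helper: read status output on stdin and succeed only if
# encryption is enabled, the stage is reencrypt_finished, and all hashes match.
status_ok() {
  status=$(cat)
  printf '%s\n' "$status" | grep -q '^Encryption Status: Enabled' &&
    printf '%s\n' "$status" | grep -q '^Current Rotation Stage: reencrypt_finished' &&
    printf '%s\n' "$status" | grep -q '^Server Encryption Hashes: All hashes match'
}

# Hypothetical helper: print the name of the key marked active ("*") in the key table.
active_key() {
  awk '$1 == "*" { print $3 }'
}
```

For example, `sudo rke2 secrets-encrypt status | status_ok && echo PASS` prints `PASS` only when all three lines are present, and `sudo rke2 secrets-encrypt status | active_key` prints the active key name.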