Closed by rancherbot 1 year ago
VERSION=v1.24.10+rke2r1
Infrastructure
Node(s) CPU architecture, OS, and version:
Linux 5.4.0-1041-aws x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 20.04.2 LTS"
Cluster Configuration: From Rancher provisioned node
$ sudo cat /etc/rancher/rke2/config.yaml.d/50-rancher.yaml  # values intentionally falsified
{
"advertise-address": "13.1.7.2",
"agent-token": "shoopdoopdehdoopooohpdeydoopdeywhoopday",
"cni": "calico",
"disable-kube-proxy": false,
"etcd-expose-metrics": false,
"etcd-snapshot-retention": 5,
"etcd-snapshot-schedule-cron": "0 */5 * * *",
"kube-controller-manager-arg": [
"cert-dir=/var/lib/rancher/rke2/server/tls/kube-controller-manager",
"secure-port=10257"
],
"kube-controller-manager-extra-mount": [
"/var/lib/rancher/rke2/server/tls/kube-controller-manager:/var/lib/rancher/rke2/server/tls/kube-controller-manager"
],
"kube-scheduler-arg": [
"cert-dir=/var/lib/rancher/rke2/server/tls/kube-scheduler",
"secure-port=10259"
],
"kube-scheduler-extra-mount": [
"/var/lib/rancher/rke2/server/tls/kube-scheduler:/var/lib/rancher/rke2/server/tls/kube-scheduler"
],
"node-external-ip": [
"13.1.7.2"
],
"node-ip": [
"17.1.1.17"
],
"node-label": [
"rke.cattle.io/machine=79999999-9b68-452a-82a9-444444645"
],
"private-registry": "/etc/rancher/rke2/registries.yaml",
"protect-kernel-defaults": false,
"tls-san": [
"13.1.7.2"
],
"token": "onceamerrymanyminimen"
}
Note the caveat from the original issue: "I plan to test a potential fix, but need time to set up a reliably reproducible environment due to the raciness of this issue; I've only seen this issue twice out of ~25 attempts."
v1.24.11-rc1+rke2r1
Infrastructure
Node(s) CPU architecture, OS, and Version:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Cluster Configuration:
$ kubectl get node,pod -A
NAME STATUS ROLES AGE VERSION
node/124issue-cp-540bbe9a-5qrzl Ready control-plane,master 30m v1.24.11+rke2r1
node/124issue-cp-540bbe9a-8xwzh Ready control-plane,master 30m v1.24.11+rke2r1
node/124issue-etcd-3c6cd715-fzbsg Ready etcd 29m v1.24.11+rke2r1
node/124issue-etcd-3c6cd715-mdql5 Ready etcd 29m v1.24.11+rke2r1
node/124issue-etcd-3c6cd715-w9vfn Ready etcd 29m v1.24.11+rke2r1
node/124issue-worker-d42c52f4-6k9l2 Ready worker 27m v1.24.11+rke2r1
node/124issue-worker-d42c52f4-ndrn5 Ready worker 27m v1.24.11+rke2r1
1. Create a Rancher cluster using the configuration above.
2. Deploy a secret: "kubectl create secret generic secret1 -n default --from-literal=mykey=mydata"
3. On an etcd node, confirm the secret is present and inspect its encryption prefix using etcdctl:
# Install etcdctl
# Run this command to view how the secret is encrypted:
$ sudo ETCDCTL_API=3 etcdctl --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --endpoints https://127.0.0.1:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt get /registry/secrets/default/secret1 | hexdump -C
# Result should include something like:
k8s:enc:aescbc:v1:
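Rather than eyeballing the full hexdump, the encryption prefix can be pulled out of the raw value with standard text tools. A hedged sketch; the `sample` string here is a hypothetical stand-in for the value format shown in the hexdumps below, and in practice the input would be piped from the etcdctl command above:

```shell
#!/bin/sh
# Hypothetical sample of the raw etcd value; only the printable prefix
# matters for this check.
sample='/registry/secrets/default/secret1
k8s:enc:aescbc:v1:aescbckey:<binary ciphertext>'

# Confirm the secret is encrypted with the aescbc provider
printf '%s' "$sample" | grep -q 'k8s:enc:aescbc:v1:' && echo "encrypted with aescbc"

# Extract the key name (e.g. "aescbckey", or "aescbckey-<timestamp>" after rotation)
printf '%s' "$sample" | grep -oE 'aescbckey(-[0-9T:-]+Z)?' | head -1
```

The same grep works on the post-rotation value, where the key name carries an RFC 3339 timestamp.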
On a CP node
# Get initial status
$ sudo rke2 secrets-encrypt status
# Run prepare step
$ sudo rke2 secrets-encrypt prepare
# Restart all nodes -- etcd nodes first, then control-plane nodes, then agent nodes
# (the service is rke2-server on server nodes and rke2-agent on agent nodes)
$ sudo systemctl restart rke2-server
# Run rotate
$ sudo rke2 secrets-encrypt rotate
# Restart all nodes again, in the same order
$ sudo systemctl restart rke2-server
# Run reencrypt
$ sudo rke2 secrets-encrypt reencrypt
# Restart all nodes again, in the same order
$ sudo systemctl restart rke2-server
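The prepare → rotate → reencrypt sequence above can be sketched as a small script. This is a hedged sketch, not an official tool: it defaults to dry-run (printing the commands instead of running them), and restart_all_nodes is a placeholder -- in practice each restart must be performed per node, etcd nodes first, then control-plane nodes, then agents:

```shell
#!/bin/sh
# Hedged sketch of the rotation flow. DRY_RUN defaults to 1 so the script
# only prints what it would do; set DRY_RUN=0 on a real server node.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

restart_all_nodes() {
  # Placeholder: restart rke2 on every node, etcd first, then control-plane,
  # then agents (where the service is rke2-agent, not rke2-server).
  run sudo systemctl restart rke2-server
}

for phase in prepare rotate reencrypt; do
  run sudo rke2 secrets-encrypt "$phase"
  restart_all_nodes
done
```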
On the etcd node, confirm the secret encryption key has changed using etcdctl:
# Run this command again to view how the secret is encrypted:
$ sudo ETCDCTL_API=3 etcdctl --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --endpoints https://127.0.0.1:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt get /registry/secrets/default/secret1 | hexdump -C
# Result should differ from the initial one, and the key name should now include a timestamp. Something like:
k8s:enc:aescbc:v1:aescbckey-2021-12-08T21:34:03Z:
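The before/after check can also be scripted by comparing the key-name field of the two prefixes. A hedged sketch over hypothetical captured prefixes; the real values would come from the two etcdctl runs:

```shell
#!/bin/sh
# Hypothetical prefixes captured before and after the rotation flow;
# field 5 of the colon-separated prefix is the start of the key name.
before='k8s:enc:aescbc:v1:aescbckey:'
after='k8s:enc:aescbc:v1:aescbckey-2023-03-09T22:02:42Z:'

key_field() { printf '%s' "$1" | cut -d: -f5; }

if [ "$(key_field "$before")" != "$(key_field "$after")" ]; then
  echo "key changed"
else
  echo "key NOT changed -- rotation did not take effect"
fi
```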
Validation Results:
Before starting the full flow
$ kubectl get node,pod -A
NAME STATUS ROLES AGE VERSION
node/124issue-cp-540bbe9a-5qrzl Ready control-plane,master 15m v1.24.11+rke2r1
node/124issue-cp-540bbe9a-8xwzh Ready control-plane,master 15m v1.24.11+rke2r1
node/124issue-etcd-3c6cd715-fzbsg Ready etcd 15m v1.24.11+rke2r1
node/124issue-etcd-3c6cd715-mdql5 Ready etcd 15m v1.24.11+rke2r1
node/124issue-etcd-3c6cd715-w9vfn Ready etcd 15m v1.24.11+rke2r1
node/124issue-worker-d42c52f4-6k9l2 Ready worker 12m v1.24.11+rke2r1
node/124issue-worker-d42c52f4-ndrn5 Ready worker 12m v1.24.11+rke2r1
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-system pod/calico-kube-controllers-868d8d5ccb-xxrz9 1/1 Running 0 14m
calico-system pod/calico-node-4kngv 1/1 Running 0 14m
calico-system pod/calico-node-b4dqm 1/1 Running 0 12m
calico-system pod/calico-node-k27jg 1/1 Running 0 14m
calico-system pod/calico-node-kdjgf 1/1 Running 0 14m
calico-system pod/calico-node-m7cnw 1/1 Running 0 14m
calico-system pod/calico-node-vk57n 1/1 Running 0 14m
calico-system pod/calico-node-wnhws 1/1 Running 0 12m
calico-system pod/calico-typha-55f7c77b95-ljxsl 1/1 Running 0 14m
calico-system pod/calico-typha-55f7c77b95-p2m2w 1/1 Running 0 14m
calico-system pod/calico-typha-55f7c77b95-zhx2c 1/1 Running 0 14m
cattle-fleet-system pod/fleet-agent-7468ff4fb4-j285w 1/1 Running 0 11m
cattle-system pod/apply-system-agent-upgrader-on-124issue-cp-540bbe9a-5qrzl-kdgb5 0/1 Completed 0 10m
cattle-system pod/apply-system-agent-upgrader-on-124issue-cp-540bbe9a-8xwzh-fmv8x 0/1 Completed 0 10m
cattle-system pod/apply-system-agent-upgrader-on-124issue-etcd-3c6cd715-fzb-8g7tn 0/1 Completed 0 10m
cattle-system pod/apply-system-agent-upgrader-on-124issue-etcd-3c6cd715-mdq-mwnpj 0/1 Completed 0 10m
cattle-system pod/apply-system-agent-upgrader-on-124issue-etcd-3c6cd715-w9v-lfgzg 0/1 Completed 0 10m
cattle-system pod/apply-system-agent-upgrader-on-124issue-worker-d42c52f4-6-2gjkp 0/1 Completed 0 10m
cattle-system pod/apply-system-agent-upgrader-on-124issue-worker-d42c52f4-n-bgx7w 0/1 Completed 0 10m
cattle-system pod/cattle-cluster-agent-547cf7d6fd-bjhkk 1/1 Running 0 12m
cattle-system pod/cattle-cluster-agent-547cf7d6fd-grvdm 1/1 Running 0 13m
cattle-system pod/system-upgrade-controller-79fc9c84b7-fksz7 1/1 Running 0 11m
kube-system pod/cloud-controller-manager-124issue-cp-540bbe9a-5qrzl 1/1 Running 0 15m
kube-system pod/cloud-controller-manager-124issue-cp-540bbe9a-8xwzh 1/1 Running 0 15m
kube-system pod/cloud-controller-manager-124issue-etcd-3c6cd715-fzbsg 1/1 Running 0 14m
kube-system pod/cloud-controller-manager-124issue-etcd-3c6cd715-mdql5 1/1 Running 0 15m
kube-system pod/cloud-controller-manager-124issue-etcd-3c6cd715-w9vfn 1/1 Running 0 15m
kube-system pod/etcd-124issue-etcd-3c6cd715-fzbsg 1/1 Running 0 14m
kube-system pod/etcd-124issue-etcd-3c6cd715-mdql5 1/1 Running 0 14m
kube-system pod/etcd-124issue-etcd-3c6cd715-w9vfn 1/1 Running 0 15m
kube-system pod/helm-install-rke2-calico-cmf7r 0/1 Completed 2 15m
kube-system pod/helm-install-rke2-calico-crd-tz9s5 0/1 Completed 0 15m
kube-system pod/helm-install-rke2-coredns-clg4s 0/1 Completed 0 15m
kube-system pod/helm-install-rke2-ingress-nginx-4zdcq 0/1 Completed 0 15m
kube-system pod/helm-install-rke2-metrics-server-5l9jt 0/1 Completed 0 15m
kube-system pod/kube-apiserver-124issue-cp-540bbe9a-5qrzl 1/1 Running 0 15m
kube-system pod/kube-apiserver-124issue-cp-540bbe9a-8xwzh 1/1 Running 0 15m
kube-system pod/kube-controller-manager-124issue-cp-540bbe9a-5qrzl 1/1 Running 0 15m
kube-system pod/kube-controller-manager-124issue-cp-540bbe9a-8xwzh 1/1 Running 0 15m
kube-system pod/kube-proxy-124issue-cp-540bbe9a-5qrzl 1/1 Running 0 15m
kube-system pod/kube-proxy-124issue-cp-540bbe9a-8xwzh 1/1 Running 0 15m
kube-system pod/kube-proxy-124issue-etcd-3c6cd715-fzbsg 1/1 Running 0 14m
kube-system pod/kube-proxy-124issue-etcd-3c6cd715-mdql5 1/1 Running 0 14m
kube-system pod/kube-proxy-124issue-etcd-3c6cd715-w9vfn 1/1 Running 0 14m
kube-system pod/kube-proxy-124issue-worker-d42c52f4-6k9l2 1/1 Running 0 12m
kube-system pod/kube-proxy-124issue-worker-d42c52f4-ndrn5 1/1 Running 0 12m
kube-system pod/kube-scheduler-124issue-cp-540bbe9a-5qrzl 1/1 Running 0 15m
kube-system pod/kube-scheduler-124issue-cp-540bbe9a-8xwzh 1/1 Running 0 15m
kube-system pod/rke2-coredns-rke2-coredns-58fd75f64b-mdrl5 1/1 Running 0 14m
kube-system pod/rke2-coredns-rke2-coredns-58fd75f64b-t66wz 1/1 Running 0 13m
kube-system pod/rke2-coredns-rke2-coredns-autoscaler-768bfc5985-qrkk2 1/1 Running 0 14m
kube-system pod/rke2-ingress-nginx-controller-55vpx 1/1 Running 0 11m
kube-system pod/rke2-ingress-nginx-controller-dxp5l 1/1 Running 0 11m
kube-system pod/rke2-metrics-server-74f878b999-pl6vk 1/1 Running 0 11m
tigera-operator pod/tigera-operator-7cc7df76d5-lmdmv 1/1 Running 0 14m
$ sudo ETCDCTL_API=3 etcdctl --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --endpoints https://127.0.0.1:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt get /registry/secrets/default/secret1 | hexdump -C
00000000 2f 72 65 67 69 73 74 72 79 2f 73 65 63 72 65 74 |/registry/secret|
00000010 73 2f 64 65 66 61 75 6c 74 2f 73 65 63 72 65 74 |s/default/secret|
00000020 31 0a 6b 38 73 3a 65 6e 63 3a 61 65 73 63 62 63 |1.k8s:enc:aescbc|
00000030 3a 76 31 3a 61 65 73 63 62 63 6b 65 79 3a a4 48 |:v1:aescbckey:.H|
00000040 93 da 57 d0 c2 cd ef ec 47 1f 52 4e 53 f2 92 a2 |..W.....G.RNS...|
00000050 84 82 1a 02 82 3d b5 e0 71 3c 52 b5 2d 34 62 76 |.....=..q<R.-4bv|
00000060 63 1f 24 30 c8 38 e9 0f de 79 94 f8 a5 be cc ef |c.$0.8...y......|
00000070 fd c0 0d 37 11 8c 78 93 c3 c4 72 e9 a9 bf ba 73 |...7..x...r....s|
00000080 18 22 63 87 0a 94 1a c6 f7 a0 d3 28 09 9b fa 01 |."c........(....|
00000090 ca 41 cd a1 ea 2a f1 e5 e8 13 37 e5 30 de 2f 9c |.A...*....7.0./.|
000000a0 d1 52 1b 17 d6 fb f2 f3 67 49 93 aa 5c cf 73 c2 |.R......gI..\.s.|
000000b0 bb a3 93 81 06 51 c2 d2 12 a4 54 aa 50 9d cf 63 |.....Q....T.P..c|
000000c0 63 9e 04 00 1b 4d cc 60 df 61 f7 20 2d 5f 9a 81 |c....M.`.a. -_..|
000000d0 5c 4e ce f2 6d 2f 6a 79 12 71 ee 58 4c 0a 27 3c |\N..m/jy.q.XL.'<|
000000e0 a9 ae 40 2a e0 22 02 e5 a9 8f 2b a8 e0 fe 97 21 |..@*."....+....!|
000000f0 d1 39 27 a7 b9 5b d4 ab 11 63 83 55 b9 22 13 3e |.9'..[...c.U.".>|
00000100 23 1f 2e 7c 72 78 ea 52 27 d6 29 0d 38 da dc e3 |#..|rx.R'.).8...|
00000110 41 ee c5 34 d8 99 3d f8 8b c6 22 5e 97 cf f0 79 |A..4..=..."^...y|
00000120 ee 35 b3 6c eb 24 4f f9 d0 86 ec e3 78 a8 56 ff |.5.l.$O.....x.V.|
00000130 f5 6f 11 de e8 33 4e 2e 8d 66 b6 3a bd a7 0a |.o...3N..f.:...|
0000013f
After the full flow
$ sudo ETCDCTL_API=3 etcdctl --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --endpoints https://127.0.0.1:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt get /registry/secrets/default/secret1 | hexdump -C
00000000 2f 72 65 67 69 73 74 72 79 2f 73 65 63 72 65 74 |/registry/secret|
00000010 73 2f 64 65 66 61 75 6c 74 2f 73 65 63 72 65 74 |s/default/secret|
00000020 31 0a 6b 38 73 3a 65 6e 63 3a 61 65 73 63 62 63 |1.k8s:enc:aescbc|
00000030 3a 76 31 3a 61 65 73 63 62 63 6b 65 79 2d 32 30 |:v1:aescbckey-20|
00000040 32 33 2d 30 33 2d 30 39 54 32 32 3a 30 32 3a 34 |23-03-09T22:02:4|
00000050 32 5a 3a 9d 01 08 c1 7d 5b e8 c8 ec 0c e3 ca 36 |2Z:....}[......6|
00000060 f5 8a f8 0f f2 23 2c e5 ad c1 84 db eb fc 86 fc |.....#,.........|
00000070 83 8b 2e 71 41 f0 da df b5 0c eb c9 2c 9d 96 91 |...qA.......,...|
00000080 12 09 df 1b 7d 13 4f a0 cd 1f b4 e8 79 9b 1d 2b |....}.O.....y..+|
00000090 ee 04 86 0c 9b bd cb b4 a3 32 af 4c bc 99 5f 04 |.........2.L.._.|
000000a0 4f 12 5a c7 f7 37 58 c9 3d 47 2f cb 14 a2 59 83 |O.Z..7X.=G/...Y.|
000000b0 64 e4 20 23 ab 94 94 88 52 03 37 74 0a 2f 30 65 |d. #....R.7t./0e|
000000c0 7f 8c f3 e4 3f 87 9a 2f 28 11 06 b1 ac ca 84 33 |....?../(......3|
000000d0 fe 24 a3 da 2e d3 79 d1 fd 31 9a 0c 4f eb da a5 |.$....y..1..O...|
000000e0 d8 cc 18 5b 59 9d 72 5f 76 c4 89 ec 88 0e 1a 14 |...[Y.r_v.......|
000000f0 0c 5a d1 2d 45 d9 78 b1 44 52 3d 1e d4 15 81 e3 |.Z.-E.x.DR=.....|
00000100 ee f8 49 86 88 5a 6c 68 7b 75 13 ed 0c 14 94 41 |..I..Zlh{u.....A|
00000110 ec fe 0b 74 d5 43 c2 5f 8d 2b 21 cb ec 9a 3d bb |...t.C._.+!...=.|
00000120 3f e7 ba 79 a5 50 25 2d 84 d0 71 25 5f d7 10 d7 |?..y.P%-..q%_...|
00000130 e1 31 b4 63 6c f8 d8 3d ec 90 b1 61 f4 e7 88 67 |.1.cl..=...a...g|
00000140 f3 24 eb 78 4b a1 78 b8 bb 2c ef 80 ab 96 38 79 |.$.xK.x..,....8y|
00000150 c7 dd 7c 0a |..|.|
00000154
$ kubectl get node,pod -A
NAME STATUS ROLES AGE VERSION
node/124issue-cp-540bbe9a-5qrzl Ready control-plane,master 30m v1.24.11+rke2r1
node/124issue-cp-540bbe9a-8xwzh Ready control-plane,master 30m v1.24.11+rke2r1
node/124issue-etcd-3c6cd715-fzbsg Ready etcd 29m v1.24.11+rke2r1
node/124issue-etcd-3c6cd715-mdql5 Ready etcd 29m v1.24.11+rke2r1
node/124issue-etcd-3c6cd715-w9vfn Ready etcd 29m v1.24.11+rke2r1
node/124issue-worker-d42c52f4-6k9l2 Ready worker 27m v1.24.11+rke2r1
node/124issue-worker-d42c52f4-ndrn5 Ready worker 27m v1.24.11+rke2r1
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-system pod/calico-kube-controllers-868d8d5ccb-xxrz9 1/1 Running 0 29m
calico-system pod/calico-node-4kngv 1/1 Running 0 29m
calico-system pod/calico-node-b4dqm 1/1 Running 0 27m
calico-system pod/calico-node-k27jg 1/1 Running 0 29m
calico-system pod/calico-node-kdjgf 1/1 Running 0 29m
calico-system pod/calico-node-m7cnw 1/1 Running 0 29m
calico-system pod/calico-node-vk57n 1/1 Running 0 29m
calico-system pod/calico-node-wnhws 1/1 Running 0 27m
calico-system pod/calico-typha-55f7c77b95-ljxsl 1/1 Running 0 29m
calico-system pod/calico-typha-55f7c77b95-p2m2w 1/1 Running 0 29m
calico-system pod/calico-typha-55f7c77b95-zhx2c 1/1 Running 0 29m
cattle-fleet-system pod/fleet-agent-7468ff4fb4-j285w 1/1 Running 0 26m
cattle-system pod/cattle-cluster-agent-79b6c5c88b-cjnfx 1/1 Running 0 11m
cattle-system pod/cattle-cluster-agent-79b6c5c88b-qtq8q 1/1 Running 0 10m
cattle-system pod/system-upgrade-controller-79fc9c84b7-fksz7 1/1 Running 0 26m
kube-system pod/cloud-controller-manager-124issue-cp-540bbe9a-5qrzl 1/1 Running 2 (8m29s ago) 30m
kube-system pod/cloud-controller-manager-124issue-cp-540bbe9a-8xwzh 1/1 Running 3 (10m ago) 30m
kube-system pod/cloud-controller-manager-124issue-etcd-3c6cd715-fzbsg 1/1 Running 0 29m
kube-system pod/cloud-controller-manager-124issue-etcd-3c6cd715-mdql5 1/1 Running 0 29m
kube-system pod/cloud-controller-manager-124issue-etcd-3c6cd715-w9vfn 1/1 Running 0 29m
kube-system pod/etcd-124issue-etcd-3c6cd715-fzbsg 1/1 Running 0 29m
kube-system pod/etcd-124issue-etcd-3c6cd715-mdql5 1/1 Running 0 29m
kube-system pod/etcd-124issue-etcd-3c6cd715-w9vfn 1/1 Running 0 29m
kube-system pod/helm-install-rke2-calico-cmf7r 0/1 Completed 2 29m
kube-system pod/helm-install-rke2-calico-crd-tz9s5 0/1 Completed 0 29m
kube-system pod/helm-install-rke2-coredns-clg4s 0/1 Completed 0 29m
kube-system pod/helm-install-rke2-ingress-nginx-4zdcq 0/1 Completed 0 29m
kube-system pod/helm-install-rke2-metrics-server-5l9jt 0/1 Completed 0 29m
kube-system pod/kube-apiserver-124issue-cp-540bbe9a-5qrzl 1/1 Running 3 (4m47s ago) 30m
kube-system pod/kube-apiserver-124issue-cp-540bbe9a-8xwzh 1/1 Running 3 (3m59s ago) 30m
kube-system pod/kube-controller-manager-124issue-cp-540bbe9a-5qrzl 1/1 Running 5 (4m52s ago) 30m
kube-system pod/kube-controller-manager-124issue-cp-540bbe9a-8xwzh 1/1 Running 7 (4m15s ago) 30m
kube-system pod/kube-proxy-124issue-cp-540bbe9a-5qrzl 1/1 Running 0 30m
kube-system pod/kube-proxy-124issue-cp-540bbe9a-8xwzh 1/1 Running 0 29m
kube-system pod/kube-proxy-124issue-etcd-3c6cd715-fzbsg 1/1 Running 0 29m
kube-system pod/kube-proxy-124issue-etcd-3c6cd715-mdql5 1/1 Running 3 (5m35s ago) 29m
kube-system pod/kube-proxy-124issue-etcd-3c6cd715-w9vfn 1/1 Running 2 (5m56s ago) 29m
kube-system pod/kube-proxy-124issue-worker-d42c52f4-6k9l2 1/1 Running 2 (6m56s ago) 27m
kube-system pod/kube-proxy-124issue-worker-d42c52f4-ndrn5 1/1 Running 3 (3m36s ago) 27m
kube-system pod/kube-scheduler-124issue-cp-540bbe9a-5qrzl 1/1 Running 2 (5m5s ago) 30m
kube-system pod/kube-scheduler-124issue-cp-540bbe9a-8xwzh 1/1 Running 3 (4m17s ago) 30m
kube-system pod/rke2-coredns-rke2-coredns-58fd75f64b-mdrl5 1/1 Running 0 29m
kube-system pod/rke2-coredns-rke2-coredns-58fd75f64b-t66wz 1/1 Running 0 28m
kube-system pod/rke2-coredns-rke2-coredns-autoscaler-768bfc5985-qrkk2 1/1 Running 0 29m
kube-system pod/rke2-ingress-nginx-controller-55vpx 1/1 Running 0 25m
kube-system pod/rke2-ingress-nginx-controller-dxp5l 1/1 Running 0 25m
kube-system pod/rke2-metrics-server-74f878b999-pl6vk 1/1 Running 0 26m
tigera-operator pod/tigera-operator-7cc7df76d5-lmdmv 1/1 Running 0 29m
This is a backport issue for https://github.com/rancher/rke2/issues/3801, automatically created via rancherbot by @brandond
Original issue description:
Environmental Info:
RKE2 Version: v1.25.5+rke2r2
Cluster Configuration:
1 server (rancher provisioned, but should be reproducible standalone)
Describe the bug:
Occasionally, rke2 secrets-encrypt reencrypt will fail to update a secret because the secret may be updated outside of rke2, which causes secrets encryption to fail.
Steps To Reproduce:
1. rke2 secrets-encrypt prepare
2. systemctl restart rke2-server
3. rke2 secrets-encrypt rotate
4. systemctl restart rke2-server
5. rke2 secrets-encrypt reencrypt
Expected behavior:
rke2 secrets-encrypt reencrypt completes successfully.
Actual behavior:
The following error is recorded:
Failed to reencrypted secret: Operation cannot be fulfilled on secrets "serving-cert": the object has been modified; please apply your changes to the latest version and try again
Additional context / logs:
The secret in question is the dynamiclistener serving-cert, which is likely being updated asynchronously. I've diagnosed the issue as coming from here: https://github.com/k3s-io/k3s/blob/1c17f05b8ee669ad309ad344dc443b0ae919328a/pkg/secretsencrypt/controller.go#L224-L228
I plan to test a potential fix, but need time to set up a reliably reproducible environment due to the raciness of this issue; I've only seen this issue twice out of ~25 attempts.