Open brandond opened 1 year ago
I am unable to reproduce this. I followed the above procedure and I see that the local snapshots show the new node.
root@k3s-server-3:/# cat /etc/rancher/k3s/config.yaml
node-name: k3s-server-4
root@k3s-server-3:/# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3s-server-1 Ready control-plane,etcd,master 9m42s v1.28.2+k3s-3db1d332 172.17.0.3 <none> Ubuntu 22.04.3 LTS 5.19.0-1019-aws containerd://1.7.7-k3s1
k3s-server-2 Ready control-plane,etcd,master 9m26s v1.28.2+k3s-3db1d332 172.17.0.4 <none> Ubuntu 22.04.3 LTS 5.19.0-1019-aws containerd://1.7.7-k3s1
k3s-server-4 Ready control-plane,etcd,master 4m56s v1.28.2+k3s-3db1d332 172.17.0.5 <none> Ubuntu 22.04.3 LTS 5.19.0-1019-aws containerd://1.7.7-k3s1
root@k3s-server-3:/# kubectl get etcdsnapshotfile
NAME SNAPSHOTNAME NODE LOCATION SIZE CREATIONTIME
local-on-demand-k3s-server-3-1697657808-80c501 on-demand-k3s-server-3-1697657808 k3s-server-4 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697657808 2355232 2023-10-18T19:36:48Z
local-on-demand-k3s-server-3-1697657865-732f1f on-demand-k3s-server-3-1697657865 k3s-server-4 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697657865 2699296 2023-10-18T19:37:45Z
local-on-demand-k3s-server-3-1697658041-5cd74f on-demand-k3s-server-3-1697658041 k3s-server-4 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697658041 3354656 2023-10-18T19:40:41Z
local-on-demand-k3s-server-3-1697658179-86ca69 on-demand-k3s-server-3-1697658179 k3s-server-4 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697658179 3727392 2023-10-18T19:42:59Z
local-on-demand-k3s-server-4-1697658218-0b3ff6 on-demand-k3s-server-4-1697658218 k3s-server-4 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-4-1697658218 3854368 2023-10-18T19:43:38Z
I rejoined the node a second time and see everything getting updated properly then as well:
root@k3s-server-3:/# cat /etc/rancher/k3s/config.yaml
node-name: k3s-server-5
root@k3s-server-3:/# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3s-server-1 Ready control-plane,etcd,master 16m v1.28.2+k3s-3db1d332 172.17.0.3 <none> Ubuntu 22.04.3 LTS 5.19.0-1019-aws containerd://1.7.7-k3s1
k3s-server-2 Ready control-plane,etcd,master 16m v1.28.2+k3s-3db1d332 172.17.0.4 <none> Ubuntu 22.04.3 LTS 5.19.0-1019-aws containerd://1.7.7-k3s1
k3s-server-5 Ready control-plane,etcd,master 61s v1.28.2+k3s-3db1d332 172.17.0.5 <none> Ubuntu 22.04.3 LTS 5.19.0-1019-aws containerd://1.7.7-k3s1
root@k3s-server-3:/# k3s etcd-snapshot list
Name Location Size Created
on-demand-k3s-server-3-1697657808 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697657808 2355232 2023-10-18T19:36:48Z
on-demand-k3s-server-3-1697657865 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697657865 2699296 2023-10-18T19:37:45Z
on-demand-k3s-server-3-1697658041 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697658041 3354656 2023-10-18T19:40:41Z
on-demand-k3s-server-3-1697658179 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697658179 3727392 2023-10-18T19:42:59Z
on-demand-k3s-server-4-1697658218 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-4-1697658218 3854368 2023-10-18T19:43:38Z
on-demand-k3s-server-5-1697658713 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-5-1697658713 2588704 2023-10-18T19:51:53Z
root@k3s-server-3:/# kubectl get etcdsnapshotfile
NAME SNAPSHOTNAME NODE LOCATION SIZE CREATIONTIME
local-on-demand-k3s-server-3-1697657808-aed2c6 on-demand-k3s-server-3-1697657808 k3s-server-5 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697657808 2355232 2023-10-18T19:36:48Z
local-on-demand-k3s-server-3-1697657865-d882e4 on-demand-k3s-server-3-1697657865 k3s-server-5 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697657865 2699296 2023-10-18T19:37:45Z
local-on-demand-k3s-server-3-1697658041-19ec7a on-demand-k3s-server-3-1697658041 k3s-server-5 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697658041 3354656 2023-10-18T19:40:41Z
local-on-demand-k3s-server-3-1697658179-4ed454 on-demand-k3s-server-3-1697658179 k3s-server-5 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-3-1697658179 3727392 2023-10-18T19:42:59Z
local-on-demand-k3s-server-4-1697658218-99c6ff on-demand-k3s-server-4-1697658218 k3s-server-5 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-4-1697658218 3854368 2023-10-18T19:43:38Z
local-on-demand-k3s-server-5-1697658713-7e1f33 on-demand-k3s-server-5-1697658713 k3s-server-5 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-k3s-server-5-1697658713 2588704 2023-10-18T19:51:53Z
Environment: Ubuntu 22.04, HA setup with 3 servers and 1 agent.
Recording the test results I shared offline with Brad. P.S.: I have two setups; one works fine and the other shows this issue.
1) Config file:
$ cat /etc/rancher/k3s/config.yaml
token: secret
node-name: "server1"
etcd-snapshot-retention: 2
etcd-snapshot-schedule-cron: "* * * * *"
etcd-s3: true
etcd-s3-access-key: xxxx
etcd-s3-secret-key: xxxx
etcd-s3-bucket: bucket
etcd-s3-folder: folder
etcd-s3-region: us-east-2
cluster-init: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server
2) Install k3s :
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_COMMIT='3db1d33282765b8fad8ff0a5ec763a4d2487ee9f' sh -s - server
3) The cron takes a snapshot every minute. Sleep for 2 to 3 minutes, then append a first suffix to the node name on all 4 nodes and restart the services. Ex: server1-8581
4) Do step 3 once more, adding another suffix to the node name. Ex: server1-17695-8581
5) Save etcd snapshots on demand (5 snapshots), prune with a retention of 3, and delete 1 snapshot.
sudo k3s etcd-snapshot save
sudo k3s etcd-snapshot prune --snapshot-retention 3
sudo k3s etcd-snapshot delete <on demand snapshot>
Outputs:
$ sudo k3s etcd-snapshot save --debug
WARN[0000] Unknown flag --token found in config.yaml, skipping
WARN[0000] Unknown flag --etcd-snapshot-retention found in config.yaml, skipping
WARN[0000] Unknown flag --etcd-snapshot-schedule-cron found in config.yaml, skipping
WARN[0000] Unknown flag --cluster-init found in config.yaml, skipping
WARN[0000] Unknown flag --write-kubeconfig-mode found in config.yaml, skipping
WARN[0000] Unknown flag --node-external-ip found in config.yaml, skipping
WARN[0000] Unknown flag --node-label found in config.yaml, skipping
WARN[0000] Unknown flag --server found in config.yaml, skipping
DEBU[0000] Attempting to retrieve extra metadata from k3s-etcd-snapshot-extra-metadata ConfigMap
DEBU[0000] Error encountered attempting to retrieve extra metadata from k3s-etcd-snapshot-extra-metadata ConfigMap, error: configmaps "k3s-etcd-snapshot-extra-metadata" not found
INFO[0000] Saving etcd snapshot to /var/lib/rancher/k3s/server/db/snapshots/on-demand-server1-17695-8581-1697670509
{"level":"info","ts":"2023-10-18T23:08:29.188729Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-server1-17695-8581-1697670509.part"}
{"level":"info","ts":"2023-10-18T23:08:29.191305Z","logger":"client","caller":"v3@v3.5.9-k3s1/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2023-10-18T23:08:29.191368Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":"2023-10-18T23:08:29.33019Z","logger":"client","caller":"v3@v3.5.9-k3s1/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2023-10-18T23:08:29.367338Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"12 MB","took":"now"}
{"level":"info","ts":"2023-10-18T23:08:29.367585Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/lib/rancher/k3s/server/db/snapshots/on-demand-server1-17695-8581-1697670509"}
INFO[0000] Checking if S3 bucket sonobuoy-results exists
INFO[0000] S3 bucket sonobuoy-results exists
INFO[0000] Saving etcd snapshot on-demand-server1-17695-8581-1697670509 to S3
INFO[0000] Uploading snapshot to s3://sonobuoy-results//var/lib/rancher/k3s/server/db/snapshots/on-demand-server1-17695-8581-1697670509
INFO[0000] Uploaded snapshot metadata s3://sonobuoy-results//var/lib/rancher/k3s/server/db/.metadata/on-demand-server1-17695-8581-1697670509
INFO[0000] S3 upload complete for on-demand-server1-17695-8581-1697670509
INFO[0000] Reconciling ETCDSnapshotFile resources
DEBU[0000] Found snapshotFile for etcd-snapshot-server1-17695-8581-1697670485 with key local-etcd-snapshot-server1-17695-8581-1697670485
DEBU[0000] Found snapshotFile for on-demand-server1-17695-8581-1697667654 with key local-on-demand-server1-17695-8581-1697667654
DEBU[0000] Found snapshotFile for on-demand-server1-17695-8581-1697667659 with key local-on-demand-server1-17695-8581-1697667659
DEBU[0000] Found snapshotFile for on-demand-server1-17695-8581-1697670509 with key local-on-demand-server1-17695-8581-1697670509
DEBU[0000] Found snapshotFile for etcd-snapshot-server1-17695-8581-1697670423 with key s3-etcd-snapshot-server1-17695-8581-1697670423
DEBU[0000] Found snapshotFile for etcd-snapshot-server1-17695-8581-1697670423 with key local-etcd-snapshot-server1-17695-8581-1697670423
DEBU[0000] Found snapshotFile for etcd-snapshot-server1-17695-8581-1697670485 with key s3-etcd-snapshot-server1-17695-8581-1697670485
DEBU[0000] Found snapshotFile for on-demand-server1-17695-8581-1697667654 with key s3-on-demand-server1-17695-8581-1697667654
DEBU[0000] Found snapshotFile for on-demand-server1-17695-8581-1697667659 with key s3-on-demand-server1-17695-8581-1697667659
DEBU[0000] Found snapshotFile for on-demand-server1-17695-8581-1697670509 with key s3-on-demand-server1-17695-8581-1697670509
DEBU[0000] Found ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697667303 with key local-etcd-snapshot-server1-17695-8581-1697667303
DEBU[0000] Key local-etcd-snapshot-server1-17695-8581-1697667303 not found in snapshotFile list
INFO[0000] Deleting ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697667303
DEBU[0000] Found ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697667362 with key local-etcd-snapshot-server1-17695-8581-1697667362
DEBU[0000] Key local-etcd-snapshot-server1-17695-8581-1697667362 not found in snapshotFile list
INFO[0000] Deleting ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697667362
DEBU[0000] Found ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697670423 with key local-etcd-snapshot-server1-17695-8581-1697670423
DEBU[0000] Found ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697670485 with key local-etcd-snapshot-server1-17695-8581-1697670485
DEBU[0000] Found ETCDSnapshotFile for on-demand-server1-17695-8581-1697667654 with key local-on-demand-server1-17695-8581-1697667654
DEBU[0000] Found ETCDSnapshotFile for on-demand-server1-17695-8581-1697667659 with key local-on-demand-server1-17695-8581-1697667659
DEBU[0000] Found ETCDSnapshotFile for on-demand-server1-17695-8581-1697670509 with key local-on-demand-server1-17695-8581-1697670509
DEBU[0000] Found ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697667303 with key s3-etcd-snapshot-server1-17695-8581-1697667303
DEBU[0000] Key s3-etcd-snapshot-server1-17695-8581-1697667303 not found in snapshotFile list
INFO[0000] Deleting ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697667303
DEBU[0000] Found ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697667362 with key s3-etcd-snapshot-server1-17695-8581-1697667362
DEBU[0000] Key s3-etcd-snapshot-server1-17695-8581-1697667362 not found in snapshotFile list
INFO[0000] Deleting ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697667362
DEBU[0000] Found ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697670423 with key s3-etcd-snapshot-server1-17695-8581-1697670423
DEBU[0000] Found ETCDSnapshotFile for etcd-snapshot-server1-17695-8581-1697670485 with key s3-etcd-snapshot-server1-17695-8581-1697670485
DEBU[0000] Found ETCDSnapshotFile for on-demand-server1-17695-8581-1697667654 with key s3-on-demand-server1-17695-8581-1697667654
DEBU[0000] Found ETCDSnapshotFile for on-demand-server1-17695-8581-1697667659 with key s3-on-demand-server1-17695-8581-1697667659
DEBU[0000] Found ETCDSnapshotFile for on-demand-server1-17695-8581-1697670509 with key s3-on-demand-server1-17695-8581-1697670509
INFO[0000] Reconciliation of ETCDSnapshotFile resources complete
Note the node name "server1-8581" recorded in the output below (this is the previous node name):
kubectl get etcdsnapshotfile
NAME SNAPSHOTNAME NODE LOCATION SIZE CREATIONTIME
local-etcd-snapshot-server1-17695-8581-1697667303-96f01c etcd-snapshot-server1-17695-8581-1697667303 server1-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-server1-17695-8581-1697667303 7155744 2023-10-18T22:15:03Z
local-etcd-snapshot-server1-17695-8581-1697667362-2244bc etcd-snapshot-server1-17695-8581-1697667362 server1-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-server1-17695-8581-1697667362 7557152 2023-10-18T22:16:02Z
local-etcd-snapshot-server1-17695-8581-1697670423-aa7833 etcd-snapshot-server1-17695-8581-1697670423 server1-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-server1-17695-8581-1697670423 12505120 2023-10-18T23:07:03Z
local-etcd-snapshot-server1-17695-8581-1697670485-2a4faf etcd-snapshot-server1-17695-8581-1697670485 server1-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-server1-17695-8581-1697670485 12505120 2023-10-18T23:08:05Z
local-etcd-snapshot-server1-8581-1697667062-1f9348 etcd-snapshot-server1-8581-1697667062 server1-8581 file:///var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-server1-8581-1697667062 5410848 2023-10-18T22:11:02Z
local-etcd-snapshot-server1-8581-1697667243-888329 etcd-snapshot-server1-8581-1697667243 server1-8581 file:///var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-server1-8581-1697667243 6762528 2023-10-18T22:14:03Z
local-on-demand-server1-17695-8581-1697667654-951f0b on-demand-server1-17695-8581-1697667654 server1-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-server1-17695-8581-1697667654 10063904 2023-10-18T22:20:54Z
local-on-demand-server1-17695-8581-1697667659-36d75d on-demand-server1-17695-8581-1697667659 server1-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-server1-17695-8581-1697667659 10113056 2023-10-18T22:20:59Z
local-on-demand-server1-17695-8581-1697670509-f1c5c5 on-demand-server1-17695-8581-1697670509 server1-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-server1-17695-8581-1697670509 12505120 2023-10-18T23:08:29Z
local-on-demand-server2-17695-8581-1697667662-7ca7eb on-demand-server2-17695-8581-1697667662 server2-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-server2-17695-8581-1697667662 10231840 2023-10-18T22:21:02Z
local-on-demand-server2-17695-8581-1697669375-678e95 on-demand-server2-17695-8581-1697669375 server2-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-server2-17695-8581-1697669375 12689440 2023-10-18T22:49:35Z
local-on-demand-server3-17695-8581-1697667640-fd4d00 on-demand-server3-17695-8581-1697667640 server3-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-server3-17695-8581-1697667640 9834528 2023-10-18T22:20:40Z
local-on-demand-server3-17695-8581-1697667646-82845d on-demand-server3-17695-8581-1697667646 server3-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-server3-17695-8581-1697667646 10002464 2023-10-18T22:20:46Z
local-on-demand-server3-17695-8581-1697667652-a77e6d on-demand-server3-17695-8581-1697667652 server3-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-server3-17695-8581-1697667652 10068k 2023-10-18T22:20:52Z
local-on-demand-server3-17695-8581-1697667658-2cf5ad on-demand-server3-17695-8581-1697667658 server3-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-server3-17695-8581-1697667658 10186784 2023-10-18T22:20:58Z
local-on-demand-server3-17695-8581-1697667663-0a4f57 on-demand-server3-17695-8581-1697667663 server3-17695-8581 file:///var/lib/rancher/k3s/server/db/snapshots/on-demand-server3-17695-8581-1697667663 10235936 2023-10-18T22:21:03Z
s3-etcd-snapshot-server1-17695-8581-1697667303-c76d87 etcd-snapshot-server1-17695-8581-1697667303 server1-17695-8581 s3://sonobuoy-results/arch-k3ssnap/commit-setup/server1-17695-8581/etcd-snapshot-server1-17695-8581-1697667303 7155744 2023-10-18T22:15:03Z
s3-etcd-snapshot-server1-17695-8581-1697667362-9303ba etcd-snapshot-server1-17695-8581-1697667362 server1-17695-8581 s3://sonobuoy-results/arch-k3ssnap/commit-setup/server1-17695-8581/etcd-snapshot-server1-17695-8581-1697667362 7557152 2023-10-18T22:16:02Z
s3-etcd-snapshot-server1-17695-8581-1697670423-70d6fd etcd-snapshot-server1-17695-8581-1697670423 server1-17695-8581 s3://sonobuoy-results/arch-k3ssnap/commit-setup/server1-17695-8581/etcd-snapshot-server1-17695-8581-1697670423 12505120 2023-10-18T23:07:03Z
s3-etcd-snapshot-server1-17695-8581-1697670485-db03ba etcd-snapshot-server1-17695-8581-1697670485 server1-17695-8581 s3://sonobuoy-results/arch-k3ssnap/commit-setup/server1-17695-8581/etcd-snapshot-server1-17695-8581-1697670485 12505120 2023-10-18T23:08:05Z
s3-on-demand-server1-17695-8581-1697667654-098090 on-demand-server1-17695-8581-1697667654 server1-17695-8581 s3://sonobuoy-results/arch-k3ssnap/commit-setup/server1-17695-8581/on-demand-server1-17695-8581-1697667654 10063904 2023-10-18T22:20:54Z
s3-on-demand-server1-17695-8581-1697667659-737d88 on-demand-server1-17695-8581-1697667659 server1-17695-8581 s3://sonobuoy-results/arch-k3ssnap/commit-setup/server1-17695-8581/on-demand-server1-17695-8581-1697667659 10113056 2023-10-18T22:20:59Z
s3-on-demand-server1-17695-8581-1697670509-8e79a8 on-demand-server1-17695-8581-1697670509 server1-17695-8581 s3://sonobuoy-results/arch-k3ssnap/commit-setup/server1-17695-8581/on-demand-server1-17695-8581-1697670509 12505120 2023-10-18T23:08:29Z
Current node names:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
agent1-17695-8581 Ready <none> 74m v1.28.2+k3s-3db1d332
server1-17695-8581 Ready control-plane,etcd,master 77m v1.28.2+k3s-3db1d332
server2-17695-8581 Ready control-plane,etcd,master 76m v1.28.2+k3s-3db1d332
server3-17695-8581 Ready control-plane,etcd,master 74m v1.28.2+k3s-3db1d332
sudo ls -lrt /var/lib/rancher/k3s/server/db/snapshots/
total 56360
-rw------- 1 root root 10063904 Oct 18 22:20 on-demand-server1-17695-8581-1697667654
-rw------- 1 root root 10113056 Oct 18 22:20 on-demand-server1-17695-8581-1697667659
-rw------- 1 root root 12505120 Oct 18 23:08 on-demand-server1-17695-8581-1697670509
-rw------- 1 root root 12505120 Oct 18 23:13 etcd-snapshot-server1-17695-8581-1697670783
-rw------- 1 root root 12505120 Oct 18 23:14 etcd-snapshot-server1-17695-8581-1697670842
It appears that cleanup of snapshots from deleted nodes is working as designed. However, an etcdsnapshotfile resource can be left with a stuck finalizer if the snapshot controller is running on the node that is deleted.
In order to avoid this, the node should be stopped for a short period of time (at least a minute to be safe) before being deleted, so that leader-elected controllers can migrate to other nodes.
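The ordering described above could be sketched roughly as follows. This is only an illustration, not part of k3s: `run` and `delete_stopped_node` are hypothetical helper names, the 60-second default comes from the "at least a minute" suggestion, and the sketch assumes k3s has already been stopped on the node being removed and that this runs from a different server with a working kubectl.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: wait for leader-elected controllers to move away from
# the stopped node, then delete its Node object.
# Assumes k3s was already stopped on NODE before this is called.
# Usage: delete_stopped_node NODE [GRACE_SECONDS]
delete_stopped_node() {
    node="$1"
    grace_seconds="${2:-60}"
    run sleep "$grace_seconds"
    run kubectl delete node "$node"
}
```

Setting `DRY_RUN=1` before calling `delete_stopped_node server1-8581` prints the two commands without running them, which is a cheap way to check the node name first.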
root@ip-172-31-26-200:~# kubectl get node -l node-role.kubernetes.io/etcd=true
NAME STATUS ROLES AGE VERSION
server1-17695-8581 Ready control-plane,etcd,master 95m v1.28.2+k3s-3db1d332
server2-17695-8581 Ready control-plane,etcd,master 94m v1.28.2+k3s-3db1d332
server3-17695-8581 Ready control-plane,etcd,master 93m v1.28.2+k3s-3db1d332
root@ip-172-31-26-200:~# kubectl get etcdsnapshotfile -l 'etcd.k3s.cattle.io/snapshot-storage-node notin (s3,server1-17695-8581,server2-17695-8581,server3-17695-8581)'
NAME SNAPSHOTNAME NODE LOCATION SIZE CREATIONTIME
local-etcd-snapshot-server1-8581-1697667062-1f9348 etcd-snapshot-server1-8581-1697667062 server1-8581 file:///var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-server1-8581-1697667062 5410848 2023-10-18T22:11:02Z
local-etcd-snapshot-server1-8581-1697667243-888329 etcd-snapshot-server1-8581-1697667243 server1-8581 file:///var/lib/rancher/k3s/server/db/snapshots/etcd-snapshot-server1-8581-1697667243 6762528 2023-10-18T22:14:03Z
root@ip-172-31-26-200:~# kubectl get etcdsnapshotfile local-etcd-snapshot-server1-8581-1697667062-1f9348 -o yaml | grep -C1 -E 'deletionTimestamp|finalizers'
deletionGracePeriodSeconds: 0
deletionTimestamp: "2023-10-18T22:20:04Z"
finalizers:
- wrangler.cattle.io/managed-etcd-snapshots-controller
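To check every snapshot resource at once rather than one at a time, something like the sketch below might help. `pending_deletion` is a hypothetical helper name; it relies on kubectl's custom-columns output printing `<none>` for an unset field. A resource that is pending deletion is not necessarily stuck, so treat the result as a list of candidates to inspect.

```shell
# Hypothetical helper: read "NAME DELETION_TIMESTAMP" rows on stdin and print
# the names of resources that have a deletionTimestamp set.
# kubectl's custom-columns output shows "<none>" when the field is absent.
pending_deletion() {
    awk '$2 != "" && $2 != "<none>" { print $1 }'
}

# Usage on a live cluster (not executed here):
# kubectl get etcdsnapshotfile --no-headers \
#   -o custom-columns=NAME:.metadata.name,DELETED:.metadata.deletionTimestamp \
#   | pending_deletion
```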
This is a bit of a known issue with wrangler OnDelete handlers; in order to fix it we would need to add code to remove the stuck finalizer.
As a workaround until this can be implemented, the following command can be run to manually clear the finalizers on any snapshots for deleted nodes:
for ESF in $(kubectl get etcdsnapshotfile -o=go-template --template '{{range .items}}{{.metadata.name}} {{end}}' -l 'etcd.k3s.cattle.io/snapshot-storage-node notin (s3,'$(kubectl get node -l node-role.kubernetes.io/etcd=true -o=go-template --template '{{range .items}}{{.metadata.name}},{{end}}')')'); do
kubectl patch etcdsnapshotfile $ESF -p '{"metadata":{"finalizers":null}}' --type=merge;
done
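Before patching anything, it may be worth previewing which resources the selector matches. The sketch below builds the same `notin` selector from a list of node names so it can be inspected on its own; `stale_snapshot_selector` is a hypothetical helper name, and the label key is the one used in the commands above.

```shell
# Hypothetical helper: build the label selector that matches snapshots stored
# on nodes other than the ones given (and other than s3).
# Usage: stale_snapshot_selector NODE...
stale_snapshot_selector() {
    nodes="s3"
    for n in "$@"; do
        nodes="$nodes,$n"
    done
    printf 'etcd.k3s.cattle.io/snapshot-storage-node notin (%s)\n' "$nodes"
}

# Usage on a live cluster (not executed here); lists matches without patching:
# selector="$(stale_snapshot_selector $(kubectl get node \
#   -l node-role.kubernetes.io/etcd=true -o name | sed 's|^node/||'))"
# kubectl get etcdsnapshotfile -l "$selector"
```

If the listed resources are only those belonging to deleted nodes, the patch loop above can then be run against the same selector.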
Hey @brandond is this a confirmed issue you're actively working on, or should it be up for grabs?
I mostly left it here just to document it as a possible problem further down the road, and demonstrate the steps to fix it. If it does become a problem and/or someone wants to fix it, it's up for grabs.
from @aganesh-suse
Steps to reproduce:
kubectl
kubectl get etcdsnapshotfile
Note that the snapshot taken before the node-name was changed is not updated to reflect the fact that it is on the new node. It still appears to be on the deleted node.