Bug found
The cluster cannot be deleted if the user first scales it down and then deletes it: the deletion process gets stuck.
Why
While the cluster is scaling down, a machine is deleted with a PreTerminateDeleteHookAnnotationPrefix hook annotation set.
The Machine controller sets the etcd.k3s.cattle.io/remove annotation on that node and waits for the etcd.k3s.cattle.io/removed-node-name annotation to be set.
The user then triggers deletion of the whole cluster, which deletes the other, healthy control plane machines.
The Machine controller can no longer check the etcd.k3s.cattle.io/removed-node-name annotation because the whole cluster is down. It gets stuck forever, and the machine is never deleted.
Fix
Before the Machine controller removes an etcd member, it now checks whether the cluster or control plane is already being deleted, whether the machine being removed is the last node, and whether the machine has no nodeRef. If any of these is true, it skips etcd removal.