Preisschild closed this pull request 1 year ago.
We discussed this change internally, and even though it definitely resolves a real issue, we feel it might not be the right approach:
CACPPT controls the etcd leave process, ensuring that the control plane machine leaves etcd; talosctl reset running concurrently also does an etcd leave, which might lead to some surprises.
This whole issue is a cluster-wide orchestration problem, which can be handled outside of the CAPI scope in a separate controller.
One could watch Cluster and Machine resources and, triggered on changes, do the following reconciliation:
CACPPT controls the etcd leave process, ensuring that the cp machine leaves etcd; talosctl reset running concurrently also does etcd leave, and might lead to some surprises
Fortunately, this isn't an issue. CACPPT removes the node from etcd before a deletionTimestamp is set, and thus before the reset request is sent. I've been using this for a few months now and haven't had any issues yet.
But yeah, I understand the rest. Maybe CAPI will provide a "standard" to handle bootstrap-provider specific cleanup tasks in the future.
Fortunately, this isn't an issue. CACPPT removes the node from etcd before a deletionTimestamp is set, and thus before the reset request is sent. I've been using this for a few months now and haven't had any issues yet.
This was actually in the wrong order, and we fixed it :) The fix is coming in the next release.
Another issue is the node being down/not accessible during termination... Should we wait? Should we not? Should we block machine deletion?
An external controller can do the cleanup independently of the machine state.
Fixes: https://github.com/siderolabs/cluster-api-bootstrap-provider-talos/issues/159
This feature makes use of the CAPI pre-terminate hook, which is implemented in CAPI here.
The hook simply waits until all annotations prefixed with pre-terminate.delete.hook.machine.cluster.x-k8s.io are removed before it allows the Machine to be deleted from the infrastructure provider (i.e., the VM is removed from the cloud provider).
This MR does the following:
Adds the pre-terminate.delete.hook.machine.cluster.x-k8s.io/talos-reset: cabpt-controller annotation to the machines that are being provisioned
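To make the hook mechanism concrete, here is a minimal Python sketch that models a Machine's annotations as a plain dict. The prefix, the talos-reset key and the cabpt-controller value come from the description above. Everything else is hypothetical and is not the actual CAPI or CABPT implementation: the function names, and the rule that the hook is released only after a successful reset. With that rule, a node that never resets would block deletion forever, which is exactly the open question about unreachable nodes.

```python
"""Illustrative sketch of the pre-terminate hook flow (not CAPI/CABPT code)."""

# Annotation prefix, key and value as quoted in the PR description.
PRE_TERMINATE_HOOK_PREFIX = "pre-terminate.delete.hook.machine.cluster.x-k8s.io"
TALOS_RESET_HOOK = PRE_TERMINATE_HOOK_PREFIX + "/talos-reset"
HOOK_OWNER = "cabpt-controller"

# All function names below are hypothetical, chosen for illustration only.


def has_pending_pre_terminate_hooks(annotations: dict[str, str]) -> bool:
    """True while any annotation key still starts with the hook prefix."""
    return any(key.startswith(PRE_TERMINATE_HOOK_PREFIX) for key in annotations)


def may_delete_infrastructure(annotations: dict[str, str]) -> bool:
    """The VM may only be removed once every pre-terminate hook is gone."""
    return not has_pending_pre_terminate_hooks(annotations)


def add_talos_reset_hook(annotations: dict[str, str]) -> dict[str, str]:
    """Return a copy of a provisioning Machine's annotations with the hook set."""
    updated = dict(annotations)
    updated[TALOS_RESET_HOOK] = HOOK_OWNER
    return updated


def release_talos_reset_hook(
    annotations: dict[str, str], reset_succeeded: bool
) -> dict[str, str]:
    """Return a copy without the hook, but only if the reset succeeded.

    ASSUMPTION: on failure the hook is kept, so deletion stays blocked.
    Whether to wait, give up, or block is the open design question.
    """
    updated = dict(annotations)
    if reset_succeeded:
        updated.pop(TALOS_RESET_HOOK, None)
    return updated


if __name__ == "__main__":
    machine = add_talos_reset_hook({})
    print(may_delete_infrastructure(machine))  # False: hook still present
    machine = release_talos_reset_hook(machine, reset_succeeded=True)
    print(may_delete_infrastructure(machine))  # True: VM may now be removed
```

Keeping the gate a pure function of the annotations mirrors the described behaviour: deletion of the infrastructure proceeds only once no key with the prefix remains, regardless of which controller owns the hook.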