Open philbrookes opened 2 years ago
ping @jmprusi @maleck13
@philbrookes @jmprusi could you please provide a reference to what a "soft finalizer" is?
Related, the current state machine: https://docs.google.com/drawings/d/1LyQD0qM62z6HfMRsYqNKpSaIrE4xAO3c-Yn-XoyXWe8/edit
@ncdc updated the issue with additional context on soft finalizers.
== example scenario
@philbrookes to set up a call to discuss potential implementations.
I'm interested in attending the call.
Discussed and came to a decision: the "soft finalizers" (or "sync finalizers") will be added to namespaces. Before deleting an item, the syncer's last check will be whether there are any non-empty sync finalizers related to that sync target on the object's namespace; if there are, the deletion is deferred.
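A minimal sketch of that last check, assuming namespace annotations are used to carry the sync finalizers. The annotation key format and all names here are hypothetical, not the actual KCP implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// hasSyncFinalizers reports whether the namespace carries a non-empty
// sync-finalizer entry for the given sync target. The annotation key
// prefix below is an assumption for illustration only.
func hasSyncFinalizers(nsAnnotations map[string]string, syncTarget string) bool {
	key := "finalizers.workload.kcp.dev/" + syncTarget // assumed key format
	val, ok := nsAnnotations[key]
	return ok && strings.TrimSpace(val) != ""
}

func main() {
	ns := map[string]string{
		"finalizers.workload.kcp.dev/cluster-east": "kuadrant.io/dns",
	}
	// Syncer's last check before deleting an object: hold off while a
	// non-empty sync finalizer for this sync target remains on the namespace.
	if hasSyncFinalizers(ns, "cluster-east") {
		fmt.Println("deletion blocked: sync finalizers present")
	} else {
		fmt.Println("safe to delete")
	}
}
```

Keeping the finalizers on the namespace rather than on each resource means a single check gates every object in the namespace, which matches the per-namespace scheduling unit discussed below.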
@philbrookes will look to contribute this to KCP
/transfer-issue contrib-tmc
Is your feature request related to a problem? Please describe. While investigating workload migration using the advanced scheduling introduced in 0.5.0, I found that soft finalizers (available in 0.6.0) let me schedule the removal of resources from the losing cluster, enabling a graceful migration with no downtime.
However, with further thought we've realised that the pods running from the deployment could easily rely on resources that we are unaware of (e.g. a secret, or a CR for a database operator). So although the deployment, service and ingress still migrate gracefully, the pod itself will crash when the resources it relies on are deleted ungracefully from the losing cluster.
Describe the solution you'd like An implementation of soft finalizers at the namespace level.
As the namespace is the unit of currency for the advanced scheduling feature, and all workload resources inside a rescheduled namespace need to move together, it seems the most common use case for soft finalizers will be to gracefully move ALL of the resources within a namespace to a new workload cluster. Namespace-scoped soft finalizers let us do this gracefully, instead of via the individual resources.
Describe alternatives you've considered The user could be educated to set the resources relied upon as owners of the deployment, thereby causing the workload cluster to refrain from tidying up those resources until the deployment is removed.
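The ordering that this alternative relies on can be sketched as follows. This is a toy model of the cleanup rule, not Kubernetes' actual garbage collector; the types and names are invented for illustration:

```go
package main

import "fmt"

// ref identifies a resource by kind and name (UIDs omitted for brevity).
type ref struct{ Kind, Name string }

// obj is a minimal stand-in for a synced resource with owner references.
type obj struct {
	ref
	Owners []ref // resources this object depends on, declared as owners
}

// canTidyUp reports whether r may be removed from the losing cluster:
// only once no remaining object still lists r among its owners.
func canTidyUp(r ref, remaining []obj) bool {
	for _, o := range remaining {
		for _, owner := range o.Owners {
			if owner == r {
				return false
			}
		}
	}
	return true
}

func main() {
	secret := ref{"Secret", "db-credentials"}
	deploy := obj{ref{"Deployment", "app"}, []ref{secret}}

	// While the deployment is still present, the secret must not be tidied up.
	fmt.Println(canTidyUp(secret, []obj{deploy})) // false

	// After the deployment is removed, the secret can go too.
	fmt.Println(canTidyUp(secret, nil)) // true
}
```

The drawback, as noted, is that this pushes the burden onto users to declare every dependency explicitly, whereas namespace-level soft finalizers cover unknown dependencies automatically.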
Additional context
Cluster Finalizer (Soft Finalizer): https://github.com/kcp-dev/kcp/blob/4d74a085a82affafba7f6d91818d0f0c6953e1d4/pkg/apis/workload/v1alpha1/types.go#L46-L56