Closed danielvegamyhre closed 1 month ago
Hi @danielvegamyhre, I'm curious about something. Foreground deletion deletes all the child resources before deleting the JobSet itself. So when !js.DeletionTimestamp.IsZero(), we need to do something, right? Like this:
// reconcile is the internal method containing the core JobSet reconciliation logic.
func (r *JobSetReconciler) reconcile(ctx context.Context, js *jobset.JobSet, updateStatusOpts *statusUpdateOpts) (ctrl.Result, error) {
	log := ctrl.LoggerFrom(ctx).WithValues("jobset", klog.KObj(js))
	ctx = ctrl.LoggerInto(ctx, log)
	// ...
	if !js.DeletionTimestamp.IsZero() {
		// do something
	}
	log.V(2).Info("Reconciling JobSet")
	// ...
	return ctrl.Result{}, nil
}
@googs1025 k8s itself should handle deleting owned resources using cascading deletion. If you can try to reproduce this issue I would appreciate it though. Create a JobSet with as many jobs as you can, then try deleting the JobSet using foreground deletion.
Actually it's ok, I just reproduced the issue myself, confirmed foreground deletion doesn't work properly.
Foreground deletion should delete all the child resources before deleting the JobSet. I tested it, and the problem is indeed reproducible:
root@VM-0-15-ubuntu:/home/ubuntu# kubectl delete jobset network-jobset --cascade=foreground --context kind-cluster1
jobset.jobset.x-k8s.io "network-jobset" deleted
root@VM-0-15-ubuntu:/home/ubuntu# kubectl get job --context kind-cluster1 -w
NAME COMPLETIONS DURATION AGE
network-jobset-leader-0 0/1 92s 92s
network-jobset-workers-0 0/2 92s 92s
network-jobset-workers-1 0/2 92s 92s
network-jobset-workers-10 0/2 92s 92s
network-jobset-workers-11 0/2 92s 92s
network-jobset-workers-12 0/2 91s 92s
network-jobset-workers-2 0/2 92s 92s
network-jobset-workers-3 0/2 91s 92s
network-jobset-workers-4 0/2 92s 92s
network-jobset-workers-5 0/2 92s 92s
network-jobset-workers-6 0/2 92s 92s
network-jobset-workers-7 0/2 92s 92s
network-jobset-workers-8 0/2 92s 92s
network-jobset-workers-9 0/2 91s 92s
network-jobset-workers-1 0/2 95s 95s
network-jobset-workers-1 0/2 0s
network-jobset-workers-1 0/2 0s
network-jobset-workers-1 0/2 0s 0s
network-jobset-workers-1 0/2 1s 1s
network-jobset-leader-0 0/1 96s 96s
network-jobset-workers-11 0/2 96s 96s
network-jobset-workers-12 0/2 95s 96s
network-jobset-workers-3 0/2 96s 97s
network-jobset-workers-0 0/2 97s 97s
network-jobset-workers-8 0/2 97s 97s
network-jobset-workers-6 0/2 97s 97s
network-jobset-workers-10 0/2 97s 97s
network-jobset-workers-5 0/2 97s 97s
network-jobset-workers-7 0/2 97s 97s
network-jobset-workers-2 0/2 97s 97s
network-jobset-workers-4 0/2 97s 97s
network-jobset-workers-9 0/2 96s 97s
network-jobset-workers-1 0/2 2s 2s
JobSets deleted with the foreground cascading deletion policy get a deletion timestamp set, and a finalizer is added to the JobSet which is only removed once all of its owned resources are deleted.
Right now, the JobSet controller isn't filtering out JobSets with a deletion timestamp set, so the controller keeps reconciling the JobSet and recreating its resources as they are being deleted.
Unfortunately, this went unnoticed until now because users have been using the default deletion policy (background cascading deletion), where the JobSet object is deleted immediately and owned resources are cleaned up asynchronously; during that cleanup there is no JobSet object left to reconcile.
However, a user deleting JobSets with foreground deletion just reported this issue, so we need to add an event filter to prevent the controller from reconciling JobSets that have a deletion timestamp set.