kubernetes-sigs / sig-storage-lib-external-provisioner

Apache License 2.0

Delete volume requests are sent twice with new sig-storage-lib-external-provisioner v10.0.1 because of the finalizer "external-provisioner.volume.kubernetes.io/finalizer" #176

Open zjuliuf opened 1 month ago

zjuliuf commented 1 month ago

What happened:

When we try to delete a volume using the new sig-storage-lib-external-provisioner v10.0.1, we still see 2 or more volume deletion requests in the logs. As a result, deleting the PV takes more time.

What you expected to happen:

Only 1 volume deletion request during PV deletion.

Simple analysis:

After the volume backing the PV is deleted successfully, sig-storage-lib-external-provisioner deletes the PV object. The PV is then updated with a deletionTimestamp, which causes it to be added to the volumeQueue again, and deleteVolumeOperation runs a second time because the finalizer "external-provisioner.volume.kubernetes.io/finalizer" still exists. As a result, 2 or more volume deletion requests are sent during a single PV deletion.

Below is the code analysis:

(first deletion) function 'shouldDelete' returns true ---> ctrl.provisioner.Delete deletes the volume backing the PV ---> ctrl.client.CoreV1().PersistentVolumes().Delete deletes the PV ---> the PV is updated with a deletionTimestamp ---> the provisioner then tries to remove the finalizer "external-provisioner.volume.kubernetes.io/finalizer"

(second deletion) During the first deletion, when the PV is updated with the deletionTimestamp (before the finalizer is removed), the PV is added to the volumeQueue again and a second deletion pass starts. Because the finalizer "external-provisioner.volume.kubernetes.io/finalizer" still exists and the PV phase is Released, 'shouldDelete' still returns 'true' and ctrl.provisioner.Delete is executed again. The volumeQueue is fed by the event handlers below, which enqueue the PV on every update:

volumeHandler := cache.ResourceEventHandlerFuncs{
    AddFunc:    func(obj interface{}) { controller.enqueueVolume(obj) },
    UpdateFunc: func(oldObj, newObj interface{}) { controller.enqueueVolume(newObj) },
    DeleteFunc: func(obj interface{}) { controller.forgetVolume(obj) },
}
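
To make the second pass concrete, here is a small sketch of the state the re-enqueued PV is in. This is my paraphrase of the condition described above, not the library's actual shouldDelete code: the PV is Released and still carries the finalizer, so it looks deletable again even though ctrl.provisioner.Delete has already succeeded once.

// Sketch only: a paraphrase of the condition described in the analysis above,
// not the actual shouldDelete implementation from the library.
package sketch

import v1 "k8s.io/api/core/v1"

const finalizerName = "external-provisioner.volume.kubernetes.io/finalizer"

// stillLooksDeletable reports whether a re-enqueued PV is in the window
// between "deletionTimestamp set" and "finalizer removed": the PV is still
// Released and still carries the finalizer, so a second deletion pass runs.
func stillLooksDeletable(pv *v1.PersistentVolume) bool {
	hasFinalizer := false
	for _, f := range pv.Finalizers {
		if f == finalizerName {
			hasFinalizer = true
			break
		}
	}
	return hasFinalizer && pv.Status.Phase == v1.VolumeReleased
}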

How to reproduce it:

1. Call NewProvisionController with the option AddFinalizer(true)
2. Create a storageclass and a PVC
3. Delete the PVC
4. Notice the status of the PV and the logs of the provisioner
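
For completeness, a rough client-go sketch of steps 2–4. Everything here is illustrative: the kubeconfig path, namespace, PVC name, and StorageClass name are assumptions, and the provisioner is assumed to already be running with AddFinalizer(true).

// Rough reproduction sketch using client-go; names and paths are made up.
package main

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	sc := "example-sc" // hypothetical StorageClass served by the provisioner
	pvc := &v1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "repro-pvc", Namespace: "default"},
		Spec: v1.PersistentVolumeClaimSpec{
			AccessModes:      []v1.PersistentVolumeAccessMode{v1.ReadWriteOnce},
			StorageClassName: &sc,
			// Resources field type as in k8s.io/api for Kubernetes 1.28;
			// releases >= 1.29 use VolumeResourceRequirements instead.
			Resources: v1.ResourceRequirements{
				Requests: v1.ResourceList{v1.ResourceStorage: resource.MustParse("1Gi")},
			},
		},
	}

	// Step 2: create the PVC (the StorageClass is assumed to exist already).
	if _, err := client.CoreV1().PersistentVolumeClaims("default").Create(ctx, pvc, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	time.Sleep(30 * time.Second) // crude wait for provisioning; a real test would poll

	// Step 3: delete the PVC, which releases the bound PV.
	if err := client.CoreV1().PersistentVolumeClaims("default").Delete(ctx, "repro-pvc", metav1.DeleteOptions{}); err != nil {
		panic(err)
	}

	// Step 4: watch the PV status and the provisioner logs for duplicate Delete calls.
	fmt.Println("PVC deleted; watch the PV and the provisioner logs now")
}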

Environment:

Driver version:
Kubernetes version (use kubectl version): v1.28
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others: sig-storage-lib-external-provisioner v10.0.1

Jainbrt commented 1 month ago

@zjuliuf you would need the latest k8s patch release, such as 1.28.12, to verify the fix.

zjuliuf commented 1 month ago

@Jainbrt Thanks a lot, I will try. By the way, which PR of k8s fixed this issue?

Jainbrt commented 1 month ago

https://github.com/kubernetes/kubernetes/pull/125767

zjuliuf commented 1 month ago

@Jainbrt I watched the status of the PV; it changed from 'Bound' to 'Released' without ever entering 'Failed'.

Jainbrt commented 1 month ago

Is this the issue you are talking about: https://github.com/kubernetes-csi/external-provisioner/issues/1184 ?

zjuliuf commented 1 month ago

@Jainbrt Looks like the same problem. I checked the 'syncVolume' code in sig-storage-lib-external-provisioner v10.0.1 and watched the status of the PV. Because the PV object changes, sig-storage-lib-external-provisioner receives multiple update events, so 'syncVolume' executes multiple times. 'shouldDelete' then returns 'true' because "external-provisioner.volume.kubernetes.io/finalizer" still exists on the PV (the finalizer is eventually removed successfully), so another volume deletion request is sent to the volume provisioner. The volume and PV are deleted successfully in the end; the deletion process is just executed 2 or more times. (This doesn't happen every time; I guess the PV events sometimes get merged.)
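
To see these intermediate states directly, a small watch like the sketch below (illustrative, not part of the provisioner) prints each PV event with its phase, deletionTimestamp, and finalizers; the window between "deletionTimestamp set" and "finalizer removed" becomes visible this way.

// Illustrative helper, not part of the provisioner: watch PVs and print the
// fields relevant to this issue on every event.
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	w, err := client.CoreV1().PersistentVolumes().Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		pv, ok := ev.Object.(*v1.PersistentVolume)
		if !ok {
			continue
		}
		fmt.Printf("%s %s phase=%s deletionTimestamp=%v finalizers=%v\n",
			ev.Type, pv.Name, pv.Status.Phase, pv.DeletionTimestamp, pv.Finalizers)
	}
}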

zjuliuf commented 1 month ago

To reduce redundant deletion passes when 'addFinalizer' is true, can we modify the code as below? With this change, if only the DeletionTimestamp and Finalizers of the PV change, the PV is not added to the queue. @jsafrane @xing-yang @Jainbrt

volumeHandler := cache.ResourceEventHandlerFuncs{
    AddFunc:    func(obj interface{}) { controller.enqueueVolume(obj) },
    UpdateFunc: func(oldObj, newObj interface{}) {
        if controller.addFinalizer {
            oldPv, ok := oldObj.(*v1.PersistentVolume)
            if !ok {
                return
            }
            newPv, ok := newObj.(*v1.PersistentVolume)
            if !ok {
                return
            }
            // Work on a copy: the objects handed to event handlers are shared
            // with the informer cache and must not be mutated.
            newPv = newPv.DeepCopy()
            // Normalize the fields we want to ignore so that DeepEqual only
            // reports "real" changes.
            newPv.ResourceVersion = oldPv.ResourceVersion
            newPv.ManagedFields = oldPv.ManagedFields
            newPv.DeletionTimestamp = oldPv.DeletionTimestamp
            newPv.DeletionGracePeriodSeconds = oldPv.DeletionGracePeriodSeconds
            // Ignore finalizer removals as well.
            if len(newPv.Finalizers) <= len(oldPv.Finalizers) {
                newPv.Finalizers = oldPv.Finalizers
            }
            if equality.Semantic.DeepEqual(oldPv, newPv) {
                klog.V(2).Infof("skipping enqueue: no changes except deletionTimestamp and finalizers")
                return
            }
        }
        controller.enqueueVolume(newObj)
    },
    DeleteFunc: func(obj interface{}) { controller.forgetVolume(obj) },
}

Chaunceyctx commented 1 month ago

The same problem occurs in my production environment.

jsafrane commented 4 weeks ago

I checked the 'syncVolume' code in sig-storage-lib-external-provisioner v10.0.1 and watched the status of the PV. Because the PV object changes, sig-storage-lib-external-provisioner receives multiple update events, so 'syncVolume' executes multiple times. 'shouldDelete' then returns 'true' because "external-provisioner.volume.kubernetes.io/finalizer" still exists on the PV

What events did the provisioner receive and who / what caused them? You can either get a diff in UpdateFunc in the provisioner or get it from the API server audit logs. It looks like something is racy here.
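
As an aside, a throwaway handler like the sketch below could capture such a diff. It assumes github.com/google/go-cmp is available (my choice for illustration, not something the library ships) and only diffs the PV metadata and status, where the churn relevant to this issue happens.

// Debugging sketch only, not part of the provisioner: log which PV metadata
// and status fields changed on every update event, using go-cmp.
package pvdebug

import (
	"github.com/google/go-cmp/cmp"
	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
	"k8s.io/klog/v2"
)

// NewPVDiffLogger returns an event handler that can be added to the PV
// informer alongside the normal handlers to print what changed between the
// old and new object of each update event.
func NewPVDiffLogger() cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			oldPV, okOld := oldObj.(*v1.PersistentVolume)
			newPV, okNew := newObj.(*v1.PersistentVolume)
			if !okOld || !okNew {
				return
			}
			if d := cmp.Diff(oldPV.ObjectMeta, newPV.ObjectMeta); d != "" {
				klog.Infof("PV %s metadata changed (-old +new):\n%s", newPV.Name, d)
			}
			if d := cmp.Diff(oldPV.Status, newPV.Status); d != "" {
				klog.Infof("PV %s status changed (-old +new):\n%s", newPV.Name, d)
			}
		},
	}
}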

I would prefer to see what causes these events before we start ignoring them.