Open zjuliuf opened 1 month ago
@zjuliuf you would need latest k8s release to verify the fix, such as 1.28.12.
@Jainbrt Thanks a lot, I will try. By the way, which PR of k8s fixed this issue?
@Jainbrt I watched the PV's status: it changed from 'Bound' to 'Released' without ever entering 'Failed'.
is this the issue you are talking about https://github.com/kubernetes-csi/external-provisioner/issues/1184 ?
@Jainbrt Looks like the same problem. I checked the 'syncVolume' code in sig-storage-lib-external-provisioner v10.0.1 and watched the PV's status. Because the PV object changed, sig-storage-lib-external-provisioner received multiple update events, so 'syncVolume' executed multiple times. 'shouldDelete' then returned 'true' because the "external-provisioner.volume.kubernetes.io/finalizer" finalizer still existed on the PV (it was eventually removed successfully), so another volume deletion request was sent to the volume provisioner. The volume and the PV are deleted successfully in the end, but the deletion process runs 2 or more times. (This doesn't happen every time; I guess PV events sometimes get merged.)
To reduce redundant deletion attempts when 'addFinalizer' is true, could we modify the code as below? With this change, if only the DeletionTimestamp and Finalizers of the PV change, the PV is not re-added to the queue. @jsafrane @xing-yang @Jainbrt
volumeHandler := cache.ResourceEventHandlerFuncs{
	AddFunc: func(obj interface{}) { controller.enqueueVolume(obj) },
	UpdateFunc: func(oldObj, newObj interface{}) {
		if controller.addFinalizer {
			oldPv, ok := oldObj.(*v1.PersistentVolume)
			if !ok {
				return
			}
			newPv, ok := newObj.(*v1.PersistentVolume)
			if !ok {
				return
			}
			// Work on a copy: objects from the informer cache must not be mutated.
			newPv = newPv.DeepCopy()
			// Normalize the fields we want to ignore before comparing.
			newPv.ResourceVersion = oldPv.ResourceVersion
			newPv.ManagedFields = oldPv.ManagedFields
			newPv.DeletionTimestamp = oldPv.DeletionTimestamp
			newPv.DeletionGracePeriodSeconds = oldPv.DeletionGracePeriodSeconds
			// Only ignore finalizer removals; added finalizers still trigger a sync.
			if len(newPv.Finalizers) <= len(oldPv.Finalizers) {
				newPv.Finalizers = oldPv.Finalizers
			}
			if equality.Semantic.DeepEqual(oldPv, newPv) {
				klog.V(2).Infof("no changes except deletionTimestamp and finalizers, skipping enqueue")
				return
			}
		}
		controller.enqueueVolume(newObj)
	},
	DeleteFunc: func(obj interface{}) { controller.forgetVolume(obj) },
}
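The filtering idea above (normalize the fields you want to ignore, then deep-compare) can be shown in isolation. This is a self-contained sketch: pvMeta and onlyDeletionChanged are hypothetical names standing in for v1.PersistentVolume and the proposed UpdateFunc check, and reflect.DeepEqual stands in for equality.Semantic.DeepEqual.

```go
package main

import (
	"fmt"
	"reflect"
)

// pvMeta is a hypothetical, stripped-down stand-in for the PV fields the
// proposed filter cares about; the real type is v1.PersistentVolume.
type pvMeta struct {
	ResourceVersion   string
	DeletionTimestamp string
	Finalizers        []string
	Phase             string
}

// onlyDeletionChanged reports whether upd differs from old solely in the
// fields the filter normalizes away (resourceVersion, deletionTimestamp,
// and removed finalizers). Parameters are values, so mutation is local.
func onlyDeletionChanged(old, upd pvMeta) bool {
	upd.ResourceVersion = old.ResourceVersion
	upd.DeletionTimestamp = old.DeletionTimestamp
	if len(upd.Finalizers) <= len(old.Finalizers) {
		upd.Finalizers = old.Finalizers
	}
	return reflect.DeepEqual(old, upd)
}

func main() {
	old := pvMeta{
		ResourceVersion: "100",
		Finalizers:      []string{"external-provisioner.volume.kubernetes.io/finalizer"},
		Phase:           "Released",
	}

	// Update 1: only a deletionTimestamp was set -- the filter skips it.
	upd := old
	upd.ResourceVersion = "101"
	upd.DeletionTimestamp = "2024-07-01T00:00:00Z"
	fmt.Println(onlyDeletionChanged(old, upd)) // true -> do not enqueue

	// Update 2: the phase changed too -- still enqueue.
	upd.Phase = "Failed"
	fmt.Println(onlyDeletionChanged(old, upd)) // false -> enqueue
}
```

Mutating value-typed copies keeps the comparison side-effect free, which is why the real handler should DeepCopy the cached object before normalizing it.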
The same problem occurs in my production environment.
I checked the 'syncVolume' code in sig-storage-lib-external-provisioner v10.0.1 and watched the PV's status. Because the PV object changed, sig-storage-lib-external-provisioner received multiple update events, so 'syncVolume' executed multiple times, and 'shouldDelete' then returned 'true' because "external-provisioner.volume.kubernetes.io/finalizer" still existed on the PV.
What events did the provisioner receive and who / what caused them? You can either get a diff in UpdateFunc in the provisioner or get it from the API server audit logs. It looks like something is racy here.
I would prefer to see what causes these events before we start ignoring them.
What happened:
When we try to delete a volume using the new sig-storage-lib-external-provisioner v10.0.1, we still notice 2 or more volume deletion requests in the logs. As a result, deleting the PV takes more time.
What you expected to happen:
Only 1 volume deletion request during the pv deletion.
Simple analysis:
When the volume backing the PV is deleted successfully, sig-storage-lib-external-provisioner begins to delete the PV. The PV object is then updated with a deletionTimestamp, so the PV is added to the volumeQueue again and deleteVolumeOperation executes a second time because the finalizer "external-provisioner.volume.kubernetes.io/finalizer" still exists. As a result, 2 or more volume deletion requests are sent during PV deletion.
Below is the code analysis: (first deletion) 'shouldDelete' returns true ---> ctrl.provisioner.Delete deletes the volume backing the PV ---> ctrl.client.CoreV1().PersistentVolumes().Delete deletes the PV ---> the PV is updated with a deletionTimestamp ---> the finalizer "external-provisioner.volume.kubernetes.io/finalizer" is removed
(second deletion) During the first deletion, when the PV was updated with the deletionTimestamp (before the finalizer was removed), the PV was added to the volumeQueue again, and the second deletion began. Because the finalizer "external-provisioner.volume.kubernetes.io/finalizer" still existed and the PV status was Released, 'shouldDelete' still returned 'true' and ctrl.provisioner.Delete executed again.
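The race described above hinges on the same check passing for two successive snapshots of the PV. A minimal sketch of that check as a pure function (names simplified; the real logic lives in sig-storage-lib-external-provisioner's 'shouldDelete'):

```go
package main

import "fmt"

const provisionerFinalizer = "external-provisioner.volume.kubernetes.io/finalizer"

// pvState is a hypothetical snapshot of the PV fields relevant to the race.
type pvState struct {
	Phase             string // "Bound", "Released", ...
	DeletionTimestamp bool   // set once PersistentVolumes().Delete is called
	Finalizers        []string
}

// shouldDeleteSketch approximates the check described above: a Released PV
// that still carries the provisioner finalizer is considered deletable.
func shouldDeleteSketch(pv pvState) bool {
	if pv.Phase != "Released" {
		return false
	}
	for _, f := range pv.Finalizers {
		if f == provisionerFinalizer {
			return true
		}
	}
	return false
}

func main() {
	// Snapshot seen by the first sync: Released, finalizer present.
	first := pvState{Phase: "Released", Finalizers: []string{provisionerFinalizer}}
	fmt.Println(shouldDeleteSketch(first)) // true: first Delete() is issued

	// Snapshot seen after the deletionTimestamp update event arrives, but
	// before the finalizer is removed: the exact same check still passes.
	second := first
	second.DeletionTimestamp = true
	fmt.Println(shouldDeleteSketch(second)) // true: a second Delete() is issued

	// Only after the finalizer is removed does the check fail.
	second.Finalizers = nil
	fmt.Println(shouldDeleteSketch(second)) // false
}
```

Note that DeletionTimestamp plays no part in the decision, which is exactly why the update event it generates re-triggers a deletion.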
How to reproduce it:
1. Call NewProvisionController with the option AddFinalizer(true)
2. Create a StorageClass and a PVC
3. Delete the PVC
4. Watch the status of the PV and the logs of the provisioner
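For step 2, a minimal manifest pair might look like the following (the provisioner name example.com/test-provisioner and the object names are placeholders; substitute the driver under test):

```yaml
# StorageClass pointing at the provisioner under test.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: test-sc
provisioner: example.com/test-provisioner # placeholder name
reclaimPolicy: Delete
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  storageClassName: test-sc
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```

After deleting the PVC, watching the PV (e.g. with kubectl get pv -w) and the provisioner logs should show whether more than one volume deletion request is issued.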
Environment:
Driver version:
Kubernetes version (use kubectl version): v1.28
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others: sig-storage-lib-external-provisioner v10.0.1