andyzhangx opened this issue 4 months ago
Team, do we have any debug points on this issue? We are also hitting this in our tests with the 5.0.1 provisioner image. @carlory @jsafrane
Thanks for the reminder; I wasn't aware of this issue. Is it easy to reproduce? It could have been a blocker for promoting this feature to beta, but unfortunately we have already promoted it to beta and enabled it by default. cc @xing-yang @deepakkinni
I will investigate this issue next week. Can you provide reproducible steps or a simple test case?
@carlory I'm not sure whether this is related to https://github.com/kubernetes-csi/external-provisioner/issues/1235, but that issue has more details.
If we have a pre-provisioned PV bound to a PVC and the PVC gets deleted, the PV first moves to the Failed state (during PVC deletion) and then gets stuck in Terminating (during PV deletion).
Below are the provisioner logs:
I0620 10:50:11.805561 1 controller.go:1132] "shouldDelete" PV="pv-fbeqn"
I0620 10:50:11.805575 1 controller.go:1548] "Started" PV="pv-fbeqn"
I0620 10:50:11.805665 1 connection.go:264] "GRPC call" method="/csi.v1.Controller/DeleteVolume" request="{\"volume_id\":\"453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-rgupwoxrg\"}"
I0620 10:50:12.894422 1 leaderelection.go:281] successfully renewed lease ibm-spectrum-scale-csi-driver/spectrumscale-csi-ibm-com
I0620 10:50:21.453285 1 connection.go:270] "GRPC response" response="{}" err=null
I0620 10:50:21.453335 1 controller.go:1563] "Volume deleted" PV="pv-fbeqn"
I0620 10:50:21.528417 1 controller.go:1607] "PersistentVolume deleted succeeded" PV="pv-fbeqn"
I0620 10:50:21.528709 1 controller.go:1171] "handleProtectionFinalizer" PV="&PersistentVolume{ObjectMeta:{pv-fbeqn 2d464b99-70d9-4001-b4bd-058d4a51dcb4 19599924 0 2024-06-20 10:47:32 +0000 UTC 2024-06-20 10:50:21 +0000 UTC 0xc0009efa80 map[author:kubetest] map[pv.kubernetes.io/bound-by-controller:yes] [] [external-provisioner.volume.kubernetes.io/finalizer] [{OpenAPI-Generator Update v1 2024-06-20 10:47:32 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:labels\":{\".\":{},\"f:author\":{}}},\"f:spec\":{\"f:accessModes\":{},\"f:capacity\":{\".\":{},\"f:storage\":{}},\"f:csi\":{\".\":{},\"f:driver\":{},\"f:volumeHandle\":{}},\"f:persistentVolumeReclaimPolicy\":{},\"f:storageClassName\":{},\"f:volumeMode\":{}}} } {csi-provisioner Update v1 2024-06-20 10:47:32 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:finalizers\":{\"v:\\\"external-provisioner.volume.kubernetes.io/finalizer\\\"\":{}}}} } {kube-controller-manager Update v1 2024-06-20 10:47:32 +0000 UTC FieldsV1 {\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:pv.kubernetes.io/bound-by-controller\":{}}},\"f:spec\":{\"f:claimRef\":{\".\":{},\"f:apiVersion\":{},\"f:kind\":{},\"f:name\":{},\"f:namespace\":{},\"f:resourceVersion\":{},\"f:uid\":{}}}} } {kube-controller-manager Update v1 2024-06-20 10:50:11 +0000 UTC FieldsV1 {\"f:status\":{\"f:message\":{},\"f:phase\":{}}} status}]},Spec:PersistentVolumeSpec{Capacity:ResourceList{storage: {{8589934592 0} {<nil>} 
BinarySI},},PersistentVolumeSource:PersistentVolumeSource{GCEPersistentDisk:nil,AWSElasticBlockStore:nil,HostPath:nil,Glusterfs:nil,NFS:nil,RBD:nil,ISCSI:nil,Cinder:nil,CephFS:nil,FC:nil,Flocker:nil,FlexVolume:nil,AzureFile:nil,VsphereVolume:nil,Quobyte:nil,AzureDisk:nil,PhotonPersistentDisk:nil,PortworxVolume:nil,ScaleIO:nil,Local:nil,StorageOS:nil,CSI:&CSIPersistentVolumeSource{Driver:spectrumscale.csi.ibm.com,VolumeHandle:453210684566980716;34690B0A:6606C9E1;path=/ibm/fs1/dir-rgupwoxrg,ReadOnly:false,FSType:,VolumeAttributes:map[string]string{},ControllerPublishSecretRef:nil,NodeStageSecretRef:nil,NodePublishSecretRef:nil,ControllerExpandSecretRef:nil,NodeExpandSecretRef:nil,},},AccessModes:[ReadWriteMany],ClaimRef:&ObjectReference{Kind:PersistentVolumeClaim,Namespace:test-driver-static-sc-14-1718880431,Name:pvc-kqdmmbm,UID:f3821261-6e7a-4c91-b3e5-59871d7dfc2a,APIVersion:v1,ResourceVersion:19599399,FieldPath:,},PersistentVolumeReclaimPolicy:Delete,StorageClassName:sc-wdijv,MountOptions:[],VolumeMode:*Filesystem,NodeAffinity:nil,VolumeAttributesClassName:nil,},Status:PersistentVolumeStatus{Phase:Failed,Message:error getting deleter volume plugin for volume \"pv-fbeqn\": no deletable volume plugin matched,Reason:,LastPhaseTransitionTime:2024-06-20 10:50:11 +0000 UTC,},}"
@andyzhangx, there have been changes to use Patch; hopefully you shouldn't see this error with 5.0.1. Can you try it again?
@deepakkinni we are using the patch verb but still seeing the above error.
Could you please post both the attacher and provisioner logs? Post the whole thing instead of just snippets.
Hi @deepakkinni, @shrutinipane has put all the logs here: https://github.com/kubernetes-csi/external-provisioner/issues/1235#issuecomment-2185804406
@Jainbrt, the attached logs don't show the issue mentioned by the reporter.
@andyzhangx, could you try to reproduce this issue on the latest external-provisioner?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
What happened:
The PV is stuck in Terminating due to a race condition: csi-attacher removes its finalizer while csi-provisioner tries to remove a different finalizer.

Symptom: the PV is stuck in Terminating when the PVC is deleted with the HonorPVReclaimPolicy feature gate enabled.

Process: csi-attacher removes its finalizer (e.g. external-attacher/disk-csi-azure-com) when the PV is detached. Later, csi-provisioner tries to remove the external-provisioner.volume.kubernetes.io/finalizer finalizer when the PVC is deleted. Since the PV object in the provisioner's cache is stale, the finalizer removal always fails until the maximum of 6 retries is exceeded, finally leaving the PV in the Terminating state forever (the underlying storage is deleted before the finalizer removal fails).

/kind bug
cc @jsafrane
What you expected to happen:
How to reproduce it:
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): 1.27
- Kernel (e.g. uname -a):