kubernetes-csi / external-snapshotter

Sidecar container that watches Kubernetes Snapshot CRD objects and triggers CreateSnapshot/DeleteSnapshot against a CSI endpoint.
Apache License 2.0
502 stars 375 forks source link

Annotation "groupsnapshot.storage.kubernetes.io/volumegroupsnapshot-being-deleted" is missing from VolumeGroupSnapshotContent #1212

Closed xing-yang closed 4 hours ago

xing-yang commented 3 days ago

What happened: When trying to delete a VolumeGroupSnapshotContent with Retain Deletion Policy after its VolumeGroupSnapshot is already deleted, I noticed that it is hanging because there is still finalizer on it.

We need to add the annotation groupsnapshot.storage.kubernetes.io/volumegroupsnapshot-being-deleted: yes on VolumeGroupSnapshotContent when VolumeGroupSnapshot is being deleted.

We actually have a check in shouldDeleteGroupSnapshotContent() in the sidecar that checks if this annotation is present and if so, allow the VolumeGroupSnapshotContent to be deleted.

https://github.com/kubernetes-csi/external-snapshotter/blob/master/pkg/sidecar-controller/groupsnapshot_helper.go#L351

    // 3) shouldDeleteGroupSnapshotContent returns true if AnnVolumeSnapshotBeingDeleted annotation is set
    if metav1.HasAnnotation(groupSnapshotContent.ObjectMeta, utils.AnnVolumeGroupSnapshotBeingDeleted) {
        return true
    }
kubectl delete volumegroupsnapshotcontent groupsnapcontent-747992ea-e150-4c44-ad5a-18a2e5d83dbe

volumegroupsnapshotcontent.groupsnapshot.storage.k8s.io "groupsnapcontent-747992ea-e150-4c44-ad5a-18a2e5d83dbe" deleted
......

kubectl describe volumegroupsnapshotcontent

Name:         groupsnapcontent-747992ea-e150-4c44-ad5a-18a2e5d83dbe
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  groupsnapshot.storage.k8s.io/v1beta1
Kind:         VolumeGroupSnapshotContent
Metadata:
  Creation Timestamp:             2024-11-23T02:01:52Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2024-11-23T02:25:25Z
  Finalizers:
    groupsnapshot.storage.kubernetes.io/volumegroupsnapshotcontent-bound-protection
  Generation:        2
  Resource Version:  102993
  UID:               44f71d50-3d8a-457d-bdf5-dc03999051bf
Spec:
  Deletion Policy:  Retain
  Driver:           hostpath.csi.k8s.io
  Source:
    Volume Handles:
      a789a405-a93d-11ef-9549-66940726f2fd
      ac773916-a93d-11ef-9549-66940726f2fd
  Volume Group Snapshot Class Name:  csi-hostpath-groupsnapclass
  Volume Group Snapshot Ref:
    API Version:       groupsnapshot.storage.k8s.io/v1beta1
    Kind:              VolumeGroupSnapshot
    Name:              new-groupsnapshot-demo
    Namespace:         default
    Resource Version:  102245
    UID:               747992ea-e150-4c44-ad5a-18a2e5d83dbe
Status:
  Creation Time:                 1732327312499129161
  Ready To Use:                  true
  Volume Group Snapshot Handle:  e8779136-a93e-11ef-9549-66940726f2fd
  Volume Snapshot Handle Pair List:
    Snapshot Handle:  e8779147-a93e-11ef-9549-66940726f2fd
    Volume Handle:    a789a405-a93d-11ef-9549-66940726f2fd
    Snapshot Handle:  e8783cd0-a93e-11ef-9549-66940726f2fd
    Volume Handle:    ac773916-a93d-11ef-9549-66940726f2fd
Events:               <none>

Note that we have similar logic that adds snapshot.storage.kubernetes.io/volumesnapshot-being-deleted: "yes" annotation on the VolumeSnapshotContent.

What you expected to happen:

How to reproduce it:

Anything else we need to know?:

Environment:

leonardoce commented 1 day ago

We don’t receive any deletion messages regarding dynamically provisioned VolumeGroupSnapshots with DeletionPolicy set to Retain, because we don’t add the finalizers to the VolumeGroupSnapshot object.

We didn’t add the finalizer on purpose. See:

https://github.com/kubernetes-csi/external-snapshotter/blob/4183faa057cc0e2f09e0b59bb5189893c10013e3/pkg/common-controller/groupsnapshot_controller_helper.go#L1315-L1325

About the annotation...we already have the logic to add it, but we won’t trigger it unless we set the finalizer:

https://github.com/kubernetes-csi/external-snapshotter/blob/4183faa057cc0e2f09e0b59bb5189893c10013e3/pkg/utils/util.go#L525-L529

This works for VolumeSnapshots because we always have the [snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection](http://snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection) finalizer: https://github.com/kubernetes-csi/external-snapshotter/blob/4183faa057cc0e2f09e0b59bb5189893c10013e3/pkg/utils/util.go#L531-L534

What behavior should we have? Should we fix both VolumeSnapshots and VolumeGroupSnapshots?

Thanks!

cc: @xing-yang

xing-yang commented 1 day ago

I couldn't remember why condition 3 was there. I don't want to make changes in VolumeSnapshot since it has been working. Let's drop condition 3 from VolumeGroupSnapshot and add the bound finalizer on VolumeGroupSnapshot for both Delete and Retain.