vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0
8.71k stars, 1.4k forks

Respect the "snapshot.storage.kubernetes.io/is-default-class: true" annotation of VolumeSnapshotClass when taking the CSI snapshot #8294

Open ywk253100 opened 3 weeks ago

ywk253100 commented 3 weeks ago

Currently, when taking a CSI snapshot, Velero chooses the VolumeSnapshotClass that carries the annotation velero.io/csi-volumesnapshot-class: "true". There is also an official annotation, snapshot.storage.kubernetes.io/is-default-class: true, which designates the default VolumeSnapshotClass for VolumeSnapshots that don't request any particular class to bind to.

So we could refine the current VolumeSnapshotClass choosing logic as follows:

  1. Use the default VolumeSnapshotClass annotated with snapshot.storage.kubernetes.io/is-default-class: true
  2. Else use the VolumeSnapshotClass annotated with velero.io/csi-volumesnapshot-class: "true"

kaovilai commented 3 weeks ago

Is there any reason there isn't a third else for looping through drivers and picking one that works?

ywk253100 commented 3 weeks ago

> Is there any reason there isn't a third else for looping through drivers and picking one that works?

Is this what you expect? This also makes sense to me.

  1. Use the default VolumeSnapshotClass annotated with snapshot.storage.kubernetes.io/is-default-class: true
  2. Else choose the VolumeSnapshotClass whose driver field matches the PVC's StorageClass
  3. Else use the VolumeSnapshotClass annotated with velero.io/csi-volumesnapshot-class: "true"

kaovilai commented 3 weeks ago

  1. Use the default VolumeSnapshotClass annotated with snapshot.storage.kubernetes.io/is-default-class: true
  2. Else choose the VolumeSnapshotClass whose driver field matches the PVC's StorageClass and that is annotated with velero.io/csi-volumesnapshot-class: "true"
    1. This tier is needed because there can be multiple VolumeSnapshotClasses per driver, each setting different parameters for the same driver.
  3. Else choose the VolumeSnapshotClass whose driver field matches the PVC's StorageClass and that has no annotation
  4. Else, as a last resort, use the VolumeSnapshotClass annotated with velero.io/csi-volumesnapshot-class: "true" even though its driver does not match (not sure whether this case is even possible)
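
To make the four tiers above concrete, here is a rough Go sketch that ranks each candidate class and keeps the best one. It is only an illustration, not Velero code: the SnapshotClass type and the rank and chooseByRank functions are hypothetical names, tier 1 is assumed to also require a driver match, and ties within a tier simply go to the first class in the slice.

```go
package main

// Annotation keys quoted in this thread.
const (
	veleroClassKey  = "velero.io/csi-volumesnapshot-class"
	defaultClassKey = "snapshot.storage.kubernetes.io/is-default-class"
)

// SnapshotClass is a simplified, hypothetical stand-in for VolumeSnapshotClass.
type SnapshotClass struct {
	Name, Driver string
	Annotations  map[string]string
}

// rank returns the tier (1 = most preferred) a class falls into for the given
// provisioner, or 0 if the class is not a candidate at all.
func rank(c SnapshotClass, provisioner string) int {
	driverOK := c.Driver == provisioner
	velero := c.Annotations[veleroClassKey] == "true"
	isDefault := c.Annotations[defaultClassKey] == "true"
	switch {
	case driverOK && isDefault: // assumption: the default class must match the driver
		return 1
	case driverOK && velero:
		return 2
	case driverOK:
		return 3
	case velero: // last resort: annotated for Velero, but the driver differs
		return 4
	default:
		return 0
	}
}

// chooseByRank returns the best-ranked candidate, or nil if none qualifies.
// Ties within a tier go to the class that appears first in the slice.
func chooseByRank(provisioner string, classes []SnapshotClass) *SnapshotClass {
	var best *SnapshotClass
	bestRank := 0
	for i := range classes {
		r := rank(classes[i], provisioner)
		if r != 0 && (best == nil || r < bestRank) {
			best, bestRank = &classes[i], r
		}
	}
	return best
}
```

Because ties are broken by slice order, this variant is not deterministic across clusters when two classes land in the same tier.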

anshulahuja98 commented 2 weeks ago

https://github.com/vmware-tanzu/velero-plugin-for-csi/pull/178/files#diff-0f38f067df1a3a5e5fb78bd16bfeb63f7c7c89524abc32b98a875b6152474bb4

I believe snapshot.storage.kubernetes.io/is-default-class: true should come after all three of these if checks.

anshulahuja98 commented 2 weeks ago

Because if we put this above the velero annotation, the existing behaviour will break in some sense for users.

kaovilai commented 2 weeks ago

If this is a breaking change, we can add a feature flag to enable the new behavior.

ywk253100 commented 2 weeks ago

> If this is a breaking change, we can add a feature flag to enable the new behavior.

I don't think a feature flag is a good option, because it would introduce yet another configuration setting.

We should try to avoid introducing a breaking change, so how about choosing the VolumeSnapshotClass in the following priority order:

  1. The VolumeSnapshotClass annotated with velero.io/csi-volumesnapshot-class: "true"
  2. The VolumeSnapshotClass annotated with snapshot.storage.kubernetes.io/is-default-class: true
  3. The VolumeSnapshotClass whose driver field matches the PVC's StorageClass

And report an error if the above logic matches more than one VolumeSnapshotClass.
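
For illustration, the priority order above, including the error on multiple matches, could look roughly like the Go sketch below. This is not Velero code: SnapshotClass and pickSnapshotClass are hypothetical names, and the sketch assumes that every tier also requires the class's driver to match the provisioner and that ambiguity is checked per tier.

```go
package main

import "fmt"

// Annotation keys quoted in this thread.
const (
	veleroClassKey  = "velero.io/csi-volumesnapshot-class"
	defaultClassKey = "snapshot.storage.kubernetes.io/is-default-class"
)

// SnapshotClass is a simplified, hypothetical stand-in for VolumeSnapshotClass.
type SnapshotClass struct {
	Name        string
	Driver      string
	Annotations map[string]string
}

// pickSnapshotClass walks the three tiers in priority order and returns the
// single match from the first tier that matches anything. More than one match
// within that tier is reported as an error instead of picking arbitrarily.
func pickSnapshotClass(provisioner string, classes []SnapshotClass) (*SnapshotClass, error) {
	tiers := []struct {
		name  string
		match func(c SnapshotClass) bool
	}{
		{"velero annotation", func(c SnapshotClass) bool { return c.Annotations[veleroClassKey] == "true" }},
		{"default-class annotation", func(c SnapshotClass) bool { return c.Annotations[defaultClassKey] == "true" }},
		{"driver match only", func(c SnapshotClass) bool { return true }},
	}
	for _, tier := range tiers {
		var matches []*SnapshotClass
		for i := range classes {
			// Assumption: a class is only ever a candidate if its driver matches.
			if classes[i].Driver == provisioner && tier.match(classes[i]) {
				matches = append(matches, &classes[i])
			}
		}
		switch len(matches) {
		case 0:
			continue // nothing in this tier, try the next one
		case 1:
			return matches[0], nil
		default:
			return nil, fmt.Errorf("%d VolumeSnapshotClasses match tier %q for driver %q",
				len(matches), tier.name, provisioner)
		}
	}
	return nil, fmt.Errorf("no VolumeSnapshotClass found for driver %q", provisioner)
}
```

With one Velero-annotated class and one default-annotated class for the same driver, the Velero-annotated class wins, so clusters that already rely on the Velero annotation keep their current result.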

anshulahuja98 commented 2 weeks ago

// If a snapshot class is specified for the provisioner in the PVC annotations, use it.
snapshotClass, err := GetVolumeSnapshotClassFromPVCAnnotationsForDriver(pvc, provisioner, snapshotClasses)
if err != nil {
    log.Debugf("Didn't find VolumeSnapshotClass from PVC annotations: %v", err)
}
if snapshotClass != nil {
    return snapshotClass, nil
}

// If there is no annotation on the PVC, attempt to fetch it from the Backup annotations.
snapshotClass, err = GetVolumeSnapshotClassFromBackupAnnotationsForDriver(backup, provisioner, snapshotClasses)
if err != nil {
    log.Debugf("Didn't find VolumeSnapshotClass from Backup annotations: %v", err)
}
if snapshotClass != nil {
    return snapshotClass, nil
}

// Fall back to the current default behaviour of fetching the snapshot class based on the label on the VSClass:
// velero.io/csi-volumesnapshot-class: "true"
snapshotClass, err = GetVolumeSnapshotClassForStorageClass(provisioner, snapshotClasses)
if err != nil {
    // Do not return yet, so that the default-class fallback below is still reachable.
    log.Debugf("Didn't find VolumeSnapshotClass with the Velero label: %v", err)
}
if snapshotClass != nil {
    return snapshotClass, nil
}

// Finally, fall back to the class marked as the default via the annotation:
// snapshot.storage.kubernetes.io/is-default-class: true
// NOTE: GetDefaultVolumeSnapshotClassForDriver is a placeholder name; this helper does not exist yet.
snapshotClass, err = GetDefaultVolumeSnapshotClassForDriver(provisioner, snapshotClasses)
if err != nil {
    return nil, errors.Wrap(err, "error getting volumesnapshotclass")
}
if snapshotClass == nil {
    return nil, errors.New("no matching volumesnapshotclass found")
}
return snapshotClass, nil

anshulahuja98 commented 2 weeks ago

@kaovilai / @ywk253100 how does the above draft look?

kaovilai commented 2 weeks ago

Looks good, but it is missing the third option from https://github.com/vmware-tanzu/velero/issues/8294#issuecomment-2421855002

anshulahuja98 commented 2 weeks ago

I personally don't see a need for it at this point.

We should expect the customer to set either velero.io/csi-volumesnapshot-class: "true" or snapshot.storage.kubernetes.io/is-default-class: true.

This keeps the behaviour deterministic. Let me know if there is any user ask for this.

kaovilai commented 2 weeks ago

It just removes one step from the prerequisites, that's all. Many local/dev clusters such as KinD or crc would most likely have only one VolumeSnapshotClass. This would keep Velero install scripts generic across several local cluster environments. But that can also be done outside of Velero, so I am OK with skipping the non-deterministic option in Velero.

reasonerjt commented 6 days ago

IMHO, although this issue improves user experience, it may not be a high priority for v1.16.