kubevirt / containerized-data-importer

Data Import Service for kubernetes, designed with kubevirt in mind.
Apache License 2.0

Random Storage Class Selection During Host-Assisted Snapshot Restore #3530

Open TheRealSibasishBehera opened 6 days ago

TheRealSibasishBehera commented 6 days ago

What happened: During host-assisted snapshot restore operations, CDI creates intermediate PVCs with a randomly selected storage class (matching only the CSI driver), ignoring both the DataVolume's specified storage class and the source PVC's storage class. This leads to restore failures when an incompatible storage class is selected.

Error seen:

In the host-assisted PVC:

ProvisioningFailed: failed to provision volume with StorageClass "ms-ext4-3-replicas-local": 
rpc error: code = InvalidArgument desc = error in response: status code '400 Bad Request', content: 'RestJsonError { details: "", message: "SvcError :: ClonedSnapshotVolumeThin: Cloned snapshot volumes must be thin provisioned", kind: InvalidArgument }'

What you expected to happen:

CDI should follow a predictable storage class selection priority during host-assisted snapshot restore (see the priority order under Suggested Changes below).

How to reproduce it (as minimally and precisely as possible):

Create a snapshot from existing PVC:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: blank-datavolume-snapshot
spec:
  source:
    persistentVolumeClaimName: blank-datavolume

Create DataVolume to restore from snapshot:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: new-datavolume-from-snapshot
spec:
  source:
    snapshot:
      name: blank-datavolume-snapshot
      namespace: default
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 2Gi
    storageClassName: ms-xfs-snapshot-restore

Observe intermediate PVC creation:

kubectl get pvc <uuid>-host-assisted-source-pvc -o yaml

Shows storage class ms-ext4-3-replicas-local instead of the specified ms-xfs-snapshot-restore

Additional context:

The current implementation in getStorageClassCorrespondingToSnapClass only matches on CSI driver and takes the first match:

func (r *SnapshotCloneReconciler) getStorageClassCorrespondingToSnapClass(driver string) (string, error) {
    matches := []storagev1.StorageClass{}
    // Lists all storage classes
    storageClasses := &storagev1.StorageClassList{}
    if err := r.client.List(context.TODO(), storageClasses); err != nil {
        return "", errors.New("unable to retrieve storage classes")
    }
    // Just matches on driver and takes first match
    for _, storageClass := range storageClasses.Items {
        if storageClass.Provisioner == driver {
            matches = append(matches, storageClass)
        }
    }
    if len(matches) > 1 {
        return matches[0].Name, nil
    }
    //...
}

Suggested Changes:

Prioritized Storage Class Selection (a rough sketch follows below):

  1. First: Use DV's specified storage class (if matches driver)
  2. Second: Try source PVC's storage class (from snapshot)
  3. Third: Try default storage class (with matching driver)
  4. Last Resort: Any matching storage class

or

Have a way for the user to configure the behaviour for the intermediate PVC.
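
The first option could look roughly like the sketch below. This is illustrative only, not existing CDI code: the function name pickStorageClassForDriver and its parameters (the DataVolume's requested storage class and the resolved default storage class) are assumed to be supplied by the caller.

package controller

import (
	"errors"

	storagev1 "k8s.io/api/storage/v1"
)

// pickStorageClassForDriver sketches the proposed priority order for the
// intermediate (host-assisted) PVC; it is not CDI's current implementation.
func pickStorageClassForDriver(scs []storagev1.StorageClass, driver, dvStorageClassName, defaultStorageClassName string) (string, error) {
	matchesDriver := func(name string) bool {
		for _, sc := range scs {
			if sc.Name == name && sc.Provisioner == driver {
				return true
			}
		}
		return false
	}
	// 1. First: the DataVolume's requested storage class, if it uses the snapshot's driver.
	if dvStorageClassName != "" && matchesDriver(dvStorageClassName) {
		return dvStorageClassName, nil
	}
	// 2. Second: the source PVC's storage class would be tried here, but it is
	//    not available when cloning from a snapshot (see the discussion below).
	// 3. Third: the default storage class, if it uses the snapshot's driver.
	if defaultStorageClassName != "" && matchesDriver(defaultStorageClassName) {
		return defaultStorageClassName, nil
	}
	// 4. Last resort: any storage class with a matching provisioner.
	for _, sc := range scs {
		if sc.Provisioner == driver {
			return sc.Name, nil
		}
	}
	return "", errors.New("no storage class match for snapshot driver")
}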


akalenyu commented 6 days ago

Hey, thanks for opening the issue!

> 2. Second: Try source PVC's storage class (from snapshot)

We've faced this concern when designing the feature. The issue is the source PVC isn't available when cloning a snapshot, otherwise we would ofc just grab the storage class from that. But I am not against refining the logic around picking the storage class.

We've also had a working assumption that the same provisioner is able to restore from snapshot regardless of the storage class permutation, so, basically, that it could handle anything thrown at it just to get through the restore, which I am seeing is not true in your case.

TheRealSibasishBehera commented 6 days ago

> We've faced this concern when designing the feature. The issue is the source PVC isn't available when cloning a snapshot, otherwise we would ofc just grab the storage class from that.

I totally missed that - my bad.

What do you think about just using DV.PVCSpec.StorageClass? This isn't the best solution because users might not be aware of any intermediate operation mechanisms like the requirements for the host-assisted PVC.

This issue could become common since we don't know what restrictions storage provisioners might have for their cloning/snapshotting. In my case, the storage class must be thin-provisioned.

Giving users more control would be beneficial. Perhaps, similar to the CDI config status field scratchSpaceStorageClass, we could have a snapshotRestoreConfig or similar section:

apiVersion: cdi.kubevirt.io/v1beta1
kind: CDIConfig
spec:
  snapshotRestoreConfig:
    storageClass: "thin-provisioned-sc"      # Single provisioner
    storageClassMap:                         # Multiple provisioners
      "provisioner-1": "thin-sc-1"
      "provisioner-2": "thin-sc-2"

Using an annotation like cdi.kubevirt.io/snapshot.restore.storageClass: "thin-sc" would be a simpler alternative.
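
For illustration, such an annotation (the name is only the proposal from this thread, not an existing CDI API) could be set on the DataVolume from the reproduction steps like this:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: new-datavolume-from-snapshot
  annotations:
    # Hypothetical annotation proposed above; not implemented in CDI today.
    cdi.kubevirt.io/snapshot.restore.storageClass: "thin-sc"
spec:
  source:
    snapshot:
      name: blank-datavolume-snapshot
      namespace: default
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 2Gi
    storageClassName: ms-xfs-snapshot-restore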

akalenyu commented 6 days ago

> > We've faced this concern when designing the feature. The issue is the source PVC isn't available when cloning a snapshot, otherwise we would ofc just grab the storage class from that.
>
> I totally missed that - my bad.
>
> What do you think about just using DV.PVCSpec.StorageClass? This isn't the best solution because users might not be aware of any intermediate operation mechanisms like the requirements for the host-assisted PVC.
>
> This issue could become common since we don't know what restrictions storage provisioners might have for their cloning/snapshotting. In my case, the storage class must be thin-provisioned.
>
> Giving users more control would be beneficial. Perhaps, similar to the CDI config status field scratchSpaceStorageClass, we could have a snapshotRestoreConfig or similar section:
>
> apiVersion: cdi.kubevirt.io/v1beta1
> kind: CDIConfig
> spec:
>   snapshotRestoreConfig:
>     storageClass: "thin-provisioned-sc"      # Single provisioner
>     storageClassMap:                         # Multiple provisioners
>       "provisioner-1": "thin-sc-1"
>       "provisioner-2": "thin-sc-2"
>
> Using an annotation like cdi.kubevirt.io/snapshot.restore.storageClass: "thin-sc" would be a simpler alternative.

I like the annotation solution more. Introducing more APIs for the host-assisted fallback side of this is just too much bloat for users to understand. If we go with the annotation, we could take over for our own snapshots and mark them. That would still unfortunately not be a complete solution, since the storage class could be gone at that point.

I really hoped the storage provisioner could just do whatever it takes to restore. The data anyway gets copied later byte for byte over the network to the target PVC.

TheRealSibasishBehera commented 6 days ago

Agreed, manual configuration isn't ideal for a user. It's just a bit frustrating to see that while direct PVC snapshot restores work, DataVolume (being an abstraction over PVC) snapshot restores fail.

We're constrained by the lack of standardization in how storage provisioners handle snapshot operations. A solution where CDI figures out the best storage class configuration and creates it involves too many variables and is overkill for CDI to do.

The best workaround IMO, one that makes sure the storage class still exists at the point the DataVolume is created, might just be using an annotation on the DV.

akalenyu commented 6 days ago

> Agreed, manual configuration isn't ideal for a user. It's just a bit frustrating to see that while direct PVC snapshot restores work, DataVolume (being an abstraction over PVC) snapshot restores fail.
>
> We're constrained by the lack of standardization in how storage provisioners handle snapshot operations. A solution where CDI figures out the best storage class configuration and creates it involves too many variables and is overkill for CDI to do.
>
> The best workaround IMO, one that makes sure the storage class still exists at the point the DataVolume is created, might just be using an annotation on the DV.

I was thinking more of an annotation on the snapshot object, wdyt?
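
For example, with the hypothetical annotation name from the proposal above carried over to the snapshot object:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: blank-datavolume-snapshot
  annotations:
    # Hypothetical annotation; CDI could set this on snapshots it creates itself.
    cdi.kubevirt.io/snapshot.restore.storageClass: "thin-sc"
spec:
  source:
    persistentVolumeClaimName: blank-datavolume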

TheRealSibasishBehera commented 6 days ago

Both are very similar, but a DataVolume annotation gives a little more control at the point of restore: with an annotation on the VolumeSnapshot, by the time the user tries to make a DV the storage class might already be deleted, and the user would have to find a suitable storage class and reapply the annotation.

TheRealSibasishBehera commented 4 days ago

@akalenyu I am happy to make a PR to close this issue if you agree with the approach. Let me know.

akalenyu commented 4 days ago

Sounds good, I just want to poke at the use case some more. Why were you falling back to host-assisted in your case?

TheRealSibasishBehera commented 4 days ago

CDI expects to use populators only when the storage class is a CSI storage class. To determine this, CDI checks for the CSIDriver object.

Smart clone is designed to be a subset of the external populator feature (please correct me if I am wrong on this), and if the external populator annotation is false, it falls back to the host-assisted one.

openebs mayastor doesn't have a CSIDriver object, yet it still supports snapshots. I am not sure if this is something CDI should solve, because the CSIDriver object, although not the best way to determine snapshot capability, is a Kubernetes standard. I am not sure if it is being used for conformance though.

An approach to solve this might be to not check the external populator annotation, but to actually list/create a VolumeSnapshotClass to determine snapshot capability.
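
A rough sketch of that alternative, assuming the external-snapshotter client types and a controller-runtime client (not CDI's current behavior):

package controller

import (
	"context"

	snapshotv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// driverSupportsSnapshots checks for a VolumeSnapshotClass whose driver matches
// the provisioner, instead of relying on the presence of a CSIDriver object.
func driverSupportsSnapshots(ctx context.Context, c client.Client, driver string) (bool, error) {
	vscList := &snapshotv1.VolumeSnapshotClassList{}
	if err := c.List(ctx, vscList); err != nil {
		return false, err
	}
	for _, vsc := range vscList.Items {
		if vsc.Driver == driver {
			return true, nil
		}
	}
	return false, nil
}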