vmware-tanzu / velero-plugin-for-vsphere

Plugin to support Velero on vSphere

Cannot delete snapshots `A specified parameter was not correct` #522

Open braunsonm opened 1 year ago

braunsonm commented 1 year ago

Describe the bug

It seems that at some point an upload failed, so Velero left the snapshot behind on vSphere. Now I am unable to expand the volume, and I also cannot delete the snapshot because there are no longer any active Snapshot resources for it.

To Reproduce

Unsure. Here is what I think reproduced it in my environment:

  1. Created a backup
  2. That backup failed to upload
  3. The snapshot expired and a DeleteSnapshot CR was created
  4. That DeleteSnapshot CR was marked as Completed even though it failed to delete from vSphere
  5. Creating subsequent DeleteSnapshot CRs (if you happen to know the UID) also fails. The CR shows Completed even though the API call to vSphere fails.

Expected behavior

The snapshot should be deleted in vSphere.

Anything else you would like to add:

The vSphere Client shows `Delete a virtual object snapshot: A specified parameter was not correct:` (the trailing `:` is not a typo; the message just ends there).

In the backup-driver I see:

Disk doesn't have given snapshot due to the snapshot stamp was removed in the previous DeleteSnapshot operation which failed with InvalidState fault. And it will be resolved by the next snapshot operation on the same VM. Will NOT retry

What does this mean?

Is there a way to get the cluster back into a good state without deleting the PVC? As it stands:

  1. I cannot list the snapshots that were left dangling unless I go into the datastore and look for `fcd/<UID>-0000.vmdk` files, and even then I do not know the snapshot ID.
  2. I'm not sure which vSphere API triggers the deletion of such a snapshot. How can I list and delete the existing orphaned snapshots of the FCD disks? Is there a govc command that I can use here?
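
To make the datastore lookup in item 1 concrete, here is a minimal sketch that filters a datastore file listing for vmdk files under `fcd/` with a numeric suffix. The naming pattern is an assumption based on the `fcd/<UID>-0000.vmdk` files described above, the function name `find_delta_disks` is hypothetical, and the match is only a heuristic: it yields the base disk name, not the snapshot ID.

```python
import re

# Assumption: leftover snapshot files look like fcd/<name>-<digits>.vmdk,
# as in the fcd/<UID>-0000.vmdk files mentioned above.
DELTA_RE = re.compile(r"(?:^|/)fcd/(?P<base>[^/]+?)-(?P<seq>\d+)\.vmdk$")


def find_delta_disks(paths):
    """Group numeric-suffixed vmdk files under fcd/ by their base disk name."""
    found = {}
    for path in paths:
        match = DELTA_RE.search(path)
        if match:
            found.setdefault(match.group("base"), []).append(path)
    return found


if __name__ == "__main__":
    # Illustrative listing only; these paths are made up.
    listing = [
        "fcd/abc123.vmdk",
        "fcd/abc123-0000.vmdk",
        "/vmfs/volumes/ds1/fcd/def456-000001.vmdk",
        "vm1/vm1-000001.vmdk",
    ]
    for base, files in find_delta_disks(listing).items():
        print(base, files)
```

This only narrows down which disks still have leftover files. Mapping a file back to a snapshot ID would still need whatever vSphere tracks internally for the disk.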