kubevirt / kubevirt-velero-plugin

Plugin to Velero which automates backing up and restoring KubeVirt/CDI objects
Apache License 2.0

VirtualMachineClone during Restore either fails to restore or triggers cloning process again #242

Open msfrucht opened 2 months ago

msfrucht commented 2 months ago

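What happened: VirtualMachineClone has several bad behaviors on restore.

What you expected to happen: VirtualMachineClone objects restore without failing the restore or triggering the cloning process again.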

First Behavior

  1. Clone a VM using VirtualMachineClone (OpenShift Virtualization 4.15 from the UI plugin; 4.14 does not use VirtualMachineClone objects from the UI)
  2. Do a backup using Velero and the kubevirt plugin
  3. Delete the original namespace and all VMs
  4. Restore to the original namespace
  5. Velero reports the restore as failed: it cannot create the VirtualMachineClone because the source VM does not exist yet.

Observed restore order:

| Time | Type | Object |
| --- | --- | --- |
| 11.549958142Z | VirtualMachineClone | centos-stream8-fuchsia-nightingale-41-clone-6y9d9d-ru3fsk-cr |
| 11.577348972Z | VirtualMachine | centos-stream8-fuchsia-nightingale-41-clone-6y9d9d |
| 12.963322544Z | VirtualMachine | centos-stream8-fuchsia-nightingale-41 |
| 15.363754974Z | VirtualMachine | rhel8-gold-emu-69 |
| 15.966218379Z | VirtualMachine | rhel9-bronze-leopard-14 |

This is caused by Velero's default restore order: once past the default prioritized resource list, resources are restored alphabetically, so VirtualMachineClone objects are always restored before VirtualMachine objects. At minimum, the documentation should mention that the restore order has to be changed for VirtualMachineClones to be restored successfully.

Setting the Velero restore order resolved this, but that makes a custom restore order a requirement for dealing with this specific object. Otherwise, Velero will always report the restore as failed. The VMs did at least get restored, because Velero does not automatically stop the restore on a single object error.

There is a similar issue with VirtualMachineInstanceMigration objects: they are always restored before VirtualMachine objects.
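
The ordering problem described above can be illustrated with a small model. This is only a sketch, not Velero's actual code: `restore_order` is a hypothetical helper, and the short resource names are used purely for illustration. It assumes that prioritized resources are restored first, in the listed order, and that everything else follows alphabetically.

```python
# Hypothetical model of the ordering described in this issue; not Velero code.
def restore_order(resources, priorities):
    """Return prioritized resources first (in the given order),
    followed by the remaining resources sorted alphabetically."""
    prioritized = [r for r in priorities if r in resources]
    remainder = sorted(r for r in resources if r not in priorities)
    return prioritized + remainder


resources = [
    "virtualmachines",
    "virtualmachineclones",
    "virtualmachineinstancemigrations",
]

# No KubeVirt entries in the priority list: clones and migrations
# sort ahead of "virtualmachines" alphabetically.
default_order = restore_order(resources, priorities=[])

# With the order used in this issue, VirtualMachine objects come first.
fixed_order = restore_order(
    resources, priorities=["virtualmachines", "virtualmachineclones"]
)

print(default_order)
print(fixed_order)
```

In this model, "virtualmachineclones" and "virtualmachineinstancemigrations" both sort before "virtualmachines", which matches the restore order observed above.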

Second Behavior

  1. Set the Velero restore order to "virtualmachines,virtualmachineclones"
  2. Clone a VM using a VirtualMachineClone object (OpenShift Virtualization 4.15 from the UI plugin; 4.14 does not use VirtualMachineClone objects from the UI)
  3. Delete the clone.
  4. Do a backup using Velero and the kubevirt plugin
  5. Delete the original namespace and all VMs
  6. Restore to the original namespace
  7. The clone comes back

This happens because the status, which records that the original clone succeeded, is ignored. Creating the VirtualMachineClone object triggers the clone operation again, regardless of the contents of the status field.
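
One way to picture the missing check is a filter that looks at the backed-up status before an object is recreated. The sketch below is an illustration only, not the plugin's behavior: `should_restore` is a hypothetical helper operating on the object as a plain dictionary, and it assumes that a completed clone reports `status.phase` as `Succeeded`.

```python
# Hypothetical restore-time filter; not part of kubevirt-velero-plugin.
def should_restore(obj):
    """Return False for VirtualMachineClone objects whose backed-up status
    says the clone already completed (assumed phase name: "Succeeded")."""
    status = obj.get("status") or {}
    if obj.get("kind") == "VirtualMachineClone" and status.get("phase") == "Succeeded":
        return False
    return True


completed_clone = {
    "kind": "VirtualMachineClone",
    "status": {"phase": "Succeeded"},
}
virtual_machine = {"kind": "VirtualMachine", "status": {}}

print(should_restore(completed_clone))  # the finished clone is skipped
print(should_restore(virtual_machine))  # other objects are restored as usual
```

Skipping the object is only one possible policy; the point is that the decision needs the backed-up status, which is not consulted when the object is simply recreated.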

Third Behavior

  1. Set the Velero restore order to "virtualmachines,virtualmachineclones"
  2. Clone a VM using a VirtualMachineClone object (OpenShift Virtualization 4.15 from the UI plugin; 4.14 does not use VirtualMachineClone objects from the UI)
  3. Do a backup using Velero and the kubevirt plugin
  4. Delete the original namespace and all VMs
  5. Restore to the original namespace
  6. The clone is triggered even though its original status was successful, which creates a VirtualMachineSnapshot of the original VM. The clone never finishes, leaving behind a mysterious VirtualMachineSnapshot of unclear origin.

Again, this behavior is caused by ignoring the original VirtualMachineClone status on creation, which re-runs a clone operation that had already completed successfully.

The status stays stuck in progress forever:

status:
  conditions:
    - lastProbeTime: null
      lastTransitionTime: '2024-04-18T17:12:20Z'
      reason: Still processing
      status: 'True'
      type: Progressing
    - lastProbeTime: null
      lastTransitionTime: '2024-04-18T17:12:20Z'
      reason: Still processing
      status: 'False'
      type: Ready
  phase: RestoreInProgress
  snapshotName: tmp-snapshot-c5fb2af7-bc3d-4d18-a7b0-57fff6af9178
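
A status like the one above can be checked for this stuck state programmatically. The following is a rough sketch under stated assumptions: `is_clone_stuck` is a hypothetical helper, the one-hour limit is arbitrary, and `Succeeded` and `Failed` are assumed to be the terminal phase names.

```python
from datetime import datetime, timedelta, timezone


# Hypothetical check; the terminal phase names and the limit are assumptions.
def is_clone_stuck(status, now, limit=timedelta(hours=1)):
    """Return True if the clone is still Progressing and its last
    transition is older than `limit`."""
    if status.get("phase") in ("Succeeded", "Failed"):
        return False
    for cond in status.get("conditions", []):
        if cond.get("type") == "Progressing" and cond.get("status") == "True":
            started = datetime.strptime(
                cond["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
            return now - started > limit
    return False


# Status taken from the dump above, reduced to the fields the check reads.
status = {
    "phase": "RestoreInProgress",
    "conditions": [
        {
            "type": "Progressing",
            "status": "True",
            "lastTransitionTime": "2024-04-18T17:12:20Z",
        },
    ],
}

later = datetime(2024, 4, 18, 19, 0, 0, tzinfo=timezone.utc)
print(is_clone_stuck(status, now=later))
```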

How to reproduce it (as minimally and precisely as possible): See above. Steps for each behavior included.


Environment: