
design-proposal: volume migration #245

Closed alicefr closed 8 months ago

alicefr commented 1 year ago

This proposal is a follow-up to https://github.com/kubevirt/community/pull/240 and only targets moving the storage to another PVC. VM live migration with local storage is out of scope.

In comparison to the previous proposal, this introduces a new CRD for representing the Volume Migration.

mhenriks commented 9 months ago

@vladikr - we will have a declarative hotplug volume API soon, so the replacement could potentially happen instantly, not just when the VM is restarted (the current case)

vladikr commented 9 months ago

@mhenriks, thanks, yes. Wouldn't the declarative hotplug happen only when a user adds a new volume, but not apply to a change of an existing volume?

alicefr commented 9 months ago

> Wouldn't the declarative hotplug happen only when a user adds a new volume, but not apply to a change of an existing volume?

The user could use the same volume name for an old and a new volume. For example, they want to unplug pvc1 and replace it with pvc2, but both PVCs use the same name for the volume.
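
For illustration, a minimal sketch of such a replacement (pvc1 and pvc2 as in the example above):

  # before: volume "rootDisk" is backed by pvc1
  volumes:
  - name: rootDisk
    persistentVolumeClaim:
      claimName: pvc1

  # after the declarative update: same volume name, now backed by pvc2
  volumes:
  - name: rootDisk
    persistentVolumeClaim:
      claimName: pvc2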

alicefr commented 9 months ago

This feature appears to be hard to design, so let me recap the most recent conversations here once more.

The solutions above are out of date, so here are some fresh ones, using this spec as an illustration. Old spec:

  volumes:
  - name: rootDisk
    persistentVolumeClaim:
      claimName: src-pvc

1. Add a new updateStrategy field per volume under the volumes spec. Example of updating the spec with the updateStrategy:

  volumes:
  - name: rootDisk
    persistentVolumeClaim:
      updateStrategy: migration # or replacement
      claimName: src-pvc

2. Add some sort of gating under the volume spec, where the user can signal that the update needs to wait for the VolumeMigration CR to exist. Example with the volume:

  volumes:
  - name: rootDisk
    condition: migration
    persistentVolumeClaim:
      claimName: src-pvc

and the VolumeMigration CR:

apiVersion: storage.kubevirt.io/v1alpha1
kind: VolumeMigration
metadata:
  name: vol-mig
spec:
  migratedVolume:
  - sourceClaim: src-pvc
    destinationClaim: dest-pvc

The gating mechanism solves the race condition described in https://github.com/kubevirt/community/pull/245#issuecomment-1943327128 .

Without a doubt, the first option is the easiest to develop and put into practice. Still, I feel it is excessively limiting. One problem is that, without a separate object, we are unable to abort the migration, and it is difficult to express this using a single field under the VM spec.

Therefore, I think we can take advantage of both solutions and use a mixture of the two: put the updateStrategy field in the VM spec and the gating on the VMI. The user needs to update the VM spec volumes, specifying the updateStrategy, and create a VolumeMigration object.
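
As a sketch of the user-facing flow under this hybrid approach (the field placement follows option 1 above; it is an illustration, not a final API):

  # step 1: update the VM spec volumes, pointing the volume at the
  # destination PVC and marking it with the updateStrategy
  volumes:
  - name: rootDisk
    persistentVolumeClaim:
      updateStrategy: migration
      claimName: dest-pvc

  # step 2: create the VolumeMigration object shown above (vol-mig);
  # the VMI stays gated until both pieces exist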

The gating could be represented by a condition on the VMI status.
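
For example, the gating could surface as something like the following condition (the type, reason, and message names are hypothetical; the proposal does not fix them):

  status:
    conditions:
    - type: VolumeMigrationPending   # hypothetical condition type
      status: "True"
      reason: WaitingForVolumeMigration
      message: waiting for VolumeMigration vol-mig to be created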

There are two situations to consider: the update of the VM spec can happen either before or after the volume migration object is created.

The first diagram depicts the situation where the update occurs before the volume migration object exists. [diagram: vol-mig-obj-wait]

The second represents the case where the volume migration object is visible before the update. [diagram: vol-mig-obj-exists]

In the second case, the controller doesn't need to wait and can immediately proceed with the volume migration.

This mechanism is GitOps-friendly and allows us to control the volume migration lifecycle thanks to the VolumeMigration CRD.

If users want to abort the migration, they need to delete the volume migration object. This will be reflected in the VMI status.
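
For example, with the CR from above (assuming kubectl can address the CRD by its kind name, as is usual for CRDs):

  kubectl delete volumemigration vol-mig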

/cc @vladikr @mhenriks @awels @acardace another round :)

acardace commented 9 months ago

@alicefr, while I like this hybrid solution, I think there are still issues with it.

First, I don't think aborting a storage migration is that easy. Let's say you patch the VM with the new dest-pvc, create a storageMigration CR, and then delete the storageMigration CR: what should happen now? The VM still has the dest-pvc in the volume spec. I'd expect the storage migration to be retried, which would defeat the point of the extra complexity of a separate CR introduced to decouple the lifetimes of the storageMigration object and the VM object (because in practice they're still coupled: the dest-pvc info is now in both places, so who wins?).
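
To make the sequence concrete (the file names are hypothetical):

  # 1. update the VM spec so the volume points at dest-pvc
  kubectl apply -f vm-with-dest-pvc.yaml
  # 2. create the storage migration CR
  kubectl apply -f storage-migration.yaml
  # 3. "abort" by deleting the CR; the VM spec still references dest-pvc,
  #    so does the migration simply get retried?
  kubectl delete -f storage-migration.yaml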

Second, we already have a way to abort a migration, which is deleting the associated migration object.

To be honest, I don't think aborting this process is something we should worry about; in the happy path, where you're able to migrate your storage successfully, you can migrate back if you want to "abort" it.

I see such a strong coupling between the new CR and the new field in the VolumeSpec that I don't know if it makes sense to add this new CR; effectively, the CR is just a replication of the info you already have in the VolumeSpec. I agree that a single field is more limiting, but you can always use a struct instead of a string for the updateStrategy if you want to leave room for future developments.
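
For illustration, the struct form could look something like this (every field beyond the type is hypothetical):

  volumes:
  - name: rootDisk
    persistentVolumeClaim:
      claimName: src-pvc
      updateStrategy:
        type: migration   # or replacement
        # room for future knobs, e.g. bandwidth limits or timeouts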

PS: those diagrams are really nice, what did you use to draw them?

alicefr commented 9 months ago

Closing in favor of https://github.com/kubevirt/community/pull/260