kubevirt / kubevirt-velero-plugin

Velero plugin that automates backing up and restoring KubeVirt/CDI objects
Apache License 2.0
32 stars · 28 forks

Restore fails after label-based backup due to missing instancetype `ControllerRevision` #258

Closed · e3b0c442 closed this issue 4 months ago

e3b0c442 commented 4 months ago

What happened: When I attempt to restore a backed-up VM, the restore fails with an error similar to the following: `error restoring virtualmachines.kubevirt.io/vms/one: admission webhook "virtualmachine-validator.kubevirt.io" denied the request: Failure to find instancetype: controllerrevisions.apps "node1-o1.large-3ae509a7-5446-4119-945d-1eb2a00d00cb-1" not found`

What you expected to happen: The `ControllerRevision` is backed up with the VM when a label selector is specified, and the restore completes successfully.

How to reproduce it (as minimally and precisely as possible):

  1. Create a VM that uses a `VirtualMachineClusterInstanceType` or `VirtualMachineInstanceType` and has an identifying label.
  2. Back up the VM with Velero, specifying the label, e.g. `velero backup create node1-backup --include-namespaces vms -l name=node1 --snapshot-move-data --wait`
  3. Delete the backed-up VM: `kubectl delete vm node1`
  4. Attempt to restore the VM: `velero restore create --from-backup node1-backup`

Additional context: When a label filter is specified, the `ControllerRevision` resource is not added to the backup, even though it is owned by the `VirtualMachine` resource. Labels applied to the `VirtualMachine` are not propagated to the `ControllerRevision` that KubeVirt creates.
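
The failure mode described above can be illustrated with a small, self-contained Go sketch. It models a label selector as an equality-only `map[string]string` match (a simplification of Kubernetes label selectors) and shows that the `VirtualMachine` matches `name=node1` while its unlabeled `ControllerRevision` does not, even though the revision lists the VM as its owner. The `object` and `ownerRef` types and both helpers are hypothetical and exist only for illustration; they are not Velero or KubeVirt APIs.

```go
package main

import "fmt"

// ownerRef is a hypothetical, trimmed-down stand-in for an owner reference.
type ownerRef struct {
	Kind string
	Name string
}

// object is a hypothetical, trimmed-down stand-in for an object's metadata.
type object struct {
	Kind   string
	Name   string
	Labels map[string]string
	Owners []ownerRef
}

// matchesSelector reports whether every key/value pair in selector is present
// in the object's labels (equality-based matching only).
func matchesSelector(obj object, selector map[string]string) bool {
	for k, v := range selector {
		if obj.Labels[k] != v {
			return false
		}
	}
	return true
}

// ownedBy reports whether obj lists the given kind/name among its owners.
func ownedBy(obj object, kind, name string) bool {
	for _, o := range obj.Owners {
		if o.Kind == kind && o.Name == name {
			return true
		}
	}
	return false
}

func main() {
	selector := map[string]string{"name": "node1"}

	vm := object{Kind: "VirtualMachine", Name: "node1", Labels: map[string]string{"name": "node1"}}
	// The revision carries no copy of the VM's labels, only an owner reference.
	rev := object{
		Kind:   "ControllerRevision",
		Name:   "node1-o1.xlarge-df5bbbdf-888a-43ca-b90f-ac90e35773e3-1",
		Owners: []ownerRef{{Kind: "VirtualMachine", Name: "node1"}},
	}

	fmt.Println("VM selected:", matchesSelector(vm, selector))            // true
	fmt.Println("revision selected:", matchesSelector(rev, selector))     // false
	fmt.Println("revision owned by VM:", ownedBy(rev, "VirtualMachine", "node1")) // true
}
```

Because the revision is linked to the VM only through its owner reference, a label-filtered backup skips it unless something, such as a backup item action in the plugin, adds it to the backup explicitly.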

If the label filter is removed and the entire namespace is backed up, the `ControllerRevision` resource is included in the backup and is subsequently restored successfully.

Environment:

mhenriks commented 4 months ago

Hi @e3b0c442, the example VM is using `VirtualMachineClusterInstancetype`, which is not supported yet [1]. I think support could be added pretty easily; we just have to make sure that the appropriate args are passed to `velero backup create`.

`VirtualMachineInstancetype` is currently supported, though. See [2] and [3].

[1] https://github.com/kubevirt/kubevirt-velero-plugin/blob/93ecaf974d5d3e57dea038a7282b4b9257e7e914/pkg/plugin/vm_backup_item_action.go#L145
[2] https://github.com/kubevirt/kubevirt-velero-plugin/blob/93ecaf974d5d3e57dea038a7282b4b9257e7e914/pkg/plugin/vm_backup_item_action.go#L146-L159
[3] https://github.com/kubevirt/kubevirt-velero-plugin/blob/93ecaf974d5d3e57dea038a7282b4b9257e7e914/tests/vm_backup_test.go#L203-L260
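
To make the namespaced vs. cluster-scoped distinction concrete, here is a hedged Go sketch of which extra objects would have to accompany a VM that references an instancetype, so that they survive a label-filtered backup. The `groupResource`/`resourceIdentifier` types are hypothetical local stand-ins loosely modeled on the group/resource, namespace and name shape of Velero's `ResourceIdentifier`; `instancetypeItems` is a hypothetical helper, and the `revisionName` input is assumed to be read from the VM (for example from `spec.instancetype.revisionName`). This is a sketch, not the plugin's implementation.

```go
package main

import "fmt"

// groupResource and resourceIdentifier are hypothetical local types, loosely
// modeled on the group/resource + namespace + name shape Velero uses.
type groupResource struct {
	Group    string
	Resource string
}

type resourceIdentifier struct {
	groupResource
	Namespace string
	Name      string
}

// instancetypeItems lists the extra objects that would need to accompany a VM
// referencing the instancetype called name. clusterScoped selects between the
// namespaced and the cluster-wide instancetype resource.
func instancetypeItems(vmNamespace, name, revisionName string, clusterScoped bool) []resourceIdentifier {
	it := resourceIdentifier{
		groupResource: groupResource{Group: "instancetype.kubevirt.io", Resource: "virtualmachineinstancetypes"},
		Namespace:     vmNamespace,
		Name:          name,
	}
	if clusterScoped {
		// Cluster-scoped objects have no namespace.
		it.Resource = "virtualmachineclusterinstancetypes"
		it.Namespace = ""
	}
	items := []resourceIdentifier{it}
	if revisionName != "" {
		// ControllerRevisions are namespaced objects stored next to the VM.
		items = append(items, resourceIdentifier{
			groupResource: groupResource{Group: "apps", Resource: "controllerrevisions"},
			Namespace:     vmNamespace,
			Name:          revisionName,
		})
	}
	return items
}

func main() {
	items := instancetypeItems("vms", "o1.xlarge",
		"node1-o1.xlarge-df5bbbdf-888a-43ca-b90f-ac90e35773e3-1", false)
	for _, item := range items {
		fmt.Printf("%s.%s %s/%s\n", item.Resource, item.Group, item.Namespace, item.Name)
	}
}
```

In the cluster-scoped case the instancetype has an empty namespace, so a namespace-filtered backup would presumably also need to admit cluster-scoped resources (Velero exposes `--include-cluster-resources` for this); that this is what "appropriate args" refers to is an assumption. Preferences appear to follow the same pattern, since the namespace-wide backup further down also contains a `node1-rhel.8-...` revision.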

e3b0c442 commented 4 months ago

Sorry, this also occurs when `VirtualMachineInstanceType` is used, so I do believe there is a bug, since the code indicates that it should be backed up. I wouldn't object to adding `VirtualMachineClusterInstanceType` support, though.

Example:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: node1
  namespace: vms
  labels:
    name: node1
spec:
  dataVolumeTemplates:
    - metadata:
        namespace: vms
        name: node1-rootdisk
        labels:
          name: node1
      spec:
        pvc:
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 40Gi
          volumeMode: Block
          storageClassName: ceph-block
        source:
          pvc:
            namespace: vms
            name: rocky-8-amd64
  instancetype:
    name: o1.xlarge
    kind: VirtualMachineInstanceType
  preference:
    name: rhel.8
    kind: VirtualMachinePreference
  runStrategy: Always
  template:
    metadata:
      labels:
        name: node1
    spec:
      domain:
        cpu:
          model: Westmere
        devices:
          disks:
            - name: rootdisk
            - name: cloudinitdisk
          interfaces:
            - name: multus
              macAddress: 02:e3:b0:00:00:01
              bridge: {}
      networks:
        - name: multus
          multus:
            networkName: vms/vlan-176
      evictionStrategy: LiveMigrate
      terminationGracePeriodSeconds: 0
      volumes:
        - dataVolume:
            name: node1-rootdisk
          name: rootdisk
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
          name: cloudinitdisk

The backup command in this case was `velero backup create node1-backup --include-namespaces vms -l name=node1 --snapshot-move-data --wait`.

The output of `velero backup describe` shows that the `ControllerRevision` is not among the backed-up objects:

Name:         node1-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.29.6
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=29

Phase:  Completed

Namespaces:
  Included:  vms
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  name=node1

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          true
Data Mover:                  velero

TTL:  720h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2024-07-10 22:52:17 -0500 CDT
Completed:  2024-07-10 23:06:47 -0500 CDT

Expiration:  2024-08-09 22:52:17 -0500 CDT

Total items to be backed up:  10
Items backed up:              10

Backup Item Operations:
  Operation for persistentvolumeclaims vms/node1-rootdisk:
    Backup Item Action Plugin:  velero.io/csi-pvc-backupper
    Operation ID:               du-e57bbee5-7d85-4da0-b217-716f478d3d1c.2281f741-ce3b-423370d29
    Items to Update:
                           datauploads.velero.io velero/node1-backup-6bsft
    Phase:                 Completed
    Progress:              42949672960 of 42949672960 complete (Bytes)
    Progress description:  Completed
    Created:               2024-07-10 22:52:24 -0500 CDT
    Started:               2024-07-10 22:52:40 -0500 CDT
    Updated:               2024-07-10 23:06:40 -0500 CDT
Resource List:
  apiextensions.k8s.io/v1/CustomResourceDefinition:
    - datavolumes.cdi.kubevirt.io
    - virtualmachineinstances.kubevirt.io
    - virtualmachines.kubevirt.io
  cdi.kubevirt.io/v1beta1/DataVolume:
    - vms/node1-rootdisk
  kubevirt.io/v1/VirtualMachine:
    - vms/node1
  kubevirt.io/v1/VirtualMachineInstance:
    - vms/node1
  v1/Namespace:
    - vms
  v1/PersistentVolume:
    - pvc-2281f741-ce3b-423b-8911-cc0a4728e585
  v1/PersistentVolumeClaim:
    - vms/node1-rootdisk
  v1/Pod:
    - vms/virt-launcher-node1-4s8zc

Backup Volumes:
  Velero-Native Snapshots: <none included>

  CSI Snapshots:
    vms/node1-rootdisk:
      Data Movement:
        Operation ID: du-e57bbee5-7d85-4da0-b217-716f478d3d1c.2281f741-ce3b-423370d29
        Data Mover: velero
        Uploader Type: kopia
        Moved data Size (bytes): 42949672960

  Pod Volume Backups: <none included>

HooksAttempted:  2
HooksFailed:     0

In contrast, if I remove the label selector from the backup (e.g. `velero backup create node1-ns-backup --include-namespaces vms --snapshot-move-data --wait`), the instance types and the controller revisions appear in the backed-up resource list:

Name:         node1-ns-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.29.6
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=29

Phase:  Completed

Namespaces:
  Included:  vms
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero

TTL:  720h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2024-07-10 23:09:35 -0500 CDT
Completed:  2024-07-10 23:09:47 -0500 CDT

Expiration:  2024-08-09 23:09:35 -0500 CDT

Total items to be backed up:  100
Items backed up:              100

Backup Item Operations:
  Operation for volumesnapshots.snapshot.storage.k8s.io vms/velero-node1-rootdisk-xpw2s:
    Backup Item Action Plugin:  velero.io/csi-volumesnapshot-backupper
    Operation ID:               vms/velero-node1-rootdisk-xpw2s/2024-07-11T04:09:41Z
    Items to Update:
              volumesnapshots.snapshot.storage.k8s.io vms/velero-node1-rootdisk-xpw2s
              volumesnapshotcontents.snapshot.storage.k8s.io /snapcontent-4d13b2e2-9e92-4fd0-a52f-26be6526cb6f
    Phase:    Completed
    Created:  2024-07-10 23:09:41 -0500 CDT
    Started:  2024-07-10 23:09:41 -0500 CDT
    Updated:  2024-07-10 23:09:42 -0500 CDT
Resource List:
  apiextensions.k8s.io/v1/CustomResourceDefinition:
    - datavolumes.cdi.kubevirt.io
    - network-attachment-definitions.k8s.cni.cncf.io
    - virtualmachineinstances.kubevirt.io
    - virtualmachineinstancetypes.instancetype.kubevirt.io
    - virtualmachinepreferences.instancetype.kubevirt.io
    - virtualmachines.kubevirt.io
  apps/v1/ControllerRevision:
    - vms/node1-o1.xlarge-df5bbbdf-888a-43ca-b90f-ac90e35773e3-1
    - vms/node1-rhel.8-1806db40-2de8-4f47-adf3-b93dfe22c471-1
    - vms/revision-start-vm-286e751f-61c5-4ce5-97f2-617521b23cbc-2
    - vms/revision-start-vm-46c5d532-7764-4716-824c-9105035e9833-1
  cdi.kubevirt.io/v1beta1/DataVolume:
    - vms/node1-rootdisk
  instancetype.kubevirt.io/v1beta1/VirtualMachineInstancetype:
    - vms/cx1.2xlarge
    - vms/cx1.4xlarge
    - vms/cx1.8xlarge
    - vms/cx1.large
    - vms/cx1.medium
    - vms/cx1.xlarge
    - vms/gn1.2xlarge
    - vms/gn1.4xlarge
    - vms/gn1.8xlarge
    - vms/gn1.xlarge
    - vms/m1.2xlarge
    - vms/m1.4xlarge
    - vms/m1.8xlarge
    - vms/m1.large
    - vms/m1.xlarge
    - vms/n1.2xlarge
    - vms/n1.4xlarge
    - vms/n1.8xlarge
    - vms/n1.large
    - vms/n1.medium
    - vms/n1.xlarge
    - vms/o1.2xlarge
    - vms/o1.4xlarge
    - vms/o1.8xlarge
    - vms/o1.large
    - vms/o1.medium
    - vms/o1.micro
    - vms/o1.nano
    - vms/o1.small
    - vms/o1.xlarge
    - vms/u1.2xlarge
    - vms/u1.4xlarge
    - vms/u1.8xlarge
    - vms/u1.large
    - vms/u1.medium
    - vms/u1.micro
    - vms/u1.nano
    - vms/u1.small
    - vms/u1.xlarge
  instancetype.kubevirt.io/v1beta1/VirtualMachinePreference:
    - vms/alpine
    - vms/centos.7
    - vms/centos.7.desktop
    - vms/centos.stream8
    - vms/centos.stream8.desktop
    - vms/centos.stream8.dpdk
    - vms/centos.stream9
    - vms/centos.stream9.desktop
    - vms/centos.stream9.dpdk
    - vms/cirros
    - vms/fedora
    - vms/rhel.7
    - vms/rhel.7.desktop
    - vms/rhel.8
    - vms/rhel.8.desktop
    - vms/rhel.8.dpdk
    - vms/rhel.9
    - vms/rhel.9.desktop
    - vms/rhel.9.dpdk
    - vms/ubuntu
    - vms/windows.10
    - vms/windows.10.virtio
    - vms/windows.11
    - vms/windows.11.virtio
    - vms/windows.2k12
    - vms/windows.2k12.virtio
    - vms/windows.2k16
    - vms/windows.2k16.virtio
    - vms/windows.2k19
    - vms/windows.2k19.virtio
    - vms/windows.2k22
    - vms/windows.2k22.virtio
  k8s.cni.cncf.io/v1/NetworkAttachmentDefinition:
    - vms/vlan-176
  kubevirt.io/v1/VirtualMachine:
    - vms/node1
  kubevirt.io/v1/VirtualMachineInstance:
    - vms/node1
  policy/v1/PodDisruptionBudget:
    - vms/kubevirt-disruption-budget-2mbjn
    - vms/kubevirt-disruption-budget-76tg4
  snapshot.storage.k8s.io/v1/VolumeSnapshot:
    - vms/velero-node1-rootdisk-xpw2s
  snapshot.storage.k8s.io/v1/VolumeSnapshotClass:
    - velero-ceph-block
  snapshot.storage.k8s.io/v1/VolumeSnapshotContent:
    - snapcontent-4d13b2e2-9e92-4fd0-a52f-26be6526cb6f
  v1/ConfigMap:
    - vms/istio-ca-root-cert
    - vms/kube-root-ca.crt
  v1/Event:
    - vms/velero-node1-rootdisk-zgsgb.17e10c316ee80f74
    - vms/velero-node1-rootdisk-zgsgb.17e10c31fbae4ba0
    - vms/velero-node1-rootdisk-zgsgb.17e10c31fbaf2ce1
  v1/Namespace:
    - vms
  v1/PersistentVolume:
    - pvc-2281f741-ce3b-423b-8911-cc0a4728e585
  v1/PersistentVolumeClaim:
    - vms/node1-rootdisk
  v1/Pod:
    - vms/virt-launcher-node1-4s8zc
  v1/ServiceAccount:
    - vms/default

Backup Volumes:
  Velero-Native Snapshots: <none included>

  CSI Snapshots:
    vms/node1-rootdisk:
      Snapshot:
        Operation ID: vms/velero-node1-rootdisk-xpw2s/2024-07-11T04:09:41Z
        Data Mover: velero
        Uploader Type: kopia
        Moved data Size (bytes): 42949672960

  Pod Volume Backups: <none included>

HooksAttempted:  2
HooksFailed:     0
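
Comparing the two `Resource List` sections by kind makes the gap explicit. The Go sketch below performs a set difference over the kinds copied from the two outputs above; `missingKinds` is a hypothetical helper written for this comparison, not part of Velero.

```go
package main

import (
	"fmt"
	"sort"
)

// missingKinds returns, sorted, the kinds present in full but absent from filtered.
func missingKinds(full, filtered []string) []string {
	seen := make(map[string]bool, len(filtered))
	for _, k := range filtered {
		seen[k] = true
	}
	var missing []string
	for _, k := range full {
		if !seen[k] {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Kinds from the Resource List of node1-backup (label selector name=node1).
	labelBackup := []string{
		"apiextensions.k8s.io/v1/CustomResourceDefinition", "cdi.kubevirt.io/v1beta1/DataVolume",
		"kubevirt.io/v1/VirtualMachine", "kubevirt.io/v1/VirtualMachineInstance",
		"v1/Namespace", "v1/PersistentVolume", "v1/PersistentVolumeClaim", "v1/Pod",
	}
	// Kinds from the Resource List of node1-ns-backup (no label selector).
	nsBackup := []string{
		"apiextensions.k8s.io/v1/CustomResourceDefinition", "apps/v1/ControllerRevision",
		"cdi.kubevirt.io/v1beta1/DataVolume",
		"instancetype.kubevirt.io/v1beta1/VirtualMachineInstancetype",
		"instancetype.kubevirt.io/v1beta1/VirtualMachinePreference",
		"k8s.cni.cncf.io/v1/NetworkAttachmentDefinition",
		"kubevirt.io/v1/VirtualMachine", "kubevirt.io/v1/VirtualMachineInstance",
		"policy/v1/PodDisruptionBudget", "snapshot.storage.k8s.io/v1/VolumeSnapshot",
		"snapshot.storage.k8s.io/v1/VolumeSnapshotClass", "snapshot.storage.k8s.io/v1/VolumeSnapshotContent",
		"v1/ConfigMap", "v1/Event", "v1/Namespace", "v1/PersistentVolume",
		"v1/PersistentVolumeClaim", "v1/Pod", "v1/ServiceAccount",
	}
	for _, k := range missingKinds(nsBackup, labelBackup) {
		fmt.Println(k)
	}
}
```

The printed list includes `apps/v1/ControllerRevision` as well as the instancetype and preference kinds, none of which carry the `name=node1` label.
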

mhenriks commented 4 months ago

The plugin expects `instancetype.kind` to be `virtualmachineinstancetype`, not `VirtualMachineInstanceType`. Not sure why.

https://github.com/kubevirt/kubevirt-velero-plugin/blob/93ecaf974d5d3e57dea038a7282b4b9257e7e914/pkg/plugin/vm_backup_item_action.go#L146
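
This observation suggests the plugin matches `instancetype.kind` against the exact lowercase string, so the capitalization in the example VM (`VirtualMachineInstanceType`) falls through. KubeVirt itself evidently accepted that spelling, since the VM was created and its `node1-o1.xlarge-...` revision exists. Below is a hedged Go sketch of a case-insensitive lookup; `instancetypeScope` and the `scope` type are hypothetical, and accepting plural forms and treating an empty kind as cluster-wide (KubeVirt's documented default is `VirtualMachineClusterInstancetype`) are assumptions to check against KubeVirt's own kind resolution.

```go
package main

import (
	"fmt"
	"strings"
)

// scope describes which instancetype resource a VM's instancetype.kind refers to.
type scope int

const (
	unsupported scope = iota
	namespaced
	clusterWide
)

func (s scope) String() string {
	switch s {
	case namespaced:
		return "namespaced"
	case clusterWide:
		return "cluster-wide"
	default:
		return "unsupported"
	}
}

// instancetypeScope maps instancetype.kind to a scope, ignoring case.
// The plural forms and the empty-kind default are assumptions.
func instancetypeScope(kind string) scope {
	switch strings.ToLower(kind) {
	case "virtualmachineinstancetype", "virtualmachineinstancetypes":
		return namespaced
	case "", "virtualmachineclusterinstancetype", "virtualmachineclusterinstancetypes":
		return clusterWide
	default:
		return unsupported
	}
}

func main() {
	for _, k := range []string{"virtualmachineinstancetype", "VirtualMachineInstancetype", "VirtualMachineInstanceType", ""} {
		fmt.Printf("%q -> %s\n", k, instancetypeScope(k))
	}
}
```

With a helper like this, `VirtualMachineInstanceType`, `VirtualMachineInstancetype` and `virtualmachineinstancetype` all resolve to the namespaced instancetype; `strings.EqualFold` against each accepted form would be an equivalent alternative to lowercasing once.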