kubevirt / kubevirt-velero-plugin

Plugin to Velero which automates backing up and restoring KubeVirt/CDI objects
Apache License 2.0

Backup data upload fails with `data path backup failed: Failed to run kopia backup: unable to get local block device entry: resolveSymlink: lstat /var/lib/kubelet: no such file or directory` #257

Closed by e3b0c442 2 months ago

e3b0c442 commented 3 months ago

What happened:

When attempting to back up a VM with the CSI snapshot data mover, the data upload fails with the error `data path backup failed: Failed to run kopia backup: unable to get local block device entry: resolveSymlink: lstat /var/lib/kubelet: no such file or directory`.

What you expected to happen:

The backup completes successfully.

How to reproduce it (as minimally and precisely as possible):

  1. Create a VM
  2. Install Velero with the EnableCSI feature flag and with the kubevirt-velero-plugin and velero-plugin-for-aws plugins enabled. (In my case, all are the latest versions: Velero 1.14.0, kubevirt-velero-plugin 0.7.0, velero-plugin-for-aws 1.10.0. I also tried Velero 1.13 with the appropriate plugin versions, with no change in behavior.)

Additional context: While troubleshooting this issue, I created a pod on the node hosting the VM with mounts identical to those of the Velero node-agent pod, including the /host_pods volumeMount, which mounts the host's /var/lib/kubelet/pods directory. Inspecting the volume's contents, I found a symlink pointing back into /var/lib/kubelet. I think this is the root of the failure, because the path /var/lib/kubelet doesn't exist inside the node-agent pod.

```
# ls -alR /host_pods/POD_UUID
# ...
./volumeDevices/kubernetes.io~csi:
total 0
drwxr-x--- 2 root root  54 Jul  1 20:48 .
drwxr-x--- 3 root root  31 Jul  1 20:48 ..
lrwxrwxrwx 1 root root 142 Jul  1 20:48 pvc-eecd1d33-cefd-42bf-a99b-c45f6cae5759 -> /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/publish/pvc-eecd1d33-cefd-42bf-a99b-c45f6cae5759/88e93be8-befb-4a7f-a670-1a87b081aedb
# ...
```
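
The failure mode suspected here can be reproduced in isolation. The following is a minimal sketch, not Velero's actual code: it creates a symlink whose absolute target does not exist in the current filesystem view, which is roughly what the node-agent container would see for a link pointing into an unmounted /var/lib/kubelet. The directory and file names are hypothetical stand-ins.

```python
import os
import tempfile


def describe_link(link_path):
    """Report whether a symlink exists and whether its target resolves."""
    return {
        # lexists() inspects the link itself and does not follow it.
        "link_exists": os.path.lexists(link_path),
        "target": os.readlink(link_path),
        # exists() follows the link, so it is False for a dangling target.
        "target_resolves": os.path.exists(link_path),
    }


with tempfile.TemporaryDirectory() as pod_dir:
    # Hypothetical stand-in for the kubelet publish path shown in the listing;
    # the "not-mounted" directory is never created, so the target is missing.
    missing_target = os.path.join(pod_dir, "not-mounted", "publish", "pvc-example")
    link = os.path.join(pod_dir, "pvc-example")
    os.symlink(missing_target, link)

    result = describe_link(link)
    print(result["link_exists"], result["target_resolves"])
```

The link itself is visible, but anything that follows it fails because the absolute target path is not present, which matches the "no such file or directory" seen in the error.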

I am using the minimalist Talos Linux, though I don't think that should have any special bearing here.

I can provide Velero configs as needed but there is very little modified from default.

e3b0c442 commented 3 months ago

Adding an extra volume mount that exposes the host's /var/lib/kubelet at the same path inside the pod allowed the backup to succeed, confirming my hypothesis. This doesn't seem like it should be normal practice, though.
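
For anyone wanting to try the same workaround, here is a rough sketch of what such a patch might look like, built as a Python dict and printed as JSON. The container name `node-agent` and the volume name `host-kubelet` are assumptions for illustration, not values taken from this thread; check them against your own DaemonSet before applying anything.

```python
import json


def kubelet_mount_patch(host_path="/var/lib/kubelet",
                        container="node-agent",   # assumed container name
                        volume="host-kubelet"):   # hypothetical volume name
    """Build a patch fragment that mounts host_path at the same path in the pod."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            # Same path inside the container as on the host,
                            # so absolute symlink targets resolve unchanged.
                            "volumeMounts": [
                                {"name": volume, "mountPath": host_path}
                            ],
                        }
                    ],
                    "volumes": [
                        {"name": volume, "hostPath": {"path": host_path}}
                    ],
                }
            }
        }
    }


patch = kubelet_mount_patch()
print(json.dumps(patch, indent=2))
```

The key point is only that `mountPath` and `hostPath.path` are identical. Exposing all of /var/lib/kubelet is broad, so a narrower path may be preferable where it is sufficient.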

e3b0c442 commented 2 months ago

Closing this issue, as it is not specific to KubeVirt or kubevirt-velero-plugin; it affects any pod with a Block mode PVC attached via the volumeDevices spec.
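
To check which workloads fall into that category, a small helper along these lines could be run over pod manifests. This is an illustrative sketch: the function name and the example pod (including the `raw-disk` claim) are hypothetical, and it only looks at `containers[].volumeDevices` and PVC-backed volumes.

```python
def block_device_claims(pod):
    """Return the PVC claim names that a pod attaches via volumeDevices."""
    spec = pod.get("spec", {})
    # Map each pod volume name to its PVC claim name, where it has one.
    claims = {
        v["name"]: v["persistentVolumeClaim"]["claimName"]
        for v in spec.get("volumes", [])
        if "persistentVolumeClaim" in v
    }
    found = []
    for container in spec.get("containers", []):
        for device in container.get("volumeDevices", []):
            claim = claims.get(device["name"])
            if claim is not None and claim not in found:
                found.append(claim)
    return found


# Hypothetical pod manifest with one raw block device and one filesystem mount.
example_pod = {
    "spec": {
        "containers": [
            {
                "name": "app",
                "volumeDevices": [{"name": "disk", "devicePath": "/dev/xvda"}],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }
        ],
        "volumes": [
            {"name": "disk", "persistentVolumeClaim": {"claimName": "raw-disk"}},
            {"name": "data", "persistentVolumeClaim": {"claimName": "fs-data"}},
        ],
    }
}

affected = block_device_claims(example_pod)
print(affected)
```

Only the claim attached through `volumeDevices` is reported; the filesystem-mounted claim is ignored.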