Open Elias-elastisys opened 5 months ago
This is by design. The CSI snapshot data mover restore works like the filesystem restore, or PodVolumeRestore. The Velero node-agent has to wait until the volume has settled on a node, and only then does it write the data into the pod's volume-mount directory.
@Elias-elastisys This is by the design of Velero and Kubernetes for volumes with WaitForFirstConsumer as the binding mode.
Also, Velero snapshot data movement has not been tested for backing up volumes that are not attached to a pod, so that case is not officially supported. Could you describe your use case in more detail? Why are there many volumes to back up that are not attached to pods? Your use case would help us prioritize our work and include this support in a future plan.
This is by design.
Alright thanks, unfortunate for my test case but it makes sense.
Also, Velero snapshot data movement has not been tested for backing up volumes that are not attached to a pod, so that case is not officially supported. Could you describe your use case in more detail? Why are there many volumes to back up that are not attached to pods?
The goal is to truly back up all data in a cluster. If you run CronJobs or Jobs with PVs, a backup may run while the Job is not running. In that case the regular Velero backups will not capture that data.
As I said, CSI data movement seems able to back up data without Pods successfully, even with WaitForFirstConsumer, since the PV already exists. The only issue is that the restore requires manual intervention, which I guess is not the biggest problem, since it succeeds if you apply the Pod manually.
But it would of course be greatly appreciated if this edge case could be handled as well.
We had a discussion about this issue, here is the conclusion:
Adding some test results here. If the unmounted volume is backed up by the CSI plugin, then after restoration the PVC hangs in the Pending state until some pod is manually created to mount it.
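To spot PVCs stuck in this state, something like the following could help. It is only a minimal sketch, not part of Velero: the function name `unmounted_pending_pvcs` is hypothetical, and it assumes its inputs are the "items" lists from `kubectl get pvc -A -o json` and `kubectl get pods -A -o json`.

```python
def unmounted_pending_pvcs(pvcs, pods):
    """Return (namespace, name) for each Pending PVC that no Pod references.

    Both arguments are lists of Kubernetes objects as plain dicts, for
    example the "items" arrays of `kubectl get ... -A -o json` output.
    """
    # Collect every (namespace, claimName) pair that some Pod mounts.
    mounted = set()
    for pod in pods:
        namespace = pod["metadata"]["namespace"]
        for volume in pod.get("spec", {}).get("volumes", []):
            claim = volume.get("persistentVolumeClaim")
            if claim:
                mounted.add((namespace, claim["claimName"]))

    # Keep only the PVCs that are Pending and have no consuming Pod.
    stuck = []
    for pvc in pvcs:
        key = (pvc["metadata"]["namespace"], pvc["metadata"]["name"])
        phase = pvc.get("status", {}).get("phase")
        if phase == "Pending" and key not in mounted:
            stuck.append(key)
    return stuck
```

With a WaitForFirstConsumer StorageClass, the PVCs this returns are the ones that will stay Pending until a pod mounts them.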
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.
What steps did you take and what happened:
I've been trying out Volume Snapshot Data Movement for backups and restores to S3, specifically to be able to back up PVs that currently have no associated running Pod, as this is not possible with regular Velero backups.
When restoring a successful backup in a cluster whose StorageClass uses the binding mode WaitForFirstConsumer, the restore times out, because Velero waits for a PV to be provisioned before it creates the helper pod that facilitates the restore. But the storage provider waits for a Pod to be attached to the PVC before it provisions storage, so the restore essentially deadlocks.
If I manually create a Pod that attaches to the restored PVC the restore eventually succeeds.
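The manual workaround can be sketched roughly as below. This is only an illustration under assumptions: the function name `consumer_pod_manifest`, the Pod and container names, the `busybox:1.36` image, and the example PVC and namespace names are all hypothetical placeholders. It builds a minimal Pod manifest that mounts the restored PVC and prints it as JSON, which `kubectl apply -f -` accepts.

```python
import json


def consumer_pod_manifest(pvc_name, namespace, image="busybox:1.36"):
    """Build a minimal Pod that mounts `pvc_name`, so that a
    WaitForFirstConsumer StorageClass has a consumer to schedule against.

    All names and the image are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"{pvc_name}-consumer", "namespace": namespace},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "sleeper",
                    "image": image,
                    # Keep the Pod alive long enough for the restore to run.
                    "command": ["sleep", "3600"],
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }
            ],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": pvc_name}}
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder PVC and namespace names; replace with the restored ones.
    manifest = consumer_pod_manifest("my-restored-pvc", "my-namespace")
    print(json.dumps(manifest, indent=2))
```

Saved under a file name of your choice, the output can be piped straight into `kubectl apply -f -`, and the Pod can be deleted once the restore has completed.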
What did you expect to happen: To be able to restore a backup successfully, even if the backup contains PV/PVCs without any attached Pod.
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use
velero debug --backup <backupname> --restore <restorename>
to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help
Anything else you would like to add:
I found this old "prioritized" issue: https://github.com/vmware-tanzu/velero/issues/2971 with the same problem, but for Restic, whereas volume snapshot data movement uses Kopia. It doesn't seem to have been updated or to have made any progress in over 2 years. Any updates?
Environment:
Velero version (velero version): v1.13.1
Velero features (velero client config get features): None
Kubernetes version (kubectl version): v1.28.6
volumeBindingMode: WaitForFirstConsumer
OS (/etc/os-release):
Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.