vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0
8.5k stars 1.37k forks

Doing a full cluster restore fails to restore PVC contents #7487

Closed Protryon closed 4 weeks ago

Protryon commented 5 months ago

What steps did you take and what happened:

  1. I had total catastrophic data loss
  2. I deleted all PVCs manually and restarted pods so that Velero could recreate the PVCs.
  3. I did a full spectrum Velero restore: velero restore create --from-backup velero-weekly-20240225020029
  4. Created PVCs from Velero did not contain the backed up data

What did you expect to happen: I expected the PVCs to contain the restored data from backup

What I suspect is the issue + workaround: I think that because the statefulsets still existed, the pods already existed and matched the pods in the backup, so the volume restore was never scheduled for them. As a workaround, I deleted the statefulsets and ran a restore limited to pods, PVs, and PVCs on each namespace before recreating the statefulsets:

velero restore create --from-backup velero-weekly-20240225020029 --restore-volumes=true --include-resources pods,persistentvolumeclaims,persistentvolumes --include-namespaces=xyz

This worked as expected.
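For anyone repeating this across several namespaces, the workaround above can be sketched as a small script. This is only a sketch under assumptions: the function names (`restore_cmd`, `run_or_print`, `restore_namespaces`) and the `RUN` switch are made up for illustration, the namespaces are whatever you pass in, and it prints the commands by default instead of running them. It only uses the `velero restore create` flags from the command above plus `kubectl delete statefulset --all`.

```shell
#!/bin/sh
# Sketch only: print (or, with RUN=1, execute) the workaround for several namespaces.
BACKUP="velero-weekly-20240225020029"   # backup name taken from this issue

# Build the volume-focused restore command for one namespace.
restore_cmd() {
  printf 'velero restore create --from-backup %s --restore-volumes=true --include-resources pods,persistentvolumeclaims,persistentvolumes --include-namespaces=%s' "$BACKUP" "$1"
}

# Print the command by default; run it only when RUN=1 is set (hypothetical switch).
run_or_print() {
  if [ "${RUN:-0}" = "1" ]; then
    sh -c "$1"
  else
    printf '%s\n' "$1"
  fi
}

# For each namespace: delete the StatefulSets, then restore pods, PVCs, and PVs.
restore_namespaces() {
  for ns in "$@"; do
    run_or_print "kubectl delete statefulset --all -n $ns"
    run_or_print "$(restore_cmd "$ns")"
  done
}

restore_namespaces "$@"
```

Saved under a name of your choosing, it would be called with the namespaces as arguments, e.g. `sh restore-volumes.sh xyz`. The sketch does not wait for the old pods to terminate, does not delete any PVCs, and does not recreate the statefulsets afterwards, so those steps still need to be handled by hand.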

Environment:

reasonerjt commented 5 months ago

@Protryon Because you are running a full restore, velero will restore the PVC and bind it to the pod that it restores. You'll need to apply the workaround.

Additionally, we are considering implementing "data-only" restoration. Could you take a look at this design and let us know whether it would help in your use case?

Protryon commented 5 months ago

@reasonerjt It sounds like it will, but a much easier solution to this specific problem is probably to just automatically delete the pod at the right time to allow the mutating webhook to insert the init container. Likewise, that could be timed with deleting an existing PVC, probably behind some other flag, to allow a full refresh.

reasonerjt commented 5 months ago

> .... to just automatically delete the pod at the right time to allow the mutating webhook to insert the init container.

I don't quite understand how it has anything to do with mutating webhook and initContainer.

When you say "automatically delete the pod", are you suggesting that when velero restores a PVC, if a pod is bound to that PVC the pod should be deleted? It may solve some problems, but it may cause unexpected results when the pod is managed by other resources like replicaset, deployment, or CRs. IMO in general we should avoid making velero maneuver the user's workload during restore.
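To get a feel for which pods that caveat applies to, one option is to list each pod with the kind of resource that owns it and the PVCs it mounts. The following is a rough sketch, not something velero does: `pod_owner_cmd` is a hypothetical helper name, the namespace is a placeholder, and it only builds a `kubectl get pods` command with `custom-columns` so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch only: build a kubectl command that lists each pod with its owner kind(s)
# and the PVC names it mounts. The namespace argument is a placeholder.
pod_owner_cmd() {
  printf "kubectl get pods -n %s -o custom-columns='POD:.metadata.name,OWNER:.metadata.ownerReferences[*].kind,CLAIMS:.spec.volumes[*].persistentVolumeClaim.claimName'\n" "$1"
}

# Print the command for review; pipe the output to `sh` to run it against a cluster.
if [ "$#" -gt 0 ]; then
  pod_owner_cmd "$1"
fi
```

A pod whose OWNER column shows StatefulSet or ReplicaSet would be recreated by its controller if deleted, which is the kind of side effect described above.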

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

github-actions[bot] commented 4 weeks ago

This issue was closed because it has been stalled for 14 days with no activity.