vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

bug? how to restic restore only one pvc #6690

Open Heiko-san opened 1 year ago

Heiko-san commented 1 year ago

What steps did you take and what happened:

We make a full velero backup of a specific namespace and also restic backup the pvcs/pvs in that namespace (no snapshot backups).

Then we restore the latest backup for a subset of resources selected by label.

velero restore create --from-schedule my-schedule --selector "mylabel=myvalue"

The set consists of a pvc/pv and an sts with 1 pod. If we delete both the sts and the pvc beforehand, the restore succeeds, including the restic restore of the data within the pvc.

However, if we just scale the sts down to 0 and delete only the pvc, then both of the following approaches fail:

velero restore create --from-schedule my-schedule --selector "mylabel=myvalue"
velero restore create --from-schedule my-schedule --selector "mylabel=myvalue" --include-resources persistentvolumeclaims,persistentvolumes

The first never completes, with an error that the sts's pod can't be queried. The second reports success, but the restic restore is skipped and the pvc is restored empty.

We use velero v1.11.1

What did you expect to happen:

At least with the latter approach, I would have expected the data to be restored.

velero restore create --from-schedule my-schedule --selector "mylabel=myvalue" --include-resources persistentvolumeclaims,persistentvolumes

Is this a bug? Or how would one do a restic restore of only the pvc/pv without touching the related resources (sts in our case)?

Environment:

Client:
    Version: v1.11.0
    Git commit: 0da2baa908c88ec3c45da15001f6a4b0bda64ae2
Server:
    Version: v1.11.1
features: <NOT SET>
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"archive", BuildDate:"2023-07-20T07:37:53Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.11", GitCommit:"8cfcba0b15c343a8dc48567a74c29ec4844e0b9e", GitTreeState:"clean", BuildDate:"2023-06-14T09:49:38Z", GoVersion:"go1.19.10", Compiler:"gc", Platform:"linux/amd64"}

The Schedule

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: my-schedule
  namespace: velero
spec:
  schedule: 0 3 * * *
  template:
    csiSnapshotTimeout: 0s
    defaultVolumesToFsBackup: true
    hooks: {}
    includedNamespaces:
    - my-namespace
    itemOperationTimeout: 0s
    metadata: {}
    snapshotVolumes: false
  useOwnerReferencesInBackup: true

sseago commented 1 year ago

Restic restore requires a pod to be restored along with the PVC: the pod mounts the PVC, and the node agent pod on the same node as the restored pod then has access to the volume. You need to restore the pod together with the PVC in order to use restic restore. If the pod belongs to a deployment, the restore may also fail when the deployment is scaled down to 0, since the deployment may kill the pod as soon as it's restored.
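
For the reporter's setup, that suggests restoring the sts (and hence its pod) together with the pvc/pv rather than restoring the pvc alone. A minimal sketch, assuming the sts, pod, and pvc all carry `mylabel=myvalue` (resource names here are illustrative, not from the issue):

```shell
# Delete the existing sts and pvc so the restore can recreate them together
# (the names "my-sts" and "my-pvc" are placeholders):
kubectl -n my-namespace delete sts my-sts
kubectl -n my-namespace delete pvc my-pvc

# Restore pod and volume in one go; with the pod mounting the PVC,
# the node agent can perform the restic data restore:
velero restore create --from-schedule my-schedule --selector "mylabel=myvalue"
```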

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 14 days with no activity.

dfaltum commented 4 months ago

I saw a workaround here, but it is not very elegant: https://www.ergton.com/velero-pvc-only-restore.html It can be quite messy if you need to restore the pod and its configmaps, secrets, etc. just to restore the volume, and then, once the restore is done, delete those objects and reinstall the helm chart.
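
A rough sketch of that kind of sequence, hedged heavily: the resource names, label, and helm release below are all illustrative placeholders, not taken from the linked post, and the exact set of `--include-resources` depends on what the pod needs to start:

```shell
# 1. Restore the volume together with the objects the pod needs to mount it:
velero restore create --from-schedule my-schedule \
  --selector "mylabel=myvalue" \
  --include-resources persistentvolumeclaims,persistentvolumes,pods,configmaps,secrets

# 2. After the restic data restore finishes, remove the restored helper
#    objects and let the chart recreate the managed workload:
kubectl -n my-namespace delete pod my-pod
helm upgrade --install my-release my-chart -n my-namespace
```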

@Lyndon-Li will there be a solution to this problem?

Lyndon-Li commented 4 months ago

For volume-only restore, there are ongoing discussions and a design PR #7481. For now, here are the suggestions:

  1. If your volumes support CSI snapshots, use CSI snapshot data mover backup instead; it supports volume-only restore. However, it currently only supports Immediate volumes, so if you have to use WaitForFirstConsumer volumes, you need to wait for a further fix (see issue #8044).
  2. If you have to use fs-backup, keep an eye on PR #7481. Regarding the workaround you mentioned, I don't think it is a universal solution, but if it works in your case, you can use it for now.

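
To tell which case applies for suggestion 1, you can check the binding mode on the StorageClass backing your PVCs (the StorageClass name below is a placeholder):

```shell
# Prints "Immediate" or "WaitForFirstConsumer":
kubectl get storageclass my-storageclass -o jsonpath='{.volumeBindingMode}'
```

With Immediate binding, CSI snapshot data mover volume-only restore should work today; with WaitForFirstConsumer, the fix is tracked in issue #8044.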
github-actions[bot] commented 2 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

blackpiglet commented 2 months ago

unstale

github-actions[bot] commented 3 days ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

blackpiglet commented 1 day ago

unstale