vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Support only restore volume data for filesystem backup #7345

Open blackpiglet opened 10 months ago

blackpiglet commented 10 months ago

Describe the problem/challenge you have

Sometimes it is desirable to restore only the data into a volume without modifying the k8s resources. Several issues have asked for similar functionality, for example #504 and #2598. At a minimum, Velero should provide a workaround for this scenario.

Describe the solution you'd like

If a PodVolumeRestore is created manually from the PodVolumeBackup information, it is not processed by the Velero PVR controller. This is because the PVR controller checks whether the pod referenced by the PVR has a restore-helper InitContainer; if it does not, the PVR is ignored.

The suggestion is to add a new field to the PVR that makes the PVR controller skip the restore-helper InitContainer check. The proposed field name is AllowOnlyRestoreData; its type is bool and it defaults to false. Here is an example manifest; a simplified sketch of the corresponding Go API change follows it:

apiVersion: velero.io/v1
kind: PodVolumeRestore
metadata:
  name: test-01-xxxxxxxxxxxxx-xxxxxx
  namespace: velero
spec:
  backupStorageLocation: default
  pod:
    kind: Pod
    name: hello-app-xxxxxxxxx-xxxxx
    namespace: restore
    uid: xxxxxxxx
  repoIdentifier: gs:jxun:/restic/test
  snapshotID: xxxxxxxx
  sourceNamespace: test
  uploaderSettings:
    WriteSparseFiles: "false"
  uploaderType: kopia
  volume: sdk-volume
  allowOnlyRestoreData: true
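
For illustration, a minimal Go sketch of what the corresponding API change could look like. The struct below is a heavily simplified stand-in for Velero's real PodVolumeRestoreSpec type (only a few fields are shown), and the proposed field mirrors the manifest above rather than any merged code:

package main

import "fmt"

// PodVolumeRestoreSpec is a simplified stand-in for the real Velero type;
// only the fields relevant to this proposal are shown.
type PodVolumeRestoreSpec struct {
	BackupStorageLocation string `json:"backupStorageLocation"`
	SnapshotID            string `json:"snapshotID"`
	UploaderType          string `json:"uploaderType"`
	Volume                string `json:"volume"`

	// AllowOnlyRestoreData is the proposed flag: when true, the PVR
	// controller would skip the restore-helper InitContainer check.
	AllowOnlyRestoreData bool `json:"allowOnlyRestoreData,omitempty"`
}

func main() {
	spec := PodVolumeRestoreSpec{
		BackupStorageLocation: "default",
		SnapshotID:            "xxxxxxxx",
		UploaderType:          "kopia",
		Volume:                "sdk-volume",
		AllowOnlyRestoreData:  true,
	}
	fmt.Printf("%+v\n", spec)
}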

Anything else you would like to add:

Environment:

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

blackpiglet commented 9 months ago

As a PoC, I have already verified that, after bypassing the InitContainer check code, a manually created PVR worked as expected.

stefanhenseler commented 9 months ago

@blackpiglet, this is amazing! We need this functionality as well. Do you have a link to your PoC branch? We would like to try it out and provide feedback as well. Is this feature being worked on for 1.14?

reasonerjt commented 9 months ago

I think we need a design that elaborates the e2e flow and explains why this requires a change in the CRD.

blackpiglet commented 9 months ago

@stefanhenseler https://github.com/blackpiglet/velero/tree/7345_poc I created a branch in my forked repository. This is still a proof of concept for this issue. The change bypasses checking the restore-helper InitContainer on the pod referenced by the PodVolumeRestore. The user needs to create the PodVolumeRestore with the correct content, and the target pod and PVC must exist before creating the PVR.

https://github.com/blackpiglet/velero/blob/4799c3086da07e4773fc99d4db06afe6a7092714/pkg/controller/pod_volume_restore_controller.go#L197-L202
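
For context, here is a minimal sketch of the kind of gate the PoC bypasses and how the proposed allowOnlyRestoreData flag would short-circuit it. The helper names and the "restore-wait" container name are illustrative, not the exact upstream code linked above:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// restoreHelperName stands in for the name of the init container that
// Velero's restore helper injects into restored pods (illustrative value).
const restoreHelperName = "restore-wait"

// hasRestoreHelperInitContainer mirrors the linked check: the PVR is only
// processed if the target pod carries the restore-helper init container.
func hasRestoreHelperInitContainer(pod *corev1.Pod) bool {
	for _, c := range pod.Spec.InitContainers {
		if c.Name == restoreHelperName {
			return true
		}
	}
	return false
}

// shouldProcessPVR sketches the proposed behavior: a PVR with
// allowOnlyRestoreData set would skip the init-container check entirely.
func shouldProcessPVR(pod *corev1.Pod, allowOnlyRestoreData bool) bool {
	return allowOnlyRestoreData || hasRestoreHelperInitContainer(pod)
}

func main() {
	pod := &corev1.Pod{} // a pod without the restore-helper init container
	fmt.Println(shouldProcessPVR(pod, false)) // false: PVR would be ignored
	fmt.Println(shouldProcessPVR(pod, true))  // true: check bypassed
}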

qmaraval-csgroup commented 8 months ago

Hello, I have an issue similar to this idea.

The context: I use Velero to perform backups of PVCs using FSB. I only want to back up the content of my PVCs; from what I read, I have to back up at least:

  • The pod where the PVC is mounted (to allow the initcontainer to be scheduled)
  • The PV
  • The PVC

Everything works well for backup, but the restoration is a little more adventurous... I want to restore the PVC directly into the backup namespace; however, since the pod comes from a StatefulSet or a Deployment, the restoration pod cannot be scheduled, so the init container never runs and no restore is performed. Do you think it's possible, through restore resource modifiers, to generate a fake pod that modifies the initial one just to allow the init container to run (bypassing the StatefulSet's and Deployment's own policy)?

For now, the only way I have found to make this possible is to back up all resources from namespace A, restore them into another, empty namespace B, and once everything is restored, change the PV reclaim policy, delete everything, and manually reattach the PV from B to the PVC in namespace A... which works but is far from optimal...
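
For reference, the PV rebinding part of that workaround can be scripted. Below is a rough client-go sketch, assuming the restored PV's name is already known; the kubeconfig loading, PV name, and error handling are illustrative only:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig (illustrative; adjust for your cluster).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	ctx := context.Background()
	pvName := "pvc-xxxxxxxx" // the PV that was restored into namespace B

	// 1. Switch the PV's reclaim policy to Retain so that deleting the
	//    temporary namespace B does not delete the underlying volume.
	retainPatch := []byte(`{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}`)
	if _, err := client.CoreV1().PersistentVolumes().Patch(
		ctx, pvName, types.MergePatchType, retainPatch, metav1.PatchOptions{}); err != nil {
		panic(err)
	}

	// 2. After the temporary PVC in namespace B has been deleted, clear the
	//    PV's claimRef so it becomes Available again and can be bound by a
	//    PVC in namespace A that references it via spec.volumeName.
	clearClaim := []byte(`{"spec":{"claimRef":null}}`)
	if _, err := client.CoreV1().PersistentVolumes().Patch(
		ctx, pvName, types.MergePatchType, clearClaim, metav1.PatchOptions{}); err != nil {
		panic(err)
	}

	fmt.Println("PV", pvName, "is ready to be re-bound in the original namespace")
}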

Thanks,

blackpiglet commented 8 months ago

@qmaraval-csgroup Thanks for sharing your thoughts. Could you share more about why you only want to back up and restore the data in the volume?

I know one possible scenario is that the k8s workload is governed by GitOps tools. Is this the same as your scenario?

qmaraval-csgroup commented 8 months ago

@blackpiglet Thanks for your answer. Indeed, the reason is that we use something like a GitOps tool; it is not really a GitOps tool but a specific CI/CD pipeline.

blackpiglet commented 8 months ago

@qmaraval-csgroup Velero cannot use the resource modifier to remove the InitContainer from the PodVolumeRestore-generated pod because, from Velero's perspective, the pod is already restored; Velero doesn't have a phase in the restore during which it can modify an already restored resource.

I see. Using k8s in a DevOps pipeline to deploy a new application version into the production or staging environment is a typical scenario. Usually, these DevOps tools guarantee the workload's state and replica count, and if something goes wrong, they send alerts to the contact.

Thanks for your feedback. Just out of curiosity, does this environment often perform backup and restore operations?

qmaraval-csgroup commented 8 months ago

@blackpiglet Thanks for your answer. Backups are performed on a daily basis. Restores are rarer, maybe once or twice a month, depending on user mistakes.

reasonerjt commented 6 months ago

Moving this out of the v1.14 milestone as we are seeing different understandings of the feature. More discussion is needed before we reach consensus on the scope.

DreamingRaven commented 5 months ago

Adding my two cents for why I would desire this feature.

I have a production cluster. I also have a staging cluster. These two clusters are divergent in that environment variables, secrets, configmaps, etc. will be different due to their very different environments, e.g. one is hosted Kubernetes and the other is Talos, and they have different ingress hosts, like app.org.example vs app.staging.org.example in staging. This means it is undesirable for me to attempt to restore manifests from one cluster to the other, since they will be different. Both clusters are also GitOps clusters, so they have no need of externally provided manifests. They both have CSI snapshot support.

I want this data from production in staging so that I can test new changes against real data before they enter production. Take, for example, a series of databases like MongoDB or data stores like Minio that need production data in order to fully test new changes. Thus, the only thing I require is for the volumes themselves to be moved.

Hopefully that is a clear potential use-case for this restore.

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. If a Velero team member has requested log or more information, please provide the output of the shared commands.

blackpiglet commented 3 months ago

unstale

simenhaga commented 3 months ago

(Quoting qmaraval-csgroup's comment above.)

We experience exactly the same problem as qmaraval-csgroup, but we use ArgoCD for GitOps. We need to be able to recover all data in a disaster recovery situation. When restoring from cluster-1 to cluster-2, we are only able to get the data from the PVC if the namespace on the restore side is totally empty, but as we are governed by GitOps, this will not be the case in an emergency.

Are you still working on solving the problem, and do you know when we can expect it to be solved?

Any updates would be appreciated.

blackpiglet commented 3 months ago

Yes, but this issue is not planned for the v1.15 release, so there is no ETA yet.