vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Backup with --snapshot-move-data fails to restore #8171

Open darnone opened 2 weeks ago

darnone commented 2 weeks ago

What steps did you take and what happened: I am using velero v1.14.1 and the aws plugin v1.10.1. I have a successful backup with the command:

velero backup create csi-backup-data --include-namespaces csi-ebs --snapshot-move-data

The /backups and /kopia paths and files are created in S3, but there is no /kopia/kopia.repository in S3. I remove the test deployment and issue a restore command:

velero create restore csi-restore-data --from-backup csi-backup-data --existing-resource-policy=update --include-namespaces csi-ebs

which fails with error:

error to initialize data path: error to boost backup repository connection velero-backup-storage-location-csi-ebs-kopia: error to connect backup repo: error to connect repo with storage: error to connect to repository: repository not initialized in the provided storage

Backup and restore do work without --snapshot-move-data.

What did you expect to happen: I would expect the restore to replace the deleted test deployment. Instead, the app is replaced but the pods are stuck Pending because the PVC doesn't exist.

The following information will help us better understand what's going on: bundle-2024-08-30-14-47-23.tar.gz

The StorageClass reclaimPolicy is set to Delete and the VolumeSnapshotClass deletionPolicy is set to Delete.

Anything else you would like to add:

Environment:

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

Lyndon-Li commented 2 weeks ago

Looks like this issue has been solved: https://kubernetes.slack.com/archives/C6VCGP4MT/p1725041924556139

reasonerjt commented 1 week ago

Let's clarify if this has been figured out already.

darnone commented 1 week ago

So if I understand the answer I received in Slack on the ConfigMap issue: when Velero restores and the namespace is recreated, k8s automatically creates a new kube-root-ca.crt ConfigMap, which is why Velero is reporting a warning. In that case maybe the ConfigMap doesn't need to be part of the backup; is there a way to exclude it? In any case, the warning is benign and shouldn't fail the restore, correct?

In any case, this ticket was about the error I was getting:

error to initialize data path: error to boost backup repository connection velero-backup-storage-location-csi-ebs-kopia: error to connect backup repo: error to connect repo with storage: error to connect to repository: repository not initialized in the provided storage

I will redeploy this tomorrow and share what I see under the S3 /kopia path.

darnone commented 1 week ago

Would having the reclaimPolicy in the StorageClass and the deletionPolicy in the VolumeSnapshotClass set to Delete vs. Retain have any impact on this error?

Lyndon-Li commented 1 week ago

Would having the reclaimPolicy in the StorageClass and the deletionPolicy in the VolumeSnapshotClass set to Delete vs. Retain have any impact on this error?

No, this should be related to the BSL configuration.

darnone commented 1 week ago

BSL? What is that?

Lyndon-Li commented 1 week ago

BSL? What is that?

BackupStorageLocation
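
For context, the BackupStorageLocation is the custom resource that tells Velero (and the Kopia repository under it) which object storage provider, bucket, and prefix to use, so a misconfigured or unavailable BSL is a common cause of "repository not initialized" errors. A quick way to inspect it, assuming the default velero namespace:

# List BSLs and check that they are Available
velero backup-location get

# Inspect the provider, bucket, and prefix in detail
kubectl -n velero get backupstoragelocations -o yaml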

darnone commented 1 week ago

I have been exercising this all day and have not been able to reproduce the original error in this issue, with both FSB and volume snapshots; I have not yet tried EFS. The only thing I can think of that might have changed is the reclaim/deletion policies in the StorageClass and VolumeSnapshotClass. As long as the warnings are just that, I am comfortable closing this issue. I can always reopen this or create a new one.

One thing I want to ask: is there a way to remove the associated Kopia data when a backup is deleted?

kaovilai commented 1 week ago

Apart from some metadata that is kept, most data should be released as well. See the doc.

darnone commented 1 week ago

Let me make sure I understand. When I create a backup that uses a volume snapshot with --snapshot-move-data, or an FSB backup with --default-volumes-to-fs-backup=true, the volume contents are stored under the /kopia path in S3. When a backup is deleted, the object store data under /backups is deleted immediately, but the volume data under the /kopia directory is not. Instead, there are maintenance jobs that delete the volume data that is "orphaned".

"The repository relies on the maintenance functionality to delete the orphan data. As a result, after you delete a backup, you don’t see the backup storage size reduces until some full maintenance jobs completes successfully. And for the same reason, you should check and make sure that the periodical repository maintenance job runs and completes successfully."

So what are these maintenance jobs and how do I detect they are running and removing orphaned data?
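
One way to observe this, as a sketch: in Velero v1.14 full repository maintenance runs as Kubernetes Jobs in the Velero install namespace, and each BackupRepository CR records its last completed maintenance in its status (field name taken from the BackupRepository CRD; namespace assumed to be the default velero):

# When did maintenance last complete for each repository
kubectl -n velero get backuprepositories -o custom-columns=NAME:.metadata.name,LAST_MAINTENANCE:.status.lastMaintenanceTime

# Maintenance jobs and their logs (job name is a placeholder)
kubectl -n velero get jobs
kubectl -n velero logs job/<maintenance-job-name>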

"if you are sure you’ll never usage the backup repository, you can empty the backup storage manually."

In a production environment where backups run periodically, how do I know what volume data is not cleaned up? In other words, how do I know what backup data under /backups maps to what volume content under /kopia, and what is no longer needed? Or is it best just to let everything get removed at the 30-day expiration date? I guess what I am getting at is: how do I prevent stale data from building up over time, and how do I identify what is stale?
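
Because Kopia deduplicates data across backups, there is no simple one-to-one mapping between an entry under /backups and objects under /kopia; the practical approach is to rely on the backup TTL plus repository maintenance rather than cleaning the bucket by hand. A sketch of the routine checks, assuming default settings:

# EXPIRES shows when each backup will be garbage-collected (the default TTL is 720h, i.e. 30 days)
velero backup get

# Delete through Velero rather than removing objects from the bucket,
# so the moved snapshot data is released and later reclaimed by maintenance
velero backup delete <backup-name> --confirm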

kaovilai commented 1 week ago

I guess a more accurate doc is https://velero.io/docs/v1.14/csi-snapshot-data-movement/#backup-deletion

kaovilai commented 1 week ago

For the Velero built-in data mover, the Kopia uploader may keep some internal snapshots which are not managed by Velero. In normal cases, the internal snapshots are deleted along with the running of backups. However, if you run a backup which aborts halfway (some internal snapshots are thereby generated) and never run new backups again, some internal snapshots may be left there. In this case, since you have stopped using the backup repository, you can delete the entire repository metadata from the backup storage manually.
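
If the repository really will never be used again, the guidance above amounts to removing the Kopia data from the bucket out-of-band. A hypothetical example with the AWS CLI (the bucket name is a placeholder, and this assumes no remaining backups reference the repository):

# Remove the entire Kopia repository data for this BSL
# (include the BSL prefix in the path if one is configured)
aws s3 rm s3://<bucket>/kopia/ --recursive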