vmware-tanzu / velero

Backup and migrate Kubernetes applications and their persistent volumes
https://velero.io
Apache License 2.0

Velero v1.12.3 Fail to Ignore Resources in Terminating Phase #7777

Open nwakalka opened 4 months ago

nwakalka commented 4 months ago

What steps did you take and what happened:

What did you expect to happen:

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue; for more options, refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

[root@runner-jgnwu6xf-project-14702-concurrent-0 tmp]# kubectl exec -it mcs-velero-69b6f59bdc-tr7p9 -c mcs-velero -n mcs-backup -- /velero backup logs cb-e2e-klu-tgphvd --insecure-skip-tls-verify|grep level=error
time="2024-04-26T09:55:26Z" level=error msg="Error backing up item" backup=mcs-backup/cb-e2e-klu-tgphvd error="error getting persistent volume claim for volume: persistentvolumeclaims \"e2eapp-pv-claim-new\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:218" error.function="github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:448" name=new-label-app-845dbc7d96-t7h46
time="2024-04-26T09:55:27Z" level=error msg="Error backing up item" backup=mcs-backup/cb-e2e-klu-tgphvd error="error getting persistent volume claim for volume: persistentvolumeclaims \"e2eapp-pv-claim\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:218" error.function="github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:448" name=label-app-585cccb667-tjbtn
time="2024-04-26T09:55:28Z" level=error msg="Error backing up item" backup=mcs-backup/cb-e2e-klu-tgphvd error="error getting persistent volume claim for volume: persistentvolumeclaims \"e2eapp-pv-claim-new\" not found" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:218" error.function="github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:448" name=new-label-app-845dbc7d96-t7h46
[root@runner-jgnwu6xf-project-14702-concurrent-0 tmp]# kubectl exec -it mcs-velero-69b6f59bdc-tr7p9 -c mcs-velero -n mcs-backup -- /velero backup describe cb-e2e-klu-tgphvd

Anything else you would like to add:

Steps Taken:

What Happened:

The cluster backup was initiated while certain resources, including a namespace and its associated pod, were still in the terminating phase. Velero proceeded with the backup and attempted to resolve the resources. It successfully identified the PV mount associated with the pod, but failed when attempting to retrieve the PVC referenced by the PV, because the namespace to which the PVC belonged had already finished terminating by that time. As a result, Velero was unable to back up the PVC, leading to potential inconsistencies in the backup data.

Environment:

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the upper right of this comment to vote.

qiuming-best commented 4 months ago

If you back up a namespace whose resources are being deleted, the error reported by Velero is expected; we should not ignore these errors

blackpiglet commented 4 months ago

I agree with @qiuming-best. The reason is that Velero cannot understand the dependencies between k8s resources. In most cases, Velero collects the k8s resources to back up in alphabetical order.

As a result, Velero can skip resources that already have a Deletion Timestamp, but it cannot infer that a namespace-scoped resource is also about to be deleted when its parent namespace carries a Deletion Timestamp.