Open anshulahuja98 opened 4 months ago
@anshulahuja98 @blackpiglet
This is essentially the reason for #7978, right?
I see we are discussing whether we can skip uploading the VSC to the BSL and modify the deletion/restore process. If we can reach an agreement, this is a good candidate for v1.15 IMO.
Yes @reasonerjt, this is just the root-cause bug item.
And yes, I am in favour of removing the dependency on the VSC for the various flows; we can plan for 1.15.
Link to another explanation of the issue: https://github.com/vmware-tanzu/velero/issues/7978#issuecomment-2222257681
Per discussion, the effort to resolve this one is relatively large, so I want to propose deferring it.
This issue might be related to the following problem:
We have multiple K8s clusters, each with a local CephFS and Velero installed. We perform hourly backups without data movement (but with CSI snapshots) and daily backups with data movement. The Velero instances can see (in S3) the backups from the other Velero instances. This leads to the following problem:
Velero sees an hourly backup from a different Velero instance in S3 and (because this is a backup without data movement) tries to create a corresponding VolumeSnapshotContent in the local cluster. This fails because the snapshot is only accessible from the cluster where the backup was taken.
The result is a huge number of pending VolumeSnapshotContents, which also slows reconciliation of other "legit" snapshot operations in a "Denial of Service" style.
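The scenario described above can be modelled with a small Go sketch. It is only an illustration of the reported behaviour, not Velero's sync code: the `backup` type, the `pendingVSCs` function, and the cluster and backup names are all hypothetical, and the rule "no VolumeSnapshotContent is created for data-movement backups" is taken from the description in this comment rather than from the Velero source.

```go
package main

import "fmt"

// backup is a hypothetical, simplified view of a backup found in the shared bucket.
type backup struct {
	name          string
	sourceCluster string // cluster where the backup (and its CSI snapshot) was taken
	dataMovement  bool   // true: volume data was moved to the object store
}

// pendingVSCs returns the backups whose synced VolumeSnapshotContent would
// stay pending on localCluster, under the assumptions stated above:
// only snapshot-only backups produce a VolumeSnapshotContent on sync, and a
// snapshot is only reachable from the cluster that took it.
func pendingVSCs(localCluster string, shared []backup) []string {
	var pending []string
	for _, b := range shared {
		if b.dataMovement {
			continue // modelled as not creating a VolumeSnapshotContent
		}
		if b.sourceCluster != localCluster {
			pending = append(pending, b.name)
		}
	}
	return pending
}

func main() {
	shared := []backup{
		{name: "hourly-a", sourceCluster: "cluster-a", dataMovement: false},
		{name: "hourly-b", sourceCluster: "cluster-b", dataMovement: false},
		{name: "daily-b", sourceCluster: "cluster-b", dataMovement: true},
	}
	// Seen from cluster-a, only the foreign snapshot-only backup is stuck.
	fmt.Println(pendingVSCs("cluster-a", shared))
}
```

With hourly backups on several clusters, the list returned for each cluster grows by one entry per foreign cluster per hour, which matches the pile-up reported here.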
What steps did you take and what happened:
In the finalizing phase today, the backup controller re-uploads the backup tarball (https://github.com/vmware-tanzu/velero/blob/1ec52beca80975f74f9ed28d6f9c5f7afe67edee/pkg/backup/backup.go#L756). But it does not update the CSI-related artifacts in the object store, i.e. the CSI gzip files containing VolumeSnapshotContent, VolumeSnapshot, etc.
In its BIAv2 implementation, the Velero CSI plugin cleans up the VolumeSnapshot and recreates the VolumeSnapshotContent after the backup goes into the finalizing phase. https://github.com/vmware-tanzu/velero/blob/28d64c2c529f33510a68200c129012a163777a67/pkg/util/csi/volume_snapshot.go#L633-L636
Given this behavioural gap in Velero, the object store is not updated with the recreated VolumeSnapshotContent, because the CSI artifacts are not re-uploaded.
This has led to other behavioural issues in Velero, as highlighted in issue #7978.
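To make the gap concrete, here is a minimal Go sketch that models the object store as a map. It is not Velero's code: `objectStore`, `runBackup`, the two keys, and the string values are hypothetical stand-ins, and the `reuploadCSI` flag only exists to contrast today's behaviour with a re-upload of the CSI artifacts.

```go
package main

import "fmt"

// objectStore is a stand-in for the backup storage location: key -> content.
type objectStore map[string]string

// Hypothetical keys; the real object names in the bucket differ.
const (
	tarballKey = "backup.tar.gz"
	vscKey     = "backup-volumesnapshotcontents.json.gz"
)

// runBackup simulates the initial upload followed by the finalizing phase.
// It returns the object store and the VolumeSnapshotContent that exists in
// the cluster afterwards.
func runBackup(reuploadCSI bool) (objectStore, string) {
	store := objectStore{}
	clusterVSC := "vsc-original"

	// Initial upload: tarball and CSI artifacts are both written.
	store[tarballKey] = "tarball-v1"
	store[vscKey] = clusterVSC

	// Finalizing phase: the VolumeSnapshotContent is recreated in the cluster.
	clusterVSC = "vsc-recreated"

	// The tarball is re-uploaded during finalizing.
	store[tarballKey] = "tarball-v2"

	// The CSI artifacts are only refreshed if reuploadCSI is set;
	// reuploadCSI=false models the behaviour described in this issue.
	if reuploadCSI {
		store[vscKey] = clusterVSC
	}
	return store, clusterVSC
}

func main() {
	store, inCluster := runBackup(false)
	fmt.Printf("object store: %s, cluster: %s\n", store[vscKey], inCluster)
}
```

With `reuploadCSI` set to `false`, the object store keeps describing the original VolumeSnapshotContent while the cluster holds the recreated one, which is the mismatch this issue reports.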
What did you expect to happen:
The following information will help us better understand what's going on:
If you are using velero v1.7.0+:
Please use
velero debug --backup <backupname> --restore <restorename>
to generate the support bundle and attach it to this issue. For more options, please refer to velero debug --help
If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)
kubectl logs deployment/velero -n velero
velero backup describe <backupname>
or kubectl get backup/<backupname> -n velero -o yaml
velero backup logs <backupname>
velero restore describe <restorename>
or kubectl get restore/<restorename> -n velero -o yaml
velero restore logs <restorename>
Anything else you would like to add:
Environment:
Velero version (use velero version):
Velero features (use velero client config get features):
Kubernetes version (use kubectl version):
OS (e.g. from /etc/os-release):
Vote on this issue!
This is an invitation to the Velero community to vote on issues. You can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.