blackpiglet opened 2 months ago (status: Open)
The possible cause is that:
When the Velero server is restarted with `kubectl delete pod` or `kubectl rollout restart`, two Velero servers run briefly at the same time (the old one is in `Terminating` but not yet fully deleted). The new Velero server marks the `Backup` as `Failed` on startup, while the old one updates the backup's status to `WaitingForPluginOperations`. Then, because the `DataUpload` is canceled, the backup is finally updated to `PartiallyFailed`.
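A minimal sketch of the last-writer-wins race described above (hypothetical names, not Velero's actual code; the real servers update the Backup CR via the Kubernetes API):

```python
# Two servers update the same Backup object; without coordination,
# the later write overwrites the earlier one.

backup = {"name": "backup-1", "status": "InProgress"}

def new_server_startup(b):
    # The new server marks any in-progress backup as Failed on startup.
    if b["status"] == "InProgress":
        b["status"] = "Failed"

def old_server_reconcile(b):
    # The terminating server is still reconciling and overwrites the status.
    b["status"] = "WaitingForPluginOperations"

new_server_startup(backup)    # backup -> Failed
old_server_reconcile(backup)  # backup -> WaitingForPluginOperations
# Later, the canceled DataUploads push the backup to PartiallyFailed.
print(backup["status"])  # WaitingForPluginOperations
```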
One possible improvement for the upgrade use case is to change the Velero server's deployment strategy from `RollingUpdate` to `Recreate`, which ensures the old Velero server is stopped before the new one is created.
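For example, assuming the standard Velero Deployment in the `velero` namespace, that change could look like this (a sketch of the suggestion, not a decided fix):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: velero
  namespace: velero
spec:
  # Recreate terminates the old pod before starting the new one,
  # so two Velero servers never run at the same time.
  strategy:
    type: Recreate
```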
@ywk253100 another possibility is that the backup just happened to move from `InProgress` to `WaitingForPluginOperations` right as the pod was being killed. But yes, we probably do want the old server stopped before starting the new one. If both are running at the same time, even for a brief period, unpredictable things can happen.
What steps did you take and what happened:
Create a backup for a workload with volumes, where the volume data is backed up by the CSI snapshot data mover. Restart the Velero server pod while the backup is in the `InProgress` state. The Backup ended in the `PartiallyFailed` state, and some of the DataUploads ended as `Completed`.

What did you expect to happen:
The backup should end in the `Failed` state, and the DataUploads should be `Cancelled`.

The following information will help us better understand what's going on:
Also worth noting: the Backup state changed from `Failed` to `PartiallyFailed`.

If you are using velero v1.7.0+:
Please use `velero debug --backup <backupname> --restore <restorename>` to generate the support bundle and attach it to this issue. For more options, please refer to `velero debug --help`.
bundle-2024-09-19-22-59-51.tar.gz

If you are using earlier versions:
Please provide the output of the following commands (pasting long output into a GitHub gist or other pastebin is fine):
- `kubectl logs deployment/velero -n velero`
- `velero backup describe <backupname>` or `kubectl get backup/<backupname> -n velero -o yaml`
- `velero backup logs <backupname>`
- `velero restore describe <restorename>` or `kubectl get restore/<restorename> -n velero -o yaml`
- `velero restore logs <restorename>`
Anything else you would like to add:
Environment:
- Velero version (use `velero version`): main
- Velero features (use `velero client config get features`): EnableCSI
- Kubernetes version (use `kubectl version`): v1.31
- OS (e.g. from `/etc/os-release`):

Vote on this issue!
This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.