Va1 closed this issue 2 years ago.
@Va1 I think this failure is caused by the backup including only PVCs and PVs. The Velero CSI plugin implements its logic for PVCs, VolumeSnapshots, VolumeSnapshotContents and VolumeSnapshotClasses as BackupItemActions, so when only PVCs and PVs are included, the plugin's actions for VolumeSnapshot, VolumeSnapshotContent and VolumeSnapshotClass never run.
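To make the dispatch behaviour concrete, here is a minimal Go sketch of the idea, not Velero's actual code: `itemAction` and `backup` are hypothetical names, and the model assumes an action runs only for items of the resource type it registers for. With only PVCs and PVs included, the VolumeSnapshot action is never invoked.

```go
package main

import "fmt"

// itemAction is a hypothetical, simplified stand-in for a BackupItemAction:
// it only ever runs for items of the resource type it registers for.
type itemAction struct {
	resource string
	run      func(name string) string
}

// backup runs, for each included item, every action registered for that
// item's resource type. Actions whose resource type has no included items
// are never invoked.
func backup(items map[string][]string, actions []itemAction) []string {
	var ran []string
	for _, a := range actions {
		for _, name := range items[a.resource] {
			ran = append(ran, a.run(name))
		}
	}
	return ran
}

func main() {
	actions := []itemAction{
		{resource: "persistentvolumeclaims", run: func(n string) string { return "PVC action ran for " + n }},
		{resource: "volumesnapshots", run: func(n string) string { return "VolumeSnapshot action ran for " + n }},
	}
	// Only PVCs and PVs are included, so the VolumeSnapshot action never runs.
	items := map[string][]string{
		"persistentvolumeclaims": {"data-pvc"},
		"persistentvolumes":      {"pv-1"},
	}
	for _, line := range backup(items, actions) {
		fmt.Println(line)
	}
}
```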
The SIGSEGV happened because, in the VolumeSnapshot BackupItemAction, the CSI plugin waits at least until the VolumeSnapshotContent is created and the snapshot handle is populated. Since VolumeSnapshot is not included in the backup, this code does not run. As a result, right after the VolumeSnapshot is created its Status is still nil, and even after checkVolumeSnapshotReadyToUse has run, the original volumeSnapshots array is not updated, so its Status section remains nil.
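The failure mode can be illustrated with a simplified Go sketch. The types and functions here are hypothetical stand-ins, not the real external-snapshotter or Velero code: `VolumeSnapshot` has a pointer `Status` that stays nil until filled in, and `markReady` plays the role of a readiness check that returns a refreshed copy. `refreshBuggy` discards that copy, so the slice keeps its nil `Status`; `refreshFixed` stores it back by index, and `boundContentName` checks every pointer so a missing status becomes an error instead of a SIGSEGV.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the CSI VolumeSnapshot types;
// not the real external-snapshotter structs.
type VolumeSnapshotStatus struct {
	BoundVolumeSnapshotContentName *string
}

type VolumeSnapshot struct {
	Name   string
	Status *VolumeSnapshotStatus // nil until the snapshot controller fills it in
}

// markReady is a hypothetical readiness check. Like a Get from the API
// server, it returns a refreshed copy instead of mutating its argument.
func markReady(vs VolumeSnapshot) VolumeSnapshot {
	name := "snapcontent-" + vs.Name
	vs.Status = &VolumeSnapshotStatus{BoundVolumeSnapshotContentName: &name}
	return vs
}

// boundContentName checks every pointer before dereferencing it, so a
// snapshot whose status was never populated yields an error, not a SIGSEGV.
func boundContentName(vs VolumeSnapshot) (string, error) {
	if vs.Status == nil || vs.Status.BoundVolumeSnapshotContentName == nil {
		return "", fmt.Errorf("VolumeSnapshot %q has no bound VolumeSnapshotContent yet", vs.Name)
	}
	return *vs.Status.BoundVolumeSnapshotContentName, nil
}

// refreshBuggy mirrors the reported pattern: the refreshed copy is
// discarded, so the elements of the slice keep their nil Status.
func refreshBuggy(snapshots []VolumeSnapshot) {
	for _, vs := range snapshots {
		_ = markReady(vs)
	}
}

// refreshFixed stores the refreshed object back into the slice by index.
func refreshFixed(snapshots []VolumeSnapshot) {
	for i := range snapshots {
		snapshots[i] = markReady(snapshots[i])
	}
}

func main() {
	stale := []VolumeSnapshot{{Name: "vs-1"}}
	refreshBuggy(stale)
	if _, err := boundContentName(stale[0]); err != nil {
		fmt.Println("error:", err)
	}

	fresh := []VolumeSnapshot{{Name: "vs-1"}}
	refreshFixed(fresh)
	name, _ := boundContentName(fresh[0])
	fmt.Println("bound to", name)
}
```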
I think I can make the code more robust, but restore will still need VolumeSnapshot, VolumeSnapshotContent and VolumeSnapshotClass to be included in the backup in order to work.
I suggest creating the backup with this command:
velero backup create csi-test --include-namespaces=ohlc
@blackpiglet That seems reasonable. You're right that those other resources must also be included in the backup, but Velero should fail gracefully with a useful error message rather than crashing like that.
@blackpiglet adding VolumeSnapshotClass, VolumeSnapshot and VolumeSnapshotContent to backup included resources resolved the issue, thank you.
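One way Velero could fail gracefully, as suggested above, is a pre-flight check on the included resources. The Go sketch below is hypothetical and not an existing Velero function: `missingCSIResources` and `validateIncludedResources` are illustrative names, and it assumes resources are given as plural names (optionally group-qualified), that an empty list or `*` means "include everything", and that the check only matters when `persistentvolumeclaims` is explicitly included.

```go
package main

import (
	"fmt"
	"strings"
)

// Resources the CSI plugin's BackupItemActions rely on (short plural names,
// kept in alphabetical order).
var requiredCSIResources = []string{
	"volumesnapshotclasses",
	"volumesnapshotcontents",
	"volumesnapshots",
}

// normalize lower-cases a resource name and strips an API group suffix,
// e.g. "VolumeSnapshots.snapshot.storage.k8s.io" -> "volumesnapshots".
func normalize(resource string) string {
	r := strings.ToLower(strings.TrimSpace(resource))
	if i := strings.Index(r, "."); i >= 0 {
		r = r[:i]
	}
	return r
}

// missingCSIResources returns the CSI snapshot resources absent from an
// explicit include list. "*" or an empty list means everything is included.
func missingCSIResources(included []string) []string {
	set := make(map[string]bool)
	for _, r := range included {
		n := normalize(r)
		if n == "*" {
			return nil
		}
		set[n] = true
	}
	// No PVCs requested (or an empty list): nothing to check.
	if !set["persistentvolumeclaims"] {
		return nil
	}
	var missing []string
	for _, r := range requiredCSIResources {
		if !set[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

// validateIncludedResources turns missing resources into a readable error.
func validateIncludedResources(included []string) error {
	if missing := missingCSIResources(included); len(missing) > 0 {
		return fmt.Errorf("backup includes persistentvolumeclaims but not %s; the CSI plugin needs these resources to back up and restore volume snapshots",
			strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	err := validateIncludedResources([]string{"persistentvolumeclaims", "persistentvolumes"})
	fmt.Println(err)
}
```

Running this check before any snapshots are created would surface the misconfiguration as an error instead of a crash at the end of the backup.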
As outlined by @sseago, I agree that ideally this should be covered in both the documentation and the error message.
Is this something I can help with by submitting a pull request with a fix?
@Va1 Sure, contributions are welcome.
What steps did you take and what happened: Velero 1.9.0 is deployed on AWS EKS 1.22 via an official Helm chart v2.31.0. Plugins: AWS v1.5.0, CSI v0.3.0.
Upon backing up, right after the CSI snapshots are created (both the VolumeSnapshot and the VolumeSnapshotContent are in the proper statuses, and the EBS snapshot displays as ready in the AWS console) and the backup is about to wrap up, Velero crashes with SIGSEGV. The backup stays in Failed status. I retried multiple times and it always ends this way.
What did you expect to happen: Backup succeeds and is restorable.
The following information will help us better understand what's going on:
Cannot provide this at the moment.
But here are the logs printed prior to a crash:
One of the backups in question, in YAML format:
A describe of one of the VolumeSnapshots created by a backup:
A describe of one of the VolumeSnapshotContents created by a backup:
Chart values overrides:
Anything else you would like to add:
Environment: