Open QuentinBtd opened 2 months ago
Are you using data mover backup?
No.
Please double check if the issue still exists after the introduction of itemblock in v1.15
Seeing this issue as well, with the same use case.
Same here, using Velero 1.14.1 (Restic instead of Kopia). The backup pre-hook is configured to scale down the ClickHouse StatefulSet, and the post-hook is executed right after the pre-hook, while the ClickHouse backup job is still running. This seems to be a regression in 1.14, because it started failing consistently right after the upgrade from Velero 1.13 to 1.14.
@reasonerjt
Please double check if the issue still exists after the introduction of itemblock in v1.15
I don't think that will change anything. Especially with an itemblock that does not contain multiple pods, the post-hook will run as soon as the pod backup is completed (which happens after the PVC backup is completed). If this were data mover, I'd understand to a degree: we may have a bug similar to the one fixed on the restore side, where we moved post-hooks to the finalize phase when async actions were involved, since the snapshot (or data movement) might complete after the backup/restore of the Kubernetes metadata is done. But for fs-backup, I thought we blocked on its completion before declaring the pod backup/restore done, so this shouldn't be happening here. I guess we'll need to look at what changed between 1.13 and 1.14 in terms of PVB processing.
This issue is caused by https://github.com/vmware-tanzu/velero/pull/7571. Before that change, pods were backed up in sequence: the backup process didn't move on to the next pod until the previous one was fully processed (i.e. all of its PVBs were processed).
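A toy Python model of the ordering problem may help. It is a sketch under stated assumptions, not Velero's code: `run_pod_backup` and its `wait_for_pvbs` flag are hypothetical, a thread-pool task stands in for a PodVolumeBackup (PVB), and a `threading.Event` holds the upload "in flight". When the model blocks on the PVB before the post-hook (the sequential behavior described for pre-1.14), the post-hook is logged after the upload; when it doesn't, the post-hook runs while the upload is still pending, which is the symptom reported in this issue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_pod_backup(wait_for_pvbs: bool) -> list[str]:
    """Toy model of one pod's fs-backup with hooks (hypothetical, not Velero code)."""
    log: list[str] = []
    lock = threading.Lock()

    def record(event: str) -> None:
        with lock:
            log.append(event)

    gate = threading.Event()  # keeps the simulated upload "in flight" until set

    def pvb() -> None:
        gate.wait()
        record("volume data uploaded")

    with ThreadPoolExecutor() as pool:
        record("pre-hook")
        future = pool.submit(pvb)  # the PVB starts asynchronously
        if wait_for_pvbs:
            gate.set()
            future.result()  # block until the volume data is uploaded
            record("post-hook")
        else:
            record("post-hook")  # runs while the upload is still pending
            gate.set()
    # Leaving the with-block waits for any remaining PVB work.
    return log
```

In this model the fix is simply to block on the PVB result before running the post-hook; the gate exists only to make the buggy ordering deterministic.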
@ywk253100 On the restore side, we made changes to make sure hooks ran after the volume restore was done, which involved moving some of the processing to the finalizing phase. I wonder whether we need similar changes on the backup side.
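The restore-side approach of deferring post-hooks to a finalizing phase can be sketched in the same toy style. Everything below is hypothetical: `back_up_with_deferred_post_hooks`, the `pods` mapping of pod name to volume names, and the `upload` callable are illustrative stand-ins, not Velero APIs. Pre-hooks and uploads start for every pod first; post-hooks are queued and only run in a final loop, each after that pod's uploads have returned.

```python
from concurrent.futures import ThreadPoolExecutor


def back_up_with_deferred_post_hooks(pods, upload):
    """Toy model (hypothetical, not Velero code): defer post-hooks to a finalize step.

    pods maps a pod name to the list of its volume names; upload(pod, volume)
    stands in for an asynchronous volume backup and returns a description string.
    """
    log = []  # only appended to from this (main) thread
    pending = []  # (pod, futures) pairs whose post-hooks are deferred
    with ThreadPoolExecutor(max_workers=4) as pool:
        for pod, volumes in pods.items():
            log.append(f"{pod}: pre-hook")
            # Start this pod's volume uploads without waiting for them.
            futures = [pool.submit(upload, pod, vol) for vol in volumes]
            pending.append((pod, futures))
        # Finalize phase: a pod's post-hook runs only after all its uploads return.
        for pod, futures in pending:
            for future in futures:
                log.append(future.result())  # blocks until that upload is done
            log.append(f"{pod}: post-hook")
    return log
```

One trade-off visible in this model: whatever the pre-hook did (for example, scaling down a StatefulSet, as in the 1.14.1 report) stays in effect until the finalize loop reaches that pod.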
@ywk253100 Agreed that there are inconsistencies in the behavior of post-hooks. Please open new issues to track the unclear cases, and double-check whether the order of snapshot and hook execution in the CSI scenarios is as expected. This issue is specifically about the fs-backup scenario, and it should be fixed.
What steps did you take and what happened:
I am using Velero to back up a PostgreSQL dump: a pre-hook command creates the dump so that Kopia can back it up, and a post-hook command then deletes the file. However, I noticed that the post-hook was executed before Kopia had the chance to back up the dump.
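For reference, a dump-then-delete setup like this can be expressed with Velero's pod annotations (`pre.hook.backup.velero.io/command`, `post.hook.backup.velero.io/command`, and the matching `container` keys), whose command values are JSON arrays. The Python helper below only builds that annotation map. The container name, volume name, dump path, and the `pg_dump` invocation are hypothetical placeholders, and `backup.velero.io/backup-volumes` is only needed when fs-backup is used in opt-in mode.

```python
import json
import shlex


def backup_hook_annotations(container: str, volume: str, dump_path: str) -> dict[str, str]:
    """Build Velero pod annotations for a dump-then-delete hook pair.

    All arguments are hypothetical placeholders. The pg_dump call assumes
    connection settings come from PG* environment variables in the container.
    """
    path = shlex.quote(dump_path)  # protect the path inside the bash -c string
    pre_cmd = ["/bin/bash", "-c", f"pg_dump -f {path}"]
    post_cmd = ["/bin/bash", "-c", f"rm -f {path}"]
    return {
        # Opt-in fs-backup: list the volume that holds the dump.
        "backup.velero.io/backup-volumes": volume,
        "pre.hook.backup.velero.io/container": container,
        # Velero expects hook commands as a JSON-encoded array of strings.
        "pre.hook.backup.velero.io/command": json.dumps(pre_cmd),
        "post.hook.backup.velero.io/container": container,
        "post.hook.backup.velero.io/command": json.dumps(post_cmd),
    }
```

The resulting map would go on the pod (or the pod template of its controller), for example as metadata in the manifest.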
I conducted several tests:
The first one involved adding sleep 60 at the beginning of my post-hook command. Result: the dump was successfully included in the Kopia snapshot.
The second test involved creating a marker file after the dump was completed; my post-hook command waited for that file (to make sure the dump was fully generated) before deleting the dump. Result: the dump was not included in the Kopia snapshot.
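Both results are consistent with a simple timeline model, sketched below with assumed numbers (all timestamps are hypothetical seconds, and `delete_time` and `dump_in_snapshot` are illustrative names). The model assumes the pre-hook finishes the dump, and writes the marker file, before returning. By the time the post-hook starts, the marker already exists, so waiting for it adds no delay; only the sleep pushes the deletion past the moment Kopia reads the file.

```python
def delete_time(post_hook_start, strategy, marker_ready_at=0.0, sleep_s=60.0):
    """Return when the post-hook deletes the dump (toy model, times in seconds)."""
    if strategy == "sleep":
        # Test 1: the post-hook sleeps first, then deletes the dump.
        return post_hook_start + sleep_s
    if strategy == "wait-for-marker":
        # Test 2: polling returns as soon as the marker file exists.
        return max(post_hook_start, marker_ready_at)
    raise ValueError(f"unknown strategy: {strategy}")


def dump_in_snapshot(upload_reads_dump_at, deleted_at):
    """The dump is captured only if Kopia reads it before it is deleted."""
    return upload_reads_dump_at < deleted_at


# Hypothetical timeline: dump and marker done at t=10, post-hook starts at
# t=11 (before the upload, as reported), Kopia reaches the dump at t=30.
MARKER_READY, POST_START, UPLOAD_READ = 10.0, 11.0, 30.0
```

In this model the sleep only helps if Kopia reaches the dump within 60 seconds of the post-hook starting, so it is a timing workaround rather than a fix for the ordering.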
What did you expect to happen:
The dump file should be included in the Kopia snapshot.
Anything else you would like to add:
Environment:
- Velero version (velero version): 1.14.0
- Velero features (velero client config get features): None
- Kubernetes version (kubectl version): 1.30
- OS (from /etc/os-release): Amazon Linux 2023