Closed vijay-yadav-3 closed 2 years ago
Hi, it looks like there is no error in the posted logs. Did the EFS VolumeSnapshot fail with a timeout error in the end?
If the Velero version you are using is no older than v1.7, please collect the Velero debug bundle with the command velero debug.
Furthermore, what is the size of the EFS volume?
Yes, it timed out in the end. I am not able to fetch the full logs, but I got this at the end:
time="2022-10-04T11:21:06Z" level=info msg="BackupStorageLocations is valid, marking as available" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:116"
I1004 11:21:09.054105 1 request.go:665] Waited for 1.046389052s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/authentication.k8s.io/v1?timeout=32s
time="2022-10-04T11:22:06Z" level=info msg="Validating BackupStorageLocation" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:131"
time="2022-10-04T11:22:06Z" level=info msg="BackupStorageLocations is valid, marking as available" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:116"
time="2022-10-04T11:23:06Z" level=info msg="Validating BackupStorageLocation" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:131"
time="2022-10-04T11:23:06Z" level=info msg="BackupStorageLocations is valid, marking as available" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:116"
Got it, then I think this is related to the size of the backup in the EFS volume.
If you are using Velero v1.9.1 or later, you can increase the CSI snapshot creation timeout with CSISnapshotTimeout:
https://velero.io/docs/v1.9/api-types/backup/
I did not understand. We have one EFS file system with 600+ PVs, each 1GB. Even if I try with only 1 PV, it logs "Waiting for CSI driver to reconcile volumesnapshot" for over an hour and then fails.
On another note, how and where do I update CSISnapshotTimeout? I will try updating it.
As per the message:
"Waiting for CSI driver to reconcile volumesnapshot cn/velero-node-1002-pvc-pxw25. Retrying in 5s"
The operation will be retried in 5 seconds and this will continue for 10 minutes (by default). At the end of 10 minutes, you should see the message:
"Timed out awaiting reconciliation of volumesnapshot ..."
Do you see such a message in the log?
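For readers following along, the retry behaviour described above can be sketched roughly as follows. This is an illustration only, not the plugin's actual code (the Velero CSI plugin is written in Go). The function name `wait_for_reconcile` and its parameters are hypothetical. The 5 second interval and 10 minute default come from the comment above.

```python
import time
from typing import Callable


def wait_for_reconcile(
    is_reconciled: Callable[[], bool],
    interval: float = 5.0,    # "Retrying in 5s"
    timeout: float = 600.0,   # 10 minutes by default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll until is_reconciled() returns True; return the number of attempts.

    Raises TimeoutError once `timeout` seconds have elapsed without success.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if is_reconciled():
            return attempts
        if clock() >= deadline:
            raise TimeoutError(
                "Timed out awaiting reconciliation of volumesnapshot"
            )
        # Not reconciled yet: wait one interval and try again.
        sleep(interval)
```

If the check never succeeds, as with a driver that never produces a VolumeSnapshotContent, the loop can only end in the timeout error, however long the timeout is set.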
@vijay-yadav-3
If your EFS PV data size is significantly larger than the EBS PV data size, I suggest separating them into two different backups.
To set the CSISnapshotTimeout value of a backup, run velero backup create <backup-name> --csi-snapshot-timeout=1h, and please make sure the Velero version is no older than v1.9.1.
Yes, I see these same log messages when trying.
Ok. What is happening is that Velero creates a VolumeSnapshot resource and expects to see a corresponding VolumeSnapshotContent show up. But that is not happening here. The VolumeSnapshotContent is created by the snapshot controller on seeing a VolumeSnapshot resource, so there must be some problem with it. You said EBS volumes are getting backed up. Do you know if Velero is taking CSI snapshots of EBS volumes or native EBS snapshots?
In any case, you should verify that the snapshot controller is properly set up by manually creating a VolumeSnapshot and verifying that a corresponding VolumeSnapshotContent is created. Until this succeeds, Velero CSI backups will not work. Let me know if you need help with creating a VolumeSnapshot manually.
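As a rough sketch of that manual check, the script below builds a VolumeSnapshot object and prints it as JSON, which kubectl accepts. It assumes the cluster serves the snapshot.storage.k8s.io/v1 API. The snapshot name, PVC name, and VolumeSnapshotClass name are hypothetical placeholders to replace with your own. The namespace cn is taken from the log message in this issue.

```python
import json


def volume_snapshot_manifest(name, namespace, pvc_name, snapshot_class):
    """Return a VolumeSnapshot object (snapshot.storage.k8s.io/v1) as a dict."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


if __name__ == "__main__":
    manifest = volume_snapshot_manifest(
        name="manual-test-snapshot",         # hypothetical snapshot name
        namespace="cn",                      # namespace seen in the logs above
        pvc_name="my-efs-pvc",               # hypothetical: an existing PVC
        snapshot_class="my-snapshot-class",  # hypothetical: your VolumeSnapshotClass
    )
    print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as make_snapshot.py (a hypothetical file name), its output can be piped into kubectl apply -f -. Afterwards, kubectl get volumesnapshot -n cn and kubectl get volumesnapshotcontent should show whether a VolumeSnapshotContent was created and bound, and whether the snapshot's status reports readyToUse.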
+1 on what @draghuram is suggesting here.
@draghuram "Do you know if Velero is taking CSI snapshots of EBS volumes or native EBS snapshots?" Velero is taking native EBS snapshots.
"Let me know if you need help with creating VolumeSnapshot manually." Yes, I would like to know how and to test it out by creating a VolumeSnapshot manually. Please let me know how to do that; I will try it out by myself as well.
@vijay-yadav-3 https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/5e1fcd3e915d62d3b091c6de780ff9e6816f3a7b/pkg/driver/controller.go#L430-L440
After checking the EFS CSI driver code, I think it doesn't support the snapshot function yet.
Hi @draghuram, can you please guide me on how to manually create a VolumeSnapshot and verify that a corresponding VolumeSnapshotContent is created?
Sure, I will post my comments in #5436.
I have installed Velero with the CSI plugin for EFS, the volume snapshotter, a volume snapshot class, and all the other required prerequisites. But in the end it fails with this error. Velero works without EFS, and all the EBS volumes are getting backed up. But for EFS-backed PVs it gets stuck at this point.
time="2022-09-29T14:59:29Z" level=info msg="BackupStorageLocations is valid, marking as available" backup-storage-location=velero/default controller=backup-storage-location logSource="pkg/controller/backup_storage_location_controller.go:116"
time="2022-09-29T14:59:33Z" level=info msg="Waiting for CSI driver to reconcile volumesnapshot cn/velero-node-1002-pvc-pxw25. Retrying in 5s" backup=velero/backup-node-1002 cmd=/plugins/velero-plugin-for-csi logSource="/go/src/velero-plugin-for-csi/internal/util/util.go:169" pluginName=velero-plugin-for-csi