kish5430 opened this issue 2 months ago
What Velero version are you using? Can you help provide us with more debug info by using the command from this doc.
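For reference, a debug bundle can usually be generated with something like the following (a sketch, assuming Velero v1.7+ and the backup/restore names used elsewhere in this issue):
$ velero version
$ velero debug --backup milvus-stg-east1-etcd-backup --restore milvus-stg-east1-etcd-restore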
@allenxu404 Please let me know if any additional information is required.
Log given above looks normal. PV was successfully restored from snapshot as below log message shows:
time="2024-04-25T05:46:33Z" level=info msg="Restoring persistent volume from snapshot." logSource="pkg/restore/restore.go:2453" restore=velero/milvus-stg-east1-etcd-restore
time="2024-04-25T05:46:34Z" level=info msg="successfully restored persistent volume from snapshot" logSource="pkg/restore/pv_restorer.go:91" persistentVolume=pvc-ed7a6088-9f9e-46fc-88ab-bbe8364a28f7 providerSnapshotID=snap-0d4da2d4c9d3f2c0d restore=velero/milvus-stg-east1-etcd-restore
It seems that the VolumeId was not available to cluster B for some reason. I think you can troubleshoot further by restoring the PV on the ACTIVE cluster instead of the STANDBY cluster B. I assume the restore will work in that case.
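One quick check before that (a sketch, assuming the aws CLI is pointed at the same account and region as cluster B; the snapshot ID is taken from the restore log above) is to verify the snapshot itself is visible from the standby side:
$ aws ec2 describe-snapshots --snapshot-ids snap-0d4da2d4c9d3f2c0d
An InvalidSnapshot.NotFound error here would mean the snapshot was never copied or shared to the account/region cluster B uses, which would explain the missing VolumeId.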
Hi @allenxu404, it's not working on the Active cluster either. I ran the velero restore on the Active cluster and got the same issue. Thanks
time="2024-04-24T18:42:02Z" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:2185" resource=volumesnapshotclass.snapshot.storage.k8s.io restore=velero/milvus-stg-east1-etcd-restore
time="2024-04-24T18:42:02Z" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:2185" resource=volumesnapshotcontents.snapshot.storage.k8s.io restore=velero/milvus-stg-east1-etcd-restore
time="2024-04-24T18:42:02Z" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:2185" resource=volumesnapshots.snapshot.storage.k8s.io restore=velero/milvus-stg-east1-etcd-restore
It seems the CSI snapshot-related CRDs are missing from the cluster.
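You can verify whether they are registered with something like the following (just an illustrative check):
$ kubectl get crd | grep snapshot.storage.k8s.io
If nothing comes back, the CRDs can be installed from the kubernetes-csi/external-snapshotter project.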
time="2024-04-24T18:42:02Z" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:2185" resource=volumesnapshotclass.snapshot.storage.k8s.io restore=velero/milvus-stg-east1-etcd-restore time="2024-04-24T18:42:02Z" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:2185" resource=volumesnapshotcontents.snapshot.storage.k8s.io restore=velero/milvus-stg-east1-etcd-restore time="2024-04-24T18:42:02Z" level=info msg="Skipping restore of resource because it cannot be resolved via discovery" logSource="pkg/restore/restore.go:2185" resource=volumesnapshots.snapshot.storage.k8s.io restore=velero/milvus-stg-east1-etcd-restore
It seems the CSI snapshot related CRDs are missed from the cluster.
Hi @blackpiglet,
I have already installed the volume snapshot CRDs:
$ kubectl api-resources | grep -i 'volume'
persistentvolumeclaims pvc v1 true PersistentVolumeClaim
persistentvolumes pv v1 false PersistentVolume
k8spspvolumetypes constraints.gatekeeper.sh/v1beta1 false K8sPSPVolumeTypes
volumesnapshotclasses vsclass,vsclasses snapshot.storage.k8s.io/v1 false VolumeSnapshotClass
volumesnapshotcontents vsc,vscs snapshot.storage.k8s.io/v1 false VolumeSnapshotContent
volumesnapshots vs snapshot.storage.k8s.io/v1 true VolumeSnapshot
volumeattachments storage.k8s.io/v1 false VolumeAttachment
podvolumebackups velero.io/v1 true PodVolumeBackup
podvolumerestores velero.io/v1 true PodVolumeRestore
volumesnapshotlocations vsl velero.io/v1 true VolumeSnapshotLocation
Thanks
@kish5430 Can you help verify the status of the associated PV and PVC to confirm they are functional? Additionally, can you access the AWS console to validate that the volume was created and is properly configured in the backend?
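For example, something along these lines should surface the PVC/PV status and the volume ID Velero wired into the PV spec (a sketch; the namespace placeholder is an assumption since it isn't shown in this issue):
$ kubectl get pvc -n <namespace> -o wide
$ kubectl describe pv pvc-ed7a6088-9f9e-46fc-88ab-bbe8364a28f7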
What steps did you take and what happened: While working on an active EKS cluster, I deployed an application with three etcd pods. I took a backup of these etcd pods using Velero. Later, I switched to a standby cluster and attempted to restore the backup. Although the restore process completed successfully and the pods were deployed, they were not running: attaching volumes to the etcd pods failed.
Command: velero backup create milvus-stg-east1-etcd-backup --selector 'app.kubernetes.io/name=etcd'
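(For completeness: the restore side presumably looked something like the commands below. The exact restore command is an assumption since it isn't shown in this issue, and the describe call is only there to confirm the backup actually captured a volume snapshot.)
$ velero backup describe milvus-stg-east1-etcd-backup --details
$ velero restore create milvus-stg-east1-etcd-restore --from-backup milvus-stg-east1-etcd-backup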
What did you expect to happen: Volume attachment should succeed and the etcd pods should run without any issues.
Etcd Pod logs: Warning FailedAttachVolume 101s (x11 over 34m) attachdetach-controller (combined from similar events): AttachVolume.Attach failed for volume "pvc-ed7a6088-9f9e-46fc-88ab-bbe8364a28f7" : rpc error: code = Internal desc = Could not attach volume "vol-00c1e0e23881130c9" to node "i-03a2b2d33c76ccef2": could not attach volume "vol-00c1e0e23881130c9" to node "i-03a2b2d33c76ccef2": InvalidVolume.NotFound: The volume 'vol-00c1e0e23881130c9' does not exist. status code: 400, request id: 4160e339-013b-4b3b-8f39-c3990cf66c2e
The volume 'vol-00c1e0e23881130c9' does not exist in the AWS volume list.
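For reference, this is how the missing volume can be double-checked from the CLI (a sketch, assuming the aws CLI uses the same account/region as the cluster; the IDs are taken from the event above and the restore log):
$ aws ec2 describe-volumes --volume-ids vol-00c1e0e23881130c9
$ aws ec2 describe-volumes --filters Name=snapshot-id,Values=snap-0d4da2d4c9d3f2c0d
The first call reproduces the InvalidVolume.NotFound error; the second shows whether any volume was actually created from the snapshot.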
Please find the attached velero restore logs. velero_restore.txt