Open zohebs341 opened 2 years ago
In the Pod events I can see this message: 3 node(s) had volume node affinity conflict.
The source PVC is on a multi-AZ node pool, and the destination cluster also uses multi-AZ node pools. So why is the pod still failing to schedule?
@zohebs341 Could you follow the guidance at this link and see whether it resolves your problem? https://velero.io/docs/v1.9/restore-reference/#changing-pvc-selected-node
@blackpiglet Thanks for your response. I've attached two files (one working case and one failing case).
Kindly go through the attached Word documents: NotWorking-Velero-AKS-1.22.11.docx and Working-Velero-AKS-1.22.11.docx
The problem I noticed is that, in the destination cluster, Velero does not change the PV zones during restore. For example, the backed-up PVC/PV was in WestEurope, and after restoration the PVC/PV comes up with the same zones.
That's why the pod cannot come up and reports volume conflicts.
But how did the same approach work for me once? In the working case, all configs were the same, yet during restoration the PVC/PV came up under the NorthEurope region. So the pods came up, since the pod, PVC, and PV were all in the same region.
Why does it work only sometimes? Is it a bug on the Velero side, or an issue with the Azure AKS CSI driver?
@zohebs341 AFAIK, the Velero Azure plugin doesn't support cross-region backup and restore. @ywk253100 Am I right?
@blackpiglet I am storing backups in NorthEurope. It even worked for me once, but when I tried again today it did not work.
If my backups were in WestEurope, then as you said, cross-region would not be supported. Could you please check my attachments?
In this use case:
Source: WestEurope Dest: NorthEurope
And backup storage location is NorthEurope.
After restoration in the destination cluster, the PV location still points to the source cluster's region. But sometimes, after restoration, the PV location points to the destination cluster's region and the pod comes up. I attached both Word documents in my previous comment.
Name:            pvc-762426ba-4b92-4af0-84f4-4ab76c627866
Labels:          velero.io/backup-name=con-zrs
                 velero.io/restore-name=con-zrs-ds
Annotations:     pv.kubernetes.io/provisioned-by: disk.csi.azure.com
Finalizers:      [kubernetes.io/pv-protection external-attacher/disk-csi-azure-com]
StorageClass:    csi-zrs
Status:          Bound
Claim:           default/zrs1gb
Reclaim Policy:  Retain
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        1Gi
Node Affinity:
  Required Terms:
    Term 0:  topology.disk.csi.azure.com/zone in [westeurope-1]
    Term 1:  topology.disk.csi.azure.com/zone in [westeurope-2]
    Term 2:  topology.disk.csi.azure.com/zone in [westeurope-3]
    Term 3:  topology.disk.csi.azure.com/zone in []
Message:
Source:
    Type:    CSI (a Container Storage Interface (CSI) volume source)
    Driver:  disk.csi.azure.com
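To make the conflict concrete, here is a rough Python sketch of how required node-affinity terms like the ones above are evaluated against node labels: the terms are ORed, and every requirement inside a term must hold. It is only an illustration, not the scheduler's actual code. The node names and the northeurope zone labels are hypothetical, and Term 3 is modelled as an empty value list (an assumption about what `in []` means in the output).

```python
ZONE_KEY = "topology.disk.csi.azure.com/zone"


def term_matches(term, node_labels):
    """A term is a list of (key, allowed_values) 'In' requirements; all must hold."""
    return all(node_labels.get(key) in values for key, values in term)


def pv_fits_node(required_terms, node_labels):
    """Required terms are ORed: the node fits if any single term matches."""
    return any(term_matches(term, node_labels) for term in required_terms)


# Node affinity copied from the restored PV shown above.
pv_terms = [
    [(ZONE_KEY, ["westeurope-1"])],
    [(ZONE_KEY, ["westeurope-2"])],
    [(ZONE_KEY, ["westeurope-3"])],
    [(ZONE_KEY, [])],  # assumption: empty list, which matches no node
]

# Hypothetical destination nodes in NorthEurope.
dest_nodes = {
    "dest-node-a": {ZONE_KEY: "northeurope-1"},
    "dest-node-b": {ZONE_KEY: "northeurope-2"},
    "dest-node-c": {ZONE_KEY: "northeurope-3"},
}

conflicts = [
    name for name, labels in dest_nodes.items()
    if not pv_fits_node(pv_terms, labels)
]
print(f"{len(conflicts)} node(s) had volume node affinity conflict")
```

Under these assumptions no destination node satisfies any term, which is consistent with the event message reported at the top of the thread.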
If you're going cross-region, you need to use restic for backup rather than snapshots. Restic does support cross-region (since completely new PVs are provisioned in the restore cluster, with data copied from the BackupStorageLocation), but AWS/Azure snapshotter plugins do not support cross-region restores.
@sseago Thanks for your response. Got it.
@sseago One last question.
What if the source cluster (Region A) runs with no AZs, and the destination cluster runs with multiple AZs but in the same region (Region A)?
In that case, would Velero backup/restore work without restic?
Both clusters are in the same region; the only difference is the AZs.
@zohebs341 I'm not 100% sure on this off the top of my head, but I think you're fine across AZs within the same region, but not across multiple regions.
@zohebs341 I think there is a possibility that the Velero plugins don't work in this case. I'm sure the GCP plugin doesn't guarantee this. For example, GCP has 6 AZs in the us-central1 region. If your cluster is created as regional, GKE chooses 3 random AZs from the 6 AZs in this region, so there is a big chance that the source cluster's AZs differ from the destination cluster's AZs, and the Velero GCP plugin cannot handle AZ matching for now.
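For a rough sense of how big that chance is, here is a small Python sketch that takes the figures in the comment above at face value (6 AZs in the region, 3 per cluster) and assumes each cluster's zones are picked uniformly at random and independently. Both are assumptions for illustration, not documented GKE behaviour, and the function names are made up.

```python
from fractions import Fraction
from math import comb


def chance_identical_zone_sets(zones_in_region, zones_per_cluster):
    """Chance that two independently chosen clusters get exactly the same zones."""
    return Fraction(1, comb(zones_in_region, zones_per_cluster))


def chance_zone_available(zones_in_region, zones_per_cluster):
    """Chance that one specific zone (e.g. the zone a PV lives in) is among
    the zones picked for the destination cluster."""
    return Fraction(zones_per_cluster, zones_in_region)


same_sets = chance_identical_zone_sets(6, 3)
pv_zone_present = chance_zone_available(6, 3)

print(f"identical zone sets: {same_sets}")
print(f"a given PV's zone exists in the destination: {pv_zone_present}")
```

Under these assumptions the two clusters rarely end up with the same three zones, and any single zonal PV has only an even chance of finding its zone in the destination cluster.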
@sseago @blackpiglet I just deployed a basic StatefulSet with a PVC on a no-AZ node pool (cluster in Region A). Once the pod was up and the PVC was attached to it, I added a node selector to that StatefulSet so it would run on a multi-AZ node pool of the same cluster.
Same error: node(s) had volume node affinity conflict.
A PVC that belongs to a no-AZ cluster/node pool cannot be used on a multi-AZ node pool of the same cluster. I guess restoring such no-AZ PVCs will fail even if both clusters are in the same region.
But after converting that LRS PVC (from the no-AZ node pool) to a ZRS PVC, it worked, since ZRS supports multiple zones.
@zohebs341 Sounds like this is the expected behavior. Since I'm not familiar with the Azure cloud provider, @ywk253100, could you please take a look to confirm?
Discussed in https://github.com/vmware-tanzu/velero/discussions/5245