Open soostdijck opened 5 days ago
it adds a selector that breaks the PV creation by the NFS driver
Sounds like a faulty NFS driver if it can't handle user (or velero) added labelSelector.
It seems very unlikely to me that something as large and common as csi nfs would be "faulty".
This is done purposefully, as expected by the Velero data mover restore workflow. After the Velero data mover restore completes, the restored PV will be bound to this PVC. Or in other words, this PVC can only be bound by the Velero data mover restore.
If you don't see the binding happen, it means the data mover restore hasn't completed.
Then you can get the corresponding DataDownload CR to see the progress by kubectl get datadownload -n velero
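For example (a sketch assuming the default velero namespace; the DataDownload name is a placeholder to be taken from the list output):

```shell
# List data mover restore operations and their phases
kubectl get datadownload -n velero

# Inspect a specific DataDownload for progress and failure messages
# (replace <datadownload-name> with a name from the list above)
kubectl describe datadownload -n velero <datadownload-name>
```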
@soostdijck can you link docs that indicate label selector cannot be added?
Hi @Lyndon-Li and @kaovilai
Thanks for the quick replies!
I think there's some confusion about how we use the NFS driver. We do not back up the PVs, since we use a storage class that dynamically creates them when a PVC is added. This is where it goes wrong: the dynamic PVs cannot be created due to the selector added by Velero, resulting in the error "failed to restore volume with StorageClass, claim Selector is not supported".
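For illustration, the restored PVC that the provisioner rejects looks roughly like this; the exact label key and value are assumptions on my part, but the relevant part is the added spec.selector, which dynamic provisioning does not support:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-example
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: sc-example
  # Added by Velero's data mover restore so that only the PV it
  # provisions can bind to this claim (label key/value illustrative):
  selector:
    matchLabels:
      velero.io/dynamic-pv-restore: pvc-example-xxxxx
```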
Here's an example of how we did it:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    argocd.argoproj.io/sync-options: Delete=false
  name: sc-example
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs.example.com
  share: /
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.2
  - nolock
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-example
  labels:
    velero.io/include-in-backup: "true"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: sc-example
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: vsc-example
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: nfs.csi.k8s.io
parameters:
  server: nfs.example.com
  share: /
deletionPolicy: Delete
I hope this makes the issue a bit more clear?
Regards, Stella
"failed to restore volume with StorageClass, claim Selector is not supported"
As I mentioned here, this is expected if you are running data mover restore.
We do not back up the PV's, as we use a storage class that dynamically creates them when a PVC is added
Velero automatically selects PVCs and PVs to back up. Depending on the backup method, sometimes the PV object is backed up and sometimes it is not. For the data mover backup you are using, the PV object is NOT backed up, while the PVC object is.
@Lyndon-Li ,
Velero does by default select everything, but we only include the PVC in the backup. I'm not sure what the impact would be if we tried to restore a dynamically created PV. What I would prefer to see is that a new PV is created by the NFS driver once a PVC is restored by Velero.
what I would prefer to see is that a new PV is created by the NFS driver once a PVC is restored by Velero
This will happen after the Velero data mover restore completes. During the restore process, a PV will be created by the NFS driver and finally bound to the restored PVC after the data is restored to the PV.
Therefore, if the PVC is not restored successfully, just check whether the DataDownload has completed successfully.
That's exactly what I also expected to happen, but I get the "claim Selector is not supported" error instead. The DataDownload step is not even reached.
I see a similar issue here, which is the next driver we needed to test with Velero :)
we only include the PVC in the backup. I'm not sure what the impact will be if we try to restore a dynamically created PV
This (only backing up/restoring the PVC, without the pod) doesn't relate to the provisioning method (dynamic or static), but to the PVC's bindingMode. Specifically, if the bindingMode is Immediate, everything works well. But if the bindingMode is WaitForFirstConsumer, the restore will never complete until the PVC is mounted by a pod, see issue #7561. This is because of Kubernetes' designed constraint for WaitForFirstConsumer --- the PVC/PV is not provisioned until the pod is scheduled.
This applies to PVC-only restores only; normal restores (PVCs with pods) don't have the problem.
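As a quick check, the binding mode of a StorageClass can be read directly with kubectl (a sketch; sc-example is the class name from the earlier snippet):

```shell
# Print the volumeBindingMode of the storage class;
# the output is either "Immediate" or "WaitForFirstConsumer"
kubectl get storageclass sc-example \
  -o jsonpath='{.volumeBindingMode}{"\n"}'
```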
That makes perfect sense. We have the bindingMode set to Immediate (see the YAML snippet I added earlier; this is almost the exact code we used). So this should not be an issue.
OK, then as the expected behavior, the PVC should be restored successfully. If it is not in your case, just share the velero log bundle with us by running velero debug.
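A sketch of collecting the bundle for a failing restore (the restore name is a placeholder; the velero namespace is assumed):

```shell
# Generate a support bundle scoped to the failing restore
velero debug --restore <restore-name> -n velero

# The command writes a tarball (e.g. bundle-<timestamp>.tar.gz)
# to the current directory; attach that file to the issue.
```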
We have the same issue using the vSphere CSI driver csi.vsphere.vmware.com. If bindingMode is set to Immediate, the restore fails (partially): everything but the PV and PVC gets restored. If bindingMode is set to WaitForFirstConsumer, the whole restore works fine.
@edhunter665 This doesn't look like the original problem, so please open another issue and attach more details and the velero log bundle.
What steps did you take and what happened: We have a setup where the NFS CSI driver creates the PVs dynamically once the PVCs are created/restored. This is done by specifying the correct storage classes.
However, when Velero backs up the PVCs, it adds a selector that breaks the PV creation by the NFS driver:
What did you expect to happen: We expect the restore to happen without Velero adding extra selectors that break the dynamic PV creation.
The output of the following commands will help us better understand what's going on:
Environment:
Velero helm chart 6.4.x, Velero version 1.13.2, Kubernetes version 1.27
Note, this is a duplicate of this issue on the helm chart, but I think it belongs here