hpe-storage / csi-driver

A Container Storage Interface (CSI) driver from HPE
https://scod.hpedev.io
Apache License 2.0

Failed to create NFS provisioned volume in OpenShift Virtualization #341

Closed justflite closed 8 months ago

justflite commented 1 year ago

Environment: OCP 4.12 deployed on bare metal with the OpenShift Virtualization operator installed. The cluster is connected to an HPE Primera storage array. HPE CSI Driver for Kubernetes 2.2.0 and HPE NFS Provisioner 3.0.0 were installed.

Issue:

When creating the following Data Volume:

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: dv1
spec:
  source:
    blank: {}
  pvc:
    accessModes:
```

It failed with the following error messages:

```
failed to provision volume with StorageClass "nfs-ssd": rpc error: code = Internal desc = Failed to create NFS provisioned volume pvc-c2a552e7-5d05-4fe0-b32f-bc79ba6fb1e3, err persistentvolumeclaims "hpe-nfs-c2a552e7-5d05-4fe0-b32f-bc79ba6fb1e3" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , , rollback status: success
```

nfs-ssd is a storage class backed by HPE NFS provisioner.
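For readers unfamiliar with the setup: per the SCOD documentation, a StorageClass backed by the HPE NFS provisioner is a regular csi.hpe.com class with NFS resources enabled. A minimal sketch of what nfs-ssd could look like (the secret name/namespace and fstype below are assumptions for illustration, not taken from this cluster):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-ssd
provisioner: csi.hpe.com
parameters:
  # Secret references pointing at the Primera backend credentials
  # (secret name and namespace here are assumptions)
  csi.storage.k8s.io/provisioner-secret-name: hpe-backend
  csi.storage.k8s.io/provisioner-secret-namespace: hpe-storage
  csi.storage.k8s.io/controller-publish-secret-name: hpe-backend
  csi.storage.k8s.io/controller-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-stage-secret-name: hpe-backend
  csi.storage.k8s.io/node-stage-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-publish-secret-name: hpe-backend
  csi.storage.k8s.io/node-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/fstype: xfs
  # This is what turns the class into an NFS-backed RWX class served by
  # the HPE NFS Server Provisioner
  nfsResources: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
```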

datamattsson commented 1 year ago

This looks like a duplicate of #295. Is the PVC created by an Operator?

justflite commented 1 year ago

No, it is not exactly the same as issue #295, but the solutions you recommended also apply to this issue.

Yes, the PVC is created by the OpenShift Virtualization operator.

To solve issue #341, we have to solve the following four issues:

  1. An operator creating a PVC provisioned by the HPE NFS provisioner in a namespace other than hpe-nfs
  2. "cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on"
  3. The HPE NFS Provisioner pod cannot be created in a namespace other than hpe-nfs when "nfsNamespace" is set to a namespace other than hpe-nfs
  4. Creating a Data Volume makes CDI create two importer pods, one for the NFS-provisioned PVC and one for the internal PVC used by the HPE NFS Provisioner pod; the latter should not be created.

Now I have solutions for the first 3 issues:

  1. Use the workaround described in issue #295, setting the nfsNamespace parameter (see the sketch after this list).

  2. Add the following rules to the clusterrole hpe-csi-provisioner-role by running $ oc edit clusterrole hpe-csi-provisioner-role and adding this section:

```yaml
- apiGroups:
  - cdi.kubevirt.io
  resources:
  - datavolumes/finalizers
  verbs:
  - '*'
- apiGroups:
  - kubevirt.io
  resources:
  - virtualmachines/finalizers
  verbs:
  - '*'
```
  3. Add an entry for every namespace you want to use as "nfsNamespace" to the users list of the SCC hpe-csi-scc:

```yaml
...
users:
- system:serviceaccount:hpe-storage:hpe-csi-controller-sa
- system:serviceaccount:hpe-storage:hpe-csi-node-sa
- system:serviceaccount:hpe-storage:hpe-csp-sa
- system:serviceaccount:hpe-storage:hpe-csi-operator-sa
- system:serviceaccount:hpe-nfs:hpe-csi-nfs-sa
- system:serviceaccount:demo:hpe-csi-nfs-sa
...
```
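As a sketch of item 1 above, the nfsNamespace workaround from #295 is set in the StorageClass parameters; here it is assumed the provisioner resources should land in a namespace called demo (the name is only an example):

```yaml
parameters:
  nfsResources: "true"
  # Deploy the HPE NFS Provisioner resources into this namespace instead of
  # the default hpe-nfs; "demo" is an example and must also be granted access
  # in the hpe-csi-scc users list shown in item 3.
  nfsNamespace: demo
```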

But for the 4th issue, I don't have a solution yet.

justflite commented 1 year ago

Detailed description for the 4th issue:

Creating a Data Volume makes CDI create two importer pods, one for the NFS-provisioned PVC and one for the internal PVC used by the HPE NFS Provisioner pod; the latter should not be created.

Consider creating the following DV:

```
$ cat dv4.yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: dv4
spec:
  source:
    blank: {}
  pvc:
    accessModes:
```

The PVC can be created successfully:

```
$ oc get pvc
NAME                                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dv4                                            Bound    pvc-994f5584-ea32-4a79-a5d6-2793856faeb4   1Gi        RWX            nfs-sas-demo   56s
hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4   Bound    pvc-fcee0baa-6b56-4f3c-8ea2-9c67ea50c340   1Gi        RWO            nfs-sas-demo   56s
```

dv4 is the PVC I want to create, and PVC "hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4" is the underlying PVC used by the HPE NFS Provisioner.

But after 60 seconds, the PVC became:

```
$ oc get pvc
NAME                                           STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dv4                                            Bound         pvc-994f5584-ea32-4a79-a5d6-2793856faeb4   1Gi        RWX            nfs-sas-demo   21m
hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4   Terminating   pvc-fcee0baa-6b56-4f3c-8ea2-9c67ea50c340   1Gi        RWO            nfs-sas-demo   21m
```

This is because a pod is created by CDI:

```
$ oc get pods
NAME                                                           READY   STATUS              RESTARTS   AGE
hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4-8b55dd86d-ctq2w   0/1     ContainerCreating   0          3s
importer-hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4          0/1     ContainerCreating   0          8s
```

Pod "importer-hpe-nfs-994f5584-ea32-4a79-a5d6-2793856faeb4" was created by CDI, but it should not be created because it is not the final PVC, it is the underlying PVC used by HPE NFS Provisioner.

Consider the following scenario:

Create a PVC by the following yaml file:

apiVersion: cdi.kubevirt.io/v1beta1 kind: DataVolume metadata: name: "dv5" spec: source: http: url: "http://172.16.17.242/rhel9.qcow2" pvc: accessModes:

CDI will create two importer pods, one for dv5 and one for the underlying PVC (which should not be created). The creation of the PVC will fail because two pods, the importer for the underlying PVC and the HPE NFS Provisioner pod, want to mount the same PVC at the same time. The importer pod always starts faster than the HPE NFS Provisioner pod, so the creation of PVC dv5 fails because the HPE NFS Provisioner cannot mount the underlying PVC that is already mounted by the importer.

```
58m  Warning  FailedAttachVolume  pod/hpe-nfs-fe636a61-6800-41c9-9bc8-454084591646-9c4bd48-bmbwg  Multi-Attach error for volume "pvc-6eff39fd-7234-4d1a-8bfe-bad12307417b"  Volume is already used by pod(s) importer-hpe-nfs-fe636a61-6800-41c9-9bc8-454084591646
```

So the question is:

How can CDI be told not to create an importer pod for the underlying PVC used by the HPE NFS Provisioner, and to create an importer pod only for the final RWX PVC?

CDI picks up a PVC and acts on it when the PVC carries a matching annotation. The HPE CSI driver copies the same annotations to its underlying PVC, which is what causes the problem.

```
$ oc get pvc
NAME                                           STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dv3                                            Pending                                                                        nfs-sas-demo   6s
hpe-nfs-5adb0f43-ac9b-4e16-85fe-59b098572d49   Bound     pvc-8f61135c-451c-41d4-bbcc-63c11f8bd5ad   1Gi        RWO            nfs-sas-demo   6s

$ oc get pvc hpe-nfs-5adb0f43-ac9b-4e16-85fe-59b098572d49 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/storage.condition.running: "false"
    cdi.kubevirt.io/storage.condition.running.message: ""
    cdi.kubevirt.io/storage.condition.running.reason: ContainerCreating
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.deleteAfterCompletion: "true"
    cdi.kubevirt.io/storage.import.importPodName: importer-hpe-nfs-5adb0f43-ac9b-4e16-85fe-59b098572d49
    cdi.kubevirt.io/storage.import.source: none
    cdi.kubevirt.io/storage.pod.phase: Pending
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    csi.hpe.com/nfsPVC: "true"
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: csi.hpe.com
    volume.kubernetes.io/storage-provisioner: csi.hpe.com
  creationTimestamp: "2023-04-13T13:33:22Z"
  finalizers:
```

datamattsson commented 1 year ago

Thanks for the additional details. To be completely honest, we have not qualified OpenShift Virtualization with the HPE CSI Driver. The operation you're performing should be done with a regular RWX PVC (without NFS resources) using volumeMode: Block. More on that in this issue: #323
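For illustration, a DataVolume along those lines would request a raw block RWX volume instead of the NFS-backed class; a minimal sketch (the name, size, and StorageClass hpe-standard below are placeholders, not taken from this thread):

```yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: dv-block
spec:
  source:
    blank: {}
  pvc:
    accessModes:
    - ReadWriteMany
    volumeMode: Block              # raw block device, no HPE NFS resources involved
    storageClassName: hpe-standard # placeholder for a non-NFS HPE CSI StorageClass
    resources:
      requests:
        storage: 10Gi
```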

That said, this issue is a priority for HPE and we're currently working on getting this issue resolved for the next release of the CSI driver.

justflite commented 1 year ago

Thank you very much for your kind reply. Can we use a block mode regular RWX PVC? On scod.hpedev.io it says a block mode RWX PVC can be provisioned, but the behavior can be unpredictable. Is there a success story of a block mode RWX PVC used for a VM in order to enable the live migration feature?

I have tested a regular block mode RWX PVC with an HPE Primera C630; the creation of the VM succeeded, but the live migration failed.

What is the version of the next release of the CSI driver that can be expected to solve this issue? v2.3.0?

justflite commented 1 year ago

When I initiated a live migration of a VM that is based on block mode RWX PVC, the following error occurred:

```
Generated from kubelet on worker11.openshift.lab (4 times in the last 1 minute)
MapVolume.SetUpDevice failed for volume "pvc-8733eeb9-aced-47a9-944e-22f6cc7c2620" : rpc error: code = Internal desc = Failed to stage volume pvc-8733eeb9-aced-47a9-944e-22f6cc7c2620, err: rpc error: code = Internal desc = Error creating device for volume pvc-8733eeb9-aced-47a9-944e-22f6cc7c2620, err: device not found with serial 60002ac0000000000000209300029a7f or target

Generated from kubelet on worker11.openshift.lab
Unable to attach or mount volumes: unmounted volumes=[rootdisk], unattached volumes=[rootdisk hotplug-disks private public ephemeral-disks container-disks libvirt-runtime sockets]: timed out waiting for the condition
```

datamattsson commented 1 year ago

It won't be in 2.3.0; it will be in a subsequent release.

datamattsson commented 10 months ago

There's a beta chart available that fixes this for 3PAR-pedigree platforms. The GA release of the chart and the certified OpenShift operator is imminent.

datamattsson commented 8 months ago

Fixed in v2.4.1 using RWX block.