hpe-storage / csi-driver

A Container Storage Interface (CSI) driver from HPE
https://scod.hpedev.io
Apache License 2.0

HPE Nimble not usable with OpenShift Virtualization #323

Closed. germanovm closed this issue 1 year ago

germanovm commented 2 years ago

The HPE CSI driver does not work as expected in ReadWriteMany mode using a raw block device.

During a VM live migration, the following happens.

  1. VM is running in pod PA on node A
  2. Live migration starts
  3. pod PB is started on node B
  4. VM memory/state is copied to PB on node B
  5. PA shuts down, PB resumes the VM

The VM disk is a volume that is attached to two pods (the migration source and destination) from steps 2 to 4. Once the migration finishes, the source pod (PA) is deleted and the VM runs on the destination (PB). The problem is that when PA shuts down, the CSI driver sets the volume offline and the destination (PB) loses access to it. Instead of running on the destination node, the VM is paused/hung with EIO because it lost access to its disk/volume.

The problem is reproducible by starting 2 pods on 2 different nodes, sharing a raw block RWX volume. No need for live migration or virtualisation.

Once the first pod shuts down, the remaining pod loses access to the volume: the Nimble side takes the backing volume offline and the iSCSI connection is abruptly closed while the other pod is still using it.

1) Create test-pvc with volumeMode: Block and accessMode ReadWriteMany (RWX), using the HPE CSI StorageClass (a sample PVC is sketched below).

2) Create 2 pods, running on 2 different nodes, using the shared PVC (pod manifest below).
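
A minimal PVC for step 1 could look like the following; the StorageClass name and requested size are assumptions and should be adapted to your environment.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany                # RWX, shared by both pods
  volumeMode: Block                # raw block device, no filesystem
  resources:
    requests:
      storage: 10Gi                # assumed size for this example
  storageClassName: hpe-standard   # assumed name of the HPE CSI StorageClass

And the pod used for step 2: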

apiVersion: v1
kind: Pod
metadata:
  name: example
  labels:
    app: httpd
  namespace: default
spec:
  containers:
    - name: httpd
      image: 'image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest'
      ports:
        - containerPort: 8080
      volumeDevices:        # exposes the raw block PVC to the container; the devicePath is illustrative
        - name: raw
          devicePath: /dev/xvda
  volumes:
    - name: raw
      persistentVolumeClaim:
        claimName: test-pvc
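
One way to make sure the 2 pods actually run on 2 different nodes is to pin each pod explicitly with nodeName (a nodeSelector or pod anti-affinity works as well); the node names here are placeholders, not part of the original report:

# Added to each pod's spec; point each copy at a different node.
spec:
  nodeName: worker-1   # use e.g. worker-2 for the second pod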

3) Shut down one pod.

4) The other pod loses LUN access; the LUN is shown offline in the HPE Nimble dashboard.

Logs from the HPE CSI driver show the request to offline the LUN that is still in use by the other pod:

2022-08-22T21:32:44.661195588Z 21:32:44 DEBUG [co.ni.no.gr.GroupMgmtClient]] (executor-thread-88) 7 * Sending client request on thread executor-thread-88
2022-08-22T21:32:44.661195588Z 7 > PUT https://10.24.8.59:5392/v1/volumes/066b77f2277cc0b66d00000000000000000000098e
2022-08-22T21:32:44.661195588Z 7 > Content-Type: application/json
2022-08-22T21:32:44.661195588Z 7 > X-Auth-Token: ****
2022-08-22T21:32:44.661195588Z {"data":{"online":false,"force":true}}

It should not offline the volume, so that PB does not lose access to it. The volume should only be taken offline once all pods sharing it stop using it. Perhaps a reference count is missing.

datamattsson commented 2 years ago

The team is aware of this issue. What happens is that the CSI Unpublish call revokes access for all initiators and takes the volume offline, when it should only revoke access for the source initiator and leave the volume online. The RWX volumeMode: Block PVC should not have been provisioned in the first place, as this configuration has not been tested.

I'll update this thread if I hear about a potential bugfix in a future release.

peterclauterbach commented 1 year ago

Digging more into this, the issue is specifically about RWX block-mode PVs. If the Nimble CSI driver can support RWX Filesystem PVCs/PVs, I'd recommend scoping the title to something more precise and listing possible workarounds.

datamattsson commented 1 year ago

The CSI driver supports volumeMode: Block for all accessModes (except RWOP). It's the CSP implementations that break. The official HPE CSPs are not open source at this time, but we're working on a resolution.

pipopopo commented 1 year ago

@datamattsson Any update on the status of this?

datamattsson commented 1 year ago

@pipopopo we've identified what needs to be done and engineering is working on it. No ETA as of yet but it will be part of the next release of the CSI driver.

datamattsson commented 1 year ago

For anyone feeling edgy, there's a publicly available CSP image with the fix.

If you want to use it with the current operator, you need to disable the Nimble and Alletra6K CSP in the CSV (it's parameterized as .spec.disable.nimble: true and .spec.disable.alletra6000: true). Then deploy the CSP with a resource manifest from here and change the image to quay.io/datamattsson/alletra-6000-and-nimble-csp:block-rwx.
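
As a sketch of the first part of that workaround, assuming the operator's HPECSIDriver custom resource is used (the apiVersion, resource name and namespace below are assumptions; the disable parameters are the ones named above):

apiVersion: storage.hpe.com/v1    # assumed API group/version of the HPECSIDriver CRD
kind: HPECSIDriver
metadata:
  name: csi-driver                # assumed name of the existing custom resource
  namespace: hpe-storage          # assumed namespace used by the operator
spec:
  disable:
    nimble: true                  # .spec.disable.nimble, as described above
    alletra6000: true             # .spec.disable.alletra6000, as described above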

This is not meant for production use; the fix will be part of the next CSI driver release, which is TBD at this point.

Greco21298 commented 1 year ago

@datamattsson Any update on this issue? I am hitting it on Alletra 9K as well.

datamattsson commented 1 year ago

Alletra 5/6K and Nimble will be supported in the next release. Alletra 9K, Primera and 3PAR do not have an ETA at this time.

datamattsson commented 1 year ago

Fixed in v2.4.0.

nmoctezum commented 1 year ago

When do you think OpenShift Virtualization will be supported using 3PAR?

peterclauterbach commented 1 year ago

When do you think OpenShift Virtualization will be supported using 3PAR?

I'd recommend creating a new issue in this repo for 3PAR, and it can be tracked separately.

therevoman commented 11 months ago

Alletra 5/6K and Nimble will be supported in the next release. Alletra 9K, Primera and 3PAR do not have an ETA at this time.

Can you clarify where we landed with this for v2.4.0? I have a customer needing Alletra 9K support.

datamattsson commented 11 months ago

@therevoman the 2.4.0 release that includes Nimble/Alletra5/6K has been delayed for OpenShift. You can still install the CSI driver with the Helm chart but it's unsupported by Red Hat. However, that release will be available in the next few days in the Red Hat catalog so I would encourage customers/partners to wait for that.

For backends with 3PAR pedigree (3PAR/Primera/Alletra 9K/MP) you'll have to wait for 2.4.1 which is in the works. ETA is unknown at this time.