The team is aware of this issue. What happens is that the CSI Unpublish call revokes access for all initiators when it should only revoke access for the source initiator, and it should not offline the volume. The RWX volumeMode: Block PVC should not have been provisioned in the first place, as this configuration has not been tested.
I'll update this thread if I hear about a potential bugfix in a future release.
Digging more into this, this item is specifically about RWX block-mode PVs. If the Nimble CSI driver can support RWX Filesystem PVCs/PVs, I'd recommend scoping the title to something more precise and listing possible workarounds.
The CSI driver supports volumeMode: Block for all accessModes (except RWOP). It's the CSP implementations that break. The official HPE CSPs are not open source at this time, but we're working on a resolution.
@datamattsson Any update on the status of this?
@pipopopo we've identified what needs to be done and engineering is working on it. No ETA as of yet but it will be part of the next release of the CSI driver.
For anyone feeling edgy there's a publicly available CSP image with the fix.
If you want to use it with the current operator you need to disable the Nimble and Alletra6K CSP in the CSV (they're parameterized as .spec.disable.nimble: true and .spec.disable.alletra6000: true). Then deploy the CSP with a resource manifest from here and change the image to quay.io/datamattsson/alletra-6000-and-nimble-csp:block-rwx.
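For illustration only, a minimal sketch of what those parameters could look like on the operator's custom resource. The apiVersion, kind, and metadata below are assumptions about a typical HPE CSI Operator install; only the .spec.disable.* paths come from the comment above.

```yaml
# Sketch only: the resource kind, apiVersion, and names are assumed,
# not taken from this thread; the .spec.disable.* parameters are.
apiVersion: storage.hpe.com/v1
kind: HPECSIDriver
metadata:
  name: csi-driver
  namespace: hpe-storage
spec:
  disable:
    nimble: true        # disable the bundled Nimble CSP
    alletra6000: true   # disable the bundled Alletra 6K CSP
```

The separately deployed CSP manifest would then point its container image at quay.io/datamattsson/alletra-6000-and-nimble-csp:block-rwx.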
This is not meant for production use; the fix will be part of the next CSI driver release, which is TBD at this point.
@datamattsson Any update on this issue? I'm hitting this issue on Alletra 9K as well.
Alletra 5/6K and Nimble will be supported in the next release. Alletra 9K, Primera and 3PAR do not have an ETA at this time.
Fixed in v2.4.0.
When do you think OpenShift Virtualization will be supported using 3PAR?
I'd recommend creating a new issue in this repo for 3PAR, and it can be tracked separately.
> Alletra 5/6K and Nimble will be supported in the next release. Alletra 9K, Primera and 3PAR do not have an ETA at this time.

Can you clarify where we landed with this for v2.4.0? I have a customer needing Alletra 9K support.
@therevoman the 2.4.0 release that includes Nimble/Alletra5/6K has been delayed for OpenShift. You can still install the CSI driver with the Helm chart but it's unsupported by Red Hat. However, that release will be available in the next few days in the Red Hat catalog so I would encourage customers/partners to wait for that.
For backends with 3PAR pedigree (3PAR/Primera/Alletra 9K/MP) you'll have to wait for 2.4.1 which is in the works. ETA is unknown at this time.
The HPE CSI driver does not work as expected in ReadWriteMany mode using a raw block device.
During a VM live migration, the following happens.
The VM disk is a volume that is attached to 2 pods (the migration source and the destination) from steps 2 to 4. Once the migration finishes, the source pod (PA) is deleted and the VM runs on the destination (PB). The problem is that once PA shuts down, the CSI driver sets the volume offline and the destination (PB) loses access to it. Instead of running on the destination node, the VM is now paused/hung with EIO because it lost access to its disk/volume.
The problem is reproducible by starting 2 pods on 2 different nodes, sharing a raw block RWX volume. No need for live migration or virtualisation.
Once the first pod shuts down, the remaining pod loses access to the volume: the Nimble side takes the backing volume offline and the iSCSI connection is abruptly closed while the other pod is still using it.
1) Create test-pvc with volumeMode: Block and accessModes: ReadWriteMany, using the HPE CSI StorageClass (see the manifest sketch after this list).
2) Create 2 pods, running on 2 different nodes, using the shared PVC.
3) Shut down one pod.
4) The other pod loses LUN access; the LUN is offline in the HPE Nimble dashboard.
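For reference, a minimal sketch of the manifests these steps describe. The StorageClass, namespace, node, and image names are placeholders; only volumeMode: Block and ReadWriteMany come from the report.

```yaml
# Minimal reproduction sketch. StorageClass, node, and image names are
# placeholders, not taken from this issue.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteMany          # RWX
  volumeMode: Block          # raw block device
  storageClassName: hpe-standard
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: block-consumer-a
spec:
  nodeName: worker-1         # pin to a first node (placeholder)
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeDevices:
        - name: data
          devicePath: /dev/xvda   # raw block device inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc
---
apiVersion: v1
kind: Pod
metadata:
  name: block-consumer-b
spec:
  nodeName: worker-2         # pin to a second node (placeholder)
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeDevices:
        - name: data
          devicePath: /dev/xvda
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc
```

Deleting block-consumer-a should leave block-consumer-b with a working device; with the affected driver versions, the LUN goes offline instead.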
Logs from the HPE CSI driver show the request to offline the LUN that is still in use by the other pod:
It should not offline the volume, so PB does not lose access to it. The volume should only be offlined once all pods sharing it have stopped using it. Perhaps a refcount is missing.