[FEATURE] Need to support RWX of Harvester CSI Driver

guangbochen commented 2 years ago

Describe the bug: Failed to deploy NeuVector on the guest k8s cluster spun up by the Harvester RKE2 node driver.

To Reproduce Steps to reproduce the behavior:

Rancher -> Node Driver: deploy a Harvester cluster (1 master, 3 workers)
Upgrade the harvester-csi-driver Helm chart with the correct cloud config path “/var/lib/rancher/rke2/etc/config-files/cloud-provider-config”
Deploy NeuVector, which includes a 1Gi RWX PVC (https://open-docs.neuvector.com/deploying/kubernetes)
Observe that only 1 node is able to mount that 1Gi PVC

Expected behavior: NeuVector is up and running successfully.

Support bundle

Environment:

Harvester ISO version: v1.0.0
Underlying Infrastructure (e.g. Baremetal with Dell PowerEdge R630):

Additional context

guangbochen commented 1 year ago

depends on https://github.com/longhorn/longhorn/issues/2293

abonillabeeche commented 1 year ago

Both NeuVector and Epinio require RWX.

staedter commented 1 year ago

What is the status of this feature request? The longhorn dependency seems to have been implemented already, if I read that issue correctly?

We are evaluating whether we can use the Harvester CSI Driver as our only storage option or whether we need to deploy other storage options, which depends mostly on the timeline for implementing this feature.

egrist commented 1 year ago

Wondering this too, seems that Harvester 1.1.2 uses Longhorn 1.3.2 which supports RWX, I can create one in the Longhorn UI. Installed nfs-common on the worker nodes according to the Longhorn documentation and tried to create a ReadWriteMany PVC, but the harvester-csi-driver:0.1.1600 driver doesn't seem to allow it, looks like it's still present in master: https://github.com/harvester/harvester-csi-driver/blob/509123316e6150307e0f9c39b0eaef3e678ad914/pkg/csi/controller_server.go#L426C8-L426C70

Creating through Rancher 2.7.6, getting the message: failed to provision volume with StorageClass "harvester": rpc error: code = InvalidArgument desc = access mode MULTI_NODE_MULTI_WRITER is not supported
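For reference, a claim of roughly the following shape is the kind of request that produces this error on affected driver versions. This is only a minimal sketch: the claim name and the 1Gi size are illustrative assumptions; only the `harvester` StorageClass name and the ReadWriteMany access mode come from the report above.

```yaml
# Hypothetical PVC that requests RWX from the Harvester CSI driver.
# The name "rwx-test" and the size are placeholders, not taken from the report.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-test
spec:
  accessModes:
    - ReadWriteMany          # maps to the CSI mode MULTI_NODE_MULTI_WRITER
  storageClassName: harvester
  resources:
    requests:
      storage: 1Gi
```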

Vicente-Cheng commented 1 year ago

Hi @staedter, @egrist, I thought that would be introduced with Harvester v1.3.0. Longhorn already supports RWX volumes, so we will start working on it!

dff1980 commented 10 months ago

Hello! Can you please advise if there is any workaround for this issue? How can I create a ReadWriteMany (RWX) Persistent Volume for an RKE2 cluster deployed in Harvester?

joshuarestivo commented 8 months ago

Bump

kingnarmer commented 5 months ago

I am running Harvester 1.3. Is there any way to create RWX volumes on a k8s cluster provisioned on Harvester?

PatrickLaabs commented 4 months ago

Greetings everyone, are there any updates on this?

web-engineer commented 4 months ago

We've just hit this issue. We're currently running 1.2.1 and were looking at the upgrade path, but if RWX still doesn't work I'm not sure we want to rush things. What's the state of play? It took us a while to spot what was happening here. For anyone else who assumed ReadWriteMany would work: describing your PVC will show the detail you need, such as the "not supported" message:

e.g. kubectl describe persistentvolumeclaim <volume>

Check the messages for "failed to provision volume with StorageClass "harvester": rpc error: code = InvalidArgument desc = access mode MULTI_NODE_MULTI_WRITER is not supported"

As mentioned above, we worked around this by provisioning an NFS VM in Harvester and then mounting NFS volumes into our cluster, but it means running additional VMs and an extra abstraction over the filesystem to support a feature that Longhorn already provides... frustrating...
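A rough sketch of what such an NFS-backed workaround can look like inside the guest cluster, using a statically provisioned PersistentVolume. The server address, export path, object names, and size below are all placeholders (assumptions for illustration), not details from our setup.

```yaml
# Hypothetical static PV pointing at an NFS export served by a separate VM.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-shared-pv            # placeholder name
spec:
  capacity:
    storage: 10Gi                # placeholder size
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""           # empty class: bind statically, no dynamic provisioning
  nfs:
    server: 192.0.2.10           # placeholder address of the NFS VM
    path: /exports/shared        # placeholder export path
---
# Claim that binds to the PV above by name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-shared-pvc           # placeholder name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: nfs-shared-pv
  resources:
    requests:
      storage: 10Gi
```

The guest cluster nodes need an NFS client installed for the mount to succeed.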

Any news?

Vicente-Cheng commented 4 months ago

Hi folks, sorry for the late update. This feature is planned for the v1.4.0. We are currently working on it.

Vicente-Cheng commented 4 months ago

Hi @web-engineer

> As mentioned above, we worked around this by provisioning an NFS VM in Harvester and then mounting NFS volumes into our cluster, but it means running additional VMs and an extra abstraction over the filesystem to support a feature that Longhorn already provides... frustrating...

Did you mean that you provision an NFS VM and the guest cluster VMs mount this NFS endpoint for the workload pods? Or, in your case, is it a VM on Harvester itself that needs to use the NFS RWX volume?

web-engineer commented 3 months ago

I've got RWX working if the volume is from NFS. However, I expected that volumes would be mountable RWX without running another service, since Longhorn supports this more "natively".

harvesterhci-io-github-bot commented 2 months ago

Pre Ready-For-Testing Checklist

[x] If labeled: require/HEP Has the Harvester Enhancement Proposal PR submitted? The HEP PR is at: https://github.com/harvester/harvester/pull/5861
[x] Where is the reproduce steps/test steps documented? The reproduce steps/test steps are at:

Test plan:

create Harvester cluster (v1.4.0-rc2) and check the networkfs-manager pod is running. ~You need to patch the harvester managedchart as below~
(Optional) Enable Storage Network
(Optional) Enable Storage Network for RWX volume (on Longhorn UI)

Create Longhorn RWX SC (as below) on Harvester (Host Cluster) side

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-rwx
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "vers=4.2,noresvport,softerr,timeo=600,retrans=5"

Create downstream cluster (If you want to verify the storage network path, you need to add it to the downstream cluster VM). Also, the downstream cluster guest OS should have nfs client packages.
Check the Harvester CSI driver image version. It should be v0.2.0 or later. (If not, you need to manually update the image)

Create a new SC on the downstream cluster that points to the above Longhorn RWX SC (as below)

allowVolumeExpansion: false
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rwx-sc
parameters:
  hostStorageClass: longhorn-rwx
provisioner: driver.harvesterhci.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

Create RWX volume with the above SC
Verify the RWX volume (i.e. attach it to multiple pods and use it)
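The "create RWX volume" step can be sketched as a claim against the downstream `rwx-sc` class. The claim name `rwx-pvc`, the `default` namespace, and the 8Gi size mirror the verification output later in this thread; treat the manifest as an illustrative sketch rather than the exact one used in testing.

```yaml
# RWX claim on the downstream (guest) cluster, backed by the rwx-sc StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany            # the access mode this feature adds
  storageClassName: rwx-sc     # downstream SC that maps to longhorn-rwx on the host
  resources:
    requests:
      storage: 8Gi
```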

Reference the test plan on RWX support and stability improvement

[x] Is there a workaround for the issue? If so, where is it documented? The workaround is at:

None

[x] Have the backend code been merged (harvester, harvester-installer, etc) (including backport-needed/*)? The PR is at: https://github.com/harvester/harvester-csi-driver/pull/43
- [x] Does the PR include the explanation for the fix or the feature?
- [ ] Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart? The PR for the YAML change is at: TBD The PR for the chart change is at:
[ ] If labeled: area/ui Has the UI issue filed or ready to be merged? The UI issue/PR is at:
[ ] If labeled: require/doc, require/knowledge-base Has the necessary document PR submitted or merged? The documentation/KB PR is at: TBD

~* [ ] If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue?

The automation skeleton PR is at:
The automation test case PR is at:~

~* [ ] If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility? The compatibility issue is filed at:~

harvesterhci-io-github-bot commented 2 months ago

Automation e2e test issue: harvester/tests#1486

lknite commented 1 month ago

watching with enthusiasm ... if you have a pod 1.0 and you update to 2.0, and there is a pvc involved, the 2.0 has to first get access to the pvc before it can become ready, while 1.0 will not release the pvc and terminate until 2.0 is ready, locked they are, forever waiting on each other ... when using rwo
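The situation can be sketched with a Deployment manifest. With an RWO claim and the default RollingUpdate strategy, a new pod scheduled to another node may wait on a volume the old pod still holds. One commonly used option while staying on RWO is the `Recreate` strategy, which stops the old pods before creating new ones, at the cost of brief downtime. The names `app` and `app-data` and the image are hypothetical.

```yaml
# Hypothetical Deployment that avoids the RWO rolling-update deadlock
# by terminating the old pod before starting the new one.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  strategy:
    type: Recreate             # old pod releases the RWO volume first
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: nginx:stable  # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-data  # placeholder RWO claim
```

With an RWX claim the default rolling update should not hit this particular wait, since both pods can mount the volume at once.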

Vicente-Cheng commented 1 month ago

> watching with enthusiasm ... if you have a pod 1.0 and you update to 2.0, and there is a pvc involved, the 2.0 has to first get access to the pvc before it can become ready, while 1.0 will not release the pvc and terminate until 2.0 is ready, locked they are, forever waiting on each other ... when using rwo

Hi @lknite, could you explain this more?

Are you describing the RWO case when updating the pod from 1.0 to 2.0?

TachunLin commented 4 weeks ago

Test RWX sc without storage network

Verified fixed on v1.4.0-rc2 with Rancher v2.8.8 (csi-driver 0.1.19) and v2.9.2 (csi-driver 0.2.0). Close this issue.

Result

$\color{green}{\textsf{PASS}}$ Rancher v2.8.8 - Create RWX volume on downstream cluster $~~$

1. ✅ The networkfs-manager pod is running on Harvester `v1.4.0-rc2`
   ```
   harv140rc2:~ # kubectl get pods -A | grep networkfs-manager
   harvester-system   harvester-networkfs-manager-74xrl   1/1   Running   3 (7m11s ago)   24h
   ```
2. ✅ Can create Longhorn RWX storage class on Harvester
   ```
   harv140rc2:~ # kubectl create -f rwx_sc.yaml
   storageclass.storage.k8s.io/longhorn-rwx created
   harv140rc2:~ # kubectl get storageclass.storage.k8s.io/longhorn-rwx
   NAME           PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
   longhorn-rwx   driver.longhorn.io   Delete          Immediate           true                   58s
   ```
3. ✅ Can create new SC on downstream cluster to the above longhorn rwx sc
   ```
   rke2-v1304-pool1-lfcx4-5k7zp:/var/lib/rancher/rke2/bin # ./kubectl create -f rwx-sc.yaml
   storageclass.storage.k8s.io/rwx-sc created
   rke2-v1304-pool1-lfcx4-5k7zp:/var/lib/rancher/rke2/bin # ./kubectl get storageclass.storage.k8s.io/rwx-sc
   NAME     PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
   rwx-sc   driver.harvesterhci.io   Delete          Immediate           false                  2m5s
   ```
   ![image](https://github.com/user-attachments/assets/53d5bb5d-6a04-48da-8300-a0b3454a6719)
4. ✅ Can create RWX volume with the new `rwx-sc` storage class
   ![image](https://github.com/user-attachments/assets/c0c20bfd-3c58-4c3b-b51c-be8a70bc1681)
5. ✅ The RWX volume can be attached to the multiple pod
   - Attach rwx volume on the first nginx deployment
     ![image](https://github.com/user-attachments/assets/6d3a14f1-f6aa-4b05-82d8-5b61baa18dd9)
   - Attach rwx volume on the second nginx deployment
     ![image](https://github.com/user-attachments/assets/2decbd90-6f21-4e8d-a07f-0f5c6cb8910f)
   - Attach rwx volume on the third nginx deployment
     ![image](https://github.com/user-attachments/assets/bae2ab34-f1b5-4483-842d-da695bca2a90)
   - All nginx deployment are running well
     ![image](https://github.com/user-attachments/assets/b1fe827c-9036-4c83-90b0-551c78b52cff)
6. ✅ The `rwx-pvc` PVC and related PV in RWX mode
   ```
   rke2-v12813-pool1-3f8b4823-vw5wh:~ # /var/lib/rancher/rke2/bin/kubectl get pvc rwx-pvc
   NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
   rwx-pvc   Bound    pvc-827b88c0-b02d-4d7e-a23a-39f32921da2f   8Gi        RWX            rwx-sc         29m
   ```
   ```
   rke2-v12813-pool1-3f8b4823-vw5wh:~ # /var/lib/rancher/rke2/bin/kubectl get pv
   NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
   pvc-827b88c0-b02d-4d7e-a23a-39f32921da2f   8Gi        RWX            Delete           Bound    default/rwx-pvc   rwx-sc                  57m
   ```
7. ✅ Can attach the RWX pvc to multiple pods
   ![image](https://github.com/user-attachments/assets/3cee3338-6da4-47de-a527-3222bbaaff8a)
   ![image](https://github.com/user-attachments/assets/aee897e9-01ba-4a25-a544-535a9c8d26eb)
   ![image](https://github.com/user-attachments/assets/32969e60-e494-4b81-8788-910079b22244)
   ![image](https://github.com/user-attachments/assets/9c540812-46f5-4e43-80ab-47bc972327fd)
8. ✅ Check the RWX volume (PVC) can be attached to multiple pods and write file to it.
   ![image](https://github.com/user-attachments/assets/04be2eb4-0b07-482f-b726-6a89e287f4fe)
   ```
   root@nginx-pod2-mount:/mnt/data# echo "nginx2 nginx2 nginx2" >> nginx2.txt
   root@nginx-pod2-mount:/mnt/data# ls
   lost+found  nginx1.txt  nginx2.txt
   root@nginx-pod2-mount:/mnt/data#
   ```
   ```
   root@nginx-pod3-mount:/mnt/data2# ls
   lost+found  nginx1.txt  nginx2.txt
   ```

$\color{green}{\textsf{PASS}}$ Rancher v2.9.2 - Basic csi driver functionality test $~~$

1. Can create nginx deployment with a new RWX PVC
   ![image](https://github.com/user-attachments/assets/247b2806-e2fd-4243-9ee1-f8c97d7ddf36)
2. The new rwx PVC can be created
   ![image](https://github.com/user-attachments/assets/ad3e9924-098f-44a8-8e5e-abca8974a42a)
3. Also the new rwx PV be created
   ![image](https://github.com/user-attachments/assets/dec934d0-a4d3-4928-adf3-97d3e2e57679)
4. Can create the corresponding volume on the Harvester side
   ![image](https://github.com/user-attachments/assets/1048f1cb-3c20-4182-afee-fe9d9400e414)

$\color{green}{\textsf{PASS}}$ Rancher v2.9.2 - Create RWX volume on downstream cluster $~~$

1. ✅ The networkfs-manager pod is running on Harvester `v1.4.0-rc2`
   ```
   harv140rc2:~ # kubectl get pods -A | grep networkfs-manager
   harvester-system   harvester-networkfs-manager-74xrl   1/1   Running   3 (7m11s ago)   24h
   ```
2. ✅ Can create Longhorn RWX storage class on Harvester
   ```
   harv140rc2:~ # kubectl create -f rwx_sc.yaml
   storageclass.storage.k8s.io/longhorn-rwx created
   harv140rc2:~ # kubectl get storageclass.storage.k8s.io/longhorn-rwx
   NAME           PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
   longhorn-rwx   driver.longhorn.io   Delete          Immediate           true                   58s
   ```
3. ✅ Can create new SC on downstream cluster to the above longhorn rwx sc
   ```
   rke2-v1304-pool1-lfcx4-5k7zp:/var/lib/rancher/rke2/bin # ./kubectl create -f rwx-sc.yaml
   storageclass.storage.k8s.io/rwx-sc created
   rke2-v1304-pool1-lfcx4-5k7zp:/var/lib/rancher/rke2/bin # ./kubectl get storageclass.storage.k8s.io/rwx-sc
   NAME     PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
   rwx-sc   driver.harvesterhci.io   Delete          Immediate           false                  2m5s
   ```
4. ✅ Can create RWX volume with the new `rwx-sc` storage class
   ![image](https://github.com/user-attachments/assets/23eece0e-4f8e-4b22-bed3-a9068cd28fe7)
5. ✅ The RWX volume can be attached to the multiple pod
   - Attach rwx volume on the first nginx deployment
     ![image](https://github.com/user-attachments/assets/d419db8a-e170-4eaf-a8f2-d2341502580f)
   - Attach rwx volume on the second nginx deployment
     ![image](https://github.com/user-attachments/assets/e71321ff-acb6-47d7-a445-b56944b2ff77)
   - Attach rwx volume on the third nginx deployment
     ![image](https://github.com/user-attachments/assets/3d1878f3-9cbf-4d1e-97d1-ca0bb06b7b31)
   - All nginx deployment are running well
     ![image](https://github.com/user-attachments/assets/e68a54ed-2148-49af-9a1c-75c2ff1c46ce)
6. ✅ The `rwx-pvc` PVC and related PV in RWX mode
   ```
   rke2-v1304-pool1-8rw8t-6pb46:~ # /var/lib/rancher/rke2/bin/kubectl get pvc rwx-pvc
   NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
   rwx-pvc   Bound    pvc-50cee269-22b3-40b5-827b-70bd5cb0a258   8Gi        RWX            rwx-sc                                 12m
   ```
   ```
   rke2-v1304-pool1-8rw8t-6pb46:~ # /var/lib/rancher/rke2/bin/kubectl get pv
   NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
   pvc-50cee269-22b3-40b5-827b-70bd5cb0a258   8Gi        RWX            Delete           Bound    default/rwx-pvc   rwx-sc                                          12m
   ```
7. ✅ Can attach the RWX pvc to multiple pods
   ![image](https://github.com/user-attachments/assets/7d68e7ee-d00a-44be-bb19-e9a1d93fb97d)
   ![image](https://github.com/user-attachments/assets/6558b380-8011-4432-87cf-e80238f6fe99)
   ![image](https://github.com/user-attachments/assets/224c2de4-c268-48b4-843f-2a2794e21215)
   ![image](https://github.com/user-attachments/assets/e31f82fc-f4ff-4797-af82-bc870e953b9d)
8. ✅ Check the RWX volume (PVC) can be attached to multiple pods and write file to it.
   ![image](https://github.com/user-attachments/assets/259f6e98-9e90-4c53-8617-cab900e787ca)
   ```
   root@nginx1:/# df -h
   Filesystem                                              Size  Used Avail Use% Mounted on
   overlay                                                  40G   12G   29G  29% /
   tmpfs                                                    64M     0   64M   0% /dev
   tmpfs                                                   2.0G     0  2.0G   0% /sys/fs/cgroup
   10.53.20.19:/pvc-aee7dc09-c819-42b7-9d70-dcb7d2cdae0d   7.8G     0  7.8G   0% /mnt/data1
   /dev/vdb3                                                40G   12G   29G  29% /etc/hosts
   shm                                                      64M     0   64M   0% /dev/shm
   tmpfs                                                   3.9G   12K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
   tmpfs                                                   2.0G     0  2.0G   0% /proc/acpi
   tmpfs                                                   2.0G     0  2.0G   0% /proc/scsi
   tmpfs                                                   2.0G     0  2.0G   0% /sys/firmware
   tmpfs                                                   2.0G     0  2.0G   0% /sys/devices/virtual/powercap
   root@nginx1:/# cd /mnt/data1/
   root@nginx1:/mnt/data1# echo "nginx1 nginx1 nginx1" >> nginx1.txt
   root@nginx1:/mnt/data1# ls
   lost+found  nginx1.txt
   root@nginx1:/mnt/data1# ls
   lost+found  nginx1.txt  nginx2.txt  nginx3.txt
   root@nginx1:/mnt/data1#
   ```
   ```
   root@nginx2:/mnt/data2# ls
   lost+found  nginx1.txt
   root@nginx2:/mnt/data2# echo "nginx2 nginx2 nginx2" >> nginx2.txt
   root@nginx2:/mnt/data2# ls
   lost+found  nginx1.txt  nginx2.txt
   root@nginx2:/mnt/data2#
   ```
   ```
   root@nginx3:/# cd /mnt/data3/
   root@nginx3:/mnt/data3# ls
   lost+found  nginx1.txt  nginx2.txt
   root@nginx3:/mnt/data3# echo "nginx3 nginx3 nginx3" >> nginx3.txt
   root@nginx3:/mnt/data3# ls
   lost+found  nginx1.txt  nginx2.txt  nginx3.txt
   root@nginx3:/mnt/data3#
   ```

$\color{green}{\textsf{PASS}}$ Rancher v2.9.2 - Basic csi driver functionality test $~~$

1. Can create nginx deployment with a new RWX PVC
   ![image](https://github.com/user-attachments/assets/ec384ae1-79a1-41e5-85fd-82f70f2b6889)
2. The new rwx PVC can be created
   ![image](https://github.com/user-attachments/assets/a74d3c57-1be6-4aae-a193-75ca393fa402)
3. Also the new rwx PV be created
   ![image](https://github.com/user-attachments/assets/16f717e6-fb00-476c-ade9-05d659080586)
4. Can create the corresponding volume on the Harvester side
   ![image](https://github.com/user-attachments/assets/7e00d008-ea37-4995-9c9e-fe8321d2cc98)

Test Information

Test Environment: Single-node Harvester on Equinix bare-metal machines
Harvester version: v1.1-340486a3-head (23/11/20)
Rancher version: v2.8.8, v2.9.2
RKE2 version: v1.28.13, v1.30.4

Verify Steps

Test RWX sc without storage network

1. Create Harvester cluster (v1.4.0-rc2) and check the networkfs-manager pod is running.
3. Create Longhorn RWX SC (as below) on Harvester (Host Cluster) side
   ```
   kind: StorageClass
   apiVersion: storage.k8s.io/v1
   metadata:
     name: longhorn-rwx
   provisioner: driver.longhorn.io
   allowVolumeExpansion: true
   reclaimPolicy: Delete
   volumeBindingMode: Immediate
   parameters:
     numberOfReplicas: "3"
     staleReplicaTimeout: "2880"
     fromBackup: ""
     fsType: "ext4"
     nfsOptions: "vers=4.2,noresvport,softerr,timeo=600,retrans=5"
   ```
4. Create downstream cluster
6. Add the following repository and branch in the Apps -> Charts
   - Repository: https://github.com/Vicente-Cheng/rancher-charts
   - Branch: bump-harvester-csi-driver-to-v2.8
   - Branch: bump-harvester-csi-driver-to-v2.9
7. Upgrade csi-driver from `0.1.18` to `0.1.19` (with 0.2.0 image)
8. Access the downstream cluster vm
10. Install the nfs client service
11. Create new SC on downstream cluster to the above longhorn rwx sc (as below)
    ```
    allowVolumeExpansion: false
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: rwx-sc
    parameters:
      hostStorageClass: longhorn-rwx
    provisioner: driver.harvesterhci.io
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
    ```
12. Access the PersistentVolumeClaim page in RKE2 cluster
13. Create a pvc, and select the `rwx-sc` storage class
14. Select Many Nodes-Read-Write option in the Customize page
15. Check the pvc can create correctly
16. Access the workload -> Pods page
17. Create the `nginx-pod2-mount` pod
18. Access the pod's storage and set existing pvc with the `rwx-pvc` created
    ![image](https://github.com/user-attachments/assets/4c3f1516-1e08-48d7-8629-eef3a52c267c)
19. Access the pod's storage and set the storage to the `rwx-pvc`
20. Provide the mount point path
    ![image](https://github.com/user-attachments/assets/711c1cd8-00df-4c32-b51b-2a1bb8da1a2d)
21. Check can create the pod well in running
22. Access the pod in the /mnt/data page
23. Write a new file named nginx1.txt with some content
24. Create the `nginx-pod3-mount` pod
25. Repeat steps 15 - 18
26. Write a new file named nginx2.txt with some content
27. Access the `nginx-pod2-mount` in the /mnt/data2 path
28. Check both the `nginx1.txt` and `nginx2.txt` file exists
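The pod steps above were done through the Rancher UI. As a rough manifest equivalent (a sketch only, not what was actually applied), two pods can mount the same `rwx-pvc` claim. The pod names and mount paths follow the shell prompts in the results; the container name, volume name, and `nginx` image tag are assumptions.

```yaml
# Two pods sharing one RWX claim; each mounts it at its own path.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod2-mount
spec:
  containers:
    - name: nginx              # assumed container name
      image: nginx             # assumed image
      volumeMounts:
        - name: shared
          mountPath: /mnt/data
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: rwx-pvc
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod3-mount
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: shared
          mountPath: /mnt/data2
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: rwx-pvc     # same claim as the first pod
```

A file written under `/mnt/data` in the first pod should then be visible under `/mnt/data2` in the second.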

Vicente-Cheng commented 4 weeks ago

Hi @TachunLin Did you replace the harvester csi driver? We did not bump the new harvester csi driver chart, so you need to replace it manually.

UPDATE: The corresponding version of harvester csi driver should be v0.2.0

Thanks!

TachunLin commented 3 weeks ago

Thanks, Vicente, for the reminder. After setting the repository and upgrading the csi-driver to 0.1.19 (with the 0.2.0 image), we can correctly create the RWX volume, attach it to multiple pods, and write files accordingly.
