kubernetes-sigs / vsphere-csi-driver

vSphere storage Container Storage Interface (CSI) plugin
https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/index.html
Apache License 2.0

CSI driver will not work in default configuration with topology enabled in provisioner #2970

Open gnufied opened 1 month ago

gnufied commented 1 month ago

Now that https://github.com/kubernetes-csi/external-provisioner/pull/1167 has been merged, the topology feature is enabled by default in csi-provisioner.

Because the vSphere CSI driver reports the topology capability by default (pkg/csi/service/identity.go:65), even when the cluster has no topology configured, all volume provisioning operations will fail.

cc @divyenpatel @xing-yang @jingxu97

gnufied commented 1 month ago

One possible solution is to not report the topology capability in clusters where no topology information is configured or available. This would allow the driver to work out of the box with the current version of csi-provisioner. Alternatively, I have considered emitting topology information even in single-zone clusters, but that would require quite a few changes and is also a manual process, so clusters would break on upgrade.

My personal preference would be a CLI flag that can be specified when starting the driver.
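
A minimal sketch of what gating the capability behind a flag could look like. This is not the driver's actual code: the `Capability` type is a simplified stand-in for the CSI plugin capability types, and the flag name `enable-topology` and the helper `pluginCapabilities` are hypothetical names chosen for illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// Capability is a simplified stand-in for the CSI plugin capability
// service types; the real driver uses the generated CSI spec types.
type Capability string

const (
	ControllerService              Capability = "CONTROLLER_SERVICE"
	VolumeAccessibilityConstraints Capability = "VOLUME_ACCESSIBILITY_CONSTRAINTS"
)

// pluginCapabilities (hypothetical helper) returns the capabilities the
// identity service would advertise. The topology capability is included
// only when the operator has explicitly enabled it.
func pluginCapabilities(topologyEnabled bool) []Capability {
	caps := []Capability{ControllerService}
	if topologyEnabled {
		caps = append(caps, VolumeAccessibilityConstraints)
	}
	return caps
}

func main() {
	// "enable-topology" is an assumed flag name, not an existing driver flag.
	topology := flag.Bool("enable-topology", false,
		"advertise VOLUME_ACCESSIBILITY_CONSTRAINTS to the CSI sidecars")
	flag.Parse()

	fmt.Println(pluginCapabilities(*topology))
}
```

With the flag defaulting to off, a cluster without topology configuration would not advertise the capability, so csi-provisioner would not try to build accessibility requirements for it.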

gnufied commented 1 month ago

Another thing: disabling the topology feature in csi-provisioner is apparently not enough. With the latest version of csi-provisioner, the vSphere CSI driver is unable to delete in-tree vSphere PVs - https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_vmware-vsphere-csi-driver-operator/241/pull-ci-openshift-vmware-vsphere-csi-driver-operator-master-e2e-vsphere/1816155462064672768

jsafrane commented 1 month ago

~The reason is that the CSI driver is not idempotent. When processing a migrated in-tree volume, it gets the first DeleteVolume request and succeeds. But then the provisioner sends the same DeleteVolume request again (https://github.com/kubernetes-csi/external-provisioner/issues/1235) and the CSI driver returns failure instead of success.~

~Sure, the provisioner should not always call DeleteVolume twice, and we're going to fix that. Still, it's a bug in the driver that it's not idempotent. The external-provisioner can send DeleteVolume multiple times; that is allowed in CSI and it will happen, e.g. during container updates or node drains.~

Moved to https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/2981
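
For reference, the idempotent behavior CSI expects from DeleteVolume can be sketched as below. This is an illustration only, not the driver's code: `fakeBackend`, `errNotFound`, and `deleteVolume` are hypothetical names, and the in-memory map stands in for the real storage backend.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel the backend returns when the
// volume does not exist (for example, because it was already deleted).
var errNotFound = errors.New("volume not found")

// fakeBackend is an in-memory stand-in for the storage backend.
type fakeBackend struct {
	volumes map[string]bool
}

// remove deletes the volume from the backend, or reports errNotFound.
func (b *fakeBackend) remove(id string) error {
	if !b.volumes[id] {
		return errNotFound
	}
	delete(b.volumes, id)
	return nil
}

// deleteVolume treats "already gone" as success, so a repeated
// DeleteVolume call for the same volume ID does not fail.
func deleteVolume(b *fakeBackend, id string) error {
	err := b.remove(id)
	if errors.Is(err, errNotFound) {
		return nil
	}
	return err
}

func main() {
	b := &fakeBackend{volumes: map[string]bool{"vol-1": true}}
	// The second call targets a volume that no longer exists.
	fmt.Println(deleteVolume(b, "vol-1"), deleteVolume(b, "vol-1"))
}
```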