kubernetes-sigs / vsphere-csi-driver

vSphere storage Container Storage Interface (CSI) plugin
https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/index.html
Apache License 2.0

CSI Driver 2.5.4: error "failed to get shared datastores in kubernetes cluster" #2377

Open jsoule6 opened 1 year ago

jsoule6 commented 1 year ago

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

What happened:

Installed the CSI driver and Cloud Controller Manager on a K8s cluster running on vSphere VMs. Everything installed cleanly, the driver connects to vSphere, and it successfully retrieves node information. However, when we try to deploy a pod on the cluster using the new StorageClass we created, we get the following error:

failed to get shared datastores in kubernetes cluster. Error: no shared datastores found for nodeVm.

We have a single vCenter and are not using a topology-aware setup. We have checked the permissions on the vSphere side for the account we are using and everything looks good. The only thing we can think of that might be causing this is that the ESXi host running the control plane node does not have access to the same datastore that all the worker nodes do. We tried applying and using a Storage Policy as well, but got the same result.
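
For reference, a StorageClass of the kind described above looks roughly like the following sketch (placeholder names only, not the exact manifest from this cluster; storagepolicyname and datastoreurl are both optional ways to steer placement):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-csi-sc                          # placeholder name
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "k8s-storage-policy"       # placeholder SPBM policy name
  # datastoreurl: "ds:///vmfs/volumes/<uuid>/"  # alternative: pin provisioning to one datastore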

Is it a requirement that all nodes, including the control plane, have access to at least one shared datastore?

What you expected to happen:

I would expect the PVC to be created.

How to reproduce it (as minimally and precisely as possible):

Create a K8s cluster with CSI driver 2.5.4 and the other versions listed in the Environment section below, then create a PVC using the new StorageClass.

Anything else we need to know?:

Environment: Using the following versions:

vSphere: 6.7 Update 3
Kubernetes: 1.21
Cloud Controller Manager: 1.21
CSI Driver: 2.5.4

divyenpatel commented 1 year ago

The only thing we can think of that might be causing this is that the host that the Control Plane node is on does not have access to the same datastore that the Worker nodes all do.

This is why the driver is not able to find a shared datastore accessible from all node VMs. There must be at least one datastore accessible to every node in the cluster, including the control plane nodes.
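
One way to verify this from the vSphere side is to check which ESXi hosts have each candidate datastore mounted, for example with govc (inventory paths below are placeholders for your environment):

# Enumerate the ESXi hosts that run the Kubernetes node VMs
govc ls /MyDatacenter/host/MyCluster

# Show which hosts have a given datastore mounted (its "host" property)
govc object.collect -json /MyDatacenter/datastore/MyDatastore host

Every host that runs a node VM, control plane included, should appear in the mount list of at least one common datastore.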

sba30 commented 1 year ago

We encountered the same issue. When we deploy our workers to a single vSphere cluster using vSAN storage it works fine, but when we split the workers across two vSphere clusters, each with its own vSAN storage, we get the same error when creating the PVC.

Is there a way in this setup for the PVC to only go to one of the vSphere clusters and its vSAN storage?

In our setup it is not possible for the two vSphere clusters to have shared storage; they each have their own vSAN storage.

divyenpatel commented 1 year ago

@sba30 you can define topology on the nodes and use the volume topology feature to provision volumes on a specific vSphere cluster: https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-162E7582-723B-4A0F-A937-3ACE82EAFD31.html
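
As a rough illustration, a topology-aware StorageClass restricted to one zone (i.e. one vSphere cluster) can look like the sketch below; the zone value is a placeholder and the exact topology key depends on how topology is configured for your driver version:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsan-cluster-1                        # placeholder name
provisioner: csi.vsphere.vmware.com
volumeBindingMode: WaitForFirstConsumer       # delay provisioning until a pod is scheduled
allowedTopologies:
- matchLabelExpressions:
  - key: topology.csi.vmware.com/k8s-zone     # topology key; exact label depends on your setup
    values:
    - zone-a                                  # placeholder zone mapped to one vSphere cluster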

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

divyenpatel commented 5 months ago

/remove-lifecycle rotten

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

YuanPeterGao commented 2 months ago

Is there a way in this setup for the PVC to only go to one of the vSphere clusters and its vSAN storage?

We have a similar request for the CSI driver to place PVCs only on a subset of worker nodes. In our use case, control plane and worker nodes are in different vSphere datacenters, and we only need volumes on the worker nodes.

@sba30 you can define topology on the nodes and use the volume topology feature to provision volumes on a specific vSphere cluster: https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-162E7582-723B-4A0F-A937-3ACE82EAFD31.html

It is mentioned that "If you already use vSphere Container Storage Plug-in to run applications, but haven't used the topology feature, you must re-create the entire cluster and delete any existing PVCs in the system to be able to use the topology feature." Since we are looking to migrate existing product clusters to the setup described above, re-creating the entire cluster is likely not an option for us.

jingxu97 commented 2 months ago

@xing-yang @divyenpatel Could you please help take a look at this issue? Thank you!

YuanPeterGao commented 2 months ago

To add more detail: we are looking for support to exclude certain nodes in the cluster from the CSI driver, so that PVs are never attached to those nodes.

For our use case, we plan to add new nodes in one vSphere datacenter to an existing cluster whose current nodes are all in another vSphere datacenter and where the CSI driver is functional.
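
One workaround sometimes used for this (not an official driver feature, and whether the controller then drops those nodes from its shared-datastore calculation may depend on the driver version) is to restrict the CSI node-plugin DaemonSet to the intended workers via a node label. A sketch, where the label name is a placeholder and the DaemonSet name/namespace are those from the standard deployment manifests:

# Label only the workers that should ever receive vSphere volumes (placeholder label)
kubectl label node worker-1 worker-2 storage.example.com/vsphere=true

# Restrict the vSphere CSI node plugin to those nodes
kubectl -n vmware-system-csi patch daemonset vsphere-csi-node --type=merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"storage.example.com/vsphere":"true"}}}}}'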

jingxu97 commented 1 month ago

Learned that this change https://github.com/kubernetes-sigs/vsphere-csi-driver/pull/2412 by @gn will allow day-2 topology operations, so we do not need to recreate the whole cluster.

@YuanPeterGao

GN commented 1 month ago

Learned that this change https://github.com/kubernetes-sigs/vsphere-csi-driver/pull/2412 by @gn will allow day-2 topology operations, so we do not need to recreate the whole cluster.

@YuanPeterGao

(image attachment: NotTheDroids.png)

k8s-triage-robot commented 3 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

agowa commented 1 week ago

/remove-lifecycle rotten

Issue still exists. I just hit it today...

Environment:

vSphere Client Version: 8.0.2.00300
CSI Driver: v4.0.0_vmware.1
Kubernetes: v1.28.7+vmware.1

# kubectl version --output=yaml
clientVersion:
  buildDate: "2023-06-27708:59: 392"
  compiler: gc
  gitCommit: 094a5d36c0a04fec8700031caf9c63cec5fda2c8
  gitTreeState: clean
  gitVersion: v1.27.2+vmware.2
  goVersion: go1.20.4
  major: "1"
  minor: "27"
  platform: linux/amd64
kustomizeVersion: v5.0.1
serverVersion:
  buildDate: "2024-02-19T11:29: 19"
  compiler: gc
  gitCommit: 1630090845297a4603596750ce2833d35761bfe
  gitTreeState: clean
  gitVersion: v1.28.7+vmware.1
  goVersion: go1.21.7
  major: "1"
  minor: "28"
  platform: linux/amd64