kubernetes-sigs / vsphere-csi-driver

vSphere storage Container Storage Interface (CSI) plugin
https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/index.html
Apache License 2.0

Bug Report - vsphere-csi-driver disabled px storage cluster #2562

Closed: ibrassfield closed this issue 7 months ago

ibrassfield commented 1 year ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: I have an OpenShift 4.12 baremetal cluster with a mix of vSphere VMs and baremetal nodes. The baremetal nodes are configured as Portworx storage providers, so I do not want the vSphere CSI driver to interact with those specific worker nodes. I want to apply the vSphere CSI driver only to the infrastructure nodes in my cluster -- meaning vSphere would provide storage to only 3 specific nodes. When I tried to install the driver following the instructions, the vSphere CSI driver knocked my Portworx cluster offline, which caused a bunch of problems.

What you expected to happen: I was expecting to be able to run multiple CSI drivers in my cluster, and to apply this vSphere storage only to the infrastructure nodes, which are running in VMware.

How to reproduce it (as minimally and precisely as possible): To reproduce this error, just apply the vsphere-csi-driver configs to the cluster.

Anything else we need to know?:

Environment:

divyenpatel commented 1 year ago

cc: @gnufied

gnufied commented 1 year ago

So I assume this cluster was deployed as the baremetal cluster type when deploying OCP? Because OpenShift in 4.12 installs a vSphere CSI driver on all nodes in the cluster by default, and it can't be disabled or turned off.

Can you confirm what kind of platform integration you chose when installing OCP?

Assuming a baremetal install, it should be possible to install the vSphere driver and the Portworx driver separately (at least in theory).

> The vSphere CSI driver knocked my Portworx cluster offline, which caused a bunch of problems.

Can you elaborate? Did you set node-selectors for both controller and daemonset appropriately?
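
For reference, a minimal sketch of what that could look like, assuming the upstream manifests' defaults (the `vmware-system-csi` namespace, the `vsphere-csi-node` DaemonSet, and the `vsphere-csi-controller` Deployment) and an infra-node label of `node-role.kubernetes.io/infra: ""` -- all of these may differ on your cluster:

```sh
# Sketch only: pin the vSphere CSI pods to infra nodes via nodeSelector.
# Namespace, object names, and the infra label are assumptions taken from
# the upstream manifests -- adjust them to match your deployment.

# Node plugin DaemonSet:
oc -n vmware-system-csi patch daemonset vsphere-csi-node --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'

# Controller Deployment:
oc -n vmware-system-csi patch deployment vsphere-csi-controller --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
```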

ibrassfield commented 1 year ago

> So I assume this cluster was deployed as the baremetal cluster type when deploying OCP? Because OpenShift in 4.12 installs a vSphere CSI driver on all nodes in the cluster by default, and it can't be disabled or turned off.
>
> Can you confirm what kind of platform integration you chose when installing OCP?
>
> Assuming a baremetal install, it should be possible to install the vSphere driver and the Portworx driver separately (at least in theory).
>
> > The vSphere CSI driver knocked my Portworx cluster offline, which caused a bunch of problems.

Thanks for the response.

Yes, this is a baremetal cluster type, so there is no specific platform integration.

I did use node-selectors on the daemon sets, but maybe not on the controller; I'm not sure how to do that.

When I say it knocked my Portworx cluster offline, what I mean is that it took priority over my Portworx CSI driver and set itself as the default. Somehow that disconnected the Array Blade communication from the OpenShift cluster, which forced us to redeploy and request new licensing for the cluster.
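
For anyone hitting the same symptom: the default StorageClass flag is just an annotation, so it can be checked and moved back. A minimal sketch, using hypothetical class names (`thin-csi` for the vSphere class, `px-storage` for the Portworx class) -- substitute whatever `oc get storageclass` actually reports:

```sh
# See which class is currently marked "(default)":
oc get storageclass

# Clear the default flag on the vSphere class (placeholder name):
oc patch storageclass thin-csi -p \
  '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

# Restore the Portworx class as default (placeholder name):
oc patch storageclass px-storage -p \
  '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```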

gnufied commented 1 year ago

It is hard to say much without looking at logs and cluster configuration. I would recommend opening a ticket against OpenShift and providing all the details, such as a `must-gather`, plus the `oc adm inspect` ( https://docs.openshift.com/container-platform/4.13/cli_reference/openshift_cli/administrator-cli-commands.html#oc-adm-inspect ) output of both namespaces in which the vSphere and Portworx drivers are deployed.
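
A sketch of gathering that data, assuming the upstream `vmware-system-csi` namespace for the vSphere driver and a `portworx` namespace for Portworx (substitute the namespaces actually in use on your cluster):

```sh
# Collect the cluster-wide must-gather:
oc adm must-gather

# Inspect each driver's namespace into its own directory:
oc adm inspect ns/vmware-system-csi --dest-dir=inspect-vsphere
oc adm inspect ns/portworx --dest-dir=inspect-portworx
```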

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).

/lifecycle stale

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).

/lifecycle rotten

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).

/close not-planned

k8s-ci-robot commented 7 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/2562#issuecomment-2027129382):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.