crowne closed this issue 1 year ago
accepts the default configs (kubernetes 1.21.9+rke2r1, calico, cloud-provider=none), click create, wait for cluster to be provisioned, navigate to Apps Marketplace/Charts, install vSphere CPI, install vSphere CSI
This is not the expected way to install these charts. You should instead select "rancher-vsphere" as the cloud provider when creating the downstream cluster. This will install the version of the vsphere charts that is bundled with RKE2, and inject the appropriate cluster configuration.
Thanks Brandon, I think I avoided that option because it wasn't clear where to apply the config referred to in the message: "Configure the vSphere Cloud Provider and Storage Provider options in the Add-On Config tab."
I've seen this problem referred to here: 35777.
@SoarinFerret says that it works when the CPI and CSI YAML is added manually. I don't know how to add the YAML manually; should it be pasted into the Additional Manifest field? What is the YAML meant to look like? Is there a working sample somewhere that I could take a look at?
The Add-On Config tab appears as follows:
Yes, in the Additional Manifests section you should provide a HelmChartConfig manifest with information on your vSphere instance:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-csi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "vcsa-xxxxx"
      datacenters: "datacenter"
      username: "xxxxx"
      password: "xxxx"
      clusterId: "rke2clutest"
      configSecret:
        generate: true
    storageClass:
      datastoreURL: "ds:///vmfs/volumes/xxxxxxxxxxxxxxxxxxxxxxxx/"
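If it helps, one way to confirm the override actually reached the cluster (a generic check; the name and namespace below simply match the example above) is to look for the HelmChartConfig resource once the node is up:

kubectl -n kube-system get helmchartconfig rancher-vsphere-csi -o yaml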
@brandond Thanks for the help. I added the YAML above, modified it, and added the config for rancher-vsphere-cpi followed by a separator and then the config for rancher-vsphere-csi. However, the installation of the first controller node in the cluster doesn't complete; the logs keep scrolling with the following messages:
[INFO ] provisioning bootstrap node(s) vs2-controller-78d5fb5c46-xlmmz: waiting for cluster agent to be available
[INFO ] non-ready bootstrap machine(s) vs2-controller-78d5fb5c46-xlmmz: waiting for cluster agent to be available and join url to be available on bootstrap node
[INFO ] provisioning bootstrap node(s) vs2-controller-78d5fb5c46-xlmmz: waiting for cluster agent to be available
[INFO ] non-ready bootstrap machine(s) vs2-controller-78d5fb5c46-xlmmz: waiting for cluster agent to be available and join url to be available on bootstrap node
Do you have any hints on how I should try to solve this?
That looks like logs from rancher-system-agent, is that correct? Can you look at the logs for rke2-server and the output of head -n -1 /var/log/pods/kube-system_*/*/*.log?
Yes, the logs above are from the Rancher / Cluster / Provisioning Log screen. I've attached the logs you mentioned; apologies, they are quite verbose. I had left the server running, but I've deleted a lot of the repeated lines. head_logs.txt
I'm not seeing any logs from the vsphere CPI, although the helm chart appears to have been installed successfully. The CSI however is complaining about the vsphere host not being set, which makes me suspect that the config file you set via the HelmChartConfig is perhaps not formatted properly:
==> /var/log/pods/kube-system_vsphere-csi-node-fprtw_12d3fad4-e5df-4c06-b16e-353b09c25a6b/vsphere-csi-node/0.log <==
2022-02-28T15:52:33.169771347Z stderr F {"level":"info","time":"2022-02-28T15:52:33.169404564Z","caller":"config/config.go:373","msg":"Could not stat /etc/cloud/csi-vsphere.conf, reading config params from env","TraceId":"b2ef7e54-9e14-4a3f-ad30-1ae49c6fbc78"}
2022-02-28T15:52:33.169778911Z stderr F {"level":"error","time":"2022-02-28T15:52:33.169429535Z","caller":"config/config.go:263","msg":"no Virtual Center hosts defined","TraceId":"b2ef7e54-9e14-4a3f-ad30-1ae49c6fbc78"
2022-02-28T15:52:33.16980536Z stderr F {"level":"error","time":"2022-02-28T15:52:33.169487748Z","caller":"config/config.go:377","msg":"Failed to get config params from env. Err: no Virtual Center hosts defined","TraceId":"b2ef7e54-9e14-4a3f-ad30-1ae49c6fbc78"
Can you share the output of:
I had to install kubectl manually, the cluster isn't provisioned yet so all kubectl commands respond with
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The root cause of the problem seems to be the first error that you highlighted above, "Could not stat /etc/cloud/csi-vsphere.conf"; it then falls back to reading the config from environment vars, which obviously doesn't work.
I'm not sure why the file is not available, my Additional Manifest looks ok to me:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-cpi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "10.99.99.4"
      port: 443
      insecureFlag: "1"
      datacenters: "DC2-Loc"
      username: "vsuser@vsphere.local"
      password: "******"
      credentialsSecret:
        generate: true
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-csi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "10.99.99.4"
      port: 443
      insecureFlag: "1"
      datacenters: "DC2-Loc"
      username: "vsuser@vsphere.local"
      password: "******"
      clusterId: "vs3"
      configSecret:
        generate: true
    storageClass:
      datastoreURL: ds:///vmfs/volumes/5e4bac8b-c0242362-d279-44a8427e1bb5/
@rancher-max does this look correct to you? Any tips to offer?
I had to install kubectl manually, the cluster isn't provisioned yet so all kubectl commands respond with
The connection to the server localhost:8080 was refused - did you specify the right host or port?
You should run:
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml KUBECONFIG=/etc/rancher/rke2/rke2.yaml PATH=$PATH:/var/lib/rancher/rke2/bin
Once you've done that you should be able to use kubectl on the server.
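With those variables exported, a quick sanity check (generic kubectl, nothing vSphere-specific) should now work from the server node:

kubectl get nodes -o wide
kubectl get pods -n kube-system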
Hmmm one thing to try is possibly adding this additional part of the valuesContent to the csi config (posting larger snippet so it's clear where it goes):
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-csi
  namespace: kube-system
spec:
  valuesContent: |-
    csiController:
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
@brandond thanks for the kubectl path settings. @rancher-max I tried with the additional csiController config, but I still get the same problem.
It still looks to me like the root cause is "Could not stat /etc/cloud/csi-vsphere.conf, reading config params from env"
I also tried creating the vsphere.conf and csi-vsphere.conf files in /etc/cloud/ using the cloud-init settings instead of the Additional Manifest, and I could see them when I ssh into the VM, but the logs still give the same message as above. It subsequently occurred to me that the process writing to vsphere-csi-node/0.log is probably running in a container on the VM, so it doesn't see the files which I created with cloud-init.
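One way to confirm that theory (a sketch; the pod name will differ, but the container name vsphere-csi-node matches the log path above) is to list the CSI node pods and check for the file inside one of them, since the container only sees what is mounted into it:

kubectl -n kube-system get pods | grep vsphere-csi-node
kubectl -n kube-system exec <vsphere-csi-node-pod> -c vsphere-csi-node -- ls -l /etc/cloud/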
It looks like the vsphere-config-secret is not being created correctly. It appears to have blank default values.
Could this be because of the Additional Manifest containing a separator?
root@vs6-ctl-a15936a0-bfx78:/home/docker# kubectl get secrets -n kube-system
...
sh.helm.release.v1.rke2-coredns.v1 helm.sh/release.v1 1 153m
statefulset-controller-token-txhpd kubernetes.io/service-account-token 3 154m
ttl-after-finished-controller-token-mhn2w kubernetes.io/service-account-token 3 154m
ttl-controller-token-mgqg4 kubernetes.io/service-account-token 3 154m
vs6-ctl-a15936a0-bfx78.node-password.rke2 Opaque 1 154m
vsphere-config-secret Opaque 1 153m
vsphere-cpi-creds Opaque 2 153m
vsphere-csi-controller-token-lwtj5 kubernetes.io/service-account-token 3 153m
vsphere-csi-node-token-kvfh6 kubernetes.io/service-account-token 3 153m
root@vs6-ctl-a15936a0-bfx78:/home/docker#
root@vs6-ctl-a15936a0-bfx78:/home/docker# kubectl get secret vsphere-config-secret -n kube-system -o jsonpath="{$.data.csi-vsphere\.conf}" | base64 --decode
[Global]
cluster-id = "c-m-kkgmr27x"
user = ""
password = ""
port = "443"
insecure-flag = "1"
[VirtualCenter ""]
datacenters = ""
root@vs6-ctl-a15936a0-bfx78:/home/docker#
root@vs6-ctl-a15936a0-bfx78:/home/docker#
Additional Manifest
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-cpi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "10.99.99.4"
      port: 443
      insecureFlag: "1"
      datacenters: "DC2-Loc"
      username: "vsuser@vsphere.local"
      password: "******"
      credentialsSecret:
        generate: true
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-csi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "10.99.99.4"
      port: 443
      insecureFlag: "1"
      datacenters: "DC2-Loc"
      username: "vsuser@vsphere.local"
      password: "******"
      clusterId: "vs6"
      configSecret:
        generate: true
    csiController:
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
    storageClass:
      datastoreURL: ds:///vmfs/volumes/5e4bac8b-c0242362-d279-44a8427e1bb5/
Can you check the contents of the vsphere-cpi-creds and vsphere-config-secret secrets? It looks like the former is what the CPI chart should create.
I can confirm that vsphere-cpi-creds is OK; I decoded the values and they are correct. vsphere-config-secret looks like it has default and empty values:
vsphere-config-secret Opaque 1 9h
vsphere-cpi-creds Opaque 2 9h
vsphere-csi-controller-token-lwtj5 kubernetes.io/service-account-token 3 9h
vsphere-csi-node-token-kvfh6 kubernetes.io/service-account-token 3 9h
root@vs6-ctl-a15936a0-bfx78:~# kubectl get secret vsphere-cpi-creds -n kube-system -o jsonpath="{$}"
{"apiVersion":"v1","data":{"10.99.99.4.password":"******","10.99.99.4.username":"******"},"kind":"Secret","metadata":{"annotations":{"meta.helm.sh/release-name":"rancher-vsphere-cpi","meta.helm.sh/release-namespace":"kube-system"},"creationTimestamp":"2022-03-03T10:24:26Z","labels":{"app.kubernetes.io/managed-by":"Helm","component":"rancher-vsphere-cpi-cloud-controller-manager","vsphere-cpi-infra":"secret"},"managedFields":[{"apiVersion":"v1","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:10.99.99.4.password":{},"f:10.99.99.4.username":{}},"f:metadata":{"f:annotations":{".":{},"f:meta.helm.sh/release-name":{},"f:meta.helm.sh/release-namespace":{}},"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{},"f:component":{},"f:vsphere-cpi-infra":{}}},"f:type":{}},"manager":"helm","operation":"Update","time":"2022-03-03T10:24:26Z"}],"name":"vsphere-cpi-creds","namespace":"kube-system","resourceVersion":"733","uid":"c5e3cdfb-a4fb-4f58-91d9-11fbf4ac63be"},"type":"Opaque"}root@vs6-ctl-a15936a0-bfx78:~#
root@vs6-ctl-a15936a0-bfx78:~#
root@vs6-ctl-a15936a0-bfx78:~# kubectl get secret vsphere-config-secret -n kube-system -o jsonpath="{$}"
{"apiVersion":"v1","data":{"csi-vsphere.conf":"W0dsb2JhbF0KY2x1c3Rlci1pZCA9ICJjLW0ta2tnbXIyN3giCnVzZXIgPSAiIgpwYXNzd29yZCA9ICIiCnBvcnQgPSAiNDQzIgppbnNlY3VyZS1mbGFnID0gIjEiCgpbVmlydHVhbENlbnRlciAiIl0KZGF0YWNlbnRlcnMgPSAiIgo="},"kind":"Secret","metadata":{"annotations":{"meta.helm.sh/release-name":"rancher-vsphere-csi","meta.helm.sh/release-namespace":"kube-system"},"creationTimestamp":"2022-03-03T10:24:26Z","labels":{"app.kubernetes.io/managed-by":"Helm"},"managedFields":[{"apiVersion":"v1","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:csi-vsphere.conf":{}},"f:metadata":{"f:annotations":{".":{},"f:meta.helm.sh/release-name":{},"f:meta.helm.sh/release-namespace":{}},"f:labels":{".":{},"f:app.kubernetes.io/managed-by":{}}},"f:type":{}},"manager":"helm","operation":"Update","time":"2022-03-03T10:24:26Z"}],"name":"vsphere-config-secret","namespace":"kube-system","resourceVersion":"689","uid":"eb7697fb-7b85-400f-9a8a-70ad89d96612"},"type":"Opaque"}root@vs6-ctl-a15936a0-bfx78:~#
root@vs6-ctl-a15936a0-bfx78:~#
root@vs6-ctl-a15936a0-bfx78:~# kubectl get secret vsphere-config-secret -n kube-system -o jsonpath="{$.data.csi-vsphere\.conf}" | base64 --decode
[Global]
cluster-id = "c-m-kkgmr27x"
user = ""
password = ""
port = "443"
insecure-flag = "1"
[VirtualCenter ""]
datacenters = ""
root@vs6-ctl-a15936a0-bfx78:~#
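For comparison, based on the values in the HelmChartConfig above, I would expect the generated csi-vsphere.conf to look roughly like this (a hand-written sketch, not actual output from the cluster):

[Global]
cluster-id = "vs6"
user = "vsuser@vsphere.local"
password = "******"
port = "443"
insecure-flag = "1"

[VirtualCenter "10.99.99.4"]
datacenters = "DC2-Loc"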
Hmm, I'm not sure where that cluster-id is coming from. Do you still have the chart installed from Rancher Apps, or are you at this point only using the version that RKE2 deploys?
It's only from RKE2; basically, with each round of testing I create a new cluster, select the VMware vSphere option, and proceed from there.
I ran another test where I switched the order of the Additional Manifest entries, this time putting rancher-vsphere-csi before rancher-vsphere-cpi, but I still get the same result.
I finally managed to provision a cluster on vSphere with external storage.
However, I was only able to do so by reverting to RKE1 with an in-tree cloud provider.
So it seems like the RKE2 'Tech Preview' is still a bit buggy.
We are deploying rke2 v1.21.6+rke2r1 with a cloud-provider-name of rancher-vsphere, which deploys the CPI and CSI charts. Then we provide the configuration for both using HelmChartConfig files like this:
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-cpi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "{{ vcenter_host }}"
      datacenters: "{{ vcenter_datacenters }}"
      username: "{{ vcenter_username }}"
      password: "{{ vcenter_password }}"
      credentialsSecret:
        generate: true
    cloudControllerManager:
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-csi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "{{ vcenter_host }}"
      port: 443
      insecureFlag: '1'
      clusterId: "{{ kubernetes_cluster_name }}"
      datacenters: "{{ vcenter_datacenters }}"
      username: "{{ vcenter_username }}"
      password: "{{ vcenter_password }}"
      configSecret:
        generate: true
    csiController:
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
    csiResizer:
      enabled: false
    storageClass:
      enabled: true
      name: vsphere
      isDefault: true
Not sure if this is helpful but it has deployed for us without an issue.
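If it's useful, a couple of generic checks we run after the charts deploy (nothing specific to these values):

kubectl -n kube-system get pods | grep vsphere
kubectl get storageclass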
@mitchellmaler thanks, I appreciate that. It helps a lot: the provisioning finishes successfully with this nodeSelector config.
The only problem that I have now is the vsphere-csi-controller pod has status=CrashLoopBackOff and the logs report:
connection.go:172] Still connecting to unix:///csi/csi.sock
I see that the vsphere-config-secret is still not being created correctly; perhaps this is related.
I've made 2 changes and it's working now. The first change was to the storageClass section of the CSI values:
storageClass:
  enabled: true
  name: vsphere
  isDefault: true
  datastoreURL: ds:///vmfs/volumes/5e4bac8b-c0242362-d279-44a8427e1bb5/
The second change was to edit the vsphere-config-secret manually, filling in the blank values and changing the default cluster-id value. I think that there must be an underlying issue with the helm chart which is meant to create the vsphere-config-secret.
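In case it helps anyone doing the same workaround, here is a minimal sketch of replacing the secret by hand (assuming a local csi-vsphere.conf file with the filled-in values; the secret and workload names below are the ones the chart created in my cluster):

kubectl -n kube-system delete secret vsphere-config-secret
kubectl -n kube-system create secret generic vsphere-config-secret --from-file=csi-vsphere.conf
kubectl -n kube-system rollout restart deployment/vsphere-csi-controller daemonset/vsphere-csi-node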
It looks like I spoke too soon. I managed to get it running on a cluster with a single node with all 3 roles (etcd, control-plane & worker).
However, if I create a pool with only etcd + control-plane then the provisioning doesn't finish, and after the final probe for calico the logs keep polling with the message:
waiting for cluster agent to be available and join url to be available on bootstrap node
If I create a cluster with 2 pools, the controller pool (with all roles) provisions correctly, but the pool with only the worker role is stuck at:
Waiting for agent to check in and apply initial plan
Is there any news on this ticket? I am also trying to migrate to RKE2, which has proven to be a very difficult task.
Hi, I'm trying to set up vSphere CPI/CSI with RKE2. Where can I find documentation or a how-to on the values to write in the CSI and CPI HelmChartConfig manifests? Thanks.
@seb-835 asked where to find documentation or a how-to on the values to write in the CSI and CPI HelmChartConfig manifests. Just wanted to comment on this.
This is still an issue when creating RKE2 clusters on vSphere with Rancher v2.7.
The solution for me was to add the two manifests mentioned in an earlier post. I used the .OVA version of the Ubuntu 22.04 cloud image which I found here: https://cloud-images.ubuntu.com/jammy/current/.
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-cpi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "10.0.20.20"
      datacenters: "Datacenter-A"
      username: "administrator@internal.lab"
      password: "MyvCenterPassw0rd!"
      credentialsSecret:
        generate: true
    cloudControllerManager:
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-csi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "10.0.20.20"
      port: 443
      insecureFlag: '1'
      clusterId: "cluster81"
      datacenters: "Datacenter-A"
      username: "administrator@internal.lab"
      password: "MyvCenterPassw0rd!"
      configSecret:
        generate: true
    csiController:
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
    csiResizer:
      enabled: false
    storageClass:
      enabled: true
      name: vsphere
      isDefault: true
      datastoreURL: "ds:///vmfs/volumes/610996f6-af374ab9-7109-1c697aa3362c/"
@havkros You've added a nodeSelector to place the pods on the control-plane nodes. I don't see anything about taints or tolerations?
I didn't do anything with taints and tolerations. My issue might have been something other than the one in the original post, but pasting these two manifests made things work for me.
I struggled for at least a week to get Rancher to create an RKE2 cluster using vSphere. Nothing seemed to work, so I found this post and basically copied these two manifests and used them as is.
I haven't really looked into it, but it looks like Rancher doesn't pass the CSI / CPI values to the RKE2 installation process? I've always used the form fields in the Add-On Config menu to enter the details about vCenter, login creds, datacenter name etc., but the manifests are needed to get rid of the "agent is waiting to connect" error.
It is supposed to, but I believe there's an open issue on the Rancher side about it only passing them to the CPI chart instead of both CPI and CSI. That would be a Dashboard or Rancher bug though, not RKE2.
Closing because there doesn't appear to be an RKE2 bug at this point and this issue has gone stale. Please open a new issue if a bug is identified.
Environmental Info:
RKE2 Version:
rke2 version v1.21.9+rke2r1 (e48f07f7b208c0e43c537fca006cd5b6ce31b13b) go version go1.16.10b7
Node(s) CPU architecture, OS, and Version:
Linux vs-test1-controller-29377752-vp8h2 5.4.0-99-generic #112-Ubuntu SMP Thu Feb 3 13:50:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
2 controllers, 1 worker. I also tried 3 controllers and 3 workers, with the same issue.
Describe the bug:
After successfully installing vSphere CPI (100.1.0+up1.0.100) via the charts on Rancher, I tried to install vSphere CSI (100.1.0+up2.3.0). The installation hangs with the following logs:
Connected
Filter
helm install --namespace=kube-system --timeout=10m0s --values=/home/shell/helm/values-rancher-vsphere-csi-100.1.0-up2.3.0.yaml --version=100.1.0+up2.3.0 --wait=true vsphere-csi /home/shell/helm/rancher-vsphere-csi-100.1.0-up2.3.0.tgz
creating 13 resource(s)
beginning wait for 13 resources with timeout of 10m0s
Deployment is not ready: kube-system/vsphere-csi-controller. 0 out of 1 expected pods are ready
DaemonSet is not ready: kube-system/vsphere-csi-node. 0 out of 1 expected pods are ready
DaemonSet is not ready: kube-system/vsphere-csi-node. 0 out of 1 expected pods are ready
DaemonSet is not ready: kube-system/vsphere-csi-node. 0 out of 1 expected pods are ready
DaemonSet is not ready: kube-system/vsphere-csi-node. 0 out of 1 expected pods are ready
DaemonSet is not ready: kube-system/vsphere-csi-node. 0 out of 1 expected pods are ready
DaemonSet is not ready: kube-system/vsphere-csi-node. 0 out of 1 expected pods are ready
DaemonSet is not ready: kube-system/vsphere-csi-node. 0 out of 1 expected pods are ready
Deployment is not ready: kube-system/vsphere-csi-controller. 0 out of 1 expected pods are ready
Deployment is not ready: kube-system/vsphere-csi-controller. 0 out of 1 expected pods are ready
Deployment is not ready: kube-system/vsphere-csi-controller. 0 out of 1 expected pods are ready
while showing the following error on the deployment of vsphere-csi-controller
0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 2 node(s) had taint {node-role.kubernetes.io/control-plane: }, that the pod didn't tolerate.
I removed the taint with kubectl taint nodes controller1 node-role.kubernetes.io/control-plane-
The next error is 0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 2 node(s) had taint {node-role.kubernetes.io/etcd: }, that the pod didn't tolerate.
I similarly removed this taint with kubectl taint nodes controller1 node-role.kubernetes.io/etcd-
Then the application installs; however, I never got the vSphere storage working and subsequently saw strange behaviour, so I deleted the cluster. The strange behaviour included the PersistentVolumes and StorageClasses menu items disappearing from the menu after I had created a Storage Class for testing. I was concerned about having to remove the standard taints,
especially as the chart defines tolerations for them as per below,
I was also surprised to see the error message complaining about the taint by key only (node-role.kubernetes.io/control-plane, node-role.kubernetes.io/etcd) without the effect (NoSchedule, NoExecute).
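For reference (not part of the repro, just a generic way to inspect the scheduling constraints), the taints the scheduler complains about can be listed per node with:

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints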
Steps To Reproduce:
I started with 3 manually provisioned VMs on a VMware cluster running ubuntu-20.04 (1 controller, 2 workers). I followed this guide and installed the RKE2 server: https://rancher.com/docs/rancher/v2.5/en/installation/resources/k8s-tutorials/ha-rke2/#1-install-kubernetes-and-set-up-the-rke2-server Next, I followed this guide to install Rancher: https://rancher.com/docs/rancher/v2.0-v2.4/en/installation/install-rancher-on-k8s/
Next I'm going to create an auto-provisioned cluster on vSphere:
Log into Rancher as admin
Create Cloud Credentials for VMware vSphere
Create Cluster, select VMware vSphere, add all the details
Create a control pool and a worker pool
Accept the default configs (kubernetes 1.21.9+rke2r1, calico, cloud-provider=none)
Click create, wait for the cluster to be provisioned
Navigate to Apps Marketplace/Charts
Install vSphere CPI
Install vSphere CSI
Expected behavior:
I expect the vSphere CSI installation to complete successfully without being blocked by the standard taints on the control-plane nodes.
Actual behavior:
The installation didn't complete until the standard etcd and control-plane taints were removed.
Additional context / logs: