Closed galal-hussein closed 6 months ago
release-1.27 branch with commit 191329a6e3faac429ec145523e8d84c1b3be81fe
- Scenario 1 failing! New PVCs are failing to create their associated PVs after these chart changes. This happens on both a fresh install and after an upgrade. The PVC events are:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 29s csi.vsphere.vmware.com_vsphere-csi-controller-f57fb7df8-phs7b_8aa1a816-399e-4eb4-b950-62629596a402 External provisioner is provisioning volume for claim "default/claim1"
Warning ProvisioningFailed 29s csi.vsphere.vmware.com_vsphere-csi-controller-f57fb7df8-phs7b_8aa1a816-399e-4eb4-b950-62629596a402 failed to provision volume with StorageClass "vsphere-csi-sc": rpc error: code = Unavailable desc = error reading from server: EOF
Normal Provisioning 7s csi.vsphere.vmware.com_vsphere-csi-controller-f57fb7df8-g8wr6_f4f565d6-b559-4129-a08e-02da5bb0c197 External provisioner is provisioning volume for claim "default/claim1"
Warning ProvisioningFailed 7s csi.vsphere.vmware.com_vsphere-csi-controller-f57fb7df8-g8wr6_f4f565d6-b559-4129-a08e-02da5bb0c197 failed to provision volume with StorageClass "vsphere-csi-sc": rpc error: code = Unavailable desc = error reading from server: EOF
Normal ExternalProvisioning 6s (x4 over 29s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
I can get around this by setting csi-auth-check: "true", either by manually updating the configmap after install or by including it in the helmchart ahead of time with:
csiAuthCheck:
  enabled: true
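The manual variant of this workaround could look like the sketch below. It assumes the controller runs as a Deployment named vsphere-csi-controller (inferred from the pod names in the events above) and that the controller has to be restarted before it picks up the changed flag; neither is confirmed in this thread. The function name enable_csi_auth_check is made up for illustration.

```shell
# Sketch of the manual workaround: set csi-auth-check to "true" in the live
# ConfigMap, then restart the CSI controller.
# Assumptions: the Deployment name and the need for a restart are not
# confirmed by the issue; adjust both for your cluster.
enable_csi_auth_check() {
  kubectl patch configmap internal-feature-states.csi.vsphere.vmware.com \
    -n kube-system --type merge \
    -p '{"data":{"csi-auth-check":"true"}}' &&
  kubectl rollout restart deployment vsphere-csi-controller -n kube-system
}
```

Run enable_csi_auth_check once after install. Setting csiAuthCheck in the HelmChartConfig instead keeps the value in the chart values rather than in a hand-edited ConfigMap.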
Infrastructure
Node(s) CPU architecture, OS, and Version:
$ cat /etc/os-release | grep PRETTY
PRETTY_NAME="Ubuntu 22.04.3 LTS"
Cluster Configuration:
1 server
Config.yaml:
# /etc/rancher/rke2/config.yaml
write-kubeconfig-mode: 644
cloud-provider-name: "rancher-vsphere"
Additional files
# /var/lib/rancher/rke2/server/manifests/vsphere-values-1.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-cpi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "aa.bb.ccc.dd"
      datacenters: "Datacenter"
      username: "username"
      password: "password"
      credentialsSecret:
        generate: true
    cloudControllerManager:
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-csi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "aa.bb.ccc.dd"
      datacenters: "Datacenter"
      username: "username"
      password: "password"
      clusterId: "maxtestcluster1"
      configSecret:
        generate: true
    storageClass:
      datastoreURL: "ds:///vmfs/volumes/redacted/"
    csiController:
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
# /var/lib/rancher/rke2/server/manifests/vsphere-values-2.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-cpi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "aa.bb.ccc.dd"
      datacenters: "Datacenter"
      username: "username"
      password: "password"
      credentialsSecret:
        generate: true
    cloudControllerManager:
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rancher-vsphere-csi
  namespace: kube-system
spec:
  valuesContent: |-
    vCenter:
      host: "aa.bb.ccc.dd"
      datacenters: "Datacenter"
      username: "username"
      password: "password"
      clusterId: "maxtestcluster1"
      configSecret:
        generate: true
    storageClass:
      datastoreURL: "ds:///vmfs/volumes/redacted/"
    csiController:
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
    topology:
      enabled: true
    multiVcenterCsiTopology:
      enabled: false
    csiAuthCheck:
      enabled: true
# pvcpod.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: vsphere-csi-sc
  resources:
    requests:
      storage: 1Gi
---
apiVersion: "v1"
kind: "Pod"
metadata:
  name: "basic"
  labels:
    name: "basic"
spec:
  nodeSelector:
    kubernetes.io/os: linux
  containers:
    - name: "basic"
      image: ranchertest/mytestcontainer:unprivileged
      ports:
        - containerPort: 8080
          name: "basic"
      volumeMounts:
        - mountPath: "/data"
          name: "pvol"
  volumes:
    - name: "pvol"
      persistentVolumeClaim:
        claimName: "claim1"
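Since the failure shows up as a claim that never gets a volume, one quick check after applying pvcpod.yaml is the claim's phase. The helper below is only a sketch: pvc_phase is a hypothetical name, and the namespace defaults to default because the events refer to "default/claim1".

```shell
# Print the phase of a PersistentVolumeClaim (Pending, Bound, or Lost).
# $1 is the claim name; $2 is the namespace (assumed "default" if omitted).
pvc_phase() {
  kubectl get pvc "$1" -n "${2:-default}" -o jsonpath='{.status.phase}'
}
```

For example, pvc_phase claim1 should print Bound once provisioning works; a claim that stays Pending matches the ProvisioningFailed events shown earlier.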
Scenario 1:
kubectl apply -f pvcpod.yaml
k describe node | grep -i providerid (expecting vsphere://<something>)
k logs -n kube-system -l app=vsphere-csi-controller --all-containers | grep -i multi-vcenter-csi-topology (expecting nothing to return)
k get cm -n kube-system internal-feature-states.csi.vsphere.vmware.com -o yaml | grep -i multi (expecting multi-vcenter-csi-topology: "true")
helm ls -A (expecting rancher-vsphere-csi-3.1.2-rancher300)
Scenario 2:
v1.26.14+rke2r1)
rancher-vsphere-csi-3.1.2-rancher101, respectively
Scenario 3:
k get cm -n kube-system internal-feature-states.csi.vsphere.vmware.com -o yaml (expected set values to be true and false as set)
Replication Results:
rke2 version used for replication: v1.26.14+rke2r1
Not using multi-vcenter-csi-topology
I wasn't able to reproduce the issue with the csi controller failures
Validation Results:
rke2 version v1.27.12+dev.191329a6 (191329a6e3faac429ec145523e8d84c1b3be81fe)
go version go1.21.8 X:boringcrypto
$ helm ls -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
rancher-vsphere-cpi kube-system 3 2024-04-16 17:00:35.432543064 +0000 UTC deployed rancher-vsphere-cpi-1.7.001 1.28.0
rancher-vsphere-csi kube-system 3 2024-04-16 17:00:35.775364332 +0000 UTC deployed rancher-vsphere-csi-3.1.2-rancher300 3.1.2-rancher3
$ k get cm -n kube-system internal-feature-states.csi.vsphere.vmware.com -o yaml | grep -i multi
multi-vcenter-csi-topology: "true"
$ k get cm -n kube-system internal-feature-states.csi.vsphere.vmware.com -o yaml
apiVersion: v1
data:
  async-query-volume: "false"
  block-volume-snapshot: "false"
  cnsmgr-suspend-create-volume: "false"
  csi-auth-check: "true"
  csi-migration: "false"
  csi-windows-support: "false"
  improved-csi-idempotency: "false"
  improved-volume-topology: "false"
  list-volumes: "false"
  max-pvscsi-targets-per-vm: "false"
  multi-vcenter-csi-topology: "false"
  online-volume-extend: "false"
  pv-to-backingdiskobjectid-mapping: "false"
  topology-preferential-datastores: "false"
  trigger-csi-fullsync: "false"
  use-csinode-id: "true"
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: rancher-vsphere-csi
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-04-16T20:27:40Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: internal-feature-states.csi.vsphere.vmware.com
  namespace: kube-system
  resourceVersion: "756"
  uid: 48a3b1b9-7708-4af6-a66b-7166d5b0976c
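The grep checks used in the scenarios can be wrapped in a small helper that prints the value of a single feature flag, which makes the result easier to compare in a script. This is a sketch only; feature_state is a hypothetical name and it simply scans the YAML text rather than parsing it.

```shell
# Read ConfigMap YAML on stdin and print the value of one feature flag.
# $1 is the flag name exactly as it appears under data:, e.g. csi-auth-check.
feature_state() {
  awk -v key="$1:" '$1 == key { gsub(/"/, "", $2); print $2; exit }'
}
```

Usage, following the commands already shown in this thread: k get cm -n kube-system internal-feature-states.csi.vsphere.vmware.com -o yaml | feature_state multi-vcenter-csi-topology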
Validated on commit bfc28051f45a8d8d2794bdffaf0501491094eecb on release-1.27 that the chart version has been updated to rancher-vsphere-csi-3.1.2-rancher400, and csiAuthCheck is now defaulted to enabled: true. See all defaults now below:
$ k get cm internal-feature-states.csi.vsphere.vmware.com -n kube-system -o yaml
apiVersion: v1
data:
  async-query-volume: "false"
  block-volume-snapshot: "false"
  cnsmgr-suspend-create-volume: "false"
  csi-auth-check: "true"
  csi-migration: "false"
  csi-windows-support: "false"
  improved-csi-idempotency: "false"
  improved-volume-topology: "false"
  list-volumes: "false"
  max-pvscsi-targets-per-vm: "false"
  multi-vcenter-csi-topology: "true"
  online-volume-extend: "false"
  pv-to-backingdiskobjectid-mapping: "false"
  topology-preferential-datastores: "false"
  trigger-csi-fullsync: "false"
  use-csinode-id: "true"
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: rancher-vsphere-csi
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-04-19T17:30:01Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: internal-feature-states.csi.vsphere.vmware.com
  namespace: kube-system
  resourceVersion: "622"
  uid: 4a11cb86-0887-4b98-bd46-254c4c61ce8a
$ helm ls -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
rancher-vsphere-cpi kube-system 1 2024-04-19 17:30:00.635933258 +0000 UTC deployed rancher-vsphere-cpi-1.7.001 1.28.0
rancher-vsphere-csi kube-system 1 2024-04-19 17:29:59.835948691 +0000 UTC deployed rancher-vsphere-csi-3.1.2-rancher400 3.1.2-rancher4
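To compare the deployed chart against the expected version without reading the table by eye, the helm ls -A output can be filtered by release name. The helper below is a sketch with a hypothetical name, chart_for_release; it assumes the APP VERSION column is populated, so that the chart is the second-to-last whitespace-separated field.

```shell
# Read `helm ls -A` output on stdin and print the CHART column for a release.
# $1 is the release name, e.g. rancher-vsphere-csi.
# Assumption: APP VERSION is non-empty, so CHART is the second-to-last field.
chart_for_release() {
  awk -v rel="$1" '$1 == rel { print $(NF-1) }'
}
```

Usage: helm ls -A | chart_for_release rancher-vsphere-csi, then compare the result with rancher-vsphere-csi-3.1.2-rancher400.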
Backport fix for Vsphere-csi updates to 3.1.2-rancher300 #5765