Closed: bnason closed this issue 1 year ago.
@bnason Can you describe the vsphere-csi-controller pod running in the vmware-system-csi namespace and paste the output here? Also, can you paste the output of the command kubectl logs <vsphere-csi-controller-pod-name> -n vmware-system-csi -c vsphere-syncer from all your controller pods?
/assign
I've encountered this same issue. I see a vsphere-syncer container within the vsphere-csi-controller-797d86f788-7lxgd pod, but those pods are all Pending.
❯ kubectl get pods -n vmware-system-csi
NAME READY STATUS RESTARTS AGE
vsphere-csi-controller-797d86f788-7lxgd 0/7 Pending 0 19m
vsphere-csi-controller-797d86f788-gfc5d 0/7 Pending 0 19m
vsphere-csi-controller-797d86f788-n9vzb 0/7 Pending 0 19m
vsphere-csi-node-bm76f 2/3 CrashLoopBackOff 6 (56s ago) 7m29s
vsphere-csi-node-grs57 2/3 CrashLoopBackOff 6 (45s ago) 7m29s
In my case, they are Pending due to:
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  9m26s                 default-scheduler  0/2 nodes are available: 2 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling  9m55s (x10 over 19m)  default-scheduler  0/2 nodes are available: 2 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling  82s (x7 over 8m22s)   default-scheduler  0/2 nodes are available: 2 node(s) didn't match Pod's node affinity/selector.
The issue seems to be somewhere in here:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: "app"
                operator: In
                values:
                  - vsphere-csi-controller
          topologyKey: "kubernetes.io/hostname"
  serviceAccountName: vsphere-csi-controller
  nodeSelector:
    node-role.kubernetes.io/master: ""
The current taints on the nodes appear correct, so the node selector should work:
❯ kubectl describe nodes | egrep "Taints:"
Taints: node-role.kubernetes.io/master:NoSchedule
Taints: <none>
Does it have anything to do with replicas: 3 when there's only 1 master node for the vsphere-csi-controller deployment?
Related commit: https://github.com/kubernetes-sigs/vsphere-csi-driver/commit/01ec59a33ad257911e600c4646ffe822e5aa318a
Looking over my node's labels, I noticed that the node selector actually didn't match any of them:
❯ kubectl describe nodes | egrep "Labels:" -A 10
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=vsphere-vm.cpu-2.mem-4gb.os-ubuntu
beta.kubernetes.io/os=linux
juju-application=kubernetes-control-plane
kubernetes.io/arch=amd64
kubernetes.io/hostname=juju-ab364b-0
kubernetes.io/os=linux
Annotations: alpha.kubernetes.io/provided-node-ip: 10.246.154.99
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 22 Mar 2022 12:37:53 -0500
--
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-4gb.os-ubuntu
beta.kubernetes.io/os=linux
juju-application=kubernetes-worker
kubernetes.io/arch=amd64
kubernetes.io/hostname=juju-ab364b-1
kubernetes.io/os=linux
Annotations: alpha.kubernetes.io/provided-node-ip: 10.246.154.111
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 22 Mar 2022 12:38:38 -0500
By labeling the node appropriately, I was able to get the deployment running.
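For reference, the labeling command was along these lines (node name taken from the describe output above; the empty label value is what the nodeSelector matches):

kubectl label node juju-ab364b-0 node-role.kubernetes.io/master=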
However, I'm still having an issue with the driver not being able to identify the nodes:
Couldn't find VM instance with nodeUUID 0c712942-5d15-0d37-83c0-6120d7ca04c5, failed to discover with err: virtual machine wasn't found
...
Couldn't find VM instance with nodeUUID 98512942-a570-bd66-20a2-e1d93935175f, failed to discover with err: virtual machine wasn't found
and the vsphere-csi-node-* pods are in CrashLoopBackOff:
I0322 21:47:06.374636 1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to retrieve topology information for Node: "juju-ab364b-1". Error: "failed to retrieve nodeVM \"98512942-a570-bd66-20a2-e1d93935175f\" using the node manager. Error: virtual machine wasn't found",}
So, I'm still stumped.
@bnason Can you describe the vsphere-csi-controller pod running in the vmware-system-csi namespace and paste the output here? Also, can you paste the output of the command kubectl logs <vsphere-csi-controller-pod-name> -n vmware-system-csi -c vsphere-syncer from all your controller pods?
Hello,
I have a similar issue.
The node-driver-registrar container in the vsphere-csi-node daemonset fails with this error:
I0330 12:57:56.449872 1 main.go:118] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to create CSINodeTopology CR. Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0330 12:57:56.449909 1 main.go:120] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to create CSINodeTopology CR. Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
Pod vsphere-csi-controller describe:
Name: vsphere-csi-controller-76b656958d-gbkc7
Namespace: vmware-system-csi
Priority: 0
Node: k8s-master/192.168.134.8
Start Time: Wed, 30 Mar 2022 14:56:45 +0200
Labels: app=vsphere-csi-controller
pod-template-hash=76b656958d
role=vsphere-csi
Annotations: cni.projectcalico.org/containerID: ab0134062326ec1b46ffbccf83b01d2853a13b99fd5cbbc021e0d3f59bbbd370
cni.projectcalico.org/podIP: 10.233.92.7/32
cni.projectcalico.org/podIPs: 10.233.92.7/32
Status: Running
IP: 10.233.92.7
IPs:
IP: 10.233.92.7
Controlled By: ReplicaSet/vsphere-csi-controller-76b656958d
Containers:
csi-attacher:
Container ID: docker://c52985f21a7d2530319093f2b7727dcde11ff274167d757de0abfba8e5b6d02e
Image: k8s.gcr.io/sig-storage/csi-attacher:v3.3.0
Image ID: docker-pullable://k8s.gcr.io/sig-storage/csi-attacher@sha256:80dec81b679a733fda448be92a2331150d99095947d04003ecff3dbd7f2a476a
Port: <none>
Host Port: <none>
Args:
--v=4
--timeout=300s
--csi-address=$(ADDRESS)
--leader-election
--kube-api-qps=100
--kube-api-burst=100
State: Running
Started: Wed, 30 Mar 2022 14:56:46 +0200
Ready: True
Restart Count: 0
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4w2fd (ro)
vsphere-csi-controller:
Container ID: docker://2c8f13ea1df0e3c18b39afaa043db404bab0d1f2ca114bb9b4651dd5e7dfe437
Image: gcr.io/cloud-provider-vsphere/csi/release/driver:v2.4.0
Image ID: docker-pullable://gcr.io/cloud-provider-vsphere/csi/release/driver@sha256:ff865128421c8e248675814798582d72c35f7f77d8c7450ac4f00429b5281514
Ports: 9808/TCP, 2112/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 30 Mar 2022 14:59:47 +0200
Finished: Wed, 30 Mar 2022 14:59:47 +0200
Ready: False
Restart Count: 5
Liveness: http-get http://:healthz/healthz delay=10s timeout=3s period=5s #success=1 #failure=3
Environment:
CSI_ENDPOINT: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
X_CSI_MODE: controller
X_CSI_SPEC_DISABLE_LEN_CHECK: true
X_CSI_SERIAL_VOL_ACCESS_TIMEOUT: 3m
VSPHERE_CSI_CONFIG: /etc/cloud/csi-vsphere.conf
LOGGER_LEVEL: PRODUCTION
Mounts:
/etc/cloud from vsphere-config-volume (ro)
/var/lib/csi/sockets/pluginproxy from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4w2fd (ro)
liveness-probe:
Container ID: docker://24f8bebe747403fd7c66418758ffbb1a79f3807b3e94ac2e0375723028b3e4d9
Image: k8s.gcr.io/sig-storage/livenessprobe:v2.4.0
Image ID: docker-pullable://k8s.gcr.io/sig-storage/livenessprobe@sha256:529be2c9770add0cdd0c989115222ea9fc1be430c11095eb9f6dafcf98a36e2b
Port: <none>
Host Port: <none>
Args:
--v=4
--csi-address=$(ADDRESS)
State: Running
Started: Wed, 30 Mar 2022 14:56:46 +0200
Ready: True
Restart Count: 0
Environment:
ADDRESS: /var/lib/csi/sockets/pluginproxy/csi.sock
Mounts:
/var/lib/csi/sockets/pluginproxy from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4w2fd (ro)
vsphere-syncer:
Container ID: docker://51d9a45cdae188d4e9189a16fcb4fba42502712223901677846f3b1acd9247da
Image: gcr.io/cloud-provider-vsphere/csi/release/syncer:v2.4.0
Image ID: docker-pullable://gcr.io/cloud-provider-vsphere/csi/release/syncer@sha256:b6da4448adf8cc2eb363198748d9f26d17acb500a80c53629379d83cecb6cefe
Port: 2113/TCP
Host Port: 0/TCP
Args:
--leader-election
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 30 Mar 2022 15:00:13 +0200
Finished: Wed, 30 Mar 2022 15:00:13 +0200
Ready: False
Restart Count: 5
Environment:
FULL_SYNC_INTERVAL_MINUTES: 30
VSPHERE_CSI_CONFIG: /etc/cloud/csi-vsphere.conf
LOGGER_LEVEL: PRODUCTION
Mounts:
/etc/cloud from vsphere-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4w2fd (ro)
csi-provisioner:
Container ID: docker://a8b952ba5cb26104ff604218383cd1ba62751b11288afafb6204b608c47b2814
Image: k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
Image ID: docker-pullable://k8s.gcr.io/sig-storage/csi-provisioner@sha256:6477988532358148d2e98f7c747db4e9250bbc7ad2664bf666348abf9ee1f5aa
Port: <none>
Host Port: <none>
Args:
--v=4
--timeout=300s
--csi-address=$(ADDRESS)
--leader-election
--default-fstype=ext4
--kube-api-qps=100
--kube-api-burst=100
State: Running
Started: Wed, 30 Mar 2022 14:56:47 +0200
Ready: True
Restart Count: 0
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-4w2fd (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
vsphere-config-volume:
Type: Secret (a volume populated by a Secret)
SecretName: vsphere-config-secret
Optional: false
socket-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-4w2fd:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: node-role.kubernetes.io/control-plane=
Tolerations: node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m34s default-scheduler 0/4 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 node(s) didn't match Pod's node affinity/selector.
Normal Scheduled 4m5s default-scheduler Successfully assigned vmware-system-csi/vsphere-csi-controller-76b656958d-gbkc7 to k8s-master
Warning FailedScheduling 4m36s default-scheduler 0/4 nodes are available: 1 node(s) didn't match pod anti-affinity rules, 3 node(s) didn't match Pod's node affinity/selector.
Normal Created 4m4s kubelet Created container liveness-probe
Normal Created 4m4s kubelet Created container csi-attacher
Normal Started 4m4s kubelet Started container csi-attacher
Normal Created 4m4s kubelet Created container vsphere-syncer
Normal Pulled 4m4s kubelet Container image "gcr.io/cloud-provider-vsphere/csi/release/syncer:v2.4.0" already present on machine
Normal Started 4m4s kubelet Started container liveness-probe
Normal Pulled 4m4s kubelet Container image "k8s.gcr.io/sig-storage/csi-attacher:v3.3.0" already present on machine
Normal Pulled 4m4s kubelet Container image "k8s.gcr.io/sig-storage/livenessprobe:v2.4.0" already present on machine
Normal Started 4m3s kubelet Started container csi-provisioner
Normal Started 4m3s kubelet Started container vsphere-syncer
Normal Pulled 4m3s kubelet Container image "k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0" already present on machine
Normal Created 4m3s kubelet Created container csi-provisioner
Normal Created 3m46s (x3 over 4m4s) kubelet Created container vsphere-csi-controller
Normal Pulled 3m46s (x3 over 4m4s) kubelet Container image "gcr.io/cloud-provider-vsphere/csi/release/driver:v2.4.0" already present on machine
Normal Started 3m45s (x3 over 4m4s) kubelet Started container vsphere-csi-controller
Warning BackOff 3m45s (x4 over 4m2s) kubelet Back-off restarting failed container
Pod vsphere-syncer logs:
{"level":"error","time":"2022-03-30T13:08:05.324540426Z","caller":"kubernetes/kubernetes.go:430","msg":"Failed to update \"csinodetopologies.cns.vmware.com\" CRD with err: resource name may not be empty","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/kubernetes.createCustomResourceDefinition\n\t/build/pkg/kubernetes/kubernetes.go:430\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/kubernetes.CreateCustomResourceDefinitionFromManifest\n\t/build/pkg/kubernetes/kubernetes.go:392\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/syncer/cnsoperator/manager.InitCnsOperator\n\t/build/pkg/syncer/cnsoperator/manager/init.go:168\nmain.initSyncerComponents.func1.2\n\t/build/cmd/syncer/main.go:180"}
{"level":"error","time":"2022-03-30T13:08:05.324585513Z","caller":"manager/init.go:171","msg":"Failed to create \"csinodetopology\" CRD. Error: resource name may not be empty","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/syncer/cnsoperator/manager.InitCnsOperator\n\t/build/pkg/syncer/cnsoperator/manager/init.go:171\nmain.initSyncerComponents.func1.2\n\t/build/cmd/syncer/main.go:180"}
{"level":"error","time":"2022-03-30T13:08:05.324607236Z","caller":"syncer/main.go:181","msg":"Error initializing Cns Operator. Error: resource name may not be empty","stacktrace":"main.initSyncerComponents.func1.2\n\t/build/cmd/syncer/main.go:181"}
@tandrez what is the log output of container vsphere-csi-controller?
{"level":"info","time":"2022-03-30T13:43:19.131765092Z","caller":"logger/logger.go:41","msg":"Setting default log level to :\"PRODUCTION\""}
{"level":"info","time":"2022-03-30T13:43:19.131881884Z","caller":"vsphere-csi/main.go:56","msg":"Version : v2.4.0","TraceId":"7415a968-2dd7-48b3-898f-7a55ec261e24"}
{"level":"info","time":"2022-03-30T13:43:19.13190827Z","caller":"commonco/utils.go:56","msg":"Defaulting feature states configmap name to \"internal-feature-states.csi.vsphere.vmware.com\"","TraceId":"7415a968-2dd7-48b3-898f-7a55ec261e24"}
{"level":"info","time":"2022-03-30T13:43:19.131925624Z","caller":"commonco/utils.go:60","msg":"Defaulting feature states configmap namespace to \"vmware-system-csi\"","TraceId":"7415a968-2dd7-48b3-898f-7a55ec261e24"}
{"level":"info","time":"2022-03-30T13:43:19.132307995Z","caller":"logger/logger.go:41","msg":"Setting default log level to :\"PRODUCTION\""}
{"level":"info","time":"2022-03-30T13:43:19.132533752Z","caller":"k8sorchestrator/k8sorchestrator.go:152","msg":"Initializing k8sOrchestratorInstance","TraceId":"761c3004-7437-473a-a499-bbdc4cfe778a"}
{"level":"info","time":"2022-03-30T13:43:19.13256854Z","caller":"kubernetes/kubernetes.go:85","msg":"k8s client using in-cluster config","TraceId":"761c3004-7437-473a-a499-bbdc4cfe778a"}
{"level":"info","time":"2022-03-30T13:43:19.133942631Z","caller":"kubernetes/kubernetes.go:352","msg":"Setting client QPS to 100.000000 and Burst to 100.","TraceId":"761c3004-7437-473a-a499-bbdc4cfe778a"}
{"level":"info","time":"2022-03-30T13:43:19.149838052Z","caller":"k8sorchestrator/k8sorchestrator.go:258","msg":"New internal feature states values stored successfully: map[async-query-volume:true block-volume-snapshot:false csi-auth-check:true csi-migration:false csi-windows-support:false improved-csi-idempotency:true improved-volume-topology:true online-volume-extend:true trigger-csi-fullsync:false]","TraceId":"761c3004-7437-473a-a499-bbdc4cfe778a"}
{"level":"info","time":"2022-03-30T13:43:19.149926732Z","caller":"k8sorchestrator/k8sorchestrator.go:178","msg":"k8sOrchestratorInstance initialized","TraceId":"761c3004-7437-473a-a499-bbdc4cfe778a"}
{"level":"info","time":"2022-03-30T13:43:19.153485356Z","caller":"config/config.go:339","msg":"No Net Permissions given in Config. Using default permissions.","TraceId":"761c3004-7437-473a-a499-bbdc4cfe778a"}
{"level":"info","time":"2022-03-30T13:43:19.153592437Z","caller":"vanilla/controller.go:84","msg":"Initializing CNS controller","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6"}
{"level":"info","time":"2022-03-30T13:43:19.154030896Z","caller":"vsphere/utils.go:163","msg":"Defaulting timeout for vCenter Client to 5 minutes","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6"}
{"level":"info","time":"2022-03-30T13:43:19.154068346Z","caller":"vsphere/virtualcentermanager.go:73","msg":"Initializing defaultVirtualCenterManager...","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6"}
{"level":"info","time":"2022-03-30T13:43:19.154084005Z","caller":"vsphere/virtualcentermanager.go:75","msg":"Successfully initialized defaultVirtualCenterManager","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6"}
{"level":"info","time":"2022-03-30T13:43:19.15409378Z","caller":"vsphere/virtualcentermanager.go:121","msg":"Successfully registered VC \"vcenter.local\"","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6"}
{"level":"info","time":"2022-03-30T13:43:19.154110471Z","caller":"vanilla/controller.go:102","msg":"CSI Volume manager idempotency handling feature flag is enabled.","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6"}
{"level":"info","time":"2022-03-30T13:43:19.154125191Z","caller":"cnsvolumeoperationrequest/cnsvolumeoperationrequest.go:80","msg":"Creating CnsVolumeOperationRequest definition on API server and initializing VolumeOperationRequest instance","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6"}
{"level":"info","time":"2022-03-30T13:43:19.153980969Z","caller":"k8sorchestrator/k8sorchestrator.go:484","msg":"configMapAdded: Internal feature state values from \"internal-feature-states.csi.vsphere.vmware.com\" stored successfully: map[async-query-volume:true block-volume-snapshot:false csi-auth-check:true csi-migration:false csi-windows-support:false improved-csi-idempotency:true improved-volume-topology:true online-volume-extend:true trigger-csi-fullsync:false]","TraceId":"bfe17697-8d9a-4288-bac7-788250fcd51f"}
{"level":"info","time":"2022-03-30T13:43:19.156870831Z","caller":"kubernetes/kubernetes.go:85","msg":"k8s client using in-cluster config","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6"}
{"level":"info","time":"2022-03-30T13:43:19.156994064Z","caller":"kubernetes/kubernetes.go:352","msg":"Setting client QPS to 100.000000 and Burst to 100.","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6"}
{"level":"error","time":"2022-03-30T13:43:19.158244627Z","caller":"kubernetes/kubernetes.go:430","msg":"Failed to update \"cnsvolumeoperationrequests.cns.vmware.com\" CRD with err: resource name may not be empty","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/kubernetes.createCustomResourceDefinition\n\t/build/pkg/kubernetes/kubernetes.go:430\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/kubernetes.CreateCustomResourceDefinitionFromManifest\n\t/build/pkg/kubernetes/kubernetes.go:392\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/internalapis/cnsvolumeoperationrequest.InitVolumeOperationRequestInterface\n\t/build/pkg/internalapis/cnsvolumeoperationrequest/cnsvolumeoperationrequest.go:83\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:103\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:142\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve.func1\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:246\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:59\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:211\ngithub.com/rexray/gocsi.Run\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:130\nmain.main\n\t/build/cmd/vsphere-csi/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
{"level":"error","time":"2022-03-30T13:43:19.158319466Z","caller":"cnsvolumeoperationrequest/cnsvolumeoperationrequest.go:87","msg":"failed to create CnsVolumeOperationRequest CRD with error: resource name may not be empty","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/internalapis/cnsvolumeoperationrequest.InitVolumeOperationRequestInterface\n\t/build/pkg/internalapis/cnsvolumeoperationrequest/cnsvolumeoperationrequest.go:87\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:103\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:142\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve.func1\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:246\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:59\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:211\ngithub.com/rexray/gocsi.Run\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:130\nmain.main\n\t/build/cmd/vsphere-csi/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
{"level":"error","time":"2022-03-30T13:43:19.158357616Z","caller":"vanilla/controller.go:109","msg":"failed to initialize VolumeOperationRequestInterface with error: resource name may not be empty","TraceId":"3a6a7955-76a3-46a1-ad85-55c98c4aabe6","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:109\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:142\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve.func1\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:246\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:59\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:211\ngithub.com/rexray/gocsi.Run\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:130\nmain.main\n\t/build/cmd/vsphere-csi/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
{"level":"error","time":"2022-03-30T13:43:19.158394961Z","caller":"service/driver.go:143","msg":"failed to init controller. Error: resource name may not be empty","TraceId":"761c3004-7437-473a-a499-bbdc4cfe778a","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:143\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve.func1\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:246\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:68\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:59\ngithub.com/rexray/gocsi.(*StoragePlugin).Serve\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:211\ngithub.com/rexray/gocsi.Run\n\t/go/pkg/mod/github.com/rexray/gocsi@v1.2.2/gocsi.go:130\nmain.main\n\t/build/cmd/vsphere-csi/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"}
{"level":"info","time":"2022-03-30T13:43:19.158431473Z","caller":"service/driver.go:109","msg":"Configured: \"csi.vsphere.vmware.com\" with clusterFlavor: \"VANILLA\" and mode: \"controller\"","TraceId":"761c3004-7437-473a-a499-bbdc4cfe778a"}
time="2022-03-30T13:43:19Z" level=info msg="removed sock file" path=/var/lib/csi/sockets/pluginproxy/csi.sock
time="2022-03-30T13:43:19Z" level=fatal msg="grpc failed" error="resource name may not be empty"
@tandrez show me your vsphere-config-secret (csi-vsphere.conf)
[Global]
cluster-id = "kubernetes-cluster-id"
[VirtualCenter "vcenter.local"]
insecure-flag = "true"
user = "someuser@vsphere.local"
password = "somepassword"
port = "443"
datacenters = "DC1"
I'm new to this plugin as well.
What I did: I tried to list all nodes and datastores with govc (https://github.com/vmware/govmomi/blob/master/govc/USAGE.md). Try whether you can do the same. I used AD accounts, and the username didn't work as-is; eventually I had to use the username in DOMAIN\user format. What I discovered is that if something is missing in the conf, it gives similar errors.
Are you following this manual to create the secret? https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-BFF39F1D-F70A-4360-ABC9-85BDAFBE8864.html
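For what it's worth, the secret-creation step in that manual boils down to roughly this (namespace assumed from the manifests in this thread):

kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf --namespace=vmware-system-csi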
I can list objects with govc without any problem.
I am deploying with Kubespray, but I also checked the VMware docs and, as far as I can tell, the configuration seems OK.
For my setup, this issue was caused by the vSphere CPI not working correctly and thus not untainting the nodes, which never allowed the CSI pods to run; I believe one of those pods is responsible for creating the CRD.
My CPI issue is documented here: https://github.com/kubernetes/cloud-provider-vsphere/issues/614
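A quick way to check whether that CRD ever got created (name taken from the syncer errors above):

kubectl get crd csinodetopologies.cns.vmware.com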
Does it have anything to do with replicas: 3 when there's only 1 master node for the vsphere-csi-controller deployment? Related commit 01ec59a
@addyess Yes, you are supposed to update the replica count in the CSI controller deployment to the number of master nodes in your environment. After doing this, the controller pods will start running. The CSI nodes depend on the syncer container in the controller pod to fetch certain information about the underlying environment. The nodes will not come up until the controller provides this information to them.
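On a single-master cluster, for example, that would be something like this (a sketch, not a command from the docs):

kubectl -n vmware-system-csi scale deployment vsphere-csi-controller --replicas=1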
However, I'm still having an issue with the driver not being able to identify the nodes:
Couldn't find VM instance with nodeUUID 0c712942-5d15-0d37-83c0-6120d7ca04c5, failed to discover with err: virtual machine wasn't found ... Couldn't find VM instance with nodeUUID 98512942-a570-bd66-20a2-e1d93935175f, failed to discover with err: virtual machine wasn't found
and the vsphere-csi-node-* pods are in CrashLoopBackOff:
I0322 21:47:06.374636 1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to retrieve topology information for Node: "juju-ab364b-1". Error: "failed to retrieve nodeVM \"98512942-a570-bd66-20a2-e1d93935175f\" using the node manager. Error: virtual machine wasn't found",}
So, I'm still stumped.
Which version of vSphere CSI are you running? You should not have to set the node labels manually. If the cloud provider you are using is doing its job correctly, it will set the required labels and remove the NoSchedule taint on the nodes. Can you confirm this is working?
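One hedged way to verify this is to check that every node got a providerID and lost the node.cloudprovider.kubernetes.io/uninitialized taint (assuming standard CPI behavior), e.g.:

kubectl get nodes -o custom-columns='NAME:.metadata.name,PROVIDER-ID:.spec.providerID,TAINTS:.spec.taints[*].key'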
Enabling "CSI full sync" causes this for me, disabling it and everything starts to work again
Enabling "Enable Improved Volume Topology" causes this error for us. Removing the selection and redeploying brings it online and stable.
@shalini-b I think I'm making forward progress: my current deployment succeeds, but it fails to launch all the containers in the daemonset.
Each daemonset container vmware-system-csi/vsphere-csi-node-2pdr4:node-driver-registrar presents similar log output:
I0518 18:00:18.524418 1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to retrieve topology information for Node: "juju-78116a-5". Error: "failed to retrieve nodeVM \"f3ef2942-6724-aff1-6ddb-cea417d0f5aa\" using the node manager. Error: virtual machine wasn't found",}
E0518 18:00:18.524476 1 main.go:122] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to retrieve topology information for Node: "juju-78116a-5". Error: "failed to retrieve nodeVM \"f3ef2942-6724-aff1-6ddb-cea417d0f5aa\" using the node manager. Error: virtual machine wasn't found", restarting registration container.
The machine's UUID:
ubuntu@juju-78116a-5:~$ sudo dmidecode | grep UUID
UUID: 4229eff3-2467-f1af-6ddb-cea417d0f5aa
The machine's provider-id:
Name: juju-78116a-5
ProviderID: vsphere://4229eff3-2467-f1af-6ddb-cea417d0f5aa
Both match. But the CSI node driver seems to swap the bytes around?
4229eff3-2467-f1af-6ddb-cea417d0f5aa # from provider-id and dmidecode
f3ef2942-6724-aff1-6ddb-cea417d0f5aa # from container logs
If I reverse the byte order within the first three groups, they match:
AABBCCDD-EEFF-GGHH-IIJJ-KKLLMMNNOOPP
DDCCBBAA-FFEE-HHGG-IIJJ-KKLLMMNNOOPP
Edit: I think this is https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1629; updating to v2.5.1 images.
Final edit: cool, I can make PVCs! Way to go.
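For anyone comparing UUIDs by hand: the reordering described above looks like the SMBIOS little-endian encoding of the first three UUID fields. A small shell sketch that reproduces the mapping:

# swap the byte order within the first three groups of a UUID
echo 4229eff3-2467-f1af-6ddb-cea417d0f5aa | sed -E 's/^(..)(..)(..)(..)-(..)(..)-(..)(..)/\4\3\2\1-\6\5-\8\7/'
# prints f3ef2942-6724-aff1-6ddb-cea417d0f5aa, matching the container logs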
Enabling "Enable Improved Volume Topology" causes this error for us. Removing the selection and redeploying brings it online and stable.
Please note that disabling an already enabled feature in the vSphere CSI driver is not supported. Kindly root-cause the actual issue.
I am also facing a similar issue... can anyone help?

vsphere-csi-controller-76b656958d-7rftq   3/5   CrashLoopBackOff    8 (23s ago)     3m12s
vsphere-csi-controller-76b656958d-kfwqs   2/5   CrashLoopBackOff    21 (47s ago)    12m
vsphere-csi-controller-76b656958d-lmcg6   0/5   ContainerCreating   0               3m12s
vsphere-csi-node-4tvs8                    2/3   CrashLoopBackOff    7 (48s ago)     12m
vsphere-csi-node-5st28                    2/3   CrashLoopBackOff    7 (2m54s ago)   14m
vsphere-csi-node-8pcg9                    2/3   CrashLoopBackOff    7 (3m9s ago)    14m
vsphere-csi-node-fmfjt                    2/3   CrashLoopBackOff    7 (53s ago)     12m
vsphere-csi-node-h229j                    2/3   CrashLoopBackOff    7 (52s ago)     12m
vsphere-csi-node-l5fls                    2/3   CrashLoopBackOff    7 (47s ago)     12m
vsphere-csi-node-mlskw                    2/3   CrashLoopBackOff    7 (46s ago)     12m
vsphere-csi-node-xp97z                    2/3   CrashLoopBackOff    7 (47s ago)     12m
@shalini-b
Enabling "Enable Improved Volume Topology" causes this error for us. Removing the selection and redeploying brings it online and stable.
Please note that disabling an already enabled feature in the vSphere CSI driver is not supported. Kindly root-cause the actual issue.
Without that feature disabled it will not successfully deploy. So there is no way to disable a feature for an already deployed driver when the driver can't deploy in the first place... hence the use of the word redeploy, and not reconfigure.
If you want to fill me in on where to look to find the root cause for the driver failing when that is enabled, I'd be more than happy to.
"Enable Improved Volume Topology": how do I disable it? I am not able to find it in the Kubespray code. Kindly find my node-driver-registrar error below:
kubectl logs vsphere-csi-node-4gfr9 -c node-driver-registrar
I0527 19:19:56.618396 1 main.go:166] Version: v2.4.0
I0527 19:19:56.618456 1 main.go:167] Running node-driver-registrar in mode=registration
I0527 19:19:56.619961 1 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0527 19:19:56.619997 1 connection.go:154] Connecting to unix:///csi/csi.sock
I0527 19:19:56.620597 1 main.go:198] Calling CSI driver to discover driver name
I0527 19:19:56.620660 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0527 19:19:56.620670 1 connection.go:184] GRPC request: {}
I0527 19:19:56.625349 1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"${VERSION}"}
I0527 19:19:56.625437 1 connection.go:187] GRPC error: <nil>
@shalini-b
Enabling "Enable Improved Volume Topology" causes this error for us. Removing the selection and redeploying brings it online and stable.
Please note that disabling an already enabled feature in the vSphere CSI driver is not supported. Kindly root-cause the actual issue.
Without that feature disabled it will not successfully deploy. So there is no way to disable a feature for an already deployed driver when the driver can't deploy in the first place... hence the use of the word redeploy, and not reconfigure.
If you want to fill me in on where to look to find the root cause for the driver failing when that is enabled, I'd be more than happy to.
Which driver version are you using? Kindly use v2.4.1+ to get rid of this issue. If you are still unable to deploy the driver, collect the following logs.
For each vSphere CSI controller pod running in your env:
kubectl logs <controller-pod-name> -c vsphere-syncer -n vmware-system-csi
For any of the vSphere CSI node pods in a CrashLoopBackOff:
kubectl logs <node-pod-name> -c vsphere-csi-node -n vmware-system-csi
We ran into a similar issue with vsphere-csi-driver v2.5.2.
The root cause was that the cloud account used did not have enough permissions to read zone info from vCenter. CPI tried to read tags for setting failure-domain labels on the nodes and ended up failing because it did not have enough permissions. When this happened, we saw exactly this error:
I0824 19:54:27.091878 1 main.go:166] Version: v2.5.0
I0824 19:54:27.091907 1 main.go:167] Running node-driver-registrar in mode=registration
I0824 19:54:27.092263 1 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0824 19:54:27.092288 1 connection.go:154] Connecting to unix:///csi/csi.sock
I0824 19:54:27.092662 1 main.go:198] Calling CSI driver to discover driver name
I0824 19:54:27.092682 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0824 19:54:27.092686 1 connection.go:184] GRPC request: {}
I0824 19:54:27.094514 1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v2.5.2"}
I0824 19:54:27.094581 1 connection.go:187] GRPC error: <nil>
I0824 19:54:27.094587 1 main.go:208] CSI driver name: "csi.vsphere.vmware.com"
I0824 19:54:27.094639 1 node_register.go:53] Starting Registration Server at: /registration/csi.vsphere.vmware.com-reg.sock
I0824 19:54:27.094799 1 node_register.go:62] Registration Server started at: /registration/csi.vsphere.vmware.com-reg.sock
I0824 19:54:27.094859 1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0824 19:54:28.397818 1 main.go:102] Received GetInfo call: &InfoRequest{}
I0824 19:54:28.397991 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I0824 19:54:28.413900 1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "res-cal-p8-cp-np6cq". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0824 19:54:28.413942 1 main.go:122] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "res-cal-p8-cp-np6cq". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
vsphere-cloud-controller-manager had the error below:
I0824 19:55:12.021205 1 node_controller.go:390] Initializing node res-cal-p8-cp-np6cq with cloud provider
I0824 19:55:12.021237 1 instances.go:113] instances.InstanceID() CACHED with res-cal-p8-cp-np6cq
I0824 19:55:12.021244 1 instances.go:83] instances.NodeAddressesByProviderID() CACHED with 42274b7c-bfd9-960e-1c91-7f0fa5b014c6
E0824 19:55:12.185474 1 zones.go:195] Failed to get host system properties. err: NoPermission
E0824 19:55:12.205266 1 zones.go:124] Failed to get host system properties. err: NoPermission
E0824 19:55:12.205304 1 node_controller.go:212] error syncing 'res-cal-p8-cp-np6cq': failed to get instance metadata for node res-cal-p8-cp-np6cq: failed to get zone from cloud provider: Zone: Error fetching by providerID: NoPermission Error fetching by NodeName: NoPermission, requeuing
If you don't need the feature you may set improved-volume-topology: 'false' in ConfigMaps/internal-feature-states.csi.vsphere.vmware.com. Otherwise this can fail for multiple reasons (e.g., as pointed out, because of missing permissions in vCenter). Simply disabling the feature we didn't want to use fixed the issue for us. It seems it is enabled by default in more recent vSphere CSI releases.
I'm not sure why this is necessary, since the manifest still has the commented-out args you'd need to enable topology awareness. The new feature gates are not very well documented.
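If you deployed from the stock manifests, one hedged way to flip that flag on a running cluster (followed by a restart so the pods pick it up) is:

kubectl -n vmware-system-csi patch configmap internal-feature-states.csi.vsphere.vmware.com --type merge -p '{"data":{"improved-volume-topology":"false"}}'
kubectl -n vmware-system-csi rollout restart deployment/vsphere-csi-controller daemonset/vsphere-csi-node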
I can confirm that I've hit this problem also (new deployment, vSphere 7.0u3, k3s v1.24.4+k3s1).
As mentioned here, and in https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1948, the default setting of improved-volume-topology: 'true' in vsphere-csi-driver.yaml seems to be the cause; changing it to false allows the pods to deploy.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
as mentioned here, and in #1948, the default setting of improved-volume-topology: 'true' in vsphere-csi-driver.yaml seems to be the cause, and changing it to false allows the pods to deploy.
I can confirm it fixes the problem for me too.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Same issue with CSI for me..

root@k8s-master01:/etc/kubernetes# kubectl get pods -n vmware-system-csi
NAME                                      READY   STATUS             RESTARTS      AGE
vsphere-csi-controller-5664789fc9-gg4wn   6/7     CrashLoopBackOff   1 (4s ago)    17s
vsphere-csi-controller-5664789fc9-mz5b8   6/7     CrashLoopBackOff   1 (3s ago)    17s
vsphere-csi-controller-5664789fc9-q47jz   6/7     CrashLoopBackOff   1 (3s ago)    17s
vsphere-csi-node-7rqdd                    2/3     CrashLoopBackOff   1 (6s ago)    17s
vsphere-csi-node-rk497                    2/3     CrashLoopBackOff   1 (7s ago)    17s
vsphere-csi-node-rl2ws                    2/3     Error              1 (10s ago)   17s
vsphere-csi-node-vm8tw                    2/3     CrashLoopBackOff   1 (7s ago)    17s
vsphere-csi-node-vv9x5                    2/3     CrashLoopBackOff   1 (7s ago)    17s
vsphere-csi-node-wbmcp                    2/3     CrashLoopBackOff   1 (9s ago)    17s
vsphere-csi-node-wkk27                    2/3     CrashLoopBackOff   1 (9s ago)    17s

root@k8s-master01:/etc/kubernetes# kubectl logs vsphere-csi-controller-5664789fc9-mz5b8 -n vmware-system-csi -c vsphere-syncer
{"level":"info","time":"2023-03-19T22:18:04.163624732Z","caller":"logger/logger.go:41","msg":"Setting default log level to :\"PRODUCTION\""}
{"level":"info","time":"2023-03-19T22:18:04.164377027Z","caller":"syncer/main.go:76","msg":"Version : v2.7.0","TraceId":"29e0f1e2-c4a5-4d43-85b3-1bbc9c441343"}
{"level":"info","time":"2023-03-19T22:18:04.164637218Z","caller":"syncer/main.go:93","msg":"Starting container with operation mode: METADATA_SYNC","TraceId":"29e0f1e2-c4a5-4d43-85b3-1bbc9c441343"}
{"level":"info","time":"2023-03-19T22:18:04.164802222Z","caller":"kubernetes/kubernetes.go:85","msg":"k8s client using in-cluster config","TraceId":"29e0f1e2-c4a5-4d43-85b3-1bbc9c441343"}
{"level":"info","time":"2023-03-19T22:18:04.165106234Z","caller":"syncer/main.go:115","msg":"Starting the http server to expose Prometheus metrics..","TraceId":"29e0f1e2-c4a5-4d43-85b3-1bbc9c441343"}
{"level":"info","time":"2023-03-19T22:18:04.165457343Z","caller":"kubernetes/kubernetes.go:389","msg":"Setting client QPS to 100.000000 and Burst to 100.","TraceId":"29e0f1e2-c4a5-4d43-85b3-1bbc9c441343"}
I0319 22:18:04.168818 1 leaderelection.go:248] attempting to acquire leader lease vmware-system-csi/vsphere-syncer...
I0319 22:18:04.198090 1 leaderelection.go:258] successfully acquired lease vmware-system-csi/vsphere-syncer
{"level":"error","time":"2023-03-19T22:18:04.201127391Z","caller":"config/config.go:459","msg":"error while reading config file: 1:1: illegal character U+0024 '$'","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/common/config.ReadConfig\n\t/build/pkg/common/config/config.go:459\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/common/config.GetCnsconfig\n\t/build/pkg/common/config/config.go:489\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/common.GetConfig\n\t/build/pkg/csi/service/common/util.go:281\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/common.InitConfigInfo\n\t/build/pkg/csi/service/common/util.go:292\nmain.initSyncerComponents.func1\n\t/build/cmd/syncer/main.go:161\ngithub.com/kubernetes-csi/csi-lib-utils/leaderelection.(leaderElection).Run.func1\n\t/go/pkg/mod/github.com/kubernetes-csi/csi-lib-utils@v0.11.0/leaderelection/leader_election.go:179"}
{"level":"error","time":"2023-03-19T22:18:04.201572149Z","caller":"config/config.go:491","msg":"failed to parse config. Err: 1:1: illegal character U+0024 '$'","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/common/config.GetCnsconfig\n\t/build/pkg/common/config/config.go:491\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/common.GetConfig\n\t/build/pkg/csi/service/common/util.go:281\nsigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/common.InitConfigInfo\n\t/build/pkg/csi/service/common/util.go:292\nmain.initSyncerComponents.func1\n\t/build/cmd/syncer/main.go:161\ngithub.com/kubernetes-csi/csi-lib-utils/leaderelection.(leaderElection).Run.func1\n\t/go/pkg/mod/github.com/kubernetes-csi/csi-lib-utils@v0.11.0/leaderelection/leader_election.go:179"}
{"level":"error","time":"2023-03-19T22:18:04.201844658Z","caller":"common/util.go:294","msg":"failed to read config. Error: 1:1: illegal character U+0024 '$'","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/common.InitConfigInfo\n\t/build/pkg/csi/service/common/util.go:294\nmain.initSyncerComponents.func1\n\t/build/cmd/syncer/main.go:161\ngithub.com/kubernetes-csi/csi-lib-utils/leaderelection.(leaderElection).Run.func1\n\t/go/pkg/mod/github.com/kubernetes-csi/csi-lib-utils@v0.11.0/leaderelection/leader_election.go:179"}
{"level":"error","time":"2023-03-19T22:18:04.202088917Z","caller":"syncer/main.go:163","msg":"failed to initialize the configInfo. Err: 1:1: illegal character U+0024 '$'","stacktrace":"main.initSyncerComponents.func1\n\t/build/cmd/syncer/main.go:163\ngithub.com/kubernetes-csi/csi-lib-utils/leaderelection.(leaderElection).Run.func1\n\t/go/pkg/mod/github.com/kubernetes-csi/csi-lib-utils@v0.11.0/leaderele
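For reference, the "illegal character U+0024 '$'" at position 1:1 suggests the csi-vsphere.conf stored in the secret literally starts with a $ (for example an unexpanded shell variable). A hedged way to inspect exactly what the driver sees:

kubectl -n vmware-system-csi get secret vsphere-config-secret -o jsonpath='{.data.csi-vsphere\.conf}' | base64 -d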
I got a similar error in my k3s cluster.
vSphere version: 7.0.3
lei@leik3svSphere01:~$ sudo kubectl logs -n vmware-system-csi vsphere-csi-node-mrrxx
Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I0509 12:13:58.557564 1 main.go:167] Version: v2.7.0
I0509 12:13:58.557609 1 main.go:168] Running node-driver-registrar in mode=registration
I0509 12:13:58.558041 1 main.go:192] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0509 12:13:58.558067 1 connection.go:154] Connecting to unix:///csi/csi.sock
I0509 12:13:58.558907 1 main.go:199] Calling CSI driver to discover driver name
I0509 12:13:58.558919 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0509 12:13:58.558923 1 connection.go:184] GRPC request: {}
I0509 12:13:58.561096 1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v3.0.0"}
I0509 12:13:58.561142 1 connection.go:187] GRPC error: <nil>
If you don't need the feature you may set improved-volume-topology: 'false' in ConfigMaps/internal-feature-states.csi.vsphere.vmware.com. Otherwise this can fail for multiple reasons (e.g., as pointed out, because of missing permissions in vCenter). Simply disabling the feature we didn't want to use fixed the issue for us. It seems it is enabled by default in more recent vSphere CSI releases. I'm not sure why this is necessary, since the manifest still has the commented-out args you'd need to enable topology awareness. The new feature gates are not very well documented.
@omniproc Thanks, this helped me!
What happened:
The node-driver-registrar container in the vsphere-csi-node DaemonSet fails with: failed to get CsiNodeTopology for the node

What you expected to happen:
The only information I can find on CSINodeTopology with respect to this driver is in the guide "Deploying vSphere Container Storage Plug-in with Topology"; however, I do NOT have the 2 arguments for the external-provisioner sidecar uncommented as instructed. Other than that, I can't even locate the CSINodeTopology cns.vmware.com/v1alpha1 CRD.

How to reproduce it (as minimally and precisely as possible):
Deploy the vsphere-csi-driver as instructed at "Install vSphere Container Storage Plug-in".

Anything else we need to know?:
Environment:
/kind bug