We hit similar bugs with the vSphere CSI driver 3.0. More details follow.
Starting from 3.0, the CNS topology feature flag is removed and cannot be turned off (OSS change). Comparing the passing and failing logs, it looks like a race condition may have occurred during node registration. The logic has no retry, so the subsequent logic that depends on it fails.
vsphere-csi-node-mbdjf 1/2 CrashLoopBackOff
Node driver registrar log
2023-03-31T12:28:22.245291043Z I0331 12:28:22.245133 1 main.go:102] Received GetInfo call: &InfoRequest{}
2023-03-31T12:28:22.245783165Z I0331 12:28:22.245690 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
2023-03-31T12:29:22.269924126Z I0331 12:29:22.269721 1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = timed out while waiting for topology labels to be updated in "c01057a88824-qual-323-0afbb584" CSINodeTopology instance.,}
2023-03-31T12:29:22.269969201Z E0331 12:29:22.269834 1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = timed out while waiting for topology labels to be updated in "c01057a88824-qual-323-0afbb584" CSINodeTopology instance., restarting registration container.
vsphere-csi-driver log
2023-03-31T11:33:28.319787659Z {"level":"info","time":"2023-03-31T11:33:28.319728628Z","caller":"kubernetes/kubernetes.go:395","msg":"Setting client QPS to 100.000000 and Burst to 100.","TraceId":"a7e5dbf2-7c1d-4fb3-9fb3-730e47f69001"}
2023-03-31T11:33:28.345686928Z {"level":"info","time":"2023-03-31T11:33:28.344293416Z","caller":"k8sorchestrator/topology.go:727","msg":"Topology service initiated successfully","TraceId":"a7e5dbf2-7c1d-4fb3-9fb3-730e47f69001"}
2023-03-31T11:33:28.372612710Z {"level":"info","time":"2023-03-31T11:33:28.37245886Z","caller":"k8sorchestrator/topology.go:895","msg":"Successfully created a CSINodeTopology instance for NodeName: \"c01057a88824-qual-323-0afbb584\"","TraceId":"a7e5dbf2-7c1d-4fb3-9fb3-730e47f69001"}
2023-03-31T11:34:28.375379515Z {"level":"error","time":"2023-03-31T11:34:28.374319044Z","caller":"k8sorchestrator/topology.go:837","msg":"timed out while waiting for topology labels to be updated in \"c01057a88824-qual-323-0afbb584\" CSINodeTopology instance.","TraceId":"a7e5dbf2-7c1d-4fb3-9fb3-730e47f69001","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco/k8sorchestrator.(*nodeVolumeTopology).GetNodeTopologyLabels\n\t/build/pkg/csi/service/common/commonco/k8sorchestrator/topology.go:837\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).NodeGetInfo\n\t/build/pkg/csi/service/node.go:429\ngithub.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/spec@v1.7.0/lib/go/csi/csi.pb.go:6231\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922"}
2023-03-31T11:34:30.358756419Z {"level":"info","time":"2023-03-31T11:34:30.358584895Z","caller":"service/node.go:338","msg":"NodeGetInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"f21e5f14-20bf-4bb0-a70b-2322ee91fa34"}
2023-03-31T11:34:30.372726452Z {"level":"info","time":"2023-03-31T11:34:30.37258219Z","caller":"k8sorchestrator/topology.go:892","msg":"CSINodeTopology instance already exists for NodeName: \"c01057a88824-qual-323-0afbb584\"","TraceId":"f21e5f14-20bf-4bb0-a70b-2322ee91fa34"}
2023-03-31T11:35:30.375537038Z {"level":"error","time":"2023-03-31T11:35:30.374942664Z","caller":"k8sorchestrator/topology.go:837","msg":"timed out while waiting for topology labels to be updated in \"c01057a88824-qual-323-0afbb584\" CSINodeTopology instance.","TraceId":"f21e5f14-20bf-4bb0-a70b-2322ee91fa34","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco/k8sorchestrator.(*nodeVolumeTopology).GetNodeTopologyLabels\n\t/build/pkg/csi/service/common/commonco/k8sorchestrator/topology.go:837\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).NodeGetInfo\n\t/build/pkg/csi/service/node.go:429\ngithub.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/spec@v1.7.0/lib/go/csi/csi.pb.go:6231\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922"}
"kubectl_logs_vsphere-csi-node-mbdjf_--container_vsphere-csi-node_--kubeconfig_.tmp.user-kubeconfig-56089268_--request-timeout_30s_--namespace_kube-system_--timestamps" 63L, 29225B
Some logs from vsphere-csi-controller:
2023-03-30T03:27:10.857292378Z {"level":"info","time":"2023-03-30T03:27:10.857253925Z","caller":"vsphere/virtualcentermanager.go:123","msg":"Successfully registered VC mtv-qual-vc03.anthos:443"}
2023-03-30T03:27:10.857373990Z {"level":"info","time":"2023-03-30T03:27:10.857306174Z","caller":"vsphere/virtualcenter.go:283","msg":"VirtualCenter.connect() creating new client"}
2023-03-30T03:27:10.860343329Z {"level":"info","time":"2023-03-30T03:27:10.860255209Z","caller":"node/manager.go:128","msg":"Discovering the node vm using uuid: \"4211ec99-cb15-a5cd-3193-49bb2883a3fb\"","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.860354679Z {"level":"info","time":"2023-03-30T03:27:10.860281279Z","caller":"vsphere/virtualmachine.go:159","msg":"Initiating asynchronous datacenter listing with uuid 4211ec99-cb15-a5cd-3193-49bb2883a3fb","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.883477322Z {"level":"info","time":"2023-03-30T03:27:10.883355912Z","caller":"k8sorchestrator/k8sorchestrator.go:644","msg":"configMapAdded: Internal feature state values from \"internal-feature-states.csi.vsphere.vmware.com\" stored successfully: map[async-query-volume:true block-volume-snapshot:true csi-migration:true csi-windows-support:true improved-csi-idempotency:true online-volume-extend:true trigger-csi-fullsync:false]","TraceId":"bd419e1b-0881-4e2e-b9d4-b90f744c9a1b"}
2023-03-30T03:27:10.914441786Z {"level":"info","time":"2023-03-30T03:27:10.914259728Z","caller":"vsphere/virtualcenter.go:202","msg":"New session ID for 'VSPHERE.LOCAL\\herc-32210b8fe0bd' = 52611e95-9457-8d31-c11e-9edbca63b82e"}
2023-03-30T03:27:10.914464578Z {"level":"info","time":"2023-03-30T03:27:10.914314422Z","caller":"vsphere/virtualcenter.go:291","msg":"VirtualCenter.connect() successfully created new client"}
2023-03-30T03:27:10.914468743Z {"level":"info","time":"2023-03-30T03:27:10.914338151Z","caller":"vsphere/virtualcenter.go:606","msg":"vCenterInstance initialized"}
2023-03-30T03:27:10.914549752Z {"level":"info","time":"2023-03-30T03:27:10.914476906Z","caller":"volume/manager.go:193","msg":"Initializing new defaultManager..."}
2023-03-30T03:27:10.914677990Z {"level":"info","time":"2023-03-30T03:27:10.914582011Z","caller":"syncer/metadatasyncer.go:417","msg":"Adding watch on path: \"/etc/cloud\""}
2023-03-30T03:27:10.914813271Z {"level":"info","time":"2023-03-30T03:27:10.914732425Z","caller":"volume/manager.go:190","msg":"Retrieving existing defaultManager..."}
2023-03-30T03:27:10.917155580Z {"level":"info","time":"2023-03-30T03:27:10.917045953Z","caller":"kubernetes/kubernetes.go:79","msg":"k8s client using kubeconfig from /etc/kubernetes/kubeconfig.conf"}
2023-03-30T03:27:10.917932242Z {"level":"info","time":"2023-03-30T03:27:10.917828978Z","caller":"kubernetes/kubernetes.go:395","msg":"Setting client QPS to 100.000000 and Burst to 100."}
2023-03-30T03:27:10.924330377Z {"level":"info","time":"2023-03-30T03:27:10.924234917Z","caller":"vsphere/datacenter.go:154","msg":"Publishing datacenter Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.924393747Z {"level":"info","time":"2023-03-30T03:27:10.924310978Z","caller":"vsphere/virtualmachine.go:196","msg":"AsyncGetAllDatacenters with uuid 4211ec99-cb15-a5cd-3193-49bb2883a3fb sent a dc Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.933106473Z {"level":"info","time":"2023-03-30T03:27:10.933012951Z","caller":"vsphere/virtualmachine.go:210","msg":"Found VM VirtualMachine:vm-623336 [VirtualCenterHost: mtv-qual-vc03.anthos, UUID: 4211ec99-cb15-a5cd-3193-49bb2883a3fb, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]] given uuid 4211ec99-cb15-a5cd-3193-49bb2883a3fb on DC Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.933121137Z {"level":"info","time":"2023-03-30T03:27:10.93304699Z","caller":"vsphere/virtualmachine.go:221","msg":"Returning VM VirtualMachine:vm-623336 [VirtualCenterHost: mtv-qual-vc03.anthos, UUID: 4211ec99-cb15-a5cd-3193-49bb2883a3fb, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]] for UUID 4211ec99-cb15-a5cd-3193-49bb2883a3fb","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.933125305Z {"level":"info","time":"2023-03-30T03:27:10.933063049Z","caller":"node/manager.go:151","msg":"Successfully discovered node with nodeUUID 4211ec99-cb15-a5cd-3193-49bb2883a3fb in vm VirtualMachine:vm-623336 [VirtualCenterHost: mtv-qual-vc03.anthos, UUID: 4211ec99-cb15-a5cd-3193-49bb2883a3fb, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-2 @ /mtv-qual-vc03, VirtualCenterHost: mtv-qual-vc03.anthos]]","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.933128956Z {"level":"info","time":"2023-03-30T03:27:10.933073905Z","caller":"node/manager.go:134","msg":"Successfully discovered node: \"32210b8fe0bd-qual-private306-15003d10\" with nodeUUID \"4211ec99-cb15-a5cd-3193-49bb2883a3fb\"","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"}
2023-03-30T03:27:10.933136796Z {"level":"info","time":"2023-03-30T03:27:10.933081969Z","caller":"node/manager.go:136","msg":"Successfully registered node: \"32210b8fe0bd-qual-private306-15003d10\" with nodeUUID \"4211ec99-cb15-a5cd-3193-49bb2883a3fb\"","TraceId":"a360dbae-71de-43a2-8b2b-38fff67feebb"
2023-03-31T11:33:19.880026310Z {"level":"info","time":"2023-03-31T11:33:19.691025909Z","caller":"vsphere/virtualcentermanager.go:74","msg":"Initializing defaultVirtualCenterManager..."}
2023-03-31T11:33:19.880028344Z {"level":"info","time":"2023-03-31T11:33:19.69104239Z","caller":"vsphere/virtualcentermanager.go:76","msg":"Successfully initialized defaultVirtualCenterManager"}
2023-03-31T11:33:19.880031590Z {"level":"error","time":"2023-03-31T11:33:19.691146997Z","caller":"vsphere/virtualmachine.go:227","msg":"Returning VM not found err for UUID 420efa50-4b8b-f108-347a-0a8f21fdb714","TraceId":"973eef74-6eac-4af0-abeb-5cd217b8eaa1","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualMachineByUUID\n\t/build/pkg/common/cns-lib/vsphere/virtualmachine.go:227\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).DiscoverNode\n\t/build/pkg/common/cns-lib/node/manager.go:145\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:129\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880046207Z {"level":"error","time":"2023-03-31T11:33:19.691213382Z","caller":"node/manager.go:147","msg":"Couldn't find VM instance with nodeUUID 420efa50-4b8b-f108-347a-0a8f21fdb714, failed to discover with err: virtual machine wasn't found","TraceId":"973eef74-6eac-4af0-abeb-5cd217b8eaa1","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).DiscoverNode\n\t/build/pkg/common/cns-lib/node/manager.go:147\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:129\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880051146Z {"level":"error","time":"2023-03-31T11:33:19.691256924Z","caller":"node/manager.go:131","msg":"failed to discover VM with uuid: \"420efa50-4b8b-f108-347a-0a8f21fdb714\" for node: \"c01057a88824-qual-323-0afbb5d7\"","TraceId":"973eef74-6eac-4af0-abeb-5cd217b8eaa1","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:131\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880056226Z {"level":"warn","time":"2023-03-31T11:33:19.691282823Z","caller":"node/nodes.go:72","msg":"failed to register node:\"c01057a88824-qual-323-0afbb5d7\". err=virtual machine wasn't found","TraceId":"973eef74-6eac-4af0-abeb-5cd217b8eaa1"}
2023-03-31T11:33:19.880058280Z {"level":"info","time":"2023-03-31T11:33:19.691529217Z","caller":"node/manager.go:128","msg":"Discovering the node vm using uuid: \"420e47c7-ba23-d2a8-6656-526432f8313b\"","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be"}
2023-03-31T11:33:19.880060013Z {"level":"info","time":"2023-03-31T11:33:19.691588529Z","caller":"vsphere/virtualmachine.go:159","msg":"Initiating asynchronous datacenter listing with uuid 420e47c7-ba23-d2a8-6656-526432f8313b","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be"}
2023-03-31T11:33:19.880066836Z {"level":"error","time":"2023-03-31T11:33:19.691638263Z","caller":"vsphere/virtualmachine.go:227","msg":"Returning VM not found err for UUID 420e47c7-ba23-d2a8-6656-526432f8313b","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualMachineByUUID\n\t/build/pkg/common/cns-lib/vsphere/virtualmachine.go:227\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).DiscoverNode\n\t/build/pkg/common/cns-lib/node/manager.go:145\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:129\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880069351Z {"level":"error","time":"2023-03-31T11:33:19.691715308Z","caller":"node/manager.go:147","msg":"Couldn't find VM instance with nodeUUID 420e47c7-ba23-d2a8-6656-526432f8313b, failed to discover with err: virtual machine wasn't found","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).DiscoverNode\n\t/build/pkg/common/cns-lib/node/manager.go:147\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:129\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880080141Z {"level":"error","time":"2023-03-31T11:33:19.691763489Z","caller":"node/manager.go:131","msg":"failed to discover VM with uuid: \"420e47c7-ba23-d2a8-6656-526432f8313b\" for node: \"c01057a88824-qual-323-0afbb584\"","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).RegisterNode\n\t/build/pkg/common/cns-lib/node/manager.go:131\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).nodeAdd\n\t/build/pkg/common/cns-lib/node/nodes.go:69\nk8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/controller.go:232\nk8s.io/client-go/tools/cache.(*processorListener).run.func1\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:818\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:135\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:92\nk8s.io/client-go/tools/cache.(*processorListener).run\n\t/go/pkg/mod/k8s.io/client-go@v0.25.2/tools/cache/shared_informer.go:812\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.25.2/pkg/util/wait/wait.go:75"}
2023-03-31T11:33:19.880082445Z {"level":"warn","time":"2023-03-31T11:33:19.691811169Z","caller":"node/nodes.go:72","msg":"failed to register node:\"c01057a88824-qual-323-0afbb584\". err=virtual machine wasn't found","TraceId":"2ceea003-59f4-4aa9-a37c-b2af5e4c48be"}
2023-03-31T11:33:19.880084559Z {"level":"info","time":"2023-03-31T11:33:19.692076979Z","caller":"vsphere/virtualcentermanager.go:123","msg":"Successfully registered VC atl-qual-vc06.anthos:443"}
2023-03-31T11:33:19.880086253Z {"level":"info","time":"2023-03-31T11:33:19.692285863Z","caller":"node/manager.go:128","msg":"Discovering the node vm using uuid: \"420e8daf-1b9e-e49f-e9f9-2c59ab10c59f\"","TraceId":"c42c3ab6-ed72-4328-9ec1-ea34824bdbd4"}
2023-03-31T11:33:19.880088266Z {"level":"info","time":"2023-03-31T11:33:19.692375572Z","caller":"vsphere/virtualmachine.go:159","msg":"Initiating asynchronous datacenter listing with uuid 420e8daf-1b9e-e49f-e9f9-2c59ab10c59f","TraceId":"c42c3ab6-ed72-4328-9ec1-ea34824bdbd4"}
2023-03-31T11:33:19.880090110Z {"level":"info","time":"2023-03-31T11:33:19.692324796Z","caller":"vsphere/virtualcenter.go:283","msg":"VirtualCenter.connect() creating new client"}
/cc @divyenpatel @xing-yang @msau42 @gnufied @jsafrane
A similar issue was opened before: https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1661
Before the driver can create CR instances, the CsiNodeTopology CRD must be registered. Once this CRD has been successfully registered, the CSI node DaemonSet's pod can create CsiNodeTopology instances for the node, and node discovery then takes place. If the CRD registration has not happened yet, you may see the node DaemonSet's pod in a CrashLoopBackOff state.
Which component registers the CsiNodeTopology CRD? How do we make sure the CRD is successfully registered before the CSI node DaemonSet pod creates the instance?
The syncer component in the driver registers the CRD. Refer to https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/f762a45e4db2b59e36e80ee943bc42a84f4980cc/pkg/syncer/cnsoperator/manager/init.go#L205-L212
How do we make sure the CRD is successfully registered before the CSI node DaemonSet pod creates the instance?
You can install the CSI controller Pod first, let all the required CRDs register, wait for initialization to complete, and then deploy the CSI node DaemonSet pods.
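For anyone following that advice, a quick way to confirm the ordering actually held (a sketch, assuming kubectl points at the affected cluster) is to check for the CRD and block on it before rolling out the node DaemonSet:
# verify the syncer has registered the CRD
kubectl get crd csinodetopologies.cns.vmware.com
# wait until the CRD is established, then deploy the CSI node DaemonSet
kubectl wait --for condition=established --timeout=120s crd/csinodetopologies.cns.vmware.com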
Please see more logs https://gist.github.com/jingxu97/cc013868270f4d05497a7aba2b59221c
From what I found, the logic that discovers and registers the VM does not have retry logic, which causes the following VM-not-found error.
How do we make sure the CRD is successfully registered before the CSI node DaemonSet pod creates the instance?
You can install the CSI controller Pod first, let all the required CRDs register, wait for initialization to complete, and then deploy the CSI node DaemonSet pods.
Could it be possible to add retry logic instead of having a strict ordering requirement? It is hard to enforce that ordering when deploying the controller and driver, I think.
Guys, I have the same problem; I am, however, using version 3.0.0. I have the pods in CrashLoopBackOff:
vsphere-csi-controller-68c65dbdd5-cb9jb 0/7 Pending 0 19m
vsphere-csi-controller-68c65dbdd5-whswk 0/7 Pending 0 19m
vsphere-csi-node-9qlc6 2/3 CrashLoopBackOff 5 (28s ago) 3m40s
vsphere-csi-node-h9hkq 2/3 CrashLoopBackOff 5 (30s ago) 3m40s
vsphere-csi-node-nbvfp 2/3 CrashLoopBackOff 5 (45s ago) 3m40s
and going into the logs of one of the pods, I get this:
Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I0403 22:57:51.418542 1 main.go:167] Version: v2.7.0
I0403 22:57:51.418588 1 main.go:168] Running node-driver-registrar in mode=registration
I0403 22:57:51.419473 1 main.go:192] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0403 22:57:51.419515 1 connection.go:154] Connecting to unix:///csi/csi.sock
I0403 22:57:51.420762 1 main.go:199] Calling CSI driver to discover driver name
I0403 22:57:51.420772 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0403 22:57:51.420776 1 connection.go:184] GRPC request: {}
I0403 22:57:51.424195 1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v3.0.0"}
I0403 22:57:51.424239 1 connection.go:187] GRPC error: <nil>
I0403 22:57:51.424247 1 main.go:209] CSI driver name: "csi.vsphere.vmware.com"
I0403 22:57:51.424312 1 node_register.go:53] Starting Registration Server at: /registration/csi.vsphere.vmware.com-reg.sock
I0403 22:57:51.424466 1 node_register.go:62] Registration Server started at: /registration/csi.vsphere.vmware.com-reg.sock
I0403 22:57:51.424537 1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0403 22:57:52.522333 1 main.go:102] Received GetInfo call: &InfoRequest{}
I0403 22:57:52.522670 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I0403 22:57:52.533985 1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "k8s-worker02". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0403 22:57:52.534009 1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "k8s-worker02". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
I'm following step by step this guide: https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-54BB79D2-B13F-4673-8CC2-63A772D17B3C.html
My env consists of: Kubernetes cluster 1.26.3 (1 master node, 2 worker nodes), ESXi 7.0.3, vCenter 7.0.3.
@gabrieletosca I see you have pending vsphere-csi-controller Pods? Can you make CSI controller pod up and running and later check CSI Node Daemonset Pod status?
@gabrieletosca I see you have pending vsphere-csi-controller Pods? Can you make CSI controller pod up and running and later check CSI Node Daemonset Pod status?
I can't, unfortunately...
This is the output of kubectl describe pods vsphere-csi-controller-68c65dbdd5-cb9jb --namespace=vmware-system-csi:
0/3 nodes are available: 3 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
and this is the output of kubectl describe nodes | egrep "Taints:|Name:":
Name: k8s-master01
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: k8s-worker01
Taints: node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
Name: k8s-worker02
Taints: node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
I see that the guide says to taint only the master (https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-4E47B9F1-B250-4B36-8FEC-8F45E6529D23.html), but this link (https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-0AB6E692-AA47-4B6A-8CEA-38B754E16567.html) says to do it on all nodes. I also tried removing the taint from the workers, but it still doesn't work.
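For context, the node.cloudprovider.kubernetes.io/uninitialized taint is normally removed by the cloud provider (CPI) once it initializes each node, so the taint persisting usually means CPI is not running or cannot reach vCenter. Removing it by hand is only a stopgap, but as a sketch:
# remove the uninitialized taint from all nodes (normally the CPI's job)
kubectl taint nodes --all node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-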
@jingxu97 During our debugging session we observed that some of the feature gates were disabled in the v3.0.0 release you were using.
Do you see this issue getting resolved on the Anthos setup after enabling all the required feature gates for release v3.0.0?
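For anyone who wants to inspect the feature gates on their own deployment, they live in a ConfigMap (shown in the controller log above); a sketch, assuming the vanilla vmware-system-csi namespace (it is kube-system on the Anthos setup in these logs):
kubectl -n vmware-system-csi get configmap internal-feature-states.csi.vsphere.vmware.com -o yaml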
I am running into the same issue with vanilla (kubeadm) kubernetes version 1.26.3 when installing the csi-driver version 3.0 (using this manifest: https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.0.0/manifests/vanilla/vsphere-csi-driver.yaml):
I0424 16:09:37.991893 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I0424 16:09:38.030150 1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "omni-kube-controlplane-1". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0424 16:09:38.030228 1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "omni-kube-controlplane-1". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
From what I understand reading this thread, there's no workaround to solve this in the meantime, right?
I tried installing only the csi controller deployment first, but I've come across these logs (that might have existed initially too):
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.945536568Z","caller":"vsphere/virtualcenter.go:171","msg":"failed to create new client with err: Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).NewClient\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:171\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:284\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).Connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:259\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:645\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:234\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.945727725Z","caller":"vsphere/virtualcenter.go:285","msg":"failed to create govmomi client with err: Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:285\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).Connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:259\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:645\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:234\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.946527487Z","caller":"vsphere/virtualcenter.go:287","msg":"failed to connect to vCenter using CA file: \"\"","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:287\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).Connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:259\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:645\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:234\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.947057249Z","caller":"vsphere/virtualcenter.go:261","msg":"Cannot connect to vCenter with err: Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.(*VirtualCenter).Connect\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:261\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:645\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:234\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.947548438Z","caller":"vsphere/virtualcenter.go:647","msg":"failed to connect to VirtualCenter host: \"vmcenter.example.com\". Err: Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere.GetVirtualCenterInstanceForVCenterConfig\n\t/build/pkg/common/cns-lib/vsphere/virtualcenter.go:647\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:234\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.947945134Z","caller":"vanilla/controller.go:236","msg":"failed to get vCenterInstance for vCenter \"vmcenter.example.com\"err=Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"36e48351-db5d-4deb-9681-9e8f7fca695b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).Init\n\t/build/pkg/csi/service/vanilla/controller.go:236\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:188\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.948286142Z","caller":"service/driver.go:189","msg":"failed to init controller. Error: failed to get vCenterInstance for vCenter \"vmcenter.example.com\"err=Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"734c882c-aa75-4f16-be5c-21146bdf9e4b","TraceId":"e0546129-339a-4946-a9cc-195f5b5549b3","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/build/pkg/csi/service/driver.go:189\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:202\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"info","time":"2023-04-24T16:38:27.948620496Z","caller":"service/driver.go:109","msg":"Configured: \"csi.vsphere.vmware.com\" with clusterFlavor: \"VANILLA\" and mode: \"controller\"","TraceId":"734c882c-aa75-4f16-be5c-21146bdf9e4b","TraceId":"e0546129-339a-4946-a9cc-195f5b5549b3"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-csi-controller {"level":"error","time":"2023-04-24T16:38:27.948939301Z","caller":"service/driver.go:203","msg":"failed to run the driver. Err: +failed to get vCenterInstance for vCenter \"vmcenter.example.com\"err=Post \"https://vmcenter.example.com:443/sdk\": dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused","TraceId":"734c882c-aa75-4f16-be5c-21146bdf9e4b","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:203\nmain.main\n\t/build/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
Stream closed EOF for vmware-system-csi/vsphere-csi-controller-68c65dbdd5-z6g85 (vsphere-csi-controller)
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.08636366Z","caller":"logger/logger.go:41","msg":"Setting default log level to :\"PRODUCTION\""}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.090380782Z","caller":"syncer/main.go:86","msg":"Version : v3.0.0","TraceId":"920d74a9-f82a-47d6-8a0d-20633809c584"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.090656443Z","caller":"syncer/main.go:103","msg":"Starting container with operation mode: METADATA_SYNC","TraceId":"920d74a9-f82a-47d6-8a0d-20633809c584"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.090779546Z","caller":"kubernetes/kubernetes.go:86","msg":"k8s client using in-cluster config","TraceId":"920d74a9-f82a-47d6-8a0d-20633809c584"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.091272287Z","caller":"kubernetes/kubernetes.go:395","msg":"Setting client QPS to 100.000000 and Burst to 100.","TraceId":"920d74a9-f82a-47d6-8a0d-20633809c584"}
vsphere-csi-controller-68c65dbdd5-z6g85 vsphere-syncer {"level":"info","time":"2023-04-24T16:35:30.092658207Z","caller":"syncer/main.go:125","msg":"Starting the http server to expose Prometheus metrics..","TraceId":"920d74a9-f82a-47d6-8a0d-20633809c584"}
What stands out is:
dial tcp: lookup vmcenter.example.com on 127.0.0.1:53: read udp 127.0.0.1:38377->127.0.0.1:53: read: connection refused
From what I understand, the pod is trying to connect to a DNS server on localhost, which isn't responding. That doesn't make any sense to me, because it's supposed to be connecting to coredns. The pod has its own IP, so it's using a separate network namespace. It's rather hard to follow the logic of it all.
@lethargosapatheia Once you fix connectivity between the vSphere CSI controller Pod and the vCenter server, the issue with the CSINodeTopology CRD not being found will be fixed.
Basically, registration of the CRDs happens in the syncer container, and if it crashes before registering the CRDs, then the node DaemonSet Pods will be unable to create the required CRD instances.
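Given that, a useful first check when node pods report no matches for kind "CSINodeTopology" is whether the vsphere-syncer container came up and got far enough to register the CRDs, e.g.:
# the syncer runs as a sidecar of the controller Deployment
kubectl -n vmware-system-csi logs deploy/vsphere-csi-controller -c vsphere-syncer --tail=50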
@divyenpatel You're right, something isn't actually working properly, I had tinkered with the cluster a little bit before and the coredns service doesn't respond correctly, even if the pods themselves work. I'll have to have a look at that and get back if it all works ok. Thank you for your fast answer!
Ok, I've actually mixed some things up. The thing is, there's no connectivity issue to the DNS. The service actually works fine. Entering the network namespace of a random container (csi-snapshotter) inside the same pod works perfectly:
nsenter -n -t 107665 dig A vmcenter.example.com @10.96.0.10
; <<>> DiG 9.18.12-0ubuntu0.22.04.1-Ubuntu <<>> A vmcenter.example.com @10.96.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18982
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 56dddadc2663ae20 (echoed)
;; QUESTION SECTION:
;vmcenter.example.com. IN A
;; ANSWER SECTION:
vmcenter.example.com. 30 IN A 10.0.0.1
;; Query time: 4 msec
;; SERVER: 10.96.0.10#53(10.96.0.10) (UDP)
;; WHEN: Mon Apr 24 17:42:53 UTC 2023
;; MSG SIZE rcvd: 91
Is the container explicitly trying to connect to a different DNS server (localhost) than the one it's supposed to use (the coredns service)? I know it's a stupid question, but I don't get the error at all :)
I also see that the deployment assumes there are three control-plane nodes. I have two control-plane nodes (and three etcd nodes), so I don't need more. Do you strictly need three controller pods to create a cluster, or would it work with two replicas too? Or should I maybe let them run on the worker nodes as well?
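For what it's worth, the replica count is just a field on the Deployment, so as a sketch (not an official sizing recommendation) it can be lowered to match the number of schedulable control-plane nodes:
kubectl -n vmware-system-csi scale deployment vsphere-csi-controller --replicas=2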
Having looked again at that pod, I see that the bind mount of /etc/resolv.conf leads to a file on the host whose contents are:
nameserver 127.0.0.1
On a normal deployment, I should have something like:
nameserver 10.96.0.10
options ndots:5
On the exact same node I have a calico-kube-controller pod which also has the right resolver (10.96.0.10).
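One thing worth checking here (an assumption on my part, not something confirmed in this thread): a pod only gets the cluster resolver when its dnsPolicy is ClusterFirst; with dnsPolicy: Default it inherits the host's /etc/resolv.conf, which would explain the nameserver 127.0.0.1 above:
kubectl -n vmware-system-csi get pod vsphere-csi-controller-68c65dbdd5-z6g85 -o jsonpath='{.spec.dnsPolicy}'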
I've added what I consider to be an issue here: https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/2354. It doesn't look as though the DNS behaves correctly inside the pod, but I'm not sure how this is happening.
We will move the registration of the CRDs into the deployment YAML file so we do not have internal container dependencies while the Pod is coming up.
cc: @vdkotkar
/assign @vdkotkar
@divyenpatel: GitHub didn't allow me to assign the following users: vdkotkar.
Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
/assign vdkotkar
Hello Guys,
I am seeing the same problem as discussed in this issue. Same is the case with 3.0.0 and 3.0.2.
I also had a look at https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1661
Wanted to check if there are other workarounds or a real fix for this, or whether I am making some mistake in configuring the vSphere CSI driver.
More details =>
Pods are in CrashLoopBackOff:
vmware-system-csi vsphere-csi-controller-5867b9fc45-5kft8 0/7 Pending
vmware-system-csi vsphere-csi-controller-5867b9fc45-hc9c8 0/7 Pending
vmware-system-csi vsphere-csi-controller-5867b9fc45-z7pqf 0/7 Pending
vmware-system-csi vsphere-csi-node-8zlhk 2/3 CrashLoopBackOff
vmware-system-csi vsphere-csi-node-pr4bf 2/3 CrashLoopBackOff
vmware-system-csi vsphere-csi-node-t7m7b 2/3 CrashLoopBackOff
My setup is:
kubectl get no
NAME STATUS ROLES AGE VERSION
rke2vm1 Ready control-plane,etcd,master 41m v1.25.10+rke2r1
rke2vm2 Ready control-plane,etcd,master 37m v1.25.10+rke2r1
rke2vm3 Ready control-plane,etcd,master 37m v1.25.10+rke2r1
kubectl -n vmware-system-csi logs vsphere-csi-node-8zlhk
Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I0809 15:57:23.745216 1 main.go:167] Version: v2.7.0
I0809 15:57:23.745254 1 main.go:168] Running node-driver-registrar in mode=registration
I0809 15:57:23.745845 1 main.go:192] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0809 15:57:23.745873 1 connection.go:154] Connecting to unix:///csi/csi.sock
I0809 15:57:23.746532 1 main.go:199] Calling CSI driver to discover driver name
I0809 15:57:23.746545 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0809 15:57:23.746553 1 connection.go:184] GRPC request: {}
I0809 15:57:23.751808 1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v3.0.2"}
I0809 15:57:23.751927 1 connection.go:187] GRPC error: <nil>
I0809 15:57:23.751939 1 main.go:209] CSI driver name: "csi.vsphere.vmware.com"
I0809 15:57:23.752785 1 node_register.go:53] Starting Registration Server at: /registration/csi.vsphere.vmware.com-reg.sock
I0809 15:57:23.753406 1 node_register.go:62] Registration Server started at: /registration/csi.vsphere.vmware.com-reg.sock
I0809 15:57:23.753607 1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0809 15:57:25.116945 1 main.go:102] Received GetInfo call: &InfoRequest{}
I0809 15:57:25.117192 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I0809 15:57:25.134647 1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "rke2vm2". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0809 15:57:25.134679 1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "rke2vm2". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
When trying with CSI v3.0.2, I have used as-is this => https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.0.2/manifests/vanilla/vsphere-csi-driver.yaml
Any help please?
Hi Venkat,
I see that your controller pods are in Pending state, which is eventually causing the node pods to go into CrashLoopBackOff.
Please check why your controller pods are in Pending state. You can paste the output of kubectl describe pod vsphere-csi-controller-5867b9fc45-5kft8 -n vmware-system-csi. Check whether some affinity or anti-affinity rules on the pod are causing this issue.
Also, in the kubectl get nodes output I can see only master nodes. Don't you have any worker nodes?
Hi Vipul, thanks for the reply. I could get past that problem and complete the installation successfully (vSphere CSI Driver v3.0.2). I have a 3-node RKE2 cluster with no dedicated compute nodes; control and compute run on those VMs.
But I had a question:
In this https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.0.2/manifests/vanilla/vsphere-csi-driver.yaml YAML, I see =>
nodeSelector:
node-role.kubernetes.io/control-plane: ""
Due to that, those pods were not coming up.
Labels on the nodes are like this =>
# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
rke2vm1 Ready control-plane,etcd,master 67m v1.25.10+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=mantaray
rke2vm2 Ready control-plane,etcd,master 64m v1.25.10+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm2,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=mantaray
rke2vm3 Ready control-plane,etcd,master 63m v1.25.10+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm3,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=mantaray
So, I changed to =>
nodeSelector:
node-role.kubernetes.io/control-plane: "true"
Then all pods came up and everything is OK.
I wonder why the earlier selector did not select my nodes. I was thinking that, the selector value being empty, it must act as a wildcard? Any help?
Anyway, thanks for all the help.
@vu3oim only the key is needed in the label node-role.kubernetes.io/control-plane
https://kubernetes.io/docs/reference/labels-annotations-taints/#node-role-kubernetes-io-control-plane
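To spell out why the empty value did not act as a wildcard: nodeSelector is an exact key=value match, so node-role.kubernetes.io/control-plane: "" only matches nodes where that label's value is the empty string (the kubeadm convention), while RKE2 sets the value to "true". A hypothetical way to make the manifest tolerant of both conventions is a nodeAffinity rule with the Exists operator, which matches on the key alone:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists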
How are you deploying k8s cluster?
Hi Divyen, I am installing RKE2
RKE2 config file => /etc/rancher/rke2/config.yaml =>
token: mytoken
write-kubeconfig-mode: "0644"
cluster-cidr: "10.128.0.0/14,fd02::/48"
service-cidr: "172.30.0.0/16,fd03::/112"
tls-san:
- rke2vm1
- rke2vm2
- rke2vm3
node-label:
- "product=test"
disable-cloud-controller: "true"
debug: true
It's a 3-node K8s cluster (each node runs both control and compute)
# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
rke2vm1 Ready control-plane,etcd,master 67m v1.25.10+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=test
rke2vm2 Ready control-plane,etcd,master 64m v1.25.10+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm2,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=test
rke2vm3 Ready control-plane,etcd,master 63m v1.25.10+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=rke2vm3,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-6gb.os-unknown,product=test
Then I just follow the vSphere CPI/CSI 3.0 documentation instructions.
Strangely, I needed to edit the vSphere CSI driver YAML (vsphere-csi-driver.yaml) with node-role.kubernetes.io/control-plane: "true" to get a successful installation of the vSphere CSI driver.
Getting a similar issue:
/kind bug
What happened: Trying to install vSphere CSI drivers v3.0.2 with RKE2 cluster v1.26.9+rke2r1
csi-vsphere.conf is added to the secrets:
kubectl create secret generic vsphere-config-secret --from-file=/etc/kubernetes/secret.conf --namespace=vmware-system-csi
I can ping the IP of the vCenter from any k8s node.
[Global]
cluster-id = "my-k8s-vmw"
cluster-distribution = "native"
[VirtualCenter "172.16.1.1"]
insecure-flag = "true"
user = "k8s-vsphere-csi@local"
password = "mypwd"
port = "443"
datacenters = "mydc"
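One thing to double-check with that secret (an observation, not something confirmed in this thread): kubectl create secret --from-file=/etc/kubernetes/secret.conf stores the data under the key secret.conf, whereas the driver expects the key csi-vsphere.conf, so the key may need to be set explicitly:
kubectl -n vmware-system-csi create secret generic vsphere-config-secret \
  --from-file=csi-vsphere.conf=/etc/kubernetes/secret.conf
# verify which key the secret actually carries
kubectl -n vmware-system-csi get secret vsphere-config-secret -o jsonpath='{.data}'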
After deploying the CSI driver:
kubectl --namespace=vmware-system-csi get all
NAME READY STATUS RESTARTS AGE
pod/vsphere-csi-controller-5d88977cdf-6bb7w 0/7 Pending 0 12m
pod/vsphere-csi-controller-5d88977cdf-gfq52 0/7 Pending 0 12m
pod/vsphere-csi-controller-78cb9ff564-x8djs 0/7 Pending 0 12m
pod/vsphere-csi-node-2pg4x 2/3 CrashLoopBackOff 7 (25s ago) 11m
pod/vsphere-csi-node-ffhc6 2/3 CrashLoopBackOff 7 (35s ago) 11m
pod/vsphere-csi-node-grbwh 2/3 CrashLoopBackOff 7 (41s ago) 11m
pod/vsphere-csi-node-k8jkb 2/3 CrashLoopBackOff 7 (31s ago) 11m
pod/vsphere-csi-node-w8bps 2/3 CrashLoopBackOff 7 (23s ago) 11m
pod/vsphere-csi-node-xkxvg 2/3 CrashLoopBackOff 7 (27s ago) 11m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/vsphere-csi-controller ClusterIP 10.43.84.182 <none> 2112/TCP,2113/TCP 3d5h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/vsphere-csi-node 6 6 0 6 0 kubernetes.io/os=linux 3d5h
daemonset.apps/vsphere-csi-node-windows 0 0 0 0 0 kubernetes.io/os=windows 3d5h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/vsphere-csi-controller 0/3 1 0 3d5h
NAME DESIRED CURRENT READY AGE
replicaset.apps/vsphere-csi-controller-5d88977cdf 2 2 0 3d5h
replicaset.apps/vsphere-csi-controller-78cb9ff564 1 1 0 44m
Controller is not ready
kubectl get deployment --namespace=vmware-system-csi
NAME READY UP-TO-DATE AVAILABLE AGE
vsphere-csi-controller 0/3 1 0 3d5h
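Since this is also an RKE2 cluster, the pending controller pods may be the same nodeSelector mismatch discussed above (RKE2 labels control-plane nodes with the value "true" rather than an empty string); the scheduling events should confirm it:
kubectl -n vmware-system-csi get events --field-selector involvedObject.name=vsphere-csi-controller-5d88977cdf-6bb7w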
kubectl logs -n vmware-system-csi vsphere-csi-node-xkxvg
Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I1009 02:18:59.471111 1 main.go:167] Version: v2.7.0
I1009 02:18:59.471155 1 main.go:168] Running node-driver-registrar in mode=registration
I1009 02:18:59.471675 1 main.go:192] Attempting to open a gRPC connection with: "/csi/csi.sock"
I1009 02:18:59.471696 1 connection.go:154] Connecting to unix:///csi/csi.sock
I1009 02:18:59.472696 1 main.go:199] Calling CSI driver to discover driver name
I1009 02:18:59.472764 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I1009 02:18:59.472785 1 connection.go:184] GRPC request: {}
I1009 02:18:59.474901 1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v3.0.2"}
I1009 02:18:59.475020 1 connection.go:187] GRPC error: <nil>
I1009 02:18:59.475050 1 main.go:209] CSI driver name: "csi.vsphere.vmware.com"
I1009 02:18:59.475132 1 node_register.go:53] Starting Registration Server at: /registration/csi.vsphere.vmware.com-reg.sock
I1009 02:18:59.475484 1 node_register.go:62] Registration Server started at: /registration/csi.vsphere.vmware.com-reg.sock
I1009 02:18:59.475806 1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I1009 02:19:00.760178 1 main.go:102] Received GetInfo call: &InfoRequest{}
I1009 02:19:00.760487 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I1009 02:19:00.779742 1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "so-m002". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E1009 02:19:00.779785 1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "so-m002". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
Controller is not starting
kubectl logs -n vmware-system-csi vsphere-csi-controller-5d88977cdf-6bb7w
Defaulted container "csi-attacher" out of: csi-attacher, csi-resizer, vsphere-csi-controller, liveness-probe, vsphere-syncer, csi-provisioner, csi-snapshotter
kubectl get csinodetopologies
error: the server doesn't have a resource type "csinodetopologies"
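That error suggests the CSINodeTopology CRD was never registered with the API server. As I understand it, the CRD is installed by the controller-side components at startup, so the Pending controller pods and the missing resource type are likely the same root cause. A quick check (hedged sketch; the CRD name is inferred from the cns.vmware.com API group in the error):
# Does the CRD exist at all?
kubectl get crd csinodetopologies.cns.vmware.com
# NotFound here would mean the controller (which registers the CRD) never came up,
# so fixing the controller scheduling should also unblock the node pods.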
6 nodes: 3 masters & 3 workers
kubectl get nodes
NAME STATUS ROLES AGE VERSION
so-m001 Ready control-plane,etcd,master 12d v1.26.9+rke2r1
so-m002 Ready control-plane,etcd,master 12d v1.26.9+rke2r1
so-m003 Ready control-plane,etcd,master 12d v1.26.9+rke2r1
so-w001 Ready <none> 12d v1.26.9+rke2r1
so-w002 Ready <none> 12d v1.26.9+rke2r1
so-w003 Ready <none> 12d v1.26.9+rke2r1
Example of storage class & PVC with block device:
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: example-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "vSAN Default Storage Policy"  # Optional Parameter
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: example-sc
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-pvc
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox:1.24
      command: ["/bin/sh", "-c", "echo 'hello from appl' >> /mnt/volume1/index.html && while true ; do sleep 2 ; done"]
      volumeMounts:
        - name: volume1
          mountPath: /mnt/volume1
  restartPolicy: Always
  volumes:
    - name: volume1
      persistentVolumeClaim:
        claimName: example-pvc
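A hedged usage sketch for the manifests above (the file name example-block.yaml is illustrative):
kubectl apply -f example-block.yaml
kubectl get pvc example-pvc            # should reach STATUS Bound once provisioning works
kubectl get pod pod-with-pvc -o wide   # should go Running with the volume attached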
Example of a storage class & PVC with shared storage via NFS. The NFS service is enabled in vCenter with an IP available; the VMs are added to the VLAN and can ping the NFS service IP from vSAN:
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.vsphere.vmware.com
parameters:
  csi.storage.k8s.io/fstype: "nfs4"
  storagepolicyname: "ftt-1-vsan-eq"  # Optional Parameter
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-sc
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-nfs-pvc1
spec:
  containers:
    - name: test-container-nfs
      image: gcr.io/google_containers/busybox:1.24
      command: ["/bin/sh", "-c", "echo 'hello from appl' >> /mnt/volume1/index.html && while true ; do sleep 2 ; done"]
      volumeMounts:
        - name: file-volume
          mountPath: /mnt/volume1
  restartPolicy: Always
  volumes:
    - name: file-volume
      persistentVolumeClaim:
        claimName: nfs-pvc
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-nfs-pvc2
spec:
  containers:
    - name: test-container-nfs
      image: gcr.io/google_containers/busybox:1.24
      command: ["/bin/sh", "-c", "echo 'hello from appl' >> /mnt/volume1/index.html && while true ; do sleep 2 ; done"]
      volumeMounts:
        - name: file-volume
          mountPath: /mnt/volume1
  restartPolicy: Always
  volumes:
    - name: file-volume
      persistentVolumeClaim:
        claimName: nfs-pvc
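And a similar hedged check that the two pods really share the ReadWriteMany volume (the file name example-nfs.yaml is illustrative):
kubectl apply -f example-nfs.yaml
kubectl get pvc nfs-pvc   # should be Bound with ACCESS MODES RWX
# Both pods append to the same file, so each should see the other's writes:
kubectl exec pod-with-nfs-pvc1 -- cat /mnt/volume1/index.html
kubectl exec pod-with-nfs-pvc2 -- cat /mnt/volume1/index.html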
Environment:
csi-vsphere version: v3.0.2
Kubernetes version: v1.26.9+rke2r1
vSphere version: 7.0.3
OS (e.g. from /etc/os-release): Ubuntu 22.04
Kernel (e.g. uname -a): 5.15.0-86-generic
Install tools: NA
Others: MetalLB load balancer with BGP IPv4 & IPv6
CNI: Cilium with DualStack config
I observed that your controller pods are in Pending state, which is eventually causing the node pods to go into CrashLoopBackOff.
Please check why your controller pods are Pending. You can paste the output of kubectl describe pod vsphere-csi-controller-5d88977cdf-6bb7w -n vmware-system-csi. Check whether some affinity or anti-affinity rules on the pod are causing this issue.
@vdkotkar I'm using a freshly installed Rancher RKE2 k8s cluster; almost nothing is configured on it. I have k8s in DualStack with Cilium CNI and a MetalLB load balancer (BGP, IPv4+IPv6), and I installed the vSphere CSI driver.
kubectl describe pod vsphere-csi-controller-5d88977cdf-6bb7w -n vmware-system-csi
Name: vsphere-csi-controller-5d88977cdf-6bb7w
Namespace: vmware-system-csi
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: vsphere-csi-controller
Node: <none>
Labels: app=vsphere-csi-controller
pod-template-hash=5d88977cdf
role=vsphere-csi
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/vsphere-csi-controller-5d88977cdf
Containers:
csi-attacher:
Image: k8s.gcr.io/sig-storage/csi-attacher:v4.2.0
Port: <none>
Host Port: <none>
Args:
--v=4
--timeout=300s
--csi-address=$(ADDRESS)
--leader-election
--leader-election-lease-duration=120s
--leader-election-renew-deadline=60s
--leader-election-retry-period=30s
--kube-api-qps=100
--kube-api-burst=100
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xmcst (ro)
csi-resizer:
Image: k8s.gcr.io/sig-storage/csi-resizer:v1.7.0
Port: <none>
Host Port: <none>
Args:
--v=4
--timeout=300s
--handle-volume-inuse-error=false
--csi-address=$(ADDRESS)
--kube-api-qps=100
--kube-api-burst=100
--leader-election
--leader-election-lease-duration=120s
--leader-election-renew-deadline=60s
--leader-election-retry-period=30s
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xmcst (ro)
vsphere-csi-controller:
Image: gcr.io/cloud-provider-vsphere/csi/release/driver:v3.0.2
Ports: 9808/TCP, 2112/TCP
Host Ports: 0/TCP, 0/TCP
Args:
--fss-name=internal-feature-states.csi.vsphere.vmware.com
--fss-namespace=$(CSI_NAMESPACE)
Liveness: http-get http://:healthz/healthz delay=30s timeout=10s period=180s #success=1 #failure=3
Environment:
CSI_ENDPOINT: unix:///csi/csi.sock
X_CSI_MODE: controller
X_CSI_SPEC_DISABLE_LEN_CHECK: true
X_CSI_SERIAL_VOL_ACCESS_TIMEOUT: 3m
VSPHERE_CSI_CONFIG: /etc/cloud/csi-vsphere.conf
LOGGER_LEVEL: PRODUCTION
INCLUSTER_CLIENT_QPS: 100
INCLUSTER_CLIENT_BURST: 100
CSI_NAMESPACE: vmware-system-csi (v1:metadata.namespace)
Mounts:
/csi from socket-dir (rw)
/etc/cloud from vsphere-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xmcst (ro)
liveness-probe:
Image: k8s.gcr.io/sig-storage/livenessprobe:v2.9.0
Port: <none>
Host Port: <none>
Args:
--v=4
--csi-address=/csi/csi.sock
Environment: <none>
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xmcst (ro)
vsphere-syncer:
Image: gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.0.2
Port: 2113/TCP
Host Port: 0/TCP
Args:
--leader-election
--leader-election-lease-duration=120s
--leader-election-renew-deadline=60s
--leader-election-retry-period=30s
--fss-name=internal-feature-states.csi.vsphere.vmware.com
--fss-namespace=$(CSI_NAMESPACE)
Environment:
FULL_SYNC_INTERVAL_MINUTES: 30
VSPHERE_CSI_CONFIG: /etc/cloud/csi-vsphere.conf
LOGGER_LEVEL: PRODUCTION
INCLUSTER_CLIENT_QPS: 100
INCLUSTER_CLIENT_BURST: 100
GODEBUG: x509sha1=1
CSI_NAMESPACE: vmware-system-csi (v1:metadata.namespace)
Mounts:
/etc/cloud from vsphere-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xmcst (ro)
csi-provisioner:
Image: k8s.gcr.io/sig-storage/csi-provisioner:v3.4.0
Port: <none>
Host Port: <none>
Args:
--v=4
--timeout=300s
--csi-address=$(ADDRESS)
--kube-api-qps=100
--kube-api-burst=100
--leader-election
--leader-election-lease-duration=120s
--leader-election-renew-deadline=60s
--leader-election-retry-period=30s
--default-fstype=ext4
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xmcst (ro)
csi-snapshotter:
Image: k8s.gcr.io/sig-storage/csi-snapshotter:v6.2.1
Port: <none>
Host Port: <none>
Args:
--v=4
--kube-api-qps=100
--kube-api-burst=100
--timeout=300s
--csi-address=$(ADDRESS)
--leader-election
--leader-election-lease-duration=120s
--leader-election-renew-deadline=60s
--leader-election-retry-period=30s
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xmcst (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
vsphere-config-volume:
Type: Secret (a volume populated by a Secret)
SecretName: vsphere-config-secret
Optional: false
socket-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-xmcst:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: node-role.kubernetes.io/control-plane=
Tolerations: node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m45s (x133 over 11h) default-scheduler 0/6 nodes are available: 6 node(s) didn't match Pod's node affinity/selector. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
I can see the issue: 0/6 nodes are available: 6 node(s) didn't match Pod's node affinity/selector. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
But I do not understand why it's happening or how to fix it. Would you be so kind as to help me in this direction?
kubectl describe nodes | egrep "Taints:|Name:"
Name: so-m001
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: so-m002
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: so-m003
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: so-w001
Taints: <none>
Name: so-w002
Taints: <none>
Name: so-w003
Taints: <none>
kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
so-m001 Ready control-plane,etcd,master 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-m001,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=rke2
so-m002 Ready control-plane,etcd,master 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-m002,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=rke2
so-m003 Ready control-plane,etcd,master 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-m003,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=rke2
so-w001 Ready <none> 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-w001,kubernetes.io/os=linux,node.kubernetes.io/instance-type=rke2
so-w002 Ready <none> 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-w002,kubernetes.io/os=linux,node.kubernetes.io/instance-type=rke2
so-w003 Ready <none> 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-w003,kubernetes.io/os=linux,node.kubernetes.io/instance-type=rke2
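One hedged way to line the scheduler message up with the outputs above (the pod name is the one from this thread; the jsonpath is just one way to pull the selector):
# What the pending controller pod asks for:
kubectl -n vmware-system-csi get pod vsphere-csi-controller-5d88977cdf-6bb7w \
  -o jsonpath='{.spec.nodeSelector}{"\n"}'
# -> {"node-role.kubernetes.io/control-plane":""}   (an empty label value)
# Which nodes match that selector exactly:
kubectl get nodes -l 'node-role.kubernetes.io/control-plane='
# On RKE2 the control-plane label carries the value "true", so nothing matches --
# which is exactly the FailedScheduling message above.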
I also tested with CSI v3.1.0 and got the same result.
Before installing the CSI driver I tainted the master nodes for the pod toleration:
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.1.0/manifests/vanilla/namespace.yaml
mkdir -p /root/vsan
cat <<'EOXF' > /root/vsan/csi-vsphere.conf
[Global]
cluster-id = "my-k8s-vmw"
cluster-distribution = "native"
[VirtualCenter "172.16.1.1"]
insecure-flag = "true"
user = "k8s-vsphere-csi@local"
password = "password"
port = "443"
datacenters = "dc01"
EOXF
kubectl create secret generic vsphere-config-secret --from-file=/root/vsan/csi-vsphere.conf --namespace=vmware-system-csi
kubectl taint nodes so-m001 node-role.kubernetes.io/control-plane=:NoSchedule
kubectl taint nodes so-m002 node-role.kubernetes.io/control-plane=:NoSchedule
kubectl taint nodes so-m003 node-role.kubernetes.io/control-plane=:NoSchedule
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v3.1.0/manifests/vanilla/vsphere-csi-driver.yaml
kubectl get pods,sc,pvc,pv -n vmware-system-csi
NAME READY STATUS RESTARTS AGE
pod/vsphere-csi-controller-699f9799f8-7pq89 0/7 Pending 0 4m25s
pod/vsphere-csi-controller-699f9799f8-9vbcd 0/7 Pending 0 4m25s
pod/vsphere-csi-controller-699f9799f8-pdcc2 0/7 Pending 0 4m25s
pod/vsphere-csi-node-flwvd 2/3 CrashLoopBackOff 5 (68s ago) 4m25s
pod/vsphere-csi-node-g6g5b 2/3 CrashLoopBackOff 5 (63s ago) 4m25s
pod/vsphere-csi-node-hmgzf 2/3 CrashLoopBackOff 5 (76s ago) 4m25s
pod/vsphere-csi-node-wt7sp 2/3 CrashLoopBackOff 5 (58s ago) 4m25s
pod/vsphere-csi-node-wtgwj 2/3 CrashLoopBackOff 5 (71s ago) 4m25s
pod/vsphere-csi-node-zvdl7 2/3 CrashLoopBackOff 5 (80s ago) 4m25s
kubectl describe pod vsphere-csi-controller-699f9799f8-7pq89 -n vmware-system-csi
Name: vsphere-csi-controller-699f9799f8-7pq89
Namespace: vmware-system-csi
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: vsphere-csi-controller
Node: <none>
Labels: app=vsphere-csi-controller
pod-template-hash=699f9799f8
role=vsphere-csi
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/vsphere-csi-controller-699f9799f8
Containers:
csi-attacher:
Image: registry.k8s.io/sig-storage/csi-attacher:v4.3.0
Port: <none>
Host Port: <none>
Args:
--v=4
--timeout=300s
--csi-address=$(ADDRESS)
--leader-election
--leader-election-lease-duration=120s
--leader-election-renew-deadline=60s
--leader-election-retry-period=30s
--kube-api-qps=100
--kube-api-burst=100
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xm6k4 (ro)
csi-resizer:
Image: registry.k8s.io/sig-storage/csi-resizer:v1.8.0
Port: <none>
Host Port: <none>
Args:
--v=4
--timeout=300s
--handle-volume-inuse-error=false
--csi-address=$(ADDRESS)
--kube-api-qps=100
--kube-api-burst=100
--leader-election
--leader-election-lease-duration=120s
--leader-election-renew-deadline=60s
--leader-election-retry-period=30s
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xm6k4 (ro)
vsphere-csi-controller:
Image: gcr.io/cloud-provider-vsphere/csi/release/driver:v3.1.0
Ports: 9808/TCP, 2112/TCP
Host Ports: 0/TCP, 0/TCP
Args:
--fss-name=internal-feature-states.csi.vsphere.vmware.com
--fss-namespace=$(CSI_NAMESPACE)
Liveness: http-get http://:healthz/healthz delay=30s timeout=10s period=180s #success=1 #failure=3
Environment:
CSI_ENDPOINT: unix:///csi/csi.sock
X_CSI_MODE: controller
X_CSI_SPEC_DISABLE_LEN_CHECK: true
X_CSI_SERIAL_VOL_ACCESS_TIMEOUT: 3m
VSPHERE_CSI_CONFIG: /etc/cloud/csi-vsphere.conf
LOGGER_LEVEL: PRODUCTION
INCLUSTER_CLIENT_QPS: 100
INCLUSTER_CLIENT_BURST: 100
CSI_NAMESPACE: vmware-system-csi (v1:metadata.namespace)
Mounts:
/csi from socket-dir (rw)
/etc/cloud from vsphere-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xm6k4 (ro)
liveness-probe:
Image: registry.k8s.io/sig-storage/livenessprobe:v2.10.0
Port: <none>
Host Port: <none>
Args:
--v=4
--csi-address=/csi/csi.sock
Environment: <none>
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xm6k4 (ro)
vsphere-syncer:
Image: gcr.io/cloud-provider-vsphere/csi/release/syncer:v3.1.0
Port: 2113/TCP
Host Port: 0/TCP
Args:
--leader-election
--leader-election-lease-duration=30s
--leader-election-renew-deadline=20s
--leader-election-retry-period=10s
--fss-name=internal-feature-states.csi.vsphere.vmware.com
--fss-namespace=$(CSI_NAMESPACE)
Environment:
FULL_SYNC_INTERVAL_MINUTES: 30
VSPHERE_CSI_CONFIG: /etc/cloud/csi-vsphere.conf
LOGGER_LEVEL: PRODUCTION
INCLUSTER_CLIENT_QPS: 100
INCLUSTER_CLIENT_BURST: 100
GODEBUG: x509sha1=1
CSI_NAMESPACE: vmware-system-csi (v1:metadata.namespace)
Mounts:
/etc/cloud from vsphere-config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xm6k4 (ro)
csi-provisioner:
Image: registry.k8s.io/sig-storage/csi-provisioner:v3.5.0
Port: <none>
Host Port: <none>
Args:
--v=4
--timeout=300s
--csi-address=$(ADDRESS)
--kube-api-qps=100
--kube-api-burst=100
--leader-election
--leader-election-lease-duration=120s
--leader-election-renew-deadline=60s
--leader-election-retry-period=30s
--default-fstype=ext4
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xm6k4 (ro)
csi-snapshotter:
Image: registry.k8s.io/sig-storage/csi-snapshotter:v6.2.2
Port: <none>
Host Port: <none>
Args:
--v=4
--kube-api-qps=100
--kube-api-burst=100
--timeout=300s
--csi-address=$(ADDRESS)
--leader-election
--leader-election-lease-duration=120s
--leader-election-renew-deadline=60s
--leader-election-retry-period=30s
Environment:
ADDRESS: /csi/csi.sock
Mounts:
/csi from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xm6k4 (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
vsphere-config-volume:
Type: Secret (a volume populated by a Secret)
SecretName: vsphere-config-secret
Optional: false
socket-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-xm6k4:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: node-role.kubernetes.io/control-plane=
Tolerations: node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 5m3s default-scheduler 0/6 nodes are available: 6 node(s) didn't match Pod's node affinity/selector. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
kubectl describe nodes | egrep "Taints:|Name:"
Name: so-m001
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: so-m002
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: so-m003
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Name: so-w001
Taints: <none>
Name: so-w002
Taints: <none>
Name: so-w003
Taints: <none>
kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
so-m001 Ready control-plane,etcd,master 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-m001,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=rke2
so-m002 Ready control-plane,etcd,master 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-m002,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=rke2
so-m003 Ready control-plane,etcd,master 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-m003,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/etcd=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=rke2
so-w001 Ready <none> 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-w001,kubernetes.io/os=linux,node.kubernetes.io/instance-type=rke2
so-w002 Ready <none> 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-w002,kubernetes.io/os=linux,node.kubernetes.io/instance-type=rke2
so-w003 Ready <none> 13d v1.26.9+rke2r1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=rke2,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=so-w003,kubernetes.io/os=linux,node.kubernetes.io/instance-type=rke2
kubectl logs -n vmware-system-csi vsphere-csi-node-flwvd
Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I1009 21:08:03.930857 1 main.go:167] Version: v2.8.0
I1009 21:08:03.930911 1 main.go:168] Running node-driver-registrar in mode=registration
I1009 21:08:03.931414 1 main.go:192] Attempting to open a gRPC connection with: "/csi/csi.sock"
I1009 21:08:03.931470 1 connection.go:164] Connecting to unix:///csi/csi.sock
I1009 21:08:03.932081 1 main.go:199] Calling CSI driver to discover driver name
I1009 21:08:03.932093 1 connection.go:193] GRPC call: /csi.v1.Identity/GetPluginInfo
I1009 21:08:03.932096 1 connection.go:194] GRPC request: {}
I1009 21:08:03.933865 1 connection.go:200] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v3.1.0"}
I1009 21:08:03.933873 1 connection.go:201] GRPC error: <nil>
I1009 21:08:03.933879 1 main.go:209] CSI driver name: "csi.vsphere.vmware.com"
I1009 21:08:03.933919 1 node_register.go:53] Starting Registration Server at: /registration/csi.vsphere.vmware.com-reg.sock
I1009 21:08:03.934026 1 node_register.go:62] Registration Server started at: /registration/csi.vsphere.vmware.com-reg.sock
I1009 21:08:03.934616 1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I1009 21:08:05.325659 1 main.go:102] Received GetInfo call: &InfoRequest{}
I1009 21:08:05.326700 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I1009 21:08:05.344075 1 main.go:121] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "so-m003". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E1009 21:08:05.344095 1 main.go:123] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "so-m003". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
kubectl logs -n vmware-system-csi vsphere-csi-controller-699f9799f8-7pq89
Defaulted container "csi-attacher" out of: csi-attacher, csi-resizer, vsphere-csi-controller, liveness-probe, vsphere-syncer, csi-provisioner, csi-snapshotter
And in the manifest file
https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/v3.1.0/manifests/vanilla/vsphere-csi-driver.yaml
I see only this podAntiAffinity, which is supposed to prevent multiple controller pods from landing on the same node (one instance per node). The tolerations are supposed to let the pods run on the control-plane (master) nodes:
spec:
  priorityClassName: system-cluster-critical # Guarantees scheduling for critical system pods
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: "app"
                operator: In
                values:
                  - vsphere-csi-controller
          topologyKey: "kubernetes.io/hostname"
  serviceAccountName: vsphere-csi-controller
  nodeSelector:
    node-role.kubernetes.io/control-plane: ""
  tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
How do I fix:
Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
and:
FailedScheduling 5m3s default-scheduler 0/6 nodes are available: 6 node(s) didn't match Pod's node affinity/selector. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling..
?
@qdrddr After looking at the labels on your nodes, I think you need to change the nodeSelector in the deployment YAML file. The current nodeSelector is:
nodeSelector:
  node-role.kubernetes.io/control-plane: ""
Try changing it as below and see if it works:
nodeSelector:
  node-role.kubernetes.io/control-plane: "true"
As per my observation, on an RKE2 cluster the master nodes get the label node-role.kubernetes.io/control-plane=true, whereas on a vanilla on-prem cluster the master nodes carry node-role.kubernetes.io/control-plane= with an empty value.
Please let us know if pod scheduling works after making this change. I would recommend deleting the old deployment first and then re-deploying the CSI driver after making this change.
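For anyone who wants to try that suggestion without re-editing the whole manifest, a hedged one-liner (a merge patch against the deployment; illustrative, not part of the reply above):
kubectl -n vmware-system-csi patch deployment vsphere-csi-controller --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/control-plane":"true"}}}}}'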
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
I am encountering the same issue. I'm running Kubernetes version 1.29.1 with the --cloud-provider=external flag and vsphere-csi version 3.0.0.
I have the same problem as gabrieletosca does. Because I'm setting the cloud provider to external, the nodes are tainted with:
node.cloudprovider.kubernetes.io/uninitialized
The csi-controller pods are in Pending state and won't start exactly because of this taint:
0/5 nodes are available: 5 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true
And the csi-node pods don't start because of the error mentioned in the topic. So how are you supposed to get out of this issue?
I think you need to add the following toleration in your CSI driver deployment YAML (for the vsphere-csi-controller deployment), so that the csi-controller pods will be scheduled on the K8s control-plane nodes:
- key: node.cloudprovider.kubernetes.io/uninitialized
  value: "true"
  effect: NoSchedule
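A hedged way to add that toleration in place (a JSON patch appending to the deployment's existing tolerations list; illustrative only):
kubectl -n vmware-system-csi patch deployment vsphere-csi-controller --type json \
  -p '[{"op":"add","path":"/spec/template/spec/tolerations/-","value":{"key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true","effect":"NoSchedule"}}]'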
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/kind bug
What happened: Trying to install vSphere CSI driver v2.7.0 on an RKE2 cluster v1.24.10+rke2r1.
$ cat /etc/rancher/rke2/config.yaml
cloud-provider-name: external
$ cat csi-vsphere.conf
[Global]
cluster-id = "${CLUSTER_NAME}"
cluster-distribution = "Kubernetes"
[VirtualCenter "172.16.16.110"]
insecure-flag = "true"
user = "user1@vsphere.local"
password = "password12345"
port = "443"
datacenters = "datacenter1"
root@urnpk8sm60:~# kubectl --namespace=vmware-system-csi get all
NAME READY STATUS RESTARTS AGE
pod/vsphere-csi-controller-7589ccbcf8-6w7pw 0/7 Pending 0 3m2s
pod/vsphere-csi-controller-7589ccbcf8-phl5c 0/7 Pending 0 3m2s
pod/vsphere-csi-controller-7589ccbcf8-wwwfc 0/7 Pending 0 3m2s
pod/vsphere-csi-node-6vljg 2/3 CrashLoopBackOff 4 (79s ago) 3m2s
pod/vsphere-csi-node-dpnh9 2/3 CrashLoopBackOff 5 (7s ago) 3m2s
pod/vsphere-csi-node-jd4wt 2/3 CrashLoopBackOff 4 (78s ago) 3m2s
pod/vsphere-csi-node-wtlp7 2/3 CrashLoopBackOff 4 (72s ago) 3m2s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/vsphere-csi-controller ClusterIP 10.43.162.210 <none> 2112/TCP,2113/TCP 3m2s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/vsphere-csi-node 4 4 0 4 0 kubernetes.io/os=linux 3m2s
daemonset.apps/vsphere-csi-node-windows 0 0 0 0 0 kubernetes.io/os=windows 3m2s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/vsphere-csi-controller 0/3 3 0 3m2s
NAME DESIRED CURRENT READY AGE
replicaset.apps/vsphere-csi-controller-7589ccbcf8 3 3 0 3m2s
root@urnpk8sm60:~#
root@urnpk8sm60:~# kubectl --namespace=vmware-system-csi logs pod/vsphere-csi-node-wtlp7
Defaulted container "node-driver-registrar" out of: node-driver-registrar, vsphere-csi-node, liveness-probe
I0315 11:27:48.852737 1 main.go:166] Version: v2.5.1
I0315 11:27:48.852835 1 main.go:167] Running node-driver-registrar in mode=registration
I0315 11:27:48.854993 1 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0315 11:27:48.855119 1 connection.go:154] Connecting to unix:///csi/csi.sock
I0315 11:27:48.859495 1 main.go:198] Calling CSI driver to discover driver name
I0315 11:27:48.859554 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0315 11:27:48.859566 1 connection.go:184] GRPC request: {}
I0315 11:27:48.875719 1 connection.go:186] GRPC response: {"name":"csi.vsphere.vmware.com","vendor_version":"v2.7.0"}
I0315 11:27:48.876170 1 connection.go:187] GRPC error: <nil>
I0315 11:27:48.876774 1 main.go:208] CSI driver name: "csi.vsphere.vmware.com"
I0315 11:27:48.877323 1 node_register.go:53] Starting Registration Server at: /registration/csi.vsphere.vmware.com-reg.sock
I0315 11:27:48.878695 1 node_register.go:62] Registration Server started at: /registration/csi.vsphere.vmware.com-reg.sock
I0315 11:27:48.879412 1 node_register.go:92] Skipping HTTP server because endpoint is set to: ""
I0315 11:27:49.996391 1 main.go:102] Received GetInfo call: &InfoRequest{}
I0315 11:27:49.998477 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/csi.vsphere.vmware.com/registration"
I0315 11:27:50.069958 1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "urnpk8sm60". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1",}
E0315 11:27:50.070058 1 main.go:122] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = failed to get CsiNodeTopology for the node: "urnpk8sm60". Error: no matches for kind "CSINodeTopology" in version "cns.vmware.com/v1alpha1", restarting registration container.
root@urnpk8sm60:~#
After changing improved-volume-topology: 'true' to false in vsphere-csi-driver.yaml, the vsphere-csi-node pods are running, but the vsphere-csi-controller pods are still in Pending state due to node affinity/selector.
Warning FailedScheduling 26s default-scheduler 0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
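For reference, a hedged sketch of flipping that feature gate on a live install instead of editing the YAML (the ConfigMap name matches the --fss-name argument shown in the pod specs earlier in this thread; restarting the DaemonSet afterwards is my assumption):
kubectl -n vmware-system-csi patch configmap internal-feature-states.csi.vsphere.vmware.com \
  --type merge -p '{"data":{"improved-volume-topology":"false"}}'
kubectl -n vmware-system-csi rollout restart daemonset vsphere-csi-node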
What you expected to happen: The same steps work fine with vanilla Kubernetes but are not working with RKE2.
Environment:
Kernel (e.g. uname -a): 5.15.0-60-generic