Closed parogui closed 1 year ago
Thanks for opening this. About this issue: when initializing a node, cloud-provider-vsphere will first try to find the providerID of each node by searching by node name (related code). Then it will try to find the VM by searching by DNS name, and it only returns the first result from the list.
I noticed the node names you use in the cluster are of the form master0, master1. I wonder whether they are within the same vSphere; if so, it could be that CPI finds a different node with the same node name or DNS name in the same vSphere.
In CAPV we use a distinct node name for each cluster node, and each node also has a distinct DNS name, so clusters won't collide on node names. I wonder how different VMs could have the same name in an OpenShift cluster in vSphere? Or are they using different VM names but the same DNS name? Could you share those VMs' information? Do nodes in different clusters have the same DNS name? You are using a single VC, right?
The log is missing some of the initialization parts, so I can't verify that the duplicate DNS name is the root cause. Could you create a new cluster, install CPI with the log level set to 5, and then send the log to me? I would expect to see logs from the code below:
https://github.com/kubernetes/cloud-provider-vsphere/blob/master/pkg/cloudprovider/vsphere/instances.go#L98 https://github.com/kubernetes/cloud-provider/blob/master/controllers/node/node_controller.go#L415
That way we can see whether, during initialization, CPI is returning the wrong provider ID for the corresponding node name.
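The first-match behaviour described above can be sketched with a simplified model. This is only an illustration of the failure mode, not the actual cloud-provider-vsphere lookup code; the `vmRecord` type and the sample names (taken from the folder layout discussed later in this thread) are made up:

```go
package main

import (
	"fmt"
	"strings"
)

// vmRecord is an illustrative stand-in for a VM as CPI might see it
// during name-based lookup.
type vmRecord struct {
	Folder  string // vSphere folder, e.g. "int-ocp"
	Name    string // VM name, e.g. "master0_int-ocp"
	DNSName string // guest FQDN, e.g. "master0.int-ocp.my.org.name"
}

// findByNodeName mimics the first-match behaviour: it compares the
// kubelet-registered node name against the guest's short hostname (the
// unqualified part of the DNS name) and returns the first hit, even when
// several VMs match.
func findByNodeName(vms []vmRecord, nodeName string) *vmRecord {
	for i := range vms {
		short := vms[i].DNSName
		if dot := strings.IndexByte(short, '.'); dot >= 0 {
			short = short[:dot]
		}
		if short == nodeName {
			return &vms[i] // first match wins; other matches are ignored
		}
	}
	return nil
}

func main() {
	vms := []vmRecord{
		{"int-ocp", "master0_int-ocp", "master0.int-ocp.my.org.name"},
		{"tst-ocp", "master0_tst-ocp", "master0.tst-ocp.my.org.name"},
	}
	// Both clusters register a node named "master0"; the lookup can only
	// return one of them, so one cluster may get the other cluster's VM.
	got := findByNodeName(vms, "master0")
	fmt.Println(got.Folder, got.Name)
}
```

With the two sample records above, the lookup always returns the int-ocp VM regardless of which cluster asked, which is the collision being described.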
Hi, @lubronzhan thanks for your update.
Our folder structure in VC looks like this (e.g. for the clusters int-ocp and tst-ocp; we have 11 clusters in total):
/dc-openshift_tst/vm/int-ocp/master0_int-ocp
/dc-openshift_tst/vm/int-ocp/master1_int-ocp
/dc-openshift_tst/vm/int-ocp/master2_int-ocp
/dc-openshift_tst/vm/int-ocp/worker0_int-ocp
/dc-openshift_tst/vm/int-ocp/worker1_int-ocp
...
/dc-openshift_tst/vm/int-ocp/worker(n)_int-ocp
/dc-openshift_tst/vm/tst-ocp/master0_tst-ocp
/dc-openshift_tst/vm/tst-ocp/master1_tst-ocp
/dc-openshift_tst/vm/tst-ocp/master2_tst-ocp
/dc-openshift_tst/vm/tst-ocp/worker0_tst-ocp
/dc-openshift_tst/vm/tst-ocp/worker1_tst-ocp
...
/dc-openshift_tst/vm/tst-ocp-acpr/worker(n)_tst-ocp
The nodes of each cluster have the following hostnames:
master0
master1
master2
worker0
worker1
...
worker(n)
The DNS records look like this:
master0.int-ocp.my.org.name
worker0.int-ocp.my.org.name
...
master0.tst-ocp.my.org.name
worker0.tst-ocp.my.org.name
All the nodes are on the same vSphere but in different folders. They may share hostname (master(n), worker(n)...) but not FQDN.
Does that sound like something that may be causing the reported issue?
Hi @parogui. Unfortunately, it looks like CPI uses the hostname as the node identifier at the very beginning. Is it possible to give the VMs distinct hostnames in OpenShift? If you check the kubelet log, it probably says it tried to register the node with this name, and that's why CPI is reconciling using this name.
Hi Lubron, thanks for your answer.
Doesn't it look up the full FQDN? It makes sense to keep hostnames simple across clusters; the infra is easier to maintain when the names are recognizable.
Hi @parogui. At the beginning, kubelet registers the node, and in your setup the node name is the hostname. CPI can only use this name to find the corresponding VM the first time. CPI can't get the FQDN from the node that kubelet registered; unless kubelet uses the full FQDN as the node name, CPI can't find the correct VM. So it's up to kubelet to provide a distinct identifier.
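The ambiguity being described can be shown with a small sketch: a short hostname matches nodes in several clusters, while an FQDN matches exactly one. The FQDNs are the ones from the DNS records shown earlier; the `matches` helper is made up for illustration and is not CPI's code:

```go
package main

import (
	"fmt"
	"strings"
)

// matches returns every candidate FQDN that a given node name could refer
// to, treating both the full FQDN and its short (unqualified) hostname as
// acceptable matches.
func matches(nodeName string, fqdns []string) []string {
	var out []string
	for _, f := range fqdns {
		short := f
		if i := strings.IndexByte(f, '.'); i >= 0 {
			short = f[:i]
		}
		if f == nodeName || short == nodeName {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fqdns := []string{
		"master0.int-ocp.my.org.name",
		"master0.tst-ocp.my.org.name",
	}
	// Registering with the short hostname is ambiguous across clusters.
	fmt.Println(len(matches("master0", fqdns))) // 2 candidates
	// Registering with the FQDN identifies exactly one VM.
	fmt.Println(len(matches("master0.tst-ocp.my.org.name", fqdns))) // 1 candidate
}
```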
Hi @lubronzhan. Thanks for the update.
I'm checking the Kubernetes docs, and it looks like it is the CPI that determines what is used for the hostname field (--cloud-provider flag):
https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
Is there any configuration applied from the CPI side to determine the hostname?
Thanks!
Hi @parogui. CPI sets the hostname field on the Node, and that hostname is fetched from the VM; it doesn't mean CPI sets the hostname of the VM, it just fetches the hostName from the VM once it has located the exact VM. You can see that here: https://github.com/kubernetes/cloud-provider-vsphere/blob/master/pkg/cloudprovider/vsphere/nodemanager.go#L257-L262.
But before it sets the hostname on the Node, it needs to locate the VM. As you can see, CPI uses the node name to locate the VM at initialization, before it has the ProviderUUID. We didn't assume the node name is the hostname, because we need to support multi-VC deployments: the hostname could be the same on different IaaS with different domain names, but the DNS name will be distinct.
status:
  addresses:
  - address: tkg-mgmt-vc-2m4tr-9gfp6
    type: Hostname
  - address: 10.180.205.188
    type: InternalIP
  - address: 10.180.205.188
    type: ExternalIP
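The flow above can be sketched in a few lines: once the exact VM is located, its guest hostname and IP are copied into the node's address list, as in the node status shown. The `guestInfo` type here is an illustrative stand-in for what CPI reads from vSphere, not the project's actual types:

```go
package main

import "fmt"

// nodeAddress mirrors the address entries in the node status above.
type nodeAddress struct {
	Type    string
	Address string
}

// guestInfo is a made-up stand-in for the guest data CPI reads from the
// located VM (hostName plus the guest IP).
type guestInfo struct {
	HostName string
	IP       string
}

// addressesFromVM copies the guest's hostName and IP into the node's
// address list, matching the shape of the status snippet above.
func addressesFromVM(g guestInfo) []nodeAddress {
	return []nodeAddress{
		{Type: "Hostname", Address: g.HostName},
		{Type: "InternalIP", Address: g.IP},
		{Type: "ExternalIP", Address: g.IP},
	}
}

func main() {
	g := guestInfo{HostName: "tkg-mgmt-vc-2m4tr-9gfp6", IP: "10.180.205.188"}
	for _, a := range addressesFromVM(g) {
		fmt.Printf("- address: %s\n  type: %s\n", a.Address, a.Type)
	}
}
```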
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What happened?
After installing the CSI driver, we noticed CPI discovering nodes outside the cluster's (OpenShift in this case) VM folder.
If the nodes are manually deleted, they are re-discovered.
The only way we were able to work around the issue was by reducing the permissions of the vSphere user so it only has access to its own cluster, which forces us to maintain a per-cluster vSphere user with permissions restricted to its own folder.
What did you expect to happen?
We'd expect CPI discovery to honor the folder configuration in the vSphere CSI driver ConfigMap, preventing it from discovering nodes that are not supposed to be present.
How can we reproduce it (as minimally and precisely as possible)?
Install the vSphere CSI driver on OpenShift 4.x in a vSphere environment where more than one cluster co-exists in different folders.
Anything else we need to know (please consider providing level 4 or above logs of CPI)?
Node discovery runs every 5 minutes and re-adds the nodes that were removed because they do not belong to the cluster.
Kubernetes version
Cloud provider or hardware configuration
OS version
Kernel (e.g. uname -a)
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)
Others