amsuggs37 opened 2 months ago
@amsuggs37 are you actually using `cloud-provider-azure` when deploying the nodes, or are you initially provisioning the cluster with the RKE2 embedded cloud provider and then attempting to switch later?
The latter will not work, as the node `providerID` and `instance-type` are set by the cloud provider that is in use when the node joins, and cannot be changed later. I suspect this is why you still see no nodes in the pool even after changing the `excludeMasterFromStandardLB` setting - the nodes do not have the correct fields to be managed by the azure cloud provider.
Check the output of `kubectl get node -o yaml` and confirm that the `node.kubernetes.io/instance-type` label and `providerID` are set correctly on all your nodes.
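For example, a one-liner that surfaces both fields for every node at once (a sketch; the column names are arbitrary, and the dots in the label key must be escaped for custom-columns):

```shell
# Print each node's providerID and instance-type label in one table.
kubectl get nodes -o custom-columns='NAME:.metadata.name,PROVIDER_ID:.spec.providerID,INSTANCE_TYPE:.metadata.labels.node\.kubernetes\.io/instance-type'
```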
Hi @brandond, thanks for commenting.
I am fairly certain I am using `cloud-provider-azure` when deploying the nodes. I followed the documentation here and set the appropriate flags in the `/etc/rancher/rke2/config.yaml` file per that documentation. That config is in the initial issue description, but I do set `cloud-provider=external` in the kubelet and kube-controller-manager and also `disable-cloud-controller: true`.
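For reference, a minimal sketch of the relevant lines in that config, per the RKE2 external cloud provider docs (the actual file in the issue description may carry additional settings):

```shell
# Sketch of the RKE2 server config for an external cloud provider;
# other server flags omitted.
cat <<'EOF' > /etc/rancher/rke2/config.yaml
disable-cloud-controller: true
kubelet-arg:
  - cloud-provider=external
kube-controller-manager-arg:
  - cloud-provider=external
EOF
```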
My `cloud-provider-config.json` is also in the initial description.
From what I understand, the helm deployment of `cloud-provider-azure` should have all the correct default values.
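Since the actual config file only appears in the issue description, here is a hedged sketch of the fields relevant to this problem (all values are placeholders; auth and resource-group fields omitted):

```shell
# Hypothetical excerpt of the cloud provider config.
# excludeMasterFromStandardLB must be false for control-plane nodes
# to land in the standard LB backend pool.
cat <<'EOF' > cloud-provider-config.json
{
  "cloud": "AzurePublicCloud",
  "vmType": "vmss",
  "loadBalancerSku": "standard",
  "excludeMasterFromStandardLB": false,
  "useInstanceMetadata": true
}
EOF
```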
Anyway, I checked my nodes as you suggested...
The `node.kubernetes.io/instance-type` label is set to `Standard_DS2_v2` for all three of my nodes. The `providerID` is set to `azure:///subscriptions/<my_subscription>/resourceGroups/<my_resourcegroup>/providers/Microsoft.Compute/virtualMachineScaleSets/<my_vmss>/virtualMachines/X`, where "X" is the number 0-2 depending on which of the three nodes you are looking at.
I cannot find any documentation on what determines these values as valid, but I can continue to dig. Do you know off the top of your head what makes them valid? Also, if what you describe were the issue, I would expect to be unable to add worker nodes to the LoadBalancer either. Yet if I create a cluster with a worker nodepool, those nodes are successfully added to the LoadBalancer BackendPool when I create LoadBalancer type services.
Let me know your thoughts, and thanks again for the ideas!
That all sounds correct then!
What happened:
When using the out-of-tree cloud provider azure as documented, I have been unable to get the cloud-controller-manager to add master nodes to the backend pool of a standard load balancer. I followed the notes about `excludeMasterFromStandardLB` by setting it to "false" and removed the `node-role.kubernetes.io/master` label from all the master nodes. Even then, the master nodes are not added to the load balancer backend pool.
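For reference, one way to do that label removal (the exact command used isn't shown in the issue):

```shell
# Remove the legacy master role label from every node that carries it;
# the trailing dash deletes the label.
kubectl label nodes -l node-role.kubernetes.io/master node-role.kubernetes.io/master-
```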
What you expected to happen:
Configuring the cloud provider to not `excludeMasterFromStandardLB` and also removing the `node-role.kubernetes.io/master` label from all nodes should allow the cloud-controller-manager to add the master nodes to the standard load balancer backend pool.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
The same cloud provider configuration works as expected when the cluster has 1 or more worker nodes (not control-plane/etcd). Nodes with no node-role label are successfully added to the load balancer backend pool.
The following logs are present in the cloud-controller-manager deployment for a LoadBalancer type service called "http-echo". They indicate the load balancer being deleted from the VMSS for seemingly no reason.
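For context, a hypothetical reconstruction of the "http-echo" Service (only the name comes from the issue; the selector and ports are assumptions, and only the Service type matters for the load balancer reconcile):

```shell
# Hypothetical sketch of the Service that triggers the LB reconcile.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: http-echo
spec:
  type: LoadBalancer
  selector:
    app: http-echo
  ports:
    - port: 80
      targetPort: 5678
EOF
```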
Environment:
- Kubernetes version (use `kubectl version`): 1.24.12 (rke2 version: v1.24.12+rke2r1)
- K8s bootstrap args:
- OS (e.g. from `cat /etc/os-release`): rhel 8.9
- Kernel (e.g. `uname -a`): 4.18.0-513.18.1.el8_9.x86_64