kubernetes-sigs / cloud-provider-huaweicloud

HUAWEI CLOUD Controller Manager is an external cloud controller manager for running Kubernetes in a HUAWEI CLOUD cluster.
Apache License 2.0

Problems with INTERNAL-IP assignment in Kubernetes 1.29+ #247

Open · diasbro opened this issue 3 months ago

diasbro commented 3 months ago

What happened: We are experiencing issues deploying Kubernetes clusters version 1.29 and above on Huawei Cloud. If the --node-ip flag is not passed to the kubelet service, the nodes' INTERNAL-IP addresses are shown as <none>:

root@k8s-master-001:# kubectl get nodes -o wide
NAME             STATUS     ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
k8s-master-001   NotReady   <none>   25s   v1.29.7   <none>        <none>        Ubuntu 22.04.4 LTS   5.15.0-117-generic   containerd://1.7.20
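The INTERNAL-IP column is taken from the entry of type InternalIP in each Node's status.addresses, so affected nodes can also be found programmatically instead of reading the wide table. A minimal Python sketch, assuming the JSON printed by `kubectl get nodes -o json` is passed in as a string; the function name is illustrative and not part of any existing tool.

```python
import json


def nodes_missing_internal_ip(node_list_json: str) -> list:
    """Return the names of nodes whose status.addresses has no InternalIP entry.

    Expects the JSON document produced by `kubectl get nodes -o json`
    (a NodeList with an "items" array).
    """
    data = json.loads(node_list_json)
    missing = []
    for node in data.get("items", []):
        name = node["metadata"]["name"]
        # status.addresses can be absent entirely while no component
        # (kubelet or cloud-controller-manager) has populated it yet.
        addresses = node.get("status", {}).get("addresses") or []
        if not any(addr.get("type") == "InternalIP" for addr in addresses):
            missing.append(name)
    return missing
```

For example, after `kubectl get nodes -o json > nodes.json`, calling `nodes_missing_internal_ip(open("nodes.json").read())` would list every node that is in the state shown above.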

We believe this is caused by changes in Kubernetes 1.29 (https://github.com/kubernetes/kubernetes/pull/121028). Specifically, if the kubelet is started with the --cloud-provider=external flag and --node-ip is not specified, the external cloud-controller-manager is expected to set the node's IP addresses. A comment on that PR suggests two deployment strategies to avoid the resulting problems: deploy the external cloud-controller-manager as a static pod, or pass the --node-ip flag to the kubelet: https://github.com/kubernetes/kubernetes/pull/121028#issuecomment-2256834163
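The behavior described above can be summarized as a small decision function. This is a simplified model written for this report, based only on the description of PR 121028 given here; the function name and the `minor` parameter are hypothetical, and it is not kubelet code.

```python
from typing import Optional


def kubelet_sets_node_ip(cloud_provider: str, node_ip: Optional[str], minor: int) -> bool:
    """Hypothetical model: does the kubelet itself populate the node's addresses?

    Returns False when the addresses are left for the external
    cloud-controller-manager to set. `minor` is the Kubernetes minor version.
    """
    if cloud_provider != "external":
        # No external CCM involved: the kubelet handles addresses itself.
        return True
    if node_ip:
        # An explicit --node-ip is still used by the kubelet (the workaround).
        return True
    # With an external provider and no --node-ip: before 1.29 the setup in this
    # issue still got node IPs; from 1.29 on the CCM is expected to set them.
    return minor < 29
```

Under this model, the failing case in this issue is `kubelet_sets_node_ip("external", None, 29)`, which returns False: unless the CCM runs successfully, nothing fills in INTERNAL-IP.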

We tried the following steps:

  1. Initialized the cluster using kubeadm, passing --cloud-provider=external to all control-plane components and --node-ip=<node_address> to the kubelet on the master nodes.
  2. Deployed a CNI (otherwise the CCM fails while trying to get the extension-apiserver-authentication ConfigMap).
  3. Deployed huaweicloud-controller-manager version v0.26.8 according to the documentation.

The logs show:

I0807 07:22:11.239701       1 leaderelection.go:253] failed to acquire lease kube-system/cloud-controller-manager
I0807 07:22:15.289291       1 request.go:1370] body was not decodable (unable to check for Status): provided data does not appear to be a protobuf message, expected prefix [107 56 115 0]
E0807 07:22:15.289309       1 leaderelection.go:330] error retrieving resource lock kube-system/cloud-controller-manager: the server rejected our request for an unknown reason (get leases.coordination.k8s.io cloud-controller-manager)
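The request.go line means the client tried to decode the response body as protobuf, but the body did not start with the four-byte magic prefix of the Kubernetes protobuf encoding; the byte list in the log is that prefix. A minimal Python sketch showing what the bytes decode to and how a body could be checked for them; the helper name is hypothetical. This only explains the log message, not why the server rejected the lease request.

```python
# The Kubernetes protobuf encoding starts with a fixed 4-byte magic number.
K8S_PROTOBUF_MAGIC = bytes([107, 56, 115, 0])  # the byte list printed in the log


def looks_like_k8s_protobuf(body: bytes) -> bool:
    """Return True if a response body starts with the Kubernetes protobuf prefix."""
    return body.startswith(K8S_PROTOBUF_MAGIC)


print(K8S_PROTOBUF_MAGIC)  # b'k8s\x00'
```

A JSON or plain-text body, for example b'{"kind":"Status"}', fails this check, which is consistent with the message above; the actual body the server returned is not visible in these logs.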

Question: Is huaweicloud-controller-manager version v0.26.8 incompatible with Kubernetes versions 1.29 and above, or are we missing something in our setup?

What you expected to happen: huaweicloud-controller-manager runs without errors and the INTERNAL-IP addresses of the worker nodes are populated.

How to reproduce it (as minimally and precisely as possible):

  1. Initialize a Kubernetes cluster version 1.29+ using kubeadm (we have tried versions 1.30.3 and 1.29.7)
  2. Deploy a CNI plugin (we used Cilium 1.16)
  3. Deploy huaweicloud-controller-manager version v0.26.8

Anything else we need to know?: The same setup works fine with Kubernetes versions below 1.29 (no problems with node IPs or with huaweicloud-controller-manager).
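Since the problem appears only from 1.29 on, deployment automation could flag affected nodes by the kubelet version shown in the VERSION column of `kubectl get nodes`. A minimal sketch, assuming version strings like `v1.29.7`; the function name is hypothetical and the 1.29 threshold comes solely from the observations in this issue.

```python
import re


def affected_by_node_ip_change(kubelet_version: str) -> bool:
    """Return True for kubelet versions 1.29 and later (e.g. 'v1.29.7', '1.30.3').

    Hypothetical helper; threshold based on this issue: 1.29.7 and 1.30.3
    show the problem, versions below 1.29 do not.
    """
    match = re.match(r"v?(\d+)\.(\d+)", kubelet_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {kubelet_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Compare (major, minor) tuples so that, e.g., 1.30 sorts after 1.29.
    return (major, minor) >= (1, 29)
```

A check like this could, for instance, warn when an affected node is started with --cloud-provider=external but without --node-ip.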

Environment:

chengxiangdong commented 2 months ago

The architecture that the current CCM relies on is not compatible with 1.29 clusters. The CCM needs to be upgraded to support version 1.29 and above.