kubernetes-sigs / node-ipam-controller

Out of tree implementation of https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2593-multiple-cluster-cidrs
Apache License 2.0

Issue with Dynamic Allocation of ClusterCIDR for New Nodes Without Component Restart #17

Open EanWo opened 1 week ago

EanWo commented 1 week ago

I encountered an issue as follows:

I started the component, then:

1. Created a new ClusterCIDR object with a NodeSelector.
2. Added a new node with a label that matches the NodeSelector of the new ClusterCIDR.
3. Observed that the new node was not assigned a PodCIDR from the new ClusterCIDR, but instead used the default ClusterCIDR.

However, if I follow the same steps to create a new ClusterCIDR object and then restart the component, subsequent new nodes correctly use the new ClusterCIDR.
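Roughly, the ClusterCIDR I created looked like the following (the `apiVersion` here is the alpha API from KEP-2593, the out-of-tree controller's CRD may use a different group; the CIDR values and host bits are placeholders, not my real configuration):

```yaml
apiVersion: networking.k8s.io/v1alpha1   # out-of-tree CRD group may differ
kind: ClusterCIDR
metadata:
  name: clustercidr-name-ean
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: name
            operator: In
            values: ["ean"]
  perNodeHostBits: 8          # placeholder
  ipv4: 10.100.0.0/16         # placeholder range
```

The new node then joins the cluster carrying the matching label (`name=ean`).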

The official documentation states that it is not necessary to restart the component. Additionally, I checked the logs and found that the component is aware of the new ClusterCIDR object even without restarting.

To re-add a node, I removed it with `kubectl delete node`, then logged into the node and restarted kubelet.

My issue can be summarized as:

New nodes do not use the newly created ClusterCIDR unless the component is restarted, even though the component does recognize the new ClusterCIDR without a restart (as seen in the logs). Could you please help me understand why the new ClusterCIDR is not applied to new nodes without restarting the component, and how to resolve this?

Thank you for your assistance!

aojea commented 1 week ago

@EanWo can you please share the logs of the component?

Can you check that you have disabled the default IPAM controller in the kube-controller-manager and/or the cloud-controller-manager?

EanWo commented 1 week ago

> @EanWo can you please share the logs of the component?
>
> Can you check that you have disabled the default IPAM controller in the kube-controller-manager and/or the cloud-controller-manager?

Yes, I am certain. I have set the `allocate-node-cidrs` parameter to `false` in the `kube-controller-manager` configuration on all control-plane nodes.
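For context, this amounts to the following excerpt from the kube-controller-manager static pod manifest (a sketch, the surrounding fields vary by distribution; only the flag itself matters):

```yaml
spec:
  containers:
    - name: kube-controller-manager
      command:
        - kube-controller-manager
        # hand node CIDR allocation over to node-ipam-controller
        - --allocate-node-cidrs=false
```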

EanWo commented 1 week ago

@aojea Here is the exact sequence:

1. I created a new ClusterCIDR object called `clustercidr-name-ean`, whose NodeSelector matches the label `name=ean`; the logs recorded this.
2. I added a node with the label `name=ean`, but the PodCIDR it was assigned belonged to the default ClusterCIDR.
3. I removed that node with `kubectl delete node`, re-added it to the cluster by restarting it with `systemctl restart kubelet`, and then restarted the controller.
4. After that, the node was assigned a PodCIDR from the correct `clustercidr-name-ean` ClusterCIDR.

mneverov commented 1 week ago

@EanWo the original issue was for the old repo. Could you please confirm that you are using the latest v0.2.0 node-ipam-controller?

EanWo commented 1 week ago

> @EanWo the original issue was for the old repo. Could you please confirm that you are using the latest v0.2.0 node-ipam-controller?

@mneverov I am using the latest code from the main branch. Additionally, I have also tried the code from tag v0.2.0, and the results were the same.

EanWo commented 1 week ago

@mneverov @aojea I tried modifying the code to print some extra logs and discovered the following:

1. I added a ClusterCIDR object named `name-ean`.
2. I printed the contents of the cidrMap: it contained only the default ClusterCIDR, not the newly added `name-ean` ClusterCIDR.
3. I restarted the controller, and the cidrMap then contained the newly added `name-ean` ClusterCIDR.

mneverov commented 1 week ago

@EanWo thank you for the info! Do you use any CNI by chance? Can you attach the new node's YAML from before you restart the manager? Which k8s distribution do you use (k3s, kind, minikube, k3d, ...)?

EanWo commented 1 week ago

@mneverov I am using a cluster provided by a cloud vendor. I tested three versions of the Kubernetes cluster, with cluster versions and corresponding CNI plugins being v1.21 with Flannel, v1.23 with Calico, and v1.25 with Flannel. The test results were the same for all versions. I can add and remove nodes to and from the cluster at will.

EanWo commented 1 week ago

I have resolved the issue. The cause of the problem was that the YAML file for the ClusterCIDR I created already specified the controller's finalizer in `metadata.finalizers`.
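For anyone hitting the same symptom, the problematic manifest looked roughly like this (a sketch: the finalizer name is assumed from the in-tree allocator's constant, and the spec values are placeholders):

```yaml
apiVersion: networking.k8s.io/v1alpha1   # out-of-tree CRD group may differ
kind: ClusterCIDR
metadata:
  name: clustercidr-name-ean
  finalizers:
    # Pre-setting the controller's own finalizer here made the controller
    # skip building the in-memory map entry for this object.
    - networking.k8s.io/cluster-cidr-finalizer   # assumed finalizer name
spec:
  perNodeHostBits: 8          # placeholder
  ipv4: 10.100.0.0/16         # placeholder range
```

Removing the finalizer from the manifest and letting the controller add it itself fixed the behavior.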

aojea commented 1 week ago

/reopen

@sarveshr7 made a good analysis of this problem, and I still think we should handle this scenario better. Quoting from the chat conversation for reference:

> We should modify the code here: https://github.com/kubernetes-sigs/node-ipam-controller/blob/d843244e4ae2ea7e4a877[…]f6751f461daf0/pkg/controller/ipam/multi_cidr_range_allocator.go, make the createClusterCIDR function idempotent and create the in-memory map irrespective of whether the finalizer is present or not.
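The idea can be sketched as follows. This is a deliberately simplified model, not the real allocator: the type names, the map layout, and the finalizer constant are stand-ins; the point is only that the in-memory map is populated unconditionally and idempotently, before the finalizer check, so an object created with the finalizer pre-set is no longer skipped:

```go
package main

import "fmt"

// Simplified stand-in for the API object; the real one is the ClusterCIDR CRD.
type ClusterCIDR struct {
	Name       string
	Finalizers []string
}

// Assumed finalizer name, modeled on the in-tree allocator's constant.
const clusterCIDRFinalizer = "networking.k8s.io/cluster-cidr-finalizer"

// Simplified stand-in for the allocator's in-memory state (cidrMap).
type allocator struct {
	cidrMap map[string]*ClusterCIDR
}

// syncClusterCIDR sketches the proposed fix: build the in-memory map entry
// unconditionally, then add the finalizer only if it is missing. Re-syncing
// the same object is a no-op, so the function is idempotent.
func (a *allocator) syncClusterCIDR(c *ClusterCIDR) {
	if _, ok := a.cidrMap[c.Name]; !ok {
		a.cidrMap[c.Name] = c
	}
	for _, f := range c.Finalizers {
		if f == clusterCIDRFinalizer {
			return // finalizer already present; nothing more to do
		}
	}
	c.Finalizers = append(c.Finalizers, clusterCIDRFinalizer)
}

func main() {
	a := &allocator{cidrMap: map[string]*ClusterCIDR{}}

	// Object created with the finalizer already set (the reporter's case).
	c := &ClusterCIDR{
		Name:       "clustercidr-name-ean",
		Finalizers: []string{clusterCIDRFinalizer},
	}
	a.syncClusterCIDR(c)
	a.syncClusterCIDR(c) // second sync is a no-op

	fmt.Println(len(a.cidrMap), len(c.Finalizers)) // prints "1 1"
}
```

With the current code path, the map update and the finalizer addition are coupled, so a pre-set finalizer short-circuits both; decoupling them as above keeps the map correct regardless of how the object was created.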

/assign @sarveshr7

k8s-ci-robot commented 1 week ago

@aojea: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/node-ipam-controller/issues/17#issuecomment-2184008777):

> /reopen
>
> /assign @sarveshr7

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.