kubernetes-sigs / node-ipam-controller

Out of tree implementation of https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2593-multiple-cluster-cidrs

Discovering podCIDR logic for existing clusters #27

Closed ugur99 closed 1 month ago

ugur99 commented 1 month ago

Hi,

The current podCIDR discovery logic attempts to discover all podCIDRs in use in the cluster and then tries to match each of them to at least one of the existing clusterCIDR resources.

However, in our cluster we have some production nodes that use unwanted podCIDRs which do not fall within any of the IP pools defined in the clusterCIDRs. As a result, the controller fails to run because it cannot find any matching clusterCIDR resource. While we plan to re-provision these nodes, it is not feasible to do so quickly at a large scale.

I am wondering if we can find a workaround for this issue. Specifically, can we configure the controller to only discover podCIDRs that lie within the IP pools specified in the clusterCIDRs? This way, the controller would ignore the unwanted podCIDRs and continue to manage the others.

 "failed to create Node IPAM controller" err="could not occupy cidrs: [10.231.249.128/25], No matching ClusterCIDRs found"
aojea commented 1 month ago

@ugur99 can't you use the node selector to ignore those nodes? I didn't try it myself, but something like:

       nodeSelectorTerms:
        - matchExpressions:
          - key: ipam-mode
            operator: NotIn
            values:
            - legacy

the node ipam controller assumes full control of the nodes it selects:

https://github.com/kubernetes-sigs/node-ipam-controller/blob/df35f35a3a70cca464add8cdb629a39cf66b7b05/pkg/controller/ipam/multi_cidr_range_allocator.go#L272-L282
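
For illustration, the NotIn expression above is roughly equivalent to the following label-selector check (a sketch using k8s.io/apimachinery; the node names and labels are made up, and the actual ClusterCIDR spec takes a v1.NodeSelector rather than a plain label selector):

    package main

    import (
        "fmt"

        "k8s.io/apimachinery/pkg/labels"
        "k8s.io/apimachinery/pkg/selection"
    )

    func main() {
        // Roughly the same semantics as the matchExpressions above: select nodes
        // whose ipam-mode label is not "legacy" (or is unset).
        req, err := labels.NewRequirement("ipam-mode", selection.NotIn, []string{"legacy"})
        if err != nil {
            panic(err)
        }
        sel := labels.NewSelector().Add(*req)

        // Hypothetical node labels, purely for illustration.
        nodes := map[string]map[string]string{
            "worker-1":      {"ipam-mode": "managed"},
            "legacy-node-1": {"ipam-mode": "legacy"},
        }
        for name, nodeLabels := range nodes {
            fmt.Printf("%s selected: %v\n", name, sel.Matches(labels.Set(nodeLabels)))
        }
    }

Note that a node without the ipam-mode label is still selected by NotIn, so only the legacy nodes would need to carry the label explicitly.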

please let us know, this is a really interesting and important use case

ugur99 commented 1 month ago

yes, exactly, we are using the same strategy (we use a nodeSelector for each clusterCIDR resource, and there is no match for these legacy nodes); but as far as I can see, this logic does not allow that.

aojea commented 1 month ago

oh, my bad, now I see the bug clearly ... it is listing all nodes without paying attention to the selectors

/kind bug

aojea commented 1 month ago

hmm, it seems the change is more involved ...

  1. I don't think we need to preprocess all the Nodes before the controller starts; once the worker starts it will receive all the nodes in the cache via an Add event. Preprocessing the clusterCIDRs does make sense, because it avoids the race of a node being processed before the corresponding ClusterCIDR ...
    
    diff --git a/pkg/controller/ipam/multi_cidr_range_allocator.go b/pkg/controller/ipam/multi_cidr_range_allocator.go
    index fbbfb26..2460f11 100644
    --- a/pkg/controller/ipam/multi_cidr_range_allocator.go
    +++ b/pkg/controller/ipam/multi_cidr_range_allocator.go
    @@ -26,7 +26,7 @@ import (
        "sync"
        "time"

this solves the problem, because we currently still try to process ...

https://github.com/kubernetes-sigs/node-ipam-controller/blob/df35f35a3a70cca464add8cdb629a39cf66b7b05/pkg/controller/ipam/multi_cidr_range_allocator.go#L613-L635

  2. If there is no ClusterCIDR matching a node, the controller will fail and retry multiple times. Is this what we want? If there are nodes, like in this case, that already have PodCIDRs but that we don't want to add to a ClusterCIDR, we may decide to leave it this way; on the other hand, it could be an oversight and the user may actually want to add them (a short sketch of the two options follows this list).

    ... the alternative is to decide that the node ipam controller must always cover all the Nodes, and "force" users to create ClusterCIDRs matching those legacy nodes ...
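
To make the two options concrete, a rough sketch of that decision point (not the controller's code; "covered" stands in for the result of the real ClusterCIDR matching, and "strict" switches between the current error-and-retry behaviour and a skip-and-log behaviour):

    package main

    import "fmt"

    // syncNode sketches the choice discussed above. covered stands for the result
    // of the ClusterCIDR matching (node selector plus pool containment), which is
    // not repeated here. With strict=true an unmatched node returns an error, so
    // the workqueue keeps retrying; with strict=false it is logged and skipped.
    func syncNode(name string, covered bool, strict bool) error {
        if covered {
            // ... occupy the node's PodCIDRs in the matching ClusterCIDR ...
            return nil
        }
        if strict {
            return fmt.Errorf("node %s: no matching ClusterCIDRs found", name)
        }
        fmt.Printf("node %s: not selected by any ClusterCIDR, leaving its PodCIDRs alone\n", name)
        return nil
    }

    func main() {
        // A legacy node whose PodCIDR lies outside every pool: covered=false.
        fmt.Println(syncNode("legacy-node-1", false, true))  // error, so it would be retried
        fmt.Println(syncNode("legacy-node-1", false, false)) // skipped, no error
    }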

@sarveshr7 @mneverov @uablrek thoughts?

Sending a PR with the first proposal to get feedback: https://github.com/kubernetes-sigs/node-ipam-controller/pull/28

ugur99 commented 1 month ago

Thanks for your quick support @aojea @sarveshr7 @mneverov!