Closed: ugur99 closed this issue 1 month ago
@ugur99 can't you use the node selector to ignore those nodes? I haven't tried it myself, but something like:
nodeSelectorTerms:
- matchExpressions:
- key: ipam-mode
operator: NotIn
values:
- legacy
the node ipam controller assumes full control of the nodes it selects
please let us know, this is a really interesting and important use case
yes, exactly, we are using the same strategy (we use a nodeSelector for each clusterCIDR resource, and there is no match for these legacy nodes); but as far as I can see, this logic does not allow that.
oh, my bad, now I see the bug clearly ... it is listing all nodes without paying attention to the selectors
/kind bug
hmm, it seems the change is more involved ...
diff --git a/pkg/controller/ipam/multi_cidr_range_allocator.go b/pkg/controller/ipam/multi_cidr_range_allocator.go
index fbbfb26..2460f11 100644
--- a/pkg/controller/ipam/multi_cidr_range_allocator.go
+++ b/pkg/controller/ipam/multi_cidr_range_allocator.go
@@ -26,7 +26,7 @@ import (
 	"sync"
 	"time"

-	"sigs.k8s.io/node-ipam-controller/pkg/apis/clustercidr/v1"
+	v1 "sigs.k8s.io/node-ipam-controller/pkg/apis/clustercidr/v1"
 	clustercidrclient "sigs.k8s.io/node-ipam-controller/pkg/client/clientset/versioned/typed/clustercidr/v1"
 	clustercidrinformers "sigs.k8s.io/node-ipam-controller/pkg/client/informers/externalversions/clustercidr/v1"
 	clustercidrlisters "sigs.k8s.io/node-ipam-controller/pkg/client/listers/clustercidr/v1"
@@ -263,24 +263,6 @@ func NewMultiCIDRRangeAllocator(
 		logger.Info("No Secondary Service CIDR provided. Skipping filtering out secondary service addresses")
 	}
-	if nodeList != nil {
-		for _, node := range nodeList.Items {
-			if len(node.Spec.PodCIDRs) == 0 {
-				logger.V(4).Info("Node has no CIDR, ignoring", "node", klog.KObj(&node))
-				continue
-			}
-			logger.Info("Node has CIDR, occupying it in CIDR map", "node", klog.KObj(&node), "podCIDRs", node.Spec.PodCIDRs)
-			if err := ra.occupyCIDRs(logger, &node); err != nil {
-				// This will happen if:
-				// 1. We find garbage in the podCIDRs field. Retrying is useless.
-				// 2. CIDR out of range: This means ClusterCIDR is not yet created.
-				//    This error will keep crashing controller-manager until the
-				//    appropriate ClusterCIDR has been created.
-				return nil, err
-			}
-		}
-	}
this solves the problem because we still try to process ...
If there is no ClusterCIDR matching a node, it will fail and retry multiple times; is that what we want? If there are nodes, as in this case, that already have PodCIDRs and we don't want to add them to any ClusterCIDR, we may decide to leave them this way; on the other hand, it could be an oversight, and the user may want to add one.
... the alternative is to decide that the node ipam controller must always cover all the Nodes, and "force" users to create ClusterCIDRs matching those legacy nodes ...
@sarveshr7 @mneverov @uablrek thoughts?
sending a PR with the first proposal to get feedback: https://github.com/kubernetes-sigs/node-ipam-controller/pull/28
Thanks for your quick support @aojea @sarveshr7 @mneverov!
Hi,
The current podCIDR discovery logic attempts to discover all podCIDRs in use in the cluster and then tries to match each of them to at least one of the existing clusterCIDR resources.
However, in our cluster, some production nodes use unwanted podCIDRs that do not match any of the IP pools defined in the clusterCIDRs. As a result, the controller fails to run because it cannot find a matching clusterCIDR resource for them. While we plan to re-provision these nodes, doing so quickly at large scale is not feasible.
I am wondering if we can find a workaround for this issue. Specifically, can we configure the controller to discover only podCIDRs that lie within the IP pools specified in the clusterCIDRs? This way, the controller would ignore the unwanted podCIDRs and continue to manage the others.