yypastushenko opened 1 year ago
BTW, running the controller manager with allocate-node-cidrs=false and cluster-cidr=172.20.64.0/18 solved my current problem. I think calico will set itself up and the controller manager will stop spamming node CIDR allocation errors with such a configuration.
But the question is still open: why can't we set up calico using the tigera operator on kubeadm clusters without a specified networking.podSubnet configuration?
I dug up some additional information. In the kubeadm configuration we specify:
networking:
  podSubnet: 172.20.64.0/18
This parameter is propagated to the kube-controller-manager flag --cluster-cidr and to the kube-proxy --cluster-cidr setting. In our installation we use the Calico network plugin, which does not rely on the nodes' allocated pod CIDR (set by the IPAM controller, which is part of kube-controller-manager).
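For reference, here is a rough sketch (illustrative, not copied from our cluster) of what kubeadm generates in the kube-controller-manager static pod manifest when networking.podSubnet is set:

# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt, illustrative)
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --cluster-cidr=172.20.64.0/18
    # ... other flags omitted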
The kube-controller-manager IPAM controller is enabled by setting the --cluster-cidr and --allocate-node-cidrs flags.
In the code I see that the kube-controller-manager flag --cluster-cidr is used only by the nodeIPAM controller.
The --cluster-cidr flag requires --allocate-node-cidrs to be true (per the docs as well).
I think in our case we don't need the kube-controller-manager's IPAM controller to populate node spec.podCIDR.
We are considering passing --allocate-node-cidrs=false and not passing any --cluster-cidr to kube-controller-manager.
We do have the ability to turn off node CIDR allocation in kube-controller-manager by simply not specifying the networking.podSubnet config in kubeadm. However, we still need to pass the cluster CIDR to kube-proxy because it does some masquerade magic with it. This is possible now by setting the clusterCIDR field in the kube-proxy configuration (kubeproxy.config.k8s.io/v1alpha1).
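For illustration, a minimal sketch of a kubeadm init configuration along these lines (the CIDR is ours; the exact layout of the documents is an assumption about how one would wire it):

# kubeadm-config.yaml (illustrative sketch)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# networking.podSubnet deliberately omitted, so kube-controller-manager
# does not get --allocate-node-cidrs=true / --cluster-cidr
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# kube-proxy still gets the pod CIDR it needs for its masquerade logic
clusterCIDR: 172.20.64.0/18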
But the tigera operator relies on networking.podSubnet, so we are a little bit stuck.
@tmjd what do you think about this case?
We have the validation that the podSubnet matches the IPPool because otherwise kube-proxy might not be using a matching CIDR, which would break what kube-proxy does and would be a significant problem in a cluster.
It sounds like kubeadm needs the capability to turn off node CIDR allocation, since it assumes that setting podSubnet means node CIDR allocation should be enabled.
How is the kube-proxy configuration used when using kubeadm? Would it be included in the kubeadm-config ConfigMap? If so, the operator could check whether the clusterCIDR field is configured for kube-proxy.
The KubeProxyConfiguration from the kubeadm configuration is not present in the kubeadm-config ConfigMap, but there is a kube-proxy ConfigMap in the kube-system namespace:
kubectl get cm -n kube-system kube-proxy -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    ...
    clusterCIDR: 172.20.64.0/18
    ...
Actually, there is a way to turn off node CIDR allocation in kubeadm while keeping networking.podSubnet set. And there is a small misunderstanding in the statement that the kube-controller-manager --cluster-cidr flag requires node CIDR allocation to be enabled. I started a discussion here: https://github.com/kubernetes/kubernetes/issues/119066.
Anyway, we have a workaround. We left the networking section in the kubeadm config:
networking:
  podSubnet: 172.20.64.0/18
And we also overrode the allocate-node-cidrs flag in the kubeadm config:
controllerManager:
  extraArgs:
    allocate-node-cidrs: "false"
This results in kube-controller-manager starting with the following flags:
--cluster-cidr=172.20.64.0/18
--allocate-node-cidrs=false
And kube-proxy with:
--cluster-cidr=172.20.64.0/18
The networking.podSubnet is also present in the kubeadm-config ConfigMap, so the tigera operator starts the calico pods. But this is a bit of overhead, because in our case the --cluster-cidr flag in kube-controller-manager is effectively dead code that is never used.
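Putting the workaround together, the relevant part of the kubeadm configuration looks roughly like this (a sketch; the apiVersion/kind wrapper and comments are ours, unrelated fields omitted):

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 172.20.64.0/18          # propagated to kube-proxy's clusterCIDR and read by the tigera operator
controllerManager:
  extraArgs:
    allocate-node-cidrs: "false"     # keeps the IPAM controller from trying (and failing) to assign node podCIDRs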
Expected Behavior
We can configure the default IPPool in the calico Installation without setting networking.podSubnet in the kubeadm configuration.
Current Behavior
When installing calico with an IPPool using the tigera operator on a kubeadm-created cluster without the networking.podSubnet param set in the kubeadm configuration, calico does not start.
Steps to Reproduce (for bugs)
Context
We maintain bare metal clusters with calico networking installed using the tigera operator. Our clusters are installed via kubeadm with podCIDR=172.20.64.0/18 and a /24 subnet mask per node for pods. Calico was installed using the tigera-operator helm chart with the default configuration. Now we have more than 64 nodes in the cluster, so the node controller can't assign a podCIDR to new nodes (172.20.64.0/18 holds 64 subnets with a /24 mask). We dug around for a while and found the following issue. We also found that this is not a problem for calico, because it ignores the nodes' assigned podCIDR. The mask for node subnets in our calico installation is /26, which is more than enough for our node workloads and cluster size.
Now we are facing node CIDR allocation errors in the kubernetes controller manager. The controller manager requires the allocate-node-cidrs=true flag if cluster-cidr is specified (per the official documentation). Setting networking.podSubnet in kubeadm enables both of those flags in the controller manager, so we can't just set the cluster-cidr configuration and disable the allocate-node-cidrs flag to avoid node podCIDR allocation.
Since calico ignores the assigned node podCIDR and a kubernetes cluster can be set up with kubeadm without networking.podSubnet, we decided to try such an installation. We started a new kubeadm cluster without specifying networking.podSubnet, installed the tigera operator, and tried to create a calico Installation (via the operator.tigera.io/v1 API) with a specified IPPool:
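An Installation with an explicit IPPool typically looks something like the following (reconstructed for illustration; the CIDR and block size match our setup, natOutgoing is an assumed default):

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - cidr: 172.20.64.0/18
      blockSize: 26
      natOutgoing: Enabled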
Calico does not start and the tigera operator pod writes the following errors in its logs:
Your Environment
kubeadm/kubernetes version: v1.23.17
tigera-operator: v3.23.1