yypastushenko opened 1 year ago
BTW, running the controller manager with allocate-node-cidrs=false and cluster-cidr=172.20.64.0/18 solved my current problem. I think calico will set itself up and the controller manager will stop spamming node CIDR allocation errors with such a configuration.
But the question is still open: why can't we set up calico using the tigera operator on kubeadm clusters without a specified networking.podSubnet configuration?
I dug up some additional information. In the kubeadm configuration we specify:
networking:
  podSubnet: 172.20.64.0/18
This parameter is propagated to the kube-controller-manager flag --cluster-cidr and to the kube-proxy --cluster-cidr setting. In our installation we use the Calico network plugin, which does not rely on the nodes' allocated pod CIDR (set by the IPAM controller, which is part of kube-controller-manager).
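For reference, here is a rough sketch (illustrative, not copied from our cluster) of what kubeadm generates in the kube-controller-manager static pod manifest when networking.podSubnet is set:

# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt, illustrative)
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --cluster-cidr=172.20.64.0/18
    # ... other flags omitted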
The kube-controller-manager IPAM controller is enabled by setting the --cluster-cidr and --allocate-node-cidrs flags.
In the code I see that the kube-controller-manager flag --cluster-cidr is used only by the nodeIPAM controller.
The --cluster-cidr flag requires --allocate-node-cidrs to be true (per the docs as well).
I think in our case we don't need the kube-controller-manager's IPAM controller to populate node spec.podCIDR.
We are considering passing --allocate-node-cidrs=false and not passing any --cluster-cidr to kube-controller-manager.
We do have the ability to turn off node CIDR allocation in kube-controller-manager by simply not specifying the networking.podSubnet config in kubeadm. However, we still need to pass the cluster CIDR to kube-proxy because it does some masquerade magic with it. This is possible now by setting the clusterCIDR field in the kube-proxy configuration (kubeproxy.config.k8s.io/v1alpha1).
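For illustration, a minimal sketch of a kubeadm init configuration along these lines (the CIDR is ours; the exact layout of the documents is an assumption about how one would wire it):

# kubeadm-config.yaml (illustrative sketch)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# networking.podSubnet deliberately omitted, so kube-controller-manager
# does not get --allocate-node-cidrs=true / --cluster-cidr
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# kube-proxy still gets the pod CIDR it needs for its masquerade logic
clusterCIDR: 172.20.64.0/18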
But the tigera operator relies on networking.podSubnet, so we are a little bit stuck.
@tmjd what do you think about this case?
We have the validation that the podSubnet matches the IPPool because otherwise kube-proxy might not be using a matching CIDR, which would break what kube-proxy does and would be a significant problem in a cluster.
It sounds like kubeadm needs the capability to turn off node CIDR allocation, since it assumes that setting podSubnet means node CIDR allocation should be enabled.
How is the kube-proxy configuration used when using kubeadm? Would it be included in the kubeadm-config ConfigMap? If so, the operator could check whether the clusterCIDR field is configured for kube-proxy.
The KubeProxyConfiguration from the kubeadm configuration is not present in the kubeadm-config ConfigMap, but there is a kube-proxy ConfigMap in the kube-system namespace:
kubectl get cm -n kube-system kube-proxy -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    ...
    clusterCIDR: 172.20.64.0/18
    ...
Actually, there is a way to turn off node CIDR allocation in kubeadm while keeping networking.podSubnet set. And there is a small misunderstanding in the statement that the kube-controller-manager --cluster-cidr flag requires node CIDR allocation to be enabled. I started a discussion here: https://github.com/kubernetes/kubernetes/issues/119066.
Anyway, we have a workaround. We left the networking section in the kubeadm config:
networking:
  podSubnet: 172.20.64.0/18
And we also overrode the allocate-node-cidrs flag in the kubeadm config:
controllerManager:
  extraArgs:
    allocate-node-cidrs: "false"
This results in kube-controller-manager starting with the following flags:
--cluster-cidr=172.20.64.0/18
--allocate-node-cidrs=false
And kube-proxy with:
--cluster-cidr=172.20.64.0/18
The networking.podSubnet is also present in the kubeadm-config ConfigMap, so the tigera operator starts the calico pods. But this is a bit of overhead, because in our case the --cluster-cidr flag in kube-controller-manager is effectively dead code that is never used.
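Putting the workaround together, the relevant part of the kubeadm configuration looks roughly like this (a sketch; the apiVersion/kind wrapper and comments are ours, unrelated fields omitted):

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 172.20.64.0/18          # propagated to kube-proxy's clusterCIDR and read by the tigera operator
controllerManager:
  extraArgs:
    allocate-node-cidrs: "false"     # keeps the IPAM controller from trying (and failing) to assign node podCIDRs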
Expected Behavior
We can configure the default IPPool in the calico Installation without setting networking.podSubnet in the kubeadm configuration.
Current Behavior
When installing calico with an IPPool using the tigera operator on a kubeadm-created cluster without the networking.podSubnet param set in the kubeadm configuration, calico does not start.
Steps to Reproduce (for bugs)
Context
We maintain bare metal clusters with calico networking installed using the tigera operator. Our clusters are installed via kubeadm with podCIDR=172.20.64.0/18 and a /24 subnet mask per node for pods. Calico was installed using the tigera-operator helm chart with the default configuration. Now we have more than 64 nodes in the cluster, so the node controller can't assign a podCIDR to new nodes (172.20.64.0/18 holds 64 subnets with a /24 mask). We dug around for a while and found the following issue. We also found that this is not a problem for calico, because it ignores the nodes' assigned podCIDR. The mask for node subnets in our calico installation is /26, which is more than enough for our node workloads and cluster size.
Now we are facing node CIDR allocation errors in the kubernetes controller manager. The controller manager requires the allocate-node-cidrs=true flag if cluster-cidr is specified (per the official documentation). Setting networking.podSubnet in kubeadm enables both of those flags in the controller manager, so we can't just set the cluster-cidr configuration and disable the allocate-node-cidrs flag to avoid node podCIDR allocation.
Since calico ignores the assigned node podCIDR and a kubernetes cluster can be set up with kubeadm without networking.podSubnet, we decided to try such an installation. We started a new kubeadm cluster without specifying networking.podSubnet, installed the tigera operator, and tried to create a calico Installation (via the operator.tigera.io/v1 API) with a specified IPPool:
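An Installation with an explicit IPPool typically looks something like the following (reconstructed for illustration; the CIDR and block size match our setup, natOutgoing is an assumed default):

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - cidr: 172.20.64.0/18
      blockSize: 26
      natOutgoing: Enabled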
Calico does not start and the tigera operator pod writes the following errors in its logs:
Your Environment
kubeadm/kubernetes version: v1.23.17
tigera-operator: v3.23.1