k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io

Switch CNI in a deployed cluster #3707

Closed ferama closed 6 months ago

ferama commented 10 months ago

Is your feature request related to a problem? Please describe.

In our lab we run a k0s cluster deployed with k0sctl and configured with kube-router as the network provider. We have already added worker nodes to the existing cluster in the past without any issue.

Now we need to add a new worker node that does not reside in the same subnet as the others. The existing nodes are in 192.168.5.0/24, and the new one is in 192.168.6.0/24. At the network layer, the two subnets are routed correctly.

The node is added and registers correctly with the control plane, but there are network issues: pods on the new node cannot communicate with pods scheduled on the old nodes, fetching logs from pods on the new node fails, and so on.

We noticed that a new cluster with workers in both subnets, but using the Calico network provider, works without any issue.

Here comes the question: the docs at https://docs.k0sproject.io/v1.28.3+k0s.0/networking/ state that changing the network provider requires a full cluster redeployment. Is there a way to change the network provider manually? The hard way is fine if needed.

Is there anything else we can do to make the worker nodes in the two subnets communicate as expected?

Describe the solution you would like

What we would like (if doable) is to switch the network provider on a deployed cluster.

Describe alternatives you've considered

No response

Additional context

No response

github-actions[bot] commented 9 months ago

The issue is marked as stale since no activity has been recorded in 30 days

jnummelin commented 8 months ago

@ferama sorry it took a while to get to your issue.

There's a similar-sounding issue in the kube-router repo with some debugging pointers: https://github.com/cloudnativelabs/kube-router/issues/1006

In this case kube-router should automatically detect that nodes are in different subnets and enable IPIP tunnels between nodes in different subnets: https://github.com/cloudnativelabs/kube-router/blob/master/docs/tunnels.md#scenarios-for-tunnelling

Could there be something in the network blocking those IPIP tunnels?
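To make the "different subnets" condition concrete, here is a minimal sketch of the comparison, assuming a /24 prefix as in the report. It is not kube-router's actual code, and the host addresses are hypothetical examples picked from the two subnets mentioned above.

```shell
#!/bin/sh
# Illustrative only: NOT kube-router's implementation.
# Host addresses below are hypothetical examples from the reported subnets.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print "yes" if both addresses are in the same subnet for the given
# prefix length, otherwise "no".
same_subnet() {
  mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
  net_a=$(( $(ip_to_int "$1") & $mask ))
  net_b=$(( $(ip_to_int "$2") & $mask ))
  if [ "$net_a" -eq "$net_b" ]; then echo yes; else echo no; fi
}

same_subnet 192.168.5.10 192.168.5.20 24   # prints "yes": same subnet
same_subnet 192.168.5.10 192.168.6.10 24   # prints "no": different subnets
```

In the second case the nodes are in different subnets, which is the situation where the linked tunnelling doc says IPIP is used. IPIP is IP protocol number 4, so any firewall or router ACL between the two subnets has to let that protocol through, not just TCP and UDP.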

juanluisvaladas commented 8 months ago

Hi @ferama, besides what @jnummelin commented, the restriction on changing the CNI provider is imposed by k0s to prevent damage. You can bypass this check at your own risk by following these steps:

  1. Stop every worker
  2. Restart k0s on every controller with spec.network.provider: calico
  3. Remove the kube-router manifests on every controller: rm -rf /var/lib/k0s/manifests/kuberouter/
  4. Delete the kube-router daemonset: k0s kc delete ds -n kube-system kube-router
  5. Restart k0s on every controller again and make sure kube-router is not redeployed
  6. On every worker, remove /etc/cni/net.d/*
  7. Restart every worker

Keep in mind that this procedure isn't supported or properly tested, and it could go wrong at any point.
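The steps above could be sketched roughly as the script below. Several things in it are assumptions rather than part of the original instructions: that k0s was installed as a service so that `k0s stop` and `k0s start` work on each node, that a restart is done as stop followed by start, and that spec.network.provider was already changed to calico in the k0s config by hand before the controller part runs. The function names and the `DRY_RUN` switch are made up for this example. With `DRY_RUN=1` (the default) it only prints what it would run.

```shell
#!/bin/sh
# Sketch of the unsupported CNI switch described above.
# Assumptions: k0s is installed as a service, and the config already
# has spec.network.provider set to calico (edited manually beforehand).
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restart_k0s() {
  run k0s stop
  run k0s start
}

# Steps 2-5, to be run on every controller (workers already stopped).
controller_switch() {
  restart_k0s                                        # step 2
  run rm -rf /var/lib/k0s/manifests/kuberouter/      # step 3
  run k0s kc delete ds -n kube-system kube-router    # step 4
  restart_k0s                                        # step 5
  # Step 5 check: this lookup should fail once kube-router is gone.
  run k0s kc get ds -n kube-system kube-router
}

# Steps 6-7, to be run on every worker that was stopped in step 1.
worker_switch() {
  run sh -c 'rm -f /etc/cni/net.d/*'                 # step 6
  run k0s start                                      # step 7
}

# Hypothetical entry point: pass "controller" or "worker".
case "${1:-}" in
  controller) controller_switch ;;
  worker)     worker_switch ;;
esac
```

Reviewing the dry-run output on one node before setting `DRY_RUN=0` is a cheap way to confirm the paths and commands match your installation.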