k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io

IP autodetection with Calico #3711

Open till opened 7 months ago

till commented 7 months ago

Is your feature request related to a problem? Please describe.

We had a weird issue where pods on a new k0s cluster were unable to talk to pods on another node/host. It turned out that the auto detection in Calico had somehow guessed the wrong interface.

So instead of using eth0 like on other clusters, it used eth1, which is a management network and not supposed to be used for node-to-node communication. This meant that the calico.vxlan interface lost all traffic.

We tried tcpdump and similar tools, which didn't help much. I have already opened an issue with Calico to find out whether there is any effective way to debug/troubleshoot the tunnel, since nothing obvious seems to exist and the calicoctl tool is in a broken state (e.g. regarding its use of docker-cli): it either demands to be executed directly on the nodes or works with a $KUBECONFIG.

Describe the solution you would like

I see that the autodetection setting is currently empty by default: https://github.com/k0sproject/k0s/blob/1311fb0b73bb3d99202010f802e486aca5b813d4/pkg/apis/k0s/v1beta1/calico.go#L64

From the docs, it seems that Calico will use the first interface found: https://docs.tigera.io/calico/latest/networking/ipam/ip-autodetection#autodetection-methods

Why this is (the expected) eth0 on some clusters and eth1 on others is currently unknown to me.

I would propose using kubernetes-internal-ip or can-reach instead. Some docs on how they can be used would be helpful as well.

Describe alternatives you've considered

Configuring this myself.

Additional context

Could be that a bump in Calico is needed for the kubernetes-internal-ip one, but I am not sure.

uablrek commented 4 days ago

This is not a k0s problem; it's standard Calico configuration. Please use:

          provider: calico
          calico:
            mode: "bird"
            envVars:
              IP_AUTODETECTION_METHOD: "interface=eth1"
              IP6_AUTODETECTION_METHOD: "interface=eth1"

in your k0s config. K0s should not try to guess or "help", since systems are different in unpredictable ways. As you can see I want eth1 to be used (in my case eth0 is the maintenance network).

As an example, some programs (e.g. early cri-o) checked the default route and selected that interface. I usually have multiple targets for my default route (ECMP), and that didn't work of course. So I had to temporarily set a fake default route, and then reset it after the "clever" programs had finished their magical "help". No thanks to that!
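
For readers looking for the kubernetes-internal-ip or can-reach methods mentioned in the original request, here is a minimal sketch of how they might be set through the same envVars mechanism. It assumes the snippet above sits under spec.network in a k0s ClusterConfig and that the bundled Calico version supports the chosen method; the address 10.0.0.1 is a placeholder and does not come from this thread.

```yaml
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  network:
    provider: calico
    calico:
      envVars:
        # Use the address Kubernetes reports as the node's InternalIP.
        IP_AUTODETECTION_METHOD: "kubernetes-internal-ip"
        # Alternative: use the local address that routes to a given destination.
        # 10.0.0.1 is a placeholder; replace it with a host on the node network.
        # IP_AUTODETECTION_METHOD: "can-reach=10.0.0.1"
```

Only one value can be active per variable, so the can-reach line is left commented out. Either method avoids hard-coding an interface name, which may help when interface naming differs between nodes.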