kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

Add support for using kubeadm without default route #3102

Closed telmich closed 3 weeks ago

telmich commented 3 weeks ago

What keywords did you search in kubeadm issues before filing this one?

I've checked previous tickets, specifically #3075, in which this issue was last discussed.

FEATURE REQUEST

Request

Add support for kubeadm to work without default route.

Background

I am aware of https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#network-setup saying we need a default route and I understand that there might be regressions running k8s without one.

However, we are running 30+ k8s clusters without default routes, because these k8s clusters are providing routing services to various infrastructures. Usually these are very small clusters, consisting of 1-4 nodes each, providing BGP, firewalling, NAT64 towards the infrastructure.

Technical considerations

One of the main reasons kubeadm currently fails is the selection of the IP address. From a network perspective, I see two very easy solutions to this:

In my opinion, (a) should be the default, and (b) is already supported by many tools, such as ping (using -I) and ssh (using -B).

Follow up tasks

I would update the documentation above to reflect that newer versions of kubeadm can work without a default route, but that having a default route is recommended for most cases.

Versions

kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"30", GitVersion:"v1.30.2", GitCommit:"39683505b630ff2121012f3c5b16215a1449d5ed", GitTreeState:"archive", BuildDate:"2024-07-03T09:11:54Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}

(however seems to apply to all kubeadm versions)

Environment:

What happened?

Printing the join command or running kubeadm upgrade apply ... fails on hosts without a default route.

What you expected to happen?

I expect it to work, as the hosts in question have the full internet routing table:

bird> show route count
3797169 of 3797169 routes for 955884 networks in table master4
814172 of 814172 routes for 209181 networks in table master6
Total: 4611341 of 4611341 routes for 1165065 networks in 2 tables
bird> 

They can reach any host, they just don't have a default route.

How to reproduce it (as minimally and precisely as possible)?

Create a node, remove the default route, keep required routes for pulling images.

Anything else we need to know?

Kubernetes provides a very good framework, including for managing network components. In many cases, pods on routers run in the host network (hostNetwork). This is admittedly not the default case, but it is a very useful one.

We are running data centers all around the world and have been running routing services inside Kubernetes for almost 2 years now. Because of the default route requirement, upgrading kubeadm clusters without a default route is more dangerous and complicated than upgrading other systems.

So, at the moment, a typical flow for upgrading or joining other nodes is:

So using kubeadm with a default route is possible, but dangerous and more complex. Routes are important, and we don't want packets to be delivered to the wrong router (which we would need to specify via the default route).

If there is any work needed on selecting the right IP address or reasoning about the logic, I can help there. I am just not familiar with Go or the kubeadm codebase.

telmich commented 3 weeks ago

To follow up on #3075, there does indeed seem to be a bug within kubeadm even with a default route present:

[08:33] server123.place10:~# ip route add default via 2001:1700:3500:2::11
[08:39] server123.place10:~# kubeadm token create --print-join-command
route ip+net: no such network interface
To see the stack trace of this error execute with --v=5 or higher
[08:43] server123.place10:~# kubeadm token create --print-join-command --v=5
I0827 08:45:57.725916   13142 token.go:119] [token] validating mixed arguments
I0827 08:45:57.725976   13142 token.go:128] [token] getting Clientsets from kubeconfig file
I0827 08:45:57.726005   13142 cmdutil.go:94] Using kubeconfig file: /etc/kubernetes/admin.conf
I0827 08:45:57.727655   13142 token.go:243] [token] loading configurations
I0827 08:45:57.728166   13142 initconfiguration.go:114] skip CRI socket detection, fill with the default CRI socket unix:///var/run/containerd/containerd.sock
I0827 08:50:02.451174   13142 interface.go:432] Looking for default routes with IPv4 addresses
I0827 08:50:02.451231   13142 interface.go:437] Default route transits interface "*"
route ip+net: no such network interface
[08:50] server123.place10:~# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"30", GitVersion:"v1.30.2", GitCommit:"39683505b630ff2121012f3c5b16215a1449d5ed", GitTreeState:"archive", BuildDate:"2024-07-03T09:11:54Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}
[08:52] server123.place10:~# 

It seems that kubeadm needs not only a default route, but specifically an IPv4 default route, which IPv6-only networks do not have anyway.

I verified this on a second node, where the behaviour is identical with a default route present:

[08:46] server122.place10:/usr/local/bin# ip route add default via 2001:1700:3500:2::1; kubeadm token create --print-join-command --v=5; ip route del default via 2001:1700:3500:2::1 
I0827 08:47:43.972773     315 token.go:119] [token] validating mixed arguments
I0827 08:47:43.972863     315 token.go:128] [token] getting Clientsets from kubeconfig file
I0827 08:47:43.972893     315 cmdutil.go:94] Using kubeconfig file: /etc/kubernetes/admin.conf
I0827 08:47:43.974538     315 token.go:243] [token] loading configurations
I0827 08:47:43.974970     315 initconfiguration.go:114] skip CRI socket detection, fill with the default CRI socket unix:///var/run/containerd/containerd.sock
I0827 08:52:17.932887     315 interface.go:432] Looking for default routes with IPv4 addresses
I0827 08:52:17.932950     315 interface.go:437] Default route transits interface "*"
route ip+net: no such network interface
[08:52] server122.place10:/usr/local/bin# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"30", GitVersion:"v1.30.2", GitCommit:"39683505b630ff2121012f3c5b16215a1449d5ed", GitTreeState:"archive", BuildDate:"2024-07-03T09:11:54Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}
[08:53] server122.place10:/usr/local/bin# 
telmich commented 3 weeks ago

And on both nodes it works with the same workaround as before:

[08:54] server123.place10:~#  kubeadm token create  --print-join-command --v=5 --config ./kubeadm-init-only.yaml 
I0827 08:54:47.068597   15472 token.go:119] [token] validating mixed arguments
I0827 08:54:47.068664   15472 token.go:128] [token] getting Clientsets from kubeconfig file
I0827 08:54:47.068692   15472 cmdutil.go:94] Using kubeconfig file: /etc/kubernetes/admin.conf
I0827 08:54:47.070134   15472 token.go:243] [token] loading configurations
I0827 08:54:47.070154   15472 initconfiguration.go:260] loading configuration from "./kubeadm-init-only.yaml"
I0827 08:54:47.070736   15472 initconfiguration.go:114] skip CRI socket detection, fill with the default CRI socket unix:///var/run/containerd/containerd.sock
I0827 08:54:47.070772   15472 kubelet.go:196] the value of KubeletConfiguration.cgroupDriver is empty; setting it to "systemd"
I0827 08:54:47.072263   15472 version.go:187] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.txt
I0827 08:54:47.283390   15472 version.go:256] remote version is much newer: v1.31.0; falling back to: stable-1.30
I0827 08:54:47.283471   15472 version.go:187] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.30.txt
I0827 08:54:47.630326   15472 token.go:252] [token] creating token
kubeadm join [2a0a:e5c0:10:1::123]:6443 --token ..... --discovery-token-ca-cert-hash sha256:.... 
[08:54] server123.place10:~# cat kubeadm-init-only.yaml 
kind: InitConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
localAPIEndpoint:
  advertiseAddress: 2a0a:e5c0:10:1::123
  bindPort: 6443

Same on the other node:

[08:54] server122.place10:~#  kubeadm token create  --print-join-command --v=5 --config ./kubeadm-init-only.yaml 
I0827 08:54:41.827711    2019 token.go:119] [token] validating mixed arguments
I0827 08:54:41.827783    2019 token.go:128] [token] getting Clientsets from kubeconfig file
I0827 08:54:41.827805    2019 cmdutil.go:94] Using kubeconfig file: /etc/kubernetes/admin.conf
I0827 08:54:41.829328    2019 token.go:243] [token] loading configurations
I0827 08:54:41.829346    2019 initconfiguration.go:260] loading configuration from "./kubeadm-init-only.yaml"
I0827 08:54:41.830228    2019 initconfiguration.go:114] skip CRI socket detection, fill with the default CRI socket unix:///var/run/containerd/containerd.sock
I0827 08:54:41.830268    2019 kubelet.go:196] the value of KubeletConfiguration.cgroupDriver is empty; setting it to "systemd"
I0827 08:54:41.831398    2019 version.go:187] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.txt
I0827 08:54:42.307952    2019 version.go:256] remote version is much newer: v1.31.0; falling back to: stable-1.30
I0827 08:54:42.308073    2019 version.go:187] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.30.txt
I0827 08:54:42.641422    2019 token.go:252] [token] creating token
kubeadm join [2a0a:e5c0:10:1::122]:6443 --token ..... --discovery-token-ca-cert-hash sha256:... 
[08:54] server122.place10:~# cat kubeadm-init-only.yaml 
kind: InitConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
localAPIEndpoint:
  advertiseAddress: 2a0a:e5c0:10:1::122
  bindPort: 6443
neolit123 commented 3 weeks ago

> However, we are running 30+ k8s clusters without default routes, because these k8s clusters are providing routing services to various infrastructures. Usually these are very small clusters, consisting of 1-4 nodes each, providing BGP, firewalling, NAT64 towards the infrastructure.

> I would update the documentation above to reflect that newer versions of kubeadm can work without a default route, but that having a default route is recommended for most cases.

i don't think we want to update the documentation or modify kubeadm, because kubeadm is aligned with the IP detection mechanism of all k8s components. they all use default route.

also, @uablrek @aojea do you remember that kubernetes/kubernetes ticket where we discussed that k8s does need a default route for core features. was it related to Services?

> b) add an option to manually specify the IP address

> And on both nodes it works with the same workaround as before:

@telmich this is already supported by passing IPs to everything. this is not a workaround. this is the expected way, but topology wise it's not recommended.

this page has a note that is clear about that https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#network-setup

> The Kubernetes project recommends against this approach (configuring all component instances with custom IP addresses).

aojea commented 3 weeks ago
> b) add an option to manually specify the IP address

You can and must do this: if you are setting your own routing rules, you are a power user in control of the network, so you must do the corresponding IP planning and assignment.

Maybe the question is, how can I pass the IPs to the kubernetes components with kubeadm? If that is the case, can you please specify which components, or is it all of them?

uablrek commented 3 weeks ago

> @uablrek @aojea do you remember that kubernetes/kubernetes ticket where we discussed that k8s does need a default route for core features. was it related to Services?

Yes. Ref https://github.com/kubernetes/kubernetes/issues/123120

neolit123 commented 3 weeks ago

> Maybe the question is, how can I pass the IPs to the kubernetes components with kubeadm?

there was this blog post on how to configure all components with kubeadm, but it did not get to a good state to be merged: https://github.com/kubernetes/website/pull/28331 also IMO this is not something we should advise with documentation or blog posts.

> also, @uablrek @aojea do you remember that kubernetes/kubernetes ticket where we discussed that k8s does need a default route for core features. was it related to Services?

here it is: https://github.com/kubernetes/kubernetes/issues/123120

(EDIT: was just posted above)

uablrek commented 3 weeks ago

@telmich Check if the bogus default route mentioned in https://github.com/kubernetes/kubernetes/issues/123120#issuecomment-1925700626 can be used as a work-around.

ip ro add default dev lo
telmich commented 3 weeks ago

> i don't think we want to update the documentation or modify kubeadm, because kubeadm is aligned with the IP detection mechanism of all k8s components. they all use default route.

Can you elaborate on this a little bit? I don't understand how any component of k8s actually requires a default route, because we de facto have many k8s clusters running healthily without a default route.

We did not do any kind of tuning so far and all k8s components just work; so far, the only exception is kubeadm.

telmich commented 3 weeks ago

> @telmich this is already supported by passing IPs to everything. this is not a workaround. this is the expected way, but topology wise it's not recommended.

I disagree with that, as kubeadm does not support something like --address or --bind-address. The only (rather awkward) way to make it work without a default route is by passing in the kubeadm-init-only.yaml defined above.

telmich commented 3 weeks ago
> b) add an option to manually specify the IP address

> You can and must do this: if you are setting your own routing rules, you are a power user in control of the network, so you must do the corresponding IP planning and assignment.

There is a difference between routing (where to go) and addressing (which source address to use). They are related, but they are sometimes mixed up.

The routes and the addresses work on all of these systems: ssh, curl, etc. all work on the machines, with one exception, which is why I created this feature request.

> Maybe the question is, how can I pass the IPs to the kubernetes components with kubeadm? If that is the case, can you please specify which components, or is it all of them?

Maybe you can help me on this one: I actually don't understand why kubeadm is failing in the first place.

I can curl the kube-apiserver, I can reach all kube components with curl without setting any kind of parameters. I honestly don't understand why kubeadm fails to connect in the first place. From an OS point of view, everything is working normally.

neolit123 commented 3 weeks ago

> i don't think we want to update the documentation or modify kubeadm, because kubeadm is aligned with the IP detection mechanism of all k8s components. they all use default route.

> Can you elaborate on this a little bit? I don't understand how any component of k8s actually requires a default route, because we de facto have many k8s clusters running healthily without a default route.

how are you configuring bind addresses for them?

> We did not do any kind of tuning so far and all k8s components just work; so far, the only exception is kubeadm.

all k8s components have the same IP detection mechanism. it is summarized in this note:

> If two or more default gateways are present on the host, a Kubernetes component will try to use the first one it encounters that has a suitable global unicast IP address. While making this choice, the exact ordering of gateways might vary between different operating systems and kernel versions.

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#network-setup

telmich commented 3 weeks ago

> Maybe the question is, how can I pass the IPs to the kubernetes components with kubeadm?

> there was this blog post on how to configure all components with kubeadm, but it did not get to a good state to be merged: kubernetes/website#28331 also IMO this is not something we should advise with documentation or blog posts.

> also, @uablrek @aojea do you remember that kubernetes/kubernetes ticket where we discussed that k8s does need a default route for core features. was it related to Services?

> here it is: kubernetes/kubernetes#123120

> (EDIT: was just posted above)

Thanks a lot for the information! I've read that issue. Adding the default route seems to fix that particular problem, but besides the link to the kubeadm documentation, nothing I found in the ticket actually states that something requires a default route.

From a network and OS perspective, I would also claim that a default route is generally unnecessary. What is required is connectivity between the nodes. Even the described kube-proxy issue might just be an implementation issue, which might not even exist with a CNI such as Calico that can do BGP peering.

What I really want to say with this is: I don't think a default route is actually necessary for kubeadm or any of the k8s components. There might currently be dependencies on it in one implementation or another, but technically, networking-wise, it is not needed.

neolit123 commented 3 weeks ago

> @telmich this is already supported by passing IPs to everything. this is not a workaround. this is the expected way, but topology wise it's not recommended.

> I disagree with that, as kubeadm does not support something like --address or --bind-address. The only (rather awkward) way to make it work without a default route is by passing in the kubeadm-init-only.yaml defined above.

the way to configure them all is by using the kubeadm configuration file. check the linked blog post PR. kubeadm exposes ways to configure flags for given components.

EDIT: or this page: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/

the IP detection that kubeadm does is the same as the kube-apiserver's. if you pass the advertiseAddress field, this detection is skipped, as kubeadm assumes you know the address that you want.

telmich commented 3 weeks ago

> i don't think we want to update the documentation or modify kubeadm, because kubeadm is aligned with the IP detection mechanism of all k8s components. they all use default route.

> Can you elaborate on this a little bit? I don't understand how any component of k8s actually requires a default route, because we de facto have many k8s clusters running healthily without a default route.

> how are you configuring bind addresses for them?

Address binding is not depending on routing.

I think what you mean is "how to decide on which address to bind" and the default answer to that is: bind to the ANY address (:: and 0.0.0.0).

That's the default behaviour of virtually any network server.

Maybe also to clarify: there might be 2 issues here, and I am not sure how much we are mixing them up:

In both cases, this is usually best left to the operating system and should not be chosen by the application, unless

This is because there are cases where a host has multiple addresses and the software needs to select a specific one rather than leaving the choice to the kernel.

> We did not do any kind of tuning so far and all k8s components just work; so far, the only exception is kubeadm.

> all k8s components have the same IP detection mechanism. it is summarized in this note:

> If two or more default gateways are present on the host, a Kubernetes component will try to use the first one it encounters that has a suitable global unicast IP address. While making this choice, the exact ordering of gateways might vary between different operating systems and kernel versions.

Interesting, that is a bit the opposite case, having multiple default routes.

From what I recall from coding in C (maybe this is different in Go?), NOT bind()ing to an IP address, but just using connect(), does the expected thing: https://stackoverflow.com/questions/15673846/how-to-give-to-a-client-specific-ip-address-in-c

Is there a limitation or requirement in Go that forces kubeadm to bind in the first place?

telmich commented 3 weeks ago

> @telmich this is already supported by passing IPs to everything. this is not a workaround. this is the expected way, but topology wise it's not recommended.

> I disagree with that, as kubeadm does not support something like --address or --bind-address. The only (rather awkward) way to make it work without a default route is by passing in the kubeadm-init-only.yaml defined above.

> the way to configure them all is by using the kubeadm configuration file. check the linked blog post PR. kubeadm exposes ways to configure flags for given components.

> EDIT: or this page: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/

> the IP detection that kubeadm does is the same as the kube-apiserver's. if you pass the advertiseAddress field, this detection is skipped, as kubeadm assumes you know the address that you want.

I just checked the generated manifest that kubeadm created and it contains:

kube-apiserver --advertise-address=2a0a:e5c0:10:1::122

And the initial configuration passed to kubeadm did in fact contain:

kind: InitConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
localAPIEndpoint:
  advertiseAddress: 2a0a:e5c0:10:1::122
  bindPort: 6443

... which seems to somewhat explain why passing in the InitConfiguration makes kubeadm work on systems without a default route (?), because it uses the InitConfiguration to set its source address?

Sorry, slightly puzzled on this one.

neolit123 commented 3 weeks ago

> However, we are running 30+ k8s clusters without default routes

i'm assuming these are non-kubeadm clusters, so you must be configuring components in them somehow. if you are passing explicit IP addresses to kube-apiservers in such clusters, you are bypassing the kube-apiserver IP detection.

if you want to use kubeadm and skip this detection just pass the config with advertiseAddress.

neolit123 commented 3 weeks ago

> ... which seems to somewhat explain why passing in the InitConfiguration makes kubeadm work on systems without a default route (?), because it uses the InitConfiguration to set its source address?

https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/

--advertise-address string
The IP address on which to advertise the apiserver to members of the cluster. This address must be reachable by the rest of the cluster. If blank, the --bind-address will be used. If --bind-address is unspecified, the host's default interface will be used.
--bind-address string     Default: 0.0.0.0
The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank or an unspecified address (0.0.0.0 or ::), all interfaces and IP address families will be used.

comment is a bit misleading, but this is where the autodetection as per the kubeadm network docs comes into play.

aojea commented 3 weeks ago

@telmich there are a lot of things under the hood that are not easy to infer. I'm happy to talk offline in a meeting or on Slack, so it is easier to ask questions and get context.

neolit123 commented 3 weeks ago

closing as documentation covers the status quo.

uablrek commented 3 weeks ago

FYI, this is CNI-plugin dependent (so "any" in the header is wrong). Installing without a default route on some CNI plugins:

In all cases a warning is printed:

W0830 05:57:46.291946     279 common.go:199] WARNING: could not obtain a bind address for the API Server: no default routes found in "/proc/net/route" or "/proc/net/ipv6_route"; using: 0.0.0.0

And BTW, the bogus default route (https://github.com/kubernetes/kubeadm/issues/3102#issuecomment-2312174483) doesn't work.