k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Flannel-external-ip is ignored in cloud environments? #10295

Open ludost opened 1 month ago

ludost commented 1 month ago

Environmental Info: K3s Version:

Currently running v1.28.5+k3s1; however, the relevant code referenced below is from the main branch, so this affects all versions from the last few years.

Node(s) CPU architecture, OS, and Version:

On AWS EC2 instances: Linux host-6f82bbb7-64bd-495a-87f6-d6256171dac6 5.15.117-flatcar #1 SMP Tue Jul 4 14:43:38 -00 2023 x86_64 Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz GenuineIntel GNU/Linux

Cluster Configuration:

Describe the bug:

As documented, there is a --flannel-external-ip flag available in the k3s configuration that tells the flannel backend to use the IP address provided by the --node-external-ip option. However, this flag is ignored if k3s is configured to use an external cloud provider, as shown in: https://github.com/k3s-io/k3s/blob/df5db28a687f5fb6fd0f5cb68873d2184974e953/pkg/agent/run.go#L382 This rules out several use cases for cloud-hosted deployments (e.g. on AWS EC2 hosts, as in our case) where some agent nodes are located on user premises and/or in cross-cloud setups.
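
For reference, a minimal sketch of the intended configuration (the server address, token, and IPs are placeholders):

```sh
# Server: use node external IPs as the substrate for flannel traffic,
# with the wireguard-native backend encrypting it.
k3s server \
  --node-external-ip=3.72.94.253 \
  --flannel-backend=wireguard-native \
  --flannel-external-ip

# Remote agent: advertise its own public IP to the cluster.
k3s agent \
  --server=https://<server-public-ip>:6443 \
  --token=<token> \
  --node-external-ip=<public-ip-of-this-host>
```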

Steps To Reproduce:

Expected behavior:

I expected the WireGuard configuration of flannel to use the public address ("3.72.94.253" in this case), allowing flannel traffic over the Internet with wire-level security.

Actual behavior:

The actual WireGuard configuration still uses the local, private IP address provided by the AWS EC2 server's network interface, which makes routing the flannel traffic over the Internet impossible and insecure.

Additional context / logs:

WireGuard output showing the incorrect configuration:

```
interface: flannel-wg
  public key: (hidden)
  private key: (hidden)
  listening port: 51820

peer: PPC1sy2btO8Ihs673FaxPQaFxYbEsMKIM0Oa6gC0TkA=
  endpoint: 172.16.122.59:51820
  allowed ips: 10.42.2.0/24
  latest handshake: 46 seconds ago
  transfer: 2.48 MiB received, 10.37 MiB sent
  persistent keepalive: every 25 seconds

peer: qWFvdE2fovfNlCEceP5jASeBTBJSBBSr3DuBMCbPj2o=
  endpoint: 172.16.121.41:51820
  allowed ips: 10.42.1.0/24
  latest handshake: 1 minute, 42 seconds ago
  transfer: 210.50 MiB received, 1.41 GiB sent
  persistent keepalive: every 25 seconds
```

brandond commented 1 month ago

As the docs say: https://docs.k3s.io/networking/networking-services#deploying-an-external-cloud-controller-manager

K3s provides an embedded Cloud Controller Manager (CCM) stub that does the following:

Sets node InternalIP and ExternalIP address fields based on the --node-ip and --node-external-ip flags.

If you disable the built-in cloud controller, K3s no longer has a native integration point for setting the external IPs. This is instead handled by whatever infrastructure-provider-specific cloud controller you deploy. Since those are not integrated with K3s's embedded flannel, you'll need to manually set any additional annotations necessary to inform flannel about those IPs.
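
For example, something along these lines per node (a sketch; my-node and the IP are placeholders, and flannel.alpha.coreos.com/public-ip-overwrite is the annotation flannel's kube subnet manager reads):

```sh
# Advertise this node's public IP to flannel peers instead of its InternalIP.
kubectl annotate node my-node \
  flannel.alpha.coreos.com/public-ip-overwrite=3.72.94.253
```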

brandond commented 1 month ago

This rules out several use cases for cloud-hosted deployments (e.g. on AWS EC2 hosts, as in our case) where some agent nodes are located on user premises and/or in cross-cloud setups.

To be clear, you're trying to use this to manage a hybrid deployment? You have some nodes that you want to be managed by the AWS CCM, and have the node and flannel use the external IPs set by that CCM, while other nodes are managed by the K3s CCM, and have the node and flannel use the external IPs set by the --node-external-ip flag?

This sort of thing isn't really allowed for by the cloud provider model; it is generally expected that all nodes in the cluster will be managed by the same CCM. It would take some additional work to make K3s set the flannel external IP annotations based on the node external IP provided by another CCM, and to ensure that flannel starts up AFTER that CCM has already had a chance to initialize the node.

I don't even know how the AWS CCM will handle the presence of non-AWS nodes in the cluster.

brandond commented 1 month ago

cc @manuelbuil I think this would require:

  1. Ensure that flannel doesn't start until after the cloud-provider uninitialized taint has been removed, same as the netpol controller (see the sketch after this list). I don't think this will break anything? Might make startup a small bit slower, I guess.
  2. Move the flannel annotation setters out of the main agent startup, into the flannel setup code. This is probably good anyway, because there's no point in setting these if flannel is disabled...
  3. Set the flannel external-ip-overwrite annotation values based on the node external IPs, instead of the CLI flag value. Honestly, I'm not even sure why we need to set these if the external IPs are set properly; what is the point of these again? Were we just doing this because flannel was getting started before the external IPs were set by the embedded cloud provider?
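
For context, a quick way to check whether a node still carries that taint (my-node is a placeholder; the taint key is the standard cloud-provider one):

```sh
# While this taint is present, the node's CCM has not yet initialized it,
# so the external IPs may not be set yet.
kubectl get node my-node \
  -o jsonpath='{.spec.taints[?(@.key=="node.cloudprovider.kubernetes.io/uninitialized")]}'
```
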
ludost commented 1 month ago

Thanks for looking at this seriously. It would help us significantly, as described below:

To be clear, you're trying to use this to manage a hybrid deployment? You have some nodes that you want to be managed by the AWS CCM, and have the node and flannel use the external IPs set by that CCM, while other nodes are managed by the K3s CCM, and have the node and flannel use the external IPs set by the --node-external-ip flag?

Actually, our setup is somewhat simpler: we have a cluster on AWS, to which we want to add remote nodes (k3s agents) running on local workstations. Basically, we'd like to "federate" the cluster out to local workstations. This works well enough, using bootstrap tokens, a tightly controlled flannel configuration (which is where the current issue pops up), etc. We're even quite a few steps toward running the local k3s agent in a rootless setup.

Just for completeness: our aws-cloud-controller-manager is configured not to do any network configuration inside the cluster: --allocate-node-cidrs=false --configure-cloud-routes=false

It would take some additional work to make K3s set the flannel external IP annotations based on the node external IP provided by another CCM, and ensure that flannel starts up AFTER that CCM has already had a chance to initialize the node.

I'm not familiar enough with the peculiarities, but setting the IP annotations based on the CLI arguments given during startup seems independent of whether there is an external CCM or not? Or am I fully missing the point here?

2. Move the flannel annotation setters out of the main agent startup, into the flannel setup code. This is probably good anyway, because there's no point in setting these if flannel is disabled...

This seems like a good change; these annotations are very flannel-specific by nature.

For the short term, we've solved this issue by manually deploying flannel as a CNI-plugin daemonset, as described at https://github.com/flannel-io/flannel. This ensures the node is started before flannel initializes. However, passing the correct external IP address to that setup is also non-trivial, especially in a rootless configuration. Just having these CLI arguments work would simplify our setup significantly.
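
For reference, with the standalone daemonset the public IP ends up being passed to flanneld itself, roughly like this (a sketch; the IP and interface are placeholders for each host's values, and in the kube-flannel manifest this is typically injected via the FLANNELD_PUBLIC_IP environment variable):

```sh
# Standalone flanneld, advertising this host's public address to its peers.
flanneld \
  --public-ip=3.72.94.253 \
  --iface=eth0 \
  --kube-subnet-mgr
```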

brandond commented 1 month ago

We're even quite a few steps towards running the local k3s agent in a rootless setup.

How exactly are you accomplishing that? All the CNI-related stuff seemed pretty broken last time I tried to get it working rootless.

passing the correct external IP address to that setup is also non-trivial

Even if K3s did set the annotations for you when you're not using our cloud-provider, you'll still need to properly set the external IPs for each node, right?

ludost commented 4 weeks ago

How exactly are you accomplishing that? All the CNI-related stuff seemed pretty broken last time I tried to get it working rootless.

Yes, it's a bit of a mess. Basically, that's the next issue to tackle: in the current rootless client options, k3s hardcodes the list of copy-up dirs, and that list is missing /opt/cni as an entry, which means the kube-flannel-based CNI plugin can't create that folder. So we moved the cni-bin-dir to /run/opt/cni/bin. But as you know, that's another set of annoying configuration changes that need to be made, on both the containerd and kubelet side. (But let's handle one issue at a time.)

Even if K3s did set the annotations for you when you're not using our cloud-provider, you'll still need to properly set the external IPs for each node, right?

In effect, the external IPs are a given of the host you're running K3s on. In the case of the AWS servers, they are part of the EC2 setup and can be obtained from inside the host. Similarly, at the remote k3s agents' hosts, we can simply determine the public IP addresses up front. Effectively, we tell K3s what the external IP is, expecting K3s to just pass it along to flannel (and flannel to pass it along to WireGuard).
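
On EC2, for instance, the public address can be read from the instance metadata service (a sketch using IMDSv2; the endpoint and paths are the standard AWS ones):

```sh
# Fetch an IMDSv2 session token, then read this instance's public IPv4.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/public-ipv4
```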

An alternative setup would be to manually (outside K3s) set up a WireGuard network and tell flannel to use that pre-configured network directly. But I really like the dynamic WireGuard setup that flannel provides out of the box.

manuelbuil commented 3 weeks ago

I don't think this will break anything? Might make startup a small bit slower, I guess.

I don't think so either

3. Were we just doing this because flannel was getting started before the external IPs were set by the embedded cloud provider?

Yes, very likely. The public-ip-overwrite annotation needs to be there before flanneld starts; otherwise it will pick the node-ip as the public-ip.

manuelbuil commented 3 weeks ago

Reading back my issue https://github.com/k3s-io/k3s/issues/6177, I can confirm that we decided to set them as part of the cloud provider so that they are ready before flanneld is started.

ludost commented 3 weeks ago

Although I still believe that moving these annotations is a good enhancement, our use case has weakened a bit: we've decided not to use the AWS CCM, but to revert to the k3s built-in CCM. The AWS version wasn't actually doing anything for us anymore, given that we pre-set the IP addresses already and were not using "cloud routes", etc. The AWS CCM was actually working against our use case, as it introduced a race condition where the CCM would remove our remote nodes before they could become Ready.

So, as a consequence, the --flannel-external-ip flag is currently not a problem for us anymore. However, if this issue is worked on and fixed, I will still be able to provide test results and feedback.

Just as a side note (and if you like, I can try a full write-up of how we achieved this): we now have a fully functioning setup with remote k3s agent nodes, running rootless, with NVIDIA and X11 support. Access is controlled by bootstrap tokens, and all flannel traffic is encrypted through WireGuard.

brandond commented 3 weeks ago

The AWS CCM was actually working against our use case, as it introduced a race condition where the CCM would remove our remote nodes before they could become Ready.

Yeah, that sort of thing is what I've seen in the past, and it's what I was alluding to with:

This sort of thing isn't really allowed for by the cloud provider model; it is generally expected that all nodes in the cluster will be managed by the same CCM.

rbrtbnfgl commented 3 weeks ago

The public-ip-overwrite annotation does not configure the interface flannel uses to create the tunnel; the interface used is the one configured by the --flannel-iface flag. From that part of the code, the annotation seems only to be used to advertise to the other nodes that, to reach this node over the tunnel created by flannel, they have to use the node-external-ip instead of the node-ip.
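
A quick way to inspect what a node currently advertises (a sketch; my-node is a placeholder, and flannel.alpha.coreos.com/public-ip is the companion annotation flannel sets on its own):

```sh
# Print the public IP this node advertises to flannel peers (if set).
kubectl get node my-node \
  -o jsonpath='{.metadata.annotations.flannel\.alpha\.coreos\.com/public-ip}'
```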