squat / kilo

Kilo is a multi-cloud network overlay built on WireGuard and designed for Kubernetes (k8s + wg = kg)
https://kilo.squat.ai
Apache License 2.0

[Question] kilo + flannel #150

Open: khalid-huang opened this issue 3 years ago

khalid-huang commented 3 years ago

Hello everyone, I am curious about this sentence in the Kilo docs: “for example, Flannel for networking, Kilo can be installed on top to enable pools of nodes in different locations to join; Kilo will take care of the network between locations, while Flannel will take care of the network within locations.” How does Kilo implement this?

tetricky commented 3 years ago

I have a related question. Is it possible to enable Kilo for VPN and cluster-to-cluster services on top of Flannel with the WireGuard backend?

This https://gitlab.freedesktop.org/freedesktop/helm-gitlab-config/-/blob/master/gitlab-k3s-provision/deploy/kilo-kustomize/overlays/default/kilo.yaml seems to indicate that it can, but I can't work it out.

squat commented 3 years ago

Hi @khalid-davis, Kilo implements this in a similar way to how you can set up WireGuard on your laptop without interfering with your laptop's networking stack, e.g. systemd-networkd, which manages DHCP and sets up routes for communicating with other machines on your home LAN. In other words, the functionality that Kilo adds is purely additive: Kilo uses the Kubernetes API to find which Nodes, and their corresponding PodCIDRs, are in disconnected locations, and deterministically sets up IP routes that send packets for those CIDRs over the WireGuard link.
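To make the "purely additive" idea concrete, here is a minimal Python sketch of the route selection described above. It is not Kilo's code (Kilo is written in Go); the `Node` record, the `wireguard_routes` helper and the example node names and CIDRs are hypothetical, and it assumes the WireGuard device is named `kilo0`. It also omits details such as gateways; it only shows that Pod CIDRs in the local location are left to Flannel while those in other locations are routed over WireGuard.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical, simplified view of a Kubernetes Node as Kilo sees it."""
    name: str
    location: str
    pod_cidr: str


def wireguard_routes(local: Node, nodes: list[Node], wg_dev: str = "kilo0") -> list[tuple[str, str]]:
    """Return (destination CIDR, device) pairs to add on top of Flannel.

    Pod CIDRs of nodes in the local node's own location are skipped:
    Flannel already routes that traffic, so these routes are purely additive.
    """
    routes = [
        (ipaddress.ip_network(node.pod_cidr), wg_dev)
        for node in nodes
        if node.location != local.location
    ]
    # Sort by network so the result does not depend on the order the API returns nodes.
    routes.sort(key=lambda route: route[0])
    return [(str(net), dev) for net, dev in routes]


nodes = [
    Node("paris-1", "paris", "10.42.0.0/24"),
    Node("paris-2", "paris", "10.42.1.0/24"),
    Node("berlin-1", "berlin", "10.42.3.0/24"),
    Node("berlin-2", "berlin", "10.42.2.0/24"),
]
print(wireguard_routes(nodes[0], nodes))  # [('10.42.2.0/24', 'kilo0'), ('10.42.3.0/24', 'kilo0')]
```

Because the output depends only on the Node list, every node computes the same answer from the same API state, which is what makes the routing deterministic.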

Implementing this functionality in Kilo is in some ways a reduction of Kilo's responsibilities:

Instead, we can expect that Flannel will take care of these tasks. Kilo does need to know the PodCIDR for the node and needs to know the interface name and IP address for the Flannel overlay device in order to correctly set up routes and iptables rules. At the end of the day, this implementation is abstracted away by the Encapsulator interface in Kilo: https://github.com/squat/kilo/blob/81d6077fc20b4c317a76b2a35dba42cb818ab5b5/pkg/encapsulation/encapsulation.go#L47-L55
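The encapsulation abstraction can be pictured with the hypothetical Python model below. It is deliberately not a copy of Kilo's Go `Encapsulator` interface (see the link above for the real definition); the class names, the `next_device` helper and the `kilo0` device name are assumptions for illustration. The point it shows is that with Flannel, Kilo creates nothing inside a location: it only records the Flannel device name, its IP address and the node's PodCIDR.

```python
import ipaddress
from abc import ABC, abstractmethod


class Encapsulator(ABC):
    """Hypothetical, simplified stand-in for Kilo's Encapsulator interface.

    Not Kilo's real Go interface: it only models the idea that the overlay
    used within a location is pluggable.
    """

    @abstractmethod
    def init(self) -> None:
        """Prepare the in-location overlay device if Kilo must create one."""

    @abstractmethod
    def device(self) -> str:
        """Name of the device that carries traffic within a location."""


class FlannelEncapsulator(Encapsulator):
    """Flannel owns the in-location overlay; Kilo only records its details."""

    def __init__(self, iface: str, iface_ip: str, pod_cidr: str):
        self.iface = iface                                # e.g. "flannel.1"
        self.iface_ip = ipaddress.ip_interface(iface_ip)  # Flannel device address
        self.pod_cidr = ipaddress.ip_network(pod_cidr)    # this node's PodCIDR
        self.ready = False

    def init(self) -> None:
        # Nothing to create: Flannel already manages the device.
        self.ready = True

    def device(self) -> str:
        return self.iface


def next_device(enc: Encapsulator, same_location: bool, wg_dev: str = "kilo0") -> str:
    """Use the in-location overlay for local traffic, WireGuard across locations."""
    return enc.device() if same_location else wg_dev


enc = FlannelEncapsulator("flannel.1", "10.42.0.0/32", "10.42.0.0/24")
enc.init()
print(next_device(enc, True), next_device(enc, False))  # flannel.1 kilo0
```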

squat commented 3 years ago

@tetricky yes, using Kilo for VPN functionality, including multi-cluster services, on top of Flannel is absolutely possible and is in fact technically simpler for Kilo than setting up a single, multi-region Kubernetes cluster on Flannel.

What problems are you running into?

tetricky commented 3 years ago

I have a k3s cluster installed with Flannel, using the WireGuard backend. If I look at the WireGuard interfaces, I have a flannel.1 interface, with each node showing a peered connection to every other node.

If I deploy kilo-k3s-flannel.yaml and inspect the nodes' WireGuard interfaces, then as well as the flannel.1 interface I see a kilo interface. It does not show as connected to any peers on any node.

I haven't got past that point.

It seemed likely to me that Kilo needed to share the flannel.1 interface and that I needed to tell it how to do that somehow... but it was just a vague thought. I don't really understand what's happening.

squat commented 3 years ago

Great, thanks for the info @tetricky :)

All the testing for Kilo on top of Flannel has been performed using the default encapsulation backend, namely VXLAN, as this represents the vast majority of Flannel installations. That said, Kilo should work with most Flannel backends. However, the WireGuard Flannel backend is a major exception because unlike other encapsulation technologies, WireGuard authenticates all packets that it processes, meaning that Kilo cannot route packets through the Flannel interface unless it is also managing Flannel's WireGuard interface. Unfortunately, it is not possible to have both Kilo and Flannel manage the same WireGuard interface because they would clash with one another, as neither would have a complete view of all of the WireGuard peers that should be configured.
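One way to see why an authenticated backend blocks Kilo is WireGuard's cryptokey routing: after a packet from a peer is authenticated and decrypted, WireGuard drops it unless its inner source address falls within that peer's AllowedIPs. The Python sketch below models only that inbound check; the `accept_inbound` helper, peer names and addresses are hypothetical. Flannel configures each peer's AllowedIPs from its own view of the cluster, so a packet Kilo tries to forward through Flannel's interface from another location or a VPN peer would not match.

```python
import ipaddress


def accept_inbound(peers: dict[str, list[str]], peer: str, src_ip: str) -> bool:
    """Model WireGuard's inbound cryptokey-routing check.

    `peers` maps a peer (standing in for its public key) to its AllowedIPs.
    A decrypted packet is accepted only if its source is inside one of the
    sending peer's AllowedIPs; unknown peers are rejected outright.
    """
    src = ipaddress.ip_address(src_ip)
    return any(src in ipaddress.ip_network(cidr) for cidr in peers.get(peer, []))


# Hypothetical view held by Flannel's WireGuard interface on one node.
flannel_peers = {"node-b": ["10.42.1.0/24"]}

print(accept_inbound(flannel_peers, "node-b", "10.42.1.7"))  # True: node-b's own Pod
print(accept_inbound(flannel_peers, "node-b", "10.50.0.3"))  # False: forwarded from elsewhere
```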

For now, running Kilo on Flannel is only expected to work on non-authenticated backends, i.e. not IPSec and not WireGuard.
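As a quick pre-flight check, a sketch like the one below could read Flannel's `net-conf.json` (whose `Backend.Type` field selects the backend, e.g. `vxlan`) and flag the authenticated backends named above. The helper name and the choice to raise on a missing type are assumptions, and k3s users who select the backend with a server flag rather than a `net-conf.json` would need to check that setting instead.

```python
import json

# Authenticated backends: Kilo on Flannel is not expected to work with these.
AUTHENTICATED_BACKENDS = {"ipsec", "wireguard"}


def kilo_on_flannel_expected_to_work(net_conf_json: str) -> bool:
    """Return False if Flannel's configured backend authenticates packets."""
    conf = json.loads(net_conf_json)
    backend = conf.get("Backend", {}).get("Type")
    if not backend:
        # Do not guess Flannel's default; ask the operator to check explicitly.
        raise ValueError("net-conf.json does not set Backend.Type")
    return backend.lower() not in AUTHENTICATED_BACKENDS


print(kilo_on_flannel_expected_to_work('{"Network": "10.244.0.0/16", "Backend": {"Type": "vxlan"}}'))  # True
```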

Can I ask what draws you to the WireGuard backend for Flannel? Flannel is a no-frills, dead-simple, everything-just-works networking provider and running Flannel with the WireGuard backend is functionally equivalent to running Kilo with a full mesh except Flannel does not bring the added VPN features. Is it not possible to run only Kilo in your setup or to run Flannel with the default backend and Kilo on top?

tetricky commented 3 years ago

I think essentially I've got myself into a non-virtuous circle.

I have three independent nodes in the same data centre, each with its own single IP (Dedibox, Paris), with k3s installed. Ideally I would be running Kilo without Flannel, but I had trouble connecting agent nodes to the master and getting networking running. So I deployed the cluster using Flannel (easy) and chose the WireGuard backend because I wanted encrypted traffic.

So now I'm faced with either making do without Kilo (suboptimal, as I have other clusters I want to connect services to with cluster-to-cluster networking) or reinstalling with Kilo.

The problem I seem to have with v1.20.6+k3s1 is that if I install a single master (--disable servicelb,traefik,flannel), define CIDRs (--cluster-cidr 10.12.0.0/16 --service-cidr 10.13.0.0/16, to be different from the other clusters I wish to connect to), and copy k3s.yaml to the agent nodes, I can't get the agent nodes to connect and network. I am stuck in a loop trying to find a master/agent/Kilo install that works.
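Choosing non-default cluster and service CIDRs matters because connected clusters cannot route to each other's Pods or Services if their ranges overlap. A small sketch like this can check a plan before installing; the `overlapping_ranges` helper and the cluster names are hypothetical, and the `10.42.0.0/16`/`10.43.0.0/16` values are the k3s defaults.

```python
import ipaddress
from itertools import combinations


def overlapping_ranges(clusters: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return pairs of labelled CIDRs that overlap.

    `clusters` maps a cluster name to its named ranges, e.g.
    {"paris": {"cluster-cidr": "10.12.0.0/16", "service-cidr": "10.13.0.0/16"}}.
    """
    labelled = [
        (f"{cluster}/{kind}", ipaddress.ip_network(cidr))
        for cluster, ranges in clusters.items()
        for kind, cidr in ranges.items()
    ]
    return [
        (label_a, label_b)
        for (label_a, net_a), (label_b, net_b) in combinations(labelled, 2)
        if net_a.overlaps(net_b)
    ]


plan = {
    "paris": {"cluster-cidr": "10.12.0.0/16", "service-cidr": "10.13.0.0/16"},
    "other": {"cluster-cidr": "10.42.0.0/16", "service-cidr": "10.43.0.0/16"},
}
print(overlapping_ranges(plan))  # []
```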

...and using the Flannel WireGuard backend was partly a way of proving to myself that the firewall, network, and port settings did actually allow things to work.

I'd be willing to have another go if I could get a working procedure that allowed me to add nodes... but I haven't got that.

squat commented 3 years ago

> I can't get the agent nodes to connect and network

This suggests there is either a bug somewhere or a documentation problem. Can you share more details about this point? Do the agents fail to connect to the API? Or do they connect, but the Pod network does not work afterwards? BTW, are you aiming for a full mesh?

tetricky commented 3 years ago

I think it might have been a consequence of ongoing k3s development. I've torn everything down and am building it back up. Now, on a later version (v1.20.7+k3s1) and the same server platform, I've been able to cluster the nodes.

I'm going to try a three-master-node setup with embedded etcd. I have the three master nodes connected and showing the correct INTERNAL-IP and EXTERNAL-IP. Because it's a multi-master setup, there is a valid k3s.yaml in the right place on each of the master servers (pointing to https://127.0.0.1:6443).

I've annotated the nodes and am ready to deploy Kilo as the CNI... but it occurs to me: would k3s-kilo.yaml deploy in this configuration with an embedded etcd cluster, or is this out of scope for Kilo (given that it's a relatively new development)?

---edit---

I seem to have answered my own question. I deployed Kilo with --mesh-granularity=full, and everything seems to have come up as expected. kgctl shows the three master nodes in a full mesh, and showconfig shows each master node peered with the other master nodes.
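With a full mesh, every node is effectively its own location, so each node peers directly with every other node; that is the topology the showconfig output above reflects, and it is also why the number of tunnels grows quadratically as nodes are added. The sketch below simply enumerates that topology; the helper names and node names are hypothetical and it does not talk to Kilo or the cluster.

```python
from itertools import combinations


def full_mesh_peers(nodes: list[str]) -> dict[str, list[str]]:
    """Each node's WireGuard peers when every node is its own location."""
    return {node: sorted(peer for peer in nodes if peer != node) for node in nodes}


def tunnel_count(nodes: list[str]) -> int:
    """Number of point-to-point tunnels in a full mesh: n * (n - 1) / 2."""
    return sum(1 for _ in combinations(nodes, 2))


masters = ["master-1", "master-2", "master-3"]
print(full_mesh_peers(masters)["master-1"])  # ['master-2', 'master-3']
print(tunnel_count(masters))  # 3
```

Adding a fourth node would add one peer to each existing node and raise the tunnel count from 3 to 6.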

I will test it.

Not an issue at the moment, but what is the process for adding worker nodes after the deploy (or even further master nodes)?

Note: This is no longer appropriate for the original issue, as it is now kilo as cni, not kilo over flannel.

stv0g commented 3 years ago

I think the insight @squat gave in this issue should be integrated into the documentation (see #165).

Other than that, I guess we can close the issue?