squat / kilo

Kilo is a multi-cloud network overlay built on WireGuard and designed for Kubernetes (k8s + wg = kg)
https://kilo.squat.ai
Apache License 2.0

Allow disabling internalIP routing via arg #370

Open sepich opened 11 months ago

sepich commented 11 months ago

Thanks for Kilo! This is somewhat related to https://github.com/squat/kilo/pull/127. We use cloud instances with Kilo, where nodes come and go, and there is no way to set an annotation on them before Kilo starts. This PR keeps the default behavior as it is now, but adds a CLI arg to disable internalIP routing to WireGuard entirely.

squat commented 11 months ago

Hi @sepich :) thanks for the PR and for the additional cleanup! Setting this CLI flag on the DaemonSet to entirely disable routing packets over internal IPs effectively turns the mesh into a full mesh, where all nodes are connected only via WireGuard tunnels using their public IPs.

If that's the case, why not change the Kilo topology to use a full mesh?

sepich commented 11 months ago

Sorry if I'm missing something. We have nodes that have both internal and external IPs at the same time. We do not need nodes in different locations to know each other's internal IPs; we want a full mesh over the external IPs, and we start Kilo like this:

            - --kubeconfig=/etc/kubernetes/kubeconfig
            - --hostname=$(NODE_NAME)
            - --cni=false
            - --compatibility=cilium
            - --local=false
            - --subnet=...
            - --mesh-granularity=full
            - --route-internal-ip=false
            - --ipam=cilium-cluster-pool
            # don't disturb pods on redeploy
            - --clean-up=false
            - --clean-up-interface=false

Without --route-internal-ip=false I still see internal IPs in wg show and in ip r, which complicates things. Just the Kilo subnet and the podCIDRs are enough.
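For example, this is roughly how I check what ends up there (a sketch, assuming the default interface name kilo0; adjust for your setup):

    # show the AllowedIPs that Kilo configured for each WireGuard peer
    wg show kilo0 allowed-ips

    # show the routes that Kilo installed via its interface
    ip route show dev kilo0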

squat commented 11 months ago

Ah, I see! You really just want to have no knowledge of internal IPs at all. Hmm. Does the discovery of internal IPs break things for your network? Maybe internal IPs are reused and that causes clashes in the routes or the WireGuard config? Why is having unused internal IPs in your routes bad? Basically, I want to figure out what the benefit of disabling the discovery/routing of internal IPs is and whether it is worth the complexity of adding another flag for configuring Kilo. The fewer configuration options the better :)

sepich commented 11 months ago

if it is worth the complexity of adding another flag for configuring Kilo

We have several locations, and in each location there is an internal subnet where nodes can communicate with each other. So there are 3 IPs on each node - private, public, and kilo. On top of kilo0 we run Cilium in "direct routing mode" (no encapsulation). We can specifically decide which interface to use, like setting --node-ip={{kilo0.ip}} for the kubelet, or running static etcd manifests on the internal IPs. Now imagine what happens when all internal IPs get announced via kilo0: etcd nodes that have a fast internal interconnect would suddenly have their traffic encapsulated into the VPN. That seems like unneeded complexity, and when you run something in prod you want it to be as simple as possible. I'd definitely choose one more CLI arg over troubleshooting all this routing/firewall mess.
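The kubelet part looks roughly like this, as a sketch (assuming a kubeadm-style node where the kubelet picks up KUBELET_EXTRA_ARGS from /etc/default/kubelet; interface name and path may differ in your setup):

    # sketch: take the address assigned to kilo0 and hand it to the kubelet as --node-ip
    KILO_IP=$(ip -4 -o addr show dev kilo0 | awk '{print $4}' | cut -d/ -f1)
    echo "KUBELET_EXTRA_ARGS=--node-ip=${KILO_IP}" > /etc/default/kubelet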

And again, maybe I misunderstand some of your concepts. But unfortunately I have to +1 https://github.com/squat/kilo/issues/342, as the current docs are just marketing material. There are no technical details, and I have to read the code to be able to configure the current setup. It would be nice to understand the design decisions in each case, like "announce internal IPs".

squat commented 11 months ago

That means etcd nodes having fast internal interconnection would be encapsulated to VPN.

Thanks for explaining. Yes, that's indeed a really good thing to avoid! Normally this would be avoided by putting all of the nodes in the same "location" so that Kilo knows about packets to and from those addresses but doesn't tunnel them over the VPN.

Is there a reason why you can't rely on Kilo topologies to automatically place your cloud nodes into the same Kilo location and in this way avoid encapsulating traffic to internal IPs within a location? You mentioned that your cloud nodes come and go and that you can't annotate them beforehand; however, what about simply relying on the Kubernetes-standard topology labels that are set by the cloud-controller-manager [0]? If you are not running a cloud-controller-manager on your cloud nodes, then that explains why you don't have access to topology labels and why Kilo can't use them to determine the topology of the mesh.
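For example, something like this would show whether the labels are already there (illustrative):

    # list nodes together with the standard topology labels set by the cloud-controller-manager
    kubectl get nodes -L topology.kubernetes.io/region -L topology.kubernetes.io/zone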

Sorry for insisting further; I'm hoping to really understand the details of your deployment to see how to best improve Kilo for you and everyone else.

[0] https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesioregion

sepich commented 11 months ago

Is there a reason why you can't rely on Kilo topologies

Sorry, but what difference should that make? I've set kilo.squat.ai/location to the same value on the nodes in the same location, and I still see Kilo creating internalIP routes to WireGuard on these nodes. I use --mesh-granularity=full because I don't want a single leader to be elected; I need a full mesh.
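Roughly like this (node and location names are illustrative, not our real ones):

    # annotate all nodes of one site with the same Kilo location
    kubectl annotate node node-a node-b kilo.squat.ai/location="dc1"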

squat commented 11 months ago

The difference is that when you rely on the standard topology labels you don't need to manually annotate nodes to specify a location, which sounded like an annoying source of toil for your cluster.

Ideally, if you let the cloud-controller-manager set the topology labels, your Kilo topology can be configured automatically and you can let internal IPs be known to Kilo, because packets within a location won't be tunneled over the VPN.

However, it sounds like you have an additional constraint: you don't want to rely on a single leader acting as the gateway for a group of nodes. If all nodes within a location could act as gateways for their own traffic to other locations, then that would be satisfactory for you. Is that right?

In that case, it sounds like the ideal scenario would be to merge https://github.com/squat/kilo/pull/328, which allows all nodes in a location to be gateways while traffic internal to a location is still not encapsulated. This eliminates issues of network bandwidth limitations and alleviates concerns about downtime between when a leader becomes unresponsive and when another node is elected leader.

Until that is merged, I agree that having an option to never tunnel traffic to internal IPs would be helpful.