squat / kilo

Kilo is a multi-cloud network overlay built on WireGuard and designed for Kubernetes (k8s + wg = kg)
https://kilo.squat.ai
Apache License 2.0

Reduce amount of processes spawned by Kilo #113

Open · SerialVelocity opened 3 years ago

SerialVelocity commented 3 years ago

Kilo currently spawns about 50 processes per 30s on my machine (this is a small cluster). That seems to take considerable CPU resources; it spawns the most processes out of anything on my machine. This is especially costly since it needs to take the xtables lock, which causes large slowdowns due to conflicts with kube-router.

It seems like a few things should be done to speed this up and remove the overhead:

squat commented 3 years ago

These are all great ideas for optimizations @SerialVelocity. 1 is certainly trivial to implement; however, it will continue to produce very spiky behavior, albeit less often. 3 would be the next easiest to implement, I suspect. Ideally, combining 3 with ipsets would be the best for performance. Maybe future work could even include migrating entirely to nftables, thus avoiding having to exec iptables at all thanks to its support for netlink updates. xref https://github.com/kubernetes/kubernetes/issues/45385 for related conversation.
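To illustrate the ipset idea mentioned above, here is a rough Go sketch (function names, chain, and set name are hypothetical, not Kilo's real code) of why a set-based rule helps: N per-peer rules collapse into a single rule that matches against a named set, so peer churn only touches set membership, never the rule table.

```go
package main

import "fmt"

// perPeerRules builds one iptables rule spec per peer IP -- the pattern
// the thread wants to avoid, since every peer change means more execs
// of iptables under the xtables lock.
func perPeerRules(peerIPs []string) [][]string {
	rules := make([][]string, 0, len(peerIPs))
	for _, ip := range peerIPs {
		rules = append(rules, []string{"-A", "FORWARD", "-s", ip, "-j", "ACCEPT"})
	}
	return rules
}

// ipsetRule returns a single rule matching a whole named ipset;
// membership changes go through the ipset command instead.
func ipsetRule(setName string) []string {
	return []string{"-A", "FORWARD", "-m", "set", "--match-set", setName, "src", "-j", "ACCEPT"}
}

func main() {
	peers := []string{"10.4.0.1", "10.4.0.2", "10.4.0.3"}
	fmt.Printf("%d per-peer rules vs 1 set-based rule: %v\n",
		len(perPeerRules(peers)), ipsetRule("kilo-peers"))
}
```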

cc @leonnicolas what do you think about hacking together on 2 or 3?

SerialVelocity commented 3 years ago

@squat There is another option: just have blanket rules for the node subnet and the cluster subnet. That way you'd only have two rules.

squat commented 3 years ago

> There is another option: just have blanket rules for the node subnet and the cluster subnet. That way you'd only have two rules.

This is tricky because we would also need to have rules for the private IPs of each node, which would not necessarily fall into a single CIDR and thus couldn't be captured by a single rule. It also wouldn't address the issue for the IPs of any additional VPN peers.

I prototyped 3 and found I got a ~75% reduction in CPU usage for a small 3-node cluster. I'll try to clean up the work and post a PR ASAP. @leonnicolas and I can hopefully investigate the ipset solution soon :)

squat commented 3 years ago

btw, if you'd like to try out the test build I made, you can try running this image: squat/kilo:13a25891de81a4ae8a0079285799fce492c06ef4

SerialVelocity commented 3 years ago

> This is tricky because we would also need to have rules for the private IPs of each node, which would not necessarily fall into a single CIDR and thus couldn't be captured by a single rule. It also wouldn't address the issue for the IPs of any additional VPN peers.

Oh, right. I have all of my private IPs forced to random IPs for #15 (as I'm not sure the PR fixes my issue)

> btw, if you'd like to try out the test build I made, you can try running this image: squat/kilo:13a25891de81a4ae8a0079285799fce492c06ef4

Is there a branch for this? I can rebase my custom code on top to test it (since I use my automatic peer code from https://github.com/SerialVelocity/kilo/tree/kg-peer)

squat commented 3 years ago

I'll try to post tonight 👍 Btw is there any chance you could open a PR to merge your branch into upstream and let the kg agent set up non-node peers? I think it could be pretty cool to have upstream 🎉

SerialVelocity commented 3 years ago

I tried your build on my non-node peers, but the number of debug lines printed caused it to take up CPU, as I have a slow HDD.

Unfortunately, I don't have the time to clean it up and submit it. It is really hacky right now.

SerialVelocity commented 3 years ago

Looks like 1 is partially done (30s is probably high enough?) and 3 is done.

Do we still want this open for 2?

squat commented 3 years ago

I would like to avoid adding extra flags for every possible re-sync period. Maybe we can have one global flag that sets the re-sync period for all of Kilo's control loops? And you're right, 2 should still get done.

SerialVelocity commented 3 years ago

What are all the control loops that exist?

squat commented 3 years ago

There are three separate controllers: the main Kilo controller, the iptables controller, and the iproute controller.

The next controller that is coming will be an ipset controller. This and the iproute controller don't really need to check periodically since they can use netlink to watch for updates.

Of all of these controllers, only the iptables and main Kilo controller have an extra timer to force a reconciliation check periodically in addition to when a change is made by us.

SerialVelocity commented 3 years ago

It might be worth adding a "--disable-reconciliation" flag instead that disables the loops and the netlink watches. When Kilo runs alongside something like kube-router, reconciliation causes thrashing, as kube-router already seems to recreate the ipsets/iptables constantly (I switched to Cilium a couple of weeks ago for that reason).

squat commented 3 years ago

hmm interesting, I don't understand why we have thrashing. Kilo only recreates resources when its own rules change, not when any other process's resources change. Do you have any more insight into this? It would be really helpful to fix this correctly while still maintaining reconciliation.

SerialVelocity commented 3 years ago

> It would be really helpful to fix this correctly while still maintaining reconciliation.

If everything is set up correctly, reconciliation does nothing, right? So having a way to disable the extra logic is probably a nice-to-have feature.

> I don't understand why we have thrashing. Kilo only recreates resources when its own rules change, not when any other process's resources change.

Sorry, by thrashing I mean Kilo would be doing extra unnecessary checks whenever there are changes to the resources, not that Kilo would be updating lots of things. The way kube-router seems to work is that it creates random suffixes for each iptables chain and ipset name, so whenever it updates its rules, there are a lot of changes to the ipsets/iptables. I'm not sure whether that is a bug or just how it was written, though.

Since the iptables check only runs every X seconds, it should be fine. I'm not sure how netlink watches work, though. Can you watch just the Kilo ipsets, or do you watch everything and then reconcile? Same for IP routes: can you watch only the Kilo routes?
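I can't say how Kilo's netlink watches are actually scoped, but in principle route updates can be filtered client-side so that only routes Kilo owns (for example, by routing table ID) trigger a reconcile, ignoring churn from other processes like kube-router. A toy Go sketch of that filtering, with made-up types and a hypothetical table ID:

```go
package main

import "fmt"

// routeUpdate is a stand-in for a netlink route update message.
type routeUpdate struct {
	table int    // routing table the route lives in
	dst   string // destination CIDR
}

// kiloTable is a hypothetical table ID Kilo would install routes into.
const kiloTable = 42

// relevant drops updates for routes outside Kilo's own table, so a
// watch over all netlink route messages only triggers reconciliation
// for Kilo-owned routes.
func relevant(updates []routeUpdate) []routeUpdate {
	var kept []routeUpdate
	for _, u := range updates {
		if u.table == kiloTable {
			kept = append(kept, u)
		}
	}
	return kept
}

func main() {
	updates := []routeUpdate{
		{table: 254, dst: "10.244.0.0/24"}, // main table: some other process's route
		{table: kiloTable, dst: "10.4.0.2/32"},
	}
	fmt.Println("reconcile-worthy updates:", len(relevant(updates)))
}
```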