Open SerialVelocity opened 3 years ago
These are all great ideas for optimizations @SerialVelocity.
1 is certainly trivial to implement, however it will continue to produce very spiky behavior, albeit less often. 3 would be the next easiest to implement, I suspect. Ideally, combining 3 with ipsets would be best for performance. Maybe future work could even include migrating entirely to nftables and thus avoiding having to exec iptables at all, thanks to support for netlink updates.
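One way to cut down on exec'd processes without going all the way to nftables is to batch all rules into a single `iptables-restore` invocation instead of one `iptables` exec per rule. The sketch below is purely illustrative (the `rule` struct and chain names are hypothetical, not Kilo's actual types); it only builds the restore payload:

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a hypothetical, minimal representation of one iptables rule.
type rule struct {
	chain string
	spec  string
}

// restorePayload renders all rules into a single iptables-restore payload,
// so one exec (and one xtables lock acquisition) replaces one exec per rule.
func restorePayload(table string, rules []rule) string {
	var b strings.Builder
	fmt.Fprintf(&b, "*%s\n", table)
	for _, r := range rules {
		fmt.Fprintf(&b, "-A %s %s\n", r.chain, r.spec)
	}
	b.WriteString("COMMIT\n")
	return b.String()
}

func main() {
	rules := []rule{
		{chain: "KILO-NAT", spec: "-s 10.4.0.0/16 -j RETURN"},
		{chain: "KILO-NAT", spec: "-j MASQUERADE"},
	}
	// The payload would be piped to `iptables-restore --noflush`.
	fmt.Print(restorePayload("nat", rules))
}
```

Besides fewer forks, this also makes the update atomic from the kernel's point of view.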
xref https://github.com/kubernetes/kubernetes/issues/45385 for related conversation
cc @leonnicolas what do you think about hacking together on 2 or 3?
@squat There is another option which is to just have blanket rules for the node subnet, and the cluster subnet. That way you only have two rules?
> There is another option which is to just have blanket rules for the node subnet, and the cluster subnet. That way you only have two rules?
This is tricky because we would also need to have rules for the private IPs of each node, which would not necessarily fall into a single CIDR and thus couldn't be captured by a single rule. It also wouldn't address the issue for the IPs of any additional VPN peers.
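To illustrate why a single blanket rule falls short: node private IPs assigned by different providers generally don't share one CIDR. A small stdlib check (the example IPs are hypothetical) makes the point:

```go
package main

import (
	"fmt"
	"net"
)

// coveredBySingleCIDR reports whether every IP falls inside the given CIDR.
func coveredBySingleCIDR(cidr string, ips []string) bool {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return false
	}
	for _, s := range ips {
		if ip := net.ParseIP(s); ip == nil || !network.Contains(ip) {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical node private IPs from two different cloud providers.
	ips := []string{"10.128.0.2", "172.31.4.7"}
	fmt.Println(coveredBySingleCIDR("10.128.0.0/16", ips)) // false: 172.31.4.7 is outside
}
```

So per-node rules (or an ipset holding all node IPs) are still needed for the private IPs and for additional VPN peers.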
I prototyped 3 and found I got a ~75% reduction in CPU usage for a small 3-node cluster. I'll try to clean up the work and post a PR ASAP. @leonnicolas and I can hopefully investigate the ipset solution soon :)
btw, if you'd like to try out the test build I made, you can try running this image: squat/kilo:13a25891de81a4ae8a0079285799fce492c06ef4
> This is tricky because we would also need to have rules for the private IPs of each node, which would not necessarily fall into a single CIDR and thus couldn't be captured by a single rule. It also wouldn't address the issue for the IPs of any additional VPN peers.
Oh, right. I have all of my private IPs forced to random IPs for #15 (as I'm not sure the PR fixes my issue)
> btw, if you'd like to try out the test build I made, you can try running this image: squat/kilo:13a25891de81a4ae8a0079285799fce492c06ef4
Is there a branch for this? I can rebase my custom code on top to test it (since I use my automatic peer code from https://github.com/SerialVelocity/kilo/tree/kg-peer)
I'll try to post tonight 👍 Btw is there any chance you could open a PR to merge your branch into upstream and let the kg agent set up non-node peers? I think it could be pretty cool to have upstream 🎉
I tried your build on my non-node peers but the amount of debug lines printed caused it to take up CPU as I have a slow HDD.
Unfortunately, I don't have the time to clean it up and submit it. It is really hacky right now.
Looks like 1 is partially done (30s is probably high enough?) and 3 is done.
Do we still want this open for 2?
I would like to avoid adding extra flags for every possible re-sync period. Maybe we can have one global flag that sets the re-sync periods for all of Kilo's control loops? And you're right, 2 should still get done
What are all the control loops that exist?
There are three separate controllers: the main Kilo controller, the iptables controller, and the iproute controller.
The next controller that is coming will be an ipset controller. This and the iproute controller don't really need to check periodically since they can use netlink to watch for updates.
Of all of these controllers, only the iptables and main Kilo controller have an extra timer to force a reconciliation check periodically in addition to when a change is made by us.
It might be worth adding a "--disable-reconciliation" flag instead that will disable the loops and the netlink watches. When used with something like kube-router, it will cause thrashing as kube-router already seems to recreate the ipsets/iptables constantly (I switched to cilium a couple of weeks ago for that reason)
hmm interesting, I don't understand why we have thrashing. Kilo only recreates resources when it's own rules change, not when any other process's resources change. Do you have any more insight into this? It would be really helpful to fix this correctly while still maintaining reconciliation.
> It would be really helpful to fix this correctly while still maintaining reconciliation.
If everything is set up correctly, reconciliation does nothing, right? So having a way to disable the extra logic is probably a nice to have feature.
> I don't understand why we have thrashing. Kilo only recreates resources when it's own rules change, not when any other process's resources change
Sorry, by thrashing I mean Kilo is doing extra unnecessary checks whenever there are changes to the resources, not that Kilo will be updating lots of things. The way kube-router seems to work is that it creates random suffixes for each iptables chain and ipset name. This means that whenever it updates its rules, there are a lot of changes to the ipsets/iptables. I'm not sure whether this is a bug or just how it was written, though.

Since the iptables check runs only every X seconds, it should be fine. I'm not sure how netlink watches work, though. Can you watch just the Kilo ipsets, or do you watch everything and then reconcile? Same for ip routes: can you watch only the Kilo routes?
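Even if the netlink watch has to receive every route update, the controller could at least filter out updates that don't touch Kilo's own routing table before reconciling. The sketch below models this with plain structs; in a real implementation the update type would come from a netlink library, and the table number here is purely hypothetical:

```go
package main

import "fmt"

// routeUpdate models a netlink route update; a real implementation would use
// a netlink library's update type instead.
type routeUpdate struct {
	table int
	dst   string
}

// kiloTable is a hypothetical routing table number used for Kilo's routes.
const kiloTable = 200

// relevant keeps only updates touching Kilo's routing table, so a watch on
// all routes doesn't force a reconcile for every kube-router change.
func relevant(updates []routeUpdate) []routeUpdate {
	var out []routeUpdate
	for _, u := range updates {
		if u.table == kiloTable {
			out = append(out, u)
		}
	}
	return out
}

func main() {
	updates := []routeUpdate{
		{table: kiloTable, dst: "10.4.0.0/16"},
		{table: 100, dst: "192.168.0.0/24"}, // some other process's route
	}
	fmt.Println(len(relevant(updates))) // 1: only Kilo's route triggers work
}
```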
Kilo currently spawns about 50 processes every 30s on my machine (this is a small cluster). That seems to take considerable CPU resources (it spawns the most processes out of anything on my machine), especially since it needs to take the xtables lock, which causes large slowdowns due to conflicts with kube-router.
It seems like a few things should be done to speed this up and remove the overhead: