kopeio / networking

Networking for kubernetes

What is the default transport mode? #24

Open activeshadow opened 6 years ago

activeshadow commented 6 years ago

I built a cluster using kops in AWS across multiple availability zones with the topology option set to private. I'm 99% sure I set the networking option to kopeio-vxlan, but my cluster config yaml just states kopeio. Is the vxlan transport mode the default in this case? If not, can I switch to the vxlan transport mode without rebuilding the cluster?

I only ask because I've been seeing a ton of pod liveness/readiness probe issues, and I also "feel" like networking between my services running in the cluster is not optimal. I say this because I'm seeing a lot of timeouts between services, but my custom service-level metrics are not showing signs of the actual service code running slow.

justinsb commented 6 years ago

Hi - thanks for trying kopeio-vxlan, and sorry for the issues. I think you're hitting https://github.com/kopeio/networking/issues/10 . I'm working on a workaround for that; in fact, I just pushed a revised version of the image.

You can try changing the image to kopeio/networking-agent:1.0.20171218 by running kubectl edit ds -n kube-system kopeio-networking-agent (or just kubectl set image -n kube-system ds kopeio-networking-agent "*=kopeio/networking-agent:1.0.20171218"), followed by kubectl delete pod -n kube-system -l name=kopeio-networking-agent to bounce the pods.

The revised image changes the configuration so that it preconfigures routes for each node, rather than doing a per-pod router. This has a few advantages: it's much simpler, it's much more similar to other options (like GRE), it means that kopeio-networking isn't needed in the packet path at all (so even if we're crash-looping you should be OK), and it's a little lighter on system resources because of that.
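The image swap described above can be sketched as a small shell function. The kubectl arguments are the ones quoted in this comment; the function name update_kopeio_agent and the KUBECTL and IMAGE variables are assumptions made for illustration, so treat this as a sketch rather than an official procedure.

```shell
#!/usr/bin/env bash
# Sketch only: the function and variable names are hypothetical;
# the kubectl arguments are taken from the comment above.

KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to print the commands instead of running them
IMAGE="kopeio/networking-agent:1.0.20171218"

update_kopeio_agent() {
  # Point every container in the daemonset at the revised image.
  "$KUBECTL" set image -n kube-system ds kopeio-networking-agent "*=${IMAGE}"
  # Delete the agent pods so the daemonset recreates them with the new image.
  "$KUBECTL" delete pod -n kube-system -l name=kopeio-networking-agent
}
```

Calling update_kopeio_agent with KUBECTL=echo first prints the two commands without touching the cluster, which is a cheap way to check them before running for real.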

I'm completing validation of this and we'll be updating the manifest soon. If this fixes things, then you know for sure that this issue was the cause. I'd also expect you to see some log messages similar to those in https://github.com/kopeio/networking/issues/10 .

And then in answer to your specific question:

We call it kopeio-vxlan because the same kopeio code supports other transports, and I'm thinking they should be things like kopeio-ipsec, kopeio-layer2, etc., rather than introducing another flag!

Again, sorry about the problems, and if you do try it, do let me know how it goes. If it doesn't fix the issues you're seeing, please include any odd log messages from any of the networking pods, along with any more details on the networking issues you're seeing.
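One possible way to gather that information is sketched below, reusing the daemonset name and pod label from the commands earlier in this comment. The function name collect_kopeio_logs, the KUBECTL variable, and the choice of 200 log lines are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only: the function name, KUBECTL variable, and --tail value are hypothetical.

KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to print the commands instead of running them

collect_kopeio_logs() {
  # Show which image the daemonset is currently configured to run.
  "$KUBECTL" get ds -n kube-system kopeio-networking-agent \
    -o jsonpath='{.spec.template.spec.containers[*].image}'
  # Print recent log lines from the pods matching the agent label.
  "$KUBECTL" logs -n kube-system -l name=kopeio-networking-agent --tail=200
}
```

Redirecting the output to a file (for example, collect_kopeio_logs > kopeio-agent.log) gives something that can be attached to the issue.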

(Edit: fixed image name)