Open activeshadow opened 6 years ago
Hi - thanks for trying kopeio-vxlan, and sorry for the issues. I think you're hitting https://github.com/kopeio/networking/issues/10. I'm working on the workaround for that - in fact I just pushed a revised version of the image. You can try changing the image to `kopeio/networking-agent:1.0.20171218` by running `kubectl edit ds -n kube-system kopeio-networking-agent` (or just `kubectl set image -n kube-system ds kopeio-networking-agent "*=kopeio/networking-agent:1.0.20171218"`), followed by `kubectl delete pod -n kube-system -l name=kopeio-networking-agent` to bounce the pods.
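For convenience, the same steps as a copy-paste sequence (this assumes the default kube-system namespace and the DaemonSet/label names given above):

```sh
# Point the DaemonSet at the revised image
kubectl set image -n kube-system ds kopeio-networking-agent "*=kopeio/networking-agent:1.0.20171218"

# Bounce the agent pods so they pick up the new image
kubectl delete pod -n kube-system -l name=kopeio-networking-agent
```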
This changes the configuration so that routes are preconfigured for each node, rather than set up per pod. This has a few advantages: it's much simpler, it's much more similar to other options (like GRE), it means that kopeio-networking isn't needed in the packet path at all (so even if we're crash-looping you should be OK), and it's a little lighter on system resources because of that.
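If you want to confirm the per-node routes are in place, you can look at the routing table on a node; the exact entries depend on your pod CIDRs and interfaces, so this is just a way to inspect, not specific output to expect:

```sh
# On a node, list kernel routes and look for one entry per node's pod CIDR
# (the exact CIDRs and device names depend on your cluster configuration)
ip route show
```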
I'm completing validation of this and we'll be updating the manifest soon, but if it fixes things then we'll know for sure that this is what you were hitting. I'd also expect you to see some log messages similar to the ones in https://github.com/kopeio/networking/issues/10.
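To check for those messages, pulling the agent logs via the same label used above should work:

```sh
# Dump logs from the kopeio networking agent pods and look for messages
# like the ones described in kopeio/networking#10
kubectl logs -n kube-system -l name=kopeio-networking-agent
```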
And then in answer to your specific question:
We call it kopeio-vxlan because the same kopeio code supports other transports, and I'm thinking those should be exposed as things like kopeio-ipsec, kopeio-layer2, etc., rather than introducing another flag!
Again, sorry about the problems, and if you do try it, let me know how it goes. If it doesn't fix things, please include any odd log messages from the networking pods, along with any more details on the networking issues you're seeing.
(Edit: fixed image name)
I built a cluster using kops in AWS across multiple availability zones with the topology option set to private. I'm 99% sure I set the networking option to `kopeio-vxlan`, but my cluster config yaml just states `kopeio`. Is the vxlan transport mode the default in this case? If not, can I switch to the vxlan transport mode without rebuilding the cluster?

I only ask because I've been seeing a ton of pod liveness/readiness probe issues, and I also "feel" like networking between my services running in the cluster is not optimal. I say this because I'm seeing a lot of timeouts between services, but my custom service-level metrics are not showing signs of the actual service code running slow.
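For reference, the networking section of my cluster spec looks roughly like this (paraphrased from memory, so treat the exact layout as an approximation):

```yaml
spec:
  networking:
    kopeio: {}
```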