kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Calico policies not working - remove --masquerade-all from kube-proxy #1012

Closed m-ferrero closed 7 years ago

m-ferrero commented 7 years ago

BUG REPORT

Kargo version (commit) (git rev-parse --short HEAD): 2f88c9e

Network plugin used: calico

with

```
kube_network_plugin: calico
enable_network_policy: true
```

with a modified kargo using:

- kubernetes: 1.5.2
- calico and calico cni: 1.0.2
- calico-policy-controller: 0.5.2

calico policies (tested with the stars demo: http://docs.projectcalico.org/v2.0/getting-started/kubernetes/tutorials/stars-policy/) do not work correctly:

Before isolation is enabled, the management UI sees all other pods and shows them as connected. After isolation is enabled and policies are enforced, the management UI sees only pods on the same node. The Calico team helped me (thanks Shaun Crampton) track the problem down to kube-proxy's use of --masquerade-all.

By removing that flag from the kube-proxy command line (just to be sure, I deleted all stars demo pods, restarted the cluster, and recreated the demo) the problem is solved. Outgoing NAT from the cluster is still done, since the Calico IP pool has NAT enabled.

I see that in roles/kubernetes/node/defaults/main.yml there is a kube_proxy_masquerade_all: true that drives the proxy manifest to include --masquerade-all.

Is it possible to override kube_proxy_masquerade_all: false in another file, like inventory/group_vars/all.yml?
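For reference, such an override would be an ordinary Ansible group variable; a sketch (the exact inventory layout depends on your setup):

```yaml
# inventory/group_vars/all.yml
# Overrides the role default from roles/kubernetes/node/defaults/main.yml,
# since inventory group_vars take precedence over role defaults.
kube_proxy_masquerade_all: false
```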

Thanks, best regards,
Massimiliano Ferrero

mattymo commented 7 years ago

That is just a config mistake. We need masquerade for foreign traffic, not for all traffic. Adding "--cluster-cidr=10.233.64.0/18" to the config fixed the issue completely. I'll take care of this.
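The difference between the two flags can be sketched as a decision on the packet's source address (illustrative only; function names here are not from kube-proxy itself, and the CIDR is Kargo's default pod network):

```python
# Sketch of the masquerade decision that kube-proxy's iptables rules encode
# for traffic headed to a service IP.
import ipaddress

CLUSTER_CIDR = ipaddress.ip_network("10.233.64.0/18")  # Kargo's default pod network

def needs_masquerade(src_ip: str, masquerade_all: bool) -> bool:
    """Decide whether service traffic from src_ip gets SNATted."""
    if masquerade_all:
        return True  # --masquerade-all: SNAT everything, pods lose their source IP
    # With --cluster-cidr set instead, only traffic originating outside the
    # pod network is masqueraded; pod-to-pod traffic keeps its pod source IP.
    return ipaddress.ip_address(src_ip) not in CLUSTER_CIDR

# Pod-to-service traffic keeps its pod source IP with --cluster-cidr:
print(needs_masquerade("10.233.64.5", masquerade_all=False))   # False
# ...but is SNATted with --masquerade-all, hiding the pod IP from policy:
print(needs_masquerade("10.233.64.5", masquerade_all=True))    # True
# Foreign (non-pod) traffic is still masqueraded either way:
print(needs_masquerade("192.168.1.10", masquerade_all=False))  # True
```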

fasaxc commented 7 years ago

FYI, Calico's --nat-outgoing does NAT for Calico-networked containers only for traffic that is leaving the Calico pool, so for Calico-networked pods that's all you need for external connectivity.
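In the Calico v2.0-era calicoctl resource format, the pool with outgoing NAT enabled would look roughly like this (a sketch; the CIDR shown is Kargo's default pod network, adjust to your own):

```yaml
# calicoctl v1-style ipPool resource with outgoing NAT enabled.
# NAT is applied only to traffic leaving this pool's CIDR.
apiVersion: v1
kind: ipPool
metadata:
  cidr: 10.233.64.0/18
spec:
  nat-outgoing: true
```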

It's important that traffic between Calico pods is not SNATted, even if it's going via a service IP, or the destination pod's policy will see the wrong source IP.

m-ferrero commented 7 years ago

Hello @mattymo, I see that in the patch you set kube_proxy_masquerade_all to false by default. I encountered the problem only when using Calico with network policies.

If it's enabled, Calico will do SNAT; if not, won't disabling --masquerade-all in kube-proxy cause problems for the cluster? Or does --cluster-cidr=10.233.64.0/18 cause SNAT to be done for packets leaving the cluster anyway?

thanks

adidenko commented 7 years ago

Hi @fasaxc, yes, kargo enables --nat-outgoing by default via the nat_outgoing config option. We enabled --masquerade-all to fix issue #524, so it's not about outgoing NAT, it's about cluster IPs. We also tested network policies after that and everything worked fine. We need to test Matthew's patch on the setup from issue #524 to make sure we don't introduce a regression.

fasaxc commented 7 years ago

@adidenko I'm a bit confused by #524:

Why does the packet get routed out of eth1 when your Calico network is on eth0? If the pod IP is a Calico IP, you should have a route for that IP via eth0, which should be more specific than the default route on eth1. Calico's NAT-outgoing feature is implemented using the iptables masquerade rule, so the kernel picks the IP of the interface it's about to send the packet out of.
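The routing point above is just longest-prefix match; a sketch (interface names and CIDRs are illustrative, with the /18 standing in for the Calico pod network):

```python
# Longest-prefix-match sketch: the /18 Calico route on eth0 beats the
# default route on eth1 for pod IPs.
import ipaddress

ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth1"),       # default route
    (ipaddress.ip_network("10.233.64.0/18"), "eth0"),  # Calico pod network
]

def pick_interface(dst: str) -> str:
    dst_ip = ipaddress.ip_address(dst)
    matches = [(net, dev) for net, dev in ROUTES if dst_ip in net]
    # The kernel chooses the most specific (longest-prefix) matching route.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(pick_interface("10.233.70.9"))  # eth0 -- pod IP takes the Calico route
print(pick_interface("8.8.8.8"))      # eth1 -- everything else takes the default
```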

adidenko commented 7 years ago

@fasaxc not exactly (there was a typo in the issue description, sorry about that; I've fixed the description). Here's how it goes:

adidenko commented 7 years ago

Tested on a similar virtual env with the #1013 patch. No NAT for internal traffic, pod IPs in src/dst; everything looks good now.