kontena / pharos-cluster

Pharos - The Kubernetes Distribution
https://k8spharos.dev/
Apache License 2.0

Weave: NetPol issue with namespaceSelector across different hosts #1302

Closed tipruzs closed 5 years ago

tipruzs commented 5 years ago

What happened:

On a fresh cluster with Weave networking enabled (default pod/service CIDRs) I created two namespaces (A, B) and a simple network policy that allows all pods from namespace B (selected via namespaceSelector) to connect to a pod in namespace A.

In the end, the connection requests from namespace B are dropped by weave-npc.

What you expected to happen:

The connection requests from namespace B should be successful

How to reproduce it (as minimally and precisely as possible):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: monitoring
  namespace: namespace-A
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: application
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          app.kubernetes.io/name: monitoring
    ports:
    - protocol: TCP
      port: 9273
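
A minimal way to exercise the policy from the command line looks roughly like the sketch below; the monitoring namespace name, test pod and image are placeholders, and the target application on TCP 9273 is assumed to already run in namespace-A:

# The namespaceSelector above matches on this label, so the client namespace needs it
kubectl create namespace monitoring-ns
kubectl label namespace monitoring-ns app.kubernetes.io/name=monitoring

# Find the target pod IP in namespace-A and try to connect from the other namespace
TARGET_IP=$(kubectl -n namespace-A get pod -l app.kubernetes.io/name=application \
  -o jsonpath='{.items[0].status.podIP}')
kubectl -n monitoring-ns run netpol-test --rm -it --image=busybox --restart=Never -- \
  nc -zv -w 3 "$TARGET_IP" 9273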

Anything else we need to know?:

According to the Kubernetes docs there should be no SNAT for ClusterIP services. However, the weave-npc log shows the blocked requests coming from the Weave network interface instead of the source pod's address: WARN: 2019/04/22 10:10:57.664451 TCP connection from 10.46.0.0:32812 to 10.40.0.3:9273 blocked by Weave NPC
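
To see which source address the policy controller is rejecting, the weave-npc container logs can be checked directly (label and container name below are the Weave Net defaults and may differ in other deployments):

# weave-npc runs as a container in the weave-net DaemonSet in kube-system
kubectl -n kube-system logs -l name=weave-net -c weave-npc --tail=50 | grep -i blocked

The 10.46.0.0 address in the blocked entry above is a node-level Weave address rather than the client pod's IP, which is what suggests the traffic is being SNATed on its way between hosts.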

As far as I can see, there are two possible issues:

Environment:

cluster.yml:

hosts:
  - address: 1.2.3.4
    role: master
  - address: 1.2.3.5
    role: worker
    labels:
      node-pool.kubernetes.io: app1
  - address: 1.2.3.6
    role: worker
    labels:
      node-pool.kubernetes.io: app2
name: pharos-cluster
network:
  provider: weave
  dns_replicas: 1
  node_local_dns_cache: true
  service_cidr: 10.96.0.0/12
  pod_network_cidr: 10.32.0.0/12
kubelet:
  feature_gates:
    CSINodeInfo: true
    CSIDriverRegistry: true
control_plane:
  feature_gates:
    CSINodeInfo: true
    CSIDriverRegistry: true
tipruzs commented 5 years ago

The issue only seems to occur if the connection attempt is made from a pod on a different host than the one the target pod runs on.
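
One way to confirm the host dependence is to check where the target pod runs and pin the test pod to a chosen node; node name, namespace and pod names below are placeholders:

# See which node the target pod is scheduled on
kubectl -n namespace-A get pod -l app.kubernetes.io/name=application -o wide

# Run the test pod on a chosen node and retry the connection
# (<node-name> and <target-pod-ip> come from the output above)
kubectl -n monitoring-ns run netpol-test --rm -it --image=busybox --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"<node-name>"}}' -- \
  nc -zv -w 3 <target-pod-ip> 9273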

jnummelin commented 5 years ago

Is the service you are using a plain ClusterIP type?

You are correct, there should not be any SNAT in place, but the behaviour and logs now tell otherwise. 🤔

Check kube-proxy pods and config so that there's no weird masquerade-all type of setting.

$ kubectl -n kube-system get configmaps kube-proxy -o yaml

should have:

iptables:
  masqueradeAll: false
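
A quick way to double-check both the ConfigMap setting and what kube-proxy actually programs on a node (KUBE-POSTROUTING is the standard kube-proxy chain name):

# The masquerade-related settings in the ConfigMap
kubectl -n kube-system get configmaps kube-proxy -o yaml | grep -i masquerade

# On a node: with masqueradeAll=false, KUBE-POSTROUTING should only masquerade
# packets carrying kube-proxy's fwmark, not all pod traffic
sudo iptables -t nat -S KUBE-POSTROUTING
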
tipruzs commented 5 years ago

Yes, the service is a plain ClusterIP. I already checked the kube-proxy settings and followed the Weave troubleshooting guide, but could not find any configuration mistake.

In the meantime I created a new lab cluster environment without addons but with the same Pharos config. In this lab I'm not able to reproduce the issue. The only difference between the two environments is that the lab is a plain 2.3.6 install, while prod (where the issue occurs) is an upgraded setup.

Could there be stale iptables rules left over from an older release/component? I saw a few hints that you made changes to the firewalld NAT rules, but it's not really clear to me what this actually affects.

jnummelin commented 5 years ago

Yes, we've been making some changes to the firewall NAT rules, but those should not affect you at all since you do not have a Pharos-managed firewall setup in cluster.yml.

If you could identify the SNAT rule in iptables, it might tell where it came from; at least the rules coming from kube are usually commented.

Pharos itself does not directly configure any iptables, other than FW. Kube components, especially proxy, of course do.
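
For example, something along these lines on the affected worker should show each NAT rule together with any comment tag (plain iptables commands, nothing Pharos-specific):

# kube-proxy/weave rules usually carry "-m comment --comment ...", firewalld rules do not
sudo iptables-save -t nat | grep -iE 'masquerade|snat'

# Show the POSTROUTING chain with rule numbers and packet counters
sudo iptables -t nat -L POSTROUTING -n -v --line-numbers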

prod (where the issue occurs) is an upgraded setup.

Could you share the upgrade path you've used? I mean where you started at and what versions have been used in between.

tipruzs commented 5 years ago

You're right, the cluster.yml I provided does not include the firewall part; perhaps these lines got lost during copy/paste.

  firewalld:
    enabled: true

I created the cluster based on release 2.3.3 and applied every minor update (2.3.3 -> 2.3.4 -> 2.3.5 -> 2.3.6).

I'll try to find the corresponding iptables lines using a diff between these two environments.
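
Roughly along these lines (file names are arbitrary):

# On a worker node in each environment, dump the current ruleset
sudo iptables-save > /tmp/iptables-prod.rules   # on the prod worker
sudo iptables-save > /tmp/iptables-lab.rules    # on the lab worker

# Compare on one machine, dropping the timestamp/counter comment lines
diff <(grep -v '^#' iptables-prod.rules) <(grep -v '^#' iptables-lab.rules)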

tipruzs commented 5 years ago

I checked the iptables dump of a worker node in both environments and found a few differences. Here are the lines which I could not find in the lab environment.

*nat
-A POST_public_allow ! -o lo -j MASQUERADE
*filter
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A DOCKER-USER -j RETURN
*filter
-A FWDO_public_allow -m conntrack --ctstate NEW -j ACCEPT

Do you have any idea where these lines come from and whether this could be the reason for my issue?

jnummelin commented 5 years ago

-A POST_public_allow ! -o lo -j MASQUERADE

That's definitely from the firewalld rules, and it's the one that causes the issue. I'm not yet 100% sure, but I think there's a bug in how 2.3.6 tries to remove masquerade from firewalld. As you saw, on a fresh 2.3.6 install masquerade is never configured, so it works as expected.

While waiting for a fix, I was able to make the source IPs work as expected by running:

pharos ssh -c cluster.yml --tf-json tf.json "firewall-cmd --remove-masquerade --permanent && firewall-cmd --reload"

That forces "unloading" of the firewalld masquerade.
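
To double-check that the workaround took effect, the same pharos ssh invocation can be reused with plain firewalld/iptables queries:

# Masquerade should now be off in both the runtime and the permanent config
pharos ssh -c cluster.yml --tf-json tf.json "firewall-cmd --query-masquerade; firewall-cmd --permanent --query-masquerade"

# And the offending rule should be gone from the NAT table
pharos ssh -c cluster.yml --tf-json tf.json "iptables -t nat -S POST_public_allow"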

Note: reloading firewalld is known to sometimes break existing connections. Usually Pharos reloads firewalld via the weave/calico daemonsets, so reloading it manually might have some side effects on that front too.

tipruzs commented 5 years ago

Thanks for investigating. I can confirm that the source IP is back as soon as this iptables rule is removed. The other iptables rules (DOCKER-USER) come from the Docker engine. They should not be there, but this is already tracked as a bug here: https://github.com/moby/moby/issues/35777