ovn-org / ovn-kubernetes

NetworkPolicy on okd 4.5 not working properly #1635

Closed: mikonse closed this 1 month ago

mikonse commented 4 years ago

Hi there, when migrating our OKD 4.5 cluster from the OpenShift SDN to ovn-kubernetes we are running into a couple of problems. One of them is that NetworkPolicies no longer work properly.

Namespaces without any policies allow all egress and ingress traffic (as expected), and when adding a default policy to allow all ingress traffic, no traffic is blocked either.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-all
spec:
  podSelector: {}
  ingress:
    - {}
  policyTypes:
    - Ingress

When, however, adding a more specific policy, e.g. one that restricts ingress traffic to pods of the same namespace, all ingress traffic is blocked, both within the namespace and from outside. Here is the same-namespace policy:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-same-namespace
spec:
  podSelector: {}
  ingress:
    - from:
        - podSelector: {}
  policyTypes:
    - Ingress

So far the only thing out of the ordinary I could find in the OVS and OVN logs is this warning:

2020-08-25T11:20:47.424Z|00418|dpif(handler7)|WARN|system@ovs-system: execute ct(commit,zone=5,nat(dst=100.122.5.7:8082)),recirc(0x1209e8) failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=2a:ea:86:b8:88:40,dl_dst=0a:58:64:7a:03:01,nw_src=100.122.3.2,nw_dst=100.123.218.211,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=53898,tp_dst=8082,tcp_flags=syn tcp_csum:3d03 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x5),ct_tuple4(src=100.122.3.2,dst=100.123.218.211,proto=6,tp_src=53898,tp_dst=8082),in_port(3) mtu 0

in one of the Open vSwitch DaemonSet agents. For reference, the destination IP 100.123.218.211 is that of a Service in the namespace the policies are deployed in, 100.122.5.7 is that of the pod the Service points to, and 100.122.3.2 is that of the ovn-k8s-mp0 interface on the host node.
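
For reference, mapping those addresses back to objects looked roughly like this (the namespace and pod names below are placeholders, not taken verbatim from the cluster):

# Service and pod IPs in the affected namespace
oc get svc,pods -o wide -n mynamespace

# On the node hosting the pod: the ovn-k8s-mp0 management-port address
ip addr show ovn-k8s-mp0

# The conntrack warning above comes from ovs-vswitchd, so grep the logs of
# the OVS DaemonSet pod running on that node (pod name is a placeholder)
oc logs -n openshift-ovn-kubernetes <ovs-daemonset-pod> | grep WARN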

Just for completeness: none of the above issues arise when deploying the OKD cluster with the OpenShift SDN network provider instead of ovn-kubernetes.

Did anybody experience something similar, or have a pointer on how to debug this further?

trozet commented 4 years ago

Thanks @mikonse for filing this and for the explanation. We know there are network policy bugs right now; see #1624.

Assigning this to Vic. One of us will get to the bottom of it soon.

trozet commented 4 years ago

@vpickard FYI

trozet commented 4 years ago

@mikonse I just tried to reproduce this on our latest master (4.6) and I'm able to ping between two pods after applying the allow-same-namespace policy. To get some more output, you can log in to your ovnkube-db pod and inspect the ACLs with commands like this:

[root@ovn-control-plane ~]# ovn-nbctl  list port_group
_uuid               : 6a460f08-4a34-4456-aba0-5e5f0dc579d8
acls                : [f9d30604-a4f5-4822-ba67-a191904bb84a]
external_ids        : {name=default_allow-same-namespace}
name                : a17396713385646002071
ports               : [39b1b8a8-5a47-4a78-9aa1-d676a83ae08e, f14b0ede-695d-41e9-a996-f21840b836e3]

Above I can see a port group was created for my two pods in my namespace. If I look at the ACLs on that port group:

[root@ovn-control-plane ~]# ovn-nbctl  acl-list 6a460f08-4a34-4456-aba0-5e5f0dc579d8
  to-lport  1001 (ip4.src == {$a13691450759723478042} && outport == @a17396713385646002071) allow-related

That ACL references an address set to match on the source IP and then allows traffic out to the ports in the port group:

[root@ovn-control-plane ~]# ovn-nbctl  list address_set
_uuid               : 0c1e4e70-4d81-4c55-afbc-3af126c5d7db
addresses           : ["10.244.0.6", "10.244.0.7"]
external_ids        : {name=default.allow-same-namespace.ingress.0_v4}
name                : a13691450759723478042

If all of that looks right in your setup, the next thing would be to start tracing packets in OVS and looking at lflow-list in the OVN southbound database. However, this doesn't look like an issue in 4.6. Is there any way you could try 4.6 and confirm?
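
A rough sketch of what that tracing could look like (the logical switch, port, and address values below are placeholders, not taken from this cluster):

# From the ovnkube-db pod: dump the logical flows generated from the ACLs
ovn-sbctl lflow-list | grep -A2 acl

# Simulate an ingress TCP packet toward the target pod through the logical
# topology (placeholder switch/port names and addresses; port 8082 is the
# destination port from the warning earlier in this thread)
ovn-trace <logical-switch> 'inport == "<namespace_podname>" && eth.src == <src-mac> && eth.dst == <dst-mac> && ip4.src == <client-ip> && ip4.dst == <pod-ip> && ip.ttl == 64 && tcp.dst == 8082'

# Or trace the installed datapath flows on the node itself
ovs-appctl ofproto/trace br-int in_port=<ofport>,tcp,nw_src=<client-ip>,nw_dst=<pod-ip>,tp_dst=8082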

mikonse commented 4 years ago

Ok, so after checking the ACLs for my namespace and its pods I might have found the cause of the problem.

Running ovn-nbctl list port_group lists ACLs for all of the network policies, including the same-namespace one. However, the address sets used by those ACLs only contain the IPs of normal pods and exclude the IPs of privileged pods running on the host network as DaemonSets, i.e. those having node IPs as pod IPs.

This behaviour also breaks our default NetworkPolicy that allows the openshift-ingress namespace to access all namespaces in order to expose services using the default router, as the default router also runs on the nodes' host network.

Is this intended behaviour? At least it is not consistent with how openshift-sdn handles those cases. If so, what is the recommended way of dealing with node-IP-type Services/pods in network policies? Thanks for your help so far!

For reference, the address set for the same-namespace policy is

_uuid               : bb325d38-bc39-4a9e-b73c-afdb8f7a77ac
addresses           : ["100.122.4.12"]
external_ids        : {name=mynamespace.allow-same-namespace.ingress.0}
name                : a9250372746509955056

while there are DaemonSet pods running in the same namespace:


$ oc get pods -o wide -n mynamespace
NAME                                READY   STATUS    RESTARTS   AGE     IP             NODE      NOMINATED NODE   READINESS GATES
skydive-agent-46ktl                 1/1     Running   0          7d19h   10.0.38.74     master2   <none>           <none>
skydive-agent-4hkvh                 1/1     Running   0          7d19h   10.0.38.69     worker0   <none>           <none>
skydive-agent-72vz7                 1/1     Running   0          7d19h   10.0.38.72     worker3   <none>           <none>
skydive-agent-8rb5m                 1/1     Running   0          7d19h   10.0.38.71     master0   <none>           <none>
skydive-agent-g69sb                 1/1     Running   0          7d19h   10.0.38.75     worker1   <none>           <none>
skydive-agent-l5kr6                 1/1     Running   0          7d19h   10.0.38.73     master1   <none>           <none>
skydive-agent-rvbgb                 1/1     Running   0          7d19h   10.0.38.70     worker2   <none>           <none>
skydive-analyzer-857996967b-9ndxs   2/2     Running   0          7d19h   100.122.4.12   worker2   <none>           <none>
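
A quick way to cross-check which of those pods are on the host network is to print the hostNetwork field from the pod spec, e.g. (a sketch, using the placeholder namespace from above):

oc get pods -n mynamespace -o custom-columns=NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork,IP:.status.podIP
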
danwinship commented 4 years ago

Ah. NetworkPolicy does not apply to hostNetwork pods.

(In particular, there is no way to distinguish traffic coming from different hostNetwork pods on the same node, so there is no way you could have different policies applying to different hostNetwork pods on the same node.)

In openshift-sdn, NetworkPolicy behaves as though all ingress traffic from outside the pod network (including traffic from node IPs, and thus including traffic from hostNetwork pods) is in the namespace "default". So whatever ingress rules you have that apply to namespace "default" will apply to hostNetwork pods. But this is just an oddity of openshift-sdn that doesn't apply to any other network plugin.

mikonse commented 4 years ago

That makes a lot of sense, thanks for the clarification.

What would be the recommended way of allowing host-network pods? Explicitly allow the host IPs in a policy, or is there another flag one could set?

eformat commented 4 years ago

Hi, I have been testing out a similar issue on 4.6.0-rc4 with OVNKubernetes (caveat: I know 4.6 is pre-release, as is OVNKubernetes), but using an ingress router shard.

I'm happy to raise another issue or do some more debugging on this use case, although I'm not 100% sure where to look now - ACL debugging in OVN?

oc version
Client Version: 4.6.0
Server Version: 4.6.0
Kubernetes Version: v1.19.0+d59ce34

The 4.6 docs suggest this should work - do the docs hold true for the OVNKubernetes SDN as well?

https://docs.openshift.com/container-platform/4.6/networking/network_policy/about-network-policy.html

"If the Ingress Controller is configured with endpointPublishingStrategy: HostNetwork, then the Ingress Controller Pod runs on the host network. When running on the host network, the traffic from the Ingress Controller is assigned the netid:0 Virtual Network ID (VNID). The netid for the namespace that is associated with the Ingress Operator is different, so the matchLabel in the allow-from-openshift-ingress network policy does not match traffic from the default Ingress Controller. Because the default namespace is assigned the netid:0 VNID, you can allow traffic from the default Ingress Controller by labeling your default namespace with network.openshift.io/policy-group: ingress"

So I have been trying the above, using a router shard in my use case.

My setup is a libvirt/bare-metal UPI install.

$ oc get nodes
NAME   STATUS   ROLES           AGE     VERSION
i1     Ready    infra           5h41m   v1.19.0+d59ce34
i2     Ready    infra           6h8m    v1.19.0+d59ce34
m1     Ready    master,worker   8h      v1.19.0+d59ce34
m2     Ready    master,worker   8h      v1.19.0+d59ce34
m3     Ready    master,worker   8h      v1.19.0+d59ce34
w1     Ready    worker          6h8m    v1.19.0+d59ce34
w2     Ready    worker          6h8m    v1.19.0+d59ce34

My router shard:

cat <<EOF | oc apply -f -
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: red-dmz-shard
    namespace: openshift-ingress-operator
  spec:
    replicas: 1
    domain: apps.red-dmz.eformat.me
    endpointPublishingStrategy:
      type: HostNetwork
    nodePlacement:
      nodeSelector:
        matchLabels:
          network-zone: red
          node-role.kubernetes.io/infra: ""
    routeAdmission:
      wildcardPolicy: WildcardsAllowed
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
EOF

Router Shard Pod on i2:

oc get pods -n openshift-ingress -o wide
NAME                                    READY   STATUS    RESTARTS   AGE    IP               NODE   NOMINATED NODE   READINESS GATES
router-default-5b55b479d9-4rpdg         1/1     Running   0          147m   192.168.140.5    m3     <none>           <none>
router-default-5b55b479d9-vgwpt         1/1     Running   0          148m   192.168.140.3    m1     <none>           <none>
router-red-dmz-shard-84fb76f6d5-grkt5   1/1     Running   0          53m    192.168.140.10   i2     <none>           <none>

Workload test pod on w2:

oc get pods -n welcome -o wide
NAME                       READY   STATUS    RESTARTS   AGE    IP            NODE   NOMINATED NODE   READINESS GATES
welcome-789764679f-97f66   1/1     Running   0          139m   10.131.0.10   w2     <none>           <none>

Both default and openshift-ingress namespaces are labelled with:

oc label namespace default network.openshift.io/policy-group=ingress

NetworkPolicies applied to a test namespace called welcome:

oc apply -n welcome -f - <<'EOF'
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-by-default
spec:
  podSelector:
  ingress: []
EOF

oc apply -n welcome -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          network.openshift.io/policy-group: ingress
  podSelector: {}
  policyTypes:
  - Ingress
EOF

Traffic is blocked - based on the docs and your comments @danwinship, I was expecting this to work.

I tried adding other NetworkPolicies and found that this additional policy (applied alongside the two above) allows the traffic through to the pod via the router shard, which is maybe expected - but I couldn't easily narrow down the CIDR any further:

oc apply -n welcome -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress-red-zone-1
spec:
  ingress:
  - from:
    - ipBlock:
        cidr: 10.0.0.0/8
  podSelector: {}
  policyTypes:
  - Ingress
EOF

Cheers

danwinship commented 4 years ago

As mentioned above, that documentation ("When running on the host network, the traffic from the Ingress Controller is assigned the netid:0 Virtual Network ID (VNID)") is specific to openshift-sdn. There is no way to do this in ovn-kubernetes (or most other plugins) besides using an ipBlock, or using a policy that matches only the ports, not the IPs.
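
For the shard setup above, those two options could look roughly like the following sketches. The ipBlock CIDR is deliberately left as a placeholder: as the conntrack warning earlier in this thread suggests, host-originated traffic can arrive from the node's ovn-k8s-mp0 address on the cluster network rather than from the node's physical IP, which would explain why the broad 10.0.0.0/8 block matched while narrower blocks were hard to pin down. The port in the second, port-only variant is also a placeholder.

oc apply -n welcome -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-shard-node-cidr
spec:
  podSelector: {}
  ingress:
  - from:
    # placeholder: a block covering the source addresses the shard's traffic
    # actually arrives from (possibly the node's cluster-network subnet
    # rather than its physical 192.168.140.x address)
    - ipBlock:
        cidr: <cidr-for-shard-node-traffic>
  policyTypes:
  - Ingress
EOF

oc apply -n welcome -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-welcome-port-from-anywhere
spec:
  podSelector: {}
  ingress:
  # port-only variant: match only the destination port, from any source
  - ports:
    - protocol: TCP
      port: 8080   # placeholder for the welcome pod's container port
  policyTypes:
  - Ingress
EOF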

eformat commented 4 years ago

Thanks @danwinship. OK, I will try to achieve the same with ipBlock and see how I go.

philipp1992 commented 3 years ago

We also ran into this problem and had to change the openshift-ingress endpoint publishing strategy from HostNetwork to NodePortService; now it is working as intended. The OpenShift documentation was not very helpful, as it did not mention that there are things specific to openshift-sdn that do not work under ovn-kubernetes.
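
For reference, a rough sketch of what that change looks like against eformat's shard definition from earlier in the thread; note that endpointPublishingStrategy is normally fixed when the IngressController is created, so in practice this usually means recreating the shard rather than patching it in place:

cat <<EOF | oc apply -f -
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: red-dmz-shard
  namespace: openshift-ingress-operator
spec:
  replicas: 1
  domain: apps.red-dmz.eformat.me
  endpointPublishingStrategy:
    # NodePortService keeps the router pods on the cluster network, so
    # namespaceSelector-based policies such as allow-from-openshift-ingress
    # can match their traffic
    type: NodePortService
EOF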

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been stalled for 5 days with no activity.