kubernetes-sigs / kube-network-policies

Kubernetes network policies
Apache License 2.0

Doesn't work with proxy-mode=ipvs #46

Closed uablrek closed 2 months ago

uablrek commented 3 months ago

Reproduce by setting proxy-mode=ipvs and running e2e. Example:

With proxy-mode=iptables (or nftables):

export FOCUS="\[sig-network\].*enforce.policy.based.on.PodSelector.with.MatchExpressions.*\[Feature:NetworkPolicy\].*"
# (run e2e)
Ran 1 of 6652 Specs in 9.191 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 6651 Skipped
PASS

The same test with proxy-mode=ipvs:

  I0630 16:31:00.978005 207680 reachability.go:180] expected:

  -             netpol-x-6644/a netpol-x-6644/b netpol-x-6644/c netpol-y-7770/a netpol-y-7770/b netpol-y-7770/c netpol-z-2789/a netpol-z-2789/b netpol-z-2789/c
  netpol-x-6644/a       X               .               .               .               .               .               .               .               .
  netpol-x-6644/b       .               .               .               .               .               .               .               .               .
  netpol-x-6644/c       X               .               .               .               .               .               .               .               .
  netpol-y-7770/a       X               .               .               .               .               .               .               .               .
  netpol-y-7770/b       X               .               .               .               .               .               .               .               .
  netpol-y-7770/c       X               .               .               .               .               .               .               .               .
  netpol-z-2789/a       X               .               .               .               .               .               .               .               .
  netpol-z-2789/b       X               .               .               .               .               .               .               .               .
  netpol-z-2789/c       X               .               .               .               .               .               .               .               .

  I0630 16:31:00.978023 207680 reachability.go:183] observed:

  -             netpol-x-6644/a netpol-x-6644/b netpol-x-6644/c netpol-y-7770/a netpol-y-7770/b netpol-y-7770/c netpol-z-2789/a netpol-z-2789/b netpol-z-2789/c
  netpol-x-6644/a       .               .               .               .               .               .               .               .               .
  netpol-x-6644/b       .               .               .               .               .               .               .               .               .
  netpol-x-6644/c       X               .               .               .               .               .               .               .               .
  netpol-y-7770/a       X               .               .               .               .               .               .               .               .
  netpol-y-7770/b       .               .               .               .               .               .               .               .               .
  netpol-y-7770/c       X               .               .               .               .               .               .               .               .
  netpol-z-2789/a       X               .               .               .               .               .               .               .               .
  netpol-z-2789/b       X               .               .               .               .               .               .               .               .
  netpol-z-2789/c       .               .               .               .               .               .               .               .               .

  I0630 16:31:00.978037 207680 reachability.go:186] comparison:

  -             netpol-x-6644/a netpol-x-6644/b netpol-x-6644/c netpol-y-7770/a netpol-y-7770/b netpol-y-7770/c netpol-z-2789/a netpol-z-2789/b netpol-z-2789/c
  netpol-x-6644/a       X               .               .               .               .               .               .               .               .
  netpol-x-6644/b       .               .               .               .               .               .               .               .               .
  netpol-x-6644/c       .               .               .               .               .               .               .               .               .
  netpol-y-7770/a       .               .               .               .               .               .               .               .               .
  netpol-y-7770/b       X               .               .               .               .               .               .               .               .
  netpol-y-7770/c       .               .               .               .               .               .               .               .               .
  netpol-z-2789/a       .               .               .               .               .               .               .               .               .
  netpol-z-2789/b       .               .               .               .               .               .               .               .               .
  netpol-z-2789/c       X               .               .               .               .               .               .               .               .

If kube-network-policies is only for K8s testing, this is probably fine, but should be documented.

uablrek commented 3 months ago

/cc @aojea

aojea commented 3 months ago

If kube-network-policies is only for K8s testing, this is probably fine, but should be documented.

Testing does not mean it can't work; it should work ... What is different with IPVS? Maybe it needs to go in a different hook of the netfilter pipeline?

Have you tried with the latest version, 0.4.0?

uablrek commented 3 months ago

Have you tried with the latest version, 0.4.0?

Yes, I used 0.4.0.

My guess is that kube-ipvs0 is to blame (as always). I think it makes packets be seen twice by nfqueue, or something like that, so some condition involving kube-ipvs0 might do the trick. I haven't made any traces or checked the code, but I wrote a test inspired by e2e, only much simpler: two namespaces, netpol-x and netpol-y, with pods a, b, c in each and services to them, and I build a (again much simplified) reachability matrix. This is how it looks without network policies:

netpol-x/a:. . . . . .
netpol-x/b:. . . . . .
netpol-x/c:. . . . . .
netpol-y/a:. . . . . .
netpol-y/b:. . . . . .
netpol-y/c:. . . . . .
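A matrix like the one above could be rendered with a sketch along these lines (toy code, illustration only, not the actual test):

```go
package main

import (
	"fmt"
	"strings"
)

// reachabilityMatrix is a toy sketch of the simplified matrix described
// above: for every ordered (sender, receiver) pair of pods, probe
// connectivity and print "." for reachable, "X" for blocked.
func reachabilityMatrix(pods []string, reachable func(from, to string) bool) string {
	var b strings.Builder
	for _, from := range pods {
		cells := make([]string, 0, len(pods))
		for _, to := range pods {
			if reachable(from, to) {
				cells = append(cells, ".")
			} else {
				cells = append(cells, "X")
			}
		}
		fmt.Fprintf(&b, "%s:%s\n", from, strings.Join(cells, " "))
	}
	return b.String()
}

func main() {
	pods := []string{"netpol-x/a", "netpol-x/b", "netpol-x/c",
		"netpol-y/a", "netpol-y/b", "netpol-y/c"}
	// With no policies in place every probe succeeds.
	fmt.Print(reachabilityMatrix(pods, func(from, to string) bool { return true }))
}
```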

Now I added a deny Ingress policy in netpol-x:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress

And with proxy-mode=iptables it works as expected:

netpol-x/a:X X X . . .
netpol-x/b:X X X . . .
netpol-x/c:X X X . . .
netpol-y/a:X X X . . .
netpol-y/b:X X X . . .
netpol-y/c:X X X . . .

But with proxy-mode=ipvs, and just one worker, nothing is denied:

netpol-x/a:. . . . . .
netpol-x/b:. . . . . .
netpol-x/c:. . . . . .
netpol-y/a:. . . . . .
netpol-y/b:. . . . . .
netpol-y/c:. . . . . .

With several workers it is different. Here is an example with 4 workers and proxy-mode=ipvs:

netpol-x/a:. X X . . .
netpol-x/b:X . X . . .
netpol-x/c:X X . . . .
netpol-y/a:X X X . . .
netpol-y/b:. X X . . .
netpol-y/c:X X . . . .

Whenever the sending pod and the receiving endpoint are on the same node, policies don't work.
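This pattern matches the shape of the netfilter pipeline: a chain attached only to the forward hook never sees traffic that IPVS delivers locally via kube-ipvs0. A toy Go model of that traversal — a sketch with assumed hook names, not actual kernel behavior:

```go
package main

import "fmt"

// hooksTraversed is a toy model (illustration only, not kernel code) of
// which netfilter base-chain hooks a packet traverses. With
// proxy-mode=ipvs the service address lives on kube-ipvs0, so the
// routing decision delivers the packet locally: IPVS picks it up at
// LOCAL_IN and the DNATed packet re-enters at LOCAL_OUT, so the
// forward hook is never traversed.
func hooksTraversed(ipvsLocalDelivery bool) []string {
	if ipvsLocalDelivery {
		return []string{"prerouting", "input", "output", "postrouting"}
	}
	// A packet routed on toward another interface (e.g. after a
	// proxy-mode=iptables DNAT to a pod address) goes through forward.
	return []string{"prerouting", "forward", "postrouting"}
}

func main() {
	fmt.Println("ipvs, same node:", hooksTraversed(true))
	fmt.Println("forwarded:      ", hooksTraversed(false))
}
```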

aojea commented 3 months ago

Whenever the sending pod and the receiving endpoint are on the same node, policies don't work.

OK, this is good because then it is easy to repro. I don't know how IPVS interacts with nftables hooks; maybe it is just a matter of priorities or the order of the hooks?

can you get a trace of the packet that does not work?

https://wiki.nftables.org/wiki-nftables/index.php/Ruleset_debug/tracing

uablrek commented 3 months ago

IPVS is kinda stupid that way. I think it uses some ancient input hook, and not nftables/iptables. That's why you must have the lb addresses on an interface. One can't really blame IPVS though, since it predates even iptables.

can you get a trace of the packet that does not work?

I'll see what I can do

uablrek commented 3 months ago

I use an Ingress policy, so the destination address should match the pod address for the SYN packet to be handled by kube-network-policies (if I got it right). But with proxy-mode=ipvs, no such packet is seen, only one with the service IP as destination.

09:54:19.571427 IP 11.0.1.3.42021 > 10.96.111.160.6000: Flags [S], seq 465539964, win 64240, options [mss 1460,sackOK,TS val 2700637001 ecr 0,nop,wscale 7], length 0
09:54:19.571446 IP 10.96.111.160.6000 > 11.0.1.3.42021: Flags [S.], seq 2654692432, ack 465539965, win 65160, options [mss 1460,sackOK,TS val 2914818472 ecr 2700637001,nop,wscale 7], length 0

But I am confused, since I send from a pod in netpol-x (saddr match), and the rule:

                ip saddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"

should direct the packet to the nfqueue, but I can't see anything in the kube-network-policies logs. I thought I would see some debug log saying that the SYN is accepted, since the destination isn't a netpol-x pod address (it's the service address).
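As a sanity check of the rule semantics, here is a toy Go sketch (hypothetical helper, illustration only) of what the quoted `saddr`/`daddr` rules express: queue to nfqueue 100 when either address is in the local pod-IP set. By that logic a SYN sent from a policed pod should match on `saddr` even though the destination is the ClusterIP:

```go
package main

import "fmt"

// shouldQueue is a toy model of the two nftables rules: a packet is
// sent to nfqueue 100 if its source or destination address is in the
// local pod-IP set.
func shouldQueue(saddr, daddr string, podIPs map[string]bool) bool {
	return podIPs[saddr] || podIPs[daddr]
}

func main() {
	podIPs := map[string]bool{"11.0.1.3": true} // sending pod in netpol-x
	// SYN toward the service IP: saddr matches, so the rule should queue
	// it even though daddr is the ClusterIP, not a pod address.
	fmt.Println(shouldQueue("11.0.1.3", "10.96.111.160", podIPs)) // prints "true"
}
```

Of course, the rule only matters if the packet actually traverses the chain's hook, which is the open question here.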

uablrek commented 3 months ago

(I traced with tcpdump on the veth interface to the pod, and used kindnetd:v1.1.0 in this run)

uablrek commented 3 months ago

Um, scratch that. I traced on the veth of the sending pod, not the receiving one. Sorry. Still, I think something should be seen in the kube-network-policies logs.

aojea commented 3 months ago

This is where the rules are inserted

https://github.com/kubernetes-sigs/kube-network-policies/blob/db089bc6ed6c17a654ef0ea3f3a93d9122041c4e/pkg/networkpolicy/controller.go#L648-L654

IIUIC this seems to send them to INPUT for IPVS processing?

https://elixir.bootlin.com/linux/v5.10/source/net/netfilter/ipvs/ip_vs_core.c#L2263

Since we want to match against the destination Pod in Services and not the ClusterIP, should we add a new filter in OUTPUT? I think that will solve the problem.

diff --git a/pkg/networkpolicy/controller.go b/pkg/networkpolicy/controller.go
index ab0d1f2..ee2991c 100644
--- a/pkg/networkpolicy/controller.go
+++ b/pkg/networkpolicy/controller.go
@@ -645,7 +645,7 @@ func (c *Controller) syncNFTablesRules(ctx context.Context) error {
                }
        }

-       for _, hook := range []knftables.BaseChainHook{knftables.ForwardHook} {
+       for _, hook := range []knftables.BaseChainHook{knftables.ForwardHook, knftables.OutputHook} {
                chainName := string(hook)
                tx.Add(&knftables.Chain{
                        Name:     chainName,

uablrek commented 3 months ago

I can try it...

uablrek commented 3 months ago

Works like a charm :smile:

# nft list table inet kube-network-policies
table inet kube-network-policies {
        comment "rules for kubernetes NetworkPolicy"
        set podips-v4 {
                type ipv4_addr
                comment "Local V4 Pod IPs with Network Policies"
                elements = { 11.0.2.2, 11.0.2.3,
                             11.0.2.4 }
        }

        set podips-v6 {
                type ipv6_addr
                comment "Local V6 Pod IPs with Network Policies"
                elements = { 1100::202,
                             1100::203,
                             1100::204 }
        }

        chain forward {
                type filter hook forward priority filter - 5; policy accept;
                ct state established,related accept
                ip saddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
                ip daddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
                ip6 saddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
                ip6 daddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
        }

        chain output {
                type filter hook output priority filter - 5; policy accept;
                ct state established,related accept
                ip saddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
                ip daddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
                ip6 saddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
                ip6 daddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
        }
}
# With one worker and proxy-mode=ipvs:
netpol-x/a:X X X . . .
netpol-x/b:X X X . . .
netpol-x/c:X X X . . .
netpol-y/a:X X X . . .
netpol-y/b:X X X . . .
netpol-y/c:X X X . . .

uablrek commented 3 months ago

And e2e:

export FOCUS="\[sig-network\].*\[Feature:NetworkPolicy\].*"

Ran 48 of 6652 Specs in 186.586 seconds
SUCCESS! -- 48 Passed | 0 Failed | 0 Pending | 6604 Skipped

Ginkgo ran 1 suite in 3m7.572058657s
Test Suite Passed

aojea commented 3 months ago

And e2e:

export FOCUS="\[sig-network\].*\[Feature:NetworkPolicy\].*"

Ran 48 of 6652 Specs in 186.586 seconds
SUCCESS! -- 48 Passed | 0 Failed | 0 Pending | 6604 Skipped

Ginkgo ran 1 suite in 3m7.572058657s
Test Suite Passed

can you submit a patch to fix this issue?

add me and @danwinship , I want him to take a look

aojea commented 3 months ago

/assign @uablrek

aojea commented 3 months ago

OK, progressing; different error now??

    helper.go:123: StatefulSet replicas in namespace network-policy-conformance-forbidden-forrest not rolled out yet. 1/2 replicas are available.
    helper.go:120: Error retrieving StatefulSet harry-potter from namespace network-policy-conformance-gryffindor: client rate limiter Wait returned an error: context deadline exceeded
    helper.go:123: StatefulSet replicas in namespace network-policy-conformance-gryffindor not rolled out yet. 0/2 replicas are available.
    suite.go:143: 
aojea commented 2 months ago

I think the problem with adding it in the OUTPUT hook is that it also processes packets directed to the host; for example, ICMPv6 neighbor discovery will be processed, and we want to process only packets directed to the Pods. So, based on https://stuffphilwrites.com/fw-ids-iptables-flowchart-v2024-05-22/ and http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.filter_rules.html, POSTROUTING seems to be the right hook instead of OUTPUT, and before SNAT: https://wiki.nftables.org/wiki-nftables/index.php/Netfilter_hooks
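A toy Go sketch (hypothetical names, greatly simplified) of that trade-off: everything the host generates traverses the output hook, while a daddr match at postrouting, placed after IPVS has rewritten the destination but before SNAT, selects only traffic actually headed to policed pods:

```go
package main

import "fmt"

// Packet is a toy representation; DstAfterDNAT models the address IPVS
// has rewritten the destination to by the time postrouting runs.
type Packet struct {
	Dst          string // original destination (e.g. ClusterIP, or a host address)
	DstAfterDNAT string // destination after IPVS DNAT (unchanged if no DNAT)
}

// seenByOutput: every locally generated packet traverses the output
// hook, including node-local control traffic such as ICMPv6 neighbor
// discovery, which we do not want to police.
func seenByOutput(_ Packet) bool { return true }

// queuedAtPostrouting: at postrouting (before SNAT) the DNATed
// destination is visible, so matching daddr against the pod-IP set
// selects only packets actually headed to local policed pods.
func queuedAtPostrouting(p Packet, podIPs map[string]bool) bool {
	return podIPs[p.DstAfterDNAT]
}

func main() {
	podIPs := map[string]bool{"11.0.2.2": true}
	svc := Packet{Dst: "10.96.111.160", DstAfterDNAT: "11.0.2.2"}   // service traffic
	nd := Packet{Dst: "ff02::1:ff00:2", DstAfterDNAT: "ff02::1:ff00:2"} // neighbor discovery
	fmt.Println(seenByOutput(svc), seenByOutput(nd))                    // both hit output
	fmt.Println(queuedAtPostrouting(svc, podIPs), queuedAtPostrouting(nd, podIPs))
}
```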