Closed by uablrek 2 months ago
/cc @aojea
If kube-network-policies is only for K8s testing, this is probably fine, but should be documented.
testing does not mean it can not work, it should work ... what is different with IPVS? maybe it needs to go in a different hook of the netfilter pipeline?
have you tried with the latest version, 0.4.0?
Yes, I used 0.4.0.
My guess is that kube-ipvs0 is to blame (as always). I think it causes packets to be seen twice by nfqueue, or something like that, so some condition involving kube-ipvs0 might do the trick. I haven't made any traces or checked the code, but I wrote a test inspired by e2e, only much simpler: two namespaces, netpol-x and netpol-y, with pods a, b, c in each, services to them, and a connectivity matrix (again, much simplified; rows are source pods, columns are destinations). This is how it looks without network policies:
netpol-x/a:. . . . . .
netpol-x/b:. . . . . .
netpol-x/c:. . . . . .
netpol-y/a:. . . . . .
netpol-y/b:. . . . . .
netpol-y/c:. . . . . .
Now I added a deny Ingress policy in netpol-x:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
And with proxy-mode=iptables it works as expected:
netpol-x/a:X X X . . .
netpol-x/b:X X X . . .
netpol-x/c:X X X . . .
netpol-y/a:X X X . . .
netpol-y/b:X X X . . .
netpol-y/c:X X X . . .
But with proxy-mode=ipvs, and just one worker, nothing is denied:
netpol-x/a:. . . . . .
netpol-x/b:. . . . . .
netpol-x/c:. . . . . .
netpol-y/a:. . . . . .
netpol-y/b:. . . . . .
netpol-y/c:. . . . . .
With several workers it is different. Here is an example with 4 workers and proxy-mode=ipvs:
netpol-x/a:. X X . . .
netpol-x/b:X . X . . .
netpol-x/c:X X . . . .
netpol-y/a:X X X . . .
netpol-y/b:. X X . . .
netpol-y/c:X X . . . .
Whenever the sending pod and the receiving endpoint are on the same node, policies don't work.
Ok, this is good because then it is easy to repro. I don't know how IPVS interacts with nftables hooks; maybe it's just a matter of priorities, or the order of the hooks?
can you get a trace of the packet that does not work?
https://wiki.nftables.org/wiki-nftables/index.php/Ruleset_debug/tracing
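For reference, tracing can be enabled from a ruleset file as described on that wiki page. A minimal sketch (the pod address 11.0.1.3 and the table name are examples, not part of the actual ruleset):

```
# trace.nft - load with: nft -f trace.nft, then watch with: nft monitor trace
table inet trace {
	chain forward {
		type filter hook forward priority filter - 10; policy accept;
		ip saddr 11.0.1.3 meta nftrace set 1
	}
}
```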
IPVS is kinda stupid that way. I think it uses some ancient input hook, and not nftables/iptables. That's why you must have the lb-addresses on an interface. One can't really blame IPVS though, since it predates even iptables.
can you get a trace of the packet that does not work?
I'll see what I can do
I use an Ingress policy, so the dest address should match the pod address for the syn packet to be handled by kube-network-policies (if I got it right). But with proxy-mode=ipvs, such a packet is never seen, only one with the service-ip as dest:
09:54:19.571427 IP 11.0.1.3.42021 > 10.96.111.160.6000: Flags [S], seq 465539964, win 64240, options [mss 1460,sackOK,TS val 2700637001 ecr 0,nop,wscale 7], length 0
09:54:19.571446 IP 10.96.111.160.6000 > 11.0.1.3.42021: Flags [S.], seq 2654692432, ack 465539965, win 65160, options [mss 1460,sackOK,TS val 2914818472 ecr 2700637001,nop,wscale 7], length 0
But I am confused since I send from a pod in netpol-x (saddr match) and the rule:
ip saddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
should direct the packet to the nfqueue, but I can't see anything in the kube-network-policies logs. I thought I should see some debug log saying that the syn is accepted, since the dest isn't a netpol-x pod address (it's the service address).
(I traced with tcpdump on the veth interface to the pod, and used kindnetd:v1.1.0 in this run)
Um, scratch that. I traced on the veth on the sending pod, not the receiving. Sorry. Still I think something should be seen in the kube-network-policies logs.
This is where the rules are inserted
IIUIC this seems to send them to INPUT for IPVS processing?
https://elixir.bootlin.com/linux/v5.10/source/net/netfilter/ipvs/ip_vs_core.c#L2263
Since we want to match against the destination Pod in Services and not the ClusterIP, should we add a new filter in OUTPUT? I think that will solve the problem:
diff --git a/pkg/networkpolicy/controller.go b/pkg/networkpolicy/controller.go
index ab0d1f2..ee2991c 100644
--- a/pkg/networkpolicy/controller.go
+++ b/pkg/networkpolicy/controller.go
@@ -645,7 +645,7 @@ func (c *Controller) syncNFTablesRules(ctx context.Context) error {
}
}
- for _, hook := range []knftables.BaseChainHook{knftables.ForwardHook} {
+ for _, hook := range []knftables.BaseChainHook{knftables.ForwardHook, knftables.OutputHook} {
chainName := string(hook)
tx.Add(&knftables.Chain{
Name: chainName,
I can try it...
Works like a charm :smile:
# nft list table inet kube-network-policies
table inet kube-network-policies {
comment "rules for kubernetes NetworkPolicy"
set podips-v4 {
type ipv4_addr
comment "Local V4 Pod IPs with Network Policies"
elements = { 11.0.2.2, 11.0.2.3,
11.0.2.4 }
}
set podips-v6 {
type ipv6_addr
comment "Local V6 Pod IPs with Network Policies"
elements = { 1100::202,
1100::203,
1100::204 }
}
chain forward {
type filter hook forward priority filter - 5; policy accept;
ct state established,related accept
ip saddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
ip daddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
ip6 saddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
ip6 daddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
}
chain output {
type filter hook output priority filter - 5; policy accept;
ct state established,related accept
ip saddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
ip daddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
ip6 saddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
ip6 daddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
}
}
# With one worker and proxy-mode=ipvs:
netpol-x/a:X X X . . .
netpol-x/b:X X X . . .
netpol-x/c:X X X . . .
netpol-y/a:X X X . . .
netpol-y/b:X X X . . .
netpol-y/c:X X X . . .
And e2e:
export FOCUS="\[sig-network\].*\[Feature:NetworkPolicy\].*"
Ran 48 of 6652 Specs in 186.586 seconds
SUCCESS! -- 48 Passed | 0 Failed | 0 Pending | 6604 Skipped
Ginkgo ran 1 suite in 3m7.572058657s
Test Suite Passed
can you submit a patch to fix this issue?
add me and @danwinship , I want him to take a look
/assign @uablrek
ok, progressing, different error now??
helper.go:123: StatefulSet replicas in namespace network-policy-conformance-forbidden-forrest not rolled out yet. 1/2 replicas are available.
helper.go:120: Error retrieving StatefulSet harry-potter from namespace network-policy-conformance-gryffindor: client rate limiter Wait returned an error: context deadline exceeded
helper.go:123: StatefulSet replicas in namespace network-policy-conformance-gryffindor not rolled out yet. 0/2 replicas are available.
suite.go:143:
I think the problem with adding it in the OUTPUT hook is that it also processes packets directed to the host; for example, ICMPv6 Neighbor Discovery will be processed, and we only want to process packets directed to the Pods. So based on https://stuffphilwrites.com/fw-ids-iptables-flowchart-v2024-05-22/ and http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.filter_rules.html, POSTROUTING seems like the right hook instead of OUTPUT, and before SNAT: https://wiki.nftables.org/wiki-nftables/index.php/Netfilter_hooks
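A postrouting variant of the chain might then look something like this (a sketch, not an actual patch; the priority `srcnat - 5`, chosen here to run before SNAT, mirrors the existing `filter - 5` convention and is an assumption, and only the daddr rules are shown since source-side matching would presumably stay in forward):

```
chain postrouting {
	type filter hook postrouting priority srcnat - 5; policy accept;
	ct state established,related accept
	ip daddr @podips-v4 queue to 100 comment "process IPv4 traffic with network policy enforcement"
	ip6 daddr @podips-v6 queue to 100 comment "process IPv6 traffic with network policy enforcement"
}
```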