girishmg opened this issue 5 years ago
More information on this issue:
Logical topology

             +--------------------+
   mport     | ovn_cluster_router |
     |       +--+-----------------+
     |          |
    ++----------+-+
    |   node1     |
    +-------------+
Physical topology

Host
+---------------------+
|   (192.168.1.2)     |
|    mport            |
|      |              |
|    +-+----------+   |
|    |   br-int   |   |
|    +------------+   |
|     10.0.2.16       |
+---------------------+
(NOTE: Instead of 10.0.2.16 IP use your test machine IP)
ovn-nbctl ls-add node1 -- set logical_switch node1 \
other-config:subnet=192.168.1.0/24
ovn-nbctl lsp-add node1 port1 -- lsp-set-addresses \
port1 "0:1:2:3:4:2 192.168.1.2"
ovn-nbctl lr-add ovn_cluster_router
ovn-nbctl lr-route-add ovn_cluster_router 10.0.2.16 192.168.1.2
ovn-nbctl lrp-add ovn_cluster_router rtos-node1 00:00:00:CB:5A:76 192.168.1.1/24
ovn-nbctl lsp-add node1 stor-node1 -- set logical_switch_port stor-node1 \
type=router options:router-port=rtos-node1 addresses=\"00:00:00:CB:5A:76\"
uuid=`ovn-nbctl create load_balancer protocol=tcp`
ovn-nbctl set load_balancer $uuid vips:\"10.96.0.1:3333\"=\"10.0.2.16:80\"
ovn-nbctl ls-lb-add node1 $uuid
ovn-nbctl acl-add node1 to-lport 1001 ip4.src==192.168.1.2 allow-related
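The load-balancer rule above rewrites any packet addressed to the VIP before it is routed. As a toy illustration of that vips mapping (plain shell, not OVN code; the addresses are the ones configured above):

```shell
#!/bin/sh
# Toy stand-in for the load_balancer vips lookup configured above
# (10.96.0.1:3333 -> 10.0.2.16:80). Not OVN code; it only illustrates
# the destination rewrite the LB performs before routing.
VIP="10.96.0.1:3333"
BACKEND="10.0.2.16:80"

pkt_src="192.168.1.2:9991"
pkt_dst="$VIP"

# LB DNAT: a destination matching the VIP is rewritten to the backend
[ "$pkt_dst" = "$VIP" ] && pkt_dst="$BACKEND"

echo "post-DNAT flow: $pkt_src -> $pkt_dst"
```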
Physical binding
ovs-vsctl add-port br-int k8s-mport -- set interface k8s-mport type=internal \
external_ids:iface-id=port1
ip li set k8s-mport address 0:1:2:3:4:2
ip a add dev k8s-mport 192.168.1.2/24
ip li set k8s-mport up
ip ro add 10.96.0.0/12 via 192.168.1.2
On the host run the HTTP server
python -m SimpleHTTPServer 80 &
On the host disable rp_filter and enable accept_local on mport
sysctl -w net.ipv4.conf.k8s-mport.rp_filter=0
sysctl -w net.ipv4.conf.k8s-mport.accept_local=1
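For convenience, the binding and host-prep steps above can be collected into one sketch (same commands and values as above, not tested as a unit; adjust the addresses to your machine):

```shell
#!/bin/sh
# Sketch only: bundles the physical-binding and host-prep commands above.
# MPORT, MAC, and the addresses are this reproduction's values.
MPORT=k8s-mport
MAC=0:1:2:3:4:2

ovs-vsctl add-port br-int "$MPORT" -- set interface "$MPORT" type=internal \
    external_ids:iface-id=port1
ip link set "$MPORT" address "$MAC"
ip addr add dev "$MPORT" 192.168.1.2/24
ip link set "$MPORT" up
ip route add 10.96.0.0/12 via 192.168.1.2

# hairpin traffic re-enters the host on the same port it left from, so
# relax reverse-path filtering and accept locally-sourced packets
sysctl -w "net.ipv4.conf.$MPORT.rp_filter=0"
sysctl -w "net.ipv4.conf.$MPORT.accept_local=1"
```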
On one terminal, tail ovs-vswitchd.log
term1$ tail -f /var/log/openvswitch/ovs-vswitchd.log
On second terminal, run tcpdump
term2$ tcpdump -enni k8s-mport tcp port 9991
On third terminal, run the hairpin traffic
term3$ nc -p 9991 -zv 10.96.0.1 3333
The packet from the host (192.168.1.2:9991 → 10.96.0.1:3333) enters the OVN pipeline, and the LB DNATs it to 10.0.2.16:80. The packet then hits ovn_cluster_router, matches the static route, and is forwarded back out through k8s-mport. At this point the to-lport ACL on node1 (ip4.src==192.168.1.2) matches, and the resulting ct(commit) fails with an Invalid argument error in ovs-vswitchd.log:
---------8<--------------8<--------
2019-09-25T06:07:22.959Z|00001|dpif(handler1)|WARN|system@ovs-system: execute ct(commit,zone=1,label=0/0x1),1 failed (Invalid argument) on packet tcp,vlan_tci=0x0000,dl_src=00:00:00:cb:5a:76,dl_dst=00:01:02:03:04:02,nw_src=192.168.1.2,nw_dst=10.10.0.11,nw_tos=0,nw_ecn=0,nw_ttl=63,tp_src=9991,tp_dst=80,tcp_flags=syn tcp_csum:6273 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x1),ct_tuple4(src=192.168.1.2,dst=10.10.0.11,proto=6,tp_src=9991,tp_dst=80),in_port(1) mtu 0
---------8<--------------8<--------
Now, if I remove the ACL (to-lport ip4.src==192.168.1.2), then the above warning goes away. So, clearly the CT zone ID for the logical_switch_port is used twice to insert state.
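A toy model of that claim (plain shell, not OVS or kernel code): if conntrack commits are tracked per zone and the same zone is reused when the hairpinned packet re-enters the pipeline, the second commit of the same tuple fails, which is consistent with the EINVAL above. This only illustrates the hypothesis:

```shell
#!/bin/sh
# Toy model of the hypothesis above: one conntrack "zone" table, and the
# hairpinned packet triggers a second ct(commit) for the same tuple in the
# same zone. Not the kernel's actual logic; just an illustration.
zone1=""
ct_commit() {
    case " $zone1 " in
        *" $1 "*) echo "ct(commit) $1: Invalid argument (already in zone 1)" ;;
        *) zone1="$zone1 $1"; echo "ct(commit) $1: ok" ;;
    esac
}

tuple="192.168.1.2:9991->10.0.2.16:80"
ct_commit "$tuple"   # first pass through the node1 pipeline
ct_commit "$tuple"   # hairpin re-entry hits the to-lport ACL's commit
```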
Suppose we fix the above issue in OVN; we then run into a different issue, explained below. (I removed the ACL just to check whether things would then work, but they didn't, and I ended up with the issue below.)
|------------------+------------------+------------------+-----------------------------|
| Location | SRC:SPort | DST:DPort | Notes |
|------------------+------------------+------------------+-----------------------------|
| Client (Host) | 192.168.1.2:9991 | 10.96.0.1:3333 | |
| OVN pipeline | 192.168.1.2:9991 | 10.0.2.16:80 | LB DNAT on `node1` logical |
| | | | switch |
| --------------- packet hairpins and comes back to the host ----------------------- |
| | | | |
| Client (Host) | 192.168.1.2:9991 | 10.0.2.16:80 | The Web server receives the |
| | | | SYN packet |
| | | | |
| WebServer (Host) | 10.0.2.16:80 | 192.168.1.2:9991 | Server sends SYN+ACK to |
| | | | 192.168.1.2:9991 instead of |
| | | | 10.96.0.1:3333 |
| | | | |
| Kernel (Host) | Sends a Reset since SYN+ACK is from a different endpoint |
|------------------+------------------+------------------+-----------------------------|
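The last two rows of the table can be illustrated with a small stand-in for the client-side check (plain shell, not the kernel): the client socket stored the VIP as its peer, so a SYN+ACK arriving from the un-translated backend tuple matches no socket and draws a reset:

```shell
#!/bin/sh
# Stand-in for the host kernel's socket match on the returning SYN+ACK.
# The client connected to the VIP, but the hairpinned reply was never
# un-DNATed, so it arrives from the backend tuple instead.
expected_peer="10.96.0.1:3333"   # what the client socket stored
reply_src="10.0.2.16:80"         # what the SYN+ACK actually carries

if [ "$reply_src" = "$expected_peer" ]; then
    verdict="ACK: handshake completes"
else
    verdict="RST: SYN+ACK from $reply_src, no socket expects that peer"
fi
echo "$verdict"
```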
So even if we fix OVN to use multiple CT zones for hairpin traffic, we still hit the reset issue above. I tried policy-based routing with ip rule, iptables connection marking, SNAT, and so on, but nothing worked.
I think it makes sense to introduce a second gateway router for this purpose and do it the right way.
Note: See https://docs.google.com/document/d/1anbU730-qMsdaBCyZxMDMtQe2SWoGOY56cs2nhzDmJ8/edit?usp=sharing for more information on the issue
How to Reproduce:
On the K8s master host run
and the command will time out.
(Note: make sure you don't have any iptables rules installed by kube-proxy. Run: iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X)
Software Version: OVS 2.11.1 and OVN 2.11.1, with upstream Linux kernel 4.15 (I have also tried the OVS-tree kernel module via dkms).
Observation:
We are seeing an issue on the K8s master with a HostNetwork pod trying to reach the K8s API server through the `kubernetes` service cluster IP of 10.96.0.1. The HostNetwork pod tries to access the API server at 10.96.0.1:443 and the request times out.
Because of the following route in the host IP stack
the packet from the HostNetwork pod will enter the OVN logical network through the K8s management port. It then hits the load balancer rule and gets translated to 10.0.2.16:6443
The packet then hits ovn_cluster_router and matches the following route
and the packet is sent back to the management port. The corresponding OpenFlow rule in OVS fails to get added due to 'Invalid argument'.
The failure is in the `ovs_ct_execute()` function in net/openvswitch/conntrack.c.