Logical router `lb_force_snat_ip` doesn't appear to work #144

Open · tomponline opened this issue 2 years ago

tomponline commented 2 years ago

OVN details:

sudo ovn-nbctl --version
ovn-nbctl 22.03.0
Open vSwitch Library 2.17.0
DB Schema 6.1.0

When setting options:lb_force_snat_ip on a logical router to either the IP address of the logical router's internal port or the special value router_ip, I expected incoming requests to load balancers to be SNATed on their way to the target address on the logical switch.

This is what the docs suggest:

For the return traffic to go back to the same gateway router (for unDNATing), the packet needs a SNAT in the first place. This can be achieved by setting the above option with a gateway specific set of IP addresses. This option may have exactly one IPv4 and/or one IPv6 address on it, separated by a space character. If it is configured with the value router_ip, then the load balanced packet is SNATed with the IP of router port (attached to the gateway router) selected as the destination after taking the routing decision.

https://manpages.ubuntu.com/manpages/jammy/man5/ovn-nb.5.html#logical_router%20table
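
For reference, the two forms the option accepts per that man page would look roughly like this (a sketch only; the explicit addresses shown are this router's internal port IPs):

sudo ovn-nbctl set logical_router lxd-net34-lr options:lb_force_snat_ip="10.23.42.1 fd42:fcd6:d173:50b9::1"
sudo ovn-nbctl set logical_router lxd-net34-lr options:lb_force_snat_ip=router_ip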

But this isn't what happens.

We can see that the logical router's options now include lb_force_snat_ip=router_ip:

sudo ovn-nbctl set logical_router lxd-net34-lr options:lb_force_snat_ip=router_ip
sudo ovn-nbctl find logical_router name=lxd-net34-lr
_uuid               : dc6e4f1f-235b-432c-b4bb-fd7d448c8deb
copp                : []
enabled             : []
external_ids        : {}
load_balancer       : [83a21af3-5880-42f3-9557-28870998fcd7, a1ae621b-6781-4364-b186-967cd14457ac]
load_balancer_group : []
name                : lxd-net34-lr
nat                 : [578d4bab-e6d0-4109-b717-999ac522dd5f, d0d2fa70-bc78-4029-8061-8c2711525379]
options             : {lb_force_snat_ip=router_ip}
policies            : [1712bda1-ca78-4659-a7b5-26e3530d8560, 3f0be3a2-f6e1-4539-994f-ab88b871f99c, fb1b29ee-35cd-449e-ae92-597e37656c31]
ports               : [1e8fdd0d-f49b-49bd-a8e6-aed2da39fb55, 95ca3f95-455f-41e8-b494-ecbe3e28179b]
static_routes       : [91b43cd1-4e81-46bf-9279-12275a73515a, cdb562b8-8a72-4b2a-b353-f8b83e8e57a1]

We can see that the internal router port's IP is 10.23.42.1/24:

sudo ovn-nbctl find logical_router_port
_uuid               : 1e8fdd0d-f49b-49bd-a8e6-aed2da39fb55
enabled             : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
ipv6_prefix         : []
ipv6_ra_configs     : {address_mode=dhcpv6_stateless, dnssl=lxd, max_interval="60", min_interval="30", mtu="1500", rdnss="fd42:bafd:ac21:9f::1", send_periodic="true"}
mac                 : "00:16:3e:15:18:f3"
name                : lxd-net34-lr-lrp-int
networks            : ["10.23.42.1/24", "fd42:fcd6:d173:50b9::1/64"]
options             : {gateway_mtu="1500"}
peer                : []

_uuid               : 95ca3f95-455f-41e8-b494-ecbe3e28179b
enabled             : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : 00a1a647-d901-4c6f-970c-b7d5d20e37c0
ipv6_prefix         : []
ipv6_ra_configs     : {}
mac                 : "00:16:3e:15:18:f3"
name                : lxd-net34-lr-lrp-ext
networks            : ["10.64.199.2/24", "fd42:bafd:ac21:9f:216:3eff:fe15:18f3/64"]
options             : {gateway_mtu="1500"}
peer                : []

We have configured a simple load balancer listening on the external IP 10.0.0.1 with a single backend endpoint at 10.23.42.2 on the internal logical switch:

sudo ovn-nbctl find load_balancer
_uuid               : 83a21af3-5880-42f3-9557-28870998fcd7
external_ids        : {}
health_check        : []
ip_port_mappings    : {}
name                : lxd-net34-lb-10.0.0.1-tcp
options             : {}
protocol            : tcp
selection_fields    : []
vips                : {"10.0.0.1:53"="10.23.42.2:53"}
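
For completeness, a load balancer like this could also be created and attached by hand along these lines (a sketch; LXD created this one through the OVN northbound API, not with these commands):

sudo ovn-nbctl lb-add lxd-net34-lb-10.0.0.1-tcp "10.0.0.1:53" "10.23.42.2:53" tcp
sudo ovn-nbctl lr-lb-add lxd-net34-lr lxd-net34-lb-10.0.0.1-tcp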

We have a logical switch port connected to a container with the IP 10.23.42.2 running a fake DNS server and tcpdump:

sudo ovn-nbctl find logical_switch_port name=lxd-net34-instance-394a304f-5d86-4b99-b550-2bb456c1af2a-eth0
_uuid               : 974cab39-cb59-4324-8b06-2e1cc960de7f
addresses           : ["00:16:3e:de:e4:62 dynamic"]
dhcpv4_options      : 4502a717-4219-4243-9c08-17a863e39ca5
dhcpv6_options      : 0faf36c5-386a-4e33-bdc2-16f6884f6b85
dynamic_addresses   : "00:16:3e:de:e4:62 10.23.42.2 fd42:fcd6:d173:50b9:216:3eff:fede:e462"
enabled             : []
external_ids        : {}
ha_chassis_group    : []
name                : lxd-net34-instance-394a304f-5d86-4b99-b550-2bb456c1af2a-eth0
options             : {}
parent_name         : []
port_security       : []
tag                 : []
tag_request         : []
type                : ""
up                  : true

Running dig @10.0.0.1 foo.com +tcp from the provider network towards the load balancer's external IP works (the request is forwarded via DNAT to the internal switch port's IP) and we get an answer from our fake DNS server.

dig @10.0.0.1 foo.com +tcp

; <<>> DiG 9.18.1-1ubuntu1.1-Ubuntu <<>> @10.0.0.1 foo.com +tcp
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58860
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;foo.com.           IN  A

;; ANSWER SECTION:
foo.com.        0   IN  A   127.0.0.1

But inside the container we see that the source address is from the external provider network and hasn't been SNATed to the router port's IP (10.23.42.1):

lxc exec c1 -- tcpdump -i eth0 -nn port 53
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:03:21.949988 IP 10.64.199.1.37823 > 10.23.42.2.53: Flags [S], seq 1488537295, win 64240, options [mss 1460,sackOK,TS val 935945855 ecr 0,nop,wscale 7], length 0
09:03:21.950017 IP 10.23.42.2.53 > 10.64.199.1.37823: Flags [S.], seq 2485936117, ack 1488537296, win 65160, options [mss 1460,sackOK,TS val 612113390 ecr 935945855,nop,wscale 7], length 0
09:03:21.954136 IP 10.64.199.1.37823 > 10.23.42.2.53: Flags [P.], seq 1:51, ack 1, win 502, options [nop,nop,TS val 935945860 ecr 612113390], length 50 58860+ [1au] A? foo.com. (48)
09:03:21.954158 IP 10.64.199.1.37823 > 10.23.42.2.53: Flags [.], ack 1, win 502, options [nop,nop,TS val 935945860 ecr 612113390], length 0
09:03:21.954164 IP 10.23.42.2.53 > 10.64.199.1.37823: Flags [.], ack 51, win 509, options [nop,nop,TS val 612113394 ecr 935945860], length 0
09:03:21.954177 IP 10.23.42.2.53 > 10.64.199.1.37823: Flags [.], ack 51, win 509, options [nop,nop,TS val 612113394 ecr 935945860], length 0
09:03:21.954608 IP 10.23.42.2.53 > 10.64.199.1.37823: Flags [P.], seq 1:55, ack 51, win 509, options [nop,nop,TS val 612113394 ecr 935945860], length 54 58860* 1/0/1 A 127.0.0.1 (52)
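
One way to check whether ovn-northd generated any force-SNAT logical flows for this router is to dump its southbound flows, along these lines (a hedged diagnostic; it assumes the southbound datapath can be referenced by the router's name):

sudo ovn-sbctl lflow-list lxd-net34-lr | grep -iE 'force_snat|lr_out_snat'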

Related issue: https://github.com/lxc/lxd/issues/10654

dceara commented 2 years ago

@tomponline Would it be possible to share the NB and SB databases? Also, is ovn-northd logging any errors/warnings?

tomponline commented 2 years ago

I can get those for you, which files do you need?

dceara commented 2 years ago

The databases are (I'm guessing) in /etc/ovn/ovnnb_db.db and /etc/ovn/ovnsb_db.db. ovn-northd logs should be in /var/log/ovn/ovn-northd.log

tomponline commented 2 years ago

Thanks. In Ubuntu the databases are in /var/lib/ovn/, so I stopped the OVN services, cleared that directory and restarted them to ensure I had a fresh set of OVN databases.

I then created an OVN network using LXD as per https://linuxcontainers.org/lxd/docs/master/howto/network_ovn_setup/#set-up-a-standalone-ovn-network

sudo ovs-vsctl set open_vswitch . \
   external_ids:ovn-remote=unix:/var/run/ovn/ovnsb_db.sock \
   external_ids:ovn-encap-type=geneve \
   external_ids:ovn-encap-ip=127.0.0.1
lxc network set lxdbr0 \
    ipv4.dhcp.ranges=10.64.199.21-10.64.199.30 \
    ipv4.ovn.ranges=10.64.199.2-10.64.199.20
lxc network create ovntest --type=ovn network=lxdbr0
lxc network show ovntest
config:
  bridge.mtu: "1500"
  ipv4.address: 10.175.153.1/24
  ipv4.nat: "true"
  ipv6.address: fd42:7081:a51c:727d::1/64
  ipv6.nat: "true"
  network: lxdbr0
  volatile.network.ipv4.address: 10.64.199.2
  volatile.network.ipv6.address: fd42:bafd:ac21:9f:216:3eff:fe9c:6097
description: ""
name: ovntest
type: ovn
used_by:
- /1.0/instances/c1
managed: true
status: Created
locations:
- none

I then indicated to LXD that the 10.0.0.0/24 subnet could be used for external IPs on the OVN network and routed a single IP in that subnet to the OVN router's external IP address:

lxc network set lxdbr0 ipv4.routes=10.0.0.0/24
sudo ip r replace 10.0.0.1/32 via $(lxc network get ovntest volatile.network.ipv4.address)

I then created a container connected to the OVN network:

lxc launch images:ubuntu/22.04 c1 -n ovntest
lxc ls c1
+------+---------+---------------------+-----------------------------------------------+-----------+-----------+
| NAME |  STATE  |        IPV4         |                     IPV6                      |   TYPE    | SNAPSHOTS |
+------+---------+---------------------+-----------------------------------------------+-----------+-----------+
| c1   | RUNNING | 10.175.153.2 (eth0) | fd42:7081:a51c:727d:216:3eff:feb3:50e8 (eth0) | CONTAINER | 0         |
+------+---------+---------------------+-----------------------------------------------+-----------+-----------+

Next I created an LXD network forward (which uses OVN load balancers under the hood) to forward 10.0.0.1 on the external network to c1's internal address 10.175.153.2:

lxc network forward create ovntest 10.0.0.1 target_address=10.175.153.2

This resulted in an OVN load balancer:

sudo ovn-nbctl list load_balancer
_uuid               : 853152e9-7f43-4c5b-87f3-88accd0b31fe
external_ids        : {}
health_check        : []
ip_port_mappings    : {}
name                : lxd-net38-lb-10.0.0.1-tcp
options             : {}
protocol            : tcp
selection_fields    : []
vips                : {"10.0.0.1"="10.175.153.2"}

I checked that I could ping 10.0.0.1 from the LXD host and captured what was seen inside the container:

ping -c1 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=63 time=0.700 ms

--- 10.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.700/0.700/0.700/0.000 ms
lxc exec c1 -- tcpdump -i eth0 icmp -nn
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:43:23.259344 IP 10.64.199.1 > 10.175.153.2: ICMP echo request, id 14, seq 1, length 64
09:43:23.259375 IP 10.175.153.2 > 10.64.199.1: ICMP echo reply, id 14, seq 1, length 64

We can see the source address was the gateway IP of the uplink network (10.64.199.1) as expected at this stage.

Next I set options:lb_force_snat_ip=router_ip on the logical router:

sudo ovn-nbctl set logical_router lxd-net38-lr options:lb_force_snat_ip=router_ip
sudo ovn-nbctl list logical_router
_uuid               : d3a59d0d-63ff-4a9e-a267-1fc490cb9d05
copp                : []
enabled             : []
external_ids        : {}
load_balancer       : [853152e9-7f43-4c5b-87f3-88accd0b31fe]
load_balancer_group : []
name                : lxd-net38-lr
nat                 : [9883cea0-d762-4c64-9902-c499c3d02fa6, b02367e4-1948-43c7-9ac4-d11cfd6142dc]
options             : {lb_force_snat_ip=router_ip}
policies            : [3726eff6-bbad-4a49-9ffc-01be28501366, e8ceb3bf-20e3-41e7-84bf-6444e64bd4c4, fc60ad2f-0425-4156-b379-68f77f702101]
ports               : [69069034-7cf3-4818-9869-11e73adff430, b2eddd86-0840-45a6-8621-b2e5c77bbf9e]
static_routes       : [0263d9ef-fe78-4857-8a59-78b8dc9b8e4f, 2abd12df-1461-4869-a426-60d456b77ea0]

I then re-ran the ping test and still see the same 10.64.199.1 source address, rather than the desired 10.175.153.1 address of the OVN router's internal port:

lxc exec c1 -- tcpdump -i eth0 icmp -nn
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
09:46:55.406443 IP 10.64.199.1 > 10.175.153.2: ICMP echo request, id 15, seq 1, length 64
09:46:55.406492 IP 10.175.153.2 > 10.64.199.1: ICMP echo reply, id 15, seq 1, length 64

tomponline commented 2 years ago

Here are the database files after that: ovn.tar.gz

tomponline commented 2 years ago

Interestingly I'm seeing this in the logs:

2022-07-15T09:50:24.680Z|00093|northd|WARN|bad ip router_ip in options of router d3a59d0d-63ff-4a9e-a267-1fc490cb9d05

Although it is mentioned as supported in the docs for this version:

https://manpages.ubuntu.com/manpages/jammy/en/man5/ovn-nb.5.html#logical_router%20table
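
If anyone else wants to look for the same message, this install logs it in the path mentioned earlier:

sudo grep -i 'bad ip' /var/log/ovn/ovn-northd.log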

tomponline commented 2 years ago

I also tried setting it manually:

sudo ovn-nbctl set logical_router lxd-net38-lr options:lb_force_snat_ip=10.175.153.1
sudo ovn-nbctl list logical_router
_uuid               : d3a59d0d-63ff-4a9e-a267-1fc490cb9d05
copp                : []
enabled             : []
external_ids        : {}
load_balancer       : [853152e9-7f43-4c5b-87f3-88accd0b31fe]
load_balancer_group : []
name                : lxd-net38-lr
nat                 : [9883cea0-d762-4c64-9902-c499c3d02fa6, b02367e4-1948-43c7-9ac4-d11cfd6142dc]
options             : {lb_force_snat_ip="10.175.153.1"}
policies            : [3726eff6-bbad-4a49-9ffc-01be28501366, e8ceb3bf-20e3-41e7-84bf-6444e64bd4c4, fc60ad2f-0425-4156-b379-68f77f702101]
ports               : [69069034-7cf3-4818-9869-11e73adff430, b2eddd86-0840-45a6-8621-b2e5c77bbf9e]
static_routes       : [0263d9ef-fe78-4857-8a59-78b8dc9b8e4f, 2abd12df-1461-4869-a426-60d456b77ea0]

But still no effect.

Here are the DB files after setting it manually. ovn2.tar.gz

dceara commented 2 years ago

@tomponline I see now why router_ip is ignored. It's because the option is only supported on gateway routers (bound to a chassis with options:chassis=<chassis-name>): https://github.com/ovn-org/ovn/commit/c6e21a23bd8cfcf8dd8b6eb70c8b09e6f4582b2f

As a matter of fact, it looks to me like lb_force_snat_ip was always only supported on gateway routers. @numansiddique, is this really the case?
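
If someone wanted to test the gateway-router path, a minimal sketch would be to look up a chassis name in the southbound database and pin the router to it (the chassis name below is a placeholder):

sudo ovn-sbctl list chassis
sudo ovn-nbctl set logical_router lxd-net38-lr options:chassis=<chassis-name>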

tomponline commented 2 years ago

Hrm, thanks for clarifying, that's a shame. In our case the logical router is not tied to a specific chassis; in fact, in a clustered setup we rely on its external port moving between chassis based on chassis priority. We don't have multiple logical routers connected to the same network, but we would still like the load balancer to perform SNAT.

tomponline commented 2 years ago

Can you explain the difference between a gateway router and a non-gateway router? We'd like as much processing as possible to be distributed, with NAT necessarily centralized on the current primary chassis.

tomponline commented 2 years ago

We use Gateway_Chassis and HA_Chassis_Group to enable the uplink port to move between chassis.
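
For context, a rough sketch of the two ways that binding is usually expressed (the port name follows the lxd-net38 naming seen above and is an assumption; the chassis name, group name and group UUID are placeholders). Either per-port gateway chassis entries:

sudo ovn-nbctl lrp-set-gateway-chassis lxd-net38-lr-lrp-ext <chassis-name> 10

or an HA chassis group referenced from the router port:

sudo ovn-nbctl ha-chassis-group-add lxd-net38-ha
sudo ovn-nbctl ha-chassis-group-add-chassis lxd-net38-ha <chassis-name> 10
sudo ovn-nbctl set logical_router_port lxd-net38-lr-lrp-ext ha_chassis_group=<ha-chassis-group-uuid>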

dceara commented 2 years ago

A gateway router is "pinned" to a chassis, so all traffic that needs to go through the router will be forwarded to that specific chassis. But in your case, what I think we need to do is add support for lb_force_snat_ip on "non-gateway routers" (i.e., routers with the uplink port bound to an HA_Chassis_Group). The only restriction would be that you can have at most one such port, but that's already required for load balancers to work properly.

tomponline commented 2 years ago

Yes, that makes sense, and would be much appreciated, thanks! :)

numansiddique commented 2 years ago

Yes. lb_force_snat_ip is restricted to gateway routers. I don't see a reason why we can't support it in distributed routers with a gateway port.

tomponline commented 2 years ago

Excellent, that would be fantastic, thank you.

numansiddique commented 2 years ago

Feel free to give it a shot to add this support if you want to :)

LorenzoBianconi commented 2 years ago

Yes. lb_force_snat_ip is restricted to gateway routers. I don't see a reason why we can't support it in distributed routers with a gateway port.

@numansiddique @dceara @tomponline looking at the code it seems a bit hard to introduce lb_force_snat_ip on a distributed router, since the incoming traffic is "snatted" on the hv running the distributed gateway router port (lr_out_snat, table 3 of the router egress pipeline), while the reverse traffic is "unsnatted" in the lr_in_unsnat stage (table 4) of the logical router ingress pipeline. Since lr_in_unsnat is executed before the routing stage (lr_in_ip_routing, table 11) and the lr_in_gw_redirect stage (table 18), the traffic will not be sent to the hv running the distributed gateway router port before being handed to connection tracking. Moreover, lr_in_unsnat must be executed before the lr_in_ip_routing stage, so it is not possible to decide in advance whether the packet must be sent to the hv running the distributed gateway router port.
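
For anyone following along, the stages mentioned above can be inspected per router by dumping the logical flows from the southbound database, e.g. (a sketch against the router from the reproduction above):

sudo ovn-sbctl lflow-list lxd-net38-lr | grep -E 'lr_in_unsnat|lr_in_ip_routing|lr_in_gw_redirect|lr_out_snat'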

@tomponline is it possible to modify the architecture to achieve the result?

LorenzoBianconi commented 1 year ago

@tomponline do you have any update on this? Is it possible to modify the architecture to achieve the result?

tomponline commented 1 year ago

Hi @LorenzoBianconi, sorry for the delay in replying. I've not had time to consider your request yet. To be honest, I don't really follow what you're saying, as I am not familiar with OVN internals.

I'm not sure what you mean about modifying the architecture, but we do want to be able to use distributed routers.

xujunjie-cover commented 1 year ago

@tomponline please try setting a chassis for the logical router: sudo ovn-nbctl --wait=hv set logical_router lxd-net34-lr options:chassis=xxx
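
For reference, chassis names can be listed from the southbound database before pinning the router (a sketch; note that, per the discussion above, setting options:chassis turns this into a gateway router pinned to that single chassis):

sudo ovn-sbctl show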