submariner-io / submariner

Networking component for interconnecting Pods and Services across Kubernetes clusters.
https://submariner.io
Apache License 2.0

Connection failures seen with RHEL-9 Gateway nodes #2422

Closed by sridhargaddam 1 year ago

sridhargaddam commented 1 year ago

What happened:

On RHEL-9/CoreOS 413.92.xx nodes, when OCP is installed with OpenShiftSDN, connectivity from non-Gateway to non-Gateway nodes is broken. Debugging showed that the cause is a wrong reverse path filtering setting (i.e., /proc/sys/net/ipv4/conf/vx-submariner/rp_filter) for the vx-submariner interface on the Gateway nodes.

Surprisingly, when the route-agent creates vx-submariner and then configures loose-mode reverse path filtering on the interface, there is no error from the netlink library, yet the loose-mode setting (i.e., "2") is not applied to the interface.

2023-03-31T18:43:27.319Z DBG ..oxy/gw_transition.go:47 KubeProxy            The current node has become a Gateway
2023-03-31T18:43:27.319Z ERR ..oxy/routes_iface.go:159 KubeProxy   Error retrieving link by name "vx-submariner" error="Link not found"
2023-03-31T18:43:27.319Z INF ..oxy/gw_transition.go:55 KubeProxy     Creating the vxlan interface: vx-submariner on the gateway node
2023-03-31T18:43:27.320Z DBG ../kubeproxy/vxlan.go:279 KubeProxy  Successfully configured reverse path filter to loose mode on "vx-submariner"
2023-03-31T18:43:27.392Z INF ..oller/controller.go:150 EventController  Event controller started
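
Because the log reports success even when the setting is lost, the value has to be read back on the Gateway node to confirm the symptom. The following is a minimal sketch, not part of Submariner: the function name effective_rp_filter and the PROC_CONF override are hypothetical and exist only for illustration. It relies on the kernel's documented behavior of using the higher of conf/all/rp_filter and the per-interface value when validating sources on an interface.

```shell
# Sketch only: effective_rp_filter and PROC_CONF are illustrative names,
# not something Submariner provides.
# PROC_CONF can be overridden to point at a test directory; on a real node
# it defaults to the kernel's IPv4 per-interface sysctl tree.
PROC_CONF="${PROC_CONF:-/proc/sys/net/ipv4/conf}"

# effective_rp_filter IFACE
# Prints the larger of conf/all/rp_filter and conf/IFACE/rp_filter,
# which is the value the kernel applies to IFACE.
effective_rp_filter() {
    local iface="$1" all_val if_val
    all_val="$(cat "${PROC_CONF}/all/rp_filter")" || return 1
    if_val="$(cat "${PROC_CONF}/${iface}/rp_filter")" || return 1
    if [ "$all_val" -gt "$if_val" ]; then
        echo "$all_val"
    else
        echo "$if_val"
    fi
}
```

Running effective_rp_filter vx-submariner on a Gateway node prints the mode in effect: 2 means loose mode was applied, while 1 means strict mode is still active on the interface.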

Note: This issue applies only when using OpenShiftSDN, not the OVN-K CNI.

What you expected to happen: non-Gateway to non-Gateway communication should work fine.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?: A quick workaround is to restart the route-agent pods with the following command, which fixes the issue.

kubectl delete pod -n submariner-operator -l app=submariner-routeagent

Environment: Submariner version 0.15.0-m3, Kubernetes Server version: v1.26.2+06e8c46, OCP 4.13.0-rc0 RHEL-9/CentOS Stream CoreOS 413.92.202303190222-0

sridhargaddam commented 1 year ago

An easy way to reproduce the issue is to run the following commands on a RHEL-9 node:

ip link add vxlan-sm type vxlan id 112 dev eth0 dstport 4789
ip link set dev vxlan-sm up
cat /proc/sys/net/ipv4/conf/vxlan-sm/rp_filter
echo 2 > /proc/sys/net/ipv4/conf/vxlan-sm/rp_filter
ip addr add 192.168.112.1/24 dev vxlan-sm
cat /proc/sys/net/ipv4/conf/vxlan-sm/rp_filter

In this case, the final value of /proc/sys/net/ipv4/conf/vxlan-sm/rp_filter is 1.

Adding a small delay of 100ms after creating the interface fixes the issue:

ip link add vxlan-sm type vxlan id 112 dev eth0 dstport 4789
ip link set dev vxlan-sm up
sleep 0.1
cat /proc/sys/net/ipv4/conf/vxlan-sm/rp_filter
echo 2 > /proc/sys/net/ipv4/conf/vxlan-sm/rp_filter
ip addr add 192.168.112.1/24 dev vxlan-sm
cat /proc/sys/net/ipv4/conf/vxlan-sm/rp_filter

In this case, the final value of /proc/sys/net/ipv4/conf/vxlan-sm/rp_filter is 2.
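
A fixed delay depends on timing, so one alternative is to write the value, wait briefly, read it back, and write again if it was overwritten. The sketch below illustrates that idea under stated assumptions: the function name set_rp_filter_loose, its retry parameters, and the PROC_CONF override are hypothetical, and this is not the fix adopted by Submariner. It also assumes that whatever resets the value does so shortly after the interface is created, which this issue does not confirm.

```shell
# Sketch only: set_rp_filter_loose and PROC_CONF are illustrative names.
# PROC_CONF can be overridden for testing; it defaults to the real sysctl tree.
PROC_CONF="${PROC_CONF:-/proc/sys/net/ipv4/conf}"

# set_rp_filter_loose IFACE [ATTEMPTS] [DELAY_SECONDS]
# Writes 2 (loose mode) to IFACE's rp_filter, waits DELAY_SECONDS, then reads
# the value back. Repeats up to ATTEMPTS times if the value did not stick.
# Returns 0 once the read-back value is 2, 1 otherwise.
set_rp_filter_loose() {
    local iface="$1" attempts="${2:-10}" delay="${3:-0.1}"
    local path="${PROC_CONF}/${iface}/rp_filter"
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # The write may fail if the interface does not exist yet; keep trying.
        { echo 2 > "$path"; } 2>/dev/null || true
        sleep "$delay"
        if [ "$(cat "$path" 2>/dev/null)" = "2" ]; then
            return 0
        fi
        i=$((i + 1))
    done
    echo "rp_filter on ${iface} did not stay at 2 after ${attempts} attempts" >&2
    return 1
}
```

In the reproduction above, this would take the place of the echo 2 line, for example set_rp_filter_loose vxlan-sm 10 0.1. Unlike the plain echo, it reports failure when the value does not stay at 2.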

Note: The same behavior is seen when the vxlan interface is created with nmcli.

nmcli connection add type vxlan con-name vxlan-nm ifname vxlan-nm dev ens5 id 142
cat /proc/sys/net/ipv4/conf/vxlan-nm/rp_filter
echo 2 > /proc/sys/net/ipv4/conf/vxlan-nm/rp_filter
nmcli connection modify vxlan-nm ipv4.addresses 192.168.142.1/24 ipv4.method manual
cat /proc/sys/net/ipv4/conf/vxlan-nm/rp_filter

Here /proc/sys/net/ipv4/conf/vxlan-nm/rp_filter ends up as 1; adding a small delay before setting rp_filter keeps the value at 2.

skitt commented 1 year ago

The default is set to 1 in /usr/lib/sysctl.d/50-redhat.conf. Since gateway nodes are supposed to be pretty much dedicated to the tunnel, I wonder if it would be OK to set the default to 2, not by changing the configuration file on the host, but by writing 2 to the net.ipv4.conf.default.rp_filter sysctl. That way new interfaces would have rp_filter set to 2 by default.
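
As a rough sketch of what such a runtime change could look like: writing to conf/default/rp_filter under /proc/sys is equivalent to sysctl -w net.ipv4.conf.default.rp_filter=2 and does not touch any file under /usr/lib/sysctl.d. The function name set_default_rp_filter and the PROC_CONF override below are hypothetical, used only to illustrate the idea. Printing the previous value is an added convenience so that a caller could restore it later.

```shell
# Sketch only: set_default_rp_filter and PROC_CONF are illustrative names.
# PROC_CONF can be overridden for testing; it defaults to the real sysctl tree.
PROC_CONF="${PROC_CONF:-/proc/sys/net/ipv4/conf}"

# set_default_rp_filter VALUE
# Writes VALUE to conf/default/rp_filter, the template that newly created
# interfaces inherit, and prints the previous value.
# Equivalent to: sysctl -w net.ipv4.conf.default.rp_filter=VALUE
set_default_rp_filter() {
    local value="$1" path="${PROC_CONF}/default/rp_filter" previous
    previous="$(cat "$path")" || return 1
    echo "$value" > "$path" || return 1
    echo "$previous"
}
```

The change applies to every interface created afterwards on the host, not only to vx-submariner, and it does not persist across reboots.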

sridhargaddam commented 1 year ago

The default is set to 1 in /usr/lib/sysctl.d/50-redhat.conf. Since gateway nodes are supposed to be pretty much dedicated to the tunnel, I wonder if it would be OK to set the default to 2, not by changing the configuration file on the host, but by writing 2 to the net.ipv4.conf.default.rp_filter sysctl. That way new interfaces would have rp_filter set to 2 by default.

But it would affect all the interfaces created on the Host, including the interfaces created by the CNI/other-applications which may not be desirable.

skitt commented 1 year ago

But it would affect all the interfaces created on the Host, including the interfaces created by the CNI/other-applications which may not be desirable.

Right, I was thinking that by the time the gateway agent started, the host’s interfaces should have settled down, but that might well not be the case.