opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

MultiWAN reply-to not working as expected in some instances #5226

Closed rjdza closed 2 years ago

rjdza commented 3 years ago

Describe the bug

My version is OPNsense 21.7.2_1-amd64

I'm not sure how long this has been an issue, but I can confirm that it also exists on recent pfSense versions (at least it did until I left pfSense). I have a workaround for OPNsense, but none for pfSense.

We have 3 WAN links, 2 x Fibre connections (F1 and F2) and 1 x Microwave connection (MW01). I do not think WAN type has anything to do with the problem, but am not sure.

We have 4 firewalls (2 x OPNsense, 2 x pfSense), but are reducing them to 2. We also have a number of Linux machines. The problem presents on the OPNsense and pfSense firewalls, but not on the Linux machines.

F1 and F2 each give us a number of public IP addresses to use, which means each firewall has its own public IP address and talks to an upstream GW that has a public IP address.

MW01 only has a single IP address, and that IP address is assigned to the vendor-provided and vendor-managed MikroTik router at the edge of the network. All hosts are assigned a private IP address (from the 10.0.0.0/24 range) and use the MikroTik (10.0.0.1) as a default GW. Various ports are forwarded from the MikroTik to the machines inside on the 10.0.0.x IP addresses. All ports not specifically forwarded to a host are forwarded to a CARP IP address in the 10.0.0.x range. This CARP address used to be assigned to the pfSense machines; it is now assigned to the OPNsense machines.

The Problem

When the F1 or F2 gateway is set as the default, outside hosts can connect to the firewalls' F1 and F2 addresses, and the firewalls can connect out using their F1 and F2 IP addresses. However, ports on MW01 are unreachable, and connectivity via the MW01 IP addresses fails.

If I set the MW01 gateway as the default, F1 and F2 addresses do not work in or out, but MW01 works as expected.

Traffic dumps show that when F1 or F2 is set as the default GW, traffic that comes in over MW01 tries to leave via the default GW (F1 or F2). F1 and F2 themselves work fine: when F1 is the default GW, traffic that comes in on F1 leaves on F1, and traffic that comes in on F2 leaves on F2.

When MW01 is the default GW, all traffic tries to leave over MW01, regardless of which interface it came in on.

The pfSense boxes have the exact same problem. The Linux boxes work without a problem.

The Workaround

The workaround uses firewall rules to set the reply-to field.

I have given the fibre GWs a higher priority than the microwave link so that the firewall will never fail over to MW01 as long as a single fibre link is up. This is important, because the fibre links will not work if the microwave link is the default GW.

I then have to have two versions of each rule: one for the F1 and F2 IPs, and a separate one for the MW01 IPs. The MW01 rule has the reply-to field set to use the MW01 GW as the reply-to gateway. With this set, everything seems to work as expected, and connections to all IPs and ports work all the time.
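The effect of the reply-to setting described above can be modelled roughly in a few lines of Python. This is only a sketch for reasoning about the symptom, not pf code and not how OPNsense implements anything. The names `State` and `reply_gateway` and the gateway labels are made up for illustration, and the model assumes that a connection state created by a rule with reply-to remembers that gateway, while a state without it falls back to the default route.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class State:
    """Hypothetical stand-in for a firewall connection state."""
    ingress: str               # interface the connection arrived on
    reply_to: Optional[str]    # gateway pinned by a reply-to rule, if any


def reply_gateway(state: State, default_gw: str) -> str:
    """Return the gateway the reply would use under the assumption above."""
    if state.reply_to is not None:
        return state.reply_to
    return default_gw


# F1 is the default gateway in all three cases (labels are illustrative).
f2_conn = State(ingress="F2", reply_to="F2_GW")
mw_conn_plain = State(ingress="MW01", reply_to=None)
mw_conn_pinned = State(ingress="MW01", reply_to="MW01_GW")

for conn in (f2_conn, mw_conn_plain, mw_conn_pinned):
    print(conn.ingress, "->", reply_gateway(conn, default_gw="F1_GW"))
```

In this model the MW01 connection without reply-to leaves via the F1 gateway, which is what the traffic dumps show, and the MW01 connection with reply-to set leaves via the MW01 gateway, which is what the workaround achieves.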

Problems with the workaround

This is fairly obvious, but it's included for completeness. While the workaround works, it's fairly intrusive, as it makes using firewall groups and floating rules cumbersome: you need to have two rules for each effective rule, and the order of the rules is critical (MW01 rules need to be ahead of their F1/F2 counterparts).

The large number of rules adds complexity and increases the chances of errors and misconfigurations.
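Why the rule order matters can be shown with a small Python sketch of first-match evaluation. It assumes the rules are evaluated top to bottom and the first matching rule wins, and that the F1/F2 rule is broad enough to also match MW01 traffic. The `Rule` class, the rule names, and the addresses 10.0.0.10 and 203.0.113.5 are hypothetical and not taken from the actual configuration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical simplified firewall rule: destination match only."""
    name: str
    dst: str                        # destination network in CIDR notation
    reply_to: Optional[str] = None  # gateway to pin for replies, if any


def first_match(rules: Sequence[Rule], dst_ip: str) -> Optional[Rule]:
    """Return the first rule whose destination network contains dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if addr in ipaddress.ip_network(rule.dst):
            return rule
    return None


MW_RULE = Rule("allow-mw01", "10.0.0.0/24", reply_to="MW01_GW")
FIBRE_RULE = Rule("allow-f1f2", "0.0.0.0/0")  # broad rule, no reply-to

mw_first = [MW_RULE, FIBRE_RULE]      # order used by the workaround
fibre_first = [FIBRE_RULE, MW_RULE]   # MW01 rule is shadowed

print(first_match(mw_first, "10.0.0.10"))
print(first_match(fibre_first, "10.0.0.10"))
```

With the MW01 rule first, traffic to the 10.0.0.x address picks up the reply-to gateway. With the order reversed, the broad rule matches first and the reply-to setting is never applied.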

FWIW, pfSense doesn't have the ability to set the reply-to field (at least I couldn't find it), so there is no workaround there.

To Reproduce

Steps to reproduce the behavior:

Simple tests were unable to reproduce the behaviour, but my ability to test is limited right now. I'm trying to source a Mikrotik and set up some servers in a test lab, but am unsure when that will be.

For now I'm hoping I can provide information and help with troubleshooting.

Expected behavior

All IP addresses on a WAN interface can connect to the Internet and can be connected to from the Internet, assuming all the necessary firewall rules are in place.

Describe alternatives you considered

I've manually set the reply-to field after locking the priorities of the interfaces.

Screenshots

Nothing right now.

Relevant log files

I'm not sure what log files would be useful.

Additional context

None yet.

Environment

Versions:
OPNsense 21.7.2_1-amd64
FreeBSD 12.1-RELEASE-p20-HBSD
OpenSSL 1.1.1l 24 Aug 2021

Dell Optiplex 7010 Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz (4 cores)

OPNsense-bot commented 2 years ago

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.