Closed sjjh closed 11 months ago
Quick test adding a second gateway...
The following input errors were detected:
The gateway IP address "10.3.0.2" already exists.
If you are talking about unspecified dynamic gateways delivered by the ISP at runtime, I'm not even sure how and where to present that.
Sorry, I did not test it with a manual config. It's about DHCP (as stated in the initial post). In most cases where a second gateway is created using DHCP, the connection will probably be established very quickly, so the issue would be noticeable right away. I could therefore imagine presenting the error message on the gateway screen (System > Gateways > Single). Even if the problem occurs later, I would expect someone to look at the gateway screen sooner or later and see the error message there. Additionally, I believe a log entry might be helpful.
PPPoE and DHCP are distinct, but routers are provided in both cases. These routers are written to files on the disk:
# ls /tmp/*_router
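To see at a glance which router address each interface was handed, the files can be printed side by side. This is only a minimal sketch: `show_routers` is a hypothetical helper name, and it assumes each `*_router` file holds a single address on its first line.

```sh
#!/bin/sh
# show_routers [DIR]: print "<interface> <router address>" for every
# *_router file in DIR (default: /tmp).
# Assumption: the address is the first line of each file.
show_routers() {
    dir=${1:-/tmp}
    for f in "$dir"/*_router; do
        [ -f "$f" ] || continue          # glob matched nothing
        printf '%s %s\n' "$(basename "$f" _router)" "$(head -n 1 "$f")"
    done
}

# Usage on the firewall:
#   show_routers
```

Two interfaces printing the same address would be the situation discussed in this issue.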
To my knowledge the problem first and foremost is that you cannot simultaneously push traffic through both WANs if they have the same gateway address. The second one is dead. Default gateway switching still works, but since the single point of failure is your ISP gateway the failover point is rather moot.
So typically you only use such a setup if you want to bundle two connections in order to use doubled bandwidth. All these constraints may or may not apply to the case at hand. My reluctance here is adding an error to a functional setup as well. Some people don't mind or haven't noticed. Not sure where the sweet spot for this request shall be.
Cheers, Franco
In our setup we are using a 1 Gbit/s link for "most" traffic and a 30 Mbit/s link dedicated to VoIP traffic only (so no bundling to double the bandwidth). The gateway to use is selected by firewall / NAT rules (the internal VoIP traffic comes from a separate VLAN). This seems to work, and it does not look as if one interface is completely dead. However, we irregularly see that after the web gateway goes down (e.g. after connection breakages or reboots), the web traffic uses the VoIP gateway and does not come back to the web WAN it should use. (The web gateway is marked as upstream and default, and has a higher priority, i.e. a lower number, than the VoIP gateway.)
I understood (but might be wrong due to my limited understanding) that a setup with two gateways sharing a single upstream gateway address is not supported at all. Thus I thought an error message would be helpful (it would at least have saved me quite a few hours of research on the net). If one gateway is dead (even if people do not notice), it still sounds sensible to me to show an error message to make them aware of that fact. I, at least, was not aware of the root cause, and it cost me quite some time to research.
Probably obvious, but in case it helps, yes, both files contain the same IP address:
$ ls /tmp/pppoe*_router
/tmp/pppoe1_router /tmp/pppoe2_router
$ diff /tmp/pppoe1_router /tmp/pppoe2_router
$
Do you have a gateway group set? Loss and delay triggers are broken currently, see #6231
Cheers, Franco
No, no gateway group is used. We also disabled gateway monitoring: since we have no fallback anyway, it adds no value (and could potentially only lead to false positives).
But you are using default gateway switching? I’m not sure how that works without proper monitoring.
no, no switching at all. Just two gateways, for specific traffic:
Are both set to upstream gateway? Can you explain "web-gateway goes down" a little more?
Thanks, Franco
Only the "web" gateway is set to upstream.
With "web-gateway goes down" I mean occasions such as power loss of the firewall, a disconnected cable, a reboot of the firewall, taking the gateway down in software, or the gateway being forced down by a (false positive) gateway monitoring result: all situations in which the interface is not up. Not in all, but in some of these cases we then have the issue described above: all traffic uses only the VoIP gateway and sticks there, even when both gateways are available again. My expectation was that as soon as the web gateway comes up again, it would be used again (due to priority, marking as upstream, ...), but it is not. Often the only thing that helps is taking the VoIP gateway down; after a while the traffic then switches back to the web gateway.
Ok, when the traffic is stuck on the VOIP WAN will this resolve it?
# /usr/local/etc/rc.filter_configure
If this doesn't work you could also try
# /usr/local/etc/rc.routing_configure
But I suspect the first one will work.
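To tell whether either script actually changed anything, it may help to record where the default route points before and after running it. The sketch below parses the output of FreeBSD's `route -n get default`. `default_route_summary` is a hypothetical helper name, and it assumes the misrouted web traffic follows the system default route rather than a policy rule.

```sh
#!/bin/sh
# default_route_summary: read `route -n get default` output on stdin and
# print "<interface> <gateway>" of the current default route.
default_route_summary() {
    awk '$1 == "gateway:"   { gw = $2 }
         $1 == "interface:" { ifn = $2 }
         END                { print ifn, gw }'
}

# Usage on the firewall (as root):
#   route -n get default | default_route_summary
#   /usr/local/etc/rc.filter_configure
#   route -n get default | default_route_summary
```

If the interface printed after the reload is still the VoIP WAN, the reload did not move the default route back.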
Cheers, Franco
Will try, when I experience the problem next time, and report back.
So, after maintenance work by our ISP last night, which cut off the uplink, we had the same issue this morning: the web traffic was using the wrong gateway.
I tried both /usr/local/etc/rc.filter_configure and /usr/local/etc/rc.routing_configure, and neither worked. I also tried reconnecting both gateways in the web UI under Interfaces > Overview > reload, which did not work either. Only editing the gateways under System > Gateways > Single (enabling the monitoring and reapplying the changes) helped to bring traffic back to the correct gateway.
That seems to indicate gateway monitoring (dpinger) plays a bigger role in the decision here. It would appear that dpinger is perhaps "stuck" on the second link. Have you tried to disable host routes for the gateways?
Can you share the gateway log during the event and fix?
The development version has improved gateway monitor handling and recovery, but perhaps due to the same gateway IP this might still be an OS problem of sorts.
Cheers, Franco
Have you tried to disable host routes for the gateways?
Sorry, not sure. Can you point me to the setting you are talking about?
Can you share the gateway log during the event and fix?
root@fw:/var/log/gateways # ls -l
total 184
-rw------- 1 root wheel 10557 Mar 30 13:55 gateways_20230330.log
-rw------- 1 root wheel 57752 Mar 31 23:58 gateways_20230331.log
-rw------- 1 root wheel 99903 Apr 1 20:56 gateways_20230401.log
-rw------- 1 root wheel 3875 Apr 24 12:35 gateways_20230424.log
-rw------- 1 root wheel 115 May 23 20:08 gateways_20230523.log
-rw------- 1 root wheel 932 Jun 22 07:53 gateways_20230622.log
lrwxr-x--- 1 root wheel 39 Jun 22 08:01 latest.log -> /var/log/gateways/gateways_20230622.log
root@fw:/var/log/gateways # cat latest.log
<12>1 2023-06-22T07:51:55+02:00 fw.example.com dpinger 29060 - [meta sequenceId="1"] send_interval 1000ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% alarm_hold 10000ms dest_addr 8.8.8.8 bind_addr n.n.n.1 identifier "GW_INTERNET_WAN_PPPOE "
<12>1 2023-06-22T07:51:55+02:00 fw.example.com dpinger 30434 - [meta sequenceId="2"] send_interval 1000ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% alarm_hold 10000ms dest_addr 8.8.4.4 bind_addr n.n.n.2 identifier "GW_VOIP_WAN_PPPOE "
<12>1 2023-06-22T07:52:48+02:00 fw.example.com dpinger 29060 - [meta sequenceId="3"] exiting on signal 15
<12>1 2023-06-22T07:52:48+02:00 fw.example.com dpinger 30434 - [meta sequenceId="4"] exiting on signal 15
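The dpinger start-up lines are long, so a small filter can reduce them to the fields that matter here: which monitor address and which bind address each gateway uses. This is a rough sketch that only relies on the `dest_addr`, `bind_addr` and `identifier` keywords visible in the log above. `dpinger_monitors` is a hypothetical helper name.

```sh
#!/bin/sh
# dpinger_monitors: read a gateway log on stdin and print
# "<identifier> <dest_addr> <bind_addr>" for every dpinger start-up line.
dpinger_monitors() {
    awk '
    /dest_addr/ {
        dest = ""; bind = ""; id = ""
        for (i = 1; i < NF; i++) {
            if ($i == "dest_addr")  dest = $(i + 1)
            if ($i == "bind_addr")  bind = $(i + 1)
            if ($i == "identifier") { id = $(i + 1); gsub(/"/, "", id) }
        }
        print id, dest, bind
    }'
}

# Usage on the firewall:
#   dpinger_monitors < /var/log/gateways/latest.log
```

For the log above this lists one line per gateway, which makes it easy to confirm that each monitor binds to its own local address.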
It's a setting for each individual gateway: "Disable Host Route"
So after 07:52:48 the first line was up being used again? It's a bit strange since rc.routing_configure will also restart all monitors.
Cheers, Franco
It's a setting for each individual gateway: "Disable Host Route"
Sorry, overlooked that one. It's not disabled. Shall I disable it for both and check if it makes a difference next time? (If there is a next time: due to the ongoing issues we are currently considering abandoning the second gateway and just using one, as long as bandwidth permits.)
So after 07:52:48 the first line was up being used again?
Yes, after the mentioned steps in my above post the gw internet WAN worked again as expected.
It's relatively strange about the fix with the "apply", in a nutshell the GUI is calling /usr/local/etc/rc.routing_configure. Just to make sure gateway monitor is now disabled (option checked).
Yes you can check disable host route setting, but it only makes sense if monitor itself is enabled (option unchecked).
Cheers, Franco
Just to make sure gateway monitor is now disabled (option checked).
It initially was disabled (option checked), I enabled it (to have a change I could apply), and then disabled it again (and applied again), for both gateways respectively.
Yes you can check disable host route setting, but it only makes sense if monitor itself is enabled (option unchecked).
Which it is not (the monitor is disabled). I'll enable it nevertheless, since it cannot hurt, and we might see next time whether it makes any difference.
FYI: We removed the second gateway (as it is not supported, as stated in the initial posting) to erase this as a root cause for other connection problems. Thus I will not be able to test/debug this any further. The initial bug/feature request is IMHO nevertheless valid, thus leaving this bug open. :)
We have something similar with 3 WAN links to the same ISP and currently with the same gateway address (this was not always the case):
We use gateway rules to enforce traffic from our VOIP server over the WAN_VOIP interface. We use similar rules for assigning certain traffic to certain interfaces. Remaining traffic goes over a gateway group balancing WAN and WAN2.
I was honestly not aware this was unsupported.
Things that I have noticed that might not be working:
We are currently in contact with the ISP to see if we can get different gateway IPs assigned.
I am happy to provide some testing, although it is a production system, so I am wary of anything that might affect client connectivity.
I have now disabled gateway monitoring on WAN2/WAN_VOIP and will see what the impact (if any) is on the gateway groups.
Overlapping networks break normal (destination) routing constraints, this is an issue on most platforms. It's like instructing the mailman the same address is located at different locations, in which case a letter might be delivered randomly.
In theory it should be possible to define virtual overlapping networks using FIBs (https://man.freebsd.org/cgi/man.cgi?query=setfib), but it comes with quite some constraints (the running application has to choose which virtual network it lives on). Unfortunately that's not a scenario that is easy to support from our end. If I'm not mistaken, the problem is similar on Linux but solvable using VRF (https://docs.kernel.org/networking/vrf.html), which probably has similar challenges.
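To make the FIB constraint concrete: with setfib(1), each command has to be started inside the routing table it should use. The wrapper below is only an illustration of that idea, not something OPNsense provides. `in_fib` is a hypothetical helper, and it assumes a FreeBSD kernel configured with more than one FIB (`sysctl net.fibs` shows the count).

```sh
#!/bin/sh
# in_fib FIB CMD [ARGS...]: run CMD with routing table number FIB via
# setfib(1). Returns 2 on a missing or non-numeric FIB argument.
in_fib() {
    [ $# -ge 2 ] || { echo "usage: in_fib FIB CMD [ARGS...]" >&2; return 2; }
    fib=$1
    shift
    case "$fib" in
        ''|*[!0-9]*) echo "invalid fib: $fib" >&2; return 2 ;;
    esac
    setfib "$fib" "$@"
}

# Usage on FreeBSD (as root), assuming FIB 1 exists:
#   sysctl net.fibs
#   in_fib 1 netstat -rn
#   in_fib 1 ping -c 3 8.8.8.8
```

This shows why the approach is hard to generalize: the choice of table is made per process at launch, not per destination.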
This issue has been automatically timed-out (after 180 days of inactivity).
For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.
If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.
Describe the bug
I can create a multi-WAN setup with two (PPPoE) gateways which get their IP addresses via DHCP from the ISP and both have the same upstream gateway IP. This setup is apparently not supported by FreeBSD (as multipath is disabled due to other issues), see: https://forum.opnsense.org/index.php?topic=34189.0 Although this setup is not supported, no error message or warning is shown in the Web GUI.
To Reproduce
Just create two gateways having the same upstream gateway address.
Expected behavior
It works or an error message is shown.
Describe alternatives you considered
Not using multi WAN.
Additional context
The resulting feature request would be to compare the upstream gateway IP addresses of the configured gateways. If two gateways have the same gateway IP address, show a prominent error message in the Web GUI and write an error to the log file.
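The requested check could look roughly like the following outside the GUI. This is a sketch under assumptions, not how OPNsense implements anything: it assumes the dynamic router addresses live in `/tmp/*_router` files with the address on the first line, and both `duplicate_gateways` and the `gateway-check` log tag are hypothetical names.

```sh
#!/bin/sh
# duplicate_gateways [DIR]: print "<address> <if1>,<if2>,..." for every
# router address that appears in more than one *_router file in DIR
# (default: /tmp). Prints nothing when all addresses are distinct.
duplicate_gateways() {
    dir=${1:-/tmp}
    for f in "$dir"/*_router; do
        [ -f "$f" ] || continue
        addr=$(head -n 1 "$f")
        [ -n "$addr" ] || continue       # skip empty files
        printf '%s %s\n' "$addr" "$(basename "$f" _router)"
    done | awk '
        {
            count[$1]++
            if (ifs[$1] == "") ifs[$1] = $2
            else               ifs[$1] = ifs[$1] "," $2
        }
        END { for (a in count) if (count[a] > 1) print a, ifs[a] }'
}

# Usage: log one error per shared address via logger(1)
#   duplicate_gateways | while read -r addr ifs; do
#       logger -p daemon.err -t gateway-check "gateway $addr is shared by: $ifs"
#   done
```

The same comparison could feed a warning in the Web GUI; where that warning should appear is the open question in this issue.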
Environment
OPNsense 23.1.7_3-amd64 FreeBSD 13.1-RELEASE-p7 OpenSSL 1.1.1t 7 Feb 2023