opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

Dual-WAN setup can access the FW over the primary (active) WAN, not the secondary (standby) WAN #3670

Closed: drivera73 closed this issue 5 years ago

drivera73 commented 5 years ago

DISCLAIMER: This bug seems to be similar to this issue, which in turn is based on this forum post, and the fix may be discussed therein, but it's in German so I'm unable to verify.

Describe the bug

I have a dual-WAN setup with failover properly working, but I've never been able to access the firewall or forwarded resources within over the standby connection - only over the active connection.

To Reproduce

Steps to reproduce the behavior:

  1. Set up a firewall with two WAN networks, set up in failover, both gateways marked upstream, with priorities 1 and 2
  2. Define some services to forward to DMZ or LAN servers, with the appropriate firewall rules grouping together both interfaces, to forward the traffic as necessary.
  3. Access those services over the active connection, to make sure they're working OK (they should be)
  4. Attempt to access those services over the standby connection while the primary is still online, only to see connection attempts fail

Expected behavior

The services are just as accessible via the standby circuit as via the primary one.

Environment

OPNsense 19.7.2-amd64 (with patch 7bfadb2)
FreeBSD 11.2-RELEASE-p12-HBSD
OpenSSL 1.0.2s 28 May 2019
Protectli FW4B with 8GB RAM and 256GB SSD

mimugmail commented 5 years ago

To me it sounds like 7bfadb2 should fix this: https://forum.opnsense.org/index.php?topic=13832.0

drivera73 commented 5 years ago

As you can see above, it's already applied, and it does not. Any other ideas?

drivera73 commented 5 years ago

To clarify: if the primary circuit goes down, and the system goes into failover (i.e. the secondary takes over), everything works as expected.

However, there's no logical reason in my mind that this wouldn't work the same way while both circuits are up.

mimugmail commented 5 years ago

Is "Disable force gateway" set under Firewall : Settings : Advanced?

drivera73 commented 5 years ago

It is not set. Also, this has always been an issue since I installed and configured the firewall last year.

drivera73 commented 5 years ago

It seems that what's missing is a "routeback" feature/rule: i.e. route responses for traffic coming in over an interface out that same interface (where applicable?) ... I don't even know if BSD has that, but that's what seems to be missing here.

I can see the traffic arrive at the firewall over the secondary, but there's never a response sent out - i.e. the firewall swallows it.

drivera73 commented 5 years ago

OK, I can confirm that's what's happening:

So this looks like it's either a bug or a bum rule. Checking my rules, I'm not forcing traffic to go out over any specific interface per se, with some very specific exceptions that only match traffic initiated from specific IPs in the internal network, outbound to the internet ...

Attached are dumps of my filter rules and NAT rules, with IPs redacted as follows:

IPv6 is not in use anywhere.

Perhaps you can double-check me and smack me around if I did anything stupid and this isn't a bug? :)

Thanks!

fichtner commented 5 years ago

Normally you'd have a "reply-to" in the rule handling your inbound traffic for it to be returned on the right interface. From your description it seems that reply-to is not employed, likely by having another inbound rule overwriting it with a simple pass.
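
To illustrate the effect described here, the following is a toy model in Python, not pf itself. It assumes igb2 is the primary WAN holding the default route and igb3 is the standby; the thread does not say which interface is which, and the `State` class and `reply_egress` function are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    """A simplified stand-in for a pf state entry of one inbound connection."""
    in_iface: str                  # interface the first packet arrived on
    reply_to_iface: Optional[str]  # set only if the matching rule carried reply-to


def reply_egress(state: State, default_route_iface: str) -> str:
    """Return the interface a reply packet would leave through in this model."""
    if state.reply_to_iface is not None:
        # reply-to pins replies to the interface named in the rule.
        return state.reply_to_iface
    # Without reply-to, replies simply follow the routing table.
    return default_route_iface


# Assumption: igb2 = primary WAN (default route), igb3 = standby WAN.
plain = State(in_iface="igb3", reply_to_iface=None)
pinned = State(in_iface="igb3", reply_to_iface="igb3")

print(reply_egress(plain, "igb2"))   # igb2: reply leaves via the primary WAN
print(reply_egress(pinned, "igb2"))  # igb3: reply returns the way it came in
```

In this model, a connection arriving on the standby WAN through a plain pass rule has its replies sent out the primary WAN, which would match the symptom reported above.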

drivera73 commented 5 years ago

Looking through this manual page, perhaps this is what's missing:

 reply-to
 The reply-to option is similar to route-to, but routes packets that
 pass in the opposite direction (replies) to the specified interface.
 Opposite direction is only defined in the context of a state entry,
 and reply-to is useful only in rules that create state.  It can be
 used on systems with multiple external connections to route all
 outgoing packets of a connection through the interface the incoming
 connection arrived through (symmetric routing enforcement).

I see a couple of rules for that to handle DHCP/BOOTPS, and one for a syslog path that I have enabled for my ADSL modem:

 @99 pass in quick on igb2 reply-to (igb2 CABLE_IP_BLOCK.1) inet proto udp from any port = bootps to (self:11) port = bootpc keep state label "4115317788804c8d0a156ff1a259440d"
 @100 pass in quick on igb3 reply-to (igb3 ADSL_IP_BLOCK.1) inet proto udp from any port = bootps to (self:11) port = bootpc keep state label "0b0d328860d6b993afea478d4411e3b5"
 @101 pass in quick on igb3 reply-to (igb3 ADSL_IP_BLOCK.1) inet proto udp from to port = syslog keep state label "347c5e876aa4b7958230ff62eccdbf4c"

It seems to me that every interface could (should?) have its own reply-to rule added for all traffic...

Cheers...

drivera73 commented 5 years ago

> Normally you'd have a "reply-to" in the rule handling your inbound traffic for it to be returned on the right interface. From your description it seems that reply-to is not employed, likely by having another inbound rule overwriting it with a simple pass.

Right.... I came to the same conclusion (more-or-less), but:

  1. I don't know which of my inbound rules would override that ... we can pick any one as an example - say, tcp/2222 or tcp/22 or tcp/32400 ... and analyze to see what's happening for it
  2. This is something that should be handled transparently by the configuration engine

So...thoughts?

drivera73 commented 5 years ago

I think I may have an idea of what's happening.

I have both WAN interfaces added to a "firewall group" (Firewall : Groups), and that's how I'm defining the incoming rules for services being exported out to the internet: same services available regardless of which circuit is used, all configured in one place, etc. Super convenient :)

However, these are the services that don't seem to be getting a reply-to rule generated that would allow the correct (route-back) routing behavior to occur.

However, there's a rule explicitly set on the ADSL interface for the syslog service (udp/514, to capture log messages from the ADSL Modem), and this rule does get its correct reply-to rule added.

It would appear that the configuration engine is neglecting to examine rules defined for firewall groups to see if any reply-to rules need to be defined for them.

Thoughts?

I have to go now, but when I come back (or maybe tomorrow) I might try manually duplicating all the same group-related rules on the interfaces independently, and see if that solves anything. If it does, then we'll have our culprit!

If not, then it's back to square one :)

Cheers!

AdSchellevis commented 5 years ago

Have you seen the note in https://docs.opnsense.org/manual/firewall_groups.html?

drivera73 commented 5 years ago

This is indeed the problem.

I went over to my NAT port forwarding rules and replaced all references to the group with the individual interfaces involved, and voilà! Everything worked as intended, except for ping, which I suspect is a problem with my ADSL modem itself, since SSH was also having a weird issue (if I switch the forwarding from tcp/22 to tcp/2222, everything works fine).

The question is: why?

This seems like a defect to me - it's the same thing to process a group's interface list while rendering the ruleset as it is to replace the group with each of its member interfaces when generating the NAT/rules, right?

AdSchellevis commented 5 years ago

if it's the note mentioned in the docs, it's not a defect. you can't add reply-to in group rules, since it's one rule effective for multiple physical interfaces.
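
The constraint can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `GATEWAYS` and `GROUPS` tables and the `reply_to_target` function are hypothetical, the group name `WAN_GROUP` is invented, and the gateway names are the redaction placeholders from the rule dump earlier in the thread. None of this is OPNsense code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data: one gateway per physical WAN interface.
GATEWAYS: Dict[str, str] = {
    "igb2": "CABLE_IP_BLOCK.1",
    "igb3": "ADSL_IP_BLOCK.1",
}

# Hypothetical interface group containing both WANs.
GROUPS: Dict[str, List[str]] = {"WAN_GROUP": ["igb2", "igb3"]}


def reply_to_target(rule_iface: str) -> Optional[Tuple[str, str]]:
    """Return the single (interface, gateway) pair a rule could use for
    reply-to, or None when no unambiguous pair exists."""
    members = GROUPS.get(rule_iface, [rule_iface])
    candidates = [(m, GATEWAYS[m]) for m in members if m in GATEWAYS]
    if len(candidates) == 1:
        return candidates[0]
    # A single rule holds a single reply-to target; a group with several
    # gateways offers no one correct choice.
    return None


print(reply_to_target("igb3"))       # ('igb3', 'ADSL_IP_BLOCK.1')
print(reply_to_target("WAN_GROUP"))  # None
```

A rule bound to one physical interface resolves to exactly one gateway, while a rule bound to the group would need a different reply-to target depending on which member the packet arrived on.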

drivera73 commented 5 years ago

But you can identify the interfaces that are members of that group and generate the requisite reply-to rules...

This is why I think it's a defect: it's a job that (I think) can and should be done, but isn't being done.

AdSchellevis commented 5 years ago

be my guest, we're talking about interface groups as documented in https://www.freebsd.org/cgi/man.cgi?query=ifconfig and this should (most likely) be contributed upstream

drivera73 commented 5 years ago

I don't mean at the BSD level.

OPNsense must generate the PF rules based on data it stores. Thus, there must also be a moment where the group rules are processed from said stored data to generate the necessary PF rules.

It's at that moment that OPNsense can determine whether reply-to rules should be generated, and do so accordingly.

That's what I'm referring to.

Cheers.
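
The idea proposed in this comment could be sketched roughly as follows. This is a hypothetical illustration in Python, not how OPNsense renders its ruleset: the `GATEWAYS` and `GROUPS` tables, the group name `WAN_GROUP`, and the `expand_rule` function are all assumptions, and the gateway names are the redaction placeholders from the earlier rule dump rather than real addresses.

```python
from typing import Dict, List

# Hypothetical stored configuration.
GATEWAYS: Dict[str, str] = {
    "igb2": "CABLE_IP_BLOCK.1",
    "igb3": "ADSL_IP_BLOCK.1",
}
GROUPS: Dict[str, List[str]] = {"WAN_GROUP": ["igb2", "igb3"]}


def expand_rule(iface_or_group: str, proto: str, port: int) -> List[str]:
    """Render one pass rule per physical interface, adding reply-to
    wherever the interface has a known gateway."""
    members = GROUPS.get(iface_or_group, [iface_or_group])
    rules = []
    for iface in members:
        gateway = GATEWAYS.get(iface)
        reply_to = f" reply-to ({iface} {gateway})" if gateway else ""
        rules.append(
            f"pass in quick on {iface}{reply_to} inet proto {proto} "
            f"from any to any port {port} keep state"
        )
    return rules


for rule in expand_rule("WAN_GROUP", "tcp", 2222):
    print(rule)
```

This amounts to automating the manual workaround of defining the same rule on each member interface. It also means the group no longer appears in the rendered rule, which is the "unravelling" of groups that the maintainers say they do not want to do.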

AdSchellevis commented 5 years ago

can we close this issue now? we're not going to remove interface groups and try to reverse engineer something else in (which has other downsides). Limitations are known and documented.

drivera73 commented 5 years ago

I'm not asking for groups to be removed. That'd be silly.

I'm asking if it's possible for OPNsense to add the reply-to rules as part of the group rules' rendering and processing. It does it for individual interfaces so it definitely seems possible...

If it's possible then this ticket should either remain open or be replaced by a more specific one.

If not, then yes it should be closed but at least there should be a stronger justification beyond "the limitation is documented" ... I.e.: is this a limitation b/c it's impossible to implement? Or is it a limitation from the originating codebase that was never questioned (and thus could be fixed eventually)?

fichtner commented 5 years ago

Just for perspective: in the old days, the "BUGS" section of a command-line utility's manual page was the place to document known limitations and bugs. Not because they had to be fixed, but to make people aware of them. I don't think that practice has changed, and it isn't an excuse for not fixing something or not making it "more visible". It's just a fact of the software.

AdSchellevis commented 5 years ago

I'm not going to spend more time on this ticket, already explained it doesn't work, @drivera73 is free to explain how he's going to do this, without trying to unravel groups in rules (which we are not going to do). Otherwise I would advise him to close the issue.

drivera73 commented 5 years ago

I remember that, and have done that myself on occasion.

In this case I'm simply asking whether this can be fixed on the OPNsense end, and if so, whether it should be (and thus whether to keep a ticket open to track it).

Or not...

99 commented 5 years ago

Whoever mentioned @ 99 in their message can you please delete it, thank you

drivera73 commented 5 years ago

> Whoever mentioned @ 99 in their message can you please delete it, thank you

Sorry ... it's from line numbers pasted from a rule dump :)

drivera73 commented 5 years ago

> I'm not going to spend more time on this ticket, already explained it doesn't work, @drivera73 is free to explain how he's going to do this, without trying to unravel groups in rules (which we are not going to do). Otherwise I would advise him to close the issue.

Aren't you the ray of sunshine XD

I'll close it for now, till I get time to have a look and see if I can pitch in a solution.

Thanks for your insights.