I'm experiencing the same issue on 21.7.4. What OPNsense version did you use when it worked?
I tried rolling back to 21.7.2, but the issue persists. Were the following steps correct or am I missing something?
opnsense-update -kr 21.7.2
opnsense-revert -r 21.7.2 base
opnsense-revert -r 21.7.2 opnsense
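For what it's worth, a quick way to confirm whether a revert actually took effect is to check the reported versions afterwards (a minimal sketch using the stock tooling, not from the thread; adjust package names as needed):
# confirm which versions are actually installed after the revert
opnsense-version                  # core / base version string
pkg info opnsense                 # version of the opnsense core package
pkg info unbound os-wireguard     # versions of the packages under suspicion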
This has worked for years in every version of OPNsense up to and including 21.7.3. I had AdGuard exclusively using Unbound on port 5353, and Unbound doing DoT over my WG tunnels. I had also used the same config before starting to use AdGuard on top of it.
Sorry, I don't know how to revert everything. Right now I'm still running AdGuard, but it is using TLS directly rather than talking to Unbound.
dnsleaktest always worked with this config, showing only the DoT servers being used.
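For context, the Unbound side of that setup boils down to a plain DoT forward-zone, roughly like the following unbound.conf sketch (illustrative only; on OPNsense this is generated by the GUI, and the upstream addresses are placeholders):
server:
  port: 5353                           # AdGuard forwards queries here
  tls-cert-bundle: /etc/ssl/cert.pem   # CA bundle used to validate the DoT upstreams
forward-zone:
  name: "."
  forward-tls-upstream: yes
  forward-addr: 9.9.9.9@853#dns.quad9.net          # placeholder DoT upstream
  forward-addr: 149.112.112.112@853#dns.quad9.net  # placeholder DoT upstream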
> up to and including 21.7.3
So this would indicate that my revert didn't work.
I've been trying to get this to work for the better part of last week without success. I might try to clean install an earlier version. Maybe even 21.1.
I guess there's a revert problem for you, since before 21.7.4 my logs showed all queries to port 853 (DoT) were going through WG.
I clean installed and updated to the following version and everything works as expected.
OPNsense 21.1.9_1-amd64
OpenSSL 1.1.1k 25 Mar 2021
Might the updated versions of unbound and os-wireguard be an issue? The working version has these installed:
os-wireguard: 1.7
unbound: 1.13.1
Is there a way to selectively upgrade to 21.7 to isolate the package upgrade that's causing this bug? I'll clone the live system as is and will gladly assist in debugging this.
Is opnsense-patch the right tool for the job? :hammer:
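For reference, a rough way to bisect this on a cloned box would be to take a config backup and then move individual pieces back one at a time (a sketch; which packages are worth reverting depends on what actually changed between releases):
# back up the running configuration first (standard OPNsense config location)
cp /conf/config.xml /root/config-before-bisect.xml
# revert a single package to the last known-good release, e.g. the core package
opnsense-revert -r 21.7.3 opnsense
# or apply / reverse an individual core commit
opnsense-patch 6d57215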
The situation in 21.7.5 is the same as in 21.7.4: I can't specify WG outgoing interfaces for Unbound; I have to specify WAN or it doesn't work.
This MIGHT be similar to https://forum.opnsense.org/index.php?topic=25327.0 (German)
https://github.com/opnsense/core/commit/76b8ae4490
Problem is you can't revert, as the package was removed/replaced. Just a guess, as this is the only DNS-relevant change in 21.7.4.
I don't see Unbound acting up because of dnspython v2 being used for alias resolving with regard to WireGuard as an outgoing interface. It's more likely that 6d5721565226f3 does something, but this report lacks relevant indicators: a pf rules comparison, the steps used to configure WireGuard for this, whether any custom rules are in the way, and whether the WireGuard interface is assigned or not.
> This MIGHT be similar to https://forum.opnsense.org/index.php?topic=25327.0 (German)
> Problem is you can't revert, as the package was removed/replaced. Just a guess, as this is the only DNS-relevant change in 21.7.4.
I don't think this is related.
@fichtner: no rules comparison or otherwise complex setup is needed to reproduce the issue. Here are my steps off the top of my head:
With this simple setup, Unbound uses the default route and not the selected interface.
> no rules comparison or otherwise complex setup is needed to reproduce the issue. Here are my steps off the top of my head:
Sure, but not doing it will not point us in the right direction.
> Sure, but not doing it will not point us in the right direction.
Are the steps I provided enough?
I'll try to update my setup to 21.7.3 this weekend. Can I opnsense-patch 6d57215 to check whether the commit is the issue?
It should unpatch, yes. Otherwise use
# opnsense-revert -r 21.7.3 opnsense
to move the core package to the working release.
I was running on 21.7.4 and tried to revert, but things were still broken, so I clean installed 21.1 and upgraded to 21.1.9, which is what I'm currently running.
My thinking was to upgrade to 21.7.3, the latest version where Unbound works properly, then apply the 6d57215 patch to verify it breaks things. Or isn't that helpful to you?
The good state of 21.1.9 works too, I suppose. We need to compare the /tmp/rules.debug output of a good and a bad configuration.
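One simple way to capture that comparison (a sketch; rules.debug is regenerated whenever the filter is reloaded):
# on the known-good configuration, snapshot the generated pf ruleset
cp /tmp/rules.debug /root/rules.debug.good
# after switching to the suspect configuration, compare the regenerated ruleset
diff -u /root/rules.debug.good /tmp/rules.debug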
This worked for me on 21.7.3, but broke with the .4 update (and .5).
Thanks to both of you for your help!
Chris
I clean installed 21.1.1 and performed the following steps:
1. Upgrade to 21.1.9_1 via serial console
2. Install the os-wireguard plugin (1.7)
3. Assign the WireGuard interface (WAN_VPN)
4. Add GW (WAN_VPN_GW)
5. Create outbound NAT rules (<source> -> <interface>):
   LAN -> WAN
   LAN -> WAN_VPN
6. Select the WAN_VPN interface as outgoing interface in Unbound and restart Unbound
With this config, Unbound uses the VPN tunnel.
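(Not part of the original steps, but a quick way to verify where the DoT traffic actually leaves the box, using the interface names from this thread, igb1 being WAN:)
# DoT queries should show up on the WireGuard interface ...
tcpdump -ni wg0 'tcp port 853'
# ... and nothing should leave via the WAN interface
tcpdump -ni igb1 'tcp port 853'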
Running opnsense-patch 6d57215 and rebooting changes the following part of /tmp/rules.debug:
# [prio: 100000]
pass out log route-to ( igb1 WAN6::WAN6:WAN6:WAN6:WAN6 ) from {(igb1)} to {!(igb1:network)} keep state allow-opts label "ba70dc1769980afe65cbac8576cee233" # let out anything from firewall host itself (force gw)
pass out log route-to ( igb1 WAN.WAN.WAN.1 ) from {(igb1)} to {!(igb1:network)} keep state allow-opts label "761a166383f941c76dbf2c76c9e2f241" # let out anything from firewall host itself (force gw)
pass out log route-to ( wg0 10.108.122.176 ) from {(wg0)} to {!(wg0:network)} keep state allow-opts label "1fbb62a93d5262e897b6c0f184574cba" # let out anything from firewall host itself (force gw)
to this:
# [prio: 100000]
pass out log route-to ( igb1 WAN.WAN.WAN.1 ) from {(igb1)} to {!(igb1:network)} keep state allow-opts label "761a166383f941c76dbf2c76c9e2f241" # let out anything from firewall host itself (force gw)
pass out log route-to ( igb1 WAN6::WAN6:WAN6:WAN6:WAN6 ) from {(igb1)} to {!(igb1:network)} keep state allow-opts label "ba70dc1769980afe65cbac8576cee233" # let out anything from firewall host itself (force gw)
And indeed, Unbound then uses WAN instead of WAN_VPN.
Running opnsense-patch 6d57215 again reverts /tmp/rules.debug and everything works again (I actually had to reset the FW state, even after a reboot).
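For reference, resetting the firewall state from the shell is just a pf state-table flush:
# flush all pf states so existing connections re-match the restored rules
pfctl -F states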
I'm gonna do an upgrade from 21.1.9 to the latest version and try to revert the patch to see if that works, too.
Reverting the patch works on the latest version:
1. Upgrade to 21.7.5
2. Run opnsense-patch 6d57215 (reverts the patch)
Do future releases automatically re-apply reverted patches, or is that something I need to keep in mind?
Great troubleshooting, well done!
@schnerring
> Add GW (WAN_VPN_GW)
And what if you assign this gateway as the upstream gateway in the WAN_VPN properties?
> And what if you assign this gateway as the upstream gateway in the WAN_VPN properties?
I do that on the WG local peer. I check Disable Routes and specify the gateway IP there. I can try, but the docs say to leave the interface unconfigured. Or is that something that changed?
I just looked at the code; it looks for gateways in the interface properties now, imho. Can you try to set the upstream gateway in the WAN_VPN properties (not only in the WG local peer)? On my test VM it starts to generate pf rules after that.
Yes, you're right. Statically configuring IPv4 on the interface fixes the issue. I don't really understand what's happening at a lower level when assigning the wg0 "device" to the WAN_VPN "interface". I just "learned" that after assigning wg0 to an interface, WireGuard has to be restarted before creating a WireGuard gateway, so that the interface gets an IP from WireGuard.
Ever since I "learned" to do it this way, I assumed that WG populates the IPv4 config of the interface with the values entered in the local peer, namely the (local) tunnel address and gateway IP. So is this a bug? Or is it really required to enter duplicate information on the local peer AND on the interface?
A while back I read the following comment from @fichtner. I don't fully understand it, but does it have something to do with that?
> Unbound binds to addresses, not interfaces. Dynamic interfaces pose an issue then, because you would need to stop and start unbound dynamically, which causes cache loss and general dns resolution disruption. The bind interface feature is a popular request but people often forget that this has side effects that have these annoying operational effects / implementation quirks.
> In this particular case I assume Unbound simply forgets to load the correct address or is not notified of it. Best to use a manual ACL with listen on any IMO.
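Roughly, "listen on any plus a manual ACL" translates to something like this in raw unbound.conf terms (a sketch; the networks are placeholders, and on OPNsense these map to GUI settings):
server:
  # listen on all addresses instead of binding to specific (dynamic) interfaces
  interface: 0.0.0.0
  interface: ::0
  # restrict who may actually query via access control entries
  access-control: 127.0.0.0/8 allow
  access-control: 192.168.1.0/24 allow   # placeholder LAN network
  access-control: 0.0.0.0/0 refuse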
Interesting. If you leave the upstream gateway as "Auto-detect", is it still OK? I presume so since at least in my setup the WG gateway is the only one listed in the dropdown.
I agree it seems duplicative to have to include the gateway info in the WG local config as well (and also the IPs, although presumably that must be in the WG config for WG to work). Particularly as the WG config only allows for one gateway, so both IPv4 and IPv6 can't be specified (but they both work regardless).
I will update the how-to once the proper configuration has been clarified.
I wonder if this also means that I should be configuring the IPs on the interface for my WG road warrior setup as well?
(I haven't read all the comments, but thought to shed some light on gateways)
If we're looking at https://github.com/opnsense/core/issues/5230, the old behaviour was more or less a bug. Some time ago we added "Dynamic gateway policy" (https://docs.opnsense.org/manual/interfaces.html, originating from https://github.com/opnsense/core/commit/dba70c0ead9b55e4be8ac37dad5c2afbfe3209e8) so one can provide an upstream gateway for interfaces that don't necessarily need an address (like OpenVPN, and I expect the same is the case for WireGuard). In that case one can assign a gateway for an interface which is configured by the service itself (it generates rules like ... route-to ( wg0 ) ...).
Auto-detect gateways are only used if there's a file provided with an address in it (which DHCP clients do) or "Dynamic gateway policy" is set, which is more or less explained in https://docs.opnsense.org/manual/gateways.html#missing-dynamic-gateway
I'm not sure that will work for WG: when "Disable Routes" is selected in the WG local config (which is needed to allow selective routing of hosts through the tunnel), a gateway IP is required to be specified.
So if I understood @AdSchellevis correctly, the changes are intentional (and I also think the previous behavior was less logical). If I see correctly, the only thing that has changed is that an allowing rule with route-to specified is no longer created by filter.lib.inc to direct traffic from the WG interface to external networks, so traffic is directed through the default gateway by the "let out anything from firewall host itself" rule (the one without "force gw"). Thus, you can force the script to create the rule by specifying the gateway in the interface settings, or (which in my opinion is more logical and clearer) just create this rule manually (you can even limit it to the DoT port).
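For illustration, such a manual rule limited to the DoT port could look roughly like this in pf syntax (a sketch modelled on the auto-generated route-to rule quoted earlier, using its wg0 gateway address; in practice it would be created via the GUI):
pass out log route-to ( wg0 10.108.122.176 ) inet proto tcp from {(wg0)} to {!(wg0:network)} port 853 keep state allow-opts # force DoT (TCP 853) from the firewall itself out via the WG gateway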
Much of this goes over my head, and as an end user of an appliance, I only want to concern myself with implementation details as much as I have to. I'll sure learn things along the way, but I didn't expect having to spend so much time to get to the bottom of this. Does this come down to WG still being experimental and its integration requiring ironing out which isn't really possible before being implemented on a kernel level?
So if one of the solutions @kulikov-a mentions is the right / "logical" one, we should at least update the WG docs.
I agree with @schnerring and I have no useful input as to what the right way is, but instructions in the documentation and perhaps removal of "outgoing network interfaces" in the unbound config would be good, since it seems to add confusion. I'm happy with whatever the "best" way is, and thank everyone involved.
@kulikov-a Could you provide me with an example of what the rule you'd create would look like? If I understand correctly, I want to replicate this:
pass out log route-to ( wg0 10.108.122.176 ) from {(wg0)} to {!(wg0:network)} keep state allow-opts label "1fbb62a93d5262e897b6c0f184574cba" # let out anything from firewall host itself (force gw)
I tried creating:
Interface | WAN_VPN |
Direction | out |
Source | WAN_VPN address |
Destination | ! WAN_VPN net |
Gateway | WAN_VPN_GW |
But I get the error:
Policy based routing (gateway setting) is only supported on inbound rules.
@schnerring The GUI will not allow this on an interface rule (I did not know about this limitation), but you can try to create it as a floating rule; this should generate a rule like the one you quoted.
@kulikov-a
> the GUI will not allow this on an interface rule (I did not know about this limitation)
I think we should remove that limitation, it was added a long time ago to prevent people from using "out" when they intended "in" (https://github.com/opnsense/core/commit/01c16b0a86715a316ecd20cf6679c2915e302a6c), but there are still valid use-cases and I'm doubting it really helps to lock it here. cc @fichtner
@schnerring
> Does this come down to WG still being experimental and its integration requiring ironing out which isn't really possible before being implemented on a kernel level?
Not really. Avoiding the normal destination routing by using policy-based routing is unfortunately a bit complicated. In theory the service (WireGuard) could generate a file containing the routing address when the line is up, in which case it would present itself as an automatic gateway (https://github.com/opnsense/core/blob/master/src/etc/inc/plugins.inc.d/openvpn/ovpn-linkup). The automatic rules are far from perfect as well, to be honest, which is more or less inherited (different scenarios need different types of rules).
Without special modifications traffic will usually (on most platforms) follow the path by destination.
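To make that concrete, a hypothetical WireGuard "linkup" hook mirroring ovpn-linkup might boil down to little more than this (a sketch only; the /tmp/<interface>_router file convention and the address are assumptions, not an existing plugin feature):
#!/bin/sh
# hypothetical wg linkup sketch: record the tunnel gateway address so the
# firewall could pick it up as a dynamic ("auto-detect") gateway for wg0
echo "10.108.122.176" > /tmp/wg0_router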
> So if one of the solutions @kulikov-a mentions is the right / "logical" one, we should at least update the WG docs.
That's always a good idea :)
By the way, I expect if you add a static gateway and assign it on the interface you will end up with the same rule as before as well.
@AdSchellevis thanks for pointing that out, so it's a little clearer now :)
> I expect if you add a static gateway and assign it on the interface you will end up with the same rule as before as well
Yes, I just mentioned another way (which allows redirecting traffic more granularly, imho).
So sounds like the simplest solution is simply to add the IP(s) to the interface config, rather than just relying on WG to bind to them? The how-to already says to create a static gateway for the interface, so that part is a given.
BTW, would be nice if the WG plugin didn't separately require a gateway IP to be specified in the local config when Disable Routes is selected, particularly as it appears redundant if the gateway is already assigned to the interface. Something for @mimugmail to consider perhaps?
> I think we should remove that limitation, it was added a long time ago to prevent people from using "out" when they intended "in" (01c16b0), but there are still valid use-cases and I'm doubting it really helps to lock it here. cc @fichtner
@AdSchellevis I remember we did talk about it but I couldn't pin my memory to this particular validation. I'm ok with removal of that limitation.
Static interface configuration works well.
I tried it with the floating rule but can't get it to work. I created the floating rule per https://github.com/opnsense/docs/pull/365#pullrequestreview-811956828 (like @kulikov-a explained) and it's correctly generated.
However, as soon as I remove the static IP configuration for WG interfaces, Unbound stops working. Is the Outgoing Interfaces option incompatible with "dynamic" interfaces?
Did you restart WG so that it rebinds the IPs to the interface, after removing the IPs from the interface config itself?
Countless times... restarted Unbound, rebooted, re-imported my backup config. I would like to re-install and test from a minimal config but I'm not sure when I'm gonna have the time for that.
OK, I'm content to leave the PR on hold until you get the chance to test (unfortunately I don't have a setup that allows me to do so). The key question in my mind is exactly what is required to get the desired behaviour - just an interface config with a gateway set, or just an outbound FW rule for policy-based routing, or both?
Might be something specific to WireGuard. @schnerring can you post the ifconfig -m -v output of a WireGuard interface with and without a static address assigned?
I will give my input too if that helps.
With no static addresses assigned:
wg1: flags=8143<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> metric 0 mtu 1420
options=80000<LINKSTATE>
capabilities=80000<LINKSTATE>
inet 10.105.96.110 netmask 0xffffffff broadcast 10.105.96.110
inet6 fc00:bbbb:bbbb:bb01::2a:606d prefixlen 127
groups: tun wireguard
nd6 options=103<PERFORMNUD,ACCEPT_RTADV,NO_DAD>
Opened by PID 2767
With static addresses assigned:
wg1: flags=8143<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> metric 0 mtu 1420
options=80000<LINKSTATE>
capabilities=80000<LINKSTATE>
inet 10.105.96.110 netmask 0xffffffff broadcast 10.105.96.110
inet6 fc00:bbbb:bbbb:bb01::2a:606d prefixlen 127
groups: tun wireguard
nd6 options=101<PERFORMNUD,NO_DAD>
Opened by PID 2767
Only the ND options are different.
With the static IPs and gateway specified in the interface configuration, the routing entries are there:
pass out log route-to ( wg1 10.105.96.109 ) from {(wg1)} to {!(wg1:network)} keep state allow-opts label "c8bbe7226cc73d39b1ff03e1afc48087" # let out anything from firewall host itself (force gw)
pass out log route-to ( wg1 fc00:bbbb:bbbb:bb01::2a:606c ) from {(wg1)} to {!(wg1:network)} keep state allow-opts label "a5230f9c0fc9f45f7a58564c83bcbccd" # let out anything from firewall host itself (force gw)
With no gateway specified in the interface configuration (ie "auto-detect" only), the routing entries are not there.
Interestingly, if I restart WireGuard after assigning static IPs in the interface configuration, the ND options become the same:
wg1: flags=8043<UP,BROADCAST,RUNNING,MULTICAST> metric 0 mtu 1420
options=80000<LINKSTATE>
capabilities=80000<LINKSTATE>
inet 10.105.96.110 netmask 0xffffffff broadcast 10.105.96.110
inet6 fc00:bbbb:bbbb:bb01::2a:606d prefixlen 127
groups: tun wireguard
nd6 options=103<PERFORMNUD,ACCEPT_RTADV,NO_DAD>
Opened by PID 43939
But the PROMISC flag has gone.
The netmask in both scenarios looks troubling to be honest: 10.105.96.110/32 shouldn't be allowed to access 10.105.96.109. As long as the manually created firewall rules are the same as the ones generated with the static interface + gateway, you should be safe there.
Completely off-topic: why, after I make an interface change, does my CPU usage sit at a minimum of 50%, when ordinarily it will be much less than that (down to 0%)? Only a reboot fixes it up
> The netmask in both scenarios looks troubling to be honest: 10.105.96.110/32 shouldn't be allowed to access 10.105.96.109. As long as the manually created firewall rules are the same as the ones generated with the static interface + gateway, you should be safe there.
The gateway is configured as a Far Gateway on IPv4. The /32 netmask is to match the WG configuration.
> Completely off-topic: why, after I make an interface change, does my CPU usage sit at a minimum of 50%, when ordinarily it will be much less than that (down to 0%)? Only a reboot fixes it up
no clue, you will have to inspect top if it stays high.
> The gateway is configured as a Far Gateway on IPv4. The /32 netmask is to match the WG configuration.
With the risk of asking very dumb questions, can't you offer the client something a bit larger than a single address? I've seen some strangeness with IPsec VTI tunnels in the past as well, although pf(4) shouldn't care too much in this case.
I'm quite curious what @schnerring's ifconfig output looks like. When using IPv4 the ND options shouldn't be very relevant, so if the settings don't seem to stick despite similar ifconfig output, the next question is why pushing the same address changes anything (unless Unbound is the culprit here).
> no clue, you will have to inspect top if it stays high.
Ah - ntopng!
> With the risk of asking very dumb questions, can't you offer the client something a bit larger than a single address? I've seen some strangeness with IPsec VTI tunnels in the past as well, although pf(4) shouldn't care too much in this case.
Well, Mullvad gives a /32 and a /128 for the tunnel, and that is what I configure in WG. In theory I could specify a larger netmask, I suppose (and in fact I do for IPv6 since otherwise I can't configure a gateway, since Far Gateways aren't a thing for IPv6). But I am not sure it would make any difference?
With static config:
wg0: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
options=80000<LINKSTATE>
capabilities=80000<LINKSTATE>
inet 10.107.58.245 netmask 0xffffffff
inet6 fc00:bbbb:bbbb:bb01::2c:3af4 prefixlen 128
groups: wg wireguard
nd6 options=103<PERFORMNUD,ACCEPT_RTADV,NO_DAD>
Without static config:
wg0: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
options=80000<LINKSTATE>
capabilities=80000<LINKSTATE>
inet 10.107.58.245 netmask 0xffffffff
inet6 fc00:bbbb:bbbb:bb01::2c:3af4 prefixlen 128
groups: wg wireguard
nd6 options=103<PERFORMNUD,ACCEPT_RTADV,NO_DAD>
Btw, I'm using wireguard-kmod. Might this be an issue?
@schnerring could be the case; it doesn't really make sense that setting the same address again magically changes the routing. If that's the case, it should be easy to validate by manually setting only an address on wg0 and seeing if it makes a difference:
ifconfig wg0 10.107.58.245 netmask 255.255.255.255
So things work when I just add the floating rule that's also added by statically configuring the WG interface.
I'd expect the user-defined floating rule to effectively override the auto-generated rule from the static WG interface configuration. But this only partially happens. Requests from Unbound still trigger the "let out anything from firewall itself (force gw)" rule.
What's odd is that Unbound requests display WAN as the interface, despite showing the wg0 IP as the source.
I remember having read somewhere that this is just a display error, but I don't remember where.
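One way to double-check which rule is actually matching, independent of what the live view shows, is via the per-label counters (standard pfctl usage; the hash below is the label of the wg0 route-to rule quoted earlier):
# find the label hash of the rule of interest in the generated ruleset
grep "force gw" /tmp/rules.debug
# then check its per-label counters to see whether it is the one being hit
pfctl -sl | grep 1fbb62a93d5262e897b6c0f184574cba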
edit: "yoy" is the custom floating rule that replicates the auto-generated rule from static IP configuration
Describe the bug
I have a single WAN and two WG interfaces, and have configured Unbound DoT to use the two WG interfaces as the "outgoing network interfaces" on the settings page. This has worked in all versions before 21.7.4, but in the current version WAN must be included in the outgoing interfaces or Unbound stops working.
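(For context: the GUI's "outgoing network interfaces" setting roughly corresponds to binding Unbound's outgoing queries to the selected interfaces' addresses, i.e. something like the following unbound.conf sketch with placeholder tunnel addresses:)
server:
  # bind outgoing queries to the WG tunnel addresses (placeholders)
  outgoing-interface: 10.0.0.2
  outgoing-interface: 10.0.1.2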
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Up until 21.7.4 the DoT DNS lookups went over WG; now they don't work unless WAN is selected.
Environment
Quotom box
OPNsense 21.7.4-amd64
FreeBSD 12.1-RELEASE-p20-HBSD
OpenSSL 1.1.1l 24 Aug 2021