opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License
3.26k stars 725 forks source link

interfaces: SLAAC WAN improvements #5862

Closed maurice-w closed 2 years ago

maurice-w commented 2 years ago

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

If the WAN IPv6 Configuration Type is set to SLAAC, /tmp/$if_routerv6 doesn't get created - unless a (stateless) DHCPv6 server exists upstream. Without /tmp/$if_routerv6, features like gateway monitoring, gateway groups and policy-based routing don't work. This might have never worked and is probably not a recently introduced issue.

To Reproduce

Steps to reproduce the behavior:

  1. Set up two OPNsense instances. Connect the OPNsense2 WAN interface to the OPNsense1 LAN interface.
  2. OPNsense1 LAN interface: IPv6 Configuration Type 'Static IPv6', IPv6 address '2001:db8::1/64', Router Advertisements mode 'Unmanaged', DHCPv6 server disabled
  3. OPNsense2 WAN interface: IPv6 Configuration Type 'SLAAC'
  4. See error on OPNsense2: WAN interface gets an IPv6 address, but /tmp/$if_routerv6 doesn't exist and 'System / Gateways / Single / WAN_SLAAC' doesn't show a gateway address.
  5. On OPNsense1, change the Router Advertisements mode to 'Stateless' and enable the DHCPv6 server (you don't have to specify an address range).
  6. See result on OPNsense2: /tmp/$if_routerv6 exists and 'System / Gateways / Single / WAN_SLAAC' does show a gateway address.

Expected behavior

For SLAAC WAN interfaces, an upstream DHCPv6 server should not be required for proper functionality.

Describe alternatives you considered

Configure the gateway address statically, but this isn't always possible.

Additional context

SLAAC without DHCPv6 is common on mobile (4G / 5G) connections. I encountered this issue when trying to set up a dual WAN system where WAN2 is tethered to an Android phone.

Environment

OPNsense 22.1.10 (amd64, OpenSSL). Hyper-V Gen2

maurice-w commented 2 years ago

Turns out you don't even have to enable the DHCPv6 server in step 5, just switching the RA mode to 'Stateless' is sufficient. So it seems OPNsense only creates /tmp/$if_routerv6 if the O flag is set in RAs. Weird.

I tried to figure out what creates the /tmp/$if_routerv6 file for SLAAC interfaces but couldn't find anything in core. Does rtsold do that by itself?

fichtner commented 2 years ago

The router file is indeed only supplied by rtsold or mpd5 (pppoe linkup). For fringe use case of 6RD and 6to4 it is written statically.

maurice-w commented 2 years ago

Thanks, @fichtner. So does rtsold write the /tmp/$if_routerv6 file itself or does it pass the router address to a script or something which then writes the file? I did dig through interfaces.inc but that only seems to handle the router file creation for the special interface types you mentioned. Should I look in the rtsold source code?

fichtner commented 2 years ago

It’s a little harder to track down since „ifctl“ helper was introduced.

https://github.com/opnsense/core/blob/b8c6c182025eb49f93f419a2be4ad9b914c2710c/src/etc/inc/interfaces.inc#L2860

maurice-w commented 2 years ago

rtsold is started with the -M and -O options, which invokes rtsold_script.sh if the M or O flag is set in RAs: https://github.com/opnsense/core/blob/b8c6c182025eb49f93f419a2be4ad9b914c2710c/src/etc/inc/interfaces.inc#L2709-L2715 Neither of these flags are set in this use case. Could this lead into the right direction?

fichtner commented 2 years ago

Yeah, that’s how it works. I don’t know if this makes sense historically for different flags or if rtsold isn’t fully implemented or wasn’t at the time it was integrated. The rtsold manual page would certainly shed more light on this.

maurice-w commented 2 years ago

Hey @fichtner, so it seems we've established how this is currently implemented: When rtsold receives a Router Advertisement with the M or O flag set, it invokes rtsold_script.sh with the receiving interface name and the sending router as arguments. rtsold_script.sh then passes these arguments to ifctl, which creates the router file.

This works for DHCPv6 (where the M flag should always be set), but is hit-or-miss for SLAAC (where we can't make any assumptions about these flags).

We could use the rtsold -R option to invoke a script when a Router Advertisement contains RDNSS / DNSSL information (which is usually the case in scenarios without a DHCPv6 server). This script could then use ifctl to create the router file (just like rtsold_script.sh, but without the dhcp6c stuff). It still would fail if RAs have neither the M or O flag nor RDNSS / DNSSL information. Unlikely, but possible. And if RAs have these flags and RDNSS / DNSSL, we would invoke ifctl -6rd twice. Could this be an issue? Other / better ideas?

maurice-w commented 2 years ago

Proof of concept patch: https://github.com/maurice-w/core/commit/62cb980d2c323ffe2e8aac32351603d254f3e2b6 Initial tests were successful.

Question: rtsold_script.sh also creates /tmp/$if_defaultgwv6. What's this file for? Should the new script create it, too? My test systems seem to work fine without it. https://github.com/opnsense/core/blob/b8c6c182025eb49f93f419a2be4ad9b914c2710c/src/etc/inc/interfaces.inc#L2861-L2862 So far I haven't seen any side effects of invoking ifctl -6rd twice. Do you think there could be any or is it meant to handle this?

fichtner commented 2 years ago

Thanks for the patch as a base for discussion. I would think we only write if router file is empty to keep a somewhat mandatory value in there that we know works. You can read the contents from the ifctl utility as well.

about default gateway file that should be emulated too but requires further work later on to streamline that file‘s purpose and handling as it lies outside the scope of ifctl.

to extract router address we should use a separate var just for style reasons and maybe the extraction can be made with shell internals instead of using sed. I can help with that if you make a PR for it.

furthermore, the XXX is somewhat wrong as ifctl already does handle DNS so the script could do it. It just needs the right parameters and variable to add.

maurice-w commented 2 years ago

@fichtner, quick update: I've addressed your points and also added support for passing RDNSS / DNSSL information from RAs to ifctl. I'll make a PR once testing with 22.7 is completed.

Only writing the router file when it's empty would prevent router address changes from being processed. I'm working on a way to only run the new script if the existing rtsold_script.sh hasn't run before. That should avoid any side effects.

fichtner commented 2 years ago

@maurice-w don't need to overcomplicate., something like this will suffice:

if [ -z "$(ifctl -r6 -i igb1)" ]; then
   # XXX add it if empty
fi

there's still a risk of race in the tool itself but I think we need to improve ifctl anyway to avoid adding duplicates for DNS servers (if we start having multiple sources for the same interface)

maurice-w commented 2 years ago

Alright @fichtner, I used a modified version of the check you suggested and created a PR. Let's continue the discussion there. I'm still concerned that checking whether the router file is empty doesn't allow router address changes without a link down/up event. Yes, that's a thing I've encountered. More testing required.

And feel free to replace sed with shell internals! :-)

fichtner commented 2 years ago

are you ok with 93d56cb590d49? figured the file is static so we can make an external script which is easier for maintenance.

Should we react on -d as well? Maybe we shouldn't check the router file and instead try to figure out if "ifname" is indeed configured as SLAAC so that we can do an authoritative configuration instead of trying to duck away unless we don't have anything else.

Cheers, Franco

maurice-w commented 2 years ago

A static file makes sense, absolutely. And thanks for the clean-up! When I apply opnsense-patch 93d56cb, the rtsold_resolvconf file gets created without the .sh extension. Strange. Maybe a side-effect of how opnsense-patch works? I just renamed it manually.

About -d: I haven't really looked into the circumstances under which this will be invoked by rtsold. Will do. Low priority.

About checking whether the interface's IPv6 configuration is set to SLAAC: This doesn't tell us whether to use RDNSS / DNSSL from RAs or stateless DHCPv6 or even both. All of these possibilities actually exist. We have to make this decision by interpreting the rtsold output: O flag set, no RDNSS / DNSSL in RAs: use dhcp6c in stateless mode O flag not set, RDNSS / DNSSL in RAs: use DNS info from RAs O flag set, RDNSS / DNSSL in RAs: Well, ideally use DNS info from dhcp6c and RAs, merge and remove duplicates. Or for simplicity, just use dhcp6c.

Cheers Maurice

fichtner commented 2 years ago

fixed the patch issue via https://github.com/opnsense/update/commit/d669b0f24e2a04ecf5

let me merge the improved patch then and work on the other things in the ticket... through me magic of ifctl we could actually write interface information as "ifname:slaac" files too and just fallback to those if we didn't find any ones from the "ifname" file itself upon reading. I think this would take the pressure off overwriting data between slaac/no slaac modes and makes it easy to detect if we need to use slaac information in the first place... keeping them separate might be the best option as per your previous statement.

Cheers, Franco

fichtner commented 2 years ago

Worked on 90db8f4d0fa2206 which makes transparent reading of :slaac interface suffix a possibility. Router files are still a work in progress, but will follow the same logic eventually.

maurice-w commented 2 years ago

Applying opnsense-patch -c update d669b0f unfortunately doesn't fix the missing .sh issue for me.

The improved patch https://github.com/opnsense/core/commit/d582435b4b21a4246dd1b40df6e0bb8b28e7be77 works, thanks! Doing some real-world testing now. As expected, router address changes require an interface down/up event to be properly processed. Being able to have multiple _routerv6 / _nameserverv6 / _searchdomainv6 files per interface might indeed be a neat solution.

Something related I noticed: We don't include the zone identifier (%ifname suffix) when writing the _defaultgwv6 file here. But when this file gets created by default gateway switching, the zone id is included. Not sure what's correct in this context.

fichtner commented 2 years ago

Oh right, https://github.com/opnsense/update/commit/d5ec887efa on top makes sense (still untested due to being preoccupied with 22.7 release preparations).

For defaultgw files I think that the XXX is pretty much all we need to know here. The file is created by a script not authorised to do it for historical reasons and since a number of things were reworked we should ideally avoid writing those files, but that needs separate testing. I do want to enforce scope in defaultgw and router files in the future as it also makes other code easier (like the scope bug in the host route you found), but that needs quite some testing spread out over a few months and partial release in the 22.7.x series so we can have a consistent state in 23.1 hopefully. ifctl was specifically crafted to deal with this problem in the midterm and makes its full debut in the 22.7 series although gradually introduced earlier in 22.1.x

I'll add the ":slaac" suffix idea now and report back later. Thanks for these helpful discussions! :)

Cheers, Franco

fichtner commented 2 years ago

@maurice-w as discussed see 2e2e59c1d8a43

I'm cleaning the router file read up soon, but now moving to final 22.7 build.

fichtner commented 2 years ago

@maurice-w the state in final 22.7 development version should implement the idea of the fallback including router file. That is version is 22.7.r_110 or higher...

fichtner commented 2 years ago

@maurice-w last update: integrated everything as neatly as possible now. If you still have issues with it just open PR or let me know and we can look at it :)

maurice-w commented 2 years ago

I did some tests with 23.1.a_20 and it (mostly) works! Biggest issue I've encountered so far: rtsold seems to invoke rtsold_resolvconf.sh every time it encounters an RA with RDNSS / DNSSL info. This means we trigger configctl -d interface newipv6 quite often, which also restarts radvd... ough. Might be better to add some checks whether the router address or DNS info have actually changed?

While testing, I also noticed that SLAAC with stateless DHCPv6 (O flag set) doesn't fully work either! rtsold does invoke rtsold_script.sh, which then creates the _routerv6 file, but DNS info is not actually requested. There was some dormant code for this (dhcp6c_wan.conf), so this might have worked once. The functionality was probably lost during the extensive dhcpc6 rework (multi WAN support) years(?) ago. I'm a bit surprised no-one ever noticed, but so it seems. Patch in #5937.

fichtner commented 2 years ago

I'm working on the reload resilience in rc.newwanip(v6) see https://github.com/opnsense/core/issues/5933 there is a number of issues that disrupt connectivity when IP does not change.

As a side note for your reading pleasure: https://lists.freebsd.org/archives/freebsd-net/2022-August/002234.html

maurice-w commented 2 years ago

Alright, so I'll make no further changes to rtsold_resolvconf.sh for the time being and look forward to rc.newwanip(v6) improvements. :-)

Interesting read on the mailing list. I kind of get both sides. I agree that FreeBSD should have a DHCPv6 client in base so that it "just works" out of the box (like DHCPv4 does). But which client to choose and how to implement this is another question.

fichtner commented 2 years ago

0b29f71 takes this to 11 for the sake of code deduplication. Make sure to see if doesn't want to remember a prefix, but according to the code it doesn't unless yielded by the server:

https://github.com/opnsense/dhcp6c/blob/cff2641237596c28d05dd8d4b591e5b508d93078/dhcp6c_script.c#L198-L208

fichtner commented 2 years ago

As for the FreeBSD discussion the dhclient deprecation bit seems rather risky. I can understand the server-side of FreeBSD where probably any DHCP client works, but the router side is a bit more tricky as we have seen over the years.

maurice-w commented 2 years ago

More experiments to check for side effects with existing DHCPv6 setups. The upstream router in these tests advertises DNS servers and search domains via DHCPv6 and RAs, but different ones.

The downstream router only adds the DNS servers from DHCPv6 to /etc/resolv.conf and only creates host routes for these, which seems okay to me. If no DNS servers are advertised via DHCPv6, it falls back to those from RAs. Good!

But: 'Interfaces: Overview' always only shows the DNS servers from RAs (/tmp/$if:slaac_nameserverv6)!?

No DNS search domains get added to /etc/resolv.conf at all since get_searchdomains() only checks for IPv4: https://github.com/opnsense/core/blob/05922760bd05dca42a34fae7094eb798ba14bca5/src/etc/inc/system.inc#L301-L313

fichtner commented 2 years ago

But: 'Interfaces: Overview' always only shows the DNS servers from RAs (/tmp/$if:slaac_nameserverv6)!?

Pretty easy to trace this with the following

# ifctl -nVl6i <interfacename>

No DNS search domains get added to /etc/resolv.conf at all since get_searchdomains() only checks for IPv4:

-l mode without interface specification will ignore address family type and return all. Over here I don't have any searchdomains other than IPv4:

# ifctl -ls
/tmp/igb1_searchdomain
# echo "foo.bar" > /tmp/igb1_searchdomainv6
# ifctl -ls
/tmp/igb1_searchdomain
/tmp/igb1_searchdomainv6

Might be best to look if you got any:

# ls /tmp/*_searchdomain*

Cheers, Franco

maurice-w commented 2 years ago

There are _nameserverv6 and _searchdomainv6 files from DHCPv6 and SLAAC:

# ls -l /tmp/hn1*
-rw-r-----  1 root  wheel  42 Aug 22 22:50 /tmp/hn1:slaac_nameserverv6
-rw-r-----  1 root  wheel  25 Aug 22 22:50 /tmp/hn1:slaac_routerv6
-rw-r-----  1 root  wheel  36 Aug 22 22:50 /tmp/hn1:slaac_searchdomainv6
-rw-r-----  1 root  wheel  29 Aug 22 22:50 /tmp/hn1_defaultgwv6
-rw-r--r--  1 root  wheel  42 Aug 22 22:50 /tmp/hn1_nameserverv6
-rw-r-----  1 root  wheel  24 Aug 22 22:50 /tmp/hn1_oldipv6
-rw-r--r--  1 root  wheel  25 Aug 22 22:50 /tmp/hn1_prefixv6
-rw-r-----  1 root  wheel  25 Aug 22 22:50 /tmp/hn1_routerv6
-rw-r--r--  1 root  wheel  40 Aug 22 22:50 /tmp/hn1_searchdomainv6

dhcpd6 and radvd advertise different nameservers:

# cat /tmp/hn1_nameserverv6
2606:4700:4700::1111
2606:4700:4700::1001
# cat /tmp/hn1:slaac_nameserverv6
2001:4860:4860::8888
2001:4860:4860::8844

ifctl returns the DHCPv6 nameservers as expected:

# ifctl -n6i hn1
2606:4700:4700::1111
2606:4700:4700::1001

Full diag output:

# ifctl -nVl6i hn1
+ [ -l '=' -c ]
+ [ -l '=' -l ]
+ [ -z hn1 ]
+ find -s /tmp -name hn1_nameserverv6 -or -name 'hn1:*_nameserverv6'
+ MATCHES='/tmp/hn1:slaac_nameserverv6
/tmp/hn1_nameserverv6'
+ RESULTS=''
+ FILE=hn1:slaac_nameserverv6
+ IF=hn1:slaac
+ IF=hn1
+ MD=nameserverv6
+ eval echo '${hn1_nameserverv6}'
+ echo
+ [ -z '' ]
+ RESULTS=' hn1_nameserverv6'
+ eval export 'hn1_nameserverv6=${MATCH}'
+ export 'hn1_nameserverv6=/tmp/hn1:slaac_nameserverv6'
+ FILE=hn1_nameserverv6
+ IF=hn1
+ IF=hn1
+ MD=nameserverv6
+ eval echo '${hn1_nameserverv6}'
+ echo /tmp/hn1:slaac_nameserverv6
+ [ -z /tmp/hn1:slaac_nameserverv6 ]
+ eval export 'hn1_nameserverv6=${MATCH}'
+ export 'hn1_nameserverv6=/tmp/hn1_nameserverv6'
+ eval echo '${hn1_nameserverv6}'
+ echo /tmp/hn1_nameserverv6
/tmp/hn1_nameserverv6
+ exit 0

The only way I can get ifctl to return the SLAAC nameservers is by querying them explicitly:

# ifctl -n6i hn1:slaac
2001:4860:4860::8888
2001:4860:4860::8844

But why would status_interfaces.php do this? Very weird.

dhcpd6 and radvd advertise different search domains:

# cat /tmp/hn1_searchdomainv6
dhcpv6.example.com.
dhcpv6.example.net.
# cat /tmp/hn1:slaac_searchdomainv6
dnssl.example.com
dnssl.example.net

ifctl returns the correct file:

# ifctl -ls
/tmp/hn1_searchdomainv6

And the correct domains (only when specifying interface and address family):

# ifctl -s6i hn1
dhcpv6.example.com.
dhcpv6.example.net.
# ifctl -s6i hn1:slaac
dnssl.example.com
dnssl.example.net

But /etc/resolv.conf doesn't contain any of these domains:

# cat /etc/resolv.conf
domain local.example.com
nameserver 127.0.0.1
nameserver 2606:4700:4700::1111
nameserver 2606:4700:4700::1001
search local.example.com

Cheers Maurice

maurice-w commented 2 years ago

Okay, I've (kind of) found the root cause of the resolv.conf issue. Something doesn't like that the domains in the searchdomainv6 file end with a . (unlike in slaac_searchdomainv6).

fichtner commented 2 years ago

@maurice-w funky stuff, can you try 041e92a + a785c12

maurice-w commented 2 years ago

@fichtner Both issues fixed, thanks! a785c129a2ff2e54640b3f7de197b4140205087c is funky stuff, indeed...

fichtner commented 2 years ago

@maurice-w yay, shall we close this then? Can always open new tickets for small issues you see

Thanks, Franco

maurice-w commented 2 years ago

@fichtner Agreed!