opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

IPv6 Track WAN "loses" the IPv6 address when the interface loses its Ethernet carrier #4909

Closed g-a-c closed 2 years ago

g-a-c commented 3 years ago

Describe the bug

I have 3 VLAN interfaces on an igb interface on my firewall appliance, which is physically connected to a Unifi switch. My WAN connection is PPPoE, and connected directly to my ISP modem. Each VLAN interface has "Track WAN" enabled and gets its own /64 from the /48 handed out by my ISP.
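As an illustration of what "Track WAN" has to compute for each VLAN, here is a minimal sketch of carving a per-interface /64 out of a delegated /48 by prefix ID. The prefix `2001:db8:1234::/48` is a documentation prefix, and the interface names, prefix IDs, and the `tracked_subnet` helper are all hypothetical; this is not OPNsense's or dhcp6c's actual code.

```python
import ipaddress


def tracked_subnet(delegated: str, prefix_id: int, new_prefix: int = 64) -> ipaddress.IPv6Network:
    """Return the subnet numbered `prefix_id` inside a delegated prefix."""
    net = ipaddress.IPv6Network(delegated)
    spare_bits = new_prefix - net.prefixlen
    if spare_bits < 0:
        raise ValueError("delegated prefix is longer than the requested subnet")
    if not 0 <= prefix_id < (1 << spare_bits):
        raise ValueError("prefix ID does not fit in the delegated prefix")
    # Place the prefix ID in the bits between the delegated length and /64.
    base = int(net.network_address) + (prefix_id << (128 - new_prefix))
    return ipaddress.IPv6Network((base, new_prefix))


if __name__ == "__main__":
    # Hypothetical VLAN names and prefix IDs, for illustration only.
    for ifname, prefix_id in (("vlan10", 0x10), ("vlan20", 0x20), ("vlan30", 0x30)):
        print(ifname, tracked_subnet("2001:db8:1234::/48", prefix_id))
```

The point of the sketch is that the LAN-side subnets are a pure function of the delegated prefix and the configured prefix ID, so in principle they could be recomputed at any time while the delegation is still valid on the WAN side.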

If the underlying igb interface goes down physically (for example if I update the switch firmware and the switch reboots), then when the VLAN interfaces come back up, they no longer have any IPv6 address assigned to them. The WAN interface still has one.

To Reproduce

Steps to reproduce the behavior:

  1. configure a PPPoE WAN connection, including IPv6 prefix delegation
  2. make sure the WAN connection has a delegated prefix in Interfaces/Overview
  3. enable a LAN interface on a different physical port
  4. enable a static IPv4 address
  5. enable IPv6/Track Interface on a LAN connection and assign it a prefix from the WAN interface
  6. make sure the LAN interface has an IPv6 address from the delegated prefix
  7. take down the LAN interface (reboot a switch, unplug a cable, etc)
  8. wait a few seconds
  9. restore the LAN interface
    1. expected: both IPv4 and IPv6 addresses are present on the LAN interface
    2. actual: only IPv4 is present

Expected behavior

When the interfaces come back up, they should retain/re-assign their "Track WAN" IPv6 addresses based on the WAN IPv6 delegated prefix which is still present on the WAN interface.

Describe alternatives you considered

n/a

Screenshots

n/a

Relevant log files

If applicable, information from log files supporting your claim.

Additional context

This appears similar to #4282 - but this bug refers to the loss of the physical LAN connection, #4282 refers to the loss of the PPPoE WAN connection.

Environment

OPNsense 21.1.4 (amd64, OpenSSL)
Intel Atom C2358
Network: 4 * igb (I think they're i210, the NICs are whatever is built into the Atom C2358 SoC)

fichtner commented 3 years ago

Hmm, so if IPv4 is static it's clear that it comes back after the link disappeared. Unfortunately the authoritative interface is your WAN link so if you unplug+plug this one the addresses should be back on LAN as well for IPv6. Goes directly into dhcp6c territory and not sure how to solve this other than managing expectations at this point.

g-a-c commented 3 years ago

Yep, as a workaround, I can reconnect the PPPoE session and the IPv6 addresses return - but depending on the timing, it may not always be convenient to do that right away and until you do, I've seen some of my clients still having an IPv6 address assigned (until the previously advertised router times out) and trying to route IPv6 traffic through the OPNsense box which no longer has one. So it's an inconvenience, and there is a workaround, but I've been meaning to open an issue for months to see if there was either a better solution that I hadn't thought of (apart from static addresses) or a chance that it could be fixed.

But after you mentioned dhcp6c specifically, I had a look at the logs and it looks like you're correct. The IPv6 addresses are only added to the LAN interfaces immediately after dhcp6c makes its request for prefix delegation, which only happens once when the PPPoE connection is brought up. So it (now) makes sense why this bug is happening and why reloading the PPPoE connection fixes it. The perfect fix would be for dhcp6c to have some interface tracking which watches for down/up events and checks its configuration file to see if there should be an IPv6 address assigned to a link which comes up, but this would have to be a fix upstream of OPNsense?
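To make the idea of "interface tracking" concrete, here is a minimal sketch of the down/up detection step only. The `LinkTracker` class, the interface names, and the event list are hypothetical; dhcp6c does not expose anything like this, and a real fix would still have to re-apply the delegated addresses itself.

```python
from typing import Dict, Iterable


class LinkTracker:
    """Remember the last link state per interface and flag down-to-up transitions."""

    def __init__(self, tracked: Iterable[str]) -> None:
        self._tracked = set(tracked)
        self._state: Dict[str, bool] = {}

    def observe(self, ifname: str, link_up: bool) -> bool:
        """Record a link event; return True if a tracked interface just came back up."""
        previous = self._state.get(ifname)
        self._state[ifname] = link_up
        # Only a transition from an observed "down" to "up" counts as a recovery.
        return ifname in self._tracked and previous is False and link_up


if __name__ == "__main__":
    # Hypothetical event stream: the switch reboots, so the VLAN loses and regains carrier.
    events = [
        ("igb1_vlan10", True),
        ("igb1_vlan10", False),
        ("igb1_vlan10", True),
        ("pppoe0", True),
    ]
    tracker = LinkTracker(["igb1_vlan10"])
    for name, up in events:
        if tracker.observe(name, up):
            print(f"{name}: link restored, delegated address should be re-applied")
```

In this sketch the WAN interface is deliberately not in the tracked set, since the problem described here is the LAN side coming back while the PPPoE session and its delegation never went away.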

marjohn56 commented 3 years ago

dhcp6c will release/remove the addresses on an interface down event, providing that the no-release option has NOT been set. Have you got that set? If not then we might have to look at using one of the SIGUSR signals ( if it's still in the source! ) that specifically forces a release of all addresses on the down event. RADVD should be sending out a lifetime of 0 on the route, but I suspect the clients will still try for a while.

fichtner commented 3 years ago

It might disconnect all clients when we happen to fix this, even in other tracked interfaces. What I'm interested in is /var/etc/dhcp6c.conf during link down and subsequent link up. In either case SIGHUP should be sent to dhcp6c so I am assuming the linkup config is not correct.

marjohn56 commented 3 years ago

The other issue I've seen is that a SIGHUP to dhcp6c does start its solicit, but the ISP does not respond until the lease time has expired; the only way to get it back is to drop the PPPoE session. @g-a-c, can you look at the logs, filter on dhcp6c, and see if dhcp6c is sending solicits? You'll need to increase the dhcp6c log level in Interfaces, then reboot and take it from there, as the log level is only set when dhcp6c is restarted. Alternatively, kill the client from the console and then take an interface down and back up; that should restart the client.

g-a-c commented 3 years ago

> dhcp6c will release/remove the addresses on an interface down event, providing that the no-release option has NOT been set. Have you got that set?

Good question - where is this option? I do have "Lock Interface" set, which I thought might help this problem. It didn't, but it looks like I never removed it. But I don't see an option specifically called "no release" in the Interface settings anywhere.

> @g-a-c , can you look at the logs, filter on dhcp6c and see if dhcp6c is sending solicits

It looks like it, there are regular occurrences of this log snippet happening:

Apr  9 09:58:14 opnsense dhcp6c[2061]: Sending Solicit
Apr  9 09:58:14 opnsense dhcp6c[2061]: set client ID (len 14)
Apr  9 09:58:14 opnsense dhcp6c[2061]: set identity association
Apr  9 09:58:14 opnsense dhcp6c[2061]: set elapsed time (len 2)
Apr  9 09:58:14 opnsense dhcp6c[2061]: set option request (len 4)
Apr  9 09:58:14 opnsense dhcp6c[2061]: set IA_PD
Apr  9 09:58:14 opnsense dhcp6c[2061]: send solicit to ff02::1:2%pppoe0
Apr  9 09:58:14 opnsense dhcp6c[2061]: reset a timer on pppoe0, state=SOLICIT, timeo=6, retrans=63095

If you need more detail then I can increase the log level, but I can at least see that regular SOLICITs are happening on the shown retrans interval (looks like ~15/30/60/120 so far).
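The roughly doubling intervals match the generic DHCPv6 retransmission rule (RFC 8415, section 15): start at an initial timeout, double it on each retry with up to ±10% random jitter, and cap it at a maximum. A minimal sketch of that schedule follows. The 1 s initial value is the RFC's SOL_TIMEOUT; the 120 s cap is an assumption chosen to match the spacing observed in this log (it is the older RFC 3315 value, while RFC 8415 specifies 3600 s), and the function name is hypothetical.

```python
import random
from typing import List, Optional


def solicit_schedule(count: int, irt: float = 1.0, mrt: float = 120.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Return `count` Solicit retransmission timeouts in seconds.

    With rng=None the jitter term is zero, giving the nominal schedule.
    """
    def rand() -> float:
        return rng.uniform(-0.1, 0.1) if rng else 0.0

    schedule: List[float] = []
    # For the first Solicit the jitter is required to be positive, hence abs().
    rt = irt + abs(rand()) * irt
    for _ in range(count):
        schedule.append(rt)
        nxt = 2 * rt + rand() * rt
        if mrt and nxt > mrt:
            # Once the cap is reached, stay at the cap plus jitter.
            nxt = mrt + rand() * mrt
        rt = nxt
    return schedule


if __name__ == "__main__":
    print("nominal :", solicit_schedule(8))
    print("jittered:", [round(t, 3) for t in solicit_schedule(8, rng=random.Random())])
```

Under these assumptions the nominal steps are 1, 2, 4, 8, 16, 32, 64 and then 120 seconds, which lines up with the ~15/30/60/120 spacing seen above; the logged retrans=63095 (presumably milliseconds) sits close to the 64 s step.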

g-a-c commented 3 years ago

OK, I think I found the option you're talking about, under Interfaces/Settings/IPv6 DHCP/Prevent Release

This option is currently checked

marjohn56 commented 3 years ago

Yes, uncheck it, it's stopping dhcp6c sending releases. The idea works but some ISPs refuse to answer a solicit before the lease time has expired if they do not get a release. I added it to dhcp6c years ago for Sky UK, who would change the prefix every time you reconnected. It works providing the ISP doesn't do what I suspect your ISP is doing.

g-a-c commented 3 years ago

OK - I'll uncheck it and reboot, and see what happens. FWIW the ISP is Zen (I followed this guide), and my prefix doesn't change regularly; the only reason I'm even using DHCP prefix delegation really is that whenever I renew/regrade once a year, there's a 50/50 chance whether my IPv6 gets disabled and if it does, a new prefix gets allocated to me when it gets turned back on. So I may end up just doing the slightly longer static setup off that page as a fallback option if changing the Prevent Release option doesn't help...

marjohn56 commented 3 years ago

You don’t need it with Zen, their addresses are static, even if you use dhcp6 it will still come back with the same address every time. Never heard of Zen changing a v6 address block, they should have sent you a /64 block for WAN and a /48 for LAN at the start of your contract; as long as the contract lasts that block should never change, same as the IPv4 block, which sadly they don't do now, at least not unless you pay extra, fortunately it was free when I joined them. I would suggest switching to statics if you are with Zen, set it up then forget about it.

marjohn56 commented 3 years ago

I will add that 'no-release' used to work fine with Zen, but they have changed something along the way. When we sort of get back to a normal life again and SWMBO goes out from time to time I'll be able to play with dhcp6c on Zen and try and work out a way to fool them, if possible, but it might be something that's linking the dhcp6c session to the PPPoE session making it impossible to do anything sneaky.

g-a-c commented 3 years ago

> they should have sent you a /64 block for WAN and a /48 for LAN at the start of your contract

Oh, yeah, they did. After a year, the price on my package had gone down so I called them and said I was happy to commit to another 12 month contract at the reduced rate. After that, IPv6 was disabled on the account so I emailed them to re-provision it, which they did - but with new /64 and /48 blocks. Year after that, same thing: I committed again at the cheaper price, no v6, and when it was re-enabled there was another new /64 and /48. So it doesn't change on a regular basis, but the process of re-committing for the lower price (and presumably regrading to "Ultrafast" when it reaches my area) seems to screw the process up somehow.

marjohn56 commented 3 years ago

:) Never had that either, I'm on a rolling contract fixed price, about £19 pcm plus line rental for an 80Mbps ( I get 70+ ), a bunch of static v4s ( old customer ). Anyway, wrong place for chatting, the GitHub police will be on us.

g-a-c commented 3 years ago

I just rebooted my firewall (with Prevent Release disabled), then rebooted the switch to reproduce the problem, and still no IPv6 addresses on the LAN interfaces.

So I think I'll just assign them statically and watch out for the blocks changing whenever anything happens on the account if it gets reprovisioned. I won't close the issue though - this might still be a valid bug for anyone who gets a dynamic prefix on every PPPoE connection and can't easily switch to static addressing; so I'll leave it up to you and/or @fichtner whether it's worth keeping open.

marjohn56 commented 3 years ago

Yes, disabling Prevent Release will only take effect AFTER a reboot; if you change it and then reboot, that reboot still shuts down with the old setting. However, you're still better off switching to statics.

jameskimmel commented 3 years ago

I have the same issue with almost the same hardware and setup. Router on a stick, VLAN, Unifi Switch, converter instead of Bridge Modem.

ISP is dynamic IPv4 and static IPv6 with /48 prefix. Both are configured with DHCP.

OPNsense-bot commented 2 years ago

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.