opnsense / src

OPNsense operating system on top of FreeBSD
https://opnsense.org/

rtsold: may not signal dhcp6c in specific cases #215

Open wevsty opened 1 month ago

wevsty commented 1 month ago

Describe the bug

When the ISP forcibly updates the IPv6 prefix, rtsold does not seem to notify dhcp6c to handle it.

My ISP seems to push out a new IPv6 prefix every so often, and it does not seem to honor the lifetimes it agreed with its customers. When a new prefix is issued, all machines on the LAN are disconnected. Normally, we would expect rtsold to send a SIGHUP to dhcp6c, and dhcp6c to respond by sending a new DHCPv6 request to update the prefix.

However, that does not seem to happen here. As a result, all devices lose network connectivity for a while (about 10 minutes in my case).

Packet captures and system logs taken when the problem occurred can be found at https://github.com/opnsense/dhcp6c/issues/37#issuecomment-2261198621 and https://github.com/opnsense/dhcp6c/issues/37#issuecomment-2295244001

Since rtsold gives limited debug information, if there is a way to get more useful information please let me know and I will try to get more information. And if there are any other suggestions to help diagnose the problem, I'd be more than happy to try them.

To Reproduce

Wait for the ISP to issue a new IPv6 prefix.

Expected behavior

rtsold should notify dhcp6c within a few seconds, so that the disconnection lasts only a few seconds.

Describe alternatives you considered

None.

Screenshots

None.

Relevant log files

None.

Environment

OPNsense 24.7.1-amd64
Dell Optiplex 3070 MFF
Intel(R) Core(TM) i3-8100T CPU @ 3.10GHz (4 cores, 4 threads)
Network card: Intel I210 (WAN), Realtek NIC (LAN)

fichtner commented 1 month ago

Thanks for the ticket! Some initial digging in the manual page:

     Specifically, rtsold sends at most 3 Router Solicitations on an interface
     after one of the following events:

     •   Just after invocation of rtsold daemon.
     •   The interface is up after a temporary interface failure.  rtsold
         detects such failures by periodically probing to see if the status of
         the interface is active or not.  Note that some network cards and
         drivers do not allow the extraction of link state.  In such cases,
         rtsold cannot detect the change of the interface status.
     •   Every 60 seconds if the -m option is specified and the rtsold daemon
         cannot get the interface status.  This feature does not conform to
         the IPv6 neighbor discovery specification, but is provided for mobile
         stations.  The default interval for router advertisements, which is
         on the order of 10 minutes, is slightly long for mobile stations.
         This feature is provided for such stations so that they can find new
         routers as soon as possible when they attach to another link.

So the -m would maybe speed this up if the daemon even considers the link as reset. But I have the feeling it doesn't as your connection never recovered before and now dhcp6c can on its own.

Now if this is a dead end the other possibility is to look into default router and SLAAC advertisements. If your ISP sends a router advertisement of zero lifetime rtsold could consider this an invitation to reset and try again?
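To make the "zero lifetime" case concrete, here is a minimal sketch that reads the Router Lifetime field from the fixed 16-byte Router Advertisement header defined in RFC 4861. It is an illustration only, not code from rtsold; the function names `router_lifetime` and `router_withdrawn` are hypothetical, and the input is assumed to be the raw ICMPv6 message without the IPv6 header.

```python
import struct

ND_ROUTER_ADVERT = 134  # ICMPv6 type of a Router Advertisement (RFC 4861)


def router_lifetime(icmp6: bytes) -> int:
    """Return the Router Lifetime in seconds from a raw ICMPv6 RA message."""
    if len(icmp6) < 16:
        raise ValueError("truncated Router Advertisement")
    # Fixed header: type, code, checksum, cur hop limit, M/O flags,
    # router lifetime, reachable time, retrans timer (network byte order).
    msg_type, _code, _cksum, _hops, _flags, lifetime, _reach, _retrans = (
        struct.unpack("!BBHBBHII", icmp6[:16])
    )
    if msg_type != ND_ROUTER_ADVERT:
        raise ValueError("not a Router Advertisement")
    return lifetime


def router_withdrawn(icmp6: bytes) -> bool:
    """True if the sender announces it is no longer a default router."""
    return router_lifetime(icmp6) == 0
```

Whether a zero Router Lifetime should be treated as a reason to re-solicit is exactly the open question here; the sketch only shows where the value lives in the packet.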

The information could be extracted from the kernel and handled separately but I have the feeling another daemon to do this wouldn't make much sense either.

Cheers, Franco

wevsty commented 1 month ago

> So the -m would maybe speed this up if the daemon even considers the link as reset. But I have the feeling it doesn't as your connection never recovered before and now dhcp6c can on its own.

I could try changing the -m option to confirm if there is an improvement, but I don't think it's a good idea to wait for a polling check.

> Now if this is a dead end the other possibility is to look into default router and SLAAC advertisements. If your ISP sends a router advertisement of zero lifetime rtsold could consider this an invitation to reset and try again?

I think it's possible. The manual for rtadvd states that

       Basically, hosts MUST NOT send Router  Advertisement  messages  at  any
       time  (RFC 4861, Section 6.2.3).  However, it would sometimes be useful
       to allow hosts to advertise some parameters such as prefix  information
       and  link  MTU.   Thus, rtadvd can be invoked if router lifetime is ex-
       plicitly set zero on every advertising interface.

       ……

       Use SIGHUP to reload the configuration file  /etc/rtadvd.conf.   If  an
       invalid  parameter  is found in the configuration file upon the reload,
       the entry will be ignored and the old configuration will be used.  When
       parameters in an existing entry are updated, rtadvd  will  send  Router
       Advertisement messages with the old configuration but zero router life-
       time to the interface first, and then start to send a new message.

       Use  SIGTERM  to  kill  rtadvd  gracefully.   In this case, rtadvd will
       transmit router advertisement with router lifetime 0 to all the  inter-
       faces (in accordance with RFC 4861 6.2.5).

This document suggests that advertising a lifetime of 0 is a standard action, and upstream ISPs have likely done the same thing. But I think any IPv6 prefix change should trigger sending a signal to dhcp6c. I'm not sure what happens when a prefix is advertised with a lifetime of 0 and then immediately reset.

> The information could be extracted from the kernel and handled separately but I have the feeling another daemon to do this wouldn't make much sense either.

The rtsold manual documents the -O parameter:

       -O script-name
           Specifies a supplement script file to handle the Other Configu-
           ration flag of the router advertisement.  When the flag changes
           from FALSE to TRUE, rtsold will invoke script-name with a first
           argument of the receiving interface name and a second  argument
           of  the  sending router address, expecting the script will then
           start a protocol for the other configuration.  The script  will
           not  be run if the Managed Configuration flag in the router ad-
           vertisement is also TRUE.  script-name  must  be  the  absolute
           path  from  root  to the script file, be a regular file, and be
           created by the same owner who runs rtsold.

This parameter handles the Other Configuration flag. In my case rtsold is started with:

       /usr/sbin/rtsold -p /var/run/rtsold.pid -A /var/etc/rtsold_script.sh -R /usr/local/opnsense/scripts/interfaces/rtsold_resolvconf.sh -a -u -D

I observe that the -O parameter is not specified. I think that by using this parameter and specifying a new script, we could handle the prefix change.
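As a rough model of the manual text quoted above, the condition under which the -O script runs can be sketched as follows. This is only a reading of the man page, not rtsold's actual code; the function name `should_run_other_script` is hypothetical, and the flag values are the M and O bits of the Router Advertisement flags byte from RFC 4861.

```python
ND_RA_FLAG_MANAGED = 0x80  # M flag: addresses available via DHCPv6
ND_RA_FLAG_OTHER = 0x40    # O flag: other configuration available via DHCPv6


def should_run_other_script(prev_flags: int, new_flags: int) -> bool:
    """Model the documented -O trigger: O goes FALSE -> TRUE while M is FALSE."""
    was_other = bool(prev_flags & ND_RA_FLAG_OTHER)
    is_other = bool(new_flags & ND_RA_FLAG_OTHER)
    is_managed = bool(new_flags & ND_RA_FLAG_MANAGED)
    return is_other and not was_other and not is_managed
```

Under this reading, an advertisement that changes the prefix but leaves the flags unchanged would not start the script, so the idea depends on how the ISP sets these flags.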

fichtner commented 1 month ago

> I could try changing the -m option to confirm if there is an improvement, but I don't think it's a good idea to wait for a polling check.

Well, it is a workaround for "mobile" connections after all.

> But I think any IPv6 prefix change should trigger sending a signal to dhcp6c.

You're conflating SLAAC with DHCPv6 maybe because your ISP handles it this way. While you need SLAAC for DHCPv6 to work (DHCPv6 doesn't provide routers!) the two should operate independently from each other after a lease has been successfully acquired. Much of where this fails is when the ISP restarts their DHCP servers and leases are "lost" on the server side but still used by the client. Contrary to SLAAC/RA, DHCPv6 doesn't have a mechanism to revoke a valid lease. Fun stuff. :)

That being said I still agree with you that a prefix deprecation should be considered a link event because of its impact on the overall connectivity.

My best guess is that ISPs try to avoid zero lifetime advertisements in the average cases which would allow us to get away with a change in behaviour from rtsold, maybe coupled with a new option. The code to read the DHCP options presented by the router is already inside rtsol.c so it should be relatively easy to read the vltime of the prefix and generate an event when it is zero.
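The check described here can be sketched outside of rtsold as follows. This is not the code in rtsol.c: it is a minimal illustration that walks the option area of a Router Advertisement (the bytes after the 16-byte header), using the Prefix Information option layout from RFC 4861, and reports prefixes whose valid lifetime is zero. The names `prefix_options` and `deprecated_prefixes` are hypothetical.

```python
import ipaddress
import struct

ND_OPT_PREFIX_INFORMATION = 3  # option type, RFC 4861 section 4.6.2


def prefix_options(options: bytes):
    """Yield (prefix, valid_lifetime, preferred_lifetime) per Prefix Information option."""
    offset = 0
    while offset + 2 <= len(options):
        opt_type = options[offset]
        opt_len = options[offset + 1] * 8  # length field counts 8-octet units
        if opt_len == 0 or offset + opt_len > len(options):
            raise ValueError("malformed option")
        if opt_type == ND_OPT_PREFIX_INFORMATION and opt_len == 32:
            # Bytes 2..11: prefix length, flags, valid lifetime, preferred lifetime.
            plen, _flags, vltime, pltime = struct.unpack_from(
                "!BBII", options, offset + 2
            )
            # Bytes 16..31 hold the prefix itself.
            addr = ipaddress.IPv6Address(options[offset + 16:offset + 32])
            yield ipaddress.IPv6Network((addr, plen), strict=False), vltime, pltime
        offset += opt_len  # skip to the next option, whatever its type


def deprecated_prefixes(options: bytes):
    """Return the prefixes advertised with a valid lifetime of zero."""
    return [prefix for prefix, vltime, _pltime in prefix_options(options) if vltime == 0]
```

A non-empty result from `deprecated_prefixes` would be the point at which an event could be generated, for example to signal dhcp6c; how that event is delivered is left open here.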

> This parameter will handle the Other Configuration flag.

The -A parameter supersedes this for convoluted reasons.

Cheers, Franco

wevsty commented 4 weeks ago

> My best guess is that ISPs try to avoid zero lifetime advertisements in the average cases which would allow us to get away with a change in behaviour from rtsold, maybe coupled with a new option. The code to read the DHCP options presented by the router is already inside rtsol.c so it should be relatively easy to read the vltime of the prefix and generate an event when it is zero.

I don't have contact with ISPs in other countries, so I don't know whether setting the lifetime to 0 is an unusual operation; answering that may take more reports or experience. For my part, I think adding an option to change the behavior of rtsold is acceptable.

Please contact me if you need any testing done. And thank you for your help.

fichtner commented 4 weeks ago

It's just a guess based on the fact that the SLAAC prefixes should be/could be rather static in the average case, but I'm willing to bet on it.

I'll give this code a try and report back. Your packet captures are a great resource by the way. Thanks! :)

Cheers, Franco