Open wevsty opened 3 months ago
Thanks for the ticket! Some initial digging in the manual page:
Specifically, rtsold sends at most 3 Router Solicitations on an interface
after one of the following events:
• Just after invocation of rtsold daemon.
• The interface is up after a temporary interface failure. rtsold
detects such failures by periodically probing to see if the status of
the interface is active or not. Note that some network cards and
drivers do not allow the extraction of link state. In such cases,
rtsold cannot detect the change of the interface status.
• Every 60 seconds if the -m option is specified and the rtsold daemon
cannot get the interface status. This feature does not conform to
the IPv6 neighbor discovery specification, but is provided for mobile
stations. The default interval for router advertisements, which is
on the order of 10 minutes, is slightly long for mobile stations.
This feature is provided for such stations so that they can find new
routers as soon as possible when they attach to another link.
So the -m would maybe speed this up if the daemon even considers the link as reset. But I have the feeling it doesn't as your connection never recovered before and now dhcp6c can on its own.
Now if this is a dead end the other possibility is to look into default router and SLAAC advertisements. If your ISP sends a router advertisement of zero lifetime rtsold could consider this an invitation to reset and try again?
The information could be extracted from the kernel and handled separately but I have the feeling another daemon to do this wouldn't make much sense either.
Cheers, Franco
So the -m would maybe speed this up if the daemon even considers the link as reset. But I have the feeling it doesn't as your connection never recovered before and now dhcp6c can on its own.
I could try changing the -m option to confirm if there is an improvement, but I don't think it's a good idea to wait for a polling check.
Now if this is a dead end the other possibility is to look into default router and SLAAC advertisements. If your ISP sends a router advertisement of zero lifetime rtsold could consider this an invitation to reset and try again?
I think it's possible.
The manual for rtadvd states that:
Basically, hosts MUST NOT send Router Advertisement messages at any
time (RFC 4861, Section 6.2.3). However, it would sometimes be useful
to allow hosts to advertise some parameters such as prefix information
and link MTU. Thus, rtadvd can be invoked if router lifetime is ex-
plicitly set zero on every advertising interface.
……
Use SIGHUP to reload the configuration file /etc/rtadvd.conf. If an
invalid parameter is found in the configuration file upon the reload,
the entry will be ignored and the old configuration will be used. When
parameters in an existing entry are updated, rtadvd will send Router
Advertisement messages with the old configuration but zero router life-
time to the interface first, and then start to send a new message.
Use SIGTERM to kill rtadvd gracefully. In this case, rtadvd will
transmit router advertisement with router lifetime 0 to all the inter-
faces (in accordance with RFC 4861 6.2.5).
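To make the "router lifetime 0" case concrete, here is a minimal C sketch of how a receiver could recognise such an advertisement. It assumes the buffer starts at the ICMPv6 header, as it would when read from a raw ICMPv6 socket, and relies only on the fixed RA layout from RFC 4861, Section 4.2. The `ex_`/`EX_` names are hypothetical and are not taken from rtsold or rtadvd.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_ND_ROUTER_ADVERT 134  /* ICMPv6 type of a Router Advertisement */
#define EX_RA_HDR_LEN        16  /* fixed RA header length in bytes */

/*
 * Returns 1 if buf holds an RA whose router lifetime is zero,
 * 0 if it is an RA with a non-zero router lifetime,
 * -1 if buf is too short or is not an RA at all.
 */
static int ex_ra_router_lifetime_is_zero(const uint8_t *buf, size_t len)
{
    uint16_t lifetime;

    if (len < EX_RA_HDR_LEN || buf[0] != EX_ND_ROUTER_ADVERT)
        return -1;
    /* Router lifetime: 16 bits, network byte order, at offset 6. */
    lifetime = (uint16_t)((buf[6] << 8) | buf[7]);
    return lifetime == 0;
}
```

Note that this only looks at the router lifetime in the RA header. The lifetime of an individual prefix is carried separately in a Prefix Information option.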
This document suggests that setting the lifetime to 0 in an advertisement is a standard action, and upstream ISPs are likely to do the same thing. But I think any IPv6 prefix change should trigger sending a signal to dhcp6c. I'm not sure what happens when a prefix advertised with a lifetime of 0 is reset immediately.
The information could be extracted from the kernel and handled separately but I have the feeling another daemon to do this wouldn't make much sense either.
The manual for rtsold documents the -O parameter:
-O script-name
Specifies a supplement script file to handle the Other Configu-
ration flag of the router advertisement. When the flag changes
from FALSE to TRUE, rtsold will invoke script-name with a first
argument of the receiving interface name and a second argument
of the sending router address, expecting the script will then
start a protocol for the other configuration. The script will
not be run if the Managed Configuration flag in the router ad-
vertisement is also TRUE. script-name must be the absolute
path from root to the script file, be a regular file, and be
created by the same owner who runs rtsold.
This parameter will handle the Other Configuration flag.
In my case the command to start is /usr/sbin/rtsold -p /var/run/rtsold.pid -A /var/etc/rtsold_script.sh -R /usr/local/opnsense/scripts/interfaces/rtsold_resolvconf.sh -a -u -D
I observe that the -O parameter is not specified.
I think by using this parameter and specifying a new script, we can handle the prefix change.
I could try changing the -m option to confirm if there is an improvement, but I don't think it's a good idea to wait for a polling check.
Well, it is a workaround for "mobile" connections after all.
But I think any IPv6 prefix change should trigger sending a signal to dhcp6c.
You're conflating SLAAC with DHCPv6 maybe because your ISP handles it this way. While you need SLAAC for DHCPv6 to work (DHCPv6 doesn't provide routers!) the two should operate independently from each other after a lease has been successfully acquired. Much of where this fails is when the ISP restarts their DHCP servers and leases are "lost" on the server side but still used by the client. Contrary to SLAAC/RA, DHCPv6 doesn't have a mechanism to revoke a valid lease. Fun stuff. :)
That being said I still agree with you that a prefix deprecation should be considered a link event because of its impact on the overall connectivity.
My best guess is that ISPs try to avoid zero lifetime advertisements in the average cases which would allow us to get away with a change in behaviour from rtsold, maybe coupled with a new option. The code to read the DHCP options presented by the router is already inside rtsol.c so it should be relatively easy to read the vltime of the prefix and generate an event when it is zero.
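As a rough illustration of that idea, the C sketch below walks the options of a received Router Advertisement and reports the first Prefix Information option whose valid lifetime is zero. It is a standalone example based on the option layout in RFC 4861, Section 4.6.2. It does not reuse rtsold's own data structures, and every `ex_`/`EX_` name is hypothetical. What "generate an event" should mean (for example signalling dhcp6c) is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_RA_HDR_LEN       16  /* fixed part of a Router Advertisement */
#define EX_OPT_PREFIX_INFO   3  /* Prefix Information option type */
#define EX_OPT_PI_LEN       32  /* Prefix Information option: 4 * 8 bytes */

/* Read a 32-bit network-order value without alignment assumptions. */
static uint32_t ex_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Scan the options that follow the RA header.
 * Returns 1 and copies the 16-byte prefix into prefix_out when a Prefix
 * Information option with a valid lifetime of zero is found, 0 when there
 * is none, and -1 when the option list is malformed.
 */
static int ex_ra_find_zero_vltime(const uint8_t *ra, size_t len,
                                  uint8_t prefix_out[16])
{
    size_t off = EX_RA_HDR_LEN;

    if (len < EX_RA_HDR_LEN)
        return -1;
    while (off < len) {
        size_t optlen;

        if (len - off < 2)
            return -1;
        optlen = (size_t)ra[off + 1] * 8;  /* length is in units of 8 bytes */
        if (optlen == 0 || optlen > len - off)
            return -1;
        /* Valid lifetime sits at offset 4, the prefix at offset 16. */
        if (ra[off] == EX_OPT_PREFIX_INFO && optlen == EX_OPT_PI_LEN &&
            ex_be32(ra + off + 4) == 0) {
            memcpy(prefix_out, ra + off + 16, 16);
            return 1;
        }
        off += optlen;
    }
    return 0;
}
```

Returning the prefix lets the caller log which prefix was withdrawn before deciding how to react.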
This parameter will handle the Other Configuration flag.
The -A parameter supersedes this for convoluted reasons.
Cheers, Franco
My best guess is that ISPs try to avoid zero lifetime advertisements in the average cases which would allow us to get away with a change in behaviour from rtsold, maybe coupled with a new option. The code to read the DHCP options presented by the router is already inside rtsol.c so it should be relatively easy to read the vltime of the prefix and generate an event when it is zero.
I don't have contact with ISPs in other countries, so I don't know whether setting the lifetime to 0 is an unusual operation; that may need more reports or experience from other users. For me, adding an option to change the behavior of rtsold is acceptable.
Please contact me if you need any testing done. And thank you for your help.
It's just a guess based on the fact that the SLAAC prefixes should be/could be rather static in the average case, but I'm willing to bet on it.
I'll give this code a try and report back. Your packet captures are a great resource by the way. Thanks! :)
Cheers, Franco
Describe the bug
When the ISP forcibly updates the IPv6 prefix, rtsold does not seem to notify dhcp6c to handle it.
My ISP seems to force a new IPv6 prefix to be released every so often, and the ISP doesn't seem to be keeping to its time agreement with its customers. When a new prefix is released, all machines on the LAN are disconnected. Normally, we would think that rtsold would send a SIGHUP to dhcp6c, and that dhcp6c would receive the notification and resend a DHCPv6 request to update the prefix.
However, this does not seem to be what happens in my case. This eventually causes all devices to disconnect from the network for a period of time (about 10 minutes in my case).
Packet captures and system logs taken when the problem occurs can be found at https://github.com/opnsense/dhcp6c/issues/37#issuecomment-2261198621 and https://github.com/opnsense/dhcp6c/issues/37#issuecomment-2295244001
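For reference, the notification described above (a SIGHUP delivered to dhcp6c) could look roughly like the following C sketch. It assumes dhcp6c writes its PID to a file. The path used in the usage note is hypothetical, and `ex_signal_pidfile` is an illustrative helper, not a function from rtsold.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Read a PID from pidfile and deliver sig to that process.
 * Returns 0 on success, -1 if the file is missing or unparsable,
 * or if kill(2) fails.
 */
static int ex_signal_pidfile(const char *pidfile, int sig)
{
    FILE *fp = fopen(pidfile, "r");
    long pid = 0;
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%ld", &pid) == 1);
    fclose(fp);
    /* Refuse 0 and negative values (process groups) as well as init. */
    if (!ok || pid <= 1)
        return -1;
    return kill((pid_t)pid, sig) == 0 ? 0 : -1;
}
```

A caller would use it as `ex_signal_pidfile("/var/run/dhcp6c.pid", SIGHUP)`, where the PID file location is an assumption that has to match the actual dhcp6c invocation.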
Since rtsold gives limited debug information, if there is a way to get more useful information please let me know and I will try to get more information. And if there are any other suggestions to help diagnose the problem, I'd be more than happy to try them.
Tip: to validate your setup was working with the previous version, use opnsense-revert (https://docs.opnsense.org/manual/opnsense_tools.html#opnsense-revert)
To Reproduce
Wait for the ISP to issue a new IPv6 prefix.
Expected behavior
rtsold should notify dhcp6c within a few seconds, so that the disconnection lasts only a few seconds.
Describe alternatives you considered
None.
Screenshots
None.
Relevant log files
None.
Environment
OPNsense 24.7.1-amd64
Dell Optiplex 3070 MFF
Intel(R) Core(TM) i3-8100T CPU @ 3.10GHz (4 cores, 4 threads)
Network card: Intel I210 (WAN), Realtek NIC (LAN)