Closed Greelan closed 2 years ago
Hi @Greelan,
Thanks for the detailed context. Is 437b308fffed7e406ac6bdb7ac206e5257e26ffd + cff2641237596c28d05dd8d4b591e5b508d93078 what you are looking for? :)
If you need a test package I can provide one tomorrow.
Cheers, Franco
Wow, incredible response time! Thank you.
I will check out the commits but given my C coding days are more than 25 years ago, I likely will have little to add ;) (also the reason why I didn't create a PR myself). Happy to test though.
Again, thanks for being so responsive.
The commits look consistent with the Debian patch contents (with the d_printf
function adjustment to match the OPNsense context). Here's hoping it works!
Actually, I do note that these changes in the Debian patch aren't in your commit (they relate to lines 1266 and 1269 of OPNsense's dhcp6c):
@@ -1484,10 +1484,10 @@ client6_recv()
switch(dh6->dh6_msgtype) {
case DH6_ADVERTISE:
- (void)client6_recvadvert(ifp, dh6, len, &optinfo);
+ client6_recvadvert(ifp, dh6, len, &optinfo);
break;
case DH6_REPLY:
- (void)client6_recvreply(ifp, dh6, len, &optinfo);
+ client6_recvreply(ifp, dh6, len, &optinfo);
Presume there is a reason not to add these?
I omitted the non-functional changes. I presume the person doing the patch neither cared about -Werror or 80 character line limits. 😉 The tabs were also off in some functional lines so the end result looks a bit different.
Also just for readability it would be better to do non-functional updates in separate commits.
To test:
# pkg add -f https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/misc/dhcp6c-20200512_1.txz
To revert:
# opnsense-revert dhcp6c
Thank you. Initial testing looks good. I see in the logs dhcp6c receiving an UnspecFail status code, and as a consequence reporting advertise contains no address/prefix
. So it resends the solicit. Sometimes the process is repeated, ie it receives another UnspecFail status code. Eventually it receives a correct advertise and then successfully requests the address and prefix.
I will continue to monitor, but it is looking positive. :)
Heads up that I am still seeing some odd behaviour with dhcp6c.
The patch does address the issue of dhcp6c wrongly responding to an UnspecFail as if it was an address/prefix advertise. It sends another solicit instead, until an address/prefix is actually received.
However, I have noticed on at least one occasion dhcp6c sending another release immediately after receiving an address/prefix. It then sends a request, but that request is ineffective because of the release. The end result is that no address/prefix is received despite repeated requests being made.
The odd behaviour is the sending of the release - it is not clear why that happens at that point in the process.
I will post some logs when I get a chance later, to show the sequence. I will also do some more testing to see whether this is consistent behaviour or happens only intermittently.
Attached are the promised logs: dhcplogs.txt
This shows a reload event on the WAN (Interfaces->Overview->WAN->Reload), from when dhcp6c is SIGHUPed. A release is sent, a reply to the release is received, and a solicit is then sent. The first response is an UnspecFail. So dhcp6c sends another solicit. This time the response is to advertise a GUA for the WAN and a prefix.
At this point it would be expected that dhcp6c would request the address and prefix. But first another release is sent, and replied to, before the request is made. Presumably because of the release, the request is not responded to, so it is repeated.
The logs then show another reload event. On this occasion, the sequence follows as expected - release, response, solicit, advertise, request, response. There is no UnspecFail on this occasion because the dhcpv6 server and proxy don't have a lease assigned to the client.
In the first reload event, there is a release both after the GUA/prefix is received, and also immediately after the UnspecFail is received (so: UnspecFail response, release (no response), solicit, response, release, response, request...).
I don't profess to know a lot about dhcp6c, but I wonder whether the two releases are related? Eg, is the second based off a timer after the first? Should the first be happening at all?
While not scientific, I did do some more testing to see if I could replicate the behaviour, by hitting reload on the WAN again. I couldn't replicate the behaviour. On the first occasion, the UnspecFail response was followed by a release being sent, but then the solicit, response and request sequence happened as expected. On the second occasion, the sequence was: UnspecFail, solicit, UnspecFail, solicit, response, request - which I would have thought is the expected behaviour with the patch.
My initial testing straight after installing the patch also did not exhibit the behaviour I just experienced today. So it is a bit of a mystery.
Sorry I can't shed more light.
BTW, an aside on the logs. I had to run OPNsense's logs through tail -r
to get them into reverse order, which I find more natural to read. Not sure if I am missing something, but is there no way of having the GUI display OPNsense's logs in reverse order (oldest to newest)? I thought I saw on some screenshots online of an earlier version a checkbox for exactly that, but it seems to have disappeared.
@Greelan use clog -f
for tail -f
or clog
for cat
behaviour. I haven't looked closer yet. We may be looking at an implementation limitation in dhcp6c as the HUP was added by us so the only way to reload prior was stopping and starting so the same lease will very likely not be acquired anyway.
I was hoping to get a sense of whether a wide-dhcpv6 patch that was introduced in the upstream in 2015 will make it into OPNsense.
The issue is this (as explained to me by my ISP). Whenever dhcp6c on OPNsense sends a release to my ISP, and then (as is usual) tries to send a solicit straight after the release to obtain a new lease, the ISP's BNG responds with a DHCPv6 UnspecFail advertise. Apparently, somewhere between the DHCPv6 proxy and DHCPv6 server running on the BNG, the release process has not been finalised (and so the lease is not considered available to the client again) when dhcp6c sends the new solicit, hence the UnspecFail response.
However, dhcp6c does not interpret this correctly and instead assumes the BNG is trying to advertise the prefix, and so dhcp6c sends a DHCPv6 request for a blank prefix back to the BNG. The BNG of course drops this as invalid. dhcp6c then eventually gives up trying to get a DHCPv6 lease, and accordingly no WAN GUA or IPv6 PD is received, until dhcp6c is reloaded or restarted.
The behaviour can be seen in these logs (anonymised):
The BNG's behaviour (my ISP uses Cisco routers) is actually RFC compliant, according to my ISP (and I understand Cisco, with whom my ISP has apparently been discussing the issue). Per RFC 3315 (https://tools.ietf.org/html/rfc3315#section-15):
Same for the updated RFC 8415 (https://tools.ietf.org/html/rfc8415#section-16):
The issue of wide-dhcpv6 / dhcp6c not responding correctly to UnspecFail advertises was, I am told, addressed in the upstream. My research suggests that occurred in 2015: https://sourceforge.net/p/wide-dhcpv6/bugs/34/. Specifically, item 4:
This patch was implemented in Debian in March 2018: https://metadata.ftp-master.debian.org/changelogs//main/w/wide-dhcpv6/wide-dhcpv6_20080615-23_changelog. Specifically, patch 0018:
What I understand to be the content of the patch is attached: 0018-dhcpv6-ignore-advertise-messages-with-none-of-reques.patch.txt
My ISP tells me that they believe Netcomm has implemented the patch, and I have seen comments from Ubiquiti on their forums that they plan to implement the patch, at least for their Edgerouter line.
But this patch does not appear to have made it into FreeBSD or OPNsense.
Curiously, some of the items addressed by the upstream patch in 2015 have been implemented - in FreeBSD in Jan 2016 (based on the descriptions in https://www.freshports.org/net/dhcp6 for version 20080615_4) and in OPNsense in December 2016 (based on the commit descriptions in https://github.com/opnsense/dhcp6c/commits/master). But not item 4.
Is there any prospect of this patch being implemented in OPNsense?