opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License
3.07k stars 700 forks source link

DHCP lease expiration drops connection #7514

Closed Run1e closed 2 weeks ago

Run1e commented 2 weeks ago

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

The DHCP lease expires every 10 minutes (600s) which causes the default gateway to drop for 3-5 seconds.

I am having the exact same issue described in this issue. In fact by looking at some of the addresses in that issue, we have the same exact Norwegian ISP.

To Reproduce

Assuming this is caused by non-standard behaviour of my ISP's DHCP server, this probably has to be reproduced by me.

Expected behavior

The default gateway to not disappear when the ISP's DHCP lease expires every 10 minutes.

Describe alternatives you considered

I tried the DHCP Option Modifiers described in the other issue without much luck (assuming I did it right, anyway)

supersede dhcp-server-identifier *.*.*.1,supersede dhcp-lease-time 120

Relevant log files

This general block of logs will occur every 10 minutes:

2024-06-06T07:01:45 Notice  dhclient    dhclient-script: Reason EXPIRE on igb0 executing
2024-06-06T07:01:45 Notice  dhclient    dhclient-script: Reason PREINIT on igb0 executing   
2024-06-06T07:01:46 Notice  dhclient    dhclient-script: Reason ARPSEND on igb0 executing   
2024-06-06T07:01:48 Notice  dhclient    dhclient-script: Reason ARPCHECK on igb0 executing  
2024-06-06T07:01:48 Notice  dhclient    dhclient-script: Reason BOUND on igb0 executing 
2024-06-06T07:01:48 Notice  dhclient    dhclient-script: New Hostname (igb0): router    
2024-06-06T07:01:48 Notice  dhclient    dhclient-script: New IP Address (igb0): *.*.*.112   
2024-06-06T07:01:48 Notice  dhclient    dhclient-script: New Subnet Mask (igb0): 255.255.255.0  
2024-06-06T07:01:48 Notice  dhclient    dhclient-script: New Broadcast Address (igb0): *.*.*.255    
2024-06-06T07:01:48 Notice  dhclient    dhclient-script: New Routers (igb0): *.*.*.1    
2024-06-06T07:01:48 Notice  dhclient    dhclient-script: Creating resolv.conf   
2024-06-06T07:01:48 Notice  opnsense    /usr/local/etc/rc.newwanip: IP renewal starting (new: *.*.*.112, old: *.*.*.112, interface: wan, device: igb0, force: yes)  
2024-06-06T07:01:48 Notice  opnsense    /usr/local/etc/rc.newwanip: ROUTING: entering configure using 'wan' 
2024-06-06T07:01:48 Notice  opnsense    /usr/local/etc/rc.newwanip: ROUTING: setting inet interface route to *.*.*.1 via igb0   
2024-06-06T07:01:48 Notice  opnsense    /usr/local/etc/rc.newwanip: ROUTING: configuring inet default gateway on wan    
2024-06-06T07:01:48 Notice  opnsense    /usr/local/etc/rc.newwanip: ROUTING: setting inet default route to *.*.*.1  
2024-06-06T07:01:48 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure monitor (,WAN_DHCP)   
2024-06-06T07:01:48 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure monitor (execute task : dpinger_configure_do(,WAN_DHCP))  
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure vpn (,wan)    
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure vpn (execute task : ipsec_configure_do(,wan)) 
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure vpn (execute task : openvpn_configure_do(,wan))   
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: Resyncing OpenVPN instances for interface WAN.  
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure vpn (execute task : wireguard_configure_do(,wan)) 
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure newwanip (,wan)   
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : dnsmasq_configure_do())  
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : ntpd_configure_do()) 
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : opendns_configure_do())  
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : openssh_configure_do(,wan))  
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : unbound_configure_do(,wan))  
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : vxlan_configure_do())    
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : webgui_configure_do(,wan))   
2024-06-06T07:01:49 Notice  opnsense    /usr/local/etc/rc.newwanip: plugins_configure newwanip (execute task : wireguard_sync(,wan))    

Additional context

My assumption that my ISP has some weird DHCP behaviour, is caused by the fact that we switched ISP here a couple of months ago and did not have that issue then. After the switch to our current ISP, this issue started happening while running pfSense 2.6.0. Currently running opnsense 24.1.7_4.

Environment

Type opnsense
Version 24.1.7_4
Architecture amd64
Commit 4cae8a160

Run1e commented 2 weeks ago

I couldn't find the dhclient logs I needed, so I ran a pcap to figure out what was actually responding to my DHCP request (obvious in hindsight, but it was not my default gateway lol) and now the Option Modifiers seemingly work which causes a RENEW to fire every 60 seconds. That makes the 600s EXPIRE event not happen, which was causing the gateway to drop for a few seconds in the first place.

To me this feels like a very flaky hack to solve my issue. My DHCP protocol knowledge is slightly lacking in this case so it's hard for me to say exactly what is wrong with my ISP's DHCP and whether opnsense should/could realistically handle it. I would love to share pcaps/logs and help debug if it's of interest though.

The deepest this issue has been dug into seems to be this comment on the previous issue: https://github.com/opnsense/core/issues/6869#issuecomment-1732534822

Not sure where he's seeing the DHCPREQUEST/DHCPOFFER logs from dhclient, but if I could find that I could at least start by confirming I'm having the same issue as him.

Run1e commented 2 weeks ago

Actually, looking at a pcap I can see that opnsense is sending out DHCP requests prior to the lease expiring.

Run1e commented 2 weeks ago

The DHCP server gives me a renewal time of 150s, rebinding time of 300s, and a Server Identifier (option 54) that is different from the DHCP servers' actual IP.

After 150s, opnsense attempts to renew the lease by sending DHCP requests to the Server Identifier it was given, but does not get a reply.

After 300s, opnsense starts broadcasting DHCP requests but does still not get any replies from any DHCP servers.

After 600s the lease expires, and opnsense sends a DHCP discover with source ip 0.0.0.0, and then gets a DHCP offer from the DHCP server. Opnsense then sends a DHCP request (again from source ip 0.0.0.0) and then gets an ACK, at which point we successfully have a lease again. During this last section, opnsense seems to freak out and the gateway goes down for a few seconds.

The only difference I can see from the DHCP requests after T2 and the one happening after 600s is that the source IP is different - that being at first it's the leased IP currently in use, and after the lease expires its 0.0.0.0.

Not sure if the DHCP server is messing up by giving out a Server Identifier which does not reply to DHCP requests, or if the issue is that the DHCP server/relay only handles DHCP requests when sent with source ip 0.0.0.0? Or both. Or neither.

Run1e commented 2 weeks ago

Adding the Option Modifiers previously mentioned:

supersede dhcp-server-identifier ip.of.actual.dhcpserver,supersede dhcp-lease-time 120

results in a DHCP request being sent to the actual ip of the DHCP server every minute (as opposed to the Server Identifier which does not respond, or broadcast to 255.255.255.255 with current lease which does not make any DHCP server respond either), which circumvents this entire issue.

fichtner commented 2 weeks ago

I don't have time to delve into this in community support but it sounds like the unicast doesn't go to the server or it is being ignored / dropped / lost. I know of a multi-WAN issue where dhclient does not bind to the interface and the unicast goes out the wrong line but I don't see any mention of multi-WAN here.

I don't think you can do much from your end if the request goes to the wire but is ultimately ignored. That's just a violation of the protocol if no answer is received and the rest appears to fall back as designed, not gracefully but the best it could under the circumstances.

Run1e commented 2 weeks ago

Yeah, that was more or less my conclusion too. Thanks for your time!