opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

Internet down after 1-2 minutes, system routing #6338

Closed ghost closed 1 year ago

ghost commented 1 year ago

Describe the bug

22.7.11 was the last version where I didn't have any problems. Every morning when I turn on the firewall, everything works for the first 1-2 minutes. Then the internet goes down everywhere, and I need to open the web GUI and restart the "routing System routing" service.

Tip: to validate your setup was working with the previous version, use opnsense-revert (https://docs.opnsense.org/manual/opnsense_tools.html#opnsense-revert)

It didn't solve the problem. https://prnt.sc/R2NlF-xdJYRG

To Reproduce

Steps to reproduce the behavior:

  1. Turn on the firewall
  2. Wait 1-2 minutes
  3. The internet goes down

Expected behavior

That everything is fine, as in the previous version

Describe alternatives you considered

I reinstalled the system, did not restore the backup file, and set up the whole system by hand. I removed dnscrypt, unbound, and the gateway pinger. I installed the Realtek driver. Nothing changed.

Screenshots

https://prnt.sc/Ke1asVE5I8pP Everything seems to be fine, but browsers report DNS_PROBE_ERR and the network icon at the bottom right of the screen changes to the "no internet" one until I restart "system routing".

Relevant log files

Nothing appears in the logs: audit, backend, general, boot, webui.

Environment

OPNsense 23.1.1-amd64

Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz (2 cores, 4 threads)

I don't have any information about the network card, but it worked until the previous version.

ghost commented 1 year ago

I upgraded with pkg update -f, pkg upgrade -d and pkg upgrade -f. The version string changed from OPNsense 23.1.1-amd64 to OPNsense 23.1.1_2-amd64.

However, the problem has not been solved.

AdSchellevis commented 1 year ago

relevant information would be:

ghost commented 1 year ago

  • what type of (wan) connectivity is used, does the wan interface have an address at all (Interfaces: Overview)

I use DHCP. My modem works in bridge mode because I have VDSL2 at home. The information from the overview: https://prnt.sc/1o7WdqjLQfHn

  • any events in the system log (System: Log Files: General) at the moment the connection drops.

https://prnt.sc/LvZVOu7fhfwS This is the general log. I powered off at 13:58 and powered on at 14:00, and the problem occurred at 14:07. I then restarted system routing so I could upload the screenshot, and the log shows this: https://prnt.sc/Jwbd19EFIb2Q.

  • Is there a default route set (System: Routes: Status) after the failure and is it pointing to the correct next hop?

https://prnt.sc/87NRzL07CF4y during the failure, https://prnt.sc/qtw_D8BDFQBF after restarting the routing service. The only entry that changes, which I guess is the heart of the problem, is the following, and I don't have the faintest idea why: https://prnt.sc/wDC8C4xumGc1

AdSchellevis commented 1 year ago

sounds similar to https://forum.opnsense.org/index.php?topic=32347.msg157402#msg157402

when the gateway is dropped, can you check if /usr/local/etc/rc.routing_configure restores normal operation?

ghost commented 1 year ago

sounds similar to https://forum.opnsense.org/index.php?topic=32347.msg157402#msg157402

when the gateway is dropped, can you check if /usr/local/etc/rc.routing_configure restores normal operation?

https://prnt.sc/B1pEjTlpqz4T

Same situation. The "route not found" output was captured while the connection was down.

AdSchellevis commented 1 year ago

but does executing /usr/local/etc/rc.routing_configure restore the default route in that case?

ghost commented 1 year ago

but does executing /usr/local/etc/rc.routing_configure restore the default route in that case?

Sorry, I didn't understand. https://prnt.sc/MZQWn9MOrCcL Yes, it restores it.
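The check being asked for here (is a default route still present, and which next hop does it use?) can be sketched in Python. This is a minimal illustration and not part of OPNsense: the function name `find_default_gateway` and both sample routing tables are assumptions, with placeholder addresses from a documentation range. It only assumes the usual `netstat -rn` layout, with the destination in the first column and the gateway in the second.

```python
from typing import Optional


def find_default_gateway(routing_table: str) -> Optional[str]:
    """Return the gateway of the default route, or None if it is missing.

    Expects text shaped like `netstat -rn -f inet` output: one route per
    line, destination in the first column, gateway in the second.
    """
    for line in routing_table.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "default":
            return fields[1]
    return None


# Hypothetical sample tables, not taken from the reporter's screenshots.
HEALTHY = """\
Destination        Gateway            Flags     Netif Expire
default            192.0.2.1          UGS      vtnet2
192.0.2.0/24       link#3             U        vtnet2
"""

BROKEN = """\
Destination        Gateway            Flags     Netif Expire
192.0.2.0/24       link#3             U        vtnet2
"""

print(find_default_gateway(HEALTHY))  # gateway of the default route
print(find_default_gateway(BROKEN))   # None: the default route is gone
```

On the firewall itself, the text passed in would be the output of `netstat -rn -f inet`, captured once during the outage and once after running /usr/local/etc/rc.routing_configure.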

AdSchellevis commented 1 year ago

@Threefish4096 as a workaround, can you try to install https://github.com/opnsense/core/commit/2be7d9b92e76775bd97ef532150b756bf4b79075 using the command below?

opnsense-patch 2be7d9b

This is merely a workaround. We still need to figure out why the default route is dropped, as at this point it should still be there after receiving the same address from the server. Assigning myself and @fichtner to the ticket.

EDIT changed commit

ghost commented 1 year ago

I tried, but the problem reoccurs

https://prnt.sc/z2ewfxGuWU4Q

LOG: https://prnt.sc/_f7Pvp9hMrqm

fichtner commented 1 year ago

Sorry, the log makes no sense to me. It goes into the error condition, doesn't recover, goes into correct reconfiguration but also doesn't do anything? It's a bit hard to get a structure here..

ghost commented 1 year ago

https://prnt.sc/foAuj_5eosrT

This morning I got up and turned everything on. While I was getting ready to leave the house I noticed that the internet was not working again, so I restarted it; the log entry should be around 08:18. I came back at 12:15 and Windows 11 was showing the globe icon and not loading any pages. I restarted system routing and it worked again. Since then I've been uploading things and haven't had any more problems.

This is just a recount of the events since I applied the patch yesterday. I don't know if it will be of any use.

Thank you very much for the attention!

fichtner commented 1 year ago

I think your lease times on the WAN side are pretty low so that it constantly "breaks". For the time being we have enough information to try and reproduce. As far as the patch goes let's not try to confirm if it is working or not as it's not the exact solution anyway.
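To illustrate why a short lease would make the failure recur often: a DHCP client normally tries to renew well before the lease expires. The sketch below uses the default timer factors from RFC 2131 (T1 at 0.5 and T2 at 0.875 of the lease duration). The lease values are hypothetical, since the actual lease time on this WAN is not stated in the thread, and a server may override both timers explicitly.

```python
from typing import Tuple


def renewal_times(lease_seconds: int) -> Tuple[float, float]:
    """Return (T1, T2) in seconds using the RFC 2131 default factors."""
    t1 = 0.5 * lease_seconds    # client starts renewing with its server
    t2 = 0.875 * lease_seconds  # client starts rebinding with any server
    return t1, t2


# Hypothetical lease durations for comparison.
for lease in (300, 3600, 86400):
    t1, t2 = renewal_times(lease)
    print(f"lease {lease:>6}s: renew after {t1:.0f}s, rebind after {t2:.0f}s")
```

If every renewal that returns the same address drops the default route, a lease of a few minutes would break connectivity within minutes of boot, while a one-day lease would hide the problem for hours.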

ghost commented 1 year ago

I think your lease times on the WAN side are pretty low so that it constantly "breaks". For the time being we have enough information to try and reproduce. As far as the patch goes let's not try to confirm if it is working or not as it's not the exact solution anyway.

Thanks so much again, I look forward to a patch. Tag me when there is a new "opnsense-patch a1b2c3" to try.

fichtner commented 1 year ago

Linking forum post for reference: https://forum.opnsense.org/index.php?topic=32347.0

rudiservo commented 1 year ago

Dumb question: do you have the WAN gateway marked as upstream? I am losing connection every time the ISP renews the DHCP lease, even if the IP does not change (it's my case). I lose access to the internet (maybe the default gateway) and only regain it when I restart the routing service. IPv6 connectivity was always working. When I checked the gateway, upstream was not set, so I marked it as upstream. So far so good, 24 hours later.

Hopefully this does not have anything to do with Suricata.

AdSchellevis commented 1 year ago

@rudiservo best check the fix proposed by Franco in the forum https://forum.opnsense.org/index.php?topic=32347.msg157675#msg157675 , this is highly likely the cause of the issue.

ghost commented 1 year ago

Dumb question: do you have the WAN gateway marked as upstream?

Now that I have reinstalled, no, and it works...

even if the IP does not change (it's my case)

Me too, I have a static IP.

only regain when I restart the routing service

Me too

IPV6 connectivity was always working

My ISP doesn't use it.

Hopefully this does not have anything to do with Suricata

With Suricata I have other problems: it works on WAN, but on LAN it causes what look like drops.

Anyway, yesterday I tried installing IPFire after searching on the internet. It does the job, but I had some configuration problems, so today I reinstalled OPNsense and everything is fine... I tried to recreate the problem, but I can't. Does this make sense?

ghost commented 1 year ago

Now I recognize it! After I enabled Suricata and downloaded all the rules, I tried to reproduce the problem and now it happens again.

Without Suricata on LAN, everything works.

ghost commented 1 year ago

The only differences from previous installation attempts are as follows (screenshot photo_2023-02-22_19-50-17):

1G of swap, where before I had 0G. Mirror swap: yes, before no. Encrypt swap: yes, before no.

(screenshot photo_2023-02-22_19-50-21)

For the install I used "other modes" and the options shown in screenshot photo_2023-02-22_19-50-25.

I hope they can be of some use. Good evening!

rudiservo commented 1 year ago

OK, try two things: mark the default gateway from WAN as the upstream gateway, and check whether Suricata is in promiscuous mode.

rudiservo commented 1 year ago

@AdSchellevis you're right, the workaround seems to be the fix for now. Is it going to be in a patch this week?

fichtner commented 1 year ago

Debug output from forum:

2023-02-23T06:24:26 Notice  opnsense    /usr/local/etc/rc.newwanip: ROUTING: setting IPv4 default route to 81.xxx.xx.1  
2023-02-23T06:24:26 Notice  opnsense    /usr/local/etc/rc.newwanip: ROUTING: IPv4 default gateway set to wan    
2023-02-23T06:24:26 Notice  opnsense    /usr/local/etc/rc.newwanip: ROUTING: entering configure using 'wan' 
2023-02-23T06:24:26 Notice  opnsense    /usr/local/etc/rc.newwanip: No IP change detected for WAN[wan]  
2023-02-23T06:24:26 Notice  dhclient    Creating resolv.conf    
2023-02-23T06:24:26 Notice  dhclient    New Routers (vtnet2): 81.xxx.xx.1   
2023-02-23T06:24:26 Notice  dhclient    New Broadcast Address (vtnet2): 81.xxx.xx.255   
2023-02-23T06:24:26 Notice  dhclient    New Subnet Mask (vtnet2): 255.255.255.0 
2023-02-23T06:24:26 Notice  dhclient    New IP Address (vtnet2): 81.xxx.xx.x29  
2023-02-23T06:24:26 Notice  dhclient    DEBUG calling add_new_address/add_new_routes    
2023-02-23T06:24:26 Notice  dhclient    DEBUG alias_ip_address: 
2023-02-23T06:24:26 Notice  dhclient    DEBUG new_ip_address: 81.xxx.xx.x29 
2023-02-23T06:24:26 Notice  dhclient    DEBUG old_ip_address: 81.xxx.xx.x29 
2023-02-23T06:24:26 Notice  dhclient    DEBUG entering with BOUND   
2023-02-23T05:24:07 Error   dhclient    send_packet: No route to host

It's a bit strange: we are handling BOUND with identical old and new addresses and don't flush the old one, which would mean that adding an IP address that is already there scrubs the route? Need to verify...
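The condition described here (a BOUND event where the old and new addresses are identical) can be picked out of such debug output mechanically. The following Python sketch is only an illustration: the function name `same_address_rebind` is an assumption, and the sample is a subset of the lines posted above, kept in the same newest-first order.

```python
import re

# Subset of the debug lines from the log above (newest entry first).
LOG = """\
2023-02-23T06:24:26 Notice  opnsense    /usr/local/etc/rc.newwanip: No IP change detected for WAN[wan]
2023-02-23T06:24:26 Notice  dhclient    DEBUG new_ip_address: 81.xxx.xx.x29
2023-02-23T06:24:26 Notice  dhclient    DEBUG old_ip_address: 81.xxx.xx.x29
2023-02-23T06:24:26 Notice  dhclient    DEBUG entering with BOUND
"""

FIELD = re.compile(r"DEBUG (old_ip_address|new_ip_address):\s*(\S*)")
REASON = re.compile(r"DEBUG entering with (\w+)")


def same_address_rebind(log_text: str) -> bool:
    """True if the log shows BOUND with identical old and new addresses."""
    fields = {}
    reason = None
    for line in log_text.splitlines():
        match = FIELD.search(line)
        if match:
            fields[match.group(1)] = match.group(2)
        match = REASON.search(line)
        if match:
            reason = match.group(1)
    old = fields.get("old_ip_address")
    new = fields.get("new_ip_address")
    return reason == "BOUND" and bool(old) and old == new


print(same_address_rebind(LOG))  # True for the excerpt above
```

The helper treats the whole text as one event, so it is meant for a single excerpt like the one above rather than a full log file containing several lease events.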

fichtner commented 1 year ago

yes, the default route disappears when you add the existing address via ifconfig again and it won't even complain about it :/

ghost commented 1 year ago

OK, try two things: mark the default gateway from WAN as the upstream gateway, and check whether Suricata is in promiscuous mode.

My current gateway: https://prnt.sc/LXcWPiI4gt1- In the past I had far gateway and upstream enabled. The settings in the screenshot are the defaults after yesterday's reinstall, except that I unchecked "Disable Gateway Monitoring".

This morning I turned on the firewall again and the problem happened again… #anger

ornative commented 1 year ago

After upgrading today to 23.1.1_2-amd64, from machines on one of my VLANS, I can get DNS resolution but cannot connect to sites. Windows 11 tells me there is no internet connection, but I can ping the Comcast gateway and get DNS.

Wanted to add this. I will probably have to reinstall an older release at this point, as I can't be down for more than an hour before I start having automation issues. If there is a patch, that would be helpful, as I am currently writing this connected to a hotspot through AT&T.

rudiservo commented 1 year ago

@ornative for a workaround, see the response at https://github.com/opnsense/core/issues/6338#issuecomment-1440050582: just edit /usr/local/etc/rc.newwanip until there is a fix, or wait for a patch.

fichtner commented 1 year ago

Commits have been added for 23.1.2 and confirmed in the forum.