opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

no more traffic on uplink and strange way to recover #8098

Open bongoo1 opened 2 days ago

bongoo1 commented 2 days ago

Describe the bug

issue was seen for the 1st time after update to: OPNsense 24.7.9_1-amd64

former version that was fine: i'm not sure what revision i had before (must have been from end of october 2024). that version was running very stable and i had no issues.

the story: about 10 days ago, i updated to OPNsense 24.7.9_1-amd64 and the nightmare began (about 12h after the update). suddenly there was no more internet traffic. after first searching for provider issues, i finally found that rebooting OPNsense solved the issue. then after a few hours, the same again. the dashboard does not show anything special, besides that there is no traffic on the uplink. i did several tests: they all indicate an issue with the uplink of OPNsense.

besides that: other internet traffic in my network, which does not go through OPNsense, works fine

1st diagnosis: hardware issue. i added an additional network interface to OPNsense (over usb) and configured it for the uplink. after that, traffic was fine for about 5 days. so this really seemed to confirm that it was a hardware issue.

the nightmare started again after about 5 days: suddenly there is no more traffic on the uplink again. this showed up now 6 or 7 times within the last 24h.

when it happened again for the 1st time, i've seen that unboundDNS was down and i restarted it. after doing so, DHCPv4 server became red and i also restarted this, and everything was fine for about 2 hours.

but for the next 5 or 6 times when the uplink failed, the dashboard never showed anything special (besides no traffic on the uplink). i then tried to do some checks and diagnostics, only confirming that the uplink was down. while doing so, it happened each time that OPNsense suddenly worked again. so i 1st thought that it automatically recovers after some time. so i did not touch anything for more than 1 hour when this happened the next time, but no recovery then. i also tried to unplug/plug the ethernet cable of the uplink. but with no effect. still no flickering of the traffic led.

latest, quite special finding: when OPNsense fails and i go to "OPNsenseIP"/ui/interfaces/overview, i see that the uplink is down. then after about 10 seconds, i reload exactly the same page, and the uplink is up and everything is working fine again. the 2nd time i did this, the uplink had been down for more than 1 hour before, and just going to "OPNsenseIP"/ui/interfaces/overview and then reloading the page got the uplink up again. this way to recover is somewhat reproducible. i did this now 3 times and it always helped so far.

so this makes me no longer believe that this is a hardware issue. it really looks like something's wrong with the firewall software when handling the interface.

To Reproduce

Steps to reproduce the behavior: there is no way to actively reproduce the issue. it happens suddenly: it may happen once an hour, and then not show up for several days. once the issue is there, it can be resolved by logging in to OPNsense, going to "OPNsenseIP"/ui/interfaces/ and then reloading the page (this helped 3 times so far).

Expected behavior

the issue should not happen at all

Describe alternatives you considered

replacing the uplink NIC seemed to help 1st, but after 5 days, the issue is back.

Screenshots

[screenshots]

the 1st screenshot shows the interface once the issue appears, and the 2nd shot is taken a few seconds later. nothing else has been done in between. just a reload.

Relevant log files

if any further infos are required, please let me know how i can get them from OPNsense.
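
One possible way to collect such data, offered as a minimal sketch and not as an official OPNsense procedure: the FreeBSD kernel logs a message of the form `re0: link state changed to DOWN` when an interface loses or regains link, so filtering the kernel message buffer for those lines shows whether the outages coincide with real link flaps. The helper names `link_events` and `read_kernel_messages` are hypothetical, and the sketch assumes Python 3 and the `dmesg` command are available in a shell on the firewall.

```python
import re
import subprocess

# Kernel link messages look like "re0: link state changed to DOWN".
# The interface pattern (letters followed by digits) is an assumption that
# covers names such as re0, ue0 or igb0, but not every possible name.
LINK_EVENT = re.compile(r"\b([a-z]+\d+): link state changed to (UP|DOWN)\b")


def link_events(lines):
    """Return (interface, state, line) for every link-state message found."""
    events = []
    for line in lines:
        match = LINK_EVENT.search(line)
        if match:
            events.append((match.group(1), match.group(2), line.rstrip("\n")))
    return events


def read_kernel_messages():
    """Return the kernel message buffer as a list of lines (runs `dmesg`)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Calling `link_events(read_kernel_messages())` right after an outage would list any UP/DOWN transitions for the uplink interface. An empty result while traffic is dead would point away from a physical link problem.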

Additional context

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 24.7.9_1-amd64
ASRock J3455M mainboard
i can't find the infos on the NIC within OPNsense

fichtner commented 2 days ago

besides that: other internet traffic in my network, which does not go through OPNsense, works fine

This seems weird to me. It raises all sorts of design worries and issues it could be having.

What’s the actual issue? Link down? Which driver, which hardware? DHCP on WAN (if even)? IPv4 or IPv6 or both? DNS? ARP issue? Etc…

fichtner commented 2 days ago

And re0 and friends would indicate it could be a hardware or FreeBSD driver issue. Try the os-realtek-re plugin and/or try moving WAN to an interface that is not “re” if available on the hardware.

bongoo1 commented 2 days ago

i'm no networking specialist (but i think i know the basics) and i'm not very experienced with linux. so i unfortunately need a step by step instruction on what i need to do.

the actual issue is, that the uplink frequently goes down, and the measure to get it up again is to go to "OPNsenseIP"/ui/interfaces/overview twice. then it goes up again. this started with the latest update of OPNsense.

how can i request driver and hardware data from within OPNsense?

on the WAN, i use a fixed IP and i only use IPv4 on all interfaces. as far as i understand, i use unboundDNS with DNS forwarding as configured in system/settings/general

the WAN uplink was running on re0 for a few years now, but as the issues popped up, i added ue0, which i then configured for WAN uplink. so the uplink is actually not on a "re".

how shall i proceed? how can i grab more data that allows to find the reason?
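
A sketch of one way to gather this data, under stated assumptions: on FreeBSD the interface name usually starts with the driver name (`re0` belongs to the `re` driver), `ifconfig` prints a `media:` and a `status:` line for each Ethernet interface, and `pciconf -lv` and `usbconfig` list the PCI and USB devices. The Python below only parses `ifconfig` output. The function names `parse_ifconfig` and `snapshot` are hypothetical and not part of OPNsense.

```python
import re
import subprocess


def parse_ifconfig(text):
    """Map interface name -> {"status": ..., "media": ...} from ifconfig output."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        # A new interface block starts unindented, e.g. "re0: flags=8843<...>".
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)
            interfaces[current] = {}
            continue
        if current is None:
            continue
        # Indented detail lines; only status and media are kept here.
        field = re.match(r"^\s+(status|media): (.*)$", line)
        if field:
            interfaces[current][field.group(1)] = field.group(2).strip()
    return interfaces


def snapshot():
    """Run `ifconfig` and return the parsed per-interface state."""
    result = subprocess.run(["ifconfig"], capture_output=True, text=True, check=True)
    return parse_ifconfig(result.stdout)
```

Comparing `snapshot()` taken while the uplink is dead with one taken after recovery would show whether the interface reports `no carrier` or stays `active` during the outage, which separates a link problem from a routing, ARP or DNS problem.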

fichtner commented 2 days ago

ue0 is much less reliable as the USB will detach for whatever reasons at any point in time internally. The recovery doesn’t work.

Cheers, Franco

bongoo1 commented 2 days ago

that much is clear. therefore i used the onboard ethernet for the WAN uplink and all pcie slots are used for the 3 LAN. after having issues with the uplink, i added the USB adapter to have an additional port, which i then used to replace the onboard ethernet. with this solution, the issues were gone for about 5 days, but now i have the same issues i had before when using re0. so what do you suggest?

fichtner commented 2 days ago

I did mention os-realtek-re plugin.

bongoo1 commented 2 days ago

i'm not aware that i ever installed something like that manually. it could be that i did something a few years ago, but the plugin page says: [screenshot]

fichtner commented 2 days ago

Ok then perhaps with the stock FreeBSD driver it's different (given it's a 1G, not 2.5G or 5G Realtek in which case you lose connectivity).

Suffice to say your hardware is suboptimal in FreeBSD. You would have more luck with OpenWRT or something else based on Linux doing the job for you.

Cheers, Franco

bongoo1 commented 2 days ago

i assume that all my realteks are 1G. so you think i should uninstall the realtek plugin?

why is my hardware suboptimal for FreeBSD? isn't OpenWRT more optimized for commercial routers, while my hardware is common PC hardware. i really like OPNsense (it's much better than the windows based solution i had before), and i would like to keep it. so what should i change on my hardware to no longer be suboptimal?

Monviech commented 2 days ago

You could either install a hypervisor on it e.g. a linux or windows based one that has better driver support.

Or you get an intel network card, or swap your hardware completely (there are a lot of threads on the forum that discuss hardware).

bongoo1 commented 1 day ago

if i replaced the network cards with intel based ones, would any of these chips be recommended? Intel i226, i82576, i350t2v2blk