opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

OPNsense 22.1.7 unavailable occasionally #5782

Closed: thwien closed this issue 2 years ago

thwien commented 2 years ago

Describe the bug

Since the automatic upgrade to 22.1.7_1, with the same plugins and configuration, my OPNsense firewall occasionally becomes unavailable on both the WAN and LAN interfaces. Up to and including 22.1.6 the firewall ran for months without any interruption. "Occasionally" means that after anywhere from 30 minutes to 5 hours of uptime the firewall can no longer route to its IPv4 and IPv6 default gateways. While it is unavailable, the routing, default routes, interface states, firewall rules, IP addresses and so on all seem to be correct. I can dump incoming network traffic (TCP, ICMP and OSPF multicast) on the WAN and LAN interfaces, but OPNsense cannot reply. If I ping external IP addresses I get "sendto: no route to host".

To Reproduce

n/a

Expected behavior

No interruption of availability. Routing over IPv4 and IPv6 default routes.

Describe alternatives you considered

Downgrade to 22.1.6

Screenshots

n/a

Relevant log files

Unfortunately I spent many hours looking through log files but could not find any hint as to the reason for this behavior. I have no clue what the cause could be, nor whether it is a core or a plugin issue.

Additional context

LAN is configured as a VLAN interface on the WAN interface. The firewall acts as Nginx reverse proxy, Suricata IPS, Maltrail, GeoIP blocking, ntopng, Rspamd, OSPF routing, Zabbix proxy and Tinc VPN.

Following plugins are installed and used:

os-acme-client os-clamav os-frr os-maltrail os-nginx os-ntopng os-redis os-rspamd os-smart os-tinc os-zabbix-agent os-zabbix5-proxy

Environment

OPNsense 22.1.7_1 (amd64/OpenSSL) on ZFS mirror as a single firewall
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (4 cores, 8 threads)
Onboard LAN on MSI MS-7816

Thanks a lot for any suggestions.

OPNsense-bot commented 2 years ago

Thank you for creating an issue. Since the ticket doesn't seem to be using one of our templates, we're marking this issue as low priority until further notice.

For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

The easiest option to gain traction is to close this ticket and open a new one using one of our templates.

thwien commented 2 years ago

Because os-frr was updated in OPNsense 22.1.7 and this plugin is routing-related, I tried downgrading os-frr using opnsense-revert, but without success.

mimugmail commented 2 years ago

Did you try to revert only frr7?

thwien commented 2 years ago

No, I reverted only os-frr because it was updated for OPNsense 22.1.7. frr7-7.5.1_3 was updated in March this year and hasn't caused any trouble for months. Meanwhile I have reverted all recently updated plugins (os-zabbix-agent, os-zabbix5-proxy, os-nginx, os-frr, opnsense) step by step, but the problem still exists.

mimugmail commented 2 years ago

No, this is the update of the package itself.

opnsense-revert -r 22.1.6 frr7

thwien commented 2 years ago

Okay, I reverted only frr7. But it seems to stay at the same version, doesn't it? What about the error message while reverting frr7? Should I revert os-frr, too?

# pkg search frr
frr7-7.5.1_3                   IP routing protocol suite including BGP, IS-IS, OSPF and RIP
os-frr-1.28                    The FRRouting Protocol Suite
os-frr-devel-1.28              The FRRouting Protocol Suite
# opnsense-revert -r 22.1.6 frr7
Fetching frr7.txz: .... done
Verifying signature with trusted certificate pkg.opnsense.org.20210903... done
frr7-7.5.1_3: already unlocked
Updating OPNsense repository catalogue...
OPNsense repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The following 1 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
    frr7: 7.5.1_3

Number of packages to be installed: 1

The process will require 12 MiB more space.
[1/1] Installing frr7-7.5.1_3...
===> Creating groups.
Using existing group 'frr'.
Using existing group 'frrvty'.
===> Creating users
Using existing user 'frr'.
Extracting frr7-7.5.1_3: 100%
=====
Message from frr7-7.5.1_3:

--
Beware that remote control of frr7 daemons over TCP sockets is enabled by
default.
Use daemon flags in /etc/rc.conf to disable it if unneeded, for example:
zebra_flags="-P0"
ospfd_flags="-P0"

FRR's OSPF daemons tries to allocate big socket buffer, so generate warning
messages like:
"setsockopt_so_sendbuf: fd 6: SO_SNDBUF set to 1048576 (requested 8388608)"
To prevent such message kern.ipc.maxsockbuf can be increased:
sysctl kern.ipc.maxsockbuf=16777216

Error message "ifam_read() doesn't read all socket data" is under investigation
# pkg search frr
frr7-7.5.1_3                   IP routing protocol suite including BGP, IS-IS, OSPF and RIP
os-frr-1.28                    The FRRouting Protocol Suite
os-frr-devel-1.28              The FRRouting Protocol Suite

thwien commented 2 years ago

I watched the system continuously today and, for the first time, noticed "re0: watchdog timeout" on the CLI. I guess it could be related to the network hardware or driver. I installed os-realtek-re and put all packages back to the latest version. I will let you know if my system is now stable.

thwien commented 2 years ago

It seems installing os-realtek-re was the right solution. Since then my firewall has been running stably. Sorry for wasting your time. The problem is not related to OPNsense core or plugins but to FreeBSD, which unfortunately ships a buggy driver for Realtek cards?!?
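
For readers hitting the same symptom, the check that exposed the cause in this thread (the "re0: watchdog timeout" message) can be sketched as a small shell helper. This is only a sketch under assumptions: the function name count_watchdog_timeouts is hypothetical, and it assumes the driver logs lines that begin with the interface name followed by ": watchdog timeout", as quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OPNsense): count "watchdog timeout"
# messages for one interface in log text read from standard input.
count_watchdog_timeouts() {
    iface="$1"
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the function safe under "set -e".
    grep -c "^${iface}: watchdog timeout" || true
}

# Demonstration with sample log lines (assumed format, taken from the thread).
printf 're0: watchdog timeout\nre0: link state changed to UP\nre0: watchdog timeout\n' \
    | count_watchdog_timeouts re0   # prints 2
```

On a live system the function would be fed kernel messages instead, for example `dmesg | count_watchdog_timeouts re0`. A non-zero count on a Realtek interface suggests the network driver, rather than routing or a plugin, may be at fault; in this thread, installing os-realtek-re resolved it.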