opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

IPS: Suricata Complete network "freeze" #4257

Closed rudiservo closed 3 years ago

rudiservo commented 4 years ago

I had Suricata working on the previous version without a problem, but on 20.7 it suddenly crashes and bogs down the internet connection and all access to it. I have to restart both the dedicated OPNsense box and the ISP router (which is in bridge mode).

Suricata is bound to WAN only.

I had to turn off Suricata and it's stable for now, but I would rather have it on... is this in any way related to netmap?

WAN interface (wan, igb0) is Intel Gigabit. The rest of the interfaces are VLANs on a 10GBase Intel (ix0).

OPNsense 20.7-amd64 FreeBSD 12.1-RELEASE-p7-HBSD OpenSSL 1.1.1g 21 Apr 2020

AdSchellevis commented 4 years ago

did you turn off hardware vlan offloading? (https://docs.opnsense.org/troubleshooting/network.html)

rudiservo commented 4 years ago

@AdSchellevis VLAN Hardware Filtering, actually no, it was the only thing enabled... but WAN does not have a VLAN, so does that matter, and should it have an effect? I'm going to try with it disabled, but if you know why it affects Suricata and/or netmap please explain, I am a curious being!

AdSchellevis commented 4 years ago

yes, netmap doesn't work when it's enabled on the interface using it. in the previous version it didn't matter, which might indicate that the hardware offloading just didn't work in the driver.
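To see which offloads are still active on the interface netmap is attached to, the `options=` line of `ifconfig` output can be inspected. A minimal sketch in Python, assuming the capability names printed by FreeBSD `ifconfig`; the set of flags checked and the sample output (including its hex value) are illustrative assumptions, not taken from this thread:

```python
import re

# Capability names as printed by FreeBSD ifconfig(8). Which of them
# actually interfere with netmap on a given driver is an assumption here.
OFFLOAD_FLAGS = {
    "RXCSUM", "TXCSUM", "TSO4", "TSO6", "LRO",
    "VLAN_HWFILTER", "VLAN_HWTAGGING", "VLAN_HWTSO",
}

def enabled_offloads(ifconfig_output: str) -> set[str]:
    """Return the offload capabilities listed on the 'options=' line."""
    match = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_output)
    if match is None:
        return set()
    return OFFLOAD_FLAGS & set(match.group(1).split(","))

# Made-up sample output for illustration only.
sample = (
    "igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\toptions=4e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWFILTER>\n"
)
print(sorted(enabled_offloads(sample)))
```

An empty result would suggest that none of the listed offloads is active on that interface; anything else is a candidate to disable before enabling IPS mode.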

rudiservo commented 4 years ago

Right... so it might bog down the interfaces and make it crash, completely unreachable?

AdSchellevis commented 4 years ago

it will start working the moment you stop suricata... it's not dead, just capturing packets and sending them to nowhere...

rudiservo commented 4 years ago

Ohh... that explains a lot... thanks

tomatotoast commented 4 years ago

I experience similar problems with Suricata. igb NIC (Intel i340-T4) with LACP parent and VLANs connected. The traffic from and to the VLANs is very slow. Interface filtering is disabled. With Suricata running, the system was rendered unusable.

From what I have seen there is nothing in the logs.

EDIT: Suricata was running on the VLAN interfaces on the LAGG. This setup was running with 20.1. Switched to physical interfaces now. Traffic seems to pass again. I will continue some testing.

OliverO2 commented 4 years ago

@AdSchellevis As people still run into this issue, I had been asked by a forum member a while ago to propose a usability improvement: https://forum.opnsense.org/index.php?topic=13600.msg62833#msg62833

Is this something you'd like to pursue? Anything I could do to help here?

AdSchellevis commented 4 years ago

@OliverO2 we changed the defaults and extended the docs https://docs.opnsense.org/troubleshooting/network.html, I'm not sure which improvement you're aiming at, but since the underlying issue is hardware / driver related there is probably not a lot we can do other than explain how options are supposed to work (provided your selected hardware supports the feature).

OliverO2 commented 4 years ago

The updated troubleshooting section is helpful. Yet it seems that people do not look there before changing settings, as the forum thread and this issue show. The intrusion detection's help text for "IPS mode" still has this unclear wording:

Before enabling, please disable all hardware offloading first in advanced network.

This language misses the "VLAN hardware filtering" setting and refers to a section "advanced network" which is not there.

So my proposal in the above forum post was to change the "IPS mode" help text and the affected "hardware offloading" setting labels to

I agree that explanation is all that can be done.

AdSchellevis commented 4 years ago

@OliverO2 sure, feel free to open a PR for the help text, there's certainly room for improvement there. (options have shifted over time, in the past these were in the advanced networking section)

rudiservo commented 3 years ago

Sorry to bump this again, but I just spent over 4 hours trying to get Suricata to work.

So after a number of starts and restarts of Suricata, both with promiscuous mode on and off and IPS on and off, somehow the OPNsense test rule actually worked... ONCE, and then I lost all connectivity when trying to add abuse.ch rules... and I can't get it working again. I have a dedicated WAN on igb (Intel dual-NIC Gbit PCI Express), 10 VLANs for the inside on a 10Gbps Intel X520, a Realtek motherboard NIC not in use, plus ZeroTier, OpenVPN, WireGuard and nginx. ALL hardware offloading, including VLAN, is disabled as instructed.

Log from when connectivity was lost:

```
time Process line
2021-02-14T19:01:09 opnsense[21213] /usr/local/etc/rc.newwanip: Aborted IPv4 detection: no address for igb0
2021-02-14T19:01:09 opnsense[21213] plugins_configure newwanip (execute task : dyndns_configure_do(,wan))
2021-02-14T19:01:09 kernel pflog0: promiscuous mode enabled
2021-02-14T19:01:09 kernel pflog0: promiscuous mode disabled
2021-02-14T19:01:09 opnsense[21213] plugins_configure newwanip (,wan)
2021-02-14T19:01:08 opnsense[29602] plugins_configure dns (execute task : unbound_configure_do())
2021-02-14T19:01:08 opnsense[29602] plugins_configure dns (execute task : dnsmasq_configure_do())
2021-02-14T19:01:08 opnsense[29602] plugins_configure dns ()
2021-02-14T19:01:08 opnsense[65082] /usr/local/etc/rc.newwanip: Interface '' is disabled or empty, nothing to do.
2021-02-14T19:01:08 opnsense[65082] /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'ovpns3'
2021-02-14T19:01:08 opnsense[29602] plugins_configure dhcp (execute task : dhcpd_dhcp_configure())
2021-02-14T19:01:08 opnsense[29602] plugins_configure dhcp ()
2021-02-14T19:01:08 opnsense[21213] /usr/local/etc/rc.newwanip: OpenVPN server 3 instance started on PID 40978.
2021-02-14T19:01:08 kernel ovpns3: link state changed to UP
2021-02-14T19:01:08 kernel pflog0: promiscuous mode enabled
2021-02-14T19:01:08 kernel pflog0: promiscuous mode disabled
2021-02-14T19:01:07 kernel ovpns3: link state changed to DOWN
2021-02-14T19:01:07 kernel pflog0: promiscuous mode enabled
2021-02-14T19:01:07 kernel pflog0: promiscuous mode disabled
2021-02-14T19:01:06 opnsense[27396] /usr/local/etc/rc.newwanip: Interface 'opt8' is disabled or empty, nothing to do.
2021-02-14T19:01:06 opnsense[27396] /usr/local/etc/rc.newwanip: IPv4 renewal is starting on 'ovpns2'
2021-02-14T19:01:06 kernel ovpns2: link state changed to UP
2021-02-14T19:01:06 opnsense[21213] /usr/local/etc/rc.newwanip: OpenVPN server 2 instance started on PID 61641.
2021-02-14T19:01:06 kernel pflog0: promiscuous mode enabled
2021-02-14T19:01:06 kernel pflog0: promiscuous mode disabled
2021-02-14T19:01:06 opnsense[29602] plugins_configure ipsec (execute task : ipsec_configure_do(,wan))
2021-02-14T19:01:06 opnsense[29602] plugins_configure ipsec (,wan)
2021-02-14T19:01:06 opnsense[29602] /usr/local/etc/rc.linkup: ROUTING: skipping IPv4 default route
2021-02-14T19:01:06 opnsense[29602] /usr/local/etc/rc.linkup: ROUTING: IPv4 default gateway set to opt7
2021-02-14T19:01:06 opnsense[29602] /usr/local/etc/rc.linkup: ROUTING: entering configure using 'wan'
2021-02-14T19:01:06 opnsense[29602] /usr/local/etc/rc.linkup: The command '/sbin/dhclient -c '/var/etc/dhclient_wan.conf' -p '/var/run/dhclient.igb0.pid' 'igb0'' returned exit code '1', the output was 'dhclient already running, pid: 44581. exiting.'
2021-02-14T19:01:06 dhclient[62951] exiting.
2021-02-14T19:01:06 dhclient[62951] dhclient already running, pid: 44581.
2021-02-14T19:01:06 opnsense[29602] /usr/local/etc/rc.linkup: HOTPLUG: Configuring interface wan
2021-02-14T19:01:06 opnsense[29602] /usr/local/etc/rc.linkup: DEVD Ethernet attached event for wan
2021-02-14T19:01:06 kernel pflog0: promiscuous mode enabled
2021-02-14T19:01:06 kernel pflog0: promiscuous mode disabled
2021-02-14T19:01:05 kernel ovpns2: link state changed to DOWN
2021-02-14T19:01:05 opnsense[35587]
2021-02-14T19:01:05 opnsense[35587] /usr/local/etc/rc.filter_configure: There were error(s) loading the rules: /tmp/rules.debug:320: no routing address with matching address family found. - The line in question reads [320]: pass out log route-to ( igb0 ) from {igb0} to {!(igb0:network)} keep state allow-opts label "cd93fefa18691a23a58dfb8426bd1580" # let out anything from firewall host itself (force gw)
2021-02-14T19:01:04 opnsense[94418] /usr/local/etc/rc.linkup: Clearing states for stale wan route on igb0
2021-02-14T19:01:04 opnsense[94418] /usr/local/etc/rc.linkup: DEVD Ethernet detached event for wan
2021-02-14T19:01:04 kernel pflog0: promiscuous mode enabled
2021-02-14T19:01:04 kernel pflog0: promiscuous mode disabled
```
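The log is newest-first, and the part that matters is the WAN link detaching at 19:01:04 and re-attaching at 19:01:06. A rough Python sketch for pulling such link flaps out of a larger log, assuming the line layout shown above (timestamp, process, message); the function name `link_events` is made up for illustration:

```python
import re
from datetime import datetime

# Matches the rc.linkup attach/detach lines as they appear in the log above.
EVENT_RE = re.compile(
    r"^(\S+)\s+opnsense\[\d+\]\s+/usr/local/etc/rc\.linkup: "
    r"DEVD Ethernet (attached|detached) event for (\S+)"
)

def link_events(log_text):
    """Return (timestamp, interface, event) tuples in chronological order."""
    events = []
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
            events.append((ts, m.group(3), m.group(2)))
    return sorted(events)

# Two lines copied from the log above.
sample = """\
2021-02-14T19:01:06 opnsense[29602] /usr/local/etc/rc.linkup: DEVD Ethernet attached event for wan
2021-02-14T19:01:04 opnsense[94418] /usr/local/etc/rc.linkup: DEVD Ethernet detached event for wan
"""
for ts, iface, kind in link_events(sample):
    print(ts.isoformat(), iface, kind)
```

A detach followed shortly by an attach on the same interface points at the link itself bouncing rather than at a rule or DHCP problem.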

mimugmail commented 3 years ago

Is WAN on a VLAN or dedicated NIC? Are you on 21.1.1?

rudiservo commented 3 years ago

@mimugmail dedicated, on an Intel gigabit NIC, no VLAN. All hardware offloading has to be disabled, including for VLANs, because of netmap; I already did that and rebooted. Still, Suricata does not seem to work, or it's hit and miss: sometimes it does, sometimes it doesn't, with the same config... logs don't really show much, Suricata just says it started with x workers, rulesets loaded... and the OPNsense logs are the same as what I posted, except for the WAN losing its IP.

It is a bit confusing and frustrating when the exact same configuration works once and then does not work again.

OPNsense-bot commented 3 years ago

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.

hipitihop commented 1 year ago

I've been hunting a related issue for some time. Complete lockout, network loss; it often occurs at night/early morning, but not always. It is sporadic and can go for a week or so without a lockup. When the box locks up, it can be hours before it is discovered (I'm still asleep), so it may not be dead, but there is no access and it never returns. The only workaround is to power cycle the dedicated box. This started some versions ago and worked fine prior. Currently on:

OPNsense 23.7.2-amd64, FreeBSD 13.2-RELEASE-p2, OpenSSL 1.1.1v 1 Aug 2023
Hardware: Protectli FW6B, i3-7100U CPU @ 2.40GHz (2 cores, 4 threads), RAM 16GB

I thought it was related to a cron job which ran each night to update rules. Disabling the rules update cron job (and updating manually each week) improves the situation: lockups are less frequent, but they still occur.

MCZocker32 commented 1 year ago

@hipitihop I have exactly the same issue. It worked for months (since I got my OPNsense); I would guess that somewhere in August it started with what seemed to be random crashes. Now almost every other day or every day, around 20:45-23:00, it crashes: the hardware device turns off completely and cannot be restarted unless the power is completely cut off.

AdSchellevis commented 1 year ago

try changing the tunable dev.netmap.admode to 2 (https://man.freebsd.org/cgi/man.cgi?query=netmap&sektion=4), some hardware/drivers have issues with netmap, software mode in these cases seems to be more stable.
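For reference, netmap(4) describes three values for `dev.netmap.admode`: 0 uses the best available option, 1 forces native mode, 2 forces emulated mode. A small Python sketch that reads and interprets the current value; the helper names are hypothetical, and `read_admode` only works on a FreeBSD system where the netmap sysctl exists:

```python
import subprocess

# Meanings paraphrased from the netmap(4) manual page.
ADMODE_MEANING = {
    0: "best available (native if the driver supports it, else emulated)",
    1: "native only (fails if the driver lacks native netmap support)",
    2: "emulated only (generic software path)",
}

def describe_admode(value: str) -> str:
    """Translate the raw sysctl value into a human-readable description."""
    mode = int(value.strip())
    return ADMODE_MEANING.get(mode, f"unknown value {mode}")

def read_admode() -> str:
    """Query the running kernel via sysctl(8); not called in this example."""
    result = subprocess.run(
        ["sysctl", "-n", "dev.netmap.admode"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Interpret the value suggested above without touching the system.
print(describe_admode("2\n"))
```

On the firewall itself, `describe_admode(read_admode())` would report the mode currently in effect.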

hipitihop commented 1 year ago

> try changing the tunable dev.netmap.admode to 2 (https://man.freebsd.org/cgi/man.cgi?query=netmap&sektion=4), some hardware/drivers have issues with netmap, software mode in these cases seems to be more stable.

@AdSchellevis I appreciate the response. I tried to set this tunable by adding it to /etc/sysctl.conf; however, it does not stick across reboots. Any tips?

AdSchellevis commented 1 year ago

System->Settings->Tunables (https://docs.opnsense.org/manual/settingsmenu.html#tunables)

hipitihop commented 9 months ago

@AdSchellevis Many thanks. After changing the tunables and running for a couple of months, I no longer have any lockups.

rudiservo commented 9 months ago

@AdSchellevis I take it that this is mostly a Realtek issue and the recommendation would be netmap emulated mode instead of native?

If so, should this be in the manual?

Thanks!

AdSchellevis commented 9 months ago

@rudiservo a note in the docs wouldn't hurt, mileage may vary with different drivers. just open a pull-request for discussion