ntop / ntopng

Web-based Traffic and Security Network Traffic Monitoring
http://www.ntop.org
GNU General Public License v3.0

ntopng silently exits after consuming all memory #8132

Closed sim4dim closed 10 months ago

sim4dim commented 11 months ago

Environment:

What happened: ntopng exits after consuming all memory (16G) and swap (8G)

How did you reproduce it? Happens every time after 4-6 hours

Debug Information:

19/Dec/2023 20:59:29 [main.cpp:442] Logging onto /var/db/ntopng-enterprise/ntopng.log
19/Dec/2023 20:59:29 [main.cpp:445] Working directory: /var/db/ntopng-enterprise
19/Dec/2023 20:59:29 [main.cpp:447] Scripts/HTML pages directory: /usr/local/share/ntopng
19/Dec/2023 20:59:29 [Ntop.cpp:536] Welcome to ntopng amd64 v.6.1.231219 (dev:52e7badf92487baf39c9cb8f601ebea93e50950d:20231219)
19/Dec/2023 20:59:29 [Ntop.cpp:545] Built on FreeBSD 13.1
19/Dec/2023 20:59:29 [Ntop.cpp:547] (C) 1998-23 ntop
19/Dec/2023 20:59:29 [Ntop.cpp:1024] Adding 192.168.1.1/32 as IPv4 interface address for bge3
19/Dec/2023 20:59:29 [Ntop.cpp:1035] Adding 192.168.1.0/24 as IPv4 local network for bge3
19/Dec/2023 20:59:29 [Ntop.cpp:1024] Adding {removed} as IPv4 interface address for igc0
19/Dec/2023 20:59:29 [Ntop.cpp:1035] Adding {removed} as IPv4 local network for igc0
19/Dec/2023 20:59:29 [Ntop.cpp:1024] Adding 192.168.100.100/32 as IPv4 interface address for igc0
19/Dec/2023 20:59:29 [Ntop.cpp:1035] Adding 192.168.100.0/23 as IPv4 local network for igc0
19/Dec/2023 20:59:29 [PeriodicActivities.cpp:108] Started periodic activities loop...
19/Dec/2023 20:59:30 [startup.lua:38] Processing startup.lua: please hold on...
19/Dec/2023 20:59:30 [startup.lua:121] [lists_utils.lua:835] Refreshing category lists...
19/Dec/2023 20:59:30 [startup.lua:121] [lists_utils.lua:467] Failure loading host 'ip' category '100' in list 'Stratosphere Lab'
19/Dec/2023 20:59:31 [startup.lua:121] [lists_utils.lua:467] Failure loading host '56565' category '100' in list 'ThreatFox'
19/Dec/2023 20:59:31 [startup.lua:121] [lists_utils.lua:467] Failure loading host 'noluyoruzawk' category '100' in list 'ThreatFox'
19/Dec/2023 20:59:31 [startup.lua:121] [lists_utils.lua:467] Failure loading host '4040' category '100' in list 'ThreatFox'
19/Dec/2023 20:59:31 [startup.lua:121] [lists_utils.lua:467] Failure loading host 'datacikerim' category '100' in list 'ThreatFox'
19/Dec/2023 20:59:31 [startup.lua:121] [lists_utils.lua:467] Failure loading host 'nicehash' category '100' in list 'ThreatFox'
19/Dec/2023 20:59:31 [startup.lua:121] [lists_utils.lua:467] Failure loading host 'mpapwpodllalw' category '100' in list 'ThreatFox'
19/Dec/2023 20:59:31 [startup.lua:121] [lists_utils.lua:467] Failure loading host 'makelovenotmalware.local' category '100' in list 'ThreatFox'
19/Dec/2023 20:59:31 [startup.lua:121] [lists_utils.lua:467] Failure loading host 'lwwfechxdr8aiq0bbhtrxry7i1c8itnz' category '100' in list 'ThreatFox'
19/Dec/2023 20:59:31 [startup.lua:121] [lists_utils.lua:467] Failure loading host 'ddkkba0zqra9dtqunixbqaa8olgtkc5j' category '100' in list 'ThreatFox'
19/Dec/2023 20:59:31 [startup.lua:121] [lists_utils.lua:756] Category Lists (31573 hosts, 11223 IPs, 0 JA3) loaded in 1 sec
19/Dec/2023 20:59:31 [startup.lua:125] Initializing device polices...
19/Dec/2023 20:59:31 [startup.lua:141] Initializing alerts...
19/Dec/2023 20:59:31 [startup.lua:150] Initializing timeseries...
19/Dec/2023 20:59:31 [startup.lua:217] [blog_utils.lua:125] Fetching latest ntop blog posts...
19/Dec/2023 20:59:31 [startup.lua:245] Completed startup.lua
19/Dec/2023 20:59:31 [PeriodicActivities.cpp:167] Found 10 activities
19/Dec/2023 20:59:31 [NetworkInterface.cpp:3717] Started packet polling on interface WAN (opt1) [id: 1]...
19/Dec/2023 20:59:31 [NetworkInterface.cpp:3717] Started packet polling on interface LAN (lan) [id: 4]...
19/Dec/2023 21:00:34 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 21:00:34 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 21:15:35 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 21:15:35 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 21:30:05 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 21:31:25 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 21:45:39 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 21:45:39 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 22:00:06 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 22:01:33 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 22:15:41 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 22:15:41 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 22:30:07 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 22:31:22 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 23:00:20 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 23:00:20 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 23:15:30 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 23:15:30 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 23:30:20 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
19/Dec/2023 23:31:38 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
20/Dec/2023 00:00:05 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
20/Dec/2023 00:01:43 [local_network_checks.lua:36] [recipients.lua:955] ERROR: Failure encoding notification
20/Dec/2023 00:02:14 [LuaEngine.cpp:326] [bge3] ERROR: Cannot complete top talkers generation in 1 minute. Is there a huge number of hosts in the system?
20/Dec/2023 00:02:20 [LuaEngine.cpp:326] [igc0] ERROR: Cannot complete top talkers generation in 1 minute. Is there a huge number of hosts in the system?
20/Dec/2023 00:02:51 [housekeeping.lua:37] [lists_utils.lua:418] Updating list 'Abuse.ch URLhaus' [https://urlhaus.abuse.ch/downloads/hostfile/]... OK
20/Dec/2023 00:02:54 [housekeeping.lua:37] [lists_utils.lua:418] Updating list 'Emerging Threats' [https://rules.emergingthreats.net/fwrules/emerging-Block-IPs.txt]... OK
20/Dec/2023 00:02:57 [housekeeping.lua:37] [lists_utils.lua:418] Updating list 'NoCoin Filter List' [https://raw.githubusercontent.com/hoshsadiq/adblock-nocoin-list/master/hosts.txt]... OK
20/Dec/2023 00:02:57 [housekeeping.lua:37] [lists_utils.lua:418] Updating list 'SSLBL Botnet C2 IP Blacklist' [https://sslbl.abuse.ch/blacklist/sslipblacklist.txt]... OK
20/Dec/2023 00:03:02 [housekeeping.lua:37] [lists_utils.lua:418] Updating list 'Stratosphere Lab' [https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/Todays-Blacklists/AIP_historical_blacklist_prioritized_by_newest_attackers.csv]... OK
20/Dec/2023 00:03:02 [housekeeping.lua:37] [lists_utils.lua:418] Updating list 'ThreatFox' [https://threatfox.abuse.ch/downloads/hostfile/]... OK
20/Dec/2023 00:03:02 [housekeeping.lua:37] [lists_utils.lua:418] Updating list 'dshield 7 days' [https://raw.githubusercontent.com/firehol/blocklist-ipsets/master/dshield_7d.netset]... OK

And this is besides the typo in:

22/Dec/2023 14:48:46 [Ntop.cpp:729] Houkeeping activities (main loop) took 9.454s. <---------

sim4dim commented 11 months ago

Upon further analysis, the issue seems to be correlated with Network Discovery. Running in the foreground with -v 3 shows ntopng getting stuck on an ARP entry from the device itself:

22/Dec/2023 16:30:13 [NetworkDiscovery.cpp:273] Received ARP reply from 192.168.100.1
22/Dec/2023 16:30:14 [NetworkDiscovery.cpp:273] Received ARP reply from 192.168.100.1

It repeats this until memory is exhausted ...

If network discovery is disabled, the memory exhaustion is not observed.
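
To check whether a single host dominates the discovery output, the verbose log can be tallied per replying IP. The following is only a rough sketch: it assumes the line format shown in the trace above, and the function names and the threshold of 100 are hypothetical choices for illustration, not part of ntopng.

```python
import re
from collections import Counter

# Matches verbose log lines such as:
# 22/Dec/2023 16:30:13 [NetworkDiscovery.cpp:273] Received ARP reply from 192.168.100.1
# (format assumed from the trace quoted in this issue)
ARP_REPLY = re.compile(
    r"^\d{2}/\w{3}/\d{4} \d{2}:\d{2}:\d{2} "
    r"\[NetworkDiscovery\.cpp:\d+\] Received ARP reply from (\S+)"
)


def count_arp_replies(lines):
    """Count 'Received ARP reply from' log lines per replying IP."""
    counts = Counter()
    for line in lines:
        match = ARP_REPLY.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def suspicious_hosts(counts, threshold=100):
    """Return IPs seen at least `threshold` times (hypothetical cutoff)."""
    return [ip for ip, n in counts.most_common() if n >= threshold]


# Small sample taken from the lines quoted in this thread.
sample = [
    "22/Dec/2023 16:30:13 [NetworkDiscovery.cpp:273] Received ARP reply from 192.168.100.1",
    "22/Dec/2023 16:30:14 [NetworkDiscovery.cpp:273] Received ARP reply from 192.168.100.1",
    "19/Dec/2023 20:59:31 [startup.lua:245] Completed startup.lua",
]
counts = count_arp_replies(sample)
print(counts.most_common())
```

On a real capture, the lines would come from the foreground output or the ntopng log file instead of the `sample` list; a single IP with a count far above all others would match the behaviour described here.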

cardigliano commented 11 months ago

@sim4dim I pushed a fix for the notification "encoding" error in your first trace. As for the last trace about "Received ARP reply from": do you see the same message for the same IP, and only that one? Does it print it indefinitely? Who is 192.168.100.1? You said it is the device itself, please clarify. Thank you.

sim4dim commented 11 months ago

I am seeing this message, "Received ARP reply from 192.168.100.1", repeating indefinitely. I misspoke: 192.168.100.1 is the cable modem attached to the WAN interface, not the device itself. I have a Virtual IP (192.168.100.100) attached to the WAN interface as well.

cardigliano commented 11 months ago

In order to figure out what's going on, I added a debug mode to collect some info. Please follow the steps below:

[Screenshot 2024-01-03 at 10 03 41] [Screenshot 2024-01-03 at 10 04 11]

sim4dim commented 10 months ago

ntopng Community v.6.1.231222 (FreeBSD 13.1) has been running for the last 24 hours with network discovery enabled and discovery running every 15 min, with no memory issues. I will monitor it further.

cardigliano commented 10 months ago

Sounds better. I made some changes in the last release to the way we trigger the discovery, and also to the notifications. Please keep us posted. Thank you.