ntop / ntopng

Web-based Traffic and Security Network Traffic Monitoring
http://www.ntop.org
GNU General Public License v3.0

Failed flows export to Clickhouse #8409

Closed: atemix closed this issue 2 days ago

atemix commented 1 month ago

Environment: Ubuntu 22.04.3 LTS, ntopng v.6.0.240502, ClickHouse 24.3.3.102

Hello team!

About 10-15 minutes after ntopng starts, exported flows disappear from the web interface, even though the ClickHouse status is OK. After a restart the flows become visible again, but the interface counter for exported flows always stays at 0.

We have more than 10 identical ntopng installations, and export works fine on all of them except this one. The only difference is that the failing one reads flows from Kafka, which is installed on the same server. The other installations use packet interfaces plus nProbe flows.

[Screenshots: ntopnng-clickhouse-1, ntopnng-clickhouse-2, ntopnng-clickhouse-3]

After restarting:

[Screenshot: ntopnng-clickhouse-4]

How did you reproduce it?

ntopng reads flows from Kafka, with flow export to ClickHouse configured as: --dump-flows="clickhouse;0.0.0.0;XXX;XXX;XXX"

Debug Information:

No errors or warnings on startup. After some time, the following warning is logged periodically:
28/May/2024 10:00:00 [LuaEngine.cpp:610] WARNING: .../scripts/callbacks/minute/interface/localhosts_stats.lua:24: field 'dumpLocalHosts2redis' is not callable (a nil value)
28/May/2024 11:00:00 [LuaEngine.cpp:610] WARNING: .../scripts/callbacks/minute/interface/localhosts_stats.lua:24: field 'dumpLocalHosts2redis' is not callable (a nil value)
28/May/2024 11:00:00 [LuaEngine.cpp:610] WARNING: .../scripts/callbacks/minute/interface/localhosts_stats.lua:24: field 'dumpLocalHosts2redis' is not callable (a nil value)
28/May/2024 12:00:00 [LuaEngine.cpp:610] WARNING: .../scripts/callbacks/minute/interface/localhosts_stats.lua:24: field 'dumpLocalHosts2redis' is not callable (a nil value)
28/May/2024 12:00:00 [LuaEngine.cpp:610] WARNING: .../scripts/callbacks/minute/interface/localhosts_stats.lua:24: field 'dumpLocalHosts2redis' is not callable (a nil value)
28/May/2024 13:00:00 [LuaEngine.cpp:610] WARNING: .../scripts/callbacks/minute/interface/localhosts_stats.lua:24: field 'dumpLocalHosts2redis' is not callable (a nil value)
28/May/2024 13:00:00 [LuaEngine.cpp:610] WARNING: .../scripts/callbacks/minute/interface/localhosts_stats.lua:24: field 'dumpLocalHosts2redis' is not callable (a nil value)

MatteoBiscosi commented 1 month ago

Hi @atemix, a new build with a fix for the issue you pointed out should be available in a couple of hours. Please let me know if the issue is still present after updating.

atemix commented 1 month ago

Hi @MatteoBiscosi! I updated to 6.0.240531: the dumpLocalHosts2redis warnings are gone from the logs, but historical flows still disappear after about 15 minutes.

[Screenshot: ntopnng-clickhouse-5]

MatteoBiscosi commented 1 month ago

Hi @atemix, could you move to the dev version of ntopng and let me know whether the issue persists?

atemix commented 1 month ago

Nothing changed after upgrading to dev 6.1.240603. The interface details still show Exported Flows 0 [0 fps]. After restarting ntopng, flows appear in the historical view again, and the ClickHouse status is green.

There are new log messages about the category lists and the remote_to_local_insecure_proto check:

03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:837] Category lists not loaded (offline)
03/Jun/2024 13:42:32 [LuaEngineNtop.cpp:885] Unable to open category list in /opt/data/ntopng/category_lists/Abuse.ch URLhaus.txt
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:629] Loaded Abuse.ch URLhaus: 0 rules
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:633] List 'Abuse.ch URLhaus' has 0 rules. Please report this to https://git>
03/Jun/2024 13:42:32 [LuaEngineNtop.cpp:885] Unable to open category list in /opt/data/ntopng/category_lists/ELLIO: Community Feed >
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:629] Loaded ELLIO: Community Feed (for non-commercial use): 0 rules
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:633] List 'ELLIO: Community Feed (for non-commercial use)' has 0 rules. Ple>
03/Jun/2024 13:42:32 [LuaEngineNtop.cpp:885] Unable to open category list in /opt/data/ntopng/category_lists/Emerging Threats.txt
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:629] Loaded Emerging Threats: 0 rules
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:633] List 'Emerging Threats' has 0 rules. Please report this to https://git>
03/Jun/2024 13:42:32 [LuaEngineNtop.cpp:885] Unable to open category list in /opt/data/ntopng/category_lists/IPsum Threat Intellige>
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:629] Loaded IPsum Threat Intelligence Feed: 0 rules
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:633] List 'IPsum Threat Intelligence Feed' has 0 rules. Please report this >
03/Jun/2024 13:42:32 [LuaEngineNtop.cpp:885] Unable to open category list in /opt/data/ntopng/category_lists/NoCoin Filter List.txt
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:629] Loaded NoCoin Filter List: 0 rules
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:633] List 'NoCoin Filter List' has 0 rules. Please report this to https://g>
03/Jun/2024 13:42:32 [LuaEngineNtop.cpp:885] Unable to open category list in /opt/data/ntopng/category_lists/SSLBL Botnet C2 IP Bla>
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:629] Loaded SSLBL Botnet C2 IP Blacklist: 0 rules
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:633] List 'SSLBL Botnet C2 IP Blacklist' has 0 rules. Please report this to>
03/Jun/2024 13:42:32 [LuaEngineNtop.cpp:885] Unable to open category list in /opt/data/ntopng/category_lists/Stratosphere Lab.txt
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:629] Loaded Stratosphere Lab: 0 rules
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:633] List 'Stratosphere Lab' has 0 rules. Please report this to https://git>
03/Jun/2024 13:42:32 [LuaEngineNtop.cpp:885] Unable to open category list in /opt/data/ntopng/category_lists/ThreatFox.txt
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:629] Loaded ThreatFox: 0 rules
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:633] List 'ThreatFox' has 0 rules. Please report this to https://github.com>
03/Jun/2024 13:42:32 [LuaEngineNtop.cpp:885] Unable to open category list in /opt/data/ntopng/category_lists/dshield 7 days.txt
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:629] Loaded dshield 7 days: 0 rules
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:633] List 'dshield 7 days' has 0 rules. Please report this to https://githu>
03/Jun/2024 13:42:32 [startup.lua:122] [lists_utils.lua:740] Loaded Category Lists (0 hosts, 0 IPs, 0 JA3) loaded in 0 sec
03/Jun/2024 13:42:33 [startup.lua:126] Initializing device polices...
03/Jun/2024 13:42:33 [startup.lua:142] Initializing alerts...
03/Jun/2024 13:42:33 [startup.lua:151] Initializing timeseries...
03/Jun/2024 13:42:33 [startup.lua:209] Importing ClickHouse dumps...
03/Jun/2024 13:42:34 [startup.lua:248] Completed startup.lua
03/Jun/2024 13:42:34 [FlowChecksLoader.cpp:296] WARNING: Unable to find flow check 'remote_to_local_insecure_proto': skipping it
03/Jun/2024 13:42:34 [FlowChecksLoader.cpp:296] WARNING: Unable to find flow check 'remote_to_local_insecure_proto': skipping it

[Screenshots: ntopnng-dev-clickhouse-1, ntopnng-dev-clickhouse-2]

MatteoBiscosi commented 1 month ago

Hi @atemix, don't worry about the remote_to_local_insecure_proto warning; it's harmless. How is disk usage? Is the disk full?

atemix commented 1 month ago

Hi @MatteoBiscosi, it's a fairly new deployment and the servers have plenty of free space:
Host        Filesystem                     Size   Used  Avail  Use%  Mounted on
clickhouse  /dev/mapper/data--vg-data--lv  1007G  2.6G  954G   1%    /opt/data
ntopng      /dev/mapper/data--vg-data--lv  251G   89M   239G   1%    /opt/data

Could something be wrong with the internal ntopng interface logic? Issue #8428 concerns the same ntopng instance:

"When ntopng restarts, the traffic breakdown pie and the exported flow count appear for a couple of seconds, but then the page shows "No traffic yet" again. If all nProbes are stopped and no new messages arrive in Kafka (0 bps of traffic in ntopng), ntopng starts showing traffic and networks. When traffic resumes, the breakdown and the exported flows counter freeze and stop changing."

When all remote nProbes are down and ntopng sees zero traffic, it shows the exported counter and everything else. As soon as at least one nProbe is sending flows to Kafka, the counter freezes and stops changing.

[Screenshot]

MatteoBiscosi commented 1 month ago

Could you please share your ntopng and nProbe configuration files? If you prefer, you can send them by email to biscosi at ntop.org.

atemix commented 1 month ago

ntopng:
--interface="kafka://127.0.0.1:[xxx]"
--instance-name=[xxx]
--data-dir=/opt/data/ntopng
--pcap-dir=/opt/data/n2disk
--http-port=0
--https-port=443
--max-num-hosts=1048576
--max-num-flows=67108864
--dns-mode=3
--local-networks=/etc/ntopng/local-networks.conf
--dump-flows="clickhouse;[xxx];ntopng;[xxx];[xxx]"
--offline

nprobe-netflow:
--disable-startup-checks=
--pid-file=/var/run/nprobe-netflow.pid
--collector-port=2055
--ntopng="kafka://127.0.0.1:[xxx]"
-T="@NTOPNG@"
-E=0:999
-i=none
-n=none
-a=
--flow-deduplication=15

nprobe-sflow:
--disable-startup-checks=
--pid-file=/var/run/nprobe-sflow.pid
--collector-port=6343
--ntopng="kafka://127.0.0.1:[xxx]"
-T="@NTOPNG@"
-E=0:999
-i=none
-n=none
-a=
--flow-deduplication=15

The other nProbes are configured the same way, except that they point to the Kafka broker's external IP instead of 127.0.0.1.

MatteoBiscosi commented 1 month ago

Hi @atemix, please use ZMQ instead of Kafka if possible. Here is an example: https://www.ntop.org/guides/ntopng/using_with_other_tools/nprobe.html
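As an illustration only, a ZMQ version of the Kafka configuration shared earlier might look like the fragment below. It is a sketch that assumes ntopng runs in ZMQ collector mode (the trailing `c` on the endpoint), so that the local and remote nProbes all connect to ntopng, with each nProbe using `--zmq` together with `--zmq-probe-mode`. Port 5556 and the `[ntopng-host]` placeholder are hypothetical, encryption is not covered, and the option names should be checked against the linked guide for the versions in use.

```
ntopng (replaces the Kafka --interface line; other options unchanged):
--interface="tcp://*:5556c"

nprobe-netflow (replaces the --ntopng Kafka line; other options unchanged):
--zmq="tcp://127.0.0.1:5556"
--zmq-probe-mode

remote nProbes (hypothetical [ntopng-host] placeholder):
--zmq="tcp://[ntopng-host]:5556"
--zmq-probe-mode
```

In collector mode the nProbes initiate the connection, which avoids configuring one ntopng endpoint per probe when several probes feed the same interface.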

atemix commented 1 month ago

On dev 6.1.240603 I was unable to configure a ZMQ interface with encryption: only part of the interface page loads. After downgrading to stable 6.0.240603, everything seems to work. But what's wrong with the Kafka interface?

MatteoBiscosi commented 1 month ago

The issue is with the Kafka interface. I need to fix it, but first I have to set up a Kafka test environment, which could take a while. If Kafka is not a must-have, please use ZMQ for now; I cannot look into the Kafka issue at the moment.

atemix commented 4 weeks ago

Switched to ZMQ. Thanks for the help, @MatteoBiscosi!