ntop / ntopng

Web-based Traffic and Security Network Traffic Monitoring
http://www.ntop.org
GNU General Public License v3.0

Interface Historical Data timeout #8705

Open mzac opened 1 month ago

mzac commented 1 month ago

Environment:

What happened: When trying to access the interface historical data page, the page does not display any data or graphs; it just times out. We have 4 x 10G interfaces configured in a view. Timeseries are stored in InfluxDB, and I can confirm that querying similar data from InfluxDB through Grafana works fine.

This behavior is new; it worked fine in previous versions of ntopng. I should mention that our InfluxDB database is very large (see screenshot below).

How did you reproduce it? Any time I open the interface historical data page, it does not display any information.

Debug Information: image image

Here are some read and write stats on our storage:

root@pntop01:/mnt/san1# dd if=/dev/zero of=./test1.img bs=10G count=1 oflag=dsync
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 3.7072 s, 579 MB/s

root@pntop01:/mnt/san1# hdparm -tT /dev/mapper/mpatha-part1
/dev/mapper/mpatha-part1:
 Timing cached reads:   16104 MB in  1.98 seconds = 8114.22 MB/sec
 Timing buffered disk reads: 2308 MB in  3.00 seconds = 769.11 MB/sec
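
A quick sanity check on the dd figures above: the reported rate follows from the byte count and elapsed time, and the `0+1 records` lines indicate a partial transfer. On Linux a single read() or write() call moves at most 0x7ffff000 bytes, so `bs=10G count=1` wrote roughly 2 GiB rather than 10 GiB. A minimal sketch of that arithmetic (variable names are illustrative, not from the thread):

```python
# Figures copied from the dd output above.
bytes_copied = 2147479552
elapsed_s = 3.7072

# dd reports decimal megabytes (1 MB = 10**6 bytes).
throughput_mb_s = bytes_copied / elapsed_s / 1e6

# Linux caps a single read()/write() call at 0x7ffff000 bytes, which
# matches the byte count dd reported for its one partial record.
LINUX_MAX_RW = 0x7FFFF000

print(f"{throughput_mb_s:.0f} MB/s")
print(bytes_copied == LINUX_MAX_RW)
```

A smaller block size with a higher count (for example `bs=1M count=10240`) would be one way to write the intended 10 GiB.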

MatteoBiscosi commented 1 month ago

Hi @mzac, could you please share the output of journalctl -eu ntopng?

mzac commented 1 month ago

Hi @MatteoBiscosi

Not getting anything in the journalctl logs or /var/log/ntopng.log.

However, after rebooting the server (which restarted influx, clickhouse and ntopng) it started working again. That was about 1h45m ago, and now it is not working again.

There is still memory available on the system, as we can see here, so I am not sure why it stops working. Would you like me to do any further debugging?

image

MatteoBiscosi commented 1 month ago

Hi @mzac, sorry for the late response, I missed that you had commented on the issue. It is a bit difficult to debug the issue without logs. Do you see any errors in the latest log trace?

image

Other than that, could you please share your ntopng configuration? I'd like to check whether the problem could be related to ClickHouse (CH) being too slow to process your amount of data.

MatteoBiscosi commented 1 month ago

Hi @mzac, could you please check the folder /var/lib/ntopng/tmp/clickhouse? Records are stored there before being pushed to the database. If the number of files there is too large, the problem could be that ClickHouse is too slow at ingesting the amount of flows/data you have.
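
One way to run that check repeatedly is a small script that counts the files waiting in the directory. This is only a sketch: the path comes from the comment above, while the function names, the recursive count, and the `threshold` value are assumptions made for illustration and are not ntopng settings.

```python
from pathlib import Path


def count_pending_files(spool_dir: str) -> int:
    """Count regular files under spool_dir, recursively (0 if it is missing)."""
    path = Path(spool_dir)
    if not path.is_dir():
        return 0
    return sum(1 for entry in path.rglob("*") if entry.is_file())


def looks_backlogged(file_count: int, threshold: int = 1000) -> bool:
    # threshold is a hypothetical cutoff chosen for this sketch only.
    return file_count > threshold


if __name__ == "__main__":
    spool = "/var/lib/ntopng/tmp/clickhouse"  # path quoted in the thread
    pending = count_pending_files(spool)
    print(f"{spool}: {pending} file(s), backlogged={looks_backlogged(pending)}")
```

Running it a few times while the page is timing out would show whether the count keeps growing, which is the symptom described above.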

mzac commented 1 month ago

Hi @MatteoBiscosi

I checked both clickhouse and influx tmp dirs in /var/lib/ntopng/tmp and both are not full, only about 20 files each.

That said, here are the log traces I get:

27/Sep/2024 09:24:53 [NetworkInterface.cpp:2508] WARNING If TSO/GRO is enabled, please disable it for best accuracy
27/Sep/2024 09:24:53 [NetworkInterface.cpp:2502] Packets exceeding the expected max size have been received [zc:enp25s0f3][len: 1531][max len: 1522].
27/Sep/2024 09:21:28 [NetworkInterface.cpp:2508] WARNING If TSO/GRO is enabled, please disable it for best accuracy
27/Sep/2024 09:21:28 [NetworkInterface.cpp:2502] Packets exceeding the expected max size have been received [zc:enp25s0f2][len: 1523][max len: 1522].
27/Sep/2024 09:17:35 [NetworkInterface.cpp:3804] Started packet polling on interface 'view:zc:enp175s0f0,zc:enp175s0f1,zc:enp175s0f2,zc:enp175s0f3' [id: 109]...
27/Sep/2024 09:17:35 [NetworkInterface.cpp:3804] Started packet polling on interface 'view:zc:enp25s0f0,zc:enp25s0f1,zc:enp25s0f2,zc:enp25s0f3' [id: 104]...
27/Sep/2024 09:17:35 [NetworkInterface.cpp:3804] Started packet polling on interface 'zc:enp175s0f3' [id: 108]...
27/Sep/2024 09:17:35 [NetworkInterface.cpp:3804] Started packet polling on interface 'zc:enp175s0f2' [id: 107]...
27/Sep/2024 09:17:35 [NetworkInterface.cpp:3804] Started packet polling on interface 'zc:enp175s0f1' [id: 106]...
27/Sep/2024 09:17:35 [NetworkInterface.cpp:3804] Started packet polling on interface 'zc:enp175s0f0' [id: 105]...
27/Sep/2024 09:17:35 [NetworkInterface.cpp:3804] Started packet polling on interface 'zc:enp25s0f3' [id: 103]...
27/Sep/2024 09:17:35 [NetworkInterface.cpp:3804] Started packet polling on interface 'zc:enp25s0f2' [id: 102]...
27/Sep/2024 09:17:35 [NetworkInterface.cpp:3804] Started packet polling on interface 'zc:enp25s0f1' [id: 101]...
27/Sep/2024 09:17:35 [NetworkInterface.cpp:3804] Started packet polling on interface 'zc:enp25s0f0' [id: 100]...
27/Sep/2024 09:17:35 [FlowChecksLoader.cpp:297] WARNING Unable to find flow check 'tcp_issues_generic': skipping it
27/Sep/2024 09:17:35 [FlowChecksLoader.cpp:297] WARNING Unable to find flow check 'flow_alert_ndpi_error_code_detected': skipping it
27/Sep/2024 09:17:35 [FlowChecksLoader.cpp:297] WARNING Unable to find flow check 'udp_unidirectional': skipping it
27/Sep/2024 09:17:35 [FlowChecksLoader.cpp:297] WARNING Unable to find flow check 'potentially_dangerous': skipping it
27/Sep/2024 09:17:35 [FlowChecksLoader.cpp:297] WARNING Unable to find flow check 'remote_to_local_insecure_proto': skipping it
27/Sep/2024 09:17:35 [FlowChecksLoader.cpp:297] WARNING Unable to find flow check 'flow_alert_ndpi_punicody_idn': skipping it
27/Sep/2024 09:17:35 [startup.lua:253] Completed startup.lua
27/Sep/2024 09:17:25 [startup.lua:210] Importing ClickHouse dumps...
27/Sep/2024 09:17:25 [startup.lua:152] Initializing timeseries...
27/Sep/2024 09:17:25 [startup.lua:143] Initializing alerts...
27/Sep/2024 09:17:25 [startup.lua:127] Initializing device polices...
27/Sep/2024 09:17:25 [startup.lua:123] [lists_utils.lua:700] Loaded Category Lists (9475 hosts, 53453 IPs) loaded in 1 sec
27/Sep/2024 09:17:25 [startup.lua:123] [lists_utils.lua:594] Loaded dshield 7 days: 29 rules
27/Sep/2024 09:17:25 [startup.lua:123] [lists_utils.lua:594] Loaded ThreatFox: 8784 rules
27/Sep/2024 09:17:25 [LuaEngineNtop.cpp:947] Invalid line format or private IP 127.0.0.1 service-hh4fmtad-1321953982.sh.tencentapigw.com/ [/var/lib/ntopng/category_lists/ThreatFox.txt]
27/Sep/2024 09:17:25 [startup.lua:123] [lists_utils.lua:594] Loaded Stratosphere Lab: 11850 rules
27/Sep/2024 09:17:25 [startup.lua:123] [lists_utils.lua:598] List 'Snort IP Block List' has 0 rules. Please report this to https://github.com/ntop/ntopng
27/Sep/2024 09:17:25 [startup.lua:123] [lists_utils.lua:594] Loaded Snort IP Block List: 0 rules
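
To separate warnings from routine startup messages in a trace like the one above, the lines can be parsed with a short script. This is a sketch based only on the line layout visible in this thread (timestamp, `[file:line]`, optional `WARNING` prefix). Treating `ERROR` as a possible prefix and labelling unprefixed lines `INFO` are assumptions, and the names `TraceLine` and `parse_trace` are made up for the example.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the trace above: "DD/Mon/YYYY HH:MM:SS [source] message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\w{3}/\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<src>[^\]]+)\] "
    r"(?P<msg>.*)$"
)


class TraceLine(NamedTuple):
    timestamp: str
    source: str
    level: str
    message: str


def parse_trace(line: str) -> Optional[TraceLine]:
    """Split one trace line into fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    message = match.group("msg")
    level = "INFO"  # label used here for lines without a severity prefix
    for candidate in ("WARNING", "ERROR"):  # ERROR is assumed, not seen above
        if message.startswith(candidate + " "):
            level = candidate
            message = message[len(candidate) + 1:]
            break
    return TraceLine(match.group("ts"), match.group("src"), level, message)


sample = ("27/Sep/2024 09:17:35 [FlowChecksLoader.cpp:297] WARNING "
          "Unable to find flow check 'udp_unidirectional': skipping it")
print(parse_trace(sample))
```

Filtering the parsed lines on `level != "INFO"` leaves only the TSO/GRO and flow-check warnings from this trace, none of which mention the timeseries or historical data queries.
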

MatteoBiscosi commented 1 month ago

Hi @mzac, could you possibly try without the views and let me know if the issue persists? From the logs, everything seems fine.

mzac commented 1 month ago

Do you mean just looking at the interfaces that are not part of the view, or disabling the view completely? When I look at one of the individual interfaces I still get the same issue.

image

MatteoBiscosi commented 1 month ago

Hi @mzac, sorry, I wanted to understand whether it was a view-related problem, but it seems it is not. Could you right click -> inspect -> console and send me the errors?

image

mzac commented 1 month ago

image

MatteoBiscosi commented 2 weeks ago

Hi @mzac, sorry, I didn't see the answer. Is it possible that you have a proxy/firewall blocking the request? A gateway timeout means that ntopng is not answering. Do you still see nothing in journalctl even after loading the page?
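
A 504 Gateway Timeout is produced by an intermediary (proxy or gateway) that gave up waiting on the upstream server, so one way to narrow this down is to time the same request directly against ntopng and through the proxy. The sketch below uses only the Python standard library. The function names and the URLs in the usage comment are hypothetical, and it does not handle ntopng authentication.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def classify(status: Optional[int]) -> str:
    """Map an HTTP status (or None for no response) to a rough diagnosis."""
    if status is None:
        return "no-response"       # timeout, refused connection, DNS failure
    if status == 504:
        return "gateway-timeout"   # an intermediary timed out on the upstream
    if status >= 400:
        return "http-error"
    return "ok"


def probe(url: str, timeout_s: float = 30.0) -> Tuple[Optional[int], float]:
    """Fetch url once; return (status or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code          # server answered with an error status
    except OSError:
        status = None              # covers URLError and socket timeouts
    return status, time.monotonic() - start


# Usage (both URLs are placeholders; substitute your own):
#   for url in ("http://127.0.0.1:3000/", "https://ntopng.example.com/"):
#       status, elapsed = probe(url)
#       print(url, classify(status), f"{elapsed:.1f}s")
```

If the direct request succeeds while the proxied one returns 504, the proxy timeout is the likely limit; if both fail or hang, the delay is on the ntopng side.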