GrumpyOldNetworkGuy closed this issue 3 years ago.
Hi,
Our ntopng has stalled again, just as it has several times over the last few days.
root@schnuffel:~# tcpdump -nni br224 'tcp port 3443'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br224, link-type EN10MB (Ethernet), capture size 262144 bytes
14:38:46.329208 IP 10.139.205.11.33108 > 10.139.192.11.3443: Flags [S], seq 4107689044, win 64240, options [mss 1250,sackOK,TS val 2013915497 ecr 0,nop,wscale 7], length 0
14:38:46.329510 IP 10.139.205.11.33110 > 10.139.192.11.3443: Flags [S], seq 3147065321, win 64240, options [mss 1250,sackOK,TS val 2013915497 ecr 0,nop,wscale 7], length 0
14:38:46.329535 IP 10.139.205.11.33112 > 10.139.192.11.3443: Flags [S], seq 335358759, win 64240, options [mss 1250,sackOK,TS val 2013915497 ecr 0,nop,wscale 7], length 0
14:38:46.329553 IP 10.139.205.11.33114 > 10.139.192.11.3443: Flags [S], seq 2502990202, win 64240, options [mss 1250,sackOK,TS val 2013915497 ecr 0,nop,wscale 7], length 0
14:38:46.329568 IP 10.139.205.11.33116 > 10.139.192.11.3443: Flags [S], seq 1498458717, win 64240, options [mss 1250,sackOK,TS val 2013915497 ecr 0,nop,wscale 7], length 0
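The filter `tcp port 3443` matches both directions, so in the excerpt above every SYN from 10.139.205.11 to port 3443 goes unanswered: there is no `[S.]` (SYN-ACK) line. For longer captures, a small sketch like the one below can list the unanswered handshakes. It assumes the default one-line `tcpdump -nn` IPv4 output format shown here, and `unanswered_syns` is a hypothetical helper, not part of ntopng or tcpdump.

```python
import re

# Matches tcpdump -nn IPv4 lines such as:
# 14:38:46.329208 IP 10.139.205.11.33108 > 10.139.192.11.3443: Flags [S], seq ...
LINE_RE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): Flags \[(?P<flags>[^\]]*)\]"
)


def unanswered_syns(lines):
    """Return sorted (client, client_port, server, server_port) tuples whose SYN saw no SYN-ACK."""
    syns = set()
    synacks = set()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a TCP line in the expected format
        src = (m["src"], int(m["sport"]))
        dst = (m["dst"], int(m["dport"]))
        flags = m["flags"]
        if flags == "S":
            # Client -> server SYN, keyed from the client's point of view
            syns.add(src + dst)
        elif flags == "S.":
            # Server -> client SYN-ACK: flip it so the key matches the client's SYN
            synacks.add(dst + src)
    return sorted(syns - synacks)
```

Fed the five lines above, it reports all five client ports (33108 to 33116) as unanswered.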
Restarting ntopng again... Can someone please help me? I've run out of ideas. Gerd
It seems that minute.lua stalls. Please post the full nProbe and ntopng configurations. Also include recipients and endpoints if you are using them.
@simonemainardi Here are the complete configuration files: schnuffel-conf.zip No recipients or endpoints are configured in addition to builtin_recipient_sqlite/builtin_endpoint_sqlite. No alarms have been recorded in the last 24 hours.
@simonemainardi
--- 18:50 CEST ---
Now I've started ntopng with only nprobe@xwin0 and nprobe@xwin1 as sources:
--- ntopng.conf ---
#
#-i=br980
-i=tcp://127.0.0.1:5556,tcp://127.0.0.1:5557
#-i=tcp://127.0.0.1:5558
#-i=tcp://127.0.0.1:5559
#
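When several `-i=` lines are toggled on and off between experiments, it can help to confirm which sources a config file actually enables. The sketch below is a hypothetical helper (`active_sources`) that assumes the one-option-per-line `-i=`/`--interface=` format used above and skips `#` comments. It only lists the sources named on active lines; it does not model how ntopng groups comma-separated sources into a single interface.

```python
# Option prefixes assumed to introduce a capture source in ntopng.conf
PREFIXES = ("-i=", "--interface=")


def active_sources(conf_text):
    """List capture sources from uncommented -i=/--interface= lines, in file order."""
    sources = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out option
        for prefix in PREFIXES:
            if line.startswith(prefix):
                value = line[len(prefix):]
                # One line may name several comma-separated sources
                sources.extend(s.strip() for s in value.split(",") if s.strip())
                break
    return sources
```

For the file above this yields `['tcp://127.0.0.1:5556', 'tcp://127.0.0.1:5557']`.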
Let's wait for the next standstill.
What do you think? Should I upgrade to the nightly build packages and test them?
@simonemainardi --- 02:10 CEST ---
Now I've started ntopng with a direct connection to zc:ens2f0@0,zc:ens2f0@1, without nProbe:
--- ntopng.conf ---
#
#-i=br980
#-i=tcp://127.0.0.1:5556,tcp://127.0.0.1:5557
-i=zc:ens2f0@0,zc:ens2f0@1
#-i=tcp://127.0.0.1:5558
#-i=tcp://127.0.0.1:5559
#
I'm curious...
@simonemainardi It's crazy, but now I was able to turn off the rest of the checks:
Do you have top sites or traffic behavior enabled?
Both are switched off.
In the current test, ntopng drops packets like hell:
@simonemainardi --- 11:17 CEST ---
So ZMQ is probably not the problem.
@simonemainardi I restarted ntopng. It now uses InfluxDB on another host. All other settings are the same as in the previous experiment.
@simonemainardi --- 19:44 CEST ---
Running influxdb on another host didn't help either.
Can we arrange a remote session to troubleshoot? As soon as you find the process stuck again drop me an email at mainardi at ntop dot org and I can connect.
I've fixed an issue possibly causing this behavior. New dev and stable builds are in progress, please hold on a couple of hours, update, and let me know. Thanks for providing the machine for the debugging.
@GrumpyOldNetworkGuy can you please confirm that this issue is solved and close it? Thanks.
New uptime record for ntopng-5.0-stable: 2 Days, 16:01:04! Thank you @simonemainardi for fixing this issue.
Hi,
Restarting ntopng reactivates recording on nprobe@xwin[0,1]. The other two ZMQ sources (LAN, DC) work perfectly. The only strange thing I found is that minute.lua causes problems at the same time: when it gets really bad, the GUI stops working at that very moment. The connection to TCP/3443 is still up, though.
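To timestamp the moment the GUI stops answering, one option is to probe the port periodically and log whether a TCP handshake still completes. The sketch below is a hypothetical helper (`handshake_completes`) built on Python's standard `socket.create_connection`; the host, port, and timeout are assumptions to adjust. Note that a completed handshake only shows the kernel accepted the connection, not that the GUI process is serving requests.

```python
import socket


def handshake_completes(host, port, timeout=3.0):
    """Return True if a TCP three-way handshake to host:port finishes within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout subclasses OSError) and unreachable hosts
        return False
```

For example, `handshake_completes("10.139.192.11", 3443)` run once a minute from cron would show whether the SYNs seen in the capture start going unanswered at the same time minute.lua stalls.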
The TCP connection between nprobe@xwin[0,1] and ntopng is intact, but no data flows:
To make things as easy as possible for ntopng, almost all checks have been disabled. By the way, apart from "unexpected DHCP", the checks shown here cannot be disabled. The error message is "Unknown user script ...".
Here is the configuration: configs.txt
How can I help you find the cause?
Thanks! Gerd