ntop / ntopng

Web-based Traffic and Security Network Traffic Monitoring
http://www.ntop.org
GNU General Public License v3.0

No Flow or SNMP Charts #2660

Closed kenatgcc closed 5 years ago

kenatgcc commented 5 years ago

I get "No results found" when trying to view charts for SNMP devices or flow exporters. I have SNMP timeseries enabled. My system info is:

Version 3.9.190617 - Enterprise Edition

Debian GNU/Linux 9.1 (stretch) Debian 9.1 [x86_64][Debian GNU/Linux 9.1 (stretch)] - 64 bit

ntopng --interface "tcp://10.4.56.32:5556" --interface "enp0s25" --local-networks "10.3.0.0/16"

2.9.0-1555-41b7548 7.52.1 3.x 4.x 1.6.0 1.0 5m 3.2.6 3.7 Lua 5.3.5 4.2.1 1.2.0

This product includes GeoLite data created by MaxMind.

2.9.1 / 3.0

emanuele-f commented 5 years ago

Can you post a screenshot of the page with error?

kenatgcc commented 5 years ago

Screenshots added. snmp_chart_view snmp_intf_view snmp_device_view flow_stats_view flow_device_view

emanuele-f commented 5 years ago

What's the latest output of sudo journalctl -u ntopng?

kenatgcc commented 5 years ago

Attached.

journalctl.txt

emanuele-f commented 5 years ago

The log is truncated on the right, please post again

kenatgcc commented 5 years ago

journalctl2.txt

emanuele-f commented 5 years ago

Can you post a screenshot of the "SNMP Devices" view? What is the value of "Time Since Last Poll"? Inside the SNMP device configuration, under the cog icon, is "Device Polling" enabled?

kenatgcc commented 5 years ago

Attached.

journalctl.txt snmp_device_detail snmp_devices_view

emanuele-f commented 5 years ago

Can you provide remote GUI access so we can debug this more easily? Please also see https://www.ntop.org/guides/ntopng/remote_assistance.html.

Are there any recent alerts for SNMP devices under the alerts icon above? The service log does not seem to be the most recent one, can you double check using journalctl and scrolling the view to the bottom? What's the output of sudo systemctl status ntopng?

kenatgcc commented 5 years ago

Attached are the outputs you requested. journalctl3.txt systemctl_status_ntopng.txt

kenatgcc commented 5 years ago

SNMP alerts screenshot

snmp_alerts

kenatgcc commented 5 years ago

I have loaded the n2n package. At what time (UTC) will you be able to connect? n2n_assistance.zip

emanuele-f commented 5 years ago

Please disable it and enable it again to reset the credentials, then send me the credentials privately at faranda@ntop.org. I can connect now.

Edit: make sure to enable GUI admin access, or provide GUI user credentials via email.

kenatgcc commented 5 years ago

Should I turn on InfluxDB? image

emanuele-f commented 5 years ago

Yes

kenatgcc commented 5 years ago

Do you have a link with InfluxDB instructions? image

kenatgcc commented 5 years ago

Or do I need to install the package? From what I read here, it looks like it should be automatically configured. https://www.ntop.org/guides/ntopng/basic_concepts/timeseries.html#influxdb-driver

emanuele-f commented 5 years ago

You need to install InfluxDB 1.7 from https://portal.influxdata.com/downloads. Then you can use the instructions above to enable it in ntopng.

kenatgcc commented 5 years ago

Okay, that's what I figured. What about nProbe running on the ntopng server, is that required? If so, should I be using a licensed Pro version so it's not limited in flow capacity?

emanuele-f commented 5 years ago

You need nProbe only if you are capturing NetFlow/sFlow traffic. Please check out https://www.ntop.org/nprobe/network-monitoring-101-a-beginners-guide-to-understanding-ntop-tools/ for more details. In such cases you will need a license.

kenatgcc commented 5 years ago

I only want the ntopng server to process ZMQ feeds from nProbe so I will systemctl disable nprobe.

kenatgcc commented 5 years ago

I have InfluxDB 1.7.6 running as the timeseries database and that seems to be working fine. With nProbe disabled on the ntopng server, I am not seeing any flow exporters or any SNMP chart views.

image

emanuele-f commented 5 years ago

Without nProbe it is normal that "Flow exporters" is not shown, as you are now only capturing from the network interfaces. For the charts you need to wait at least 2 SNMP refresh times, so about 10 minutes after ntopng starts, and then they should appear.
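
As a rough illustration of why two refresh cycles are needed: SNMP interface counters such as ifInOctets are cumulative, so a rate can only be computed once two samples exist. The sketch below assumes the charts are derived this way, which is an assumption about ntopng internals and not something confirmed in this thread. The function name and sample values are hypothetical.

```python
def counter_rate(prev_sample, curr_sample, counter_bits=32):
    """Return the per-second rate between two (timestamp, counter) samples.

    Hypothetical helper, not ntopng code. Handles a single wrap of the
    cumulative counter.
    """
    prev_ts, prev_val = prev_sample
    curr_ts, curr_val = curr_sample
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_val - prev_val
    if delta < 0:
        # The counter wrapped around its maximum value between the polls.
        delta += 2 ** counter_bits
    return delta / elapsed


POLL_INTERVAL = 300  # seconds: the 5-minute SNMP refresh (2 polls = ~10 minutes)

first = (0, 1_000_000)               # first poll: nothing to chart yet
second = (POLL_INTERVAL, 4_000_000)  # second poll: a rate can now be derived
rate = counter_rate(first, second)   # bytes per second
```

With a single sample there is no delta to plot, which would be consistent with charts only appearing after the second poll.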

kenatgcc commented 5 years ago

I am not seeing the chart icon in the SNMP Device view after an hour.

emanuele-f commented 5 years ago

Do you have any alerts below the InfluxDB "System" menu entry? Do you have errors in the ntopng log?

kenatgcc commented 5 years ago

No alerts below the InfluxDB "System" menu entry. A few errors in ntopng.log

20/Jun/2019 12:19:46 [LuaEngine.cpp:9040] WARNING: Script failure [/usr/share/ntopng/scripts/callbacks/system/5min.lua][.../share/ntopng/scripts/callbacks/system/5min/influxdb.lua:120: attempt to call a nil value (method 'getInfluxdbVersion')]

20/Jun/2019 12:24:23 [LuaEngine.cpp:9040] WARNING: Script failure [/usr/share/ntopng/scripts/callbacks/system/5min.lua][.../share/ntopng/scripts/callbacks/system/5min/influxdb.lua:120: attempt to call a nil value (method 'getInfluxdbVersion')]

kenatgcc commented 5 years ago

I am seeing the chart icons slowly populate, there are maybe 25 out of the 76 SNMP devices showing up. Perhaps if I wait a few more hours they will all get there.

image

emanuele-f commented 5 years ago

I suspect the problem is related to SNMP walks taking too much time. In https://github.com/ntop/ntopng/commit/8a239f97f443a442f595a4c2bd0e9ead37657e60 I've added a trace and alerts to monitor such duration. Please wait one hour and install the new package. Then check out the system alerts after 30 minutes and see if there are "Slow Periodic Activity" alerts: 2019-06-21_11-45

kenatgcc commented 5 years ago

I am seeing a couple of "Slow Periodic Activity" alerts after the upgrade. I have also reduced my SNMP devices from 76 to 17 to see if that helps. image

kenatgcc commented 5 years ago

I am getting a few of these alerts per hour since I installed the patch yesterday.

image

emanuele-f commented 5 years ago

We need to fix this

kenatgcc commented 5 years ago

Let me know what I can do to help.

cardigliano commented 5 years ago

@kenatgcc we pushed a fix for this to control the maximum duration of the SNMP walks, please update and let us know. Thank you.

kenatgcc commented 5 years ago

The alerts are still occurring after applying v3.9.190629 but the time to complete is reduced.

image

cardigliano commented 5 years ago

@kenatgcc please update again to today's build and let us see the time to complete. Thank you.

kenatgcc commented 5 years ago

Some initial slow alerts, I will check again in about an hour and post results. image

emanuele-f commented 5 years ago

The alerts for timeseries and discover.lua were incorrect; this is now fixed in d8d2638d50d48f4592191b2c4f0484e00970a594. The second script, however, is taking very long. Are you still using InfluxDB for the export, as shown in the image below?

2019-07-02_14-18

kenatgcc commented 5 years ago

Yes, InfluxDB is still my database. The latest system alerts are below. image

cardigliano commented 5 years ago

@kenatgcc please update again (sorry for all the iterations, we are trying to identify what is causing the delay in your installation). Thank you.

kenatgcc commented 5 years ago

@cardigliano I may have mentioned before that this is a proof-of-concept machine, basically just a desktop PC. Below is a snapshot from Webmin. Does this application need a more powerful machine to run properly? Load average rarely goes above 1.00.

image

kenatgcc commented 5 years ago

Here's some good news, there have been no "Slow Periodic Activity" alerts since the update and restart @ 08:10 EDT. image

cardigliano commented 5 years ago

@kenatgcc the machine you are currently using seems to be powerful enough for what you are doing now. As for the alerts, we are currently interrupting SNMP polling to honour the 5-minute polling slot; however, we should now also report which activity has been interrupted. Thank you for your feedback.

cardigliano commented 5 years ago

@kenatgcc I pushed more improvements to detect "unresponsive" devices when SNMP polling is interrupted: you should see a triangle (warning) on the corresponding device in the list. This will be available with the next build.
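
To make the idea concrete, here is a minimal sketch of deadline-bounded polling. It is not ntopng's implementation: `poll_with_deadline`, `poll_one`, and the device names are hypothetical, and unlike the behaviour described above it only checks the deadline between devices and does not interrupt a walk in progress. Devices left in `skipped` would be the ones to flag with a warning.

```python
import time


def poll_with_deadline(devices, poll_one, slot_seconds=300, clock=time.monotonic):
    """Poll devices in order until the time slot expires.

    Returns (polled, skipped): devices polled in time, and devices
    left unpolled because the deadline had already passed.
    """
    deadline = clock() + slot_seconds
    polled, skipped = [], []
    for device in devices:
        if clock() >= deadline:
            # Slot exhausted: leave this device for the next cycle.
            skipped.append(device)
            continue
        poll_one(device)
        polled.append(device)
    return polled, skipped


# Simulated run: each poll "takes" 120 seconds on a fake clock.
now = [0.0]


def fake_clock():
    return now[0]


def fake_poll(device):
    now[0] += 120.0


polled, skipped = poll_with_deadline(
    ["sw1", "sw2", "sw3", "sw4"], fake_poll, slot_seconds=300, clock=fake_clock
)
# polled == ["sw1", "sw2", "sw3"], skipped == ["sw4"]
```

Injecting the clock keeps the sketch testable without waiting for real time to pass.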

cardigliano commented 5 years ago

@kenatgcc please confirm that you see the warning (triangle) for "unresponsive" devices. It seems the periodic activity honours the deadline now.

kenatgcc commented 5 years ago

@cardigliano I see the triangles indicating "unresponsive" SNMP devices, but the devices respond and complete an "snmpwalk" command from the server within a few seconds. What do you suggest to fix the "unresponsive" devices? Delete and re-add them, perhaps? image

cardigliano commented 5 years ago

@kenatgcc this seems to be due to a few factors: the number of walks ntopng does, the time each walk takes, the number of devices and ports you have. We need to rework the way ntopng polls the devices to optimize all of this and make sure it completes all operations within the time slot even with the number of devices you have.
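
A back-of-the-envelope estimate shows how these factors multiply. The sketch assumes the devices are polled one after another, which is an assumption and not confirmed here. The device counts (76 and 17) come from this thread; the number of walks per device and the seconds per walk are made-up illustrative values.

```python
def estimated_poll_seconds(num_devices, walks_per_device, seconds_per_walk):
    """Estimate the duration of one sequential polling pass (hypothetical model)."""
    return num_devices * walks_per_device * seconds_per_walk


SLOT_SECONDS = 300  # the 5-minute polling slot

# Illustrative values only: 3 walks per device, 2 seconds per walk.
total_76 = estimated_poll_seconds(76, 3, 2.0)  # 456.0 seconds
total_17 = estimated_poll_seconds(17, 3, 2.0)  # 102.0 seconds

fits_76 = total_76 <= SLOT_SECONDS  # False: the pass overruns the slot
fits_17 = total_17 <= SLOT_SECONDS  # True: the pass completes in time
```

Under these assumed numbers, each walk finishing "within a few seconds" is still compatible with the whole pass exceeding the slot once there are enough devices.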

kenatgcc commented 5 years ago

@cardigliano should I just delete the unresponsive SNMP devices?

cardigliano commented 5 years ago

@kenatgcc as I said, we need to improve SNMP polling to speed it up and make it work for you. In the meantime, it's up to you whether to keep those devices there; however, please note that polling is getting interrupted for those devices, so data is missing for them.