howardjones / network-weathermap

Network Weathermap draws diagrams from data
http://www.network-weathermap.com/
MIT License

Cacti 1.1.9 - Undefined index #106

Closed jpobeda closed 6 years ago

jpobeda commented 7 years ago

Hi Howie,

I upgraded Cacti to 1.1.9 yesterday, and today I found this in the Cacti log. Only these two lines:

2017/06/07 03:31:55 - CMDPHP PHP ERROR NOTICE Backtrace: (/poller.php: 716 api_plugin_hook)(/lib/plugins.php: 69 api_plugin_run_plugin_hook)(/lib/plugins.php: 162 weathermap_poller_bottom)(/plugins/weathermap/lib/cacti-plugin-poller.php: 158 weathermap_run_maps)(/plugins/weathermap/lib/poller-common.php: 242 readData)(/plugins/weathermap/lib/Weathermap.class.php: 1750 readDataFromTargets)(/plugins/weathermap/lib/Weathermap.class.php: 1550 performDataCollection)(/plugins/weathermap/lib/WeatherMapDataItem.class.php: 224 collectDataFromTargets)(/plugins/weathermap/lib/WeatherMapDataItem.class.php: 262 readData)(/plugins/weathermap/lib/WMTarget.class.php: 124 ReadData)(/plugins/weathermap/lib/datasources/WeatherMapDataSource_snmp.php: 95 CactiErrorHandler)(/lib/functions.php: 4364 cacti_debug_backtrace)
2017/06/07 03:31:55 - ERROR PHP NOTICE in Plugin 'weathermap': Undefined index: XXX.YY.28.253 in file: /usr/share/cacti-1.1.9/plugins/weathermap/lib/datasources/WeatherMapDataSource_snmp.php on line: 95

The node configs are:

NODE vpn_uptime1
    LABEL {node:this:bandwidth_in:%3T}
    LABELFONT 109
    AICONFILLCOLOR 128 198 231
    LABELOUTLINECOLOR none
    LABELBGCOLOR none
    ICON 66 24 box
    TARGET snmp:{map:community}:XXX.YY.28.253:1.3.6.1.4.1.9.9.171.1.2.2.1.8.1.13.49.52.51.46.57.54.46.50.52.46.49.57.51.1.10.50.48.50.46.55.46.52.48.46.49.500:-
    POSITION main_square 117 -14

NODE vpn_uptime2
    LABEL {node:this:bandwidth_in:%3T}
    LABELFONT 109
    AICONFILLCOLOR 128 198 231
    LABELOUTLINECOLOR none
    LABELBGCOLOR none
    ICON 66 24 box
    TARGET snmp:{map:community}:XXX.YY.28.253:1.3.6.1.4.1.9.9.171.1.2.2.1.8.1.13.49.52.51.46.57.54.46.50.52.46.49.57.51.1.10.50.48.50.46.55.46.52.48.46.57.500:-
    POSITION main_square 117 10

NODE vpn_uptime3
    LABEL {node:this:bandwidth_in:%3T}
    LABELFONT 109
    AICONFILLCOLOR 128 198 231
    LABELOUTLINECOLOR none
    LABELBGCOLOR none
    ICON 66 24 box
    TARGET snmp:{map:community}:XXX.YY.28.253:1.3.6.1.4.1.9.9.171.1.2.2.1.8.1.11.50.48.51.46.57.54.46.55.51.46.51.1.15.49.50.51.46.49.48.48.46.49.48.51.46.49.49.51.500:-
    POSITION main_square 117 34

The only clue I can give you is the attached screenshot from that time. So it may be related to a non-existent OID?

[attached screenshot: nwm_]

howardjones commented 7 years ago

Did you upgrade Weathermap also? From what previous version, if you did? And from which Cacti version?

Cacti 1.x now treats any PHP error as a reason to kill the plugin that produced it. So if you came from 0.8.8, it's possible the error existed before but was ignored.

howardjones commented 7 years ago

Looks like the immediate cause is that the host is not responding to a request. But it looks like there is some initialisation code missing there, which causes the actual error.

jpobeda commented 7 years ago

It was a git checkout from last week running on Cacti 1.1.7, so it's definitely a 1.x version.

I've been working around so many issues with Cacti 1.x that maybe it's been there from the very beginning.

This time it is not killing nwm, maybe because it only happened twice. I noticed Cacti does that when an error happens very often.

jpobeda commented 7 years ago

It's not a huge issue, but maybe it points to something. I'm pretty sure that at that time those tunnels were down, so the OID didn't exist when nwm tried to poll them.

Would it be a different error for a non-existent OID?

howardjones commented 7 years ago

Yes - definitely something to fix :-)

I'm not sure if it would be different. It looks like it's just checking whether it got a valid result, so I would guess a non-existent OID would trigger it too. But it should just complain that it got invalid data, like the rrd plugin would. The actual issue is that something should set the error count to zero for each host the first time it is seen, and that's missing. The same issue exists in all 3 SNMP datasource plugins.

The idea is that if more than a certain number of requests for a host fail, it stops trying that host for the rest of the run. I had a situation with a big chassis switch where, if the switch was down, 200+ SNMP requests would slowly time out and fail, making the poller overrun.
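
A minimal sketch of that per-host failure counter, written in Python rather than the plugin's PHP. The names `HostFailureTracker`, `poll_targets` and `fetch`, and the threshold of 3, are hypothetical and not taken from the plugin. The point is only that the counter is set to zero the first time a host is seen, so a failed request never touches a missing key:

```python
MAX_FAILURES = 3  # assumed threshold for illustration, not the plugin's value


class HostFailureTracker:
    """Counts failed requests per host during one poller run."""

    def __init__(self, max_failures=MAX_FAILURES):
        self.max_failures = max_failures
        self.failures = {}

    def _ensure(self, host):
        # Initialise the counter to zero the first time a host is seen.
        # This is the step described as missing above.
        self.failures.setdefault(host, 0)

    def should_skip(self, host):
        self._ensure(host)
        return self.failures[host] >= self.max_failures

    def record_failure(self, host):
        self._ensure(host)
        self.failures[host] += 1


def poll_targets(targets, fetch, tracker):
    """Poll (host, oid) pairs, giving up on a host after too many failures.

    `fetch` is a hypothetical callable returning a value, or None on
    timeout or invalid data.
    """
    results = {}
    for host, oid in targets:
        if tracker.should_skip(host):
            results[(host, oid)] = None  # do not wait for another timeout
            continue
        value = fetch(host, oid)
        if value is None:
            tracker.record_failure(host)
        results[(host, oid)] = value
    return results
```

With a host that is down, only the first `max_failures` requests wait for a timeout; the remaining targets for that host are skipped for the rest of the run.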

jpobeda commented 7 years ago

Oh yeah! That's definitely something to worry about hehehe. Don't forget that import option for snmpv2 pls <3