Closed by chrcoluk 4 years ago
The timeout value passed to this routine is the standard Cacti SNMP timeout value; it is then passed back to Cacti itself to fetch the OIDs. So my only theory right now is that, if there is a problem with the high value, the retry count combined with the timeout is making your poller run for too long.
But that is a core Cacti issue and more to do with the configuration than with this script. Also, having such high timeouts on devices means you should really be using a long polling cycle (5 minutes) to accommodate them.
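For a rough sense of scale, here is a sketch of the worst-case wait for one unresponsive SNMP request under those settings (the retry count of 3 is an assumption for illustration, not a value from this thread):

```python
# Worst case for one unresponsive request: the initial attempt plus each
# retry waits out the full timeout before giving up.
# The retry count of 3 below is an assumed example, not from this issue.
def worst_case_wait_ms(timeout_ms: int, retries: int) -> int:
    return timeout_ms * (1 + retries)

print(worst_case_wait_ms(3000, 3))  # 12000 ms = 12 s per dead request
print(worst_case_wait_ms(500, 3))   #  2000 ms =  2 s per dead request
```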
The poll usually completes in 6 seconds, nowhere near the 300-second period.
With the high value set, when I tested from the CLI the 0s came back very quickly (under 100 ms), no slower than when it succeeds. My theory is that the code is hitting some kind of overflow somewhere when the high value is set.
The reason for the 3000 is probably that I had temperamental network conditions in the past, so I ended up using 3000 in my device template.
That is likely a library issue and not something that we can prevent. Depending on the system it could be an overflow, because I'm pretty sure the value gets multiplied by 1000 again somewhere, which may seem wrong but is correct for the library.
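A minimal sketch of that overflow theory, assuming the timeout gets converted from milliseconds to microseconds twice and the result lands in a signed 32-bit field (the wrap-around helper below is illustrative; the actual library internals aren't shown in this thread):

```python
# Simulate storing a value in a signed 32-bit (two's complement) integer,
# the way a C library timeout field would.
def to_int32(value: int) -> int:
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value

for timeout_ms in (500, 2100, 2200, 3000):
    timeout_us = timeout_ms * 1000 * 1000  # ms -> us conversion applied twice
    print(f"{timeout_ms:>5} ms -> {timeout_us:>13,} -> int32 {to_int32(timeout_us):>14,}")
```

The cut-off falls at 2**31 - 1 = 2,147,483,647 µs, roughly 2147 ms after the double conversion: 2100 ms still fits, while 2200 ms and 3000 ms wrap to negative values, which matches the boundary reported below. A negative timeout failing instantly would also explain the 0s coming back in under 100 ms.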
This appears to have been fixed by PR #8; I hadn't noticed that we were multiplying a timeout that had already been multiplied.
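For illustration, the shape of a double-conversion bug like the one described (variable names are mine; this is not the actual code from PR #8):

```python
timeout_ms = 3000                    # value Cacti hands to the script
buggy_us = timeout_ms * 1000 * 1000  # converted twice: 3,000,000,000 (overflows int32)
fixed_us = timeout_ms * 1000         # converted once:      3,000,000 (fits comfortably)
```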
I noticed that, of the 3 devices I added this graph to, 2 reported being unable to read OID values.
Yet other SNMP graphs worked, and the CLI test command also worked.
I enabled debug logging and looked at the syntax being used.
The 2 broken devices had a 3000 (3-second) SNMP timeout; the working device had 500.
I tested different values, increasing by 100 at a time: at a timeout of 2100 it works, while at 2200 or higher all values report 0.
I changed the timeout for the affected devices to 2000 and now it works on all 3 devices.
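As a cross-check on that threshold (my arithmetic, not from the thread): the largest millisecond timeout that survives a double ms-to-µs conversion into a signed 32-bit field is:

```python
INT32_MAX = 2**31 - 1
print(INT32_MAX // (1000 * 1000))  # 2147 -- consistent with 2100 OK, 2200 broken
```

which also lines up with 2000 being a safe setting.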