This is very interesting! We have experienced similarly high CPU usage when polling a lot of devices with SNMPv3. We never dug into it; we simply assumed the encryption was the problem. If this change can be safely applied to the library, I think it would be a huge improvement for v3.
Hi @sim500 - thanks for reporting this, and for including a fix.
I've implemented the cache - pretty much as you've outlined, with some minor stylistic variations. The difference is marked: the performance of noAuthNoPriv, authNoPriv and authPriv is now comparable, instead of authNoPriv and authPriv being an order of magnitude slower than noAuthNoPriv.
On my local testing environment, an entire MIB walk of my agent took approximately these times before the cache was introduced (authentication always SHA1, privacy always 128-bit AES):
After implementing the cache (all other things remaining the same), these became:
Which makes this the most exciting enhancement for node-net-snmp for some time! :-)
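For anyone who wants to reproduce the comparison, a full walk can be timed along these lines. This is only a sketch: the agent address and credentials are placeholders, and the calls are from the net-snmp npm's documented API:

```js
const snmp = require("net-snmp");

// Placeholder agent address and credentials.
const user = {
    name: "walker",
    level: snmp.SecurityLevel.authPriv,
    authProtocol: snmp.AuthProtocols.sha,
    authKey: "authPassword",
    privProtocol: snmp.PrivProtocols.aes,
    privKey: "privPassword"
};
const session = snmp.createV3Session("192.0.2.1", user);

console.time("mib walk");
session.walk("1.3.6.1", 20, function (varbinds) {
    // Consume varbinds; returning true here would abort the walk.
}, function (error) {
    console.timeEnd("mib walk");
    session.close();
});
```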
I don't feel the security posture is vastly different with the introduction of the cache into memory, as the source variables used to produce the cached values are already held in memory anyway.
Have deployed the fix to master now, and released it in version 3.6.0 of the npm.
Thanks again! :-)
I'm very glad our fix helped. It was a revelation when my colleague BlackTazz89 found this solution; he's the one who deserves the credit. And thank you all for having made this powerful library and for keeping it constantly updated.
Doing multiple SNMPv3 GETs at the same time overloads the CPU and probably blocks the event loop. Comparing the same operation using v2, the difference in per-request processing time and CPU load is very noticeable: v2 is much faster than v3, and its computational load is more evenly distributed. A minimal reproduction of the parallel-GET scenario is sketched below.
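This sketch uses a placeholder target, placeholder credentials and an arbitrary request count; the session calls are from the net-snmp npm's documented API:

```js
const snmp = require("net-snmp");

// Placeholder target and credentials.
const user = {
    name: "monitor",
    level: snmp.SecurityLevel.authPriv,
    authProtocol: snmp.AuthProtocols.sha,
    authKey: "authPassword",
    privProtocol: snmp.PrivProtocols.aes,
    privKey: "privPassword"
};
const oids = ["1.3.6.1.2.1.1.3.0"]; // sysUpTime.0

// Fire a few hundred GETs at once. Before the fix, each request paid
// the full password-to-key digest cost, saturating the CPU under v3.
const total = 300;
let pending = total;
console.time("parallel v3 gets");
for (let i = 0; i < total; i++) {
    const session = snmp.createV3Session("192.0.2.1", user);
    session.get(oids, function (error, varbinds) {
        session.close();
        if (--pending === 0) {
            console.timeEnd("parallel v3 gets");
        }
    });
}
```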
Profiler outcomes (screenshots, not reproduced here):
V2: [profiler screenshot]
V3: [profiler screenshot]
We thought that the problem was the heavy computation for encryption. We needed to process a massive number of SNMPv3 requests in a short time, so we tried to overcome the issue with a very powerful machine (20 hardware threads, with AES-NI hardware acceleration), spawning several processes. But every CPU was suffering anyway, and per-request processing time didn't change much.

After further analysis, we noticed that the problem wasn't AES but SHA1: a SHA digest has to be computed for every single GET request, which adds a ton of workload for hundreds of parallel GET requests. The function that computes this digest is `Authentication.passwordToKey`. This function is logically static: its output depends only on the authProtocol, the authPasswordString and the engineID (unique for every single device). Same input, same output, so we cached the results of this function, and the outcome of this workaround was very effective: no more CPU overload, way faster requests, and the event loop no longer appeared to be stuck.
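For illustration, here is a stripped-down sketch of the caching idea. The key localization follows RFC 3414, but the helper names and cache layout are ours for this example, not the library's actual patch:

```js
const crypto = require("crypto");

// Cache of localized authentication keys, keyed on
// (authProtocol, password, engineID). The digest is deterministic
// for a given triple, so a hit can be returned without rehashing.
const keyCache = new Map();

// Simplified RFC 3414 password-to-key localization (MD5/SHA-1 only),
// standing in for node-net-snmp's Authentication.passwordToKey.
function passwordToKey(authProtocol, password, engineID) {
    const algorithm = authProtocol === "md5" ? "md5" : "sha1";
    // Expand the password to 1 MiB by repetition, then digest it (Ku).
    const pw = Buffer.from(password);
    const repeats = Math.ceil(1048576 / pw.length);
    const expanded = Buffer.concat(Array(repeats).fill(pw)).subarray(0, 1048576);
    const ku = crypto.createHash(algorithm).update(expanded).digest();
    // Localize the key to the engine: Kul = H(Ku || engineID || Ku).
    return crypto.createHash(algorithm)
        .update(Buffer.concat([ku, engineID, ku]))
        .digest();
}

function cachedPasswordToKey(authProtocol, password, engineID) {
    const cacheKey = authProtocol + "|" + password + "|" + engineID.toString("hex");
    let key = keyCache.get(cacheKey);
    if (key === undefined) {
        key = passwordToKey(authProtocol, password, engineID);
        keyCache.set(cacheKey, key);
    }
    return key;
}
```

The expensive step is digesting the 1 MiB expanded password, which is why hundreds of parallel requests pegged the CPU even with AES-NI: the bottleneck was SHA1 hashing, not AES.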
I can't really tell whether caching the SHA digests is a secure procedure. But there is a clear performance problem if the application has to compute a digest for hundreds of parallel requests, even against a single device.