vincentbernat opened this issue 13 years ago.
MIB caching would be very useful. I've written some snimpy-based check commands for Nagios/Icinga, but I have a hunch that they spend most of their CPU time just parsing MIB files.
How do you feel about using an SQLite DB in /tmp/ as a cache, storing pickled objects?
If that doesn't sound good, do you have any other suggestions?
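A minimal sketch of what that store could look like, assuming one row per MIB module with the pickled data in a BLOB column. `MibCache`, `default_cache_path` and the table layout are hypothetical (not part of snimpy), and whatever gets stored must already be plain, picklable Python data.

```python
import os
import pickle
import sqlite3
import tempfile


def default_cache_path():
    # Hypothetical location: a single SQLite file in the temp dir (e.g. /tmp/).
    return os.path.join(tempfile.gettempdir(), "snimpy-mib-cache.sqlite")


class MibCache:
    """Hypothetical SQLite-backed cache mapping a MIB name to a pickled payload."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS mibs (name TEXT PRIMARY KEY, data BLOB)"
        )

    def get(self, name):
        row = self.conn.execute(
            "SELECT data FROM mibs WHERE name = ?", (name,)
        ).fetchone()
        # None signals a cache miss.
        return None if row is None else pickle.loads(row[0])

    def put(self, name, obj):
        # "with conn" commits the transaction if the insert succeeds.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO mibs (name, data) VALUES (?, ?)",
                (name, pickle.dumps(obj)),
            )

    def close(self):
        self.conn.close()
```

One caveat with this design: a world-writable directory like /tmp/ is a risky home for pickles, because unpickling a file planted by another user can execute arbitrary code. A per-user cache directory would be safer.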
Okay, scratch that: you can't pickle CData objects. Any ideas?
I don't see any other option than implementing a serializer/deserializer. That shouldn't be too hard, but I haven't found the time to do it. As for storage, I was thinking of putting the cache right into the pyc files. But we could just give a cache file to the manager instead of trying something too complex.
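Since CData objects can't be pickled, a serializer could copy the fields a script actually needs out of each C-backed object into plain Python values, and a single cache file given to the manager could hold the result. This is a sketch under assumptions: `NodeInfo`, its four fields and the JSON file layout are hypothetical; libsmi nodes carry more information than this, and snimpy's internals may expose it differently.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical plain-Python snapshot of a MIB node (no C pointers)."""
    module: str
    name: str
    oid: tuple
    type_name: str


def serialize(node):
    # Read only plain attributes off the C-backed object; nothing that
    # holds a pointer is kept, so the result is safe to store.
    return NodeInfo(
        module=str(node.module),
        name=str(node.name),
        oid=tuple(int(x) for x in node.oid),
        type_name=str(node.type_name),
    )


def save_cache(path, nodes):
    # One cache file for all nodes, as suggested above.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([asdict(n) for n in nodes], fh)


def load_cache(path):
    with open(path, encoding="utf-8") as fh:
        # JSON turns tuples into lists, so convert the OID back.
        return [NodeInfo(**{**d, "oid": tuple(d["oid"])}) for d in json.load(fh)]
```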
I hacked up a proof of concept using a giant pickle file as a datastore: https://gist.github.com/leth/048e27a6569c801006b4
Things that need improving still:
That seems pretty interesting. Feel free to turn that into a PR (with an option to pass to the manager so that caching is optional, at least at first).
So, architecturally, how should this look?
Currently I have a new Manager subclass and a new load function living in snimpy.cache, so you have to choose to use a manager with caching functionality.
It sounds like you're suggesting changing the existing manager class to accept a cache store object or a control flag.
Which approach would you prefer?
The changes to the load function (delaying loading the MIB until a cache miss) make the latter approach a bit more tricky.
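Roughly, the cache-aware load path would consult the cache first and only fall back to the real loader on a miss. `cached_load` is hypothetical; `real_load` stands in for whatever actually parses the MIB through libsmi (for example a wrapper around `snimpy.manager.load` that returns plain data), and `cache` is assumed to offer `get(name)` returning `None` on a miss plus `put(name, data)`.

```python
def cached_load(name, cache, real_load):
    """Hypothetical wrapper: return cached MIB data, parsing only on a miss.

    cache     -- object with get(name) -> data or None, and put(name, data)
    real_load -- callable that actually parses the MIB and returns plain data
    """
    data = cache.get(name)
    if data is not None:
        return data  # cache hit: the expensive parse is skipped entirely
    data = real_load(name)  # cache miss: do the expensive parse once
    cache.put(name, data)
    return data
```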
You are right. As MIB loading is totally independent of the manager, it seems tricky to ask the manager whether or not to use a cache.
After reading more carefully, I think there are some other drawbacks to caching at this level. For example, if you load A-MIB from the cache, then load B-MIB not from the cache, and B-MIB requires A-MIB, you won't be able to load it. This could be solved with an additional argument to load to avoid using the cache. Also, you may end up with a module only partially in the cache. When trying to access something not in the cache, snimpy will iterate over the loaded MIBs to find the appropriate type, but will fail because the MIB is not really loaded.
Maybe the bottleneck is not libsmi, but snimpy using it inefficiently. Do you have an example script that uses public MIB modules and where loading is slow? We could check where most of the time is spent.
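To see where the time goes, one option is to run the slow load under the standard library's cProfile and print the top entries by cumulative time. `profile_call` is a hypothetical helper; with snimpy installed it could be pointed at `snimpy.manager.load` and a public module such as IF-MIB.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=15, sort="cumulative"):
    """Run func(*args) under cProfile; return (result, profiling report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so slow call chains (e.g. MIB parsing) surface first.
    pstats.Stats(profiler, stream=out).sort_stats(sort).print_stats(limit)
    return result, out.getvalue()
```

For example, after `from snimpy.manager import load`, run `_, report = profile_call(load, "IF-MIB")` and `print(report)`. Profiling a whole check script from the shell with `python -m cProfile -s cumulative script.py` works too.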
A totally different solution would be checkpointing: the process could freeze itself and resume instead of restarting from scratch. Your script would then be a loop with checkpointing. There are some solutions like CRIU, but nothing that could be used transparently. This seems a bit far-fetched.
Thanks for taking a closer look!
> After reading more carefully, I think there are some other drawbacks to caching at this level. For example, if you load A-MIB from the cache, then load B-MIB not from the cache, and B-MIB requires A-MIB, you won't be able to load it. This could be solved with an additional argument to load to avoid using the cache.
Perhaps instead, on a cache miss, we could go back and load the real MIBs for the modules we had already loaded from the cache (A-MIB), and update our existing objects to point to them, before loading B-MIB.
The weakref module would let us keep track of those objects (without keeping them alive) so we can update them.
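A rough sketch of that fallback, assuming the objects handed to callers are small handles we control. `MibHandle` and `FallbackLoader` are hypothetical; a `weakref.WeakSet` tracks every live handle without keeping it alive, and on a miss the loader first re-parses (via the assumed `real_load` callable) each module that was only served from the cache, repointing the live handles before loading the new module.

```python
import weakref


class MibHandle:
    """Hypothetical object returned to callers; 'source' says where data came from."""

    def __init__(self, module, data, source):
        self.module = module
        self.data = data
        self.source = source  # "cache" or "real"


class FallbackLoader:
    def __init__(self, cache, real_load):
        self.cache = cache          # dict-like: get(name) / put(name, data)
        self.real_load = real_load  # callable doing the expensive libsmi parse
        # Weak references: tracking a handle does not keep it alive.
        self.handles = weakref.WeakSet()

    def load(self, name):
        data = self.cache.get(name)
        if data is not None:
            handle = MibHandle(name, data, "cache")
        else:
            # Cache miss: the new MIB may import modules we only have as
            # cached data, so load those for real first and repoint handles.
            self._promote_cached_modules()
            data = self.real_load(name)
            self.cache.put(name, data)
            handle = MibHandle(name, data, "real")
        self.handles.add(handle)
        return handle

    def _promote_cached_modules(self):
        reloaded = {}
        for handle in list(self.handles):
            if handle.source == "cache":
                if handle.module not in reloaded:
                    reloaded[handle.module] = self.real_load(handle.module)
                handle.data = reloaded[handle.module]
                handle.source = "real"
```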
> Also, you may end up with a module only partially in the cache. When trying to access something not in the cache, snimpy will iterate over the loaded MIBs to find the appropriate type, but will fail because the MIB is not really loaded.
Yes, we could either ensure the whole module is cached, or instead treat it like the cache miss above. I guess the former is probably best, because it gets all the cache-building work done in one go.
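If the whole module is cached in one go, a cache entry is either complete or absent, which makes the partial-module case easy to detect. A sketch with hypothetical names (`build_module_entry`, `lookup`, `CacheMiss`): an incomplete or missing entry raises `CacheMiss` so the caller can fall back to libsmi, while a symbol missing from a complete entry is a genuine `KeyError`.

```python
class CacheMiss(Exception):
    """Hypothetical: the cache cannot answer; the caller should load via libsmi."""


def build_module_entry(module, all_nodes):
    # all_nodes must cover the *entire* module, so later lookups never
    # hit a partially cached module.
    return {"module": module, "complete": True, "nodes": dict(all_nodes)}


def lookup(entry, symbol):
    if entry is None or not entry.get("complete"):
        raise CacheMiss(symbol)
    # The entry holds the whole module, so a missing symbol really does not
    # exist in that module: a KeyError, not a reason to reload.
    return entry["nodes"][symbol]
```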
> Maybe the bottleneck is not libsmi, but snimpy using it inefficiently. Do you have an example script that uses public MIB modules and where loading is slow? We could check where most of the time is spent.
I do need to do some benchmarking; I'll get back to you on this.
> A totally different solution would be checkpointing: the process could freeze itself and resume instead of restarting from scratch. Your script would then be a loop with checkpointing. There are some solutions like CRIU, but nothing that could be used transparently. This seems a bit far-fetched.
I'm afraid that's not an option; each script is started afresh by a monitoring daemon, and expected to return some data, then exit.
Well, sorry in advance for hijacking this thread, but maybe you are using the wrong approach. If a monitoring daemon forks Python, which then loads your script, snimpy, libsmi, the MIBs, etc. for every check, you will most probably not scale, and caching MIBs will only solve part of the problem.
IMHO, a better approach would be to create a web service that loads snimpy and all the relevant stuff during its init phase, then waits for your daemon to make REST requests and translates them into queries against your SNMP devices.
This is what I did in Agent-Jones. See https://github.com/cbueche/Agent-Jones
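To illustrate the general shape of that approach (this is not Agent-Jones's code or API), here is a minimal standard-library sketch: the expensive setup runs once when the server starts, and each request is answered from the already-loaded state. `load_everything`, the `/lookup?name=...` endpoint and port 8080 are hypothetical, and the hard-coded OID table stands in for snimpy, its MIBs and real SNMP queries.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def load_everything():
    # Hypothetical stand-in for the expensive init phase (import snimpy,
    # load MIBs, ...). It runs once per server process, not once per check.
    return {"ifIndex": "1.3.6.1.2.1.2.2.1.1", "ifDescr": "1.3.6.1.2.1.2.2.1.2"}


def make_server(host="127.0.0.1", port=8080):
    state = load_everything()

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            name = parse_qs(url.query).get("name", [""])[0]
            if url.path != "/lookup" or name not in state:
                self.send_error(404)
                return
            body = json.dumps({"name": name, "oid": state[name]}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the example quiet

    return ThreadingHTTPServer((host, port), Handler)
```

Run it with `make_server().serve_forever()`; a check can then query `http://127.0.0.1:8080/lookup?name=ifDescr` without paying the startup cost each time.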
Yes, that's another option; if this doesn't pan out, I'll bear it in mind. I'll see how far the caching gets me, and benchmark to see what's going on.
I'm using Icinga, which (like Nagios) makes running executables as checks easy. IIRC, adding a new check mechanism (e.g. a fork-less, API-call-based check) is possible, but a whole new kettle of fish for me. A halfway-house alternative is a check executable that calls the API.
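The halfway house could look like this: a short-lived check executable that follows the usual Nagios/Icinga plugin convention (one status line on stdout; exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) while a long-running service does the SNMP work. The service URL, its `{"value": ...}` JSON response and the higher-is-worse thresholds are assumptions for illustration.

```python
import json
import urllib.request

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(value, warn, crit):
    """Map a numeric reading to (exit code, status line); higher is worse."""
    if value >= crit:
        return CRITICAL, f"CRITICAL - value {value}"
    if value >= warn:
        return WARNING, f"WARNING - value {value}"
    return OK, f"OK - value {value}"


def main(url, warn, crit):
    # The check stays a short-lived executable, but the heavy lifting
    # (snimpy, MIBs, SNMP) happens in the long-running service at `url`.
    # Hypothetical response format: {"value": <number>}.
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            value = float(json.load(resp)["value"])
    except (OSError, KeyError, TypeError, ValueError) as exc:
        print(f"UNKNOWN - {exc}")
        return UNKNOWN
    code, line = evaluate(value, warn, crit)
    print(line)
    return code
```

The actual check script would end with `sys.exit(main(url, warn, crit))` so the monitoring daemon sees the exit code.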
I propose that:
Great, thanks! I'll post an update when I get some time to investigate further.
For some scripts, MIB loading can take some time. It would be great to be able to enable a cache for MIBs. This would also make it possible to run scripts without the corresponding MIB.