vincentbernat / snimpy

interactive SNMP tool with Python
http://snimpy.readthedocs.org/

MIB caching #2

Open vincentbernat opened 13 years ago

vincentbernat commented 13 years ago

For some scripts, MIB loading can take some time. It would be great to be able to enable a cache for MIBs. This would also make it possible to run scripts without the corresponding MIB.

leth commented 9 years ago

MIB caching would be very useful. I've written some snimpy-based check commands for nagios/icinga, and I have a hunch that they're spending all their CPU time just parsing MIB files.

leth commented 9 years ago

How do you feel about using an sqlite db in /tmp/ as a cache, storing pickled objects?

If that doesn't sound good, do you have any other suggestions?

leth commented 9 years ago

Okay, scratch that: you can't pickle CData objects. Any ideas?

vincentbernat commented 9 years ago

I don't see any other option than implementing a serializer/deserializer. That shouldn't be too hard, but I can't find the time to do it. As for storage, I was thinking of putting the cache right into pyc files. But we could just give a cache file to the manager instead of trying something too complex.
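Since the CData handles themselves can't be pickled, one possible shape for such a serializer is to reduce each MIB node to plain Python primitives before storage and rebuild from them on load. This is only a sketch: the field names and the `FakeNode` class are illustrative, not snimpy's actual node API.

```python
import json

# Hypothetical sketch: flatten a MIB node into plain primitives that any
# backend (json, pickle, sqlite) can store, instead of pickling the
# underlying CData handle. Field names are illustrative only.
def dump_node(node):
    """Reduce a node-like object to a plain dict."""
    return {
        "name": node.name,
        "oid": list(node.oid),
        "type": node.type,
    }

def load_node(data, node_factory):
    """Rebuild a node from its flattened form via a caller-supplied factory."""
    return node_factory(
        name=data["name"],
        oid=tuple(data["oid"]),
        type=data["type"],
    )

class FakeNode:
    """Stand-in for a real parsed MIB node."""
    def __init__(self, name, oid, type):
        self.name, self.oid, self.type = name, oid, type

n = FakeNode("ifDescr", (1, 3, 6, 1, 2, 1, 2, 2, 1, 2), "DisplayString")
blob = json.dumps(dump_node(n))          # safe to write to any cache file
restored = load_node(json.loads(blob), FakeNode)
```

The point of the round-trip through primitives is that the cache file never contains live C-level state, only data that can be re-materialized on the next run.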

leth commented 9 years ago

I hacked up a proof-of-concept using a giant picklefile as a datastore: https://gist.github.com/leth/048e27a6569c801006b4
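The load-or-build shape of such a single-pickle-file datastore might look like the following. This is not the gist itself, just a minimal illustration; `build_mib_data` stands in for the expensive real MIB parsing.

```python
import os
import pickle
import tempfile

# Minimal load-or-build cache sketch: keep one big pickle file and fall
# back to the expensive build only on a miss. build_mib_data is a
# hypothetical callable standing in for real MIB parsing.
def cached_load(cache_path, build_mib_data):
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)        # cache hit: skip parsing entirely
    data = build_mib_data()              # cache miss: do the slow work once
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data

calls = []
def build():
    calls.append(1)                      # count how often we really parse
    return {"IF-MIB": ["ifDescr", "ifSpeed"]}

path = os.path.join(tempfile.mkdtemp(), "mibcache.pickle")
first = cached_load(path, build)         # miss: builds and stores
second = cached_load(path, build)        # hit: reads the pickle instead
```

With this pattern the parse cost is paid by the first script invocation only, which is exactly what short-lived monitoring checks need.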

Things that still need improving:

vincentbernat commented 9 years ago

That seems pretty interesting. Feel free to turn that into a PR (with an option to pass to the manager so that caching is optional, at least first).

leth commented 9 years ago

So, architecturally, how should this look? Currently I have a new Manager subclass and a new load function living in snimpy.cache, so you have to choose to use a manager with caching functionality.

It sounds like you're suggesting changing the existing manager class to pass in a cache store object, or control flag.

Which approach would you prefer?

The changes to the load function (to delay loading the MIB until a cache miss) make the latter approach a bit more tricky.

vincentbernat commented 9 years ago

You are right. Since MIB loading is totally independent of the manager, it seems tricky to ask the manager whether to use a cache or not.

After reading more carefully, I think there are some other drawbacks to caching at this level. If, for example, you load A-MIB from the cache, then load B-MIB not from the cache, and B-MIB requires A-MIB, you won't be able to load it. This could be solved with an additional argument to load to avoid using the cache. Also, you may end up with a module partially in the cache. When trying to access something not in the cache, snimpy will try to iterate over the loaded MIBs to get the appropriate type but will fail because the MIB is not really loaded.

Maybe the bottleneck is not libsmi but snimpy using it inefficiently. Do you have an example of a script where loading is slow and which uses public MIB modules? We could check where most of the time is spent.
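Checking where the time goes can be done with the standard-library profiler. A sketch, with `work()` as a stand-in for the slow script's real `load(...)` calls:

```python
import cProfile
import io
import pstats

# Profile a slow-loading snippet and capture the top cumulative-time
# entries. Replace the body of work() with the real load(...) calls from
# the slow script; this stand-in just burns some CPU.
def work():
    sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats(10)                    # ten most expensive call sites
report = out.getvalue()
```

If libsmi's parsing functions dominate the report, caching is the right fix; if snimpy's own iteration over nodes dominates, the fix is on the snimpy side.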

A totally different solution would be to use checkpointing. The process could freeze itself and resume instead of restarting from scratch. Your scripts would be a loop with checkpointing. There are some solutions like CRIU, but nothing that could be used transparently. This seems a bit far-fetched.

leth commented 9 years ago

Thanks for taking a closer look!

After reading more carefully, I think there are some other drawbacks to caching at this level. If, for example, you load A-MIB from the cache, then load B-MIB not from the cache, and B-MIB requires A-MIB, you won't be able to load it. This could be solved with an additional argument to load to avoid using the cache.

Perhaps instead, if we have a cache miss, we have to go back and load the real mibs for those we had already loaded from the cache (A-MIB), and update our existing objects to point to them, before we load B-MIB. The weakref module will allow us to keep a reference to the objects.
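That weakref idea can be sketched with a `WeakValueDictionary`: the cache layer remembers every object it has handed out without keeping it alive, so a later real load can find the survivors and re-point them. The `Node` class and function names here are illustrative, not snimpy's API.

```python
import weakref

# Sketch of the weakref idea: track handed-out cached objects weakly so
# that a cache miss for a dependent MIB can upgrade them in place.
class Node:
    def __init__(self, name, source):
        self.name = name
        self.source = source             # "cache" or "libsmi"

handed_out = weakref.WeakValueDictionary()

def get_from_cache(name):
    node = Node(name, "cache")
    handed_out[name] = node              # weak: dies with the caller's copy
    return node

def promote_to_real(name):
    """On a cache miss for a dependent MIB, upgrade any live cached node."""
    node = handed_out.get(name)
    if node is not None:
        node.source = "libsmi"           # re-point at the really loaded MIB

a = get_from_cache("A-MIB::someNode")    # script holds a cached node
promote_to_real("A-MIB::someNode")       # B-MIB miss forces a real load
```

Because the dictionary holds only weak references, nodes the script has already dropped don't need upgrading and don't leak.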

Also, you may end up with a module partially in the cache. When trying to access something not in the cache, snimpy will try to iterate over the loaded MIBs to get the appropriate type but will fail because the MIB is not really loaded.

Yes, we could either ensure the whole module is cached, or instead treat it like a cache miss above. I guess the former is probably best, because it gets all the cache building work done once.

Maybe the bottleneck is not libsmi but snimpy using it inefficiently. Do you have an example of a script where loading is slow and which uses public MIB modules? We could check where most of the time is spent.

I do need to do some benchmarking, I'll get back to you on this.

A totally different solution would be to use checkpointing. The process could freeze itself and resume instead of restarting from scratch. Your scripts would be a loop with checkpointing. There are some solutions like CRIU, but nothing that could be used transparently. This seems a bit far-fetched.

I'm afraid that's not an option; each script is started afresh by a monitoring daemon, and expected to return some data, then exit.

cbueche commented 9 years ago

Well, sorry in advance for hijacking this thread, but maybe you are using the wrong approach. If a monitoring daemon forks Python, loads your script, loads snimpy, libsmi, the MIBs, etc., you will most probably not scale, and caching MIBs will only solve part of the problem.

IMHO, a better approach would be to create a web service that loads snimpy and all the relevant stuff in its init phase, waits for your daemon to make requests over REST, and translates them into SNMP requests against your devices.
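A toy version of that architecture, using only the standard library: pay the loading cost once at startup, then serve lookups from memory. The in-memory `MIB_TABLE` stands in for real snimpy/libsmi state, and the endpoint layout is made up for the example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy sketch of the web-service idea: load once, serve many requests.
MIB_TABLE = {"ifDescr": "1.3.6.1.2.1.2.2.1.2"}   # "parsed" once at startup

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        name = self.path.lstrip("/")             # e.g. GET /ifDescr
        oid = MIB_TABLE.get(name)
        body = json.dumps({"name": name, "oid": oid}).encode()
        self.send_response(200 if oid else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):                # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/ifDescr" % server.server_port
reply = json.loads(urllib.request.urlopen(url).read())
server.shutdown()
```

The monitoring checks then become thin HTTP clients, so each fork no longer re-parses anything.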

This is what I did in Agent-Jones. See https://github.com/cbueche/Agent-Jones

leth commented 9 years ago

Yes, that's another option; if this doesn't pan out I'll bear it in mind. I'll see how far the caching gets me, and benchmark to see what's going on.

I'm using icinga, which (like nagios) makes running executables as checks easy. IIRC, adding a new check mechanism (e.g. a fork-less, API-call-based check) is possible, but that's a whole new kettle of fish for me. A halfway-house alternative of a check executable which calls an API is another option.

vincentbernat commented 9 years ago

I propose that:

  1. You provide an example of what you consider slow so that we can check whether the bottleneck is libsmi parsing itself or just snimpy, which may be doing things inefficiently.
  2. We improve snimpy, either by solving the eventual bottleneck or by implementing caching. In the latter case, I would prefer that caching be contained in the mib module. The workarounds you described for some cases seem good. Caching could be enabled/disabled with a function call in the mib module.

leth commented 9 years ago

Great, thanks! I'll update when I get some time to investigate further.