FR: walk OIDs in parallel

Cougar commented 6 years ago

Hi,

I have some boxes with a very large number of interfaces, and collecting all the data using a table OID takes minutes. I can split the table into separate entries in the 'walk:' list, but the walks still run one by one in sequence, so that doesn't make it any faster.

It would be great if (some of) the walks could be done in parallel.

I suggest adding a configuration option for the number of parallel walks, defaulting to 1 (the current behavior).
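For illustration, here is a minimal Python sketch of what a bounded "parallel walks" option could look like. It is not the exporter's code (snmp_exporter itself is written in Go), and the names `walk_all`, `max_parallel`, and `fake_walk` are hypothetical. The point is only that a limit of 1 degenerates to today's sequential behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical type: a walk returns a mapping of OID -> value.
WalkResult = Dict[str, str]


def walk_all(
    subtrees: List[str],
    walk_fn: Callable[[str], WalkResult],
    max_parallel: int = 1,
) -> Dict[str, WalkResult]:
    """Walk every subtree, running at most `max_parallel` walks at once.

    With max_parallel=1 the walks run one after another, which matches
    the sequential behavior described above.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves input order, so results line up with subtrees.
        results = list(pool.map(walk_fn, subtrees))
    return dict(zip(subtrees, results))


def fake_walk(subtree: str) -> WalkResult:
    """Stand-in for a real SNMP walk; returns two made-up rows."""
    return {f"{subtree}.1": "a", f"{subtree}.2": "b"}
```

Calling `walk_all(subtrees, fake_walk, max_parallel=4)` returns the same data as the default call, only with up to four walks in flight. Whether that helps depends on whether the device can actually answer concurrent requests.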

brian-brazil commented 6 years ago

Is your device capable of handling multiple walks at once? Most network devices are CPU-constrained and unable to do so. You should also check that the snmp exporter is close to the device network-wise.

SuperQ commented 6 years ago

If you have a large number of walks, you could run different Prometheus jobs for different walk modules. These will be executed in parallel.

In addition to what Brian said (SNMP is very latency sensitive; run the exporter within 10ms of the devices if possible), if you're using SNMP v1, it will be much slower than v2/v3.
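To make the latency point concrete, here is a rough back-of-the-envelope estimate in Python. It assumes each request waits for the previous reply and ignores device processing time, so it is only a floor on walk duration. The function name and the example numbers (10,000 OIDs, 25 OIDs per GETBULK reply) are illustrative assumptions, not measurements.

```python
import math


def estimated_walk_seconds(
    total_oids: int, oids_per_request: int, rtt_seconds: float
) -> float:
    """Rough floor on walk time for strictly sequential request/reply pairs."""
    if total_oids < 0 or oids_per_request < 1 or rtt_seconds < 0:
        raise ValueError("invalid arguments")
    # Each request returns at most `oids_per_request` OIDs.
    requests = math.ceil(total_oids / oids_per_request)
    return requests * rtt_seconds


# SNMP v1 has no GETBULK, so a walk fetches one OID per GETNEXT round trip.
v1_estimate = estimated_walk_seconds(10_000, 1, 0.010)
# v2c/v3 GETBULK can return many OIDs per round trip (25 is an assumed value).
bulk_estimate = estimated_walk_seconds(10_000, 25, 0.010)
```

With a 10ms round trip, the v1 case comes to about 100 seconds and the bulk case to about 4 seconds, which is why both the protocol version and the distance to the device matter so much.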

Cougar commented 6 years ago

There are different platforms, and some of them can already answer in parallel. For instance, one of them internally has 6 SNMP agents running at the same time. Splitting jobs is possible but takes a lot of additional walks for index lookups. I also checked, and it shouldn't be a big change in the code. If nobody takes it, I'll probably make a pull request for it some time later anyway ;-)

RichiH commented 3 years ago

Copying comment over from https://docs.google.com/document/d/1McJJIiJfHgoecVrVNXx4ABJmI5M21e-6O9IgMNbVnvw

Tarko Tikan, 2:32 PM Feb 28: See http://p.ip.fi/g0Ss.txt (this is an "old" CPU ISAM; newer-generation hardware scales better, but I do not have a box with large enough tables at hand right now). Each bulkwalk returns 720 OIDs.

smiller-dn commented 1 year ago

FWIW, while support for gNMI is maybe not super common, if you can use that instead of SNMP, that might work better overall -- it's a subscription model, so the router/switch/whatever can say "I know you want all this data and I will shove it across this TCP connection as fast as the TCP window and the speed of light in glass will let me." 😄 Which ought to involve many fewer round trips than we see with SNMP.

I believe that at least some Cisco and Juniper gear supports it, and I think there's Arista support as well (but I'm less sure, as I don't have any Aristas).

The code is in openconfig/gnmic and there are docs here as well.

SuperQ commented 1 year ago

We now support scraping multiple modules in parallel. So if you can break down the walks you want done in parallel into multiple modules, this is now supported.

Cougar commented 1 year ago

Do multiple modules share data too? Can we use this data for label lookups or is it necessary to walk those OIDs in every module?

In my case, with hundreds of interfaces that have a fixed ifIndex for each ifName, I saved time with a very straightforward patch that lets me put known values in the config file and avoid SNMP walks of these OIDs. This was a much easier change than parallel collection and is helpful enough in my case for now. I described this feature in #825.
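The idea can be sketched in Python roughly as follows. This is not the patch from #825. The function `resolve_if_names`, its parameters, and the fallback to the numeric index are all hypothetical, chosen only to show how a static ifIndex-to-ifName map can skip the lookup walk entirely when every index is already known.

```python
from typing import Callable, Dict, Iterable, Optional


def resolve_if_names(
    if_indexes: Iterable[int],
    static_names: Optional[Dict[int, str]],
    walk_if_name: Callable[[], Dict[int, str]],
) -> Dict[int, str]:
    """Map ifIndex -> ifName, walking the device only when needed.

    `static_names` holds values known in advance (e.g. from a config file).
    `walk_if_name` is a stand-in for the real ifName lookup walk.
    """
    indexes = list(if_indexes)
    known = static_names or {}
    missing = [i for i in indexes if i not in known]
    # Skip the walk completely when the static map covers every index.
    walked = walk_if_name() if missing else {}
    # Assumption: fall back to the numeric index if no name is found.
    return {i: known.get(i, walked.get(i, str(i))) for i in indexes}
```

When the static map is complete, `walk_if_name` is never called, so the cost of the label lookup drops to zero round trips.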

SuperQ commented 1 year ago

No, when scraping multiple modules they currently do not share data.

Static values are an interesting idea, although I think a better long-term solution would be to implement some kind of SNMP cache. This has been requested before as a way to help with HA Prometheus configurations.

See: https://github.com/prometheus/snmp_exporter/issues/744
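As a rough illustration of what "some kind of SNMP cache" might mean, here is a minimal time-to-live cache sketch in Python. It is not a design taken from the linked issue. The class `WalkCache`, its `get` method, and the `walk_fn(target, subtree)` signature are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple

WalkResult = Dict[str, str]


class WalkCache:
    """Cache walk results per (target, subtree) for a fixed number of seconds."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._entries: Dict[Tuple[str, str], Tuple[float, WalkResult]] = {}

    def get(
        self,
        target: str,
        subtree: str,
        walk_fn: Callable[[str, str], WalkResult],
    ) -> WalkResult:
        key = (target, subtree)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no SNMP traffic needed
        result = walk_fn(target, subtree)  # expired or missing: walk again
        self._entries[key] = (now, result)
        return result
```

With something like this in front of the walks, two Prometheus servers scraping the same target within the TTL would trigger only one walk on the device, and rarely changing lookup tables could be given a longer TTL than counters.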

prometheus / snmp_exporter

FR: walk OIDs in parallel #349