prometheus / snmp_exporter

SNMP Exporter for Prometheus
Apache License 2.0
1.65k stars 614 forks source link

Allow configuring SNMP version and credentials through parameters #762

Open daenney opened 2 years ago

daenney commented 2 years ago

With the current design, the SNMP version to use, the community string and auth credentials (for v3) are part of the module that the generator creates. If my understanding of how this works is correct, it implies that if you want to scrape 2 devices but they have a different SNMP community (or one that has been reconfigured away from the 'public' default), you end up needing to add a new module.

This results in a ton of duplication in snmp.yml, where the only difference is the version field or some keys of the auth dictionary. For example a couple of my D-Link switches are perfectly happy to be queried using if_mib, but only support SNMPv1.

This would add a bunch of parameters on https://github.com/prometheus/snmp_exporter/blob/8678b6022c439074555f958fdaa54517c10ee711/main.go#L72-L87 like version, auth_community etc. if there's a need to override the default for a target.

I'm wondering if:

SuperQ commented 2 years ago

I think we already have a proposal for splitting the auth and walk/metric mapping in the config. This would allow for less redundancy in the configuration.

We do not want to support passing community/auth params directly via the URL as this is extremely insecure as Prometheus does not protect scrape URLs as secrets.

daenney commented 2 years ago

Ah, I think you're referring to #619?

SuperQ commented 2 years ago

Yes, that's the one.

daenney commented 2 years ago

That looks to have stalled a bit from what I can gather.

I can completely understand not wanting to expose the SNMPv3 credentials through a URL, but given the version isn't secret and the community string is essentially public, would a PR to add those in the mean time be acceptable?

RichiH commented 2 years ago

I disagree that "the community string is essentially public".

Another reason why this approach is cited as an anti-pattern in other places is that if you control the content sent to targets via URL, not via config access, you can exploit remote vulnerabilities; made even worse by that fact that exporters are usually in privileged networks / allowlisted in ACLs/firewalls, etc. Access to said exporters is commonly allowed from less privileged networks.

Would you be willing to work on a PR implementing https://github.com/prometheus/snmp_exporter/issues/619 to pass auth & change auth that way?

daenney commented 2 years ago

So looking at #619 I'm a bit fuzzy on what would the solution would end up looking like.

The Google doc states the issue about auth being stored in snmp.yml, but no accepted solutions. I'm guessing some discussion happened somewhere, based on your comment on the 15th of March last year, but it's still not entirely clear what we'd actually want. From #85 I got a distinct impression of a lack of desire from maintainers to support multiple configuration files/loading, but it seems that's what #619 is currently proposing? And what's meant by "directory support"?

If someone could spec out an example of what a "new style" configuration would look like and how we envision overriding auth for a module would work, I might get a better sense of we're building towards.

RichiH commented 2 years ago

I know we documented it somewhere, but I am unable to find it off-hand. Re-typing from memory:

Old:

if_mib:
  version: 2
  auth:
    community: SomeCommunityString
  walk:
  - 1.3.6.1.2.1.2
  - 1.3.6.1.2.1.31.1.1
  get:
  - 1.3.6.1.2.1.1.3.0
  metrics:
  - name: sysUpTime

New, after the PR:

auth:
  foobar:
    version: 2
    community: SomeCommunityString
if_mib: 
  walk:
  - 1.3.6.1.2.1.2
  - 1.3.6.1.2.1.31.1.1
  get:
  - 1.3.6.1.2.1.1.3.0
  metrics:
  - name: sysUpTime

As you can see this poses an interesting problem as auth and all modules are on the same level now. So a format which should be supported as well, and would become the newly recommended way with the next release, would be:

auth:
  foobar:
    version: 2
    community: SomeCommunityString
module:
  if_mib: 
    walk:
    - 1.3.6.1.2.1.2
    - 1.3.6.1.2.1.31.1.1
    get:
    - 1.3.6.1.2.1.1.3.0
    metrics:
    - name: sysUpTime
daenney commented 2 years ago

That definitely helps, thanks!

One thing I'm still wondering about, how do we make this reusable? Imagine a scenario where for whatever reason there's 2 sets of devices, both queryable using if_mib but one needs SNMPv1 and the other SNMPv2 (this happens to be my reality because some D-Link devices are terrible).

Based on the new proposed format, though we can lift the auth out from the OID configuration, it would seem we can still only have one, globally that needs to apply to all devices of a single snmp_exporter instance? Right now I can solve this by defining an additional module which is if_mib but with the version overriden, avoiding the need to run multiple SNMP exporters. How does that work in the new case, or is it not intended to support that?

RichiH commented 2 years ago

You would use http://localhost:9116/snmp?module=if_mib&target=1.2.3.4&auth=foo, and http://localhost:9116/snmp?module=if_mib&target=2.3.4.5&auth=bar.

daenney commented 2 years ago

Right, yes. I have no idea why this didn't click for me earlier. This makes sense. I might take a stab at a PR for this.

On the format, you mentioned initially still supporting the format where the modules were at the root, i.e not children of the modules key. Though I can see some benefit to it, it is a one-line change in the configuration, whereas having the code to support both formats would be more work and we'd potentially need to define what to do if we have a clash (such as if_mib at the root but also modules.if_mib). I'm wondering if it's really necessary to be that cautious, or if we can swap to the new format immediately and forego supporting both styles. The release notes of 0.20.0 already mention that future releases will have breaking configuration format changes, it seems we should take advantage of that?

SuperQ commented 2 years ago

We would make a new version that did not support the old style, only the new style. Users would be required to convert their configs.

RichiH commented 2 years ago

I am fine either way, requiring a change seems long term cleanest.

The generator would also need to support the new format.

Sent by mobile; please excuse my brevity.

On Thu, Jun 2, 2022, 07:54 Ben Kochie @.***> wrote:

We would make a new version that did not support the old style, only the new style. Users would be required to convert their configs.

— Reply to this email directly, view it on GitHub https://github.com/prometheus/snmp_exporter/issues/762#issuecomment-1144465619, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFYII4Y27MNGFD47OKEZPDVNBECRANCNFSM5XK54TIA . You are receiving this because you commented.Message ID: @.***>

daenney commented 2 years ago

We seem to have the same problem with WalkParams in general. If we only extract Auth, then we're still keeping timeout, retries etc. inside a module. It feels like that too ought to be configurable by target as some devices can respond much slower than others etc. I don't think they should be part of the auth, since many devices might share the same credential set but need a different timeout value configured for the walk.

That seems to leave us with the need for another top-level key. I can't really come up with a descriptive name beyond (walk)parameters so suggestions are very welcome or if someone has an idea about a better approach for it.

The final format would then be:

auth:
  foo: {}
parmeters:
  foo: {}
modules:
  if_mib: {}

A combination of module, auth configuration and walk parameters can then be selected through URL args.

daenney commented 2 years ago

I'm also wondering if we should stick version in auth or in parameters. The version dictates something about auth, but SNMPv3 also introduced other capabilities (even though we don't use them in this case, so from our point of view it's functionally v2 + fancier auth).

RichiH commented 2 years ago

Retries and timeout seem linked to the module, to me.

Even if it's the same hardware, some with full tables and some with route reflector clients, this is arguably a different model from the point of view of the operator and thus Prometheus.

daenney commented 2 years ago

I don't quite see how. From where I'm standing a module should define what to query, and potentially how to associate data with each other. Why would retries and timeouts be part of that?

daenney commented 2 years ago

Thinking more about this, it feels like retry and timeout should be query parameters on the target, instead of a named parameter group. So you'd end up with: http://localhost:9116/snmp?module=if_mib&target=1.2.3.4&auth=foo&timeout=10&retry=5 instead of &params=annoying-noncompliant-devices.

One of the main reasons the current format is frustrating and results in a lot of duplication is that it confounds the what to query and the how to query. Without cleanly splitting that up we'll continue to have tons of duplication in the modules. You could keep a default for timeout and retry in the module I suppose, but you need some kind of way to override it.

nelsonjchen commented 2 years ago

Another reason why this approach is cited as an anti-pattern in other places is that if you control the content sent to targets via URL, not via config access, you can exploit remote vulnerabilities; made even worse by that fact that exporters are usually in privileged networks / allowlisted in ACLs/firewalls, etc. Access to said exporters is commonly allowed from less privileged networks.

Weird thought, but what if URLs or certain data could be pre-signed with encrypted credentials and the authentication data in them is also encrypted with those credentials? The authentication content sent to the targets could then not be modified or tampered with without access to some key that is there when the URL is generated.

Admittedly, I haven't really thought out or fleshed out this idea but it's like pre-signed AWS API, S3, and so on URLs/requests in my head.

node17 commented 1 year ago

With the current design, the SNMP version to use, the community string and auth credentials (for v3) are part of the module that the generator creates. If my understanding of how this works is correct, it implies that if you want to scrape 2 devices but they have a different SNMP community (or one that has been reconfigured away from the 'public' default), you end up needing to add a new module.

This results in a ton of duplication in snmp.yml, where the only difference is the version field or some keys of the auth dictionary. For example a couple of my D-Link switches are perfectly happy to be queried using if_mib, but only support SNMPv1.

This would add a bunch of parameters on

https://github.com/prometheus/snmp_exporter/blob/8678b6022c439074555f958fdaa54517c10ee711/main.go#L72-L87 like version, auth_community etc. if there's a need to override the default for a target.

I'm wondering if:

* If there's a better way to go about this (i.e is there some way to do this without needing to go through separate modules)?

* This is a change that the maintainers would be willing to accept?

how to add new module ? and how it is works?

JDA88 commented 1 year ago

Thinking more about this, it feels like retry and timeout should be query parameters on the target, instead of a named parameter group. So you'd end up with: http://localhost:9116/snmp?module=if_mib&target=1.2.3.4&auth=foo&timeout=10&retry=5 instead of &params=annoying-noncompliant-devices.

Having timeout and retry as parameters would save us so many duplicate! If it's possible please introduce this change :)

RichiH commented 1 year ago

Both timeout and retry might be used for DoS attacks so they shouldn't be part of the URL; but they could be split out into e.g. the auth module, or a new optional connection module.

SuperQ commented 1 year ago

We've now implemented splitting modules and auths in https://github.com/prometheus/snmp_exporter/pull/859