prometheus / snmp_exporter

SNMP Exporter for Prometheus
Apache License 2.0

Duplicate metrics with dual index table #632

Closed iH8c0ff33 closed 2 years ago

iH8c0ff33 commented 3 years ago

Host operating system: output of uname -a

Linux hostname 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

snmp_exporter version: output of snmp_exporter -version

Container image: prom/snmp-exporter:v0.20.0

What device/snmpwalk OID are you using?

I'm trying to walk the upsPhaseOutputPhaseTable on an APC UPS; the table OID is 1.3.6.1.4.1.318.1.1.1.9.3.3. The same problem occurs for upsPhaseInputPhaseTable, OID 1.3.6.1.4.1.318.1.1.1.9.2.3.

If this is a new device, please link to the MIB(s).

I'm using the latest PowerNet-MIB from apc.com (link).

Table and entry for reference (PhaseOutputPhaseTable has a similar structure):

   upsPhaseInputPhaseTable OBJECT-TYPE
       SYNTAX     SEQUENCE OF UpsPhaseInputPhaseEntry
       ACCESS     not-accessible
       STATUS     mandatory
       DESCRIPTION
               "A list of input table entries.  The number of entries
               is given by the sum of the upsPhaseNumInputPhases."
       ::= { upsPhaseInput 3 }

   upsPhaseInputPhaseEntry OBJECT-TYPE
       SYNTAX     UpsPhaseInputPhaseEntry
       ACCESS     not-accessible
       STATUS     mandatory
       DESCRIPTION
               "An entry containing information applicable to a
               particular input phase."
       INDEX { upsPhaseInputPhaseTableIndex, upsPhaseInputPhaseIndex }
       ::= { upsPhaseInputPhaseTable 1 }

What did you do that produced an error?

I generated the snmp.yml config with the generator, configured to walk either upsPhaseOutputPhaseTable or upsPhaseInputPhaseTable. I then tried debugging by making an HTTP request to the exporter (/snmp?module=apcups&target=[ipaddress]).
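For anyone reproducing this, here is a minimal sketch of how that debug URL can be assembled. The exporter address `localhost:9116` (9116 being the exporter's default listen port) and the target `192.0.2.10` are placeholder assumptions, not values from this report; only the `module` and `target` query parameters come from the request above.

```python
from urllib.parse import urlencode


def snmp_debug_url(exporter: str, module: str, target: str) -> str:
    """Build the /snmp URL that asks the exporter to scrape one target."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/snmp?{query}"


# Hypothetical values: exporter on localhost:9116, UPS at 192.0.2.10.
url = snmp_debug_url("localhost:9116", "apcups", "192.0.2.10")
print(url)
```

Opening the printed URL in a browser or with curl should return either the metrics or the "error has occurred while serving metrics" page.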

What did you expect to see?

I expected to get metrics of the tables.

What did you see instead?

I got an error saying that some metrics were duplicated.

HTTP response

An error has occurred while serving metrics:

44 error(s) occurred:
* collected metric "upsPhaseOutputPower" { label:<name:"upsPhaseOutputPhaseIndex" value:"1" > label:<name:"upsPhaseOutputPhaseTableIndex" value:"1" > gauge:<value:20000 > } was collected before with the same name and label values
* collected metric "upsPhaseOutputMaxPercentLoad" { label:<name:"upsPhaseOutputPhaseIndex" value:"1" > label:<name:"upsPhaseOutputPhaseTableIndex" value:"1" > gauge:<value:-1 > } was collected before with the same name and label values
* collected metric "upsPhaseOutputPercentPower" { label:<name:"upsPhaseOutputPhaseIndex" value:"1" > label:<name:"upsPhaseOutputPhaseTableIndex" value:"1" > gauge:<value:-1 > } was collected before with the same name and label values
[ ... ]

snmp_exporter debug log (here only PhaseOutputPhaseTable is enabled, along with other OIDs which work correctly):

level=debug ts=2021-03-22T10:50:09.486Z caller=collector.go:140 module=apcups target=[ipaddress] msg="Get of OIDs completed" oids=21 duration_seconds=285.269294ms
level=debug ts=2021-03-22T10:50:09.486Z caller=collector.go:154 module=apcups target=[ipaddress] msg="OID not supported by target" oids=.1.3.6.1.4.1.318.1.1.1.4.3.1.0
level=debug ts=2021-03-22T10:50:09.486Z caller=collector.go:164 module=apcups target=[ipaddress] msg="Walking subtree" oid=1.3.6.1.4.1.318.1.1.1.3.2.10
level=debug ts=2021-03-22T10:50:09.666Z caller=collector.go:177 module=apcups target=[ipaddress] msg="Walk of subtree completed" oid=1.3.6.1.4.1.318.1.1.1.3.2.10 duration_seconds=179.243148ms
level=debug ts=2021-03-22T10:50:09.666Z caller=collector.go:164 module=apcups target=[ipaddress] msg="Walking subtree" oid=1.3.6.1.4.1.318.1.1.1.4.2.10
level=debug ts=2021-03-22T10:50:09.845Z caller=collector.go:177 module=apcups target=[ipaddress] msg="Walk of subtree completed" oid=1.3.6.1.4.1.318.1.1.1.4.2.10 duration_seconds=179.424668ms
level=debug ts=2021-03-22T10:50:09.845Z caller=collector.go:164 module=apcups target=[ipaddress] msg="Walking subtree" oid=1.3.6.1.4.1.318.1.1.1.9.2.2
level=debug ts=2021-03-22T10:50:10.129Z caller=collector.go:177 module=apcups target=[ipaddress] msg="Walk of subtree completed" oid=1.3.6.1.4.1.318.1.1.1.9.2.2 duration_seconds=283.587645ms
level=debug ts=2021-03-22T10:50:10.129Z caller=collector.go:164 module=apcups target=[ipaddress] msg="Walking subtree" oid=1.3.6.1.4.1.318.1.1.1.9.3.2
level=debug ts=2021-03-22T10:50:10.403Z caller=collector.go:177 module=apcups target=[ipaddress] msg="Walk of subtree completed" oid=1.3.6.1.4.1.318.1.1.1.9.3.2 duration_seconds=273.865598ms
level=debug ts=2021-03-22T10:50:10.403Z caller=collector.go:164 module=apcups target=[ipaddress] msg="Walking subtree" oid=1.3.6.1.4.1.318.1.1.1.9.3.3
level=debug ts=2021-03-22T10:50:11.360Z caller=collector.go:177 module=apcups target=[ipaddress] msg="Walk of subtree completed" oid=1.3.6.1.4.1.318.1.1.1.9.3.3 duration_seconds=956.691568ms
level=debug ts=2021-03-22T10:50:11.364Z caller=main.go:113 module=apcups target=[ipaddress] msg="Finished scrape" duration_seconds=2.162809637
RichiH commented 3 years ago

Thanks for the report; could you attach a snmpwalk as well, please?

iH8c0ff33 commented 3 years ago

@RichiH do you need the whole walk (I think I'd need to use a gist in that case)?

Here's the walk for upsPhaseOutputPhaseTable:

PowerNet-MIB::upsPhaseOutputPhaseTableIndex.1.1.1 = INTEGER: 1
PowerNet-MIB::upsPhaseOutputPhaseTableIndex.1.1.2 = INTEGER: 1
PowerNet-MIB::upsPhaseOutputPhaseTableIndex.1.1.3 = INTEGER: 1
PowerNet-MIB::upsPhaseOutputPhaseIndex.1.1.1 = INTEGER: 1
PowerNet-MIB::upsPhaseOutputPhaseIndex.1.1.2 = INTEGER: 2
PowerNet-MIB::upsPhaseOutputPhaseIndex.1.1.3 = INTEGER: 3
PowerNet-MIB::upsPhaseOutputVoltage.1.1.1 = INTEGER: 401
PowerNet-MIB::upsPhaseOutputVoltage.1.1.2 = INTEGER: 400
PowerNet-MIB::upsPhaseOutputVoltage.1.1.3 = INTEGER: 399
PowerNet-MIB::upsPhaseOutputCurrent.1.1.1 = INTEGER: 970
PowerNet-MIB::upsPhaseOutputCurrent.1.1.2 = INTEGER: 820
PowerNet-MIB::upsPhaseOutputCurrent.1.1.3 = INTEGER: 890
PowerNet-MIB::upsPhaseOutputMaxCurrent.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxCurrent.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxCurrent.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinCurrent.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinCurrent.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinCurrent.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputLoad.1.1.1 = INTEGER: 22000
PowerNet-MIB::upsPhaseOutputLoad.1.1.2 = INTEGER: 18000
PowerNet-MIB::upsPhaseOutputLoad.1.1.3 = INTEGER: 20000
PowerNet-MIB::upsPhaseOutputMaxLoad.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxLoad.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxLoad.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinLoad.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinLoad.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinLoad.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputPercentLoad.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputPercentLoad.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputPercentLoad.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxPercentLoad.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxPercentLoad.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxPercentLoad.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinPercentLoad.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinPercentLoad.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinPercentLoad.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputPower.1.1.1 = INTEGER: 22000
PowerNet-MIB::upsPhaseOutputPower.1.1.2 = INTEGER: 19000
PowerNet-MIB::upsPhaseOutputPower.1.1.3 = INTEGER: 20000
PowerNet-MIB::upsPhaseOutputMaxPower.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxPower.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxPower.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinPower.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinPower.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinPower.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputPercentPower.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputPercentPower.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputPercentPower.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxPercentPower.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxPercentPower.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMaxPercentPower.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinPercentPower.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinPercentPower.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputMinPercentPower.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputPowerFactor.1.1.1 = INTEGER: 98
PowerNet-MIB::upsPhaseOutputPowerFactor.1.1.2 = INTEGER: 100
PowerNet-MIB::upsPhaseOutputPowerFactor.1.1.3 = INTEGER: 97
PowerNet-MIB::upsPhaseOutputApparentPower.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputApparentPower.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputApparentPower.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputInverterVoltage.1.1.1 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputInverterVoltage.1.1.2 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputInverterVoltage.1.1.3 = INTEGER: -1
PowerNet-MIB::upsPhaseOutputVoltagePN.1.1.1 = INTEGER: 231
PowerNet-MIB::upsPhaseOutputVoltagePN.1.1.2 = INTEGER: 231
PowerNet-MIB::upsPhaseOutputVoltagePN.1.1.3 = INTEGER: 231
jcollie commented 3 years ago

I'm seeing the same problem with the upsHighPrecBatteryPackTable on a SRT2200RMXLA-NC

confusedpc commented 2 years ago

I also have this issue on an SRT5KRMXLT (SKU), which affects the following metrics:

upsHighPrecBatteryCartridgeIndex
upsHighPrecBatteryPackCartridgeHealth
upsHighPrecBatteryPackCartridgeInstallDate
upsHighPrecBatteryPackCartridgeReplaceDate
upsHighPrecBatteryPackCartridgeStatus
upsHighPrecBatteryPackFirmwareRevision
upsHighPrecBatteryPackIndex
upsHighPrecBatteryPackSerialNumber
upsHighPrecBatteryPackStatus
upsHighPrecBatteryPackTemperature

For example, snmpwalk .1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5:

.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.1.1.1 = INTEGER: 240
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.1.1.2 = INTEGER: 240
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.2.1.1 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.2.1.2 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.3.1.1 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.3.1.2 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.4.1.1 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.4.1.2 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.5.1.1 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.5.1.2 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.6.1.1 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.6.1.2 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.7.1.1 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.7.1.2 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.8.1.1 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.8.1.2 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.9.1.1 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.9.1.2 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.10.1.1 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.10.1.2 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.11.1.1 = INTEGER: 0
.1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.11.1.2 = INTEGER: 0

Or, with the OID name substituted and showing the first two battery packs only (only one pack is present, as seen from the 0s for packs 2 through 11):

upsHighPrecBatteryPackTemperature.1.1.1 = INTEGER: 240
upsHighPrecBatteryPackTemperature.1.1.2 = INTEGER: 240
upsHighPrecBatteryPackTemperature.2.1.1 = INTEGER: 0
upsHighPrecBatteryPackTemperature.2.1.2 = INTEGER: 0

If you read the numbers after the OID name as the columns of a table, it appears to me that:

The first "column" is "upsHighPrecBatteryPackIndex", which appears to increment correctly in the exporter output.
The second column is treated as the second index, "upsHighPrecBatteryCartridgeIndex", but the value returned in the snmpwalk never changes.
The third column is the real "upsHighPrecBatteryCartridgeIndex".

exporter output, sorted and reduced to this specific oid, in an attempt at some brevity:

I can resolve my issue by making the following hand edits to the snmp.yml file (yes, I hear everyone screaming, "don't do that"):

Replace:

indexes:
  - labelname: upsHighPrecBatteryPackIndex
    type: gauge
  - labelname: upsHighPrecBatteryCartridgeIndex
    type: gauge

With:

indexes:
  - labelname: upsHighPrecBatteryPackIndex
    type: gauge
  - labelname: upsHighPrecBatteryDrop
    type: gauge
  - labelname: upsHighPrecBatteryCartridgeIndex
    type: gauge

You can then either keep the "upsHighPrecBatteryDrop" label, which absorbs the non-iterating value, or do what its name suggests and drop the label in your prometheus.yml:

metric_relabel_configs:
  - regex: upsHighPrecBatteryDrop
    action: labeldrop
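As a rough illustration of what that `labeldrop` rule does to each scraped series, here is a small sketch. It is not Prometheus code: `labeldrop` below is a hypothetical helper, Python's `re` stands in for Prometheus's RE2 engine, and the sample label values are made up. The assumption being illustrated is that the regex is matched against label names and is fully anchored.

```python
import re


def labeldrop(labels: dict, regex: str) -> dict:
    """Return a copy of labels without any label whose NAME fully matches regex."""
    pattern = re.compile(regex)
    return {name: value for name, value in labels.items()
            if not pattern.fullmatch(name)}


# Illustrative label set for one series after the snmp.yml hand edit.
sample = {
    "upsHighPrecBatteryPackIndex": "1",
    "upsHighPrecBatteryDrop": "1",
    "upsHighPrecBatteryCartridgeIndex": "2",
}
kept = labeldrop(sample, "upsHighPrecBatteryDrop")
print(kept)
```

Because the dropped label holds the same value on every row, removing it should not make two series collide.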

I'm not sure who to blame. Is snmp_exporter not picking up the indexes correctly? Or is the UPS returning the oid with the last two elements reversed? Based on the fix I applied, it appears that snmp_exporter looks for indexes from left to right, consecutively across the oid.

Is that the correct way? I don't know. Someone with more knowledge of indexes in snmp responses needs to chime in perhaps.
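To make the left-to-right hypothesis concrete, here is a small sketch using the upsPhaseOutputPower rows from the walk posted earlier in the thread. It is a simplified model, not the exporter's actual code: it assumes each declared integer index consumes exactly one sub-identifier and that any leftover sub-identifiers are ignored. The label name `unknownIndex` is a made-up placeholder for the undocumented middle value.

```python
from collections import Counter


def labels_from_suffix(suffix: str, index_names: list) -> tuple:
    """Consume one sub-identifier per declared index, left to right."""
    parts = [int(p) for p in suffix.split(".")]
    # zip stops at the shorter sequence, so extra trailing parts are ignored.
    return tuple(zip(index_names, parts))


# OID suffixes and values of upsPhaseOutputPower from the walk above.
walk = {"1.1.1": 22000, "1.1.2": 19000, "1.1.3": 20000}

# Two indexes, as declared in the MIB: every row gets the labels (1, 1).
mib_indexes = ["upsPhaseOutputPhaseTableIndex", "upsPhaseOutputPhaseIndex"]
counts = Counter(labels_from_suffix(s, mib_indexes) for s in walk)
duplicates = {labels: n for labels, n in counts.items() if n > 1}

# Three indexes, with a placeholder in the middle: rows become distinct.
patched_indexes = ["upsPhaseOutputPhaseTableIndex", "unknownIndex",
                   "upsPhaseOutputPhaseIndex"]
patched_counts = Counter(labels_from_suffix(s, patched_indexes) for s in walk)

print(duplicates)
print(len(patched_counts))
```

Under these assumptions the two-index parse gives all three rows the same label set, which matches the "collected before with the same name and label values" error, while the three-index parse keeps them apart.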

SuperQ commented 2 years ago

Yes, it seems like the PowerNet-MIB does not match the device. The device is returning 3 indexes, but the MIB only has two.
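One way to spot this kind of mismatch is to compare the number of sub-identifiers after the column OID with the number of indexes the MIB declares. The sketch below uses the column and one instance OID from the snmpwalk above; `index_suffix` is a hypothetical helper, and the "one sub-identifier per index" rule is an assumption that only holds when every index is a plain integer.

```python
def index_suffix(column_oid: str, instance_oid: str) -> list:
    """Return the sub-identifiers that follow the column OID in an instance OID."""
    col = column_oid.strip(".").split(".")
    inst = instance_oid.strip(".").split(".")
    if inst[:len(col)] != col:
        raise ValueError("instance OID is not under the column OID")
    return [int(x) for x in inst[len(col):]]


column = ".1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5"
instance = ".1.3.6.1.4.1.318.1.1.1.2.3.10.2.1.5.1.1.2"

suffix = index_suffix(column, instance)
declared = 2  # the generated snmp.yml lists two indexes for this table
mismatch = len(suffix) != declared

print(suffix, "declared:", declared, "mismatch:", mismatch)
```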

iH8c0ff33 commented 2 years ago

> Yes, it seems like the PowerNet-MIB does not match the device. The device is returning 3 indexes, but the MIB only has two.

Yeah, I've solved the issue by manually editing the MIB. Not very convenient, but it works... Anyway, I think the problem lies in PowerNet-MIB and not in snmp_exporter, so this issue should be closed.