jenningsloy318 / redfish_exporter

exporter to get metrics from redfish based hardware such as lenovo/dell/superc servers
Apache License 2.0

cannot unmarshal number 0.9500012 into Go struct field Power.PowerSupplies #13

Closed matejzero closed 4 years ago

matejzero commented 4 years ago

When querying a Dell server with iDRAC, the exporter exits with a panic:

INFO[0012] Errors Getting powerinf from chassis : json: cannot unmarshal number 0.920000016689301 into Go struct field Power.PowerSupplies of type int  source="chassis_collector.go:246"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xe0 pc=0x86ca57]

goroutine 68 [running]:
github.com/jenningsloy318/redfish_exporter/collector.(*ChassisCollector).Collect(0xc000144de0, 0xc0002b4ea0)
    /Users/matejz/GIT/github/redfish_exporter/collector/chassis_collector.go:227 +0x897
github.com/jenningsloy318/redfish_exporter/collector.(*RedfishCollector).Collect.func1(0xc0000d5ef0, 0xc0002b4ea0, 0xa8a0c0, 0xc000144de0)
    /Users/matejz/GIT/github/redfish_exporter/collector/redfish_collector.go:91 +0x6b
created by github.com/jenningsloy318/redfish_exporter/collector.(*RedfishCollector).Collect
    /Users/matejz/GIT/github/redfish_exporter/collector/redfish_collector.go:89 +0x1e8

The problem is on Dell's side: it reports EfficiencyPercent as a ratio instead of a percentage. According to the Redfish docs, the value should be a percentage between 0 and 100, but it can still be a float.
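A minimal sketch of why the decode fails, using hypothetical, trimmed-down structs (powerSupplyInt, powerSupplyFloat, decodeBoth) rather than gofish's real types. Go's encoding/json refuses to store a JSON number with a fractional part in an int field, while a float32 field accepts both the ratio Dell sends and a spec-conforming percentage.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, trimmed-down power supply structs; not gofish's real types.
type powerSupplyInt struct {
	EfficiencyPercent int
}

type powerSupplyFloat struct {
	EfficiencyPercent float32
}

// decodeBoth decodes the same payload into an int field and a float32 field.
func decodeBoth(payload []byte) (errInt error, value float32, errFloat error) {
	var asInt powerSupplyInt
	errInt = json.Unmarshal(payload, &asInt)

	var asFloat powerSupplyFloat
	errFloat = json.Unmarshal(payload, &asFloat)
	return errInt, asFloat.EfficiencyPercent, errFloat
}

func main() {
	// Dell reports a ratio (about 0.92) where Redfish expects a percentage (0-100).
	payload := []byte(`{"EfficiencyPercent": 0.920000016689301}`)

	errInt, value, errFloat := decodeBoth(payload)
	fmt.Println("int field:", errInt)
	fmt.Println("float32 field:", value, errFloat)
}
```

The int case fails with the same kind of "cannot unmarshal number ... of type int" error shown in the log above.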

I opened a ticket on gofish and also reported the issue to Dell support.

I opened this ticket here so the problem is known in case anyone else runs into it.

matejzero commented 4 years ago

The unmarshal problem was fixed in upstream by https://github.com/stmcginnis/gofish/commit/686ec78d313a65621d1369585a2ec47bfa5b02f0.

But the panic still happens. Should this be a separate issue? It seems to be connected with temperatures: https://github.com/jenningsloy318/redfish_exporter/blob/master/collector/chassis_collector.go#L227. I don't know enough to debug this, though.

jenningsloy318 commented 4 years ago

Hi @matejzero, I updated the dependencies. This is the newest fix from upstream, so there was a little delay.

You can also update the dependencies manually and then recompile.

If it works, can you share the hardware model and firmware version? I will add it to the tested hardware list.

matejzero commented 4 years ago

I forgot to mention: I did update the dependencies and the unmarshal error disappeared, but the exporter still fails with a fatal error: panic: runtime error: invalid memory address or nil pointer dereference

jenningsloy318 commented 4 years ago

You can try it again. I did this yesterday and no error occurred.

matejzero commented 4 years ago

It works on Lenovo servers, but not on Dell.

INFO[0010] Scraping target server.example.org       source="main.go:41"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xe8 pc=0x84cb19]

goroutine 51 [running]:
github.com/jenningsloy318/redfish_exporter/collector.(*ChassisCollector).Collect(0xc000116f80, 0xc00012e9c0)
    /Users/matejz/Documents/GIT/github/redfish_exporter/collector/chassis_collector.go:235 +0x9d9
github.com/jenningsloy318/redfish_exporter/collector.(*RedfishCollector).Collect.func1(0xc000023040, 0xc00012e9c0, 0xa4e3e0, 0xc000116f80)
    /Users/matejz/Documents/GIT/github/redfish_exporter/collector/redfish_collector.go:91 +0x67
created by github.com/jenningsloy318/redfish_exporter/collector.(*RedfishCollector).Collect
    /Users/matejz/Documents/GIT/github/redfish_exporter/collector/redfish_collector.go:89 +0x1b8

I don't know where to look for the problem. Is there something missing from the Redfish response?

stmcginnis commented 4 years ago

I believe the issue is here:

https://github.com/jenningsloy318/redfish_exporter/blob/295621a3a564e24b45e061574e4ab273752f3afd/collector/chassis_collector.go#L253

There needs to be a check that the Power returned is not nil.

In Redfish, some linked objects are optional, and Chassis.Power is one of them. So when getting the linked object from chassis.Power(), the call may not return an error, but there may also be no Power object to return. You can see how this is handled here:

https://github.com/stmcginnis/gofish/blob/686ec78d313a65621d1369585a2ec47bfa5b02f0/redfish/chassis.go#L361

This is a common pattern throughout gofish, but it does make me wonder whether it should be handled a little more intuitively. The thought was that consuming code should always check (if err != nil || power == nil), but that gets a little verbose.

The fix here for now is to add that nil check on the result, but I am interested in thoughts on whether gofish should return an error when there is no linked object to return. Not just in this case, but throughout the library wherever there are similar calls.

matejzero commented 4 years ago

The problem is also this line: https://github.com/jenningsloy318/redfish_exporter/blob/295621a3a564e24b45e061574e4ab273752f3afd/collector/chassis_collector.go#L231

I changed both lines to err != nil || chassisPowerInfo == nil and err != nil || chassisThermal == nil. This fixes the original problem, but a new error arose:

An error has occurred while serving metrics:

[from Gatherer #2] collected metric "redfish_manager_state" { label:<name:"manager_id" value:"iDRAC.Embedded.1" > label:<name:"model" value:"15G Monolithic" > label:<name:"name" value:"Manager" > label:<name:"type" value:"BMC" > gauge:<value:1 > } was collected before with the same name and label values

I guess it's not that simple to create a "generic Redfish exporter" :)
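As the error message says, the Prometheus registry refuses to serve two samples with the same metric name and the same label values. A minimal, stdlib-only sketch of that identity rule, with hypothetical names (sample, seriesKey, dedupe); it is neither the client_golang API nor the exporter's actual fix, which jenningsloy318 attributes to a type error below.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical, simplified metric sample: a name plus label pairs.
type sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// seriesKey builds an order-independent identity from the metric name and its
// sorted label pairs; two samples with equal keys are the "same" series.
func seriesKey(s sample) string {
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(s.Name)
	for _, k := range keys {
		fmt.Fprintf(&b, ",%s=%q", k, s.Labels[k])
	}
	return b.String()
}

// dedupe keeps the first sample of each series and reports the duplicates.
func dedupe(samples []sample) (kept []sample, dropped []string) {
	seen := make(map[string]bool)
	for _, s := range samples {
		key := seriesKey(s)
		if seen[key] {
			dropped = append(dropped, key)
			continue
		}
		seen[key] = true
		kept = append(kept, s)
	}
	return kept, dropped
}

func main() {
	labels := map[string]string{"manager_id": "iDRAC.Embedded.1", "type": "BMC"}
	samples := []sample{
		{Name: "redfish_manager_state", Labels: labels, Value: 1},
		{Name: "redfish_manager_state", Labels: labels, Value: 1}, // emitted twice
	}
	kept, dropped := dedupe(samples)
	fmt.Println(len(kept), "kept; dropped:", dropped)
}
```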

Regarding your last question, I have no opinion: I'm not a developer and don't know whether this is usually handled in the library or in the application.

jenningsloy318 commented 4 years ago

@stmcginnis, thanks for pointing this out. I will check the code and improve the logic.

@matejzero, can you share some sample output? You can use Postman or curl to get the output of all the related objects. Then I can modify the code to reflect the Dell server spec. As for the last error, I found a type error; I fixed it and committed the change, so you can compile it again later.

matejzero commented 4 years ago

Outputs are attached.

Query URIs were:

power.log thermal.log

jenningsloy318 commented 4 years ago

Hi @matejzero, have you tried to compile and test it again? Does the problem still happen?

matejzero commented 4 years ago

I am sorry, it looks like I was testing my own fork and not your fix. I checked out latest master and it works now with Dell servers as well.

I get

INFO[0009] Errors Getting Thermal from chassis : %!s(<nil>)  source="chassis_collector.go:233"
INFO[0009] Errors Getting powerinf from chassis : %!s(<nil>)  source="chassis_collector.go:255"

in the console, but metrics are still returned.

I will compare them with Lenovo servers to see if anything useful is missing.

jenningsloy318 commented 4 years ago

From the output, it seems that Thermal and powerinfo metrics are not collected. Can you confirm that?

matejzero commented 4 years ago

I have no thermal info.

As for powerinfo, this is all I get regarding power (supply):

# HELP redfish_chassis_power_average_consumed_watts power wattage watts number of chassis component
# TYPE redfish_chassis_power_average_consumed_watts gauge
redfish_chassis_power_average_consumed_watts{chassis_id="System.Embedded.1",power_votage="System Power Control",power_votage_id="0",resource="power_wattage"} 107
# HELP redfish_chassis_power_powersupply_health powersupply health of chassis component,1(OK),2(Warning),3(Critical)
# TYPE redfish_chassis_power_powersupply_health gauge
redfish_chassis_power_powersupply_health{chassis_id="System.Embedded.1",power_supply="PS1 Status",power_supply_id="0",resource="power_supply"} 1
redfish_chassis_power_powersupply_health{chassis_id="System.Embedded.1",power_supply="PS2 Status",power_supply_id="1",resource="power_supply"} 1
# HELP redfish_chassis_power_powersupply_last_power_output_watts last_power_output_watts of powersupply on this chassis
# TYPE redfish_chassis_power_powersupply_last_power_output_watts gauge
redfish_chassis_power_powersupply_last_power_output_watts{chassis_id="System.Embedded.1",power_supply="PS1 Status",power_supply_id="0",resource="power_supply"} 0
redfish_chassis_power_powersupply_last_power_output_watts{chassis_id="System.Embedded.1",power_supply="PS2 Status",power_supply_id="1",resource="power_supply"} 0
# HELP redfish_chassis_power_powersupply_power_capacity_watts power_capacity_watts of powersupply on this chassis
# TYPE redfish_chassis_power_powersupply_power_capacity_watts gauge
redfish_chassis_power_powersupply_power_capacity_watts{chassis_id="System.Embedded.1",power_supply="PS1 Status",power_supply_id="0",resource="power_supply"} 550
redfish_chassis_power_powersupply_power_capacity_watts{chassis_id="System.Embedded.1",power_supply="PS2 Status",power_supply_id="1",resource="power_supply"} 550
# HELP redfish_chassis_power_powersupply_state powersupply state of chassis component,1(Enabled),2(Disabled),3(StandbyOffinline),4(StandbySpare),5(InTest),6(Starting),7(Absent),8(UnavailableOffline),9(Deferring),10(Quiesced),11(Updating)
# TYPE redfish_chassis_power_powersupply_state gauge
redfish_chassis_power_powersupply_state{chassis_id="System.Embedded.1",power_supply="PS1 Status",power_supply_id="0",resource="power_supply"} 1
redfish_chassis_power_powersupply_state{chassis_id="System.Embedded.1",power_supply="PS2 Status",power_supply_id="1",resource="power_supply"} 1
# HELP redfish_chassis_power_voltage_state power voltage state of chassis component,1(Enabled),2(Disabled),3(StandbyOffinline),4(StandbySpare),5(InTest),6(Starting),7(Absent),8(UnavailableOffline),9(Deferring),10(Quiesced),11(Updating)
# TYPE redfish_chassis_power_voltage_state gauge
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="CPU1 MEMABCD VDD",power_votage_id="CPU1MEMABCDVDD",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="CPU1 MEMABCD VPP",power_votage_id="CPU1MEMABCDVPP",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="CPU1 MEMABCD VTT",power_votage_id="CPU1MEMABCDVTT",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="CPU1 MEMEFGH VDD PG",power_votage_id="CPU1MEMEFGHVDDPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="CPU1 MEMEFGH VPP",power_votage_id="CPU1MEMEFGHVPP",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="CPU1 MEMEFGH VTT",power_votage_id="CPU1MEMEFGHVTT",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="CPU1 P0 MAIN PG",power_votage_id="CPU1P0MAINPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="CPU1 SOC0 MAIN PG",power_votage_id="CPU1SOC0MAINPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="CPU1 SOC0 SW PG",power_votage_id="CPU1SOC0SWPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="CPU1 VRD 1P2 HUB PG",power_votage_id="CPU1VRD1P2HUBPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="PS1 Voltage 1",power_votage_id="PS1Voltage",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="PS2 Voltage 2",power_votage_id="PS2Voltage",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board 1.8V MAIN PG",power_votage_id="SystemBoard1.8VMAINPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board 1.8V SW PG",power_votage_id="SystemBoard1.8VSWPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board 2.5V SW PG",power_votage_id="SystemBoard2.5VSWPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board 3.3V A PG",power_votage_id="SystemBoard3.3VAPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board 5V PG",power_votage_id="SystemBoard5VPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board 5V SW PG",power_votage_id="SystemBoard5VSWPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board BP1 PG",power_votage_id="SystemBoardBP1PG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board DIMM PG",power_votage_id="SystemBoardDIMMPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board LOM PG",power_votage_id="SystemBoardLOMPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board PS1 PG FAIL",power_votage_id="SystemBoardPS1PGFAIL",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board PS2 PG FAIL",power_votage_id="SystemBoardPS2PGFAIL",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board VSB11 SW PG",power_votage_id="SystemBoardVSB11SWPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_state{chassis_id="System.Embedded.1",power_votage="System Board VSBM SW PG",power_votage_id="SystemBoardVSBMSWPG",resource="power_voltage"} 1
# HELP redfish_chassis_power_voltage_volts power voltage volts number of chassis component
# TYPE redfish_chassis_power_voltage_volts gauge
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="CPU1 MEMABCD VDD",power_votage_id="CPU1MEMABCDVDD",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="CPU1 MEMABCD VPP",power_votage_id="CPU1MEMABCDVPP",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="CPU1 MEMABCD VTT",power_votage_id="CPU1MEMABCDVTT",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="CPU1 MEMEFGH VDD PG",power_votage_id="CPU1MEMEFGHVDDPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="CPU1 MEMEFGH VPP",power_votage_id="CPU1MEMEFGHVPP",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="CPU1 MEMEFGH VTT",power_votage_id="CPU1MEMEFGHVTT",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="CPU1 P0 MAIN PG",power_votage_id="CPU1P0MAINPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="CPU1 SOC0 MAIN PG",power_votage_id="CPU1SOC0MAINPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="CPU1 SOC0 SW PG",power_votage_id="CPU1SOC0SWPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="CPU1 VRD 1P2 HUB PG",power_votage_id="CPU1VRD1P2HUBPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="PS1 Voltage 1",power_votage_id="PS1Voltage",resource="power_voltage"} 230
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="PS2 Voltage 2",power_votage_id="PS2Voltage",resource="power_voltage"} 230
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board 1.8V MAIN PG",power_votage_id="SystemBoard1.8VMAINPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board 1.8V SW PG",power_votage_id="SystemBoard1.8VSWPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board 2.5V SW PG",power_votage_id="SystemBoard2.5VSWPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board 3.3V A PG",power_votage_id="SystemBoard3.3VAPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board 5V PG",power_votage_id="SystemBoard5VPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board 5V SW PG",power_votage_id="SystemBoard5VSWPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board BP1 PG",power_votage_id="SystemBoardBP1PG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board DIMM PG",power_votage_id="SystemBoardDIMMPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board LOM PG",power_votage_id="SystemBoardLOMPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board PS1 PG FAIL",power_votage_id="SystemBoardPS1PGFAIL",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board PS2 PG FAIL",power_votage_id="SystemBoardPS2PGFAIL",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board VSB11 SW PG",power_votage_id="SystemBoardVSB11SWPG",resource="power_voltage"} 1
redfish_chassis_power_voltage_volts{chassis_id="System.Embedded.1",power_votage="System Board VSBM SW PG",power_votage_id="SystemBoardVSBMSWPG",resource="power_voltage"} 1
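Reading these gauges requires the numeric encodings from the HELP lines (for example, powersupply_state 1 means Enabled and powersupply_health 1 means OK). A minimal sketch of that lookup, assuming the mapping is exactly the one printed in the HELP text; stateValues, healthValues, and gaugeValue are hypothetical names, not the exporter's code. Returning ok=false for an unrecognized string avoids silently reporting 0.

```go
package main

import "fmt"

// Encodings as listed in the HELP text above. The keys use the Redfish State
// spelling "StandbyOffline"; the HELP text prints it as "StandbyOffinline".
var stateValues = map[string]float64{
	"Enabled": 1, "Disabled": 2, "StandbyOffline": 3, "StandbySpare": 4,
	"InTest": 5, "Starting": 6, "Absent": 7, "UnavailableOffline": 8,
	"Deferring": 9, "Quiesced": 10, "Updating": 11,
}

var healthValues = map[string]float64{"OK": 1, "Warning": 2, "Critical": 3}

// gaugeValue returns the encoded value and whether the string was recognized.
func gaugeValue(table map[string]float64, s string) (float64, bool) {
	v, ok := table[s]
	return v, ok
}

func main() {
	for _, s := range []string{"Enabled", "Absent", "Bogus"} {
		v, ok := gaugeValue(stateValues, s)
		fmt.Println(s, v, ok)
	}
	v, _ := gaugeValue(healthValues, "OK")
	fmt.Println("OK ->", v)
}
```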

jenningsloy318 commented 4 years ago

Since there have been no updates, I'm closing this issue.

matejzero commented 4 years ago

OK, but the last update was from me, since you were asking whether I get any thermal/powerinfo metrics. Was there something more you expected from me?

The ticket with Dell is still open. Yesterday I was in contact with support, and they told me the issue was passed to the L3 software team to try to fix it. I'm waiting for feedback.

matejzero commented 4 years ago

Dell answered today that the fix will be in the next firmware update.