jenningsloy318 / redfish_exporter

exporter to get metrics from redfish based hardware such as lenovo/dell/superc servers
Apache License 2.0

Exporter crashes on SuperMicro servers #15

Closed NosIreland closed 4 years ago

NosIreland commented 4 years ago

Hi, I am trying to use the exporter on a SuperMicro 1114S, but it crashes with the error below:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xe0 pc=0x848215]

goroutine 140 [running]:
github.com/jenningsloy318/redfish_exporter/collector.(*ChassisCollector).Collect(0xc0004ce280, 0xc000092060)
        /home/I336589/git/go/src/github.com/jenningsloy318/redfish_exporter/collector/chassis_collector.go:228 +0x895
github.com/jenningsloy318/redfish_exporter/collector.(*RedfishCollector).Collect.func1(0xc0001260d0, 0xc000092060, 0xa411a0, 0xc0004ce280)
        /home/I336589/git/go/src/github.com/jenningsloy318/redfish_exporter/collector/redfish_collector.go:90 +0x67
created by github.com/jenningsloy318/redfish_exporter/collector.(*RedfishCollector).Collect
        /home/I336589/git/go/src/github.com/jenningsloy318/redfish_exporter/collector/redfish_collector.go:88 +0x1b8

I have other SuperMicros and the exporter works fine on them; only this model has the problem.

stmcginnis commented 4 years ago

Based on the stack trace, it looks like this server is returning (or not returning) something in the Chassis values that causes a problem when the exporter tries to parse it.

Can you hit this in a browser or via curl to get the raw JSON response that you could paste here?

To get the list of available Chassis, go to:

https://server/redfish/v1/Chassis

Within the JSON results from that, you should see a section called Members that is an array of @odata.id links. I'm not sure about the default format for Supermicro, but you should see a link there to at least one chassis that represents the server's chassis. On a Dell PowerEdge, that would lead to something like:

https://server/redfish/v1/Chassis/System.Embedded.1

It's a relative link, so copy the @odata.id from the initial result and use it to replace the path in the URL to that server. That should get you to the full Chassis details that are causing the issue. If you can paste that full JSON here, or link to it in some sort of pastebin service, we might be able to see what's causing the issue.
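The lookup described above can be sketched in Go. This is only an illustration, not code from the exporter: the `collection` type and `memberURLs` function are hypothetical names, and the sample body is built from the Dell example link above rather than taken from a real server. It parses a collection response and resolves each relative @odata.id against the server URL.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// collection mirrors only the part of a Redfish collection used here.
// Hypothetical type, not taken from gofish or redfish_exporter.
type collection struct {
	Members []struct {
		ODataID string `json:"@odata.id"`
	} `json:"Members"`
}

// memberURLs parses a collection body and resolves every relative
// @odata.id link against base, returning absolute URLs.
func memberURLs(base string, body []byte) ([]string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	var c collection
	if err := json.Unmarshal(body, &c); err != nil {
		return nil, err
	}
	urls := make([]string, 0, len(c.Members))
	for _, m := range c.Members {
		ref, err := url.Parse(m.ODataID)
		if err != nil {
			return nil, err
		}
		// An @odata.id starting with "/" replaces the whole path of base.
		urls = append(urls, baseURL.ResolveReference(ref).String())
	}
	return urls, nil
}

func main() {
	// Sample body assembled from the Dell example link; not real server output.
	body := []byte(`{"Members": [{"@odata.id": "/redfish/v1/Chassis/System.Embedded.1"}]}`)
	urls, err := memberURLs("https://server/redfish/v1/Chassis", body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Fetching each resulting URL (in a browser, with curl, or with an HTTP client using the BMC credentials) should return the full details for that chassis.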

NosIreland commented 4 years ago

Here is the output of /redfish/v1/Chassis/ from a server that works:

{
    "@odata.context": "/redfish/v1/$metadata#ChassisCollection.ChassisCollection",
    "@odata.type": "#ChassisCollection.ChassisCollection",
    "@odata.id": "/redfish/v1/Chassis",
    "Name": "Chassis Collection",
    "Members": [
        {
            "@odata.id": "/redfish/v1/Chassis/1"
        }
    ],
    "Members@odata.count": 1
}

Here is the output from the one that does not:

{
    "@odata.type": "#ChassisCollection.ChassisCollection",
    "@odata.id": "/redfish/v1/Chassis",
    "Name": "Chassis Collection",
    "Members": [
        {
            "@odata.id": "/redfish/v1/Chassis/1"
        },
        {
            "@odata.id": "/redfish/v1/Chassis/HA-RAID.0.StorageEnclosure.0"
        }
    ],
    "Members@odata.count": 2
}

The failing server lists more members, which should not be an issue, but its response is also missing @odata.context, so maybe that is the cause?

stmcginnis commented 4 years ago

Thanks. What we actually need is the chassis output itself though.

The one that is failing appears to be different in that it has a RAID chassis in addition to the server chassis. Can you get the JSON output from both of those?

NosIreland commented 4 years ago

Here they are. Chassis/1:

{
    "@odata.type": "#Chassis.v1_10_0.Chassis",
    "@odata.id": "/redfish/v1/Chassis/1",
    "Id": "1",
    "Name": "Computer System Chassis",
    "ChassisType": "RackMount",
    "SerialNumber": "XXXXXXXXXXXXXXX",
    "PartNumber": "CSE-116TS-R000WNBP2-1",
    "AssetTag": "",
    "IndicatorLED": "Off",
    "Status": {
        "State": "Enabled",
        "Health": "OK",
        "HealthRollup": "OK"
    },
    "PhysicalSecurity": {
        "IntrusionSensorNumber": 170,
        "IntrusionSensor": "Normal",
        "IntrusionSensorReArm": "Manual"
    },
    "Power": {
        "@odata.id": "/redfish/v1/Chassis/1/Power"
    },
    "PCIeDevices": {
        "@odata.id": "/redfish/v1/Chassis/1/PCIeDevices"
    },
    "Thermal": {
        "@odata.id": "/redfish/v1/Chassis/1/Thermal"
    },
    "NetworkAdapters": {
        "@odata.id": "/redfish/v1/Chassis/1/NetworkAdapters"
    },
    "PCIeSlots": {
        "@odata.id": "/redfish/v1/Chassis/1/PCIeSlots"
    },
    "Sensors": {
        "@odata.id": "/redfish/v1/Chassis/1/Sensors"
    },
    "Links": {
        "ComputerSystems": [
            {
                "@odata.id": "/redfish/v1/Systems/1"
            }
        ],
        "ManagedBy": [
            {
                "@odata.id": "/redfish/v1/Managers/1"
            }
        ],
        "ManagersInChassis": [
            {
                "@odata.id": "/redfish/v1/Managers/1"
            }
        ]
    },
    "Oem": {
        "Supermicro": {
            "@odata.type": "#SmcChassisExtensions.v1_0_0.Chassis",
            "BoardSerialNumber": "XXXXXXXXXX",
            "GUID": "42313031-4D53-3CEC-EF45-600400000000",
            "BoardID": "0x1b2b"
        }
    }
}

RAID:

{
    "@odata.type": "#Chassis.v1_9_1.Chassis",
    "@odata.id": "/redfish/v1/Chassis/HA-RAID.0.StorageEnclosure.0",
    "Id": "HA-RAID.0.StorageEnclosure.0",
    "Name": "Internal Enclosure 0",
    "ChassisType": "Enclosure",
    "Model": "Internal Enclosure",
    "SerialNumber": "",
    "PartNumber": "",
    "Links": {
        "ManagedBy": [
            {
                "@odata.id": "/redfish/v1/Managers/1"
            }
        ],
        "Storage": [
            {
                "@odata.id": "/redfish/v1/Systems/1/Storage/HA-RAID"
            }
        ],
        "Drives": [
            {
                "@odata.id": "/redfish/v1/Chassis/HA-RAID.0.StorageEnclosure.0/Drives/Disk.Bay.0"
            },
            {
                "@odata.id": "/redfish/v1/Chassis/HA-RAID.0.StorageEnclosure.0/Drives/Disk.Bay.1"
            },
            {
                "@odata.id": "/redfish/v1/Chassis/HA-RAID.0.StorageEnclosure.0/Drives/Disk.Bay.2"
            },
            {
                "@odata.id": "/redfish/v1/Chassis/HA-RAID.0.StorageEnclosure.0/Drives/Disk.Bay.3"
            }
        ]
    },
    "Oem": {}
}

stmcginnis commented 4 years ago

The RAID chassis response does not include all the properties that the normal server chassis returns. But according to the Redfish spec, it is returning at least all the minimum required properties, plus a few more. I tested with the gofish library that is used to communicate with the server, and it appears to handle things OK there. So the problem must be something in this repo's code.
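To make the difference between the two responses concrete, here is a small Go sketch that reports which properties are absent from a chassis body. It is an assumption-laden illustration: `missingProps` is a hypothetical helper, the property list is just the set of names that appear in Chassis/1 above, and the sample is a trimmed copy of the RAID enclosure response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// missingProps returns the names in want that are not present as
// top-level keys of the JSON object in body. Hypothetical helper.
func missingProps(body []byte, want []string) ([]string, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(body, &obj); err != nil {
		return nil, err
	}
	var missing []string
	for _, name := range want {
		if _, ok := obj[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

func main() {
	// Trimmed copy of the RAID enclosure response posted above.
	raid := []byte(`{
		"@odata.id": "/redfish/v1/Chassis/HA-RAID.0.StorageEnclosure.0",
		"Id": "HA-RAID.0.StorageEnclosure.0",
		"Name": "Internal Enclosure 0",
		"ChassisType": "Enclosure",
		"Oem": {}
	}`)
	// Properties that Chassis/1 returns and a collector might read.
	want := []string{"Status", "Power", "Thermal", "NetworkAdapters"}
	missing, err := missingProps(raid, want)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("missing:", missing)
}
```

Any property reported as missing here is one that client code should treat as optional before dereferencing it.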

stmcginnis commented 4 years ago

Issue appears to be here since the RAID chassis doesn't return Status:

https://github.com/jenningsloy318/redfish_exporter/blob/abb9b643115e562d17e5a65b1d8d495f78ca946c/collector/chassis_collector.go#L228-L230

stmcginnis commented 4 years ago

Actually, that looks like maybe a red herring. The stack trace looks like it's failing on Status.State, but this at least is also an issue:

https://github.com/jenningsloy318/redfish_exporter/blob/abb9b643115e562d17e5a65b1d8d495f78ca946c/collector/chassis_collector.go#L293

This chassis does not have NetworkAdapters, and there is no check for that like the one just above it for Thermal.
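The kind of guard being described might look like the following Go sketch. The `link` and `chassis` types and the `adapterPath` function are hypothetical stand-ins, not the gofish or redfish_exporter definitions, and the exporter's actual code at that line is not reproduced here. The point is only that an optional link is checked for nil before it is used.

```go
package main

import "fmt"

// link and chassis are hypothetical stand-ins for illustration only.
type link struct {
	ODataID string
}

type chassis struct {
	ID              string
	Thermal         *link // nil when the chassis reports no Thermal link
	NetworkAdapters *link // nil when the chassis reports no NetworkAdapters link
}

// adapterPath returns the NetworkAdapters link of c. The boolean is false
// when c is nil or the chassis did not report that property.
func adapterPath(c *chassis) (string, bool) {
	if c == nil || c.NetworkAdapters == nil {
		return "", false
	}
	return c.NetworkAdapters.ODataID, true
}

func main() {
	all := []*chassis{
		{ID: "1", NetworkAdapters: &link{ODataID: "/redfish/v1/Chassis/1/NetworkAdapters"}},
		{ID: "HA-RAID.0.StorageEnclosure.0"}, // no NetworkAdapters, as in the RAID response
	}
	for _, c := range all {
		if p, ok := adapterPath(c); ok {
			fmt.Println(c.ID, "->", p)
		} else {
			fmt.Println(c.ID, "has no NetworkAdapters, skipping")
		}
	}
}
```

Skipping the chassis instead of dereferencing a nil value keeps one incomplete member from crashing the whole collection run.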

stmcginnis commented 4 years ago

The stack trace points to parseCommonStatusState, but that looks fine to me. I added it to a local unit test that passes in the "missing" chassis.Status.State value (which is just ""), and it appears that the code should handle that case fine.

jenningsloy318 commented 4 years ago

I will look into the code next week and see if there is something I can improve.

jenningsloy318 commented 4 years ago

@NosIreland I updated the code in https://github.com/jenningsloy318/redfish_exporter/commit/54354ed4d518665a44c20ddd6527b25a16b39224 to add a check for an empty response. Please recompile and let me know if it is working now.

NosIreland commented 4 years ago

Recompiled and tested. The exporter no longer crashes ~~but neither it collects any data from the server. The only data that I get is the one provided by exporter(go_stats, etc) but no redfish data from the server itself.~~ and is working as expected so far. I will keep testing. Thank you.

jenningsloy318 commented 4 years ago

@NosIreland glad to help, I will close it