mrlhansen / idrac_exporter

Simple iDRAC exporter for Prometheus
MIT License

When I enable the network metric, I receive this error log #92

Closed love6875520 closed 3 weeks ago

love6875520 commented 4 weeks ago

ERROR Error collecting metrics for host xx.xx.xxx.xxx: 3 error(s) occurred:

I use idrac_exporter to collect metrics from more than 400 Dell servers. When I enable the network metric, collection fails for more than 100 of them.

mrlhansen commented 4 weeks ago

Hi @love6875520

I need some additional information to start looking into this.

Based on this, I will probably need you to send me some JSON dumps from iDRAC later on.

love6875520 commented 4 weeks ago

The machine models include Dell PowerEdge R640, R650, and R840.

This is the verbose log, thanks.

2024-08-19T15:40:16.846 DEBUG Handling request from vloa-prometheus:9348 for host xx.xx.xxx.xx
2024-08-19T15:40:16.846 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1"
2024-08-19T15:40:17.014 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems"
2024-08-19T15:40:17.116 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis"
2024-08-19T15:40:17.228 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1"
2024-08-19T15:40:17.378 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1"
2024-08-19T15:40:17.652 DEBUG Collecting metrics for host xx.xx.xxx.xx
2024-08-19T15:40:17.652 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory"
2024-08-19T15:40:17.652 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1"
2024-08-19T15:40:17.652 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/NetworkInterfaces"
2024-08-19T15:40:17.652 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/Thermal"
2024-08-19T15:40:17.652 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Storage"
2024-08-19T15:40:17.652 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/Power"
2024-08-19T15:40:17.808 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.A6"
2024-08-19T15:40:18.332 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1"
2024-08-19T15:40:18.414 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/NetworkInterfaces/NIC.Integrated.1"
2024-08-19T15:40:18.430 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.B7"
2024-08-19T15:40:18.740 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Drives/Disk.Bay.0:Enclosure.Internal.0-1:RAID.Integrated.1-1"
2024-08-19T15:40:18.910 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Integrated.1/NetworkPorts"
2024-08-19T15:40:19.088 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Integrated.1/NetworkPorts/NIC.Integrated.1-1"
2024-08-19T15:40:19.239 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.A10"
2024-08-19T15:40:19.339 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Storage/AHCI.Embedded.1-1"
2024-08-19T15:40:19.534 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Storage/AHCI.Embedded.2-1"
2024-08-19T15:40:19.599 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.B2"
2024-08-19T15:40:19.629 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Integrated.1/NetworkPorts/NIC.Integrated.1-2"
2024-08-19T15:40:19.858 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Integrated.1/NetworkPorts/NIC.Integrated.1-3"
2024-08-19T15:40:19.868 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.A1"
2024-08-19T15:40:20.153 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.B6"
2024-08-19T15:40:20.220 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Integrated.1/NetworkPorts/NIC.Integrated.1-4"
2024-08-19T15:40:20.349 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.B5"
2024-08-19T15:40:20.381 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/NetworkInterfaces/NIC.Slot.2"
2024-08-19T15:40:20.469 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.B10"
2024-08-19T15:40:20.639 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.B3"
2024-08-19T15:40:20.644 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkPorts"
2024-08-19T15:40:20.752 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkPorts/NIC.Slot.2-1"
2024-08-19T15:40:20.843 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.B1"
2024-08-19T15:40:20.989 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.A3"
2024-08-19T15:40:21.011 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkPorts/NIC.Slot.2-2"
2024-08-19T15:40:21.122 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.A2"
2024-08-19T15:40:21.270 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.A7"
2024-08-19T15:40:21.369 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.A5"
2024-08-19T15:40:21.471 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.B4"
2024-08-19T15:40:21.575 DEBUG Querying url "https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/Memory/DIMM.Socket.A4"
2024-08-19T15:40:21.691 ERROR Error collecting metrics for host xx.xx.xxx.xx: 3 error(s) occurred:

mrlhansen commented 4 weeks ago

Thanks!

I have some R640s myself where the issue is not present, but it might depend on the hardware and the version of iDRAC. I need two JSON dumps, which can be obtained using curl.

curl -k -u USERNAME https://xx.xx.xxx.xx/redfish/v1/Systems/System.Embedded.1/NetworkInterfaces/NIC.Slot.2
curl -k -u USERNAME https://xx.xx.xxx.xx/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkPorts/NIC.Slot.2-1

Replace USERNAME with the username you use to log in to iDRAC, and put the real IP addresses back. The output should not contain anything sensitive, just state information for the cards.

love6875520 commented 4 weeks ago

The machine info is below:

Server model: PowerEdge R640
BIOS version: 2.3.10
iDRAC firmware version: 6.10.00.00

{"@odata.context":"/redfish/v1/$metadata#NetworkInterface.NetworkInterface","@odata.id":"/redfish/v1/Systems/System.Embedded.1/NetworkInterfaces/NIC.Slot.2","@odata.type":"#NetworkInterface.v1_2_1.NetworkInterface","Description":"Network Device View","Id":"NIC.Slot.2","Links":{"NetworkAdapter":{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2"}},"Name":"Network Device View","NetworkDeviceFunctions":{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkDeviceFunctions"},"NetworkPorts":{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkPorts"},"Status":{"Health":null,"HealthRollup":null,"State":"Enabled"}}

{"@odata.context":"/redfish/v1/$metadata#NetworkPort.NetworkPort","@odata.id":"/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkPorts/NIC.Slot.2-1","@odata.type":"#NetworkPort.v1_4_1.NetworkPort","ActiveLinkTechnology":"Ethernet","AssociatedNetworkAddresses":["14:02:EC:8D:D2:D4","14:02:EC:8D:D2:D5"],"CurrentLinkSpeedMbps":0,"Description":"Network Port View","EEEEnabled":null,"FlowControlConfiguration":null,"FlowControlStatus":null,"Id":"NIC.Slot.2","LinkStatus":null,"Name":"Network Port View","NetDevFuncMaxBWAlloc":[{"MaxBWAllocPercent":null,"NetworkDeviceFunction":{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkDeviceFunctions/NIC.Slot.2-1"}},{"MaxBWAllocPercent":null,"NetworkDeviceFunction":{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkDeviceFunctions/NIC.Slot.2-2"}}],"NetDevFuncMaxBWAlloc@odata.count":2,"NetDevFuncMinBWAlloc":[{"MinBWAllocPercent":null,"NetworkDeviceFunction":{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkDeviceFunctions/NIC.Slot.2-1"}},{"MinBWAllocPercent":null,"NetworkDeviceFunction":{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkDeviceFunctions/NIC.Slot.2-2"}}],"NetDevFuncMinBWAlloc@odata.count":2,"Oem":{},"PhysicalPortNumber":"2","Status":{"State":"Enabled","Health":null,"HealthRollup":null},"SupportedEthernetCapabilities":[],"SupportedEthernetCapabilities@odata.count":0,"SupportedLinkCapabilities":[{"AutoSpeedNegotiation":null,"LinkNetworkTechnology":"Ethernet","LinkSpeedMbps":0},{"AutoSpeedNegotiation":null,"LinkNetworkTechnology":"Ethernet","LinkSpeedMbps":0}],"SupportedLinkCapabilities@odata.count":2,"VendorId":"8086","WakeOnLANEnabled":null}

mrlhansen commented 4 weeks ago

The issue is that both ports are reported with NIC.Slot.2 as the id, but they should actually be reported as NIC.Slot.2-1 and NIC.Slot.2-2 - which is the case on my machines (which are running a newer version of both iDRAC and the UEFI). I can implement a fix for this particular edge case, so that the error is no longer reported. However, I can also see that seemingly no relevant information is being reported (no link status, health status, or link speed). So even if I implement the fix, there will be no useful information. This might change with a firmware update.
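To illustrate the mismatch, here is a minimal Go sketch that parses a trimmed copy of the NIC.Slot.2-1 dump above. It is not the code from the exporter or from the release. The type `networkPort` and the helpers `parsePort`, `portID` and `hasUsefulState` are hypothetical names chosen for this example, and taking the last segment of `@odata.id` as the port id is only one possible workaround, assumed here for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// networkPort holds only the fields of the Redfish NetworkPort dump used here.
// Pointer fields let us tell a JSON null apart from a real value.
type networkPort struct {
	ODataID      string  `json:"@odata.id"`
	ID           string  `json:"Id"`
	LinkStatus   *string `json:"LinkStatus"`
	CurrentSpeed *int    `json:"CurrentLinkSpeedMbps"`
	Status       struct {
		Health *string `json:"Health"`
	} `json:"Status"`
}

// sample is a trimmed copy of the NIC.Slot.2-1 dump posted in this thread.
const sample = `{"@odata.id":"/redfish/v1/Chassis/System.Embedded.1/NetworkAdapters/NIC.Slot.2/NetworkPorts/NIC.Slot.2-1","CurrentLinkSpeedMbps":0,"Id":"NIC.Slot.2","LinkStatus":null,"Status":{"State":"Enabled","Health":null,"HealthRollup":null}}`

// parsePort decodes a NetworkPort JSON document.
func parsePort(data []byte) (networkPort, error) {
	var p networkPort
	err := json.Unmarshal(data, &p)
	return p, err
}

// portID (hypothetical workaround) prefers the last path segment of @odata.id,
// which is unique per port, and falls back to Id when @odata.id is missing.
func portID(p networkPort) string {
	if p.ODataID != "" {
		return path.Base(p.ODataID)
	}
	return p.ID
}

// hasUsefulState reports whether the port carries a link status, a health
// status, or a non-zero link speed.
func hasUsefulState(p networkPort) bool {
	return p.LinkStatus != nil ||
		p.Status.Health != nil ||
		(p.CurrentSpeed != nil && *p.CurrentSpeed > 0)
}

func main() {
	p, err := parsePort([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println("reported Id:", p.ID)
	fmt.Println("id from @odata.id:", portID(p))
	fmt.Println("useful state present:", hasUsefulState(p))
}
```

For this dump the reported Id is NIC.Slot.2 while the id taken from `@odata.id` is NIC.Slot.2-1, and no useful state is present, which matches the observation that a workaround for the id would still yield no meaningful port metrics on this firmware.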

love6875520 commented 4 weeks ago

Thanks a lot!

mrlhansen commented 3 weeks ago

I made a new release that includes this fix.