prometheus-community / ipmi_exporter

Remote IPMI exporter for Prometheus
MIT License

Support for ipmi-fru module? #127

Open k0ste opened 1 year ago

k0ste commented 1 year ago

ipmi-fru displays Field Replaceable Unit (FRU) Information. The FRU may hold a variety of information, such as device information, hardware information, serial numbers, and part numbers.

This may be useful for determining PSU overall capacity and serial numbers. Currently we can't alert when power usage is near power capacity.

Example FRU for Intel platform:

FRU Inventory Device: Baseboard (ID 00h)

  FRU Chassis Type: Rack Mount Chassis
  FRU Chassis Part Number: R2208WT
  FRU Chassis Serial Number: BQWF80100215
  FRU Chassis Custom Info: ...............................
  FRU Chassis Custom Info: ...............................

  FRU Board Manufacturing Date/Time: 01/08/18 - 21:30:00
  FRU Board Manufacturer: Intel Corporation
  FRU Board Product Name: S2600WFT
  FRU Board Serial Number: BQWF80100215
  FRU Board Part Number: H48104-851
  FRU FRU File ID: FRU Ver 1.29

  FRU Product Manufacturer Name: Intel Corporation
  FRU Product Name: S2600WFT
  FRU Product Part/Model Number: H97440-005
  FRU Product Version: ....................
  FRU Product Serial Number: BQWT80400118
  FRU Product Asset Tag: ....................

FRU Inventory Device: Front Panel (ID 04h)

  FRU Board Manufacturing Date/Time: 12/23/17 - 14:18:00
  FRU Board Manufacturer: Intel Corporation
  FRU Board Product Name: FFPANEL
  FRU Board Serial Number: BQWL75000554
  FRU Board Part Number: H39380-171
  FRU FRU File ID: FRU Ver 0.01

FRU Inventory Device: HS Backplane 1 (ID 05h)

  FRU Error: board info area checksum invalid

FRU Inventory Device: PCIe Riser 1 (ID 0Bh)

  FRU Board Manufacturing Date/Time: 12/25/17 - 20:43:00
  FRU Board Manufacturer: Intel Corporation
  FRU Board Product Name: A2UL8RISER2
  FRU Board Serial Number: BQWL75106900
  FRU Board Part Number: H20087-171
  FRU FRU File ID: FRU Ver 0.01

FRU Inventory Device: PCIe Riser 2 (ID 0Ch)

  FRU Board Manufacturing Date/Time: 12/25/17 - 23:14:00
  FRU Board Manufacturer: Intel Corporation
  FRU Board Product Name: A2UL8RISER2
  FRU Board Serial Number: BQWL75106324
  FRU Board Part Number: H20087-171
  FRU FRU File ID: FRU Ver 0.01

FRU Inventory Device: PCIe Riser 3 (ID 0Dh)

  FRU Board Manufacturing Date/Time: 12/18/17 - 15:51:00
  FRU Board Manufacturer: Intel Corporation
  FRU Board Product Name: A2UX8X4RISER
  FRU Board Serial Number: BQWL74903545
  FRU Board Part Number: G94347-171
  FRU FRU File ID: FRU Ver 0.02

FRU Inventory Device: Pwr Supply 1 FRU (ID 02h)

  FRU Product Manufacturer Name: SOLUM CO., LTD.                  
  FRU Product Name: PSSF132202A
  FRU Product Part/Model Number: H79286-007
  FRU Product Version: 00A
  FRU Product Serial Number: CNS1322A4AHCA0031

  FRU Power Supply Overall Capacity: 1300 Watts
  FRU Power Supply Peak VA: 1440 VA
  FRU Power Supply Max Inrush Current: 35 Amps
  FRU Power Supply Inrush Interval: 5 ms
  FRU Power Supply Low End Input Voltage 1: 90000 mV
  FRU Power Supply High End Input Voltage 1: 140000 mV
  FRU Power Supply Low End Input Voltage 2: 180000 mV
  FRU Power Supply High End Input Voltage 2: 264000 mV
  FRU Power Supply Low End Acceptable Frequency: 47 Hz
  FRU Power Supply High End Acceptable Frequency: 63 Hz
  FRU Power Supply A/C Dropout Tolerance: 10 ms
  FRU Power Supply Predictive Fail Support: No
  FRU Power Supply Power Factor Correction Supported: Yes
  FRU Power Supply AutoSwitch Supprt: Yes
  FRU Power Supply Hot Swap Support: Yes
  FRU Power Supply Peak Capacity: 1440 Watts
  FRU Power Supply Hold Up Time: 10 s
  FRU Power Supply Voltage 1: 12V
  FRU Power Supply Voltage 2: 12V
  FRU Power Supply Total Combined Wattage: 1300 Watts

  FRU DC Output Output Number: 1
  FRU DC Output Output on Standy: No
  FRU DC Output Nominal Voltage: 12000 mV
  FRU DC Output Maximum Negative Voltage Deviation: 11400 mV
  FRU DC Output Maximum Positive Voltage Deviation: 12600 mV
  FRU DC Output Ripple and Noise pk-pk: 120 mV
  FRU DC Output Minimum Current Draw: 0 mA
  FRU DC Output Maximum Current Draw: 65535 mA

  FRU DC Output Output Number: 2
  FRU DC Output Output on Standy: Yes
  FRU DC Output Nominal Voltage: 12000 mV
  FRU DC Output Maximum Negative Voltage Deviation: 11400 mV
  FRU DC Output Maximum Positive Voltage Deviation: 12600 mV
  FRU DC Output Ripple and Noise pk-pk: 120 mV
  FRU DC Output Minimum Current Draw: 0 mA
  FRU DC Output Maximum Current Draw: 2100 mA

FRU Inventory Device: Pwr Supply 2 FRU (ID 03h)

  FRU Product Manufacturer Name: SOLUM CO., LTD.                  
  FRU Product Name: PSSF132202A
  FRU Product Part/Model Number: H79286-007
  FRU Product Version: 00A
  FRU Product Serial Number: CNS1322A4AHB20503

  FRU Power Supply Overall Capacity: 1300 Watts
  FRU Power Supply Peak VA: 1440 VA
  FRU Power Supply Max Inrush Current: 35 Amps
  FRU Power Supply Inrush Interval: 5 ms
  FRU Power Supply Low End Input Voltage 1: 90000 mV
  FRU Power Supply High End Input Voltage 1: 140000 mV
  FRU Power Supply Low End Input Voltage 2: 180000 mV
  FRU Power Supply High End Input Voltage 2: 264000 mV
  FRU Power Supply Low End Acceptable Frequency: 47 Hz
  FRU Power Supply High End Acceptable Frequency: 63 Hz
  FRU Power Supply A/C Dropout Tolerance: 10 ms
  FRU Power Supply Predictive Fail Support: No
  FRU Power Supply Power Factor Correction Supported: Yes
  FRU Power Supply AutoSwitch Supprt: Yes
  FRU Power Supply Hot Swap Support: Yes
  FRU Power Supply Peak Capacity: 1440 Watts
  FRU Power Supply Hold Up Time: 10 s
  FRU Power Supply Voltage 1: 12V
  FRU Power Supply Voltage 2: 12V
  FRU Power Supply Total Combined Wattage: 1300 Watts

  FRU DC Output Output Number: 1
  FRU DC Output Output on Standy: No
  FRU DC Output Nominal Voltage: 12000 mV
  FRU DC Output Maximum Negative Voltage Deviation: 11400 mV
  FRU DC Output Maximum Positive Voltage Deviation: 12600 mV
  FRU DC Output Ripple and Noise pk-pk: 120 mV
  FRU DC Output Minimum Current Draw: 0 mA
  FRU DC Output Maximum Current Draw: 65535 mA

  FRU DC Output Output Number: 2
  FRU DC Output Output on Standy: Yes
  FRU DC Output Nominal Voltage: 12000 mV
  FRU DC Output Maximum Negative Voltage Deviation: 11400 mV
  FRU DC Output Maximum Positive Voltage Deviation: 12600 mV
  FRU DC Output Ripple and Noise pk-pk: 120 mV
  FRU DC Output Minimum Current Draw: 0 mA
  FRU DC Output Maximum Current Draw: 2100 mA

Example FRU for Dell platform:

FRU Inventory Device: System Board (ID 00h)

  FRU Board Manufacturing Date/Time: 07/24/18 - 13:20:00
  FRU Board Manufacturer: DELL
  FRU Board Product Name: PowerEdge R540                  
  FRU Board Serial Number: CNFCP0087J01O7
  FRU Board Part Number: 0NJK2FA03
  FRU FRU File ID: 00h

  FRU Product Manufacturer Name: DELL
  FRU Product Name: PowerEdge R540                  
  FRU Product Part/Model Number: 
  FRU Product Version: 01
  FRU Product Serial Number: 9HJNYR2                         
  FRU Product Asset Tag:                                                                

FRU Inventory Device: PS1 (ID 01h)

  FRU Board Manufacturing Date/Time: 04/01/16 - 17:55:00
  FRU Board Manufacturer: DELL
  FRU Board Product Name: PWR SPLY,750W,RDNT,DELTA      
  FRU Board Serial Number: CN1797263S3M1T
  FRU Board Part Number: 0V1YJ6A00
  FRU FRU File ID: 5

  FRU Error: multirecord area checksum invalid

FRU Inventory Device: PS2 (ID 02h)

  FRU Board Manufacturing Date/Time: 04/27/16 - 06:06:00
  FRU Board Manufacturer: DELL
  FRU Board Product Name: PWR SPLY,750W,RDNT,ARTESYN    
  FRU Board Serial Number: PH1629864R00F5
  FRU Board Part Number: 09K04TA00
  FRU FRU File ID: 5

  FRU Error: multirecord area checksum invalid

FRU Inventory Device: BP1 (ID 0Dh)

  FRU Board Manufacturing Date/Time: 03/21/18 - 01:16:00
  FRU Board Manufacturer: DELL
  FRU Board Product Name: DRIVE BACKPLANE                 
  FRU Board Serial Number: CNIVC0083D0309
  FRU Board Part Number: 08N0NGA00
  FRU FRU File ID: 00h

FRU Inventory Device: NDC (ID 04h)

FRU Inventory Device: PERC1 (ID 0Ah)

  FRU Board Manufacturing Date/Time: 06/10/20 - 16:41:00
  FRU Board Manufacturer: DELL
  FRU Board Product Name: Dell Storage Cntlr. H730P - Adp.
  FRU Board Serial Number: CNFCP0006902VT
  FRU Board Part Number: 0XYHWNA01
  FRU FRU File ID: 00h

FRU Inventory Device: OEM fru (ID 11h)

FRU Inventory Device: BP0 (ID 0Ch)

FRU Inventory Device: OCP Mezz (ID 07h)

  FRU Board Manufacturing Date/Time: 05/20/21 - 07:37:00
  FRU Board Manufacturer: DELL
  FRU Board Product Name: BRCM 10GbE 2P 57416 LOM SFP     
  FRU Board Serial Number: CNFCP0015I00BV
  FRU Board Part Number: 0CF4P0A01
  FRU FRU File ID: 00h

FRU Inventory Device: PERC2 (ID 0Bh)

Example FRU for Quanta platform:

FRU Inventory Device: MB_FRU (ID 00h)

  FRU Chassis Type: Rack Mount Chassis
  FRU Chassis Part Number: ---
  FRU Chassis Serial Number: QTFCR2815009D
  FRU Chassis Custom Info: ---
  FRU Chassis Custom Info: ---

  FRU Board Manufacturing Date/Time: 04/09/18 - 09:13:00
  FRU Board Manufacturer: Quanta Cloud Technology Inc.
  FRU Board Product Name: S5BQ-MB (LBG-1G)
  FRU Board Serial Number: NT981100027
  FRU Board Part Number: 31S5BMB0040
  FRU FRU File ID: V0.45
  FRU Board Custom Info: ---
  FRU Board Custom Info: ---
  FRU Board Custom Info: 05

  FRU Product Manufacturer Name: Quanta Cloud Technology Inc.
  FRU Product Name: QuantaGrid D52BQ-2U
  FRU Product Part/Model Number: ---
  FRU Product Version: ---
  FRU Product Serial Number: QTFCR2815009D
  FRU Product Asset Tag: ---
  FRU FRU File ID: ---
  FRU Product Custom Info: ---
  FRU Product Custom Info: ---
  FRU Product Custom Info: ---

FRU Inventory Device: FP_FRU (ID 01h)

  FRU Chassis Type: Rack Mount Chassis
  FRU Chassis Part Number: ---
  FRU Chassis Serial Number: QTFCR2815009D
  FRU Chassis Custom Info: ---
  FRU Chassis Custom Info: ---

  FRU Board Manufacturing Date/Time: 04/12/18 - 08:25:00
  FRU Board Manufacturer: Quanta Cloud Technology Inc.
  FRU Board Product Name: S5BQ-FP
  FRU Board Serial Number: KT981100136
  FRU Board Part Number: ---
  FRU FRU File ID: V0.47
  FRU Board Custom Info: ---
  FRU Board Custom Info: ---
  FRU Board Custom Info: ---

  FRU Product Manufacturer Name: Quanta Cloud Technology Inc.
  FRU Product Name: QuantaGrid D52BQ-2U (3.5 LFF_SATA_SAS)
  FRU Product Part/Model Number: ---
  FRU Product Version: ---
  FRU Product Serial Number: QTFCR2815009D
  FRU Product Asset Tag: ---
  FRU FRU File ID: ---
  FRU Product Custom Info: 03
  FRU Product Custom Info: ---
  FRU Product Custom Info: ---

FRU Inventory Device: NIC_FRU (ID 0Ah)

  FRU Board Manufacturing Date/Time: 07/14/17 - 12:15:00
  FRU Board Manufacturer: Quanta Cloud Technology Inc.
  FRU Board Product Name: ON 1GbE I357-T4
  FRU Board Serial Number: 6R272600040
  FRU Board Part Number: 3GS5BMA00B0
  FRU FRU File ID: V0.03
  FRU Board Custom Info: 00000002

  FRU OEM Manufacturer ID: Quanta Computer Inc. (1C4Ch)
  FRU OEM Data: 05h 02h

bitfehler commented 1 year ago

As far as I can tell, this is all static information, right? I'd say most of this is not really a good fit for a time series database. That said, I understand it could be useful to have something like a serial number as a label someplace, so I am not generally opposed to making this available as opt-in.

Currently we can't alert when power usage is near power capacity

Which power capacity do you mean? Are your PSUs unable to handle the full load of the unit? From my experience, the bottleneck is rather the power capacity per rack or feed, which you just have to know. Can you roughly describe what kind of alert you would set up if you had all this data available?

k0ste commented 1 year ago

As far as I can tell, this is all static information, right? I'd say most of this is not really a good fit for a time series database. That said, I understand it could be useful to have something like a serial number as a label someplace, so I am not generally opposed to making this available as opt-in.

Meta information is fine in a TSDB. If it weren't, how would disk monitoring work? Each disk has a meta metric that carries the rotation speed, interface, manufacturer, and serial number. Alerting without this information would be impossible. Opt-in, of course, like any module in ipmi_exporter.

Which power capacity do you mean? Are your PSUs unable to handle the full load of the unit? From my experience, the bottleneck is rather the power capacity per rack or feed, which you just have to know. Can you roughly describe what kind of alert you would set up if you had all this data available?

I agree, the power capacity of the rack is the main bottleneck. Still, it is good to know when, for example, the current load of a PSU is near 100% of its capacity. This is useful for systems with 8 GPUs or 4 sockets. We don't know which PSUs the assemblers supplied, so it would be convenient to have the PSU overall capacity.
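
As a rough illustration of the check described here, the sketch below parses a capacity value in the form ipmi-fru prints for "FRU Power Supply Overall Capacity" and compares it with a power reading. The function names and the 1170 W reading are hypothetical; where the live reading comes from (a sensor, DCMI, or something else) is left open.

```python
import re


def parse_watts(value: str) -> float:
    """Parse an ipmi-fru style value such as '1300 Watts' into watts."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*Watts\s*", value)
    if match is None:
        raise ValueError(f"not a wattage: {value!r}")
    return float(match.group(1))


def psu_load_ratio(current_draw_watts: float, capacity_watts: float) -> float:
    """Return the fraction of PSU capacity currently in use."""
    if capacity_watts <= 0:
        raise ValueError("capacity must be positive")
    return current_draw_watts / capacity_watts


# Capacity string as shown in the Intel example; the draw is a made-up reading.
capacity = parse_watts("1300 Watts")
ratio = psu_load_ratio(1170.0, capacity)
print(f"PSU load: {ratio:.0%}")
```

An alert would then fire when this ratio stays above some chosen threshold; in Prometheus the same division would be written as a rule over the two exported series rather than in Python.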

robbat2 commented 12 months ago

@k0ste do you have an idea/opinions on how you'd shape this into metrics & labels?

1 metric per ID, with selected fields filled in.

# split over multiple lines here for readability in discussion only
ipmi_fru_info{
id="00h",
FRU_Inventory_Device="BMC FRU (ID 00h)",
FRU_Chassis_Type="Other",
FRU_Chassis_Part_Number="CSE-xxx-xxxx",
FRU_Chassis_Serial_Number="Cxxxxx",
FRU_Board_Manufacturing_Date_Time="mm/dd/yy - hh:mm:ss",
FRU_Board_Manufacturer="Supermicro",
FRU_Board_Product_Name="X13xxx",
FRU_Board_Serial_Number="xxx",
FRU_Board_Part_Number="xxx",
FRU_Product_Manufacturer_Name="Supermicro",
FRU_Product_Part_Model_Number="SYS-xxx",
FRU_Product_Serial_Number="xxx",
} 1

I absolutely do want the FRU functionality, to answer some fleet questions.
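
A minimal sketch of how one such info metric per FRU ID could be rendered, assuming the fields have already been parsed into a dictionary. The helper names are hypothetical, and lower-casing the label names is a stylistic choice of this sketch, not something the proposal requires. Characters such as "/" and spaces are replaced because unquoted Prometheus label names may only contain letters, digits, and underscores.

```python
import re


def label_name(field: str) -> str:
    """Turn an ipmi-fru field name into a valid Prometheus label name."""
    name = re.sub(r"[^a-zA-Z0-9_]", "_", field.strip())
    name = re.sub(r"_+", "_", name).strip("_").lower()
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def escape_label_value(value: str) -> str:
    """Escape backslash, newline, and double quote for the text format."""
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def render_info_metric(metric: str, fields: dict) -> str:
    """Render a single info-style sample with constant value 1."""
    labels = ",".join(
        f'{label_name(key)}="{escape_label_value(value.strip())}"'
        for key, value in fields.items()
    )
    return f"{metric}{{{labels}}} 1"


fields = {
    "id": "00h",
    "FRU Board Manufacturer": "Supermicro",
    "FRU Product Part/Model Number": "SYS-xxx",
}
print(render_info_metric("ipmi_fru_info", fields))
```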

k0ste commented 12 months ago

@robbat2, it seems to me that at the initial stage we need to be more restrained in metrics. I would start with:

fru_baseboard_info{product_name="S2600WFT", part_number="R2208WT", serial_number="BQWF80100215", manufacturer="Intel Corporation"}
fru_baseboard_info{product_name="PowerEdge R540", part_number="0NJK2FA03", serial_number="CNFCP0087J01O7", manufacturer="DELL"}
fru_power_supply_info{part_number="H79286-007", serial_number="CNS1322A4AHCA0031", manufacturer="SOLUM CO., LTD."}
fru_power_supply_info{part_number="0V1YJ6A00", serial_number="CN1797263S3M1T", manufacturer="DELL"}

Something like that is more than enough to start. After this we can check the code against a zoo of bare-metal servers.
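
The restrained variant could be sketched as below: only a small, fixed set of board fields is mapped to short label names, padded values are trimmed, and empty ones are skipped. The mapping table and function names are assumptions for illustration, and the sample value of 1 is the usual convention for info-style metrics rather than part of the lines above.

```python
# Hypothetical mapping from ipmi-fru field names to the short labels above.
BOARD_LABELS = {
    "FRU Board Product Name": "product_name",
    "FRU Board Part Number": "part_number",
    "FRU Board Serial Number": "serial_number",
    "FRU Board Manufacturer": "manufacturer",
}


def select_labels(fields: dict, mapping: dict) -> dict:
    """Keep only mapped fields; trim padding and skip empty values."""
    labels = {}
    for field, label in mapping.items():
        value = fields.get(field, "").strip()
        if value:
            labels[label] = value
    return labels


def escape(value: str) -> str:
    """Escape backslash, newline, and double quote for the text format."""
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')


def format_sample(metric: str, labels: dict) -> str:
    """Render one sample line with constant value 1."""
    body = ",".join(f'{key}="{escape(value)}"' for key, value in labels.items())
    return f"{metric}{{{body}}} 1"


# Board fields copied from the Dell example, including the trailing padding.
dell_board = {
    "FRU Board Manufacturer": "DELL",
    "FRU Board Product Name": "PowerEdge R540                  ",
    "FRU Board Serial Number": "CNFCP0087J01O7",
    "FRU Board Part Number": "0NJK2FA03",
    "FRU FRU File ID": "00h",
}
print(format_sample("fru_baseboard_info", select_labels(dell_board, BOARD_LABELS)))
```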

robbat2 commented 12 months ago

Yes, I wanted to be more constrained in metrics, taking only id=00h, but the catch is that each FRU item has many fields.

Here's my ipmi-fru example from a recent Supermicro system; it only has id=00h, no other IDs. Of specific note, there are 3 distinct part and serial numbers in this FRU record.

$ ipmi-fru
FRU Inventory Device: BMC FRU (ID 00h)

  FRU Chassis Type: Other
  FRU Chassis Part Number: CSE................
  FRU Chassis Serial Number: C..............

  FRU Board Manufacturing Date/Time: ...................
  FRU Board Manufacturer: Supermicro
  FRU Board Product Name: X13......
  FRU Board Serial Number: WM..........
  FRU Board Part Number: X13......

  FRU Product Manufacturer Name: Supermicro
  FRU Product Part/Model Number: SYS........
  FRU Product Serial Number: S..............

At the bare minimum, I'd want this exported as:

# split over multiple lines here for readability in discussion only
ipmi_fru_info{
id="00h",
FRU_Chassis_Part_Number="CSE-....",
FRU_Chassis_Serial_Number="C...",
FRU_Board_Serial_Number="WM...",
FRU_Board_Part_Number="X13...",
FRU_Product_Part_Number="SYS-....",
FRU_Product_Serial_Number="S.....",
} 1

I wish ipmi-fru had a JSON output
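
In the absence of JSON output, the text format shown in this thread is regular enough to convert by hand. The following is only a sketch based on the samples pasted above: it assumes every device starts with a "FRU Inventory Device: NAME (ID xxh)" line and that every field is a "key: value" line, and it has not been checked against other ipmi-fru versions or options. The function name and the JSON layout are made up for illustration.

```python
import json
import re

DEVICE_RE = re.compile(
    r"^FRU Inventory Device: (?P<name>.+) \(ID (?P<id>[0-9A-Fa-f]{2}h)\)\s*$"
)


def parse_ipmi_fru(text: str) -> list:
    """Parse ipmi-fru text output into a list of per-device dictionaries."""
    devices = []
    current = None
    for line in text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = {"name": header["name"], "id": header["id"], "fields": {}}
            devices.append(current)
            continue
        stripped = line.strip()
        if current is None or ":" not in stripped:
            continue  # blank lines, shell prompts, anything before a device
        # Split on the first colon only, so timestamps in values stay intact.
        key, _, value = stripped.partition(":")
        # Keys can repeat (e.g. "FRU Chassis Custom Info"), so collect lists.
        current["fields"].setdefault(key.strip(), []).append(value.strip())
    return devices


# Excerpt copied from the Quanta example earlier in the thread.
sample = """\
FRU Inventory Device: MB_FRU (ID 00h)

  FRU Chassis Type: Rack Mount Chassis
  FRU Chassis Part Number: ---
  FRU Chassis Serial Number: QTFCR2815009D
  FRU Chassis Custom Info: ---
  FRU Chassis Custom Info: ---
"""
print(json.dumps(parse_ipmi_fru(sample), indent=2))
```

Values are kept as lists because the same key can appear more than once within one device, as the custom info fields and the per-output DC records in the examples show.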

k0ste commented 12 months ago

Our SMCs have only this:

FRU Inventory Device: BMC FRU (ID 00h)

  FRU Board Manufacturing Date/Time: 08/11/21 - 10:00:00
  FRU Board Manufacturer: Supermicro
  FRU Board Product Name: X11DPi-NT
  FRU Board Serial Number: NM217S007782

  FRU Product Serial Number:

🫠

ilanni2460 commented 7 months ago

Hi everyone, when will this FRU functionality be supported? I am currently unable to obtain the server's serial number through ipmi_exporter, or other hardware information such as the number of hard disks, memory, and so on.