Open k0ste opened 2 years ago
As far as I can tell, this is all static information, right? I'd say most of this is not really a good fit for a time series database. That said, I understand it could be useful to have something like a serial number as a label someplace, so I am not generally opposed to making this available as opt-in.
> Currently we can't alert when power usage is near power capacity
Which power capacity do you mean? Are your PSUs unable to handle the full load of the unit? In my experience, the bottleneck is rather the power capacity per rack or feed, which you just have to know. Can you roughly describe what kind of alert you would set up if you had all this data available?
> As far as I can tell, this is all static information, right? I'd say most of this is not really a good fit for a time series database. That said, I understand it could be useful to have something like a serial number as a label someplace, so I am not generally opposed to making this available as opt-in.
Meta information is fine in a TSDB. If it weren't, how would disk monitoring work? Each disk has a meta metric that carries the rotation speed, interface, manufacturer, and serial number. Alerting without this information would be impossible. Opt-in, of course, like any module in ipmi_exporter.
> Which power capacity do you mean? Are your PSUs unable to handle the full load of the unit? In my experience, the bottleneck is rather the power capacity per rack or feed, which you just have to know. Can you roughly describe what kind of alert you would set up if you had all this data available?
I agree with that: the power capacity of the rack is the main bottleneck. Still, it is good to know when the current load on a PSU is close to 100% of the PSU's capacity. This is useful information on 8-GPU or 4-socket systems. We don't know which PSUs the assemblers supplied us, so it would be convenient to have the PSU's overall capacity.
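As a rough illustration of the check described above, here is a minimal Python sketch. The function names and the 0.9 threshold are hypothetical; it assumes the current draw comes from an existing power reading and the rated capacity would come from FRU data, neither of which is exported under a name confirmed in this thread.

```python
def psu_load_ratio(draw_watts: float, capacity_watts: float) -> float:
    """Return the current draw as a fraction of the PSU's rated capacity."""
    if capacity_watts <= 0:
        raise ValueError("capacity_watts must be positive")
    return draw_watts / capacity_watts


def psu_near_capacity(draw_watts: float, capacity_watts: float,
                      threshold: float = 0.9) -> bool:
    """True when the draw is at or above `threshold` (assumed 90%) of capacity."""
    return psu_load_ratio(draw_watts, capacity_watts) >= threshold
```

In a real setup the same comparison would more likely live in an alerting rule than in code; the sketch only shows that the rule needs both values per PSU.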
@k0ste do you have an idea/opinions on how you'd shape this into metrics & labels?
1 metric per ID, with selected fields filled in.
# split over multiple lines here for readability in discussion only
ipmi_fru_info{
id="00h",
FRU_Inventory_Device="BMC FRU (ID 00h)",
FRU_Chassis_Type="Other",
FRU_Chassis_Part_Number="CSE-xxx-xxxx",
FRU_Chassis_Serial_Number="Cxxxxx",
FRU_Board_Manufacturing_Date_Time="mm/dd/yy - hh:mm:ss",
FRU_Board_Manufacturer="Supermicro",
FRU_Board_Product_Name="X13xxx",
FRU_Board_Serial_Number="xxx",
FRU_Board_Part_Number="xxx",
FRU_Product_Manufacturer_Name="Supermicro",
FRU_Product_Part_Model_Number="SYS-xxx",
FRU_Product_Serial_Number="xxx",
} 1
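Field names as printed by ipmi-fru contain spaces and slashes, which are not valid in classic Prometheus label names (`[a-zA-Z_][a-zA-Z0-9_]*`). The following Python sketch shows one way to turn such fields into an info-style sample line. All function names are hypothetical, and this is not how ipmi_exporter implements anything.

```python
import re


def sanitize_label_name(field: str) -> str:
    """Map an ipmi-fru field name to a classic Prometheus label name."""
    name = re.sub(r"[^a-zA-Z0-9_]", "_", field.strip())
    name = re.sub(r"_+", "_", name).strip("_")
    if not name or name[0].isdigit():
        name = "_" + name  # label names must not start with a digit
    return name


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline for the text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_info_metric(metric: str, fields: dict) -> str:
    """Render one sample line with a constant value of 1."""
    labels = ",".join(
        f'{sanitize_label_name(k)}="{escape_label_value(v)}"'
        for k, v in fields.items()
    )
    return f"{metric}{{{labels}}} 1"


print(render_info_metric("ipmi_fru_info", {
    "id": "00h",
    "FRU Board Manufacturing Date/Time": "mm/dd/yy - hh:mm:ss",
}))
```

The sketch keeps the original letter case to match the proposal above; lower-casing the names would be an equally reasonable choice.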
I absolutely do want the FRU functionality, to answer some fleet questions.
@robbat2, it seems to me that at the initial stage we need to be more restrained in metrics. I would start with:
fru_baseboard_info{product_name="S2600WFT", part_number="R2208WT", serial_number="BQWF80100215", manufacturer="Intel Corporation"}
fru_baseboard_info{product_name="PowerEdge R540", part_number="0NJK2FA03", serial_number="CNFCP0087J01O7", manufacturer="DELL"}
fru_power_supply_info{part_number="H79286-007", serial_number="CNS1322A4AHCA0031", manufacturer="SOLUM CO., LTD."}
fru_power_supply_info{part_number="0V1YJ6A00", serial_number="CN1797263S3M1T", manufacturer="DELL"}
Something like that is more than enough to start. After that we can check the code against a zoo of bare-metal servers.
yes, I wanted to be more constrained in metrics, taking only id=00h, but the catch is that each FRU item has many fields.
Here's my ipmi-fru example from a recent Supermicro system; it only has id=00h, no other IDs. Of specific note, there are 3 distinct part and serial numbers in this FRU record.
$ ipmi-fru
FRU Inventory Device: BMC FRU (ID 00h)
FRU Chassis Type: Other
FRU Chassis Part Number: CSE................
FRU Chassis Serial Number: C..............
FRU Board Manufacturing Date/Time: ...................
FRU Board Manufacturer: Supermicro
FRU Board Product Name: X13......
FRU Board Serial Number: WM..........
FRU Board Part Number: X13......
FRU Product Manufacturer Name: Supermicro
FRU Product Part/Model Number: SYS........
FRU Product Serial Number: S..............
At the bare minimum, I'd want this exported as:
# split over multiple lines here for readability in discussion only
ipmi_fru_info{
id="00h",
FRU_Chassis_Part_Number="CSE-....",
FRU_Chassis_Serial_Number="C...",
FRU_Board_Serial_Number="WM...",
FRU_Board_Part_Number="X13...",
FRU_Product_Part_Number="SYS-....",
FRU_Product_Serial_Number="S.....",
} 1
I wish ipmi-fru had a JSON output.
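Until something like that exists, the `Key: Value` text can be converted by hand. Below is a minimal Python sketch that assumes the output looks like the samples in this thread: a `FRU Inventory Device: ... (ID xxh)` line starts each device, followed by one field per line. The parser name and regular expression are hypothetical, and real output may contain lines this does not handle.

```python
import json
import re

# Matches e.g. "FRU Inventory Device: BMC FRU (ID 00h)"
DEVICE_RE = re.compile(
    r"^FRU Inventory Device: (?P<name>.*) \(ID (?P<id>[0-9A-Fa-f]+h)\)$"
)


def parse_ipmi_fru(text: str) -> dict:
    """Group "Key: Value" lines under the device ID they belong to."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = DEVICE_RE.match(line)
        if match:
            current = {"FRU Inventory Device": match.group("name")}
            devices[match.group("id")] = current
            continue
        if current is None or ":" not in line:
            continue  # skip anything before the first device or without a key
        # Split on the first colon only, so values such as times stay intact.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    return devices


sample = """\
FRU Inventory Device: BMC FRU (ID 00h)
FRU Board Manufacturer: Supermicro
FRU Board Product Name: X11DPi-NT
FRU Product Serial Number:
"""
print(json.dumps(parse_ipmi_fru(sample), indent=2))
```

Empty fields are kept as empty strings so that the caller can decide whether to drop them.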
Our SMCs have only this:
FRU Inventory Device: BMC FRU (ID 00h)
FRU Board Manufacturing Date/Time: 08/11/21 - 10:00:00
FRU Board Manufacturer: Supermicro
FRU Board Product Name: X11DPi-NT
FRU Board Serial Number: NM217S007782
FRU Product Serial Number:
🫠
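Sparse records like this one suggest that an exporter cannot rely on any single field being filled in. A small Python sketch of a fallback follows; the preference order (product, then board, then chassis) and the function name are assumptions, not something agreed on in this thread.

```python
from typing import Optional

# Assumed preference order; adjust to whatever identifies a server in your fleet.
SERIAL_FIELDS = (
    "FRU Product Serial Number",
    "FRU Board Serial Number",
    "FRU Chassis Serial Number",
)


def pick_serial(fields: dict) -> Optional[str]:
    """Return the first non-empty serial number, or None if all are missing."""
    for key in SERIAL_FIELDS:
        value = fields.get(key, "").strip()
        if value:
            return value
    return None
```

Note that in Prometheus a label with an empty value is treated the same as a missing label, so dropping empty fields before export loses nothing.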
Hi everyone, when will this FRU function be supported? I am currently unable to obtain the server's serial number through ipmi_exporter, or other hardware information such as the number of hard disks and memory.
ipmi-fru displays Field Replaceable Unit (FRU) Information. The FRU may hold a variety of information, such as device information, hardware information, serial numbers, and part numbers.
This may be useful for determining overall PSU capacity and serial numbers. Currently we can't alert when power usage is near power capacity.
Example FRU for Intel platform:
Example FRU for Dell platform:
Example FRU for Quanta platform: