rootfs closed this issue 1 year ago.
Do you mean adding a new model that obtains power data through those devices and feeds it into the current Kepler model? Previously we only considered the machine itself, but now we also need to consider the energy of related devices?
The power consumption of a platform (i.e., a server) can be reported by the BMC/IPMI/HMC.
In our current implementation of Kepler, we collect the platform power consumption from the motherboard sensor (HMC), which is available in most modern servers. This sensor provides data on the power consumption of components directly attached to the motherboard, such as the CPU and memory. However, it may not include the power consumption of components like disks and GPUs.
Access to the motherboard sensor is possible via the ACPI interface within the machine or through IPMI, which reads the motherboard sensor via the BMC. We currently use the ACPI interface in Kepler, but in cases where the ACPI interface is disabled and IPMI is enabled, IPMI could be used instead.
It's important to note that the power consumption data obtained through IPMI may differ depending on whether the source is the BMC or an out-of-band management system that can consolidate the power consumption of different components, including the platform, disk, and GPU.
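A minimal sketch of reading the motherboard sensor through ACPI, assuming the Linux `acpi_power_meter` driver exposes it as `power1_average` (hwmon reports power in microwatts). The sysfs path below is an assumption that may differ between machines, and `read_platform_power_watts` is a hypothetical helper, not Kepler's code. Returning `None` marks the case where ACPI is unavailable and a caller could fall back to IPMI.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the ACPI power meter (ACPI000D); may differ per machine.
DEFAULT_POWER_FILE = Path(
    "/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI000D:00/power1_average"
)


def read_platform_power_watts(power_file: Path = DEFAULT_POWER_FILE) -> Optional[float]:
    """Return the platform power in watts, or None if the ACPI sensor is unavailable."""
    try:
        raw = power_file.read_text().strip()
    except OSError:
        # ACPI interface disabled or missing: the caller may fall back to IPMI.
        return None
    # hwmon power attributes are expressed in microwatts.
    return int(raw) / 1_000_000
```

Keeping the fallback decision in the caller lets the same helper serve machines with and without ACPI enabled.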
OK, that makes sense to me. I appreciate the detailed info!
Please see the joint message from the IPMI promoters here. Although IPMI v2.0 is a 10+-year-old spec, there are various open-source IPMI metrics exporters. Shall we directly support BMC Redfish integration for out-of-band (OOB) power monitoring? Another question is about metrics usage: BMC data is runtime transient power, not an aggregate, so how could it be used in Kepler?
IPMI or Redfish
I am OK with either direction.
BMC data is runtime transient power, not an aggregate; how could it be used in Kepler?
We do extrapolation (current power * elapsed time) and aggregate it in Kepler.
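The extrapolation can be sketched as follows, assuming each reading is multiplied by the time elapsed since the previous reading and added to a running total. `EnergyAccumulator` is a hypothetical name for illustration, not Kepler's implementation.

```python
from typing import Optional


class EnergyAccumulator:
    """Turn instantaneous power readings (W) into a cumulative energy counter (J)."""

    def __init__(self) -> None:
        self.total_joules = 0.0
        self._last_timestamp: Optional[float] = None

    def add_sample(self, power_watts: float, timestamp_s: float) -> float:
        if self._last_timestamp is None:
            # First reading only establishes the starting time.
            self._last_timestamp = timestamp_s
        elif timestamp_s > self._last_timestamp:
            # Extrapolate: current power * time elapsed since the previous reading.
            self.total_joules += power_watts * (timestamp_s - self._last_timestamp)
            self._last_timestamp = timestamp_s
        # Stale or out-of-order readings are ignored.
        return self.total_joules
```

The resulting counter behaves like the other `*_joules_total` metrics, so it can be aggregated the same way.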
Let's focus on Redfish first.
Some questions: 1) Are users OK with giving Kepler access to the BMC (out-of-band)? 2) Are we only assuming the bare-metal (BM) Kepler use case?
- Are the users OK with giving Kepler access to the BMC (out-of-band)?
That out-of-band architecture looks more secure than giving BMC access to each node.
- Are we only assuming the BM Kepler use case?
If we consider an external power source to be anything that powers the machine (the BMC for BM, or a hypervisor-level power source for VMs), then this architecture could work for both BM and VM.
1) I'm a bit confused about the BMC access. AFAIK, out-of-band access would require giving Kepler access to the BMC. Could you clarify which architecture you are referring to? Perhaps we can dive into this during the community meeting. 2) Yes, if we use out-of-band measurements, we can obtain both node and VM/BM power measurements through Kepler, but there is a risk of double-counting the idle power. The VM-BM mapping should be tracked carefully to avoid that.
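One possible way to use the VM-BM mapping so idle power is counted only once, sketched under assumptions: each host's measured power is split into an estimated idle part and a dynamic part, idle power is shared equally among the VMs on that host, and dynamic power is split by a per-VM weight such as CPU time. The function and its inputs are hypothetical; the point is that the VM shares sum back to the host reading (when it exceeds the idle estimate).

```python
from collections import defaultdict


def attribute_host_power(host_power, host_idle, vm_host, vm_weight):
    """Split each host's measured power among its VMs, counting idle power once.

    host_power: host -> measured power in W (e.g. from the BMC)
    host_idle:  host -> estimated idle power in W
    vm_host:    vm -> host it runs on (the VM-BM mapping)
    vm_weight:  vm -> weight used to split dynamic power (e.g. CPU time)
    """
    vms_by_host = defaultdict(list)
    for vm, host in vm_host.items():
        vms_by_host[host].append(vm)

    shares = {}
    for host, vms in vms_by_host.items():
        idle = host_idle[host]
        dynamic = max(host_power[host] - idle, 0.0)
        total_weight = sum(vm_weight[vm] for vm in vms)
        for vm in vms:
            # Idle power is divided equally, so it appears once per host in total.
            frac = vm_weight[vm] / total_weight if total_weight > 0 else 1 / len(vms)
            shares[vm] = idle / len(vms) + dynamic * frac
    return shares
```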
05/09 meeting:
Potential implementations:
Do you have any update on this issue?
I've just compared the power values from Kepler and Redfish. Even though the difference between them is not large, I think it's better to close the gap if I can. Is anyone working on it?
Brief report
Environment
Load
stress-ng -c <n> (n = 1..4)
Value of power
Findings
Screenshot of graph
Thanks @tiwatsuka, very interesting work.
Which color is Kepler and Redfish? Blue and green, respectively?
Is this the Kepler node power or the sum of all containers? Can you share your Prometheus query? Since the Prometheus query takes the average over a time window, we can expect some variation.
The OTHER part is the total power from the motherboard sensor (using ACPI API) less the RAPL power. Given that you're running a CPU-intensive application, the "OTHER" part of the power consumption should ideally be minimal and relatively constant. A disk or network-intensive workload might potentially impact the "OTHER" power consumption if the power drawn by the disk and network components is being accounted for by the motherboard sensor. However, I haven't personally tested this scenario.
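The decomposition above is simple arithmetic; a minimal sketch, assuming "RAPL power" means package plus DRAM power. Clamping the result at zero is an assumption of this sketch (sensors updating at different intervals could briefly make RAPL exceed the platform reading), and the function name is hypothetical.

```python
def other_power_watts(platform_watts: float, package_watts: float, dram_watts: float) -> float:
    """OTHER = platform power from the motherboard sensor minus RAPL (package + DRAM)."""
    rapl_watts = package_watts + dram_watts
    # Clamped at zero so a transient mismatch never yields negative power.
    return max(platform_watts - rapl_watts, 0.0)
```

For a CPU-bound load, this value would be expected to stay small and roughly constant while package power varies.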
Thank you @tiwatsuka! This is a very cool study. Kepler (blue) appears to match Redfish (green) most of the time, but there is some lag during major transitions. This is likely due to the difference in report intervals between the BMC and RAPL. On my setup (Dell), the report interval is 1 min.
# redfishtool -r xxxx -u xxxx -p xxxx raw GET /redfish/v1/Chassis/System.Embedded.1/Power/PowerControl
{
  "@odata.context": "/redfish/v1/$metadata#Power.Power",
  "@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Power#/PowerControl/0",
  "@odata.type": "#Power.v1_6_1.PowerControl",
  "MemberId": "0",
  "Name": "System Power Control",
  "PowerAllocatedWatts": 1536,
  "PowerAvailableWatts": 0,
  "PowerCapacityWatts": 1536,
  "PowerConsumedWatts": 389,
  "PowerLimit": {
    "CorrectionInMs": 0,
    "LimitException": "HardPowerOff",
    "LimitInWatts": 485
  },
  "PowerMetrics": {
    "AverageConsumedWatts": 389,
    "IntervalInMin": 1,
    "MaxConsumedWatts": 415,
    "MinConsumedWatts": 386
  },
  "PowerRequestedWatts": 1097,
  "RelatedItem": [
    {
      "@odata.id": "/redfish/v1/Chassis/System.Embedded.1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1"
    }
  ],
  "RelatedItem@odata.count": 2
}
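A sketch of extracting the reading from such a response with the standard `json` module. It accepts either a single PowerControl member (like the output above) or the Chassis Power resource, which wraps members in a "PowerControl" array; handling both shapes, and skipping members with a missing or null reading, are choices of this sketch rather than anything the Redfish servers in this thread were shown to require.

```python
import json


def power_consumed_watts(body: str) -> list:
    """Return the PowerConsumedWatts readings from a Redfish Power or PowerControl response."""
    doc = json.loads(body)
    # A Power resource holds an array of members; a single member is its own entry.
    members = doc["PowerControl"] if "PowerControl" in doc else [doc]
    readings = []
    for member in members:
        watts = member.get("PowerConsumedWatts")
        if watts is not None:
            readings.append(float(watts))
    return readings
```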
I have explored different ways of supporting Redfish, including the OpenAPI approach and gofish, but both appear to be overkill for our use case. I am going to support just the Power API in Kepler.
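To illustrate how small a Power-API-only client can be, here is a standard-library sketch using HTTP basic authentication. The URL layout follows the `/redfish/v1/Chassis/<id>/Power` path seen in this thread; the function name, the `verify_tls` switch, and the use of basic auth (some BMCs may prefer Redfish sessions) are assumptions.

```python
import base64
import json
import ssl
import urllib.request


def fetch_redfish_power(base_url, chassis_id, user, password,
                        verify_tls=True, timeout=10.0):
    """GET <base_url>/redfish/v1/Chassis/<chassis_id>/Power and return the parsed JSON."""
    url = f"{base_url.rstrip('/')}/redfish/v1/Chassis/{chassis_id}/Power"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    context = None
    if url.startswith("https://"):
        context = ssl.create_default_context()
        if not verify_tls:
            # BMCs often ship self-signed certificates; disabling checks is insecure.
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return json.load(response)
```

The returned dictionary's "PowerControl" array carries "PowerConsumedWatts" for each member.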
@marceloamaral The blue line is Kepler and the green is Redfish.
Here is the query. I simply copied it from the Kepler dashboard.
sum(irate(kepler_container_package_joules_total{container_namespace=~"$namespace"}[1m])) +
sum(irate(kepler_container_dram_joules_total{container_namespace=~"$namespace"}[1m])) +
sum(irate(kepler_container_other_host_components_joules_total{container_namespace=~"$namespace"}[1m]))
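For reference, `irate` over a `*_joules_total` counter yields joules per second, i.e. watts, computed from only the last two samples in the range, so summing the three terms gives power in watts. The sketch below reproduces that per-series calculation, including the Prometheus treatment of a counter reset (the counter is assumed to have restarted from zero); the function itself is illustrative.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp_s, value) samples of a counter.

    Mirrors Prometheus irate(): on a counter reset (value decreased), the
    counter is assumed to have restarted from zero.
    """
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        return None
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / (t_last - t_prev)
```

Because only the newest pair of samples matters, the result follows the latest change quickly, whereas `rate` would smooth over the whole `[1m]` window.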
AFAIK, the power from the BMC is AC power consumption, while the power from RAPL is DC power consumption. When the DC power required by the CPU increases, the AC-DC conversion loss also increases. If that is true and Kepler accounts for it, I guess the loss should be included in the "OTHER" part.
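The AC/DC reasoning can be made concrete with a simple model, assuming a fixed power-supply efficiency η. Real PSU efficiency varies with load, so the 0.8 used in the check is purely illustrative. Under this model the loss is P_ac − P_dc = P_dc(1/η − 1), which grows in proportion to the DC load and would therefore appear as extra "OTHER" power as CPU load rises.

```python
def ac_input_watts(dc_watts: float, efficiency: float) -> float:
    """AC power drawn at the wall for a DC load, given a fixed PSU efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return dc_watts / efficiency


def conversion_loss_watts(dc_watts: float, efficiency: float) -> float:
    """AC-DC conversion loss: P_ac - P_dc = P_dc * (1/efficiency - 1)."""
    return ac_input_watts(dc_watts, efficiency) - dc_watts
```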
@rootfs The interval is 20 minutes in my setup. I think this setting affects only the average, max, and min consumed watts, so "PowerConsumedWatts" can differ from "AverageConsumedWatts".
"PowerControl": [
  {
    "@odata.id": "/redfish/v1/Chassis/1/Power#PowerControl/0",
    "MemberId": "0",
    "PowerCapacityWatts": 500,
    "PowerConsumedWatts": 74,
    "PowerMetrics": {
      "AverageConsumedWatts": 39,
      "IntervalInMin": 20,
      "MaxConsumedWatts": 81,
      "MinConsumedWatts": 37
    }
  }
],
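To illustrate why "PowerConsumedWatts" can differ from "AverageConsumedWatts", here is a sketch of a trailing-window min/average/max, assuming the BMC keeps something similar over "IntervalInMin" (how a given BMC actually computes these fields is not specified in this thread). After a jump in load, the newest sample rises immediately while the 20-minute average moves slowly. `WindowStats` is a hypothetical name and the numbers in the check are illustrative.

```python
from collections import deque


class WindowStats:
    """Min/average/max of power samples over a trailing time window."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s  # must be > 0
        self.samples = deque()  # (timestamp_s, watts), oldest first

    def add(self, timestamp_s: float, watts: float) -> None:
        self.samples.append((timestamp_s, watts))
        # Drop samples that have fallen out of the trailing window.
        while self.samples[0][0] <= timestamp_s - self.window_s:
            self.samples.popleft()

    def latest(self) -> float:
        return self.samples[-1][1]

    def stats(self):
        values = [w for _, w in self.samples]
        return min(values), sum(values) / len(values), max(values)
```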
In my observation, power readings from the BMC usually lag by several seconds (even when I use ipmitool). However, I haven't verified this on much hardware, nor have I found a specification about it. The lag might lead to incorrect estimates when the load on a node changes frequently.
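One way to quantify such a lag on a given machine, sketched under assumptions: sample Kepler's node power and the BMC reading at the same regular interval, then search for the shift that best aligns the two series. Each overlapping window is mean-centered first so that a constant offset (for example AC vs DC readings) does not dominate. `estimate_lag` is a hypothetical helper, and mean absolute error is just one reasonable matching criterion.

```python
def estimate_lag(reference, delayed, max_lag):
    """Return the shift, in samples, that best aligns `delayed` with `reference`.

    Both series must share the same regular sampling interval. A result of k
    means delayed[t + k] tracks reference[t]. Each overlapping window is
    mean-centered so a constant offset between the series is ignored.
    """
    best_lag, best_err = 0, float("inf")
    for lag in range(max_lag + 1):
        pairs = list(zip(reference, delayed[lag:]))
        if len(pairs) < 2:
            break
        mean_a = sum(a for a, _ in pairs) / len(pairs)
        mean_b = sum(b for _, b in pairs) / len(pairs)
        err = sum(abs((a - mean_a) - (b - mean_b)) for a, b in pairs) / len(pairs)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag
```

Multiplying the result by the sampling interval converts it to seconds.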
@tiwatsuka Thanks for the info. We are working on BMC support. It is still early, but would you help review and test it in your environment? I don't have any HPE servers yet.
BMC support is finished.
Current Kepler Architecture
Out-of-band external power source support