openbmc / phosphor-pid-control

OpenBMC PID-based Thermal Control Daemon
Apache License 2.0
18 stars 21 forks source link

Add new class of PID loop for using power/wattage as input #24

Closed Krellan closed 1 year ago

Krellan commented 1 year ago

As of now, the class of PID loop temp searches only the sensors with temperature in their D-Bus object path, for example an input sensor with name ChassisTemp will be read from this object path: /xyz/openbmc_project/sensors/temperature/ChassisTemp

While this works great for temperature sensors, it doesn't work for power sensors. There has been discussion about wanting to use power (wattage) sensors as input.

I propose adding a new Class choice, namely power, for reading power sensors instead of temperature sensors.

The semantics would otherwise be the same as temp, namely, a higher reading would be understood to indicate a hotter system, and when multiple inputs are given, the hottest of them all would be considered to be the final input.

The only functional change would be the D-Bus object path search directory that the sensors would be taken from. A power sensor with name ChassisPowerIn would be read from this object path, for example: /xyz/openbmc_project/sensors/power/ChassisPowerIn

Here's an example of a JSON stanza:

{
  ...
  "Type": "Pid",
  "Class": "power",
  "Inputs": [
    "left_psu_pin",
    "right_psu_pin"
  ],
  ...
}

In this example, left_psu_pin and right_psu_pin would each have their power looked at, and the highest power reading would be taken as the final input to this PID loop.

It can also be useful to add together power readings, for a combined power total. This is a completely new idea for power, as this is something that isn't useful at all for temperature, because temperatures can't be added together (except in Kelvin, but that's another story).

I propose another class of PID loop, powersum, for this new feature. Instead of simply taking the maximum over all of the inputs, all of the individual input values of all the listed inputs will be summed together, and that sum will be taken as the final input to this PID loop.

Here's an example:

{
  ...
  "Type": "Pid",
  "Class": "powersum",
  "Inputs": [
    "left_psu_pin",
    "right_psu_pin"
  ],
  ...
}

In this example, the values of left_psu_pin and right_psu_pin will be summed together. So, if left_psu_pin is 400 watts, and right_psu_pin is 500 watts, then the sum total of 900 will be taken in and used as the final input to this PID loop.

How does this sound?

harveyquta commented 1 year ago

I don't really understand why need to add power to PID loop. If power reading > setpoint Watts, PWM goes up, but psu power reading will keep increasing. It seems weird. Or....you want to reduce the output PWM if power > setpoint watts? If so, hmm...actually, I cannot imagine the final result after this feature adds to PID control currently. If reducing output PWM because of power > setpoint, could the sensor overheat?

Krellan commented 1 year ago

The purpose of this is to allow the fans to react to wattage going up, in the same way that they currently react to temperature going up.

It's true, that the fans will contribute to the total power reading, if the system is configured to measure total power. That could result in a positive feedback loop, as you said. However, this should not happen in a properly configured system, as the fans should only consume a small amount of power, relative to the power consumption of the computation/machinery that is in need of being cooled. As always, it is up to the user of this software to choose proper coefficients, to avoid a positive feedback loop, or any other undesirable behavior, in a PID loop.

The meta-purpose of this is to allow the system to react faster to situations that would cause a need for greater cooling. In a particular system, the wattage spikes up fast when the system is placed under heavy load, however, it takes a while for temperature to reflect this. By reacting to wattage, in addition to temperature, we get the beneficial effect of having our cooling system respond faster, thus keeping the system consistently cool.

The relation is the same. Input value goes up, that means the fans need to speed up.

In situations where multiple (temp, margin, power, powersum) loops all contribute to the same RPM decision, then the highest RPM will still be used as the final decision. This algorithm would not change.

iwoloschin commented 1 year ago

Instead of PSU Pin, why not use something like the CPU's own reported package power? I know for certain the BMC can access CPU package power on AMD Epyc CPUs and I believe the same information is available on Intel CPUs via PECI (though I also believe the current PECI stack is broken and has non-functional userspace tooling). I'm not sure if POWER or Ampere CPUs have similar reporting.

On my systems the fans tend to be the worst offender for increased power draw. I wouldn't ever use something like this off of PSU measurements, but I could see potentially using CPU (or other subcomponent) power measurements.

Krellan commented 1 year ago

It doesn't have to be a PSU, that was just a typical example.

It's up to the user of this software to choose the power sensors that are best for their needs. If it's CPU package power, that might nicely work as well, especially if controlling a CPU fan.

Krellan commented 1 year ago

OK, here's an implementation: https://gerrit.openbmc.org/c/openbmc/phosphor-pid-control/+/58651

Krellan commented 1 year ago

This feature has landed: https://github.com/openbmc/phosphor-pid-control/commit/23e22b90c32fc7b40489fae41b1ad6439152b07f

Closing this issue.