openbmc / phosphor-pid-control

OpenBMC PID-based Thermal Control Daemon
Apache License 2.0
16 stars 21 forks source link

Set a fixed fan PWM as transitional mode at service startup and terminate #34

Open potinlai opened 1 year ago

potinlai commented 1 year ago

In current implementation, phosphor-pid-control needs to build sensors and zones from configuration first, then start calculating the PWM outout value. Any error during build sensors will causes phosphor-pid-control delay 10 seconds and then rebuild sensors again. In this case, phosphor-pid-control will never calculate and update PWM output value. (This issue also mentioned in https://github.com/openbmc/phosphor-pid-control/issues/33)

I would to propose a new feature to apply a fixed fan PWM as transitional mode at below two time points.

  1. At service startup, before loading configuration. This can ensure fan start working immediately to avoid any problem of wrong configuration or bad sensor.

  2. At service terminate. From the point of view of keeping the system safe, It is much safer to set a fixed PWM value when service stop then keeping last calculated value.

Krellan commented 1 year ago

Sounds reasonable. I think the failsafe value should be loaded at startup time and at shutdown time, as you mentioned. I can see how this might introduce an unwanted spike in fan speed during a normal startup, however. Perhaps have the option of using a different fixed PWM value for startup/shutdown than failsafe?

potinlai commented 1 year ago

Hi @Krellan, I was thinking reuse failsafe, however failsafe is defined in the configuration, which is unkown at service startup, so we will need a meson option for this.

Also, phosphor-pid-control does not knows any fan's information until configuration is loaded, so It looks hard to come out a general mechanism that works across all platforms.

So far I wrote a customized .conf fife and adding lots of ExecStartPre and ExecStopPost lines into the file for our platform (https://gerrit.openbmc.org/c/openbmc/openbmc/+/61752), the test result is passed, but it doesn't look good and also needs the exact object path of each PWM.

Just wondering any good ideas can achieve this feature in a more general way.

iwoloschin commented 1 year ago

I've added the following line to my builds' phosphor-pid-control.service.in template:

ExecStopPost=/bin/bash -c 'find /sys/class/hwmon/hwmon*/pwm* -exec bash -cv \"echo 255 > {}\" \\;'

The big issue here was making sure that fans went to, and stayed at, 100% through a BMC reboot or firmware upgrade. I'd rather run fans at 100% for these short, uncontrolled periods than run the risk of overheating.

Krellan commented 10 months ago

It would be nice to set the fans to a fixed safe speed (like failsafe mode) when the service is stopped, or is otherwise not running.

If service is just started = can do this within phosphor-pid-control

If service is stopped gracefully = can do this within phosphor-pid-control

If service crashes or is forced to stop = need to do this outside. The most likely place should be the systemd service file for the phosphor-pid-control service.

If BMC reboots or is running before the service is started = need to do this outside. The most likely place should be as soon in userspace as possible, just after the fan sensors appear. This should probably also be a systemd service file. Alternatively, earlier userspace can be achieved by writing directly to the kernel hwmon files, which would be done similarly from startup, but have no dependency on starting the fan sensors first.

Krellan commented 10 months ago

There was some discussion in Discord about adding this to the Device Tree.

Then, kernel drivers could read this setting while they initialize, and apply it to fan hardware ASAP. This would take effect before userspace.

https://discord.com/channels/775381525260664832/775381525260664836/1153785687749963776

Seems like a good idea to me, as long as it is applied consistently across all of the different fan drivers.