openbmc / dbus-sensors

D-Bus configurable sensor scanning applications
Apache License 2.0
23 stars 44 forks source link

[issue] Manul fan control is flaky #25

Open huangalang opened 1 year ago

huangalang commented 1 year ago

hello users:

we are noticing there's a chance that set pwm value via FanSensor property will fail and it's easy to reproduce the issue

/test log as bellow/

=================================================================== root@test-machine:/sys/bus/i2c/devices/15-002f/hwmon/hwmon4# busctl set-property xyz.openbmc_project.FanSensor /xyz/openbmc_project/control/fanpwm/fan0_pwm xyz.openbmc_project.Control.FanPwm Target t 75 root@test-machine:/sys/bus/i2c/devices/15-002f/hwmon/hwmon4# cat pwm1 75

root@test-machine:/sys/bus/i2c/devices/15-002f/hwmon/hwmon4# busctl set-property xyz.openbmc_project.FanSensor /xyz/openbmc_project/control/fanpwm/fan0_pwm xyz.openbmc_project.Control.FanPwm Target t 90 root@test-machine:/sys/bus/i2c/devices/15-002f/hwmon/hwmon4# cat pwm1 75 -> issue

it should not happen because usually resp is updated properly via following line https://github.com/openbmc/dbus-sensors/blob/master/src/PwmSensor.cpp#L132

but we found when the issue happens , the program will match following condition checking https://github.com/openbmc/dbus-sensors/blob/master/src/PwmSensor.cpp#L127

and when it happens , the pwm value can be passed down to sysfs attribute does anyone come accross the same issue?

thanks

chaul-ampere commented 1 year ago

I see that you seemed to be using an i2c fan controller driver. In the past, we noticed that the driver shows in /sys/*/pwm1 the current PWM that the fan is running at, instead of the PWM that we have echoed to it (two different registers for the purposes). So pwm1 in sysfs does not display right away what we just set (it changes with delay step by step, specially with large gap). This when combined with dbus-sensors fansensor mechanism of setting and checking causes an issue that when you set to Target a PWM that you have just set before the current Target value, it will fail to write to sysfs. If you echo directly to pwm1 a value a lot larger or smaller than the current pwm, and cat it and see that it increases or decreases slowly towards the goal, then we are facing the same prob.

huangalang commented 1 year ago

@chaul-ampere

yes we are using i2c fan controller driver (MAX31790ATI) not quite understand what you described , in our case , we haven't run into the problem setting pwm via sysfs directlybut if we set pwm via dbus property , the setting will be blocked in the condition checking in following link https://github.com/openbmc/dbus-sensors/blob/master/src/PwmSensor.cpp#L127

it seems that our case is a different story from yours however, thanks for your sharing , we will try to run the scenario you described and see what will happen

chiuyikai24 commented 1 year ago

I see that you seemed to be using an i2c fan controller driver. In the past, we noticed that the driver shows in /sys/*/pwm1 the current PWM that the fan is running at, instead of the PWM that we have echoed to it (two different registers for the purposes). So pwm1 in sysfs does not display right away what we just set (it changes with delay step by step, specially with large gap). This when combined with dbus-sensors fansensor mechanism of setting and checking causes an issue that when you set to Target a PWM that you have just set before the current Target value, it will fail to write to sysfs. If you echo directly to pwm1 a value a lot larger or smaller than the current pwm, and cat it and see that it increases or decreases slowly towards the goal, then we are facing the same prob.

It appears to exhibit some similarities, and we are currently conducting continuous experimentation to determine if we can replicate the same conditions. Additionally, I would like to inquire if you have any reference or recommended solutions for this issue?

chaul-ampere commented 1 year ago

I see that you seemed to be using an i2c fan controller driver. In the past, we noticed that the driver shows in /sys/*/pwm1 the current PWM that the fan is running at, instead of the PWM that we have echoed to it (two different registers for the purposes). So pwm1 in sysfs does not display right away what we just set (it changes with delay step by step, specially with large gap). This when combined with dbus-sensors fansensor mechanism of setting and checking causes an issue that when you set to Target a PWM that you have just set before the current Target value, it will fail to write to sysfs. If you echo directly to pwm1 a value a lot larger or smaller than the current pwm, and cat it and see that it increases or decreases slowly towards the goal, then we are facing the same prob.

It appears to exhibit some similarities, and we are currently conducting continuous experimentation to determine if we can replicate the same conditions. Additionally, I would like to inquire if you have any reference or recommended solutions for this issue?

If you have the same prob as ours (we are using MAX31790 i2c fan controller), then the Target property of "/xyz/openbmc_project/control/fanpwm/x" is showing the PWM at which the fans are currently running (from sysfs pwm1). Because dbus-sensors alr has the Value property under "/xyz/openbmc_project/sensors/fan_pwm/x" paths to indicate that, we don't want Target to have the same meaning. We added a patch in the driver to change to using the other register MAX31790_REG_PWMOUT to let sysfs pwm1 show what we echoed to it, instead of MAX31790_REG_PWM_DUTY_CYCLE.

chiuyikai24 commented 1 year ago

I see that you seemed to be using an i2c fan controller driver. In the past, we noticed that the driver shows in /sys/*/pwm1 the current PWM that the fan is running at, instead of the PWM that we have echoed to it (two different registers for the purposes). So pwm1 in sysfs does not display right away what we just set (it changes with delay step by step, specially with large gap). This when combined with dbus-sensors fansensor mechanism of setting and checking causes an issue that when you set to Target a PWM that you have just set before the current Target value, it will fail to write to sysfs. If you echo directly to pwm1 a value a lot larger or smaller than the current pwm, and cat it and see that it increases or decreases slowly towards the goal, then we are facing the same prob.

It appears to exhibit some similarities, and we are currently conducting continuous experimentation to determine if we can replicate the same conditions. Additionally, I would like to inquire if you have any reference or recommended solutions for this issue?

If you have the same prob as ours (we are using MAX31790 i2c fan controller), then the Target property of "/xyz/openbmc_project/control/fanpwm/x" is showing the PWM at which the fans are currently running (from sysfs pwm1). Because dbus-sensors alr has the Value property under "/xyz/openbmc_project/sensors/fan_pwm/x" paths to indicate that, we don't want Target to have the same meaning. We added a patch in the driver to change to using the other register MAX31790_REG_PWMOUT to let sysfs pwm1 show what we echoed to it, instead of MAX31790_REG_PWM_DUTY_CYCLE.

Thank you, it does indeed appear to be the problem as you described. Here's the test log:

busctl set-property xyz.openbmc_project.FanSensor /xyz/openbmc_project/control/fanpwm/fan0_pwm xyz.openbmc_project.Control.FanPwm Target t 49
cat pwm1 ->185
cat pwm1 ->120
cat pwm1 -> 54
cat pwm1 -> 49

https://github.com/AspeedTech-BMC/linux/blob/aspeed-master-v5.15/drivers/hwmon/max31790.c#L110

Thank you for providing your excellent suggestion. It is indeed true that there is a delay issue in reading the register values for fan control in MAX31790. Following your advice, we were using MAX31790_REG_PWMOUT for reading, which effectively resolved this problem. We have also successfully tested it on the machine.

chaul-ampere commented 1 year ago

I'm glad it helped. It's our temporary solution and we have not yet figured out what to do more about this issue or should the choice between the two regs be configurable in the device tree, so we just left it there waiting for progress from outside.

chiuyikai24 commented 1 year ago

I'm glad it helped. It's our temporary solution and we have not yet figured out what to do more about this issue or should the choice between the two regs be configurable in the device tree, so we just left it there waiting for progress from outside.

Currently, this approach would be the best temporary solution. I believe it is more closely related to the fan control of the IC itself, but we can only wait and see if there are any external updates. Thank you once again.

huangalang commented 1 year ago

@chaul-ampere thanks for your advice , as you described that we need to either modify driver behaviour of setting/getting pwm value or modify dbus-sensor , fan sensor behaviour , you approach it by modifying the driver , however we are not sure that's the right solution for us , we got to think it through

thanks