Closed andrew-ld closed 2 months ago
Interesting - there have already been issues with the order in which settings are applied, but lact should handle what's described in the issue fine. The current order for applying settings is:
* Power cap
* Clocks table
* Performance level
* Fan curve
Code that handles this: https://github.com/ilya-zlobintsev/LACT/blob/master/lact-daemon/src/server/gpu_controller/mod.rs#L719
Did you manage to hit the issue when applying the settings in lact, or are you just informing about its existence?
I opened this issue to keep track of the status of things; however, I actually can't change the fan speed on my Sapphire 7900 XTX either.
For example, I've tried setting the fans to full on the curve and also with a static speed, and nothing seems to happen.
Is this the case only when you set the fan speed using lact, or when manually writing to the sysfs (like the examples in the linked issue) as well?
lact
Interesting - there have already been issues with the order in which settings are applied, but lact should handle what's described in the issue fine. The current order for applying settings is:
* Power cap
* Clocks table
* Performance level
* Fan curve
Code that handles this: https://github.com/ilya-zlobintsev/LACT/blob/master/lact-daemon/src/server/gpu_controller/mod.rs#L719
Did you manage to hit the issue when applying the settings in lact, or are you just informing about its existence?
When I write anything to /sys/class/drm/card?/device/gpu_od/fan_ctrl/{acoustic_limit_rpm_threshold,acoustic_target_rpm_threshold,fan_minimum_pwm,fan_target_temperature}, everything set via pp_od_clk_voltage gets ignored by the GPU, no matter in which order they are set. So it is not possible to alter, for example, fan_minimum_pwm when also setting clock speeds. In my limited testing, fan_curve seems to be the exception when it is set before altering pp_od_clk_voltage.
Doing things manually, it is possible to set clock speeds, voltage offset, and fan curve, but I am unable to do so in LACT without it getting ignored by the GPU since LACT seems to always restore the serialized values for those aforementioned settings even if they are default values.
Though I am aware this is primarily a driver issue, it would be nice to have a way to not write to all fan_ctrl/* sysfs files when applying other settings or launching lactd.
Sapphire NITRO+ RX 7900 XTX Vapor-X
Kernel 6.10.0-0.rc3.20240612git2ef5971ff345.33
Makes sense, we can at least avoid writing to the files if the value is unchanged.
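The "skip unchanged values" idea can be sketched in shell as below. This is only an illustration, not how LACT implements it: the helper name od_value_differs is hypothetical, and it assumes that reading a fan_ctrl file returns a label on the first line and the current value on the second line, which should be checked against the actual output on your card.

```shell
# Hypothetical helper: succeeds (exit status 0) only when the value currently
# reported by a fan_ctrl file differs from the desired one.
# ASSUMPTION: line 2 of the file holds the current value, e.g.
#   FAN_MINIMUM_PWM:
#   25
#   OD_RANGE:
#   FAN_MINIMUM_PWM: 20 100
od_value_differs() {
    # $1 = path to the sysfs file, $2 = desired value
    current=$(sed -n '2p' "$1" | tr -d '[:space:]')
    [ "$current" != "$2" ]
}

# Usage sketch (needs root and a real device, so it is left commented out):
#   fan=/sys/class/drm/card1/device/gpu_od/fan_ctrl
#   if od_value_differs "$fan/fan_minimum_pwm" 25; then
#       echo '25' > "$fan/fan_minimum_pwm"
#       echo 'c'  > "$fan/fan_minimum_pwm"
#   fi
```

With a check like this, a file that already holds the desired value is never touched, so it cannot interfere with values set via pp_od_clk_voltage.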
@zenofile i've added checks for this in https://github.com/ilya-zlobintsev/LACT/commit/ca3e54015a39f7cc0c840643def5e642ef8ef101, could you test if it helps?
Thanks for looking into this. When the Automatic fan mode is enabled with default values, it seems to work as intended; however, when Curve is active, even with default values, it doesn't seem to work.
Thermals → Automatic, default values OC → Basic → Clocks + Voltage offset altered → Apply
⇒ OC Values are applied and working, however fan_curve is still written to (reset?).
Thermals → Curve, default values OC → Basic → Clocks + Voltage offset altered → Apply
⇒ OC values are ignored by the GPU, fan_curve is written to last.
But this is better than it was before; now when lactd is restarted at least clockspeed and voltage values are respected in unaltered automatic fan mode (default).
I tried experimenting with the order a little: if any values are written into pp_od_clk_voltage after the fan values have been committed, the OC settings get ignored by the GPU. The actual committing can be done in any order though. So as long as the commits only happen at the end, after everything has been written, it works fine. Maybe this was clear from the beginning, but I did not find any documentation mentioning this.
Also, resets can be issued on fan_curve, acoustic_limit_rpm_threshold and acoustic_target_rpm_threshold. But after any reset on fan_minimum_pwm or fan_target_temperature once pp_od_clk_voltage has been committed, the OC settings get ignored again 🤷🏻.
For example, this works fine:
gpu=card1
device=/sys/class/drm/${gpu}/device
fan=/sys/class/drm/${gpu}/device/gpu_od/fan_ctrl
echo 'r' > $fan/fan_target_temperature
echo 'r' > $fan/acoustic_target_rpm_threshold
echo 'r' > $fan/acoustic_limit_rpm_threshold
echo 'r' > $fan/fan_minimum_pwm
sleep 0.25s
echo 'auto' > $device/power_dpm_force_performance_level
echo '25' > $fan/fan_minimum_pwm
echo '75' > $fan/fan_target_temperature
echo 's 1 2525' > $device/pp_od_clk_voltage
echo 'vo -100' > $device/pp_od_clk_voltage
echo 'c' > $fan/fan_minimum_pwm
echo 'c' > $fan/fan_target_temperature
echo 'c' > $device/pp_od_clk_voltage
Interesting. Currently the values are committed right away, i'll see if i can make it deferred until everything is written
@zenofile i've pushed the new logic where everything is committed at once to the deferred-commit branch, could you test if it works?
Unfortunately the OD values get ignored.
Some data when launching the lact daemon, all relevant GPU settings were reset manually beforehand (but it makes no difference when not):
info.json:
{
  "initramfs_type": "Dracut",
  "system_info": {
    "amdgpu_overdrive_enabled": true,
    "commit": "8638d24",
    "kernel_version": "6.10.0-0.rc3.20240612git2ef5971ff345.36.local.fc40.x86_64",
    "profile": "release",
    "version": "0.5.5"
  }
}
/etc/lact/config.yaml:
daemon:
  log_level: debug
  admin_groups:
  - wheel
  - sudo
  disable_clocks_cleanup: false
apply_settings_timer: 5
gpus:
  xxx-0000:03:00.0:
    fan_control_enabled: false
    fan_control_settings:
      mode: curve
      static_speed: 0.5
      temperature_key: edge
      interval_ms: 500
      curve:
        40: 0.15
        50: 0.29999998
        60: 0.45
        70: 0.65
        80: 0.9
      spindown_delay_ms: 0
      change_threshold: 0
    pmfw_options:
      acoustic_limit: 3200
      acoustic_target: 1450
      minimum_pwm: 25
      target_temperature: 75
    performance_level: auto
    max_core_clock: 2525
    voltage_offset: -100
    power_states: {}
/usr/bin/lact daemon
DEBUG lact_daemon: current system uptime: 3162.4s
INFO lact_daemon::socket: listening on "/var/run/lactd.sock"
DEBUG lact_daemon::server::handler: initialized GPU controller xxx-0000:03:00.0 for path "/sys/class/drm/card1/device"
DEBUG lact_daemon::server::handler: found intialized drm entry for device "/sys/bus/pci/devices/0000:03:00.0"
INFO lact_daemon::server::handler: initialized 1 GPUs
DEBUG lact_daemon::server::gpu_controller: writing clocks commands: [
"s 1 2525",
"vo -100",
]
inotifywait -qrme modify .
./ MODIFY pp_od_clk_voltage
./ MODIFY power_dpm_force_performance_level
./ MODIFY power_dpm_force_performance_level
./ MODIFY pp_od_clk_voltage
./ MODIFY power_dpm_force_performance_level
./gpu_od/fan_ctrl/ MODIFY fan_curve
./gpu_od/fan_ctrl/ MODIFY fan_curve
./gpu_od/fan_ctrl/ MODIFY fan_target_temperature
./gpu_od/fan_ctrl/ MODIFY fan_minimum_pwm
./ MODIFY pp_od_clk_voltage
./ MODIFY pp_od_clk_voltage
./gpu_od/fan_ctrl/ MODIFY fan_target_temperature
./gpu_od/fan_ctrl/ MODIFY fan_minimum_pwm
When altering fan and clock settings in the GUI and applying, the values are ignored as well and the inotify event list is quite extensive.
It would help to see what is actually written to the sysfs by the daemon, is there a logging setting I can enable? Debug level seems to only print clockspeed settings.
I did strace the writes and tried it manually in that order. The culprit is the reset on fan_curve. Somehow in this example, it causes issues. When leaving it out, or moving it after the writes to fan_target_temperature and fan_minimum_pwm or before the writes to pp_od_clk_voltage, it seems to work fine. What a mess.
write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/pp_od_clk_voltage>, "r\n", 2) = 2
write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/power_dpm_force_performance_level>, "auto", 4) = 4
write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/pp_od_clk_voltage>, "s 1 2525\n", 9) = 9
write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/pp_od_clk_voltage>, "vo -100\n", 8) = 8
write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/power_dpm_force_performance_level>, "auto", 4) = 4
** write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/gpu_od/fan_ctrl/fan_curve>, "r\n", 2) = 2
write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/gpu_od/fan_ctrl/fan_target_temperature>, "76\n", 3) = 3
write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/gpu_od/fan_ctrl/fan_minimum_pwm>, "26\n", 3) = 3
write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/pp_od_clk_voltage>, "c\n", 2) = 2
write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/gpu_od/fan_ctrl/fan_target_temperature>, "c\n", 2) = 2
write(10</sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/gpu_od/fan_ctrl/fan_minimum_pwm>, "c\n", 2) = 2
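For anyone repeating this kind of trace, a small filter makes the write order easier to read. The sketch below assumes strace output in the form shown above (file descriptors annotated with paths, as produced by strace's -y option); the function name summarize_sysfs_writes is hypothetical.

```shell
# Reduce strace write() lines to "<file relative to the PCI device> <- <payload>".
# Reads strace output on stdin; lines that are not annotated write() calls are dropped.
summarize_sysfs_writes() {
    sed -nE 's#^.*write\([0-9]+<([^>]+)>, "([^"]*)", [0-9]+\) *= *[0-9]+.*$#\1 <- \2#p' |
        sed -E -e 's#^.*/[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]/##' \
               -e 's/\\n$//'
}

# Usage sketch (assumes the daemon process is named "lact"; needs root):
#   strace -f -y -e trace=write -p "$(pidof lact)" 2>&1 | summarize_sysfs_writes
```

Applied to the trace above, each line collapses to something like "gpu_od/fan_ctrl/fan_curve <- r", which makes it easier to spot where the fan_curve reset lands relative to the other writes.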
I've pushed a commit to reset the fan curve after writing other pmfw values, please tell me if it helps. And thanks for the detailed debug - it's unfortunate that this is so fragile.
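As a rough illustration of the reordering (not the actual LACT code), the sequence from the strace output above can be written down with only the fan_curve reset moved behind the other pmfw writes. The function name plan_od_writes and its parameters are hypothetical; paths are relative to the GPU's device directory.

```shell
# Print the write sequence, one "<relative file> <value>" pair per line.
# This mirrors the traced order, except that the fan_curve reset now comes
# after fan_target_temperature and fan_minimum_pwm, and before the commits.
plan_od_writes() {
    # $1 = max core clock, $2 = voltage offset,
    # $3 = target temperature, $4 = minimum PWM
    printf '%s\n' \
        "pp_od_clk_voltage r" \
        "power_dpm_force_performance_level auto" \
        "pp_od_clk_voltage s 1 $1" \
        "pp_od_clk_voltage vo $2" \
        "power_dpm_force_performance_level auto" \
        "gpu_od/fan_ctrl/fan_target_temperature $3" \
        "gpu_od/fan_ctrl/fan_minimum_pwm $4" \
        "gpu_od/fan_ctrl/fan_curve r" \
        "pp_od_clk_voltage c" \
        "gpu_od/fan_ctrl/fan_target_temperature c" \
        "gpu_od/fan_ctrl/fan_minimum_pwm c"
}

# Usage sketch (needs root and a real device, so it is left commented out):
#   device=/sys/class/drm/card1/device
#   plan_od_writes 2525 -100 76 26 | while read -r file value; do
#       printf '%s\n' "$value" > "$device/$file"
#   done
```

Keeping the plan as plain text makes it easy to diff against a fresh strace when the driver behaviour changes again.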
It works, both when restarting the daemon and when altering and applying settings via the GUI without a daemon restart.
Good to know, I will merge these changes then.
@andrew-ld could you check if this also solves the problem for you?
Closing as this has been implemented and released.
hi, I am the author of the issue https://gitlab.freedesktop.org/drm/amd/-/issues/3131, I think lact developers should be aware of this issue, especially the last comments.
https://gitlab.freedesktop.org/drm/amd/-/issues/3131#note_2415553