ilya-zlobintsev / LACT

Linux AMDGPU Configuration Tool
MIT License

Feature request: Minimum GPU clock #129

Closed by infirit 1 year ago

infirit commented 1 year ago

The aggressive power management on Radeon cards means that under certain loads the GPU doesn't clock up properly. My specific use case is emulating the Metroid Prime trilogy in the Dolphin emulator. On Linux I currently have to force maximum clocks, but on Windows I can raise the minimum clock to around 800-1000 MHz, which is enough for the GPU to do the work it has to do.

I don't know if this is exposed in the Linux driver, but it would be nice to have, as running at maximum clocks makes my 6700 XT coil whine a bit.

Other than that, really happy I found LACT. It works really well and I don't have to fiddle with command line scripts anymore :+1:

ilya-zlobintsev commented 1 year ago

This should be reasonably easy to do. I'm also working on adding support for configuring enabled power states (#23), which would cover your case as well (you could disable the lower states), but simply setting the minimum clock should suffice too.

ilya-zlobintsev commented 1 year ago

Implemented in https://github.com/ilya-zlobintsev/LACT/commit/5d4776d394992387aeb01dd85f1af4f6c1c599bf, please check the latest git version

infirit commented 1 year ago

Works perfectly. Many thanks!

infirit commented 1 year ago

I have some bad news, not about setting the minimum GPU clock but about the minimum memory clock. Setting this makes my system lock up on shutdown and boot. I had to manually remove it from the config file. This is the result on boot :rofl: [image: IMG_2023-03-18-23-14-05-011]

neon-sunset commented 1 year ago

RDNA2 GPUs are very sensitive to memory clock changes and "minimum clock" for memory behaves differently to GPU one. It is best to leave it untouched unless absolutely necessary.

ilya-zlobintsev commented 1 year ago

> I have some bad news, not about setting the minimum GPU clock but about the minimum memory clock. Setting this makes my system lock up on shutdown and boot. I had to manually remove it from the config file. This is the result on boot :rofl: [image: IMG_2023-03-18-23-14-05-011]

Does this happen when you set the minimum frequency manually? E.g. echo 'm 0 500' | sudo tee /sys/class/drm/card0/device/pp_od_clk_voltage. If it doesn't, then it might have to do with how LACT handles the voltage curve when changing clock speeds.

I'll also try and make a settings revert timer to make it harder to brick your system :D

infirit commented 1 year ago

> Does this happen when you set the minimum frequency manually? E.g. echo 'm 0 500' | sudo tee /sys/class/drm/card0/device/pp_od_clk_voltage. If it doesn't, then it might have to do with how LACT handles the voltage curve when changing clock speeds.

Using m gives me "invalid argument"; s does work, but I'm not sure it's what you wanted. With that changed, things seem OK on shutdown at least.

edit: here are the contents of pp_od_clk_voltage if it helps

OD_SCLK:
0: 500Mhz
1: 2800Mhz
OD_MCLK:
0: 97Mhz
1: 1000MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK:     500Mhz       2800Mhz
MCLK:     674Mhz       1075Mhz

ilya-zlobintsev commented 1 year ago

s sets the core clock and m sets the memory clock. 500 was just an example value; your GPU only allows setting the memory clock between 674 and 1075 MHz, per the OD_RANGE section of your output. The question is whether setting the minimum memory clock manually by writing to pp_od_clk_voltage leads to the same issues as changing it via LACT.
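
To avoid the "invalid argument" error from an out-of-range value, the allowed range can be read from the OD_RANGE section before writing. The following is a minimal sketch, not part of LACT: the helper names mclk_range and mclk_in_range are made up for illustration, and the sample text is copied from the output posted in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LACT or the amdgpu driver).

# Print "MIN MAX" (in MHz) for the MCLK row of the OD_RANGE section,
# reading pp_od_clk_voltage-style text from stdin.
mclk_range() {
    awk '
        /^OD_RANGE:/ { in_range = 1; next }
        in_range && $1 == "MCLK:" {
            min = $2; max = $3
            sub(/[Mm][Hh]z$/, "", min)   # strip the unit suffix
            sub(/[Mm][Hh]z$/, "", max)
            print min, max
            exit
        }
    '
}

# Succeed only if VALUE (MHz) lies inside the MCLK range found in FILE.
mclk_in_range() {
    value=$1
    range=$(mclk_range < "$2")
    [ -n "$range" ] || return 1          # no OD_RANGE/MCLK row found
    min=${range% *}
    max=${range#* }
    [ "$value" -ge "$min" ] && [ "$value" -le "$max" ]
}

# Demo against a copy of the output posted above, so nothing touches sysfs.
sample=$(mktemp)
cat > "$sample" <<'EOF'
OD_SCLK:
0: 500Mhz
1: 2800Mhz
OD_MCLK:
0: 97Mhz
1: 1000MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK:     500Mhz       2800Mhz
MCLK:     674Mhz       1075Mhz
EOF

mclk_in_range 500 "$sample" || echo "500 MHz is outside the allowed MCLK range"
mclk_in_range 700 "$sample" && echo "700 MHz is inside the allowed MCLK range"
rm -f "$sample"
```

On real hardware the second argument would be /sys/class/drm/card0/device/pp_od_clk_voltage (card0 may differ on your system), and the write itself still needs root.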

infirit commented 1 year ago

Oh, yeah, I blame not having had coffee yet :coffee: . Manually setting it seems fine, restarted with no issue.

$ sudo echo 'm 0 700' | sudo tee /sys/class/drm/card0/device/pp_od_clk_voltage 
m 0 700
$ cat /sys/class/drm/card0/device/pp_od_clk_voltage 
OD_SCLK:
0: 500Mhz
1: 2800Mhz
OD_MCLK:
0: 700Mhz
1: 1000MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK:     500Mhz       2800Mhz
MCLK:     674Mhz       1075Mhz

ilya-zlobintsev commented 1 year ago

I've managed to reproduce a similar issue on my Vega 56. It seems to be either a driver or a hardware issue triggered by the following sequence of events:

Shutting down the system triggers the issue only about 50% of the time for me. I could not reproduce the issue on boot: setting values for the first time always works. However, the issue always happens when restarting the daemon, as it first resets the GPU settings and then tries to apply them again.

This only happens when changing the minimum memory clock; resetting and reapplying the maximum clock is handled properly by the driver.

A workaround for this would be to avoid resetting the clocks table on daemon shutdown. I do not want to make this the default behaviour, as it would leave the clocks in a potentially inconsistent state after shutting down the daemon on configurations where this reset problem does not occur, but it could be added as a config option. This would solve the problem on shutdown and daemon restart, though I'm still not entirely sure why the problem happens on boot.
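
For anyone who wants to test the reset-then-reapply behaviour by hand, here is a rough sketch of the sequence as I understand it from the description above. It is an assumption about the order of writes, not LACT's actual code. The c (commit) and r (reset) commands come from the amdgpu documentation for pp_od_clk_voltage; the function names and the DRY_RUN and OD_FILE variables are made up for this example. It only prints the writes unless DRY_RUN=0 is set, since the real sequence may hang the system.

```shell
#!/bin/sh
# Sysfs file from this thread; card0 is an assumption and may differ per system.
OD_FILE=${OD_FILE:-/sys/class/drm/card0/device/pp_od_clk_voltage}
# Dry run by default: writes are printed, not performed. DRY_RUN=0 needs root.
DRY_RUN=${DRY_RUN:-1}

# Write one command to the overdrive table, or print it in dry-run mode.
od_write() {
    if [ "$DRY_RUN" = 1 ]; then
        printf "echo '%s' > %s\n" "$1" "$OD_FILE"
    else
        printf '%s\n' "$1" > "$OD_FILE"
    fi
}

# Set the minimum memory clock, reset the table, then set it again.
reset_and_reapply() {
    min_mclk=$1
    od_write "m 0 $min_mclk"   # edit memory clock state 0 (the minimum)
    od_write "c"               # commit the edited table
    od_write "r"               # reset the table to the driver defaults
    od_write "m 0 $min_mclk"   # reapply the same minimum
    od_write "c"
}

reset_and_reapply 700
```

The value 700 is just the one used earlier in this thread; pick something inside your card's OD_RANGE.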

ilya-zlobintsev commented 1 year ago

@infirit please try the latest git version and set disable_clocks_cleanup: true under the daemon section in /etc/lact/config.yaml
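
For reference, the relevant fragment of /etc/lact/config.yaml would look roughly like this. It is only a sketch of the one setting named above; any other keys already present in the file are omitted here and should be left as they are.

```yaml
daemon:
  # Do not reset the clocks table when the daemon shuts down
  disable_clocks_cleanup: true
```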

infirit commented 1 year ago

Unfortunately no change. Can't boot or shutdown with disable_clocks_cleanup: true and min_memory_clock set.

edit: Can you just not write this to the config file if it's just the default? That would at least avoid triggering this, because just removing the offending line from the config makes things work.

ilya-zlobintsev commented 1 year ago

> edit: Can you just not write this to the config file if it's just the default? That would at least avoid triggering this, because just removing the offending line from the config makes things work.

Implemented in https://github.com/ilya-zlobintsev/LACT/commit/1839107d3973cce544f68d4b6a0e455790d4f5fc

ilya-zlobintsev commented 1 year ago

@infirit did the fix at least help you avoid the issue? I'm not sure what else can be done from LACT's side, given that's just how the driver behaves :shrug:

infirit commented 1 year ago

Sorry, busy week. Yeah, not setting the min mem clock unnecessarily is enough to work around this. Thanks.

ps: if you need some testing done on the 6700xt in the future let me know :-)