sibradzic / amdgpu-clocks

Simple script to control power states of amdgpu driven GPUs
GNU General Public License v2.0

ROCm 3.8 #17

Closed onur-v closed 3 years ago

onur-v commented 3 years ago

The script worked perfectly until I updated to ROCm 3.8. It now hangs after printing the line Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:.

sibradzic commented 3 years ago

Hmmm, can you share the contents of a default /sys/class/drm/card0/device/pp_od_clk_voltage? What is your card and kernel version?

onur-v commented 3 years ago

The card is a Radeon VII, running Ubuntu 20.04 with kernel 5.4.0-47. The contents of /sys/class/drm/card0/device/pp_od_clk_voltage are:

OD_SCLK:
0:        808Mhz
1:       1801Mhz
OD_MCLK:
1:       1000Mhz
OD_VDDC_CURVE:
0:        808Mhz        716mV
1:       1304Mhz        799mV
2:       1801Mhz       1081mV
OD_RANGE:
SCLK:     808Mhz       2200Mhz
MCLK:     800Mhz       1200Mhz
VDDC_CURVE_SCLK[0]:     808Mhz       2200Mhz
VDDC_CURVE_VOLT[0]:     738mV        1218mV
VDDC_CURVE_SCLK[1]:     808Mhz       2200Mhz
VDDC_CURVE_VOLT[1]:     738mV        1218mV
VDDC_CURVE_SCLK[2]:     808Mhz       2200Mhz
VDDC_CURVE_VOLT[2]:     738mV        1218mV

onur-v commented 3 years ago

I guess the problem is related to ROCm 3.8, since others have reported the same issue. See https://github.com/RadeonOpenCompute/ROCm/issues/1228

sibradzic commented 3 years ago

OK, looks like the format of the pp_od_clk_voltage changed (no more @ in the curve and so on). This being Radeon VII, what was the format before ROCm 3.8? Also, what's in your custom states file?
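To illustrate the format difference mentioned above, here is a minimal Python sketch of a parser that accepts a curve point written either with or without the @ separator. It is not taken from amdgpu-clocks (which is a shell script); the names CURVE_LINE and parse_curve_point are hypothetical and exist only for this example.

```python
import re

# Accepts "0: 808Mhz @ 715mV" as well as "0: 808Mhz 716mV".
# The "@" separator is optional; unit matching is case-insensitive.
CURVE_LINE = re.compile(
    r"^\s*(\d+):\s*(\d+)\s*mhz\s*@?\s*(\d+)\s*mv\s*$", re.IGNORECASE
)

def parse_curve_point(line):
    """Return (index, clock_mhz, voltage_mv), or None if the line is not a curve point."""
    match = CURVE_LINE.match(line)
    if match is None:
        return None
    index, mhz, mv = (int(group) for group in match.groups())
    return index, mhz, mv
```

Lines that carry only a clock (such as the OD_SCLK entries) or a section header do not match and yield None, so a caller can skip them.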

onur-v commented 3 years ago

My custom states file was this:

OD_SCLK:
1:       1801Mhz
OD_MCLK:
1:       1100Mhz
OD_VDDC_CURVE:
0:       808Mhz  @ 715mV
1:       1304Mhz @ 800mV
2:       1801Mhz @ 981mV
FORCE_POWER_CAP: 300000000
FORCE_PERF_LEVEL: manual

This post from a year ago suggests that the format hasn't actually changed: https://www.reddit.com/r/linux_gaming/comments/au7m3x/radeon_vii_on_linux_overclocking_undervolting/ The format the poster shares there is identical to what I currently have.
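For context, a custom states file like the one above has to be turned into commands that are written to pp_od_clk_voltage. The sketch below shows one way that translation could look in Python. It assumes the command syntax described in the kernel's amdgpu documentation ("s <level> <clock>", "m <level> <clock>", "vc <point> <clock> <voltage>", then "c" to commit); the function name build_commands is hypothetical, and this is not how amdgpu-clocks itself is implemented. It only builds strings and does not touch sysfs.

```python
import re

# Assumed mapping from section header to command prefix (per the kernel amdgpu docs).
PREFIXES = {"OD_SCLK:": "s", "OD_MCLK:": "m", "OD_VDDC_CURVE:": "vc"}

def build_commands(state_text):
    """Translate a custom states file into pp_od_clk_voltage command strings."""
    commands = []
    prefix = None
    for raw in state_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in PREFIXES:
            prefix = PREFIXES[line]
            continue
        if not line[0].isdigit():
            # Keys such as FORCE_POWER_CAP are not table entries; leave the section.
            prefix = None
            continue
        if prefix is None:
            continue
        # Pull out the integers only, so "808Mhz @ 715mV" and "808Mhz 715mV" both work.
        numbers = re.findall(r"\d+", line)
        commands.append(" ".join([prefix] + numbers))
    commands.append("c")  # commit
    return commands
```

Extracting only the integers makes the translation indifferent to the @ separator and to column spacing.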

sibradzic commented 3 years ago

I meant that the format of pp_od_clk_voltage as set by the driver itself has changed, not the format of the custom state file. Since I don't have a Radeon VII to check myself, please paste the output of /sys/class/drm/card0/device/pp_od_clk_voltage from before ROCm 3.8 so that we can compare the two outputs.

onur-v commented 3 years ago

I've just had the chance to roll back to 3.7. This is the state of /sys/class/drm/card0/device/pp_od_clk_voltage for ROCm 3.7, the last release where the script works.

OD_SCLK:
0:        808Mhz
1:       1801Mhz
OD_MCLK:
1:       1000Mhz
OD_VDDC_CURVE:
0:        808Mhz        716mV
1:       1304Mhz        800mV
2:       1801Mhz       1081mV
OD_RANGE:
SCLK:     808Mhz       2200Mhz
MCLK:     800Mhz       1200Mhz
VDDC_CURVE_SCLK[0]:     808Mhz       2200Mhz
VDDC_CURVE_VOLT[0]:     738mV        1218mV
VDDC_CURVE_SCLK[1]:     808Mhz       2200Mhz
VDDC_CURVE_VOLT[1]:     738mV        1218mV
VDDC_CURVE_SCLK[2]:     808Mhz       2200Mhz
VDDC_CURVE_VOLT[2]:     738mV        1218mV

sibradzic commented 3 years ago

It looks like AMD is overhauling the OverDrive API with ROCm 3.8 and kernel 5.10, and lots of stuff is getting changed. I'll try to reproduce your issue on my RX5700 as soon as a 5.10 rc is out.
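When comparing pp_od_clk_voltage dumps across driver versions, a whitespace-insensitive diff helps separate real changes from column alignment. The Python sketch below is illustrative only; normalise and diff_dumps are hypothetical helper names. The sample data is the OD_VDDC_CURVE section from the two dumps pasted earlier in this thread.

```python
from itertools import zip_longest

def normalise(text):
    """Drop blank lines and collapse runs of whitespace in each remaining line."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]

def diff_dumps(old_text, new_text):
    """Return (line_no, old_line, new_line) for every line that differs."""
    pairs = zip_longest(normalise(old_text), normalise(new_text), fillvalue="")
    return [(n, a, b) for n, (a, b) in enumerate(pairs, start=1) if a != b]

# OD_VDDC_CURVE section as posted for ROCm 3.7 and ROCm 3.8 in this thread.
ROCM_37 = """OD_VDDC_CURVE:
0:        808Mhz        716mV
1:       1304Mhz        800mV
2:       1801Mhz       1081mV
"""
ROCM_38 = """OD_VDDC_CURVE:
0:        808Mhz        716mV
1:       1304Mhz        799mV
2:       1801Mhz       1081mV
"""

for line_no, old, new in diff_dumps(ROCM_37, ROCM_38):
    print(f"line {line_no}: {old!r} -> {new!r}")
# prints: line 3: '1: 1304Mhz 800mV' -> '1: 1304Mhz 799mV'
```

For the dumps as pasted here, the only difference in this section is the voltage of curve point 1.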

sibradzic commented 3 years ago

This could be related to #18. Can you check whether the ROCm 3.8 driver includes the code from this patch?

sibradzic commented 3 years ago

Finally had time to test 5.10, and there is indeed an issue: the pp_od_clk_voltage output has changed in the driver yet again. @onur-v, there is a possible fix at https://github.com/sibradzic/amdgpu-clocks/commit/5b48f04, please check.

onur-v commented 3 years ago

The fix is working. Thanks a lot!

sibradzic commented 3 years ago

@onur-v thanks for reporting! have a good day!