Closed lubosz closed 3 years ago
Hi @lubosz. I've just double-checked on 5.10 & RX5700 with these curve settings:
OD_VDDC_CURVE:
0: 800MHz 706mV
1: 1445MHz 810mV
2: 1850MHz 1200mV
Note that my card is HW limited to 1850MHz @ 1200mV, so 2090MHz @ 1201mV can't be applied (resulting in the same error as yours). The above worked without issue, so this ain't #18. I'm baffled why 1: 1445MHz 810mV does not work for you (perhaps the voltage is too low?); it works here...
Judging by your script output, 0: 800MHz 706mV worked fine, but setting states 1 & 2 was refused by the kernel driver. I don't think this is a script problem; it looks like your card's driver is not accepting these values due to HW limits or some other issue.
Please try with some more conservative values for curve points 1 & 2 and report back.
P.S. You may be able to override these limits (and many more things) with https://github.com/sibradzic/upp.
@lubosz ping
Hi, thanks for the reply and patches.
So I retried with a more recent, this time vanilla, Arch kernel:
$ uname -a
Linux bstation 5.10.4-arch2-1 #1 SMP PREEMPT Fri, 01 Jan 2021 05:29:53 +0000 x86_64 GNU/Linux
The values of the VDDC curve have changed, being slightly higher than before.
$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 2095Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz 705mV
1: 1447MHz 811mV
2: 2095MHz 1204mV
OD_RANGE:
SCLK: 800Mhz 2150Mhz
MCLK: 625Mhz 950Mhz
VDDC_CURVE_SCLK[0]: 800Mhz 2150Mhz
VDDC_CURVE_VOLT[0]: 750mV 1200mV
VDDC_CURVE_SCLK[1]: 800Mhz 2150Mhz
VDDC_CURVE_VOLT[1]: 750mV 1200mV
VDDC_CURVE_SCLK[2]: 800Mhz 2150Mhz
VDDC_CURVE_VOLT[2]: 750mV 1200mV
I applied them to the config:
$ cat /etc/default/amdgpu-custom-states.card0
# For Navi (and Radeon7) we can only set highest SCLK & MCLK, "state 1":
OD_SCLK:
1: 2090Mhz
OD_MCLK:
1: 875MHz
# More fine-grain control of clocks and voltages are done with VDDC curve:
OD_VDDC_CURVE:
#0: 800MHz 706mV
#1: 1445MHz 810mV
#2: 2090MHz 1201mV
0: 800MHz 705mV
1: 1447MHz 811mV
2: 2095MHz 1204mV
# Force power limit (in micro watts):
FORCE_PERF_LEVEL: high
The new script now shows exactly that curve points 0 and 2 cannot be applied:
$ sudo amdgpu-clocks
Writen initial backup states to /tmp/amdgpu-custom-states.card0.initial
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
SCLK state 0: 800Mhz
SCLK state 1: 2095Mhz
MCLK state 1: 875MHz
VDDC Curve state 0: 800MHz 705mV
VDDC Curve state 1: 1447MHz 811mV
VDDC Curve state 2: 2095MHz 1204mV
Maximum clocks & voltages:
SCLK clock 2150Mhz
MCLK clock 950Mhz
Curent power cap: 220W
Verifying user state values at /etc/default/amdgpu-custom-states.card0:
SCLK state 1: 2090Mhz
MCLK state 1: 875MHz
VDDC Curve state 0: 800MHz 705mV
VDDC Curve state 1: 1447MHz 811mV
VDDC Curve state 2: 2095MHz 1204mV
Force performance level to high
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
/usr/bin/amdgpu-clocks: line 140: echo: write error: Invalid argument
ERROR: echo vc 0 800 705 > /sys/class/drm/card0/device/pp_od_clk_voltage
/usr/bin/amdgpu-clocks: line 140: echo: write error: Invalid argument
ERROR: echo vc 2 2095 1204 > /sys/class/drm/card0/device/pp_od_clk_voltage
Done
> Please try with some more conservative values for curve points 1 & 2 and report back.
Should I lower the MHz or the mV? Can I break my hardware with wrong values here? Shouldn't the values advertised by pp_od_clk_voltage work?
> Should I lower the MHz or the mV?
You can try a different voltage first; I suspect the driver is somehow imposing limits on those. As root, you can execute these commands directly in order to find working values:
echo vc 0 800 705 > /sys/class/drm/card0/device/pp_od_clk_voltage
^ In the above vc 0 case, try using a different voltage, such as 800 or more.
echo vc 2 2095 1204 > /sys/class/drm/card0/device/pp_od_clk_voltage
^ In the vc 2 case, try a lower voltage such as 1150, as well as a lower clock such as 1800.
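If trying values one by one gets tedious, the manual search could be sketched roughly as below. This is only an illustration: try_curve_point and the OD_FILE variable are hypothetical names, not part of amdgpu-clocks, and the card0 path is an assumption. It writes each candidate with the same "vc" syntax as above and stops at the first one the driver accepts. It never writes "c", so it does not commit anything by itself.

```shell
# Hypothetical helper: try candidate voltages for one curve point until the
# driver accepts a write. OD_FILE is an assumption; point it at your card.
OD_FILE="${OD_FILE:-/sys/class/drm/card0/device/pp_od_clk_voltage}"

try_curve_point() {
  local point="$1" clock="$2" mv
  shift 2
  for mv in "$@"; do
    # A refused value makes the write fail ("Invalid argument"), so the
    # exit status of echo tells us whether the driver took it.
    if echo "vc $point $clock $mv" 2>/dev/null > "$OD_FILE"; then
      echo "accepted: vc $point $clock $mv"
      return 0
    fi
    echo "rejected: vc $point $clock $mv"
  done
  return 1
}
```

As root, something like `try_curve_point 0 800 750 775 800` would then walk upwards from 750mV for curve point 0. The same caveat about risk applies as for the manual commands.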
You can actually check and adjust the voltage limits using https://github.com/sibradzic/upp; the command to check the limits would be (the outputs are millivolts multiplied by 4):
pip3 install --user upp
upp dump | grep "\(Min\|Max\)Voltage\(Gfx\|Soc\)"
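Since the dumped values are millivolts multiplied by 4, converting back and forth is plain integer arithmetic. A tiny sketch (both function names are made up for illustration and are not part of upp):

```shell
# Hypothetical helpers for the "millivolts multiplied by 4" scaling.
upp_raw_to_mv() { echo $(( $1 / 4 )); }   # raw table value -> mV
mv_to_upp_raw() { echo $(( $1 * 4 )); }   # mV -> raw table value
```

For example, `upp_raw_to_mv 4800` prints 1200, and `mv_to_upp_raw 750` prints 3000.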
> Can I break my hardware with wrong values here?
Probably not, but the risk is entirely on you; don't blame me if something weird happens. The values I am suggesting above should not break anything; in the worst case you may end up with a hard reset...
> Shouldn't the values advertised by pp_od_clk_voltage work?
They should. But I've seen with my own eyes that under some particular conditions they do not. The driver / MCU firmware is to blame; there is nothing we can do to fix that...
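For what it's worth, in the pp_od_clk_voltage output posted earlier in this thread, the OD_RANGE block lists VDDC_CURVE_VOLT as 750mV to 1200mV for every point, while the default curve points 0 and 2 sit at 705mV and 1204mV. That may be why exactly those two writes were refused. A minimal sketch that flags such points from a dump (check_curve_voltages is a hypothetical helper, not part of amdgpu-clocks, and it assumes the dump layout shown above):

```shell
# Hypothetical helper: read a pp_od_clk_voltage dump (stdin or file argument)
# and report curve points whose voltage is outside VDDC_CURVE_VOLT[n].
check_curve_voltages() {
  awk '
    $1 ~ /^OD_[A-Z_]+:$/ { section = $1; next }      # e.g. "OD_VDDC_CURVE:"
    section == "OD_VDDC_CURVE:" && $1 ~ /^[0-9]+:$/ {
      idx = $1; sub(/:/, "", idx)
      volt[idx] = $3 + 0                             # "705mV" -> 705
      n++
    }
    section == "OD_RANGE:" && $1 ~ /^VDDC_CURVE_VOLT\[/ {
      idx = $1; gsub(/[^0-9]/, "", idx)              # "VDDC_CURVE_VOLT[0]:" -> 0
      lo[idx] = $2 + 0; hi[idx] = $3 + 0
    }
    END {
      # Assumes curve points are numbered 0..n-1, as in the dumps above.
      for (i = 0; i < n; i++)
        if ((i in lo) && (volt[i] < lo[i] || volt[i] > hi[i]))
          printf("point %d: %dmV outside %d-%dmV\n", i, volt[i], lo[i], hi[i])
    }
  ' "$@"
}
```

It can be run as `check_curve_voltages /sys/class/drm/card0/device/pp_od_clk_voltage`; it only reads the file and prints nothing when every point is within the advertised range.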
ping?
I am getting the same error with a Radeon 6800 on Linux 5.11.0-27-generic.
The default looks like this:
$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 500Mhz
1: 2294Mhz
OD_MCLK:
0: 97Mhz
1: 1000MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK: 500Mhz 2600Mhz
MCLK: 674Mhz 1075Mhz
Note that the default value 0: 97Mhz is actually lower than the acceptable range of values (674Mhz 1075Mhz).
This seems to be causing a problem, because when I run the script with an empty /etc/default/amdgpu-custom-state.card0 file, I get the same error
(I added debug lines in the script code to help)
$ sudo ./amdgpu-clocks
Won't write initial state to /tmp/amdgpu-custom-state.card0.initial, it already exists.
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
SCLK state 0: 500Mhz
SCLK state 1: 2294Mhz
MCLK state 0: 97Mhz
MCLK state 1: 1000MHz
VDD GFX Offset: 0mV
Maximum clocks & voltages:
SCLK clock 2600Mhz
MCLK clock 1075Mhz
Curent power cap: 203W
Verifying user state values at /etc/default/amdgpu-custom-state.card0:
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
>>> setting CSTATE: s 0 500 > /sys/class/drm/card0/device/pp_od_clk_voltage
>>> setting CSTATE: s 1 2294 > /sys/class/drm/card0/device/pp_od_clk_voltage
>>> setting MSTATE: m 0 97 > /sys/class/drm/card0/device/pp_od_clk_voltage
./amdgpu-clocks: line 156: echo: write error: Invalid argument
>>> setting MSTATE: m 1 1000 > /sys/class/drm/card0/device/pp_od_clk_voltage
>>> setting VDDGFX_OFFSET: vo 0 > /sys/class/drm/card0/device/pp_od_clk_voltage
>>> commiting changes c > /sys/class/drm/card0/device/pp_od_clk_voltage
Done
In my custom power states, I attempted to "rectify" this by resetting the value to the supposedly acceptable range:
OD_MCLK:
0: 674Mhz
However, this seems to just crash my PC within seconds of applying the changes.
Since amdgpu-clocks hits this issue even when re-applying the default settings, it seems I am not actually able to apply custom settings either? No other custom state changes I have attempted to make have worked.
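One thing that might be worth experimenting with is leaving out any default state that already falls outside OD_RANGE, instead of writing it back. The sketch below only prints the "s"/"m" commands that are within range and notes skipped ones on stderr; it writes nothing to sysfs. od_commands_in_range is a hypothetical helper, this is not how amdgpu-clocks itself behaves, and whether the driver is happy with the out-of-range state left untouched is an assumption.

```shell
# Hypothetical helper: read a pp_od_clk_voltage dump (stdin or file argument)
# and print "s"/"m" commands only for states inside the OD_RANGE limits.
od_commands_in_range() {
  awk '
    $1 ~ /^OD_[A-Z_]+:$/ { section = $1; next }
    (section == "OD_SCLK:" || section == "OD_MCLK:") && $1 ~ /^[0-9]+:$/ {
      n++
      kind[n]  = (section == "OD_SCLK:") ? "s" : "m"
      state[n] = $1 + 0                      # "0:" -> 0
      mhz[n]   = $2 + 0                      # "97Mhz" -> 97
    }
    section == "OD_RANGE:" && ($1 == "SCLK:" || $1 == "MCLK:") {
      k = ($1 == "SCLK:") ? "s" : "m"
      lo[k] = $2 + 0; hi[k] = $3 + 0
    }
    END {
      for (i = 1; i <= n; i++) {
        k = kind[i]
        if ((k in lo) && (mhz[i] < lo[k] || mhz[i] > hi[k]))
          printf("skipping %s %d %d (allowed %d-%d)\n",
                 k, state[i], mhz[i], lo[k], hi[k]) > "/dev/stderr"
        else
          printf("%s %d %d\n", k, state[i], mhz[i])
      }
    }
  ' "$@"
}
```

With the default values posted above, this would print s 0 500, s 1 2294 and m 1 1000, and report m 0 97 as skipped.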
@stevekm check https://github.com/sibradzic/amdgpu-clocks/issues/32, read comments from https://github.com/sibradzic/amdgpu-clocks/issues/32#issuecomment-829787298 onwards...
Hi. I am getting issue https://github.com/sibradzic/amdgpu-clocks/issues/18 on kernel 5.10, with an RX5700 XT and the amdgpu.ppfeaturemask=0xffffffff parameter.