sibradzic / amdgpu-clocks

Simple script to control power states of amdgpu driven GPUs
GNU General Public License v2.0

line 140: echo: write error: Invalid argument #23

Closed lubosz closed 3 years ago

lubosz commented 3 years ago

Hi. I am hitting the error from https://github.com/sibradzic/amdgpu-clocks/issues/18 on kernel 5.10.

The card is an RX 5700 XT, and I boot with the amdgpu.ppfeaturemask=0xffffffff kernel parameter.
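Since writes to pp_od_clk_voltage are only accepted when overdrive is enabled in the feature mask, a quick sanity check of the mask can rule that out as a cause. The sketch below assumes the overdrive bit is 0x4000 (PP_OVERDRIVE_MASK in the kernel's amdgpu headers) and that the active mask is readable from /sys/module/amdgpu/parameters/ppfeaturemask; `has_overdrive_bit` is a hypothetical helper name, not part of amdgpu-clocks.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given ppfeaturemask value has the
# overdrive bit set. Assumption: the bit is 0x4000 (PP_OVERDRIVE_MASK).
has_overdrive_bit() {
    [ $(( $1 & 0x4000 )) -ne 0 ]
}

# Example use (assumed sysfs path for the active module parameter):
#   has_overdrive_bit "$(cat /sys/module/amdgpu/parameters/ppfeaturemask)" \
#       && echo "overdrive bit set" || echo "overdrive bit not set"
```

With 0xffffffff every bit is set, so this check passes for the setup described here; it is mainly useful for ruling out a mask that was not actually applied at boot.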

$ sudo amdgpu-clocks
Won't write initial state to /tmp/amdgpu-custom-states.card0.initial, it already exists.
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 800Mhz
  SCLK state 1: 2090Mhz
  MCLK state 1: 875MHz
  VDDC Curve state 0: 800MHz 706mV
  VDDC Curve state 1: 1445MHz 810mV
  VDDC Curve state 2: 2090MHz 1201mV
  Maximum clocks & voltages:
    SCLK clock 2150Mhz
    MCLK clock 950Mhz
  Curent power cap: 220W
Verifying user state values at /etc/default/amdgpu-custom-states.card0:
  SCLK state 1: 2090Mhz
  MCLK state 1: 875MHz
  VDDC Curve state 0: 800MHz 706mV
  VDDC Curve state 1: 1445MHz 810mV
  VDDC Curve state 2: 2090MHz 1201mV
  Force performance level to high
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
/usr/bin/amdgpu-clocks: line 140: echo: write error: Invalid argument
/usr/bin/amdgpu-clocks: line 140: echo: write error: Invalid argument
  Done
$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 2090Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz 706mV
1: 1445MHz 810mV
2: 2090MHz 1201mV
OD_RANGE:
SCLK:     800Mhz       2150Mhz
MCLK:     625Mhz        950Mhz
VDDC_CURVE_SCLK[0]:     800Mhz       2150Mhz
VDDC_CURVE_VOLT[0]:     750mV        1200mV
VDDC_CURVE_SCLK[1]:     800Mhz       2150Mhz
VDDC_CURVE_VOLT[1]:     750mV        1200mV
VDDC_CURVE_SCLK[2]:     800Mhz       2150Mhz
VDDC_CURVE_VOLT[2]:     750mV        1200mV
$ cat /etc/default/amdgpu-custom-states.card0
# For Navi (and Radeon7) we can only set highest SCLK & MCLK, "state 1":
OD_SCLK:
1: 2090Mhz
OD_MCLK:
1: 875MHz
# More fine-grain control of clocks and voltages are done with VDDC curve:
OD_VDDC_CURVE:
0: 800MHz 706mV
1: 1445MHz 810mV
2: 2090MHz 1201mV
# Force power limit (in micro watts):
FORCE_PERF_LEVEL: high
$ uname -a
Linux bstation 5.10.1-103-tkg-upds #1 TKG SMP PREEMPT Wed, 16 Dec 2020 23:18:54 +0000 x86_64 GNU/Linux
$ lspci | grep VGA
0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c1)

sibradzic commented 3 years ago

Hi @lubosz. I've just double-checked on 5.10 & RX5700 with these curve settings:

OD_VDDC_CURVE:
0: 800MHz 706mV
1: 1445MHz 810mV
2: 1850MHz 1200mV

Note that my card is HW limited to 1850MHz @ 1200mV, so 2090MHz @ 1201mV can't be applied (it results in the same error as yours). The above worked without issue, so this isn't #18. I'm baffled why 1: 1445MHz 810mV does not work for you (perhaps the voltage is too low?); it works here.

Judging by your script output, 0: 800MHz 706mV worked fine, but states 1 and 2 were refused by the kernel driver. I don't think this is a script problem; it looks like your card's driver is not accepting these values because of HW limits or some other issue.

Please try with some more conservative values for curve points 1 & 2 and report back.

P.S. You may be able to override these limits (and many more things) with https://github.com/sibradzic/upp.

sibradzic commented 3 years ago

@lubosz ping

lubosz commented 3 years ago

Hi, thanks for the reply and the patches.

So I retried with a more recent kernel, this time vanilla Arch:

$ uname -a
Linux bstation 5.10.4-arch2-1 #1 SMP PREEMPT Fri, 01 Jan 2021 05:29:53 +0000 x86_64 GNU/Linux

The VDDC curve values have changed slightly; most are a little higher than before.

$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 800Mhz
1: 2095Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz 705mV
1: 1447MHz 811mV
2: 2095MHz 1204mV
OD_RANGE:
SCLK:     800Mhz       2150Mhz
MCLK:     625Mhz        950Mhz
VDDC_CURVE_SCLK[0]:     800Mhz       2150Mhz
VDDC_CURVE_VOLT[0]:     750mV        1200mV
VDDC_CURVE_SCLK[1]:     800Mhz       2150Mhz
VDDC_CURVE_VOLT[1]:     750mV        1200mV
VDDC_CURVE_SCLK[2]:     800Mhz       2150Mhz
VDDC_CURVE_VOLT[2]:     750mV        1200mV

I applied them to the config:

$ cat /etc/default/amdgpu-custom-states.card0
# For Navi (and Radeon7) we can only set highest SCLK & MCLK, "state 1":
OD_SCLK:
1: 2090Mhz
OD_MCLK:
1: 875MHz
# More fine-grain control of clocks and voltages are done with VDDC curve:
OD_VDDC_CURVE:
#0: 800MHz 706mV
#1: 1445MHz 810mV
#2: 2090MHz 1201mV
0: 800MHz 705mV
1: 1447MHz 811mV
2: 2095MHz 1204mV

# Force power limit (in micro watts):
FORCE_PERF_LEVEL: high

The new script now shows exactly which points fail: 0 and 2 cannot be applied.

$ sudo amdgpu-clocks
Writen initial backup states to /tmp/amdgpu-custom-states.card0.initial
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 800Mhz
  SCLK state 1: 2095Mhz
  MCLK state 1: 875MHz
  VDDC Curve state 0: 800MHz 705mV
  VDDC Curve state 1: 1447MHz 811mV
  VDDC Curve state 2: 2095MHz 1204mV
  Maximum clocks & voltages:
    SCLK clock 2150Mhz
    MCLK clock 950Mhz
  Curent power cap: 220W
Verifying user state values at /etc/default/amdgpu-custom-states.card0:
  SCLK state 1: 2090Mhz
  MCLK state 1: 875MHz
  VDDC Curve state 0: 800MHz 705mV
  VDDC Curve state 1: 1447MHz 811mV
  VDDC Curve state 2: 2095MHz 1204mV
  Force performance level to high
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
/usr/bin/amdgpu-clocks: line 140: echo: write error: Invalid argument
ERROR: echo vc 0 800 705 > /sys/class/drm/card0/device/pp_od_clk_voltage
/usr/bin/amdgpu-clocks: line 140: echo: write error: Invalid argument
ERROR: echo vc 2 2095 1204 > /sys/class/drm/card0/device/pp_od_clk_voltage
  Done

> Please try with some more conservative values for curve points 1 & 2 and report back.

Should I lower the MHz or the mV? Can I break my hardware with wrong values here? Shouldn't the values advertised by pp_od_clk_voltage work?

sibradzic commented 3 years ago

> Should I lower the MHz or the mV?

You can try a different voltage first; I suspect the driver is somehow imposing limits on those. As root, you can execute these commands directly in order to find working values:

echo vc 0 800 705 > /sys/class/drm/card0/device/pp_od_clk_voltage

^ In the vc 0 case above, try a different voltage, such as 800 or more.

echo vc 2 2095 1204 > /sys/class/drm/card0/device/pp_od_clk_voltage

^ In the vc 2 case, try a lower voltage such as 1150, as well as a lower clock such as 1800.
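Before trying values by hand, it may help to compare each curve point with the VDDC_CURVE_VOLT limits that pp_od_clk_voltage itself lists under OD_RANGE. The sketch below assumes the file format shown in the dumps in this thread and curve points numbered from 0; `check_vc_points` is a hypothetical helper, not part of amdgpu-clocks, and whether the driver enforces exactly these limits is an assumption this thread does not confirm.

```shell
#!/bin/sh
# Hypothetical helper: reads text in the pp_od_clk_voltage format on stdin
# and reports, for each OD_VDDC_CURVE point, whether its voltage lies inside
# the VDDC_CURVE_VOLT[n] limits listed under OD_RANGE.
check_vc_points() {
    awk '
        # Section headers such as "OD_VDDC_CURVE:" or "OD_RANGE:"
        /^OD_[A-Z_]+:/ { section = $1; next }

        # Curve points, e.g. "0: 800MHz 705mV"
        section == "OD_VDDC_CURVE:" && /^[0-9]+:/ {
            idx = $1; sub(/:/, "", idx)
            mv[idx] = $3 + 0            # "705mV" -> 705
            n++
        }

        # Voltage limits, e.g. "VDDC_CURVE_VOLT[0]:  750mV  1200mV"
        section == "OD_RANGE:" && /^VDDC_CURVE_VOLT\[/ {
            idx = $1; gsub(/[^0-9]/, "", idx)
            lo[idx] = $2 + 0; hi[idx] = $3 + 0
        }

        END {
            for (i = 0; i < n; i++) {
                if (!(i in lo)) {
                    printf "point %d: %dmV (no range reported)\n", i, mv[i]
                    continue
                }
                status = (mv[i] < lo[i] || mv[i] > hi[i]) ? "outside" : "within"
                printf "point %d: %dmV %s %d-%dmV\n", i, mv[i], status, lo[i], hi[i]
            }
        }
    '
}

# Example use:
#   check_vc_points < /sys/class/drm/card0/device/pp_od_clk_voltage
```

Fed the second pp_od_clk_voltage dump from this thread, the sketch reports points 0 (705mV) and 2 (1204mV) as outside 750-1200mV and point 1 (811mV) as within it, which lines up with the two rejected writes.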

You can actually check and adjust the voltage limits using https://github.com/sibradzic/upp. The commands to check the limits would be (the outputs are millivolts multiplied by 4):

pip3 install --user upp
upp dump | grep "\(Min\|Max\)Voltage\(Gfx\|Soc\)"
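The factor of 4 is easy to misread when comparing against the mV figures in pp_od_clk_voltage. A tiny sketch of the conversion, assuming the scaling is exactly as stated in this comment; `upp_raw_to_mv` and `mv_to_upp_raw` are hypothetical helper names, not upp commands.

```shell
#!/bin/sh
# Hypothetical helpers for the "millivolts multiplied by 4" scaling
# described above. Integer arithmetic only; raw values that are not a
# multiple of 4 are truncated when converted to mV.
upp_raw_to_mv() {
    echo "$(( $1 / 4 ))"
}

mv_to_upp_raw() {
    echo "$(( $1 * 4 ))"
}

# Example use:
#   upp_raw_to_mv 4800    # a raw value of 4800 corresponds to 1200 mV
#   mv_to_upp_raw 750     # 750 mV corresponds to a raw value of 3000
```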

> Can I break my hardware with wrong values here?

Probably not, but the risk is entirely on you; don't blame me if something weird happens. The values I am suggesting above should not break anything; in the worst case you may end up with a hard reset.

> Shouldn't the values advertised by pp_od_clk_voltage work?

They should. But I've seen with my own eyes that, under some particular conditions, they do not. The driver / MCU firmware is to blame; there is nothing we can do to fix that.

sibradzic commented 3 years ago

ping?

stevekm commented 2 years ago

I am getting the same error with a Radeon 6800 on Linux 5.11.0-27-generic.

The default looks like this:

$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0: 500Mhz
1: 2294Mhz
OD_MCLK:
0: 97Mhz
1: 1000MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK:     500Mhz       2600Mhz
MCLK:     674Mhz       1075Mhz

Note that the default value 0: 97Mhz is actually lower than the acceptable range of values (674Mhz to 1075Mhz).

This seems to be causing a problem, because when I run the script with an empty /etc/default/amdgpu-custom-state.card0 file, I get the same error.

(I added debug lines to the script to help.)

$ sudo ./amdgpu-clocks
Won't write initial state to /tmp/amdgpu-custom-state.card0.initial, it already exists.
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 500Mhz
  SCLK state 1: 2294Mhz
  MCLK state 0: 97Mhz
  MCLK state 1: 1000MHz
  VDD GFX Offset: 0mV
  Maximum clocks & voltages:
    SCLK clock 2600Mhz
    MCLK clock 1075Mhz
  Curent power cap: 203W
Verifying user state values at /etc/default/amdgpu-custom-state.card0:
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
>>> setting CSTATE: s 0 500 > /sys/class/drm/card0/device/pp_od_clk_voltage
>>> setting CSTATE: s 1 2294 > /sys/class/drm/card0/device/pp_od_clk_voltage
>>> setting MSTATE: m 0 97 > /sys/class/drm/card0/device/pp_od_clk_voltage
./amdgpu-clocks: line 156: echo: write error: Invalid argument
>>> setting MSTATE: m 1 1000 > /sys/class/drm/card0/device/pp_od_clk_voltage
>>> setting VDDGFX_OFFSET: vo 0 > /sys/class/drm/card0/device/pp_od_clk_voltage
>>> commiting changes c > /sys/class/drm/card0/device/pp_od_clk_voltage
  Done

In my custom power states, I attempted to "rectify" this by resetting the value to the supposedly acceptable range:

OD_MCLK:
0: 674Mhz

However, this just seems to crash my PC within seconds of applying the changes.

Since amdgpu-clocks hits this error even when re-applying the default settings, it seems I am not actually able to apply custom settings either. No other custom state changes I have attempted to make have worked.
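The mismatch described in this comment, a default MCLK state below the advertised minimum, can be detected mechanically. The sketch below assumes the pp_od_clk_voltage layout shown in the dump above; `check_clk_states` is a hypothetical helper, not part of amdgpu-clocks. It only reports; it does not write anything to sysfs.

```shell
#!/bin/sh
# Hypothetical helper: reads text in the pp_od_clk_voltage format on stdin
# and reports whether each OD_SCLK / OD_MCLK state lies inside the SCLK /
# MCLK limits listed under OD_RANGE.
check_clk_states() {
    awk '
        # Section headers such as "OD_MCLK:" or "OD_RANGE:"
        /^OD_[A-Z_]+:/ { section = $1; next }

        # Clock states, e.g. "0: 97Mhz"
        (section == "OD_SCLK:" || section == "OD_MCLK:") && /^[0-9]+:/ {
            count++
            kind[count] = substr(section, 4, 4)   # "SCLK" or "MCLK"
            label[count] = $1                     # e.g. "0:"
            mhz[count] = $2 + 0                   # "97Mhz" -> 97
        }

        # Clock limits, e.g. "MCLK:  674Mhz  1075Mhz"
        section == "OD_RANGE:" && /^(SCLK|MCLK):/ {
            k = substr($1, 1, 4)
            lo[k] = $2 + 0; hi[k] = $3 + 0
        }

        END {
            for (i = 1; i <= count; i++) {
                k = kind[i]
                if (!(k in lo)) {
                    printf "%s state %s %dMHz (no range reported)\n", k, label[i], mhz[i]
                    continue
                }
                verdict = (mhz[i] < lo[k] || mhz[i] > hi[k]) ? "outside" : "within"
                printf "%s state %s %dMHz %s %d-%dMHz\n", k, label[i], mhz[i], verdict, lo[k], hi[k]
            }
        }
    '
}

# Example use:
#   check_clk_states < /sys/class/drm/card0/device/pp_od_clk_voltage
```

For the Radeon 6800 dump above, only MCLK state 0 (97MHz) is reported as outside 674-1075MHz, which is the same write that fails in the debug output.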

sibradzic commented 2 years ago

@stevekm check https://github.com/sibradzic/amdgpu-clocks/issues/32, read comments from https://github.com/sibradzic/amdgpu-clocks/issues/32#issuecomment-829787298 onwards...