sibradzic / amdgpu-clocks

Simple script to control power states of amdgpu driven GPUs
GNU General Public License v2.0

OD_VDDC_CURVE being read as OD_MCLK due to null characters #39

Closed exuvo closed 2 years ago

exuvo commented 2 years ago

I am having a weird minor problem reading pp_od_clk_voltage: it contains a lot of null characters that break parsing. A simple cat does not show them:

# cat /sys/class/drm/card0/device/pp_od_clk_voltage 
OD_SCLK:
0: 2140Mhz
1: 2150Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz 704mV
1: 1409MHz 820mV
2: 2150MHz 1200mV
OD_RANGE:
SCLK:     800Mhz       2150Mhz
MCLK:     625Mhz        950Mhz
VDDC_CURVE_SCLK[0]:     800Mhz       2150Mhz
VDDC_CURVE_VOLT[0]:     750mV        1200mV
VDDC_CURVE_SCLK[1]:     800Mhz       2150Mhz
VDDC_CURVE_VOLT[1]:     750mV        1200mV
VDDC_CURVE_SCLK[2]:     800Mhz       2150Mhz
VDDC_CURVE_VOLT[2]:     750mV        1200mV

But xxd shows them. Before OD_VDDC_CURVE there are a lot of null characters, which seems to make the case for OD_VDDC_CURVE be missed, so those values are then parsed as if they were OD_MCLK.

# xxd /sys/class/drm/card0/device/pp_od_clk_voltage 
00000000: 4f44 5f53 434c 4b3a 0a30 3a20 3231 3430  OD_SCLK:.0: 2140
00000010: 4d68 7a0a 313a 2032 3135 304d 687a 0a4f  Mhz.1: 2150Mhz.O
00000020: 445f 4d43 4c4b 3a0a 313a 2038 3735 4d48  D_MCLK:.1: 875MH
00000030: 7a0a 0000 0000 0000 0000 0000 0000 0000  z...............
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 004f 445f 5644 4443 5f43 5552 5645 3a0a  .OD_VDDC_CURVE:.
00000060: 303a 2038 3030 4d48 7a20 3730 346d 560a  0: 800MHz 704mV.
00000070: 313a 2031 3430 394d 487a 2038 3230 6d56  1: 1409MHz 820mV
00000080: 0a32 3a20 3231 3530 4d48 7a20 3132 3030  .2: 2150MHz 1200
00000090: 6d56 0a00 0000 0000 0000 0000 0000 0000  mV..............
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000100: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000110: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000140: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000150: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001c0: 0000 0000 0000 0000 4f44 5f52 414e 4745  ........OD_RANGE
000001d0: 3a0a 5343 4c4b 3a20 2020 2020 3830 304d  :.SCLK:     800M
000001e0: 687a 2020 2020 2020 2032 3135 304d 687a  hz       2150Mhz
000001f0: 0a4d 434c 4b3a 2020 2020 2036 3235 4d68  .MCLK:     625Mh
00000200: 7a20 2020 2020 2020 2039 3530 4d68 7a0a  z        950Mhz.
00000210: 5644 4443 5f43 5552 5645 5f53 434c 4b5b  VDDC_CURVE_SCLK[
00000220: 305d 3a20 2020 2020 3830 304d 687a 2020  0]:     800Mhz  
00000230: 2020 2020 2032 3135 304d 687a 0a56 4444       2150Mhz.VDD
00000240: 435f 4355 5256 455f 564f 4c54 5b30 5d3a  C_CURVE_VOLT[0]:
00000250: 2020 2020 2037 3530 6d56 2020 2020 2020       750mV      
00000260: 2020 3132 3030 6d56 0a56 4444 435f 4355    1200mV.VDDC_CU
00000270: 5256 455f 5343 4c4b 5b31 5d3a 2020 2020  RVE_SCLK[1]:    
00000280: 2038 3030 4d68 7a20 2020 2020 2020 3231   800Mhz       21
00000290: 3530 4d68 7a0a 5644 4443 5f43 5552 5645  50Mhz.VDDC_CURVE
000002a0: 5f56 4f4c 545b 315d 3a20 2020 2020 3735  _VOLT[1]:     75
000002b0: 306d 5620 2020 2020 2020 2031 3230 306d  0mV        1200m
000002c0: 560a 5644 4443 5f43 5552 5645 5f53 434c  V.VDDC_CURVE_SCL
000002d0: 4b5b 325d 3a20 2020 2020 3830 304d 687a  K[2]:     800Mhz
000002e0: 2020 2020 2020 2032 3135 304d 687a 0a56         2150Mhz.V
000002f0: 4444 435f 4355 5256 455f 564f 4c54 5b32  DDC_CURVE_VOLT[2
00000300: 5d3a 2020 2020 2037 3530 6d56 2020 2020  ]:     750mV    
00000310: 2020 2020 3132 3030 6d56 0a00 0000 0000      1200mV......
00000320: 0000 0000 0000 0000 0000 0000 0000 0000  ................
lots of trailing nulls omitted here
000009b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000009c0: 0000 0000 0000                           ......
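
A quicker check than reading a hex dump is to count the NUL bytes directly. The following is a minimal sketch; count_nuls is a hypothetical helper name, and the sample file is synthetic data written with printf, not real sysfs output.

```shell
#!/usr/bin/env bash
# count_nuls FILE: print the number of NUL (0x00) bytes in FILE.
# Hypothetical helper, not part of amdgpu-clocks.
count_nuls() {
    # tr -cd '\000' deletes every byte except NUL; wc -c counts what is left.
    # The trailing tr strips the padding some wc implementations add.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Demo on a synthetic file that mimics NUL padding before a section header.
sample=$(mktemp)
printf 'OD_MCLK:\n1: 875MHz\n\000\000\000OD_VDDC_CURVE:\n' > "$sample"
echo "NUL bytes: $(count_nuls "$sample")"
rm -f "$sample"
```

On a real system the argument would be /sys/class/drm/card0/device/pp_od_clk_voltage; any non-zero count means the file carries padding that cat silently hides.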

I then get some invalid argument errors, as it tries to write an MCLK state with a voltage, which is not allowed on my GPU (RX 5700 XT).

# /usr/bin/amdgpu-clocks 
Won't write initial state to /tmp/amdgpu-custom-state.card0.initial, it already exists.
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 800Mhz
  SCLK state 1: 2019Mhz
  MCLK state 1: 875MHz
  MCLK state 0: 800MHz, 704mV     <- This is OD_VDDC_CURVE values
  MCLK state 1: 1409MHz, 819mV
  MCLK state 2: 2019MHz, 1185mV
    SCLK clock 2150Mhz
    MCLK clock 950Mhz
  Curent power cap: 250W
Verifying user state values at /etc/default/amdgpu-custom-state.card0:
  SCLK state 0: 1950Mhz
  SCLK state 1: 2150Mhz
  MCLK state 1: 875MHz
  VDDC Curve state 0: 800MHz 750mV
  VDDC Curve state 1: 1409MHz 820mV
  VDDC Curve state 2: 2150MHz 1200mV
  Force power cap to 250W
  Force performance level to manual
  Force SCLK state to 0 1
  Force MCLK state to 1
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
m 0 800 704 to /sys/class/drm/card0/device/pp_od_clk_voltage           <- i added an echo to see what it is printing
/usr/bin/amdgpu-clocks: line 155: echo: write error: Invalid argument
m 1 875 to /sys/class/drm/card0/device/pp_od_clk_voltage                  <- the value i set in my config
m 2 2019 1185 to /sys/class/drm/card0/device/pp_od_clk_voltage
/usr/bin/amdgpu-clocks: line 155: echo: write error: Invalid argument
  Done
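
The misattribution seen in this output can be illustrated with a simplified section-tracking parser. This is only a sketch under assumptions: classify_lines is a hypothetical function, not the code from amdgpu-clocks, and the lost header is simulated with an empty line rather than with real NUL bytes.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified section-tracking parser; not the amdgpu-clocks code.
# classify_lines LINE...: print each state line prefixed with the section
# it is attributed to.
classify_lines() {
    local section="" line
    for line in "$@"; do
        case "$line" in
            OD_SCLK:)       section="SCLK" ;;
            OD_MCLK:)       section="MCLK" ;;
            OD_VDDC_CURVE:) section="VDDC" ;;
            OD_RANGE:)      section="RANGE" ;;
            # A state line such as "1: 875MHz" inherits the current section.
            [0-9]*:*)       echo "$section $line" ;;
        esac
    done
}

# Header recognized: the curve point is attributed to VDDC.
classify_lines "OD_MCLK:" "1: 875MHz" "OD_VDDC_CURVE:" "0: 800MHz 704mV"

# Header lost (simulated here with an empty line): no case pattern matches,
# the section stays MCLK, and the curve point is attributed to MCLK.
classify_lines "OD_MCLK:" "1: 875MHz" "" "0: 800MHz 704mV"
```

Because the section only changes when a header matches exactly, any stray bytes in front of a header leave every following state line in the previous section.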

I managed to fix it by filtering out the nulls at line 61 in function_parse_states:

mapfile -t STATE_LINES < <(sed 's/\x0//g' $1)

# /usr/local/bin/amdgpu-clocks 
Won't write initial state to /tmp/amdgpu-custom-state.card0.initial, it already exists.
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 800Mhz
  SCLK state 1: 2019Mhz
  MCLK state 1: 875MHz
  VDDC Curve state 0: 800MHz 704mV
  VDDC Curve state 1: 1409MHz 819mV
  VDDC Curve state 2: 2019MHz 1185mV
  Maximum clocks & voltages:
    SCLK clock 2150Mhz
    MCLK clock 950Mhz
  Curent power cap: 250W
Verifying user state values at /etc/default/amdgpu-custom-state.card0:
  SCLK state 0: 1950Mhz
  SCLK state 1: 2150Mhz
  MCLK state 1: 875MHz
  VDDC Curve state 0: 800MHz 750mV
  VDDC Curve state 1: 1409MHz 820mV
  VDDC Curve state 2: 2150MHz 1200mV
  Force power cap to 250W
  Force performance level to manual
  Force SCLK state to 0 1
  Force MCLK state to 1
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
  Done

GPU RX 5700 XT Kernel linux-zen 5.15.13 OS Arch

sibradzic commented 2 years ago

Hi @exuvo, thanks for reporting this. Weird indeed. Can you please share your raw /sys/class/drm/card0/device/pp_od_clk_voltage, before and after the changes were applied? You are running this? Does the same thing happen on vanilla Arch 5.13.x?

exuvo commented 2 years ago

Not sure if I understand what you want. I already included pp_od_clk_voltage, which remains the same before and after, as the patch is in amdgpu-clocks, not the kernel driver. Yes, that is the kernel source I am using. I will get back to you after testing with a vanilla kernel.

sibradzic commented 2 years ago

Not sure if I understand what you want

A raw binary copy of /sys/class/drm/card0/device/pp_od_clk_voltage

exuvo commented 2 years ago

Okay but that data is already in the xxd code block. I had to give it a file ending for the upload to work here. pp_od_clk_voltage.txt

sibradzic commented 2 years ago

@exuvo thanks for sending the raw file, I just wanted to be 200% sure about these nulls in pp_od_clk_voltage when I run tests on my side. It looks like mapfile gets totally confused when dealing with these nulls, so some tr preprocessing was added to the pipe. The latest commit should have fixed the issue; please verify on your side.
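
For readers following along, here is a minimal sketch of what tr preprocessing in front of mapfile could look like. It is an illustration only and the actual commit may differ: read_state_lines is a hypothetical function name, STATE_LINES is borrowed from the sed line quoted earlier in this thread, and the sample file is synthetic.

```shell
#!/usr/bin/env bash
# read_state_lines FILE: fill the global array STATE_LINES with the lines of
# FILE after deleting all NUL bytes. Hypothetical helper; requires bash 4+
# for mapfile.
read_state_lines() {
    # tr -d '\000' removes every NUL byte before mapfile splits on newlines.
    mapfile -t STATE_LINES < <(tr -d '\000' < "$1")
}

# Demo on a synthetic file with NUL padding in front of a section header.
sample=$(mktemp)
printf 'OD_MCLK:\n1: 875MHz\n\000\000\000OD_VDDC_CURVE:\n0: 800MHz 704mV\n' > "$sample"
read_state_lines "$sample"
printf '[%s]\n' "${STATE_LINES[@]}"
rm -f "$sample"
```

With the padding stripped, the OD_VDDC_CURVE: header arrives as a clean array element, so an exact-match case pattern can recognize it.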

exuvo commented 2 years ago

Any specific reason you went with tr instead of the sed line I suggested as a fix?

sibradzic commented 2 years ago

not really