sibradzic / amdgpu-clocks

Simple script to control power states of amdgpu driven GPUs
GNU General Public License v2.0
390 stars 43 forks source link

Write error: Invalid Argument when restoring #11

Closed terencode closed 4 years ago

terencode commented 4 years ago

Since https://github.com/sibradzic/amdgpu-clocks/commit/52a3acdc7a4d352202f3c70b48a4547c90a02cd4

I have a vega 64 card.

Starting Set custom amdgpu clocks & voltages...
Writen initial backup states to /tmp/amdgpu-custom-state.card0.initial
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 852Mhz, 800mV
  SCLK state 1: 991Mhz, 900mV
  SCLK state 2: 1084Mhz, 950mV
  SCLK state 3: 1138Mhz, 1000mV
  SCLK state 4: 1200Mhz, 1050mV
  SCLK state 5: 1401Mhz, 1100mV
  SCLK state 6: 1536Mhz, 1150mV
  SCLK state 7: 1630Mhz, 1200mV
  MCLK state 0: 167Mhz, 800mV
  MCLK state 1: 500Mhz, 800mV
  MCLK state 2: 800Mhz, 950mV
  MCLK state 3: 945Mhz, 1050mV
  Maximum clocks & voltages:
    SCLK clock 2400MHz
    MCLK clock 1500MHz
    VDDC voltage 1200mV
  Curent power cap: 220W
Verifying user state values at /etc/default/amdgpu-custom-state.card0:
  SCLK state 6: 1560MHz, 1050mV
  SCLK state 7: 1650MHz, 1130mV
  MCLK state 3: 1090MHz, 1050mV
  Force power cap to 260W
  Force performance level to auto
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
  Done
Finished Set custom amdgpu clocks & voltages.
Stopping Set custom amdgpu clocks & voltages...
Restoring all states to the initial defaults from /tmp/amdgpu-custom-state.card0.initial
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 852Mhz, 800mV
  SCLK state 1: 991Mhz, 900mV
  SCLK state 2: 1084Mhz, 950mV
  SCLK state 3: 1138Mhz, 1000mV
  SCLK state 4: 1200Mhz, 1050mV
  SCLK state 5: 1401Mhz, 1100mV
  SCLK state 6: 1560Mhz, 1050mV
  SCLK state 7: 1650Mhz, 1130mV
  MCLK state 0: 167Mhz, 800mV
  MCLK state 1: 500Mhz, 800mV
  MCLK state 2: 800Mhz, 950mV
  MCLK state 3: 1090Mhz, 1050mV
  Maximum clocks & voltages:
    SCLK clock 2400MHz
    MCLK clock 1500MHz
    VDDC voltage 1200mV
  Curent power cap: 220W
Verifying user state values at /tmp/amdgpu-custom-state.card0.initial:
  SCLK state 0: 852Mhz, 800mV
  SCLK state 1: 991Mhz, 900mV
  SCLK state 2: 1084Mhz, 950mV
  SCLK state 3: 1138Mhz, 1000mV
  SCLK state 4: 1200Mhz, 1050mV
  SCLK state 5: 1401Mhz, 1100mV
  SCLK state 6: 1536Mhz, 1150mV
  SCLK state 7: 1630Mhz, 1200mV
  MCLK state 0: 167Mhz, 800mV
  MCLK state 1: 500Mhz, 800mV
  MCLK state 2: 800Mhz, 950mV
  MCLK state 3: 945Mhz, 1050mV
  Maximum clocks & voltages:
    SCLK clock 2400MHz
    MCLK clock 1500MHz
    VDDC voltage 1200mV
  Force performance level to auto
  Force power cap to 220W
  Force power profile to 0
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
/usr/bin/amdgpu-clocks: ligne 123 : echo: erreur d'écriture : Argument invalide
  Done
amdgpu-clocks.service: Succeeded.
Stopped Set custom amdgpu clocks & voltages.
Starting Set custom amdgpu clocks & voltages...
Won't write initial state to /tmp/amdgpu-custom-state.card0.initial, it already exists.
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 852Mhz, 800mV
  SCLK state 1: 991Mhz, 900mV
  SCLK state 2: 1084Mhz, 950mV
  SCLK state 3: 1138Mhz, 1000mV
  SCLK state 4: 1200Mhz, 1050mV
  SCLK state 5: 1401Mhz, 1100mV
  SCLK state 6: 1536Mhz, 1150mV
  SCLK state 7: 1630Mhz, 1200mV
  MCLK state 0: 167Mhz, 800mV
  MCLK state 1: 500Mhz, 800mV
  MCLK state 2: 800Mhz, 950mV
  MCLK state 3: 945Mhz, 1050mV
  Maximum clocks & voltages:
    SCLK clock 2400MHz
    MCLK clock 1500MHz
    VDDC voltage 1200mV
  Curent power cap: 220W
Verifying user state values at /etc/default/amdgpu-custom-state.card0:
  SCLK state 6: 1560MHz, 1050mV
  SCLK state 7: 1650MHz, 1130mV
  MCLK state 3: 1090MHz, 1050mV
  Force power cap to 260W
  Force performance level to auto
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
  Done
Finished Set custom amdgpu clocks & voltages.
sibradzic commented 4 years ago

Are you talking about this one: /usr/bin/amdgpu-clocks: ligne 123 : echo: erreur d'écriture : Argument invalide ? This is about restoring profile to 0 (default): echo 0 > /sys/class/drm/card0/device/pp_power_profile_mode and for some reason kernel is complaining. Try it manually, and pass the contents of your own /sys/class/drm/card0/device/pp_power_profile_mode...

terencode commented 4 years ago

Are you talking about this one: /usr/bin/amdgpu-clocks: ligne 123 : echo: erreur d'écriture : Argument invalide

Yes, sorry I don't know how to display it in english. I can't pass any number to it (tried [0..6]), it always says invalid argument.

 cat /sys/class/drm/card0/device/pp_power_profile_mode
NUM        MODE_NAME BUSY_SET_POINT FPS USE_RLC_BUSY MIN_ACTIVE_LEVEL
  0 BOOTUP_DEFAULT*:             70  60          0              0
  1 3D_FULL_SCREEN :             70  60          1              3
  2   POWER_SAVING :             90  60          0              0
  3          VIDEO :             70  60          0              0
  4             VR :             70  90          0              0
  5        COMPUTE :             30  60          0              6
  6         CUSTOM :              0   0          0              0
terencode commented 4 years ago

Ah, this is because /sys/class/drm/card0/device/power_dpm_force_performance_level needs to be set to manual.

In definitive, I think you should not be setting pp_power_profile_modebut instead make sure power_dpm_force_performance_level is set to auto.

terencode commented 4 years ago

I opened #12 which is mildly related because the script shouldn't restore after a resume from suspend anyway...

sibradzic commented 4 years ago

I think you should not be setting pp_power_profile_modebut instead make sure power_dpm_force_performance_level is set to auto.

The restore needs to set power_dpm_force_performance_level to whatever it was before the script's first run, which is not necessarily 'auto'. This is more of an ordering issue. I've change the order and added level check before forcing profile to 0. Can you try this now, followed with service start & stop:

diff --git a/amdgpu-clocks b/amdgpu-clocks
index 9f28a62..ab8f67e 100755
--- a/amdgpu-clocks
+++ b/amdgpu-clocks
@@ -146,9 +146,11 @@ function set_custom_states() {
 function backup_states() {
   if [ ! -r ${BACKUP_STATE_FILE} ]; then
     cp ${SYS_PP_OD_CLK} ${BACKUP_STATE_FILE}
+    if [ "x${PWR_LEVEL}" == "xmanual" ]; then
+      echo "FORCE_POWER_PROFILE: 0" >> ${BACKUP_STATE_FILE}
+    fi
     echo "FORCE_PERF_LEVEL: $(cat ${PWR_LEVEL})" >> ${BACKUP_STATE_FILE}
     echo "FORCE_POWER_CAP: $(cat ${PWR_CAP_FILE})" >> ${BACKUP_STATE_FILE}
-    echo "FORCE_POWER_PROFILE: 0" >> ${BACKUP_STATE_FILE}
     echo "Writen initial backup states to ${BACKUP_STATE_FILE}"
   else
     echo "Won't write initial state to ${BACKUP_STATE_FILE}, it already exists."

?

terencode commented 4 years ago

The restore needs to set power_dpm_force_performance_level to whatever it was before the script's first run, which is not necessarily 'auto'

If we carry on the reasoning, why would you force pp_power_profile_mode to 0 and not apply what it was set to before?

terencode commented 4 years ago

With your diff, I do not get the write error (although it seems like you made a typo "x${PWR_LEVEL}" == "xmanual" so I removed those extra 'x')

sibradzic commented 4 years ago

If we carry on the reasoning, why would you force pp_power_profile_mode to 0 and not apply what it was set to before?

Because it would probably be a mess as it would require some logic for parsing the pp_power_profile_mode which tends to have very different format across different GPUs & its format had changed more than once in different kernel versions (for the same hardware). Implementing such logic in bash would probably result in not so pretty and maybe even not so short shell code, I prefer to keep this script as readable and simple as reasonably possible. But nevertheless, feel free to push a PR, if the implementation is reasonable I'd be happy to merge it...

sibradzic commented 4 years ago

With your diff, I do not get the write error (although it seems like you made a typo "x${PWR_LEVEL}" == "xmanual" so I removed those extra 'x')

Merged & got rid of x-es. Please check it out and close the issue if it works for you.

terencode commented 4 years ago

Because it would probably be a mess as it would require some logic for parsing the pp_power_profile_mode which tends to have very different format across different GPUs & its format had changed more than once in different kernel versions (for the same hardware). Implementing such logic in bash would probably result in not so pretty and maybe even not so short shell code, I prefer to keep this script as readable and simple as reasonably possible.

Make sense, maybe just add a note in the README about it.

Otherwise, no error now.

sibradzic commented 4 years ago

updated README, cheers!