sibradzic / amdgpu-clocks

Simple script to control power states of amdgpu driven GPUs
GNU General Public License v2.0

Bad echo prevents systemd from functioning #46

Closed markstock closed 1 year ago

markstock commented 1 year ago

Thanks for making this tool, I've already found it very useful. I'm on Fedora 34 (5.11.12-300.fc34.x86_64 kernel) with one Radeon VII card (no integrated GPU on the CPU).

Running sudo amdgpu-clocks does what I want (throttles the GPU to conserve power), but results in a few error messages:

[mstock@diaspora0 amdgpu-clocks]$ sudo amdgpu-clocks
Writen initial backup states to /tmp/amdgpu-custom-states.card0.initial
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 808Mhz
  SCLK state 1: 1801Mhz
  MCLK state 1: 1000Mhz
  VDDC Curve state 0: 808Mhz 722mV
  VDDC Curve state 1: 1304Mhz 826mV
  VDDC Curve state 2: 1801Mhz 1112mV
  Maximum clocks & voltages:
    SCLK clock 2200Mhz
    MCLK clock 1200Mhz
  Curent power cap: 250W
Verifying user state values at /etc/default/amdgpu-custom-states.card0:
  SCLK state 0: 808Mhz
  SCLK state 1: 1134Mhz
  SCLK state 2: 1372Mhz
  SCLK state 3: 1546Mhz
  SCLK state 4: 1683Mhz
  SCLK state 5: 1749Mhz
  SCLK state 6: 1773Mhz
  SCLK state 7: 1801Mhz
  MCLK state 1: 1000Mhz
  VDDC Curve state 0: 808Mhz 722mV
  VDDC Curve state 1: 1304Mhz 826mV
  VDDC Curve state 2: 1801Mhz 1112mV
  Maximum clocks & voltages:
    SCLK clock 2200Mhz
    MCLK clock 1200Mhz
  Force power cap to 100W
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
/usr/local/bin/amdgpu-clocks: line 153: echo: write error: Invalid argument
/usr/local/bin/amdgpu-clocks: line 153: echo: write error: Invalid argument
/usr/local/bin/amdgpu-clocks: line 153: echo: write error: Invalid argument
/usr/local/bin/amdgpu-clocks: line 153: echo: write error: Invalid argument
/usr/local/bin/amdgpu-clocks: line 153: echo: write error: Invalid argument
/usr/local/bin/amdgpu-clocks: line 153: echo: write error: Invalid argument
  Done

The only addition to my /etc/default/amdgpu-custom-states.card0 file is FORCE_POWER_CAP: 100000000.

Looking at line 153, my SYS_PP_OD_CLK file (/sys/class/drm/card0/device/pp_od_clk_voltage) exists, but it does not contain the range of SCLK states that I provided in my /etc/default/amdgpu-custom-states.card0 file.

Anyway, enabling the systemd service also fails:

[mstock@diaspora0 amdgpu-clocks]$ sudo systemctl enable --now amdgpu-clocks
Created symlink /etc/systemd/system/multi-user.target.wants/amdgpu-clocks.service → /usr/lib/systemd/system/amdgpu-clocks.service.
Job for amdgpu-clocks.service failed because the control process exited with error code.
See "systemctl status amdgpu-clocks.service" and "journalctl -xeu amdgpu-clocks.service" for details.

Details are:

[mstock@diaspora0 amdgpu-clocks]$ sudo systemctl status amdgpu-clocks.service
× amdgpu-clocks.service - Set custom amdgpu clocks & voltages
     Loaded: loaded (/usr/lib/systemd/system/amdgpu-clocks.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Mon 2022-09-05 13:40:17 EDT; 12min ago
    Process: 3127 ExecStart=/usr/local/bin/amdgpu-clocks (code=exited, status=203/EXEC)
   Main PID: 3127 (code=exited, status=203/EXEC)
        CPU: 615us

Sep 05 13:40:17 diaspora0 systemd[1]: Starting Set custom amdgpu clocks & voltages...
Sep 05 13:40:17 diaspora0 systemd[3127]: amdgpu-clocks.service: Failed to locate executable /usr/local/bin/amdgpu-clocks: Permission denied
Sep 05 13:40:17 diaspora0 systemd[3127]: amdgpu-clocks.service: Failed at step EXEC spawning /usr/local/bin/amdgpu-clocks: Permission denied
Sep 05 13:40:17 diaspora0 systemd[1]: amdgpu-clocks.service: Main process exited, code=exited, status=203/EXEC
Sep 05 13:40:17 diaspora0 systemd[1]: amdgpu-clocks.service: Failed with result 'exit-code'.
Sep 05 13:40:17 diaspora0 systemd[1]: Failed to start Set custom amdgpu clocks & voltages.

[edited to show the first line of each text block]

sibradzic commented 1 year ago

Hi @markstock

Looking at the amdgpu-clocks log, the first part detects the current states, as reported by /sys/class/drm/card0/device/pp_od_clk_voltage:

Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
  SCLK state 0: 808Mhz
  SCLK state 1: 1801Mhz
  MCLK state 1: 1000Mhz
  VDDC Curve state 0: 808Mhz 722mV
  VDDC Curve state 1: 1304Mhz 826mV
  VDDC Curve state 2: 1801Mhz 1112mV
  Maximum clocks & voltages:
    SCLK clock 2200Mhz
    MCLK clock 1200Mhz
  Curent power cap: 250W

According to the part above, your card only has 2 SCLK states, 0 & 1, but the next part of the log says:

Verifying user state values at /etc/default/amdgpu-custom-states.card0:
  SCLK state 0: 808Mhz
  SCLK state 1: 1134Mhz
  SCLK state 2: 1372Mhz
  SCLK state 3: 1546Mhz
  SCLK state 4: 1683Mhz
  SCLK state 5: 1749Mhz
  SCLK state 6: 1773Mhz
  SCLK state 7: 1801Mhz
  MCLK state 1: 1000Mhz
  VDDC Curve state 0: 808Mhz 722mV
  VDDC Curve state 1: 1304Mhz 826mV
  VDDC Curve state 2: 1801Mhz 1112mV
  Maximum clocks & voltages:
    SCLK clock 2200Mhz
    MCLK clock 1200Mhz

which indicates that you have 8 (0-7) SCLK states set in /etc/default/amdgpu-custom-states.card0. Since that "echo error" appears 6 times, I guess you have defined 6 SCLK states too many in your /etc/default/amdgpu-custom-states.card0. Can you please share the contents of the custom state file here?
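The comparison described above can be sketched as a small shell helper. This is only an illustration, not part of amdgpu-clocks: `count_sclk_states` is a hypothetical name, and it assumes both files list their states as numbered lines under an `OD_SCLK:` header, as the custom state file format does.

```shell
# Hypothetical helper (not part of amdgpu-clocks): count the numbered
# entries listed under the OD_SCLK: header of a state file.
# Assumes the layout "OD_SCLK:" followed by lines such as "0: 808Mhz".
count_sclk_states() {
    awk '
        /^OD_SCLK:/           { in_sclk = 1; next }  # SCLK section starts
        /^[A-Z_]+:/           { in_sclk = 0 }        # any other header ends it
        in_sclk && /^[0-9]+:/ { n++ }                # one numbered state line
        END                   { print n + 0 }        # print 0 if none found
    ' "$1"
}
```

Running it once on /sys/class/drm/card0/device/pp_od_clk_voltage and once on /etc/default/amdgpu-custom-states.card0 shows how many SCLK entries each file defines. If the custom file reports more entries than the sysfs file, the extra ones are the likely source of the write errors.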

. . .

Regarding

Sep 05 13:40:17 diaspora0 systemd[3127]: amdgpu-clocks.service: Failed to locate executable /usr/local/bin/amdgpu-clocks: Permission denied
Sep 05 13:40:17 diaspora0 systemd[3127]: amdgpu-clocks.service: Failed at step EXEC spawning /usr/local/bin/amdgpu-clocks: Permission denied

that seems pretty obvious. You need to either;

markstock commented 1 year ago

I thought that I would need more than the basic two states (idle and full-blast) if I wanted to fine-tune the power draw of the GPU. Hence, my custom state file looks like:

OD_SCLK:
0:        808Mhz
1:        1134Mhz 
2:        1372Mhz 
3:        1546Mhz 
4:        1683Mhz 
5:        1749Mhz 
6:        1773Mhz 
7:        1801Mhz
OD_MCLK:
1:       1000Mhz
OD_VDDC_CURVE:
0:        808Mhz        722mV
1:       1304Mhz        826mV
2:       1801Mhz       1112mV
OD_RANGE:
SCLK:     808Mhz       2200Mhz
MCLK:     800Mhz       1200Mhz
VDDC_CURVE_SCLK[0]:     808Mhz       2200Mhz
VDDC_CURVE_VOLT[0]:     738mV        1218mV
VDDC_CURVE_SCLK[1]:     808Mhz       2200Mhz
VDDC_CURVE_VOLT[1]:     738mV        1218mV
VDDC_CURVE_SCLK[2]:     808Mhz       2200Mhz
VDDC_CURVE_VOLT[2]:     738mV        1218mV
# Force power limit (in micro watts):
FORCE_POWER_CAP: 100000000
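Since FORCE_POWER_CAP is given in microwatts, a quick conversion helps to double-check the value against the "Force power cap to 100W" line in the log. The two function names below are hypothetical and only illustrate the arithmetic; they are not part of amdgpu-clocks.

```shell
# Hypothetical helpers for the unit conversion only.
# FORCE_POWER_CAP is written in microwatts: 1 W = 1000000 uW.
microwatts_to_watts() {
    echo $(( $1 / 1000000 ))   # integer division, whole watts
}

watts_to_microwatts() {
    echo $(( $1 * 1000000 ))
}
```

With these, `microwatts_to_watts 100000000` gives 100, which agrees with the 100W reported by the script.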

As for the systemd problem, the path shouldn't be a problem:

[mstock@diaspora0 autostart]$ which amdgpu-clocks
/usr/local/bin/amdgpu-clocks
[mstock@diaspora0 autostart]$ ls -l /usr/local/bin/amdgpu-clocks
-rwxr-xr-x. 1 root root 7765 Sep  5 13:52 /usr/local/bin/amdgpu-clocks

markstock commented 1 year ago

But I see now that after some reboots a few hours ago, the service is running properly!

[mstock@diaspora0 autostart]$ sudo systemctl status amdgpu-clocks.service
[sudo] password for mstock: 
● amdgpu-clocks.service - Set custom amdgpu clocks & voltages
     Loaded: loaded (/usr/lib/systemd/system/amdgpu-clocks.service; enabled; vendor preset: disabled)
     Active: active (exited) since Mon 2022-09-05 23:21:03 EDT; 1h 42min ago
    Process: 1141 ExecStart=/usr/local/bin/amdgpu-clocks (code=exited, status=0/SUCCESS)
   Main PID: 1141 (code=exited, status=0/SUCCESS)
        CPU: 9ms

Sep 05 23:21:03 diaspora0 amdgpu-clocks[1141]:   Force power cap to 100W
Sep 05 23:21:03 diaspora0 amdgpu-clocks[1141]: Committing custom states to /sys/class/drm/card0/device/pp_od_>
Sep 05 23:21:03 diaspora0 amdgpu-clocks[1141]: /usr/local/bin/amdgpu-clocks: line 153: echo: write error: Inv>
Sep 05 23:21:03 diaspora0 amdgpu-clocks[1141]: /usr/local/bin/amdgpu-clocks: line 153: echo: write error: Inv>
Sep 05 23:21:03 diaspora0 amdgpu-clocks[1141]: /usr/local/bin/amdgpu-clocks: line 153: echo: write error: Inv>
Sep 05 23:21:03 diaspora0 amdgpu-clocks[1141]: /usr/local/bin/amdgpu-clocks: line 153: echo: write error: Inv>
Sep 05 23:21:03 diaspora0 amdgpu-clocks[1141]: /usr/local/bin/amdgpu-clocks: line 153: echo: write error: Inv>
Sep 05 23:21:03 diaspora0 amdgpu-clocks[1141]: /usr/local/bin/amdgpu-clocks: line 153: echo: write error: Inv>
Sep 05 23:21:03 diaspora0 amdgpu-clocks[1141]:   Done
Sep 05 23:21:03 diaspora0 systemd[1]: Finished Set custom amdgpu clocks & voltages.

I must have fixed the permissions (by copying the file instead of symlinking it) after creating the issue but before rebooting.
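For anyone hitting the same 203/EXEC error with a symlinked script, a rough check of what the path actually resolves to might look like the sketch below. `check_exec_target` is a hypothetical name, and `readlink -f` is assumed to be the GNU coreutils version. It only looks at the file type and mode bits, so it does not cover parent-directory permissions or SELinux labels, either of which can also lead to "Permission denied".

```shell
# Hypothetical helper: show where a path points after resolving symlinks
# and report whether the target is a regular, executable file.
# Returns non-zero if it is not. Checks mode bits only.
check_exec_target() {
    local path=$1 target
    target=$(readlink -f -- "$path") || return 1   # resolve all symlinks
    printf '%s -> %s\n' "$path" "$target"
    [ -f "$target" ] && [ -x "$target" ]
}
```

For example, `check_exec_target /usr/local/bin/amdgpu-clocks && echo OK` prints the resolved location and then OK only if that target is executable.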

So, the only issue remaining is having too many SCLK entries.

sibradzic commented 1 year ago

> So, the only issue remaining is having too many SCLK entries.

Yes, and the "issue" is that your hardware only has two SCLK "states"; there's nothing one can do to convince it otherwise ;) When I say "states", well, they are more likely the frequency limits of the curve. On Radeon VII and newer cards, fine-tuning is supposed to be done by setting the desired values in OD_VDDC_CURVE.

markstock commented 1 year ago

I reduced the number of SCLK entries in my /etc/default/amdgpu-custom-states.card0 to match those in /sys/class/drm/card0/device/pp_od_clk_voltage (just two: 800 and 1800 MHz). All scripts work fine with no errors. The shader clock frequency is now about 1400 MHz and power draw is very close to my desired 100W - I did not need the extra states, as the AMD hardware adjusts to fit my indicated power envelope. Closing!