Closed Halornek closed 2 years ago
Ubuntu Server 20.04.3 LTS. I'm currently running the base kernel, 5.4.0-89-generic with AMDGPU open driver version 20.45.
That's pretty outdated. Try moving to a recent kernel (anything from 5.1x will do) and uninstalling the AMDGPU driver. The AMDGPU open and pro drivers often cause power management issues, especially when built against older kernels. Mainline drivers are much better in that regard; google "kernel ppa" and check the available options.
When I attempt to run the script, the terminal will hang for a few seconds then output the following
There should be no hang. It is possible that the driver is crashing and then resetting the GPU; dmesg should tell you the details.
However, if I then wait a few seconds and run cat again, some of the settings appear to have been reverted.
There must be something in the kernel dmesg log when the "revert" happens. Please share the continuous output from before applying amdgpu-clocks for the first time until shortly after the revert happens. Are you absolutely sure you aren't running some other over/underclock or voltage tool that is messing with your power settings?
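To narrow a long kernel log down to the window being asked about, a small filter can help. The following is only a sketch: the function name and the keyword list are my own choices, and it assumes the default dmesg layout with a `[seconds.micros]` prefix on each line.

```python
import re

# Matches the "[  112.390963] message" prefix that dmesg prints by default.
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def amdgpu_lines_since(log_text, start_seconds, keywords=("amdgpu", "drm")):
    """Return (timestamp, message) pairs at or after start_seconds
    whose message mentions one of the keywords (case-insensitive)."""
    hits = []
    for line in log_text.splitlines():
        match = DMESG_LINE.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        stamp = float(match.group(1))
        message = match.group(2)
        if stamp >= start_seconds and any(k in message.lower() for k in keywords):
            hits.append((stamp, message))
    return hits


# Example use against the live kernel log (may need root on some systems):
#   import subprocess
#   text = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
#   for stamp, message in amdgpu_lines_since(text, 112.39):
#       print(f"[{stamp:12.6f}] {message}")
```

Passing the timestamp at which the script was started as `start_seconds` keeps only what the driver logged during and after the run.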
Oh, by the way, your issue has probably nothing to do with amdgpu-clocks itself, there is not much I can do to fix an actual driver provided by AMD, so switch to recent mainline kernel, remove the non-mainline provided "driver" and check everything again... That distro is also pretty old, try booting something like 21.10 for the test, everything should just work out of the box, no need to install any "driver" or anything.
Thanks for the quick response. I figured that this was something along the lines of a driver issue, though I still greatly appreciate the assistance and guidance. I was hoping to get it working with the official drivers for easy OpenCL support, but I can always work with ROCm.
I've attached a dmesg log from just before running the clocks script (At least I think this is the correct file).
It looks as if you are correct on the driver rebooting.
I shouldn't have anything else affecting clocks/voltage, as this was a fresh install as of about 8 hours ago.
I uninstalled the AMDGPU 20.45 driver, rebooted, and attempted to run the script. This actually appeared to function, but left me without OpenCL support due to some issues installing ROCm on kernel 5.4. I updated to 5.11, and even without the AMDGPU open drivers I once again experience the same behavior as before (hanging for a few seconds on the script, saying it's successful, then checking pp_od_clk_voltage and finding that some of the items have reverted).
I tested 5.11 both with and without AMDGPU 21.30 drivers and both experienced the issue.
At this point, this seems to be something driver related that I do not believe you would be able to fix. If you would be able to point me in a direction that I could research on my own, I would appreciate it.
I've attached a dmesg log from just before running the clocks script (At least I think this is the correct file).
Yes, but what's in there after you apply the script and the revert happens?
point me in the direction that I could research on my own I would appreciate it
To double check if you are really running into the driver issue, clean install Ubuntu 21.10 without any AMDGPU stuff, set amdgpu.ppfeaturemask=0xffffffff, and try the script.
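Before trying the script, it can be worth confirming that the parameter actually reached the running kernel. A minimal sketch, assuming the parameter is passed on the kernel command line (readable at /proc/cmdline); the helper names are made up for illustration, and the 0x4000 overdrive bit is my understanding of PP_OVERDRIVE_MASK in the amdgpu headers:

```python
def ppfeaturemask_from_cmdline(cmdline):
    """Return the amdgpu.ppfeaturemask value as an int, or None if it is not set."""
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == "amdgpu.ppfeaturemask" and sep:
            # Accept both hexadecimal (0x...) and plain decimal values.
            value = int(raw, 16) if raw.lower().startswith("0x") else int(raw)
    return value


def overdrive_enabled(mask):
    """True if the overdrive bit (assumed to be 0x4000) is set in the mask."""
    return mask is not None and bool(mask & 0x4000)


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()


# Example use on the running system:
#   mask = ppfeaturemask_from_cmdline(read_cmdline())
#   print(hex(mask) if mask is not None else "not set", overdrive_enabled(mask))
```

If the mask is missing or the overdrive bit is clear, the pp_od_clk_voltage interface is not expected to accept custom states.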
Yes, but what's in there after you apply the script and the revert happens?
Sorry, I should have been more specific. That attachment was just after startup and running the script. Line 6 [ 112.390963] is when I ran the script. Anything after that happened during or after the script.
To double check if you are really running into the driver issue, clean install Ubuntu 21.10 without any AMDGPU stuff, set amdgpu.ppfeaturemask=0xffffffff, and try the script.
I was actually already working on that with a clean install of 21.10.
This functioned perfectly fine. The script ran with no issues and I was able to verify the states with cat on pp_od_clk_voltage.
Script output:
Writen initial backup states to /tmp/amdgpu-custom-states.card0.initial
Detecting the state values at /sys/class/drm/card0/device/pp_od_clk_voltage:
SCLK state 0: 800Mhz
SCLK state 1: 2079Mhz
MCLK state 1: 875MHz
VDDC Curve state 0: 800MHz 705mV
VDDC Curve state 1: 1439MHz 812mV
VDDC Curve state 2: 2079MHz 1162mV
Maximum clocks & voltages:
SCLK clock 2150Mhz
MCLK clock 950Mhz
Curent power cap: 190W
Verifying user state values at /etc/default/amdgpu-custom-states.card0:
SCLK state 0: 800Mhz
SCLK state 1: 1300MHz
MCLK state 1: 875MHz
VDDC Curve state 0: 800MHz 750mV
VDDC Curve state 1: 1000MHz 775mV
VDDC Curve state 2: 1300MHz 800mV
Force power cap to 165W
Force performance level to manual
Committing custom states to /sys/class/drm/card0/device/pp_od_clk_voltage:
Done
cat output:
OD_SCLK:
0: 800Mhz
1: 1300Mhz
OD_MCLK:
1: 875MHz
OD_VDDC_CURVE:
0: 800MHz 800mV
1: 1000MHz 800mV
2: 1300MHz 800mV
OD_RANGE:
SCLK: 800Mhz 2150Mhz
MCLK: 625Mhz 950Mhz
VDDC_CURVE_SCLK[0]: 800Mhz 2150Mhz
VDDC_CURVE_VOLT[0]: 750mV 1200mV
VDDC_CURVE_SCLK[1]: 800Mhz 2150Mhz
VDDC_CURVE_VOLT[1]: 750mV 1200mV
VDDC_CURVE_SCLK[2]: 800Mhz 2150Mhz
VDDC_CURVE_VOLT[2]: 750mV 1200mV
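Checking the states by eye works, but the comparison can also be done mechanically. The sketch below parses text in the layout shown above into nested dictionaries and lists the entries that differ from what was requested. The function names are hypothetical, and the parser only assumes the `OD_*:` section headers and `key: values` rows visible in this output.

```python
import re

# Pulls the number out of values such as "800Mhz", "875MHz" or "750mV".
VALUE = re.compile(r"(\d+)\s*(MHz|mV)", re.IGNORECASE)


def parse_od_table(text):
    """Parse pp_od_clk_voltage text into {section: {key: (numbers, ...)}}."""
    table = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, rest = line.partition(":")
        if key.startswith("OD_") and not rest.strip():
            # A bare "OD_SCLK:" style line starts a new section.
            section = key
            table[section] = {}
        elif section is not None:
            numbers = tuple(int(n) for n, _unit in VALUE.findall(rest))
            table[section][key.strip()] = numbers
    return table


def diff_states(wanted, actual, sections=("OD_SCLK", "OD_MCLK", "OD_VDDC_CURVE")):
    """List (section, key, wanted, actual) for every requested entry that differs."""
    diffs = []
    for section in sections:
        for key, value in wanted.get(section, {}).items():
            got = actual.get(section, {}).get(key)
            if got != value:
                diffs.append((section, key, value, got))
    return diffs
```

Feeding `parse_od_table` the contents of /sys/class/drm/card0/device/pp_od_clk_voltage and passing the result to `diff_states` along with the requested values gives an empty list when every requested state was accepted as written.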
At this point, pretty confident we can say it's just a driver issue. I at least have a starting point to figure out OpenCL support.
Thank you once again for your help. This is a great tool and you have been very quick in your responses.
Thank you again for your help. Thought I would post an update in case anyone ever gets stuck in the specific scenario I am in, that being running Ubuntu with a Navi1 (5000 series) GPU, wants OpenCL support through AMDGPU drivers ("Open" or Pro), and needs some form of clock/voltage control.
I managed to get everything working with the following process (Done on Ubuntu Server, but should work on Ubuntu Desktop):
1. Install Ubuntu 20.04.3 LTS
2. Compile Linux Kernel 5.11 from source and apply packages
3. Reboot into Linux Kernel 5.11 and purge original 5.4 headers
4. Add the amdgpu.ppfeaturemask=0xffffffff to GRUB
5. Update Grub and reboot
6. Verify custom states are loading and sticking
7. Install AMDGPU version 21.30 with --opencl=rocr and reboot
8. Verify custom states are still loading and sticking
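For the GRUB step, the edit itself is small enough to script. This is a rough sketch rather than the exact procedure used above: it assumes the parameter goes into the double-quoted GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, and the function name is made up for illustration.

```python
import re

PARAM = "amdgpu.ppfeaturemask=0xffffffff"


def add_kernel_param(grub_text, param=PARAM, variable="GRUB_CMDLINE_LINUX_DEFAULT"):
    """Return grub_text with param appended to the quoted value of variable.

    The text is returned unchanged if the parameter is already present.
    """
    pattern = re.compile(rf'^({re.escape(variable)}=")([^"]*)(")', re.MULTILINE)

    def patch(match):
        options = match.group(2).split()
        if param not in options:
            options.append(param)
        return match.group(1) + " ".join(options) + match.group(3)

    new_text, count = pattern.subn(patch, grub_text)
    if count == 0:
        raise ValueError(f"{variable} not found in the GRUB configuration")
    return new_text


# Example use (run as root, keep a backup, then run update-grub and reboot):
#   with open("/etc/default/grub") as handle:
#       patched = add_kernel_param(handle.read())
#   with open("/etc/default/grub", "w") as handle:
#       handle.write(patched)
```

The function works on text only, so the result can be inspected before anything is written back.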
This functioned fine for me after a clean install.
Thanks for making this software.
I'm attempting to apply some undervolt settings to my 5700 XT running in a headless install of Ubuntu Server 20.04.3 LTS.
I'm currently running the base kernel, 5.4.0-89-generic with AMDGPU open driver version 20.45.
I set up the amdgpu.ppfeaturemask=0xffffffff under GRUB, and set up both the amdgpu-clocks script and the amdgpu-custom-states file under /etc/default.
(OD_VDDGFX_OFFSET is commented out as it was a reference from the original file I used for my 6900 XT on my desktop)
Before running the script, my pp_od_clk_voltage file outputs this when I run cat:
When I attempt to run the script, the terminal will hang for a few seconds then output the following.
If I run cat immediately on pp_od_clk_voltage, everything appears normal.
However, if I then wait a few seconds and run cat again, some of the settings appear to have been reverted.
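To pin down exactly when the values change, the file can be polled instead of running cat by hand. A minimal sketch, with hypothetical helper names; the `sleep` and `clock` parameters exist only so the loop can be exercised without waiting in real time.

```python
import time


def watch_for_change(read_state, interval=1.0, timeout=30.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll read_state() until its result differs from the first reading.

    Returns (elapsed_seconds, before, after), or None if nothing
    changed before the timeout expired.
    """
    start = clock()
    before = read_state()
    while clock() - start < timeout:
        sleep(interval)
        after = read_state()
        if after != before:
            return clock() - start, before, after
    return None


def read_od_table(path="/sys/class/drm/card0/device/pp_od_clk_voltage"):
    """Read the current overdrive table as text."""
    with open(path) as handle:
        return handle.read()


# Example use, started right after the script reports success:
#   result = watch_for_change(read_od_table, interval=0.5, timeout=60.0)
#   if result is not None:
#       elapsed, before, after = result
#       print(f"table changed after about {elapsed:.1f}s")
```

The elapsed time gives a rough offset to look for in the dmesg timestamps around the revert.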
Once this has been done, I'm not able to run many applications that use OpenCL or Mesa. Doing some research it seems to be tied to unstable voltage (But that's a separate issue).
At one point in time, I had everything working with the exact process mentioned above. Overclocks/Undervolts would apply, and the system would run stable. I restarted the system to apply a patch and started experiencing this issue again. After performing a clean install, I'm not able to replicate a successful application.
Please let me know if I missed something somewhere along the line, such as in the custom states file, or if I can provide additional information.