Closed adolfintel closed 3 years ago
Service stop does nothing but amdgpu-clocks restore
(please confirm), which just sets the states to the values before amdgpu-clocks is run for the first time. Crash is very likely caused by particular values in the "restore" file, so please share the contents of /tmp/amdgpu-custom-state.card0.initial
(case your Polaris card is card0
).
Yes, amdgpu-clocks restore does the same thing.
The clocks and voltages in the restore file match the ones in the VBIOS:
cat /tmp/amdgpu-custom-states.card0.initial
OD_SCLK:
0: 214MHz 700mV
1: 551MHz 750mV
2: 734MHz 750mV
3: 980MHz 875mV
4: 1046MHz 900mV
5: 1098MHz 950mV
6: 1124MHz 1000mV
7: 1206MHz 1050mV
OD_MCLK:
0: 300MHz 700mV
1: 625MHz 700mV
2: 1500MHz 875mV
OD_RANGE:
SCLK: 214MHz 1800MHz
MCLK: 300MHz 2000MHz
VDDC: 700mV 1075mV
FORCE_PERF_LEVEL: auto
FORCE_POWER_CAP: 36000000
This is my custom states file, which is perfectly stable:
# Set custom GPU states:
OD_SCLK:
7: 1600MHz 1075mV
# Set custom memory states:
OD_MCLK:
2: 2000MHz 875mV
# Only allow SCLK state 7:
FORCE_SCLK: 7
# Force fixed memory state 2:
FORCE_MCLK: 2
# Force power limit (in micro watts):
FORCE_POWER_CAP: 50000000
# In order to allow FORCE_SCLK & FORCE_MCLK:
FORCE_PERF_LEVEL: manual
It looks like the driver or your card really does not like the transition of your custom states to defaults found at /tmp/amdgpu-custom-states.card0.initial
. Try commenting out all MCLK & FORCE_MCLK parts, first in custom states and then in initial states and report back.
There is probably nothing wrong with the amdgpu-clocks, it is your particular settings and the driver's or HW's inability to transition back to original state...
Apparently the problem is caused by FORCE_SCLK and FORCE_MCLK, if I force them both to 0 before resetting, it works as intended. Would it be possible to add this to the script?
Yeah, it looks like this is it, I can replicate the problem in Windows. If I force it to the highest power state with the radeon software and press the reset button in afterburner it does the same thing, apparently you can't write a power state while it's in use
OK, what seems to be happening in your case is that the dirver/card crashes when transitioning from manual
to auto
FORCE_PERF_LEVEL
when manual
is restricted to highest states only. Your patch would break other things (on Navi one can't set SYSDPM{S,M}CLK when not in manual
, doing it blindly in set_custom_states
brings risks of further breakage and commit is not necessary when setting SYSDPM{S,M}CLK). Can you test this slightly different approach:
@@ -124,6 +124,13 @@ function parse_states() {
}
function set_custom_states() {
+ if [ "${RESTORE}" == "true" ] && [ "${FORCE_LEVEL}" == "auto" ]; then
+ echo manual > ${PWR_LEVEL}
+ sleep 0.25
+ echo 0 > ${SYS_DPM_SCLK} 2>&1
+ echo 0 > ${SYS_DPM_MCLK} 2>&1
+ sleep 0.25
+ fi
if [ "${FORCE_LEVEL}" ]; then
echo ${FORCE_LEVEL} > ${PWR_LEVEL}
fi
?
Yes, that works. Those extra commits were there because I noticed that they removed the forced state, but your solution works too so I'll add that to my PR.
When the service is stopped, the system locks up, as shown in this video: https://user-images.githubusercontent.com/16307689/104839896-7f72db00-58c4-11eb-9039-2270bbb25d9c.mp4
Sometimes a loud buzzing noise is emitted from the HDMI/DP output.
This makes it impossible to properly restart or shut down the machine.
This is on Manjaro, Linux 5.10, Polaris GPU, AMDGPU driver