Closed Sqvid closed 4 months ago
I have kernel 6.6.29_1 and my system did not power off when I tried shutting it down with my AGS widgets. I didn't debug whether it was the widgets or the system, though, and I'm not in the mood to shut down now.
I had this problem on my two Nvidia Optimus laptops (Intel + Nvidia and AMD + Nvidia) while I was getting back into Void Linux, and it only occurred while I was using Nouveau. Once I finished configuring the system with the Nvidia drivers, it no longer appeared.
I encountered this problem after I updated the kernel this afternoon. Exactly the same problem as @Sqvid.
My system spec:
uname -a
output: Linux void-btw 6.6.31_1 #1 SMP PREEMPT_DYNAMIC Sat May 18 01:07:55 UTC 2024 x86_64 GNU/Linux
I usually clear caches after an update, so I cannot downgrade the kernel. But I tried installing linux6.9 and it works just fine.
I'm on arch linux but I have the same behavior. I went in my cache and reverted to test:
linux-lts-6.6.30-2-x86_64.pkg.tar.zst: shuts down fine
linux-lts-6.6.31-1-x86_64.pkg.tar.zst: does not shut down
Facing the same issue on a laptop with an 8845HS during installation (no UI is installed).
I found this thread through a search engine (I'm on NixOS, not Void Linux), but I have the same issue on NixOS on a Framework AMD laptop (AMD Ryzen 7 7840U). Shutdown appears to work at first, with the systemd logs zooming past as always. Then the screen goes black, but the power "stays on" (the power LED stays lit) indefinitely. I need to hold the power button for several seconds to force the shutdown to go through. Rolling back the kernel from 6.6.31 to 6.6.30 fixes the issue.
Anyone affected by this can you check two things:
CC @knurd
Warning, the following is a totally wild guess. But there was one report about a shutdown problem caused by a commit that went into 6.6.31 as well; wonder if that might be related. https://lore.kernel.org/all/CAE4VaREzY%2Ba2PvQJYJbfh8DwB4OP7kucZG-e28H22xyWob1w_A@mail.gmail.com/
Bisected between 6.6.30 and 6.6.31 to find the problem commit:
$ git bisect start --no-checkout
status: waiting for both good and bad commits
$ git bisect good 5697d159afef8c475f13a0b7b85f09bd4578106c
status: waiting for bad commit, 1 good commit known
$ git bisect bad e3d332aaf898ed755b29c8cdf59be2cfba1cac4b
Bisecting: 154 revisions left to test after this (roughly 7 steps)
[6466a0f6d235c8a18c602cb587160d7e49876db9] uio_hv_generic: Don't free decrypted memory
$ git bisect good
Bisecting: 77 revisions left to test after this (roughly 6 steps)
[7a54e5052bde582fd0e7677334fe7a5be92e242c] usb: gadget: uvc: use correct buffer size when parsing configfs lists
$ git bisect bad
Bisecting: 38 revisions left to test after this (roughly 5 steps)
[2ee2fc6786bc5ff3c24798624ea3806c9662c26f] selftests/net: convert test_bridge_neigh_suppress.sh to run it in unique namespace
$ git bisect good
Bisecting: 19 revisions left to test after this (roughly 4 steps)
[7019a64165186fd3fb5ba928a6347558dc560093] net: dsa: mv88e6xxx: add phylink_get_caps for the mv88e6320/21 family
$ git bisect good
Bisecting: 9 revisions left to test after this (roughly 3 steps)
[0c7ed3ed35eec9138b88d42217b5a6b9a62bda4d] drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2
$ git bisect bad
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[95ca7c90eaf5ea8a8460536535101e3e81160e2a] gpiolib: cdev: Fix use after free in lineinfo_changed_notify
$ git bisect bad
Bisecting: 2 revisions left to test after this (roughly 1 step)
[fa2d2e2d8eae03acf49229793f2a6fddede92c4d] drm/meson: dw-hdmi: add bandgap setting for g12
$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[6c83a8f236ece78d5c2e60ae3dcfd1a64509410e] dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users
$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[1b9e8de483bcc230f6e922bdfa9d1c186c27dd3b] drm/connector: Add \n to message about demoting connector force-probes
$ git bisect good
6c83a8f236ece78d5c2e60ae3dcfd1a64509410e is the first bad commit
commit 6c83a8f236ece78d5c2e60ae3dcfd1a64509410e
Author: Mario Limonciello <mario.limonciello@amd.com>
Date: Thu May 2 13:32:17 2024 -0500
dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users
[ Upstream commit cd94d1b182d2986378550c9087571991bfee01d4 ]
Limit the workaround introduced by commit 31729e8c21ec ("drm/amd/pm: fixes
a random hang in S4 for SMU v13.0.4/11") to only run in the s4 path.
Cc: Tim Huang <Tim.Huang@amd.com>
Fixes: 31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3351
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
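For readers unfamiliar with the workflow in the log above, here is a minimal sketch of the same idea on a throwaway repository. It is not the kernel bisect itself: the repository, the 16 commits, the `state` file and the `demo_bisect` name are all hypothetical. It also uses `git bisect run` with an automated check, whereas the log above marks each step by hand after booting the built kernel.

```shell
# Toy illustration only: everything here is made up and unrelated to the
# kernel tree. Requires git and mktemp.
demo_bisect() (
    set -eu
    repo=$(mktemp -d)
    cd "$repo"
    git init -q .
    git config user.email "demo@example.invalid"
    git config user.name "bisect demo"
    git config commit.gpgsign false

    # 16 commits; from commit 11 onwards the "state" file reads "hang".
    i=1
    while [ "$i" -le 16 ]; do
        if [ "$i" -ge 11 ]; then echo hang > state; else echo fine > state; fi
        echo "$i" > counter          # guarantees every commit has a change
        git add state counter
        git commit -q -m "commit $i"
        i=$((i + 1))
    done

    good=$(git rev-list --max-parents=0 HEAD)   # oldest commit, known good
    git bisect start HEAD "$good" > /dev/null   # HEAD is the known bad end
    # The check command's exit status marks each step: 0 = good, 1 = bad.
    git bisect run grep -q fine state > /dev/null

    # After the run, refs/bisect/bad points at the first bad commit.
    result=$(git log -1 --format=%s refs/bisect/bad)
    cd / && rm -rf "$repo"
    echo "$result"
)
```

Calling `demo_bisect` should print the subject of the commit that introduced the change, which in this toy setup is `commit 11`. For a shutdown regression there is no such automated check, so each step has to be built, booted and marked with `git bisect good` or `git bisect bad` manually, as in the log.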
Ok... Does it also fail on 6.9?
It's best to bisect 6.6.30 to 6.6.31 to find which commit caused it.
I have bisected it to this commit in https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux.git :
commit 6c83a8f236ece78d5c2e60ae3dcfd1a64509410e (HEAD)
Author: Mario Limonciello <mario.limonciello@amd.com>
Date: Thu May 2 14:32:17 2024
dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users
[ Upstream commit cd94d1b182d2986378550c9087571991bfee01d4 ]
Limit the workaround introduced by commit 31729e8c21ec ("drm/amd/pm: fixes
a random hang in S4 for SMU v13.0.4/11") to only run in the s4 path.
Cc: Tim Huang <Tim.Huang@amd.com>
Fixes: 31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3351
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
The issue is not present on 6.9.1. I would agree it looks like an incomplete/failed backport. @tomalexander good to have confirmation
Interesting. It's supposed to be quite self-contained.
Can you guys please confirm your firmware versions from debugfs?
/sys/kernel/debug/dri/0/amdgpu_firmware_info
It's going to take me a while to confirm 6.9.1 because the release version of zfs only supports up to 6.8. I'll have to try the upstream zfs git and if that doesn't work, swap out my SSD and make a new non-zfs install.
My firmware on Arch Linux's build of 6.6.31-1-lts:
$ doas cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 35, firmware version: 0x00000027
PFP feature version: 35, firmware version: 0x00000030
CE feature version: 0, firmware version: 0x00000000
RLC feature version: 1, firmware version: 0x00000080
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
RLCP feature version: 1, firmware version: 0x0000000f
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 35, firmware version: 0x00000027
IMU feature version: 0, firmware version: 0x0b012d00
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 553648339, firmware version: 0x210000d3
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x1700003e
TA DTM feature version: 0x00000000, firmware version: 0x12000016
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 0, firmware version: 0x004c5200 (76.82.0)
SDMA0 feature version: 60, firmware version: 0x00000011
VCN feature version: 0, firmware version: 0x0711300d
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x08003a00
TOC feature version: 0, firmware version: 0x0000000b
MES_KIQ feature version: 6, firmware version: 0x00000073
MES feature version: 1, firmware version: 0x00000059
VBIOS version: 113-PHXGENERIC-001
Never mind, lucked out: upstream zfs git works for 6.9.1. I have confirmed that 6.9.1 shuts down properly.
Firmware as reported on Manjaro (running 6.6.30); 128 also exists under dri, but reports identical info:
# mount -t debugfs none /sys/kernel/debug
# cat /sys/kernel/debug/dri/1/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 35, firmware version: 0x00000027
PFP feature version: 35, firmware version: 0x00000030
CE feature version: 0, firmware version: 0x00000000
RLC feature version: 1, firmware version: 0x00000080
RLC SRLC feature version: 0, firmware version: 0x00000000
RLC SRLG feature version: 0, firmware version: 0x00000000
RLC SRLS feature version: 0, firmware version: 0x00000000
RLCP feature version: 1, firmware version: 0x0000000f
RLCV feature version: 0, firmware version: 0x00000000
MEC feature version: 35, firmware version: 0x00000027
IMU feature version: 0, firmware version: 0x0b012d00
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 553648339, firmware version: 0x210000d3
TA XGMI feature version: 0x00000000, firmware version: 0x00000000
TA RAS feature version: 0x00000000, firmware version: 0x00000000
TA HDCP feature version: 0x00000000, firmware version: 0x1700003e
TA DTM feature version: 0x00000000, firmware version: 0x12000016
TA RAP feature version: 0x00000000, firmware version: 0x00000000
TA SECUREDISPLAY feature version: 0x00000000, firmware version: 0x00000000
SMC feature version: 0, program: 0, firmware version: 0x004c5200 (76.82.0)
SDMA0 feature version: 60, firmware version: 0x00000011
VCN feature version: 0, firmware version: 0x0711300d
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x08003a00
TOC feature version: 0, firmware version: 0x0000000b
MES_KIQ feature version: 6, firmware version: 0x00000073
MES feature version: 1, firmware version: 0x00000059
VPE feature version: 0, firmware version: 0x00000000
VBIOS version: 113-PHXGENERIC-001
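When comparing dumps like the two above, the SMC line is the one most relevant to the SMU commit under discussion. The sketch below pulls the raw value out of a dump and decodes it. It assumes the dotted form in parentheses is simply the lower three bytes of the hex value, which matches the `0x004c5200 (76.82.0)` line shown here; the function names are made up for illustration.

```shell
# Hypothetical helpers, not part of any existing tool.

# Read an amdgpu_firmware_info dump on stdin and print the raw SMC
# firmware version, e.g. 0x004c5200.
smc_raw_version() {
    sed -n 's/^SMC feature version:.*firmware version: \(0x[0-9a-fA-F]*\).*/\1/p'
}

# Decode a raw value into major.minor.debug, assuming one byte per field.
decode_smc_version() {
    v=$(( $1 ))
    printf '%d.%d.%d\n' $(( (v >> 16) & 255 )) $(( (v >> 8) & 255 )) $(( v & 255 ))
}
```

On a live system this could be used as `decode_smc_version "$(smc_raw_version < /sys/kernel/debug/dri/0/amdgpu_firmware_info)"`, run as root with debugfs mounted. The card index (0, 1, 128) varies between machines, as the reports above show.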
Can you guys try this? Ideally please try it both on 6.9.y and 6.6.y:
That patch works on 6.6.31
Will update once I've tested on 6.9.1 and 6.1.91 (it is also an issue on 6.1.91 - have not tested older LTS kernels).
Update: 6.9.1 continues to work with that patch, and 6.1.91 is fixed with it.
Also tested 5.10.217 and 5.15.159, but neither of those has the issue in question.
Thanks! I've posted it to the mailing list for review:
https://lore.kernel.org/amd-gfx/20240526125908.2742-1-mario.limonciello@amd.com/T/#u
If anyone else wants to add a Reported-by or Tested-by tag, a link, or anything else, please respond to that thread.
I just wanted to report that this kernel also broke standby for me. I rarely shut down the PC but put it on standby regularly, and now that works only 2 times before it reports "Another instance of zzz is already running".
Does it bisect to the same result? If so, please try the fix linked above. If it bisects to a different commit, it is likely a different issue.
should be fixed in
thanks @superm1 and all testers!
Is this a new report?
Yes
System Info
Void 6.6.31_1 x86_64 AuthenticAMD uptodate rrFFFF
Package(s) Affected
linux6.6-6.6.31_1
Does a report exist for this bug with the project's home (upstream) and/or another distro?
No response
Expected behaviour
After runit winds down services the laptop should switch off.
Actual behaviour
Upgrading to kernel 6.6.31_1 prevents the computer from shutting down. Runit seems to stop all services correctly and the screen goes black as expected; however, the keyboard backlight and power light stay on and the fans still run.
System info:
CPU: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
GPU: Nvidia GeForce RTX 4060 (disabled and powered off)
WM: Sway
Downgrading to 6.6.30_1 or below fixes this issue and the laptop shuts down immediately as expected.
Not sure how to extract useful debug info but would appreciate any tips.
Steps to reproduce
sudo poweroff now