Closed by Rebreda 3 years ago
What hardware are you using?
Can you please post a complete sudo dmesg output after the issue has occurred? The portion you quoted doesn't show an error message, it just shows that system76_acpi is loaded (but not necessarily where the problem occurred).
BIOS Information
Vendor: coreboot
Version: 2021-03-11_50eedc2
Release Date: 03/11/2021
ROM Size: 16 MB
Characteristics:
PCI is supported
PC Card (PCMCIA) is supported
BIOS is upgradeable
Selectable boot is supported
ACPI is supported
Targeted content distribution is supported
BIOS Revision: 4.13
Firmware Revision: 0.0
System Information
Manufacturer: System76
Product Name: Lemur Pro
Version: lemp9
Serial Number: 123456789
UUID: Not Settable
Wake-up Type: Reserved
SKU Number: Not Specified
Family: Not Specified
Filtering dmesg for system76_acpi:
[ +0.000004] BTRFS info (device dm-0): disk space caching is enabled
[ +0.200732] system76_acpi: loading out-of-tree module taints kernel.
[ +0.004751] input: Intel HID events as /devices/platform/INT33D5:00/input/input14
[ +0.001200] system76_acpi: module verification failed: signature and/or required key missing - tainting kernel
[ +0.018221] input: System76 ACPI Hotkeys as /devices/LNXSYSTM:00/LNXSYBUS:00/17761776:00/input/input15
[ +0.000048] ACPI: battery: new extension: System76 Battery Extension
[ +0.063071] mc: Linux media interface: v0.10
I've included some of the logging before and after the messages for context. Thanks for the help!
@gabehab Please provide an unfiltered dmesg so we can see the kernel oops.
System76 customers can reach out to support for technical assistance. For non-System76 hardware, you can seek community support on Reddit or Mattermost.
@gabehab I'm not seeing a kernel oops in that log. Where is the kernel crash that you're seeing?
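For anyone checking a long log for the same thing, here is a rough sketch of scanning for lines that usually accompany a kernel oops. The helper name `find_oops_lines` and the `sample` list are made up for illustration, and the marker list is an assumption that is certainly not exhaustive ("cut here" also appears for warnings, not only crashes).

```python
# Hypothetical helper: flag lines in a kernel log that usually accompany an oops.
# The marker list is an assumption and is not exhaustive.
OOPS_MARKERS = (
    "kernel BUG at",
    "cut here",  # also printed for warnings, so treat a hit as a hint only
    "Oops:",
    "Call Trace:",
    "general protection fault",
)


def find_oops_lines(lines):
    """Return (line_number, text) pairs for lines containing an oops marker."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if any(marker in text for marker in OOPS_MARKERS):
            hits.append((number, text.rstrip()))
    return hits


# Illustrative input modelled on the kind of lines posted in this thread.
sample = [
    "system76_acpi: loading out-of-tree module taints kernel.",
    "ACPI: battery: new extension: System76 Battery Extension",
    "------------[ cut here ]------------",
    "kernel BUG at include/linux/pagemap.h:247!",
    "Call Trace:",
]

for number, text in find_oops_lines(sample):
    print(number, text)
```

A log that only contains the module-loading messages produces no hits, which is why a filtered excerpt is not enough to locate the crash.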
Sorry, wrong file. Lots of reboots these days.
Thank you. Here is the crash:
Aug 07 14:08:27.442818 fedora kernel: page dumped because: VM_BUG_ON_PAGE(PageTail(page))
Aug 07 14:08:27.442836 fedora kernel: ------------[ cut here ]------------
Aug 07 14:08:27.442854 fedora kernel: kernel BUG at include/linux/pagemap.h:247!
Aug 07 14:08:27.442937 fedora kernel: invalid opcode: 0000 [#1] SMP NOPTI
Aug 07 14:08:27.443044 fedora kernel: CPU: 4 PID: 50278 Comm: systemd-userwor Tainted: G S OE 5.13.6-200.fc34.x86_64 #1
Aug 07 14:08:27.443075 fedora kernel: Hardware name: System76 Lemur Pro/Lemur Pro, BIOS 2021-03-11_50eedc2 03/11/2021
Aug 07 14:08:27.443095 fedora kernel: RIP: 0010:next_uptodate_page+0x23e/0x2a0
Aug 07 14:08:27.443114 fedora kernel: Code: 01 83 f8 01 0f 87 0e ff ff ff e8 fd f8 00 00 e9 19 ff ff ff e8 73 f8 00 00 e9 0f ff ff ff 48 c7 c6 00 69 5d 9d e8 a2 72 03 00 <0f> 0b 48 8b 03 48 8b 40 08 e9 6d fe ff ff 48 c7 c6 90 ab 59 9d e8
...
Aug 07 14:08:27.443292 fedora kernel: Call Trace:
Aug 07 14:08:27.443309 fedora kernel: filemap_map_pages+0x435/0x700
Aug 07 14:08:27.443324 fedora kernel: __handle_mm_fault+0x126c/0x1570
Aug 07 14:08:27.443339 fedora kernel: handle_mm_fault+0xd5/0x2b0
Aug 07 14:08:27.443358 fedora kernel: do_user_addr_fault+0x1b7/0x670
Aug 07 14:08:27.443376 fedora kernel: exc_page_fault+0x78/0x160
Aug 07 14:08:27.443401 fedora kernel: ? asm_exc_page_fault+0x8/0x30
Aug 07 14:08:27.443421 fedora kernel: asm_exc_page_fault+0x1e/0x30
Aug 07 14:08:27.443436 fedora kernel: RIP: 0033:0x7fcd698d7000
I'm no kernel engineer, but the references to pages make this sound like a potential RAM issue. I see you're running kernel 5.13.6. Do these crashes also occur if you run Pop!_OS (with kernel 5.11) from a live disk? If so, this could be defective hardware. If not, then it could simply be a kernel bug.
Just to be clear, system76_acpi tainting the kernel is expected and not an issue in itself. If the crash was related to system76_acpi, then I would expect to see it referenced in the error messages/traces. Do the crashes still occur if you remove system76_acpi from your Fedora installation?
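The two checks described above can be sketched roughly as follows: decode the taint letters on the "Tainted:" line, and look for call-trace frames attributed to a module. This is a minimal sketch under assumptions: the flag meanings are a subset paraphrased from the kernel's tainted-kernels documentation, the function names are hypothetical, and the `[module]` suffix check is a heuristic based on how the kernel usually annotates frames that belong to a loadable module.

```python
import re

# Subset of taint flags, paraphrased from the kernel's tainted-kernels
# documentation. Treat the wording as approximate.
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "G": "no proprietary module was loaded",
    "S": "kernel running on an out-of-specification system",
    "D": "kernel died recently (oops or BUG)",
    "W": "kernel issued a warning",
    "O": "externally built (out-of-tree) module was loaded",
    "E": "unsigned module was loaded",
}


def decode_taint(line):
    """Return (flag, meaning) pairs for the flags on a 'Tainted:' line."""
    match = re.search(r"Tainted: ([A-Z ]+)", line)
    if match is None:
        return []
    flags = [char for char in match.group(1) if char != " "]
    return [(flag, TAINT_FLAGS.get(flag, "unknown flag")) for flag in flags]


def frames_from_module(lines, module):
    """Return trace lines tagged with '[module]' (heuristic).

    The 'Modules linked in:' line lists every loaded module, so it is
    skipped: appearing there does not implicate a module in the crash.
    """
    tag = f"[{module}]"
    return [
        line.strip()
        for line in lines
        if tag in line and "Modules linked in" not in line
    ]


# Lines copied from the crash posted in this thread.
crash = [
    "CPU: 4 PID: 50278 Comm: systemd-userwor Tainted: G S OE 5.13.6-200.fc34.x86_64 #1",
    "Call Trace:",
    "filemap_map_pages+0x435/0x700",
    "__handle_mm_fault+0x126c/0x1570",
    "handle_mm_fault+0xd5/0x2b0",
]

for flag, meaning in decode_taint(crash[0]):
    print(flag, "-", meaning)
print(frames_from_module(crash, "system76_acpi"))
```

On the lines above, the O and E flags match the two system76_acpi taint messages from the earlier dmesg excerpt, and no trace frame carries a `[system76_acpi]` tag. That fits the point that the taint alone does not tie the module to the crash, though it does not rule out indirect involvement either.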
So, I don't think it is defective RAM as I've never had this problem before upgrading from kernel 5.8 (as mentioned here https://github.com/pop-os/linux/issues/45). It looks like there's a more serious issue with kernel 5.11.
The reason I switched off Pop!_OS (and 5.11) was specifically to see if I could get my machine to be stable and stop crashing every couple of hours. Fedora with 5.13 seemed to be pretty solid, at least not crashing regularly (KDE was freezing a lot, so I switched to GNOME 40 and all's well on that side of things). I then installed the corresponding System76 modules to get better battery performance, etc. However, it then started crashing more regularly. As a result, I uninstalled system76_acpi yesterday and it seemingly hasn't crashed since. Only a short time has passed, but so far so good without system76_acpi.
@gabehab Do you have a support ticket open? If your system is not stable running Pop!_OS, then there is most likely a hardware issue, or else all lemp9 owners would be having the problem, which is not the case. On the only machines in our lab where we were able to recreate the 5.11 issues, the issues went away after replacing the RAM (even though the old, defective RAM appeared to pass a RAM test). Even if it's not the RAM itself, it could be the RAM slot or the motherboard. Just because some versions of the kernel don't interact with the hardware in the way that triggers the problem doesn't mean there's not a problem.
(Just emphasizing this because I wouldn't want you to get stuck with bad hardware thinking that a workaround has solved the problem, only for it to come back later.)
Got it - I'll open a support ticket and see how it plays out.
Thanks for your help @jacobgkau!
Distribution (run cat /etc/os-release): Fedora 34, GNOME 40.3.0, Wayland
Issue/Bug Description: I sometimes get crashes that look like below: Error message as follows:
Seemingly happens at random, sometimes in a burst (5 in 10 mins) or one off. Can't seem to find much else about it other than system76_acpi being called out. Would love some more info on it.