xbianonpi / xbian

XBMC on Raspberry Pi, Bleeding Edge
https://xbian.org
GNU General Public License v3.0

XBian repeatedly crashes: Kernel panic - system is deadlocked on memory. swapper/0 tainted, HW: BCM2711 #928

Open slrslr opened 1 year ago

slrslr commented 1 year ago

Linux xbian 6.1.24+ ... armv7l XBian 11.0 - Bullseye - 20230419-0 - Bleeding Edge, 2012-2023 Raspberry Pi 4

I updated the XBian software a few times in the last month (inside Kodi, System update...) and set up 2 new Kodi addons from 2 new repositories.

Every couple of days (I think that on one day it happened multiple times) I find that the internet has stopped working on the home LAN; I cannot even connect to or ping the router or XBian. Restarting the router does not help, and I find XBian deadlocked, as shown in the attached screenshot:

XBian kernel panic deadlocked on memory - internet fail (this deadlock is what happens repeatedly, as mentioned)

So I power the XBian device off and on. Then the router and the internet somehow start working again.

mkreisl commented 1 year ago

I recommend that you install the previous version of the kernel package (linux-image-bcm2836) to see if the kernel panic disappears with it.

slrslr commented 1 year ago

install the previous version of the kernel package (linux-image-bcm2836)

# apt install linux-image-bcm2836*
linux-image-bcm2836 is already the newest version (6.1.24+-1681410943)

# apt search linux-image-bcm2*

linux-image-bcm2836/stable,now 6.1.24+-1681410943 armhf [installed]
  Latest XBian kernel (rpi2/6.1.y 6.1.24+)

linux-image-bcm2837/stable 6.1.24+-1681414802 armhf
  Latest XBian kernel (rpi3/6.1.y 6.1.24+)

# dpkg --list 'linux-image-*'

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                Version            Architecture Description
+++-===================-==================-============-========================================
un  linux-image-armmp   <none>             <none>       (no description available)
ii  linux-image-bcm2836 6.1.24+-1681410943 armhf        Latest XBian kernel (rpi2/6.1.y 6.1.24+)
un  linux-image-bcm2837 <none>             <none>       (no description available)

Which command should I run to install the previous version, please, and what should I do if the issue does (or does not) appear again?

mkreisl commented 1 year ago

Why am I not surprised? I find it sad that nowadays fewer and fewer are able or want to be able to solve problems themselves. There are certainly countless examples on the net of how to do this.

Here again in short form:

apt-cache policy linux-image-bcm2836 gives you a list of all available versions of the package

sudo apt-get install linux-image-bcm2836=<theversionyouwant> installs the desired version

sudo apt-mark hold linux-image-bcm2836 protects the package from being overwritten
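The three steps above can be strung together roughly as follows. This is only a sketch: pkg_spec and downgrade_kernel are hypothetical helper names, and the version string must be taken from the apt-cache policy output on your own system.

```shell
#!/bin/sh
# Sketch: pin the kernel package to an older version.
# PKG is the package named in this thread; the helper names are made up.
PKG="linux-image-bcm2836"

# Build the "package=version" argument that apt-get expects.
pkg_spec() {
    printf '%s=%s\n' "$1" "$2"
}

# Usage: downgrade_kernel <version copied from apt-cache policy>
downgrade_kernel() {
    version="$1"
    apt-cache policy "$PKG"                               # list available versions
    sudo apt-get install "$(pkg_spec "$PKG" "$version")"  # install the chosen one
    sudo apt-mark hold "$PKG"                             # keep upgrades from replacing it
}
```

To let the package follow updates again later, sudo apt-mark unhold linux-image-bcm2836 releases the hold.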

mkreisl commented 1 year ago

I assume the cause is the same problem as in this issue: https://github.com/raspberrypi/linux/issues/5395

Kernel 6.1 has a new feature called MGLRU. It seems to be very buggy but is unfortunately enabled by default. I had not been able to detect this until today, maybe because I am running the 64-bit kernel on my development Pi 4.

But today I was working with an installation that had the 32-bit 6.1.24 kernel installed, and I got these kernel panics all the time.

To disable MGLRU, a simple command sudo echo 0 > /sys/kernel/mm/lru_gen/enabled should disable this feature on a running system
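One caveat when running that command from a non-root account: the > redirection is performed by the calling shell, not by sudo, so the write only succeeds from a root shell. A minimal sketch that works either way is below; set_mglru is a hypothetical helper name, and the MGLRU_FILE override exists only so the function can be tried against an ordinary file.

```shell
#!/bin/sh
# Sketch: write the MGLRU switch, using sudo only when needed.
# The sysfs path is the one from this thread; set_mglru is a made-up name.
MGLRU_FILE="${MGLRU_FILE:-/sys/kernel/mm/lru_gen/enabled}"

set_mglru() {
    value="$1"
    if [ -w "$MGLRU_FILE" ]; then
        # Already root (or the file is writable): plain redirection is fine.
        printf '%s\n' "$value" > "$MGLRU_FILE"
    else
        # tee runs under sudo, so the write itself has root privileges.
        printf '%s\n' "$value" | sudo tee "$MGLRU_FILE" > /dev/null
    fi
}
```

Calling set_mglru 0 and then cat /sys/kernel/mm/lru_gen/enabled should show whether the change took effect. The setting applies to the running system only.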

yuzhaogoogle commented 1 year ago

I assume the cause is the same problem as in this issue: raspberrypi/linux#5395

Kernel 6.1 has a new feature called MGLRU. It seems to be very buggy but is unfortunately enabled by default. I had not been able to detect this until today, maybe because I am running the 64-bit kernel on my development Pi 4.

There is only one known problem: MGLRU caused OOM kills on Pi 4 when running 32-bit kernels, because Pi 4 has to use CONFIG_VMSPLIT_3G. This was fixed by disabling MGLRU by default on Pi 4 32-bit kernels. The recommendation is to switch to 64-bit kernels on Pi 4, which have MGLRU on by default.

See https://github.com/raspberrypi/linux/issues/5395#issuecomment-1512247475.

slrslr commented 12 months ago

To disable MGLRU, a simple command sudo echo 0 > /sys/kernel/mm/lru_gen/enabled should disable this feature on a running system

I tried this and probably rebooted afterwards. It then crashed again with the same or a similar kernel panic, and during the crash the router stopped responding to most of the LAN computers. After booting, I found that the lru_gen option had been reset to:

cat /sys/kernel/mm/lru_gen/enabled
0x0001

I could not find out how to make "0x0000" persist across reboots. Can you suggest a command for this, please?

yuzhaogoogle commented 12 months ago

To disable MGLRU, a simple command sudo echo 0 > /sys/kernel/mm/lru_gen/enabled should disable this feature on a running system

I tried this and probably rebooted afterwards. It then crashed again with the same or a similar kernel panic, and during the crash the router stopped responding to most of the LAN computers. After booting, I found that the lru_gen option had been reset to:

Your crash doesn't seem to be related to this issue.

cat /sys/kernel/mm/lru_gen/enabled
0x0001

I could not find out how to make "0x0000" persist across reboots. Can you suggest a command for this, please?

The latest 32-bit kernel disabled MGLRU by default. If you are using the 32-bit kernel, please make sure it's the latest. And please also make sure you don't have modifications to the initscript or services that enable MGLRU, e.g., grep -r lru_gen /etc/.
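The check suggested above can be wrapped in a small helper that lists only the files mentioning lru_gen. This is a sketch; find_mglru_overrides is a hypothetical name, and scanning /etc alone is an assumption taken from the grep example.

```shell
#!/bin/sh
# Sketch: list files under a directory (default /etc) that mention lru_gen.
# An empty result suggests nothing there re-enables MGLRU at boot.
find_mglru_overrides() {
    dir="${1:-/etc}"
    # -r recurses, -l prints only the names of matching files.
    grep -rl 'lru_gen' "$dir" 2>/dev/null
}
```

Running find_mglru_overrides with no argument searches /etc; any path it prints is worth opening to see what it writes to the enabled file.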

slrslr commented 11 months ago

The latest 32-bit kernel disabled MGLRU by default.

# uname -r
6.1.24+
# uname -m
armv7l

People there say this means I am on a 32-bit kernel, where you say that MGLRU should be disabled in the latest kernel. (I am unsure whether I am on the latest version, and if/how @mkreisl suggests to upgrade or proceed.)

# apt search linux-|grep -i installed

binutils-arm-linux-gnueabihf/oldstable,now 2.35.2-2 armhf [installed,automatic]
libpam0g/oldstable,now 1.4.0-9+deb11u1 armhf [installed]
libselinux1/oldstable,now 3.1-3 armhf [installed]
linux-base/oldstable,now 4.6 all [installed,automatic]
linux-libc-dev/now 6.1.24-1681410943 armhf [installed,local] <-------
parted/oldstable,now 3.4-1 armhf [installed,automatic]

# grep -r lru_gen /etc/
..empty result..

I do not know about the initscript (I do not know the right command), but I tried "grep -Ria lru /etc/" and found nothing that looked like a configuration option or similar.

What do you suggest?
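For reference, the machine name printed by uname -m can be mapped to a kernel word size along these lines. This is a sketch that covers only the names relevant to Raspberry Pi; kernel_bits is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: map a `uname -m` machine name to the kernel's word size.
# Only the ARM names relevant to this thread are handled.
kernel_bits() {
    case "$1" in
        armv6l|armv7l) echo 32 ;;       # 32-bit ARM kernels
        aarch64)       echo 64 ;;       # 64-bit ARM kernel
        *)             echo unknown ;;
    esac
}
```

With the output shown above, kernel_bits "$(uname -m)" would print 32, which matches the 32-bit case discussed in this issue.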

mkreisl commented 11 months ago

@slrslr You are not on the latest kernel

slrslr commented 11 months ago

After "apt update;apt upgrade", I may be on the latest:

Linux xbian 6.1.28+
linux-image-bcm2836/stable,now 6.1.28+-1684360121 armhf [installed]

This question of mine is unanswered:

I could not find out how to make "0x0000" persist across reboots. Can you suggest a command for this, please?

Yet I see that after a reboot under this newer kernel, it is already disabled:

cat /sys/kernel/mm/lru_gen/enabled
0x0000

I will keep you updated on the next crash if you have no further instructions on what to do now.
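The two values seen in this thread, 0x0001 and 0x0000, can be interpreted with a small helper like the one below. This is a sketch; mglru_state is a hypothetical name, and treating every nonzero mask as "enabled" is a simplifying assumption.

```shell
#!/bin/sh
# Sketch: interpret the value read from /sys/kernel/mm/lru_gen/enabled.
# Assumption: 0x0000 means off, any other hex mask means at least partly on.
mglru_state() {
    case "$1" in
        0x0000)          echo disabled ;;
        0x[0-9a-fA-F]*)  echo enabled ;;
        *)               echo unknown ;;
    esac
}
```

For example, mglru_state "$(cat /sys/kernel/mm/lru_gen/enabled)" can be run after each reboot or kernel update to confirm the setting has not changed.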

mkreisl commented 11 months ago

Yes, of course. This was fixed promptly after your issue and no longer occurs with 6.1.28. The kernel update is correct; I was wrong. A newer kernel is only available for Debian Bookworm.