Open gavinbeatty opened 1 year ago
If someone manually switches to the 32-bit kernel on a Pi 4 (for compatibility with code or drivers that handle a mixed userland/kernel badly), then it seems there are two options (that probably avoid OOM): CONFIG_VMSPLIT_3G with no MGLRU, or CONFIG_VMSPLIT_2G with MGLRU.
Sounds good. Thanks.
This sounds like further evidence that the OOM issue here doesn't affect Pi0-3
Correct.
so I'm still keen to enable MGLRU again on those platforms (in rpi-update and we'll keep an eye out for issues reported).
Please tag me in case there are other issues I could help with.
Also while on this topic: if anyone prefers zswap over zram on 32-bit kernels, please use zsmalloc as its backend (zswap supports multiple backends, while zram is coupled with zsmalloc). Otherwise it's possible to run into the same problem, because only zsmalloc can allocate from the highmem zone. IOW, other zswap backends, e.g. zbud, can only allocate from lower zones and cause unnecessary OOM kills when they deplete the lower zones.
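A minimal sketch of selecting zsmalloc as the zswap allocator, either at boot or at runtime, using the standard zswap module parameters (verify the paths on your own kernel):

```shell
# At boot, append to the kernel command line:
#   zswap.enabled=1 zswap.zpool=zsmalloc
# Or switch the allocator at runtime (needs root):
echo zsmalloc > /sys/module/zswap/parameters/zpool
# Verify which allocator is active:
cat /sys/module/zswap/parameters/zpool
```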
Any more datapoints would be useful. Please state model, memory, 32- or 64-bit, OOM yes or no.
[karl@schwarzschild] ~% uname -a
Linux schwarzschild 6.1.24-2-rpi-ARCH #1 SMP Fri Apr 21 07:31:33 MDT 2023 armv7l GNU/Linux
[karl@schwarzschild] ~% free -h
total used free shared buff/cache available
Mem: 3.7Gi 270Mi 3.1Gi 1.0Mi 401Mi 3.4Gi
Swap: 0B 0B 0B
[karl@schwarzschild] ~% grep Model /proc/cpuinfo
Model : Raspberry Pi 4 Model B Rev 1.1
[karl@schwarzschild] ~% sudo vcgencmd version
Mar 21 2023 17:18:16
Copyright (c) 2012 Broadcom
version 3cc1c2dfc5460da9e1a0a4f48b48ab508c48bfe5 (clean) (release) (start)
[karl@schwarzschild] ~% sudo rpi-eeprom-update
BOOTLOADER: up to date
CURRENT: Wed 11 Jan 17:40:52 UTC 2023 (1673458852)
LATEST: Wed 11 Jan 17:40:52 UTC 2023 (1673458852)
RELEASE: critical (/lib/firmware/raspberrypi/bootloader/critical)
VL805_FW: Dedicated VL805 EEPROM
VL805: up to date
CURRENT: 000138c0
LATEST: 000138c0
Getting very aggressive OOM-kills. Even just adding a torrent to transmission can bring down the entire system: the transmission daemon restarting after being killed uses a fair amount of memory, triggering more OOM-kills including of itself, and so on. Never seen memory usage >20% with top.
Yes, this one has the same root cause (Pi 4 + 32-bit kernel + MGLRU).
As of now MGLRU is disabled on 32-bit kernel for Pi 4 (but not other models since they don't have this problem).
They seem to be running Arch ARM, which should already have MGLRU disabled on armv7l since 6.1.21-2. @graysky2

I haven't rebooted into any new kernel updates since your fix worked for me, but maybe the MGLRU default=disable was lost or became ineffective in 6.1.24-2
For those using Arch ARM, please update and report back. linux-rpi-6.1.21-2 for armv7h has MGLRU compiled in but disabled by default.
@rsekman Try the procedure in https://github.com/raspberrypi/linux/issues/5395#issuecomment-1479948663
[karl@schwarzschild] ~% cat /sys/kernel/mm/lru_gen/enabled
0x0000
The 6.1.21-2 update made my system more stable. Previously it was entering OOM-kill loops just a few minutes after booting. They could also be triggered by very lightweight programs. But I am still getting OOM-kills during peak loads (which aren't that high).
@rsekman it's not completely clear what you are reporting. I think you are saying that with a previous kernel MGLRU was enabled and OOM was very aggressive, and with the current kernel OOM is less common but still occurs.
If you think this is a kernel issue, then are you able to revert to an older kernel that has no OOMs when running a similar load?
Indeed. Before kernel version 6, never any OOM problems. After kernel version 6 but before 6.1.21-2 (MGLRU enabled), exceedingly aggressive OOM killing, to where the system struggled to stay stable for minutes after booting. After 6.1.21-2 (MGLRU disabled), the OOM killer is less aggressive than before but still occasionally kneecaps or brings the entire system down during load spikes (which may be no more stressing than adding a torrent to transmission). My load profile has stayed unchanged throughout this entire history. Since the OOM kills happen way before actually running out of memory, this seems consistent with this issue and no other explanation. Someone else above was also reporting better but still not fixed behaviour since disabling MGLRU.
Are you able to say what the CMA settings were on the 5.15 and 6.1 kernels? The default in both cases is 64MB, but using the vc4-kms-v3d, vc4-fkms-v3d or cma overlays changes that.
Some more data points, using the 32 bit Arch Linux ARM kernel with a 32 bit userspace. It is the same kernel build across both Pi 3 and Pi 4, built with CONFIG_VMSPLIT_3G=y.
Pi 4 Model B Rev 1.2, 4 GB RAM:
- 5.15.84-1-rpi-ARCH was up 52 days with no OOM issues
- 6.1.21-2 with MGLRU enabled had almost immediate OOM trouble
- 6.1.27-1-rpi-ARCH with MGLRU disabled has been stable
- dtoverlay=vc4-kms-v3d and I see CmaTotal: 524288 kB in /proc/meminfo.

Pi 3 Model B Rev 1.2, 1 GB RAM:
- 6.1.16-3-rpi-ARCH with MGLRU enabled has been fully stable for 52 days
- dtoverlay=vc4-kms-v3d and I see CmaTotal: 262144 kB in /proc/meminfo.

Pi 3 Model B Rev 1.2, 1 GB RAM (different one):
- 6.1.19-2-rpi-ARCH with MGLRU enabled has been fully stable for 48 days
- dtoverlay=vc4-kms-v3d and I see CmaTotal: 262144 kB in /proc/meminfo.

Is everyone that is affected by this on a 4 GB or greater RPi 4?
Just bumped into it on a 2GB one. Was zeroing out my 10T USB HDD with dd and it happened after having written 4~5T, I think. Interestingly, when I tried to re-run the dd command the OOM happened right away, even when free -h looked fine. (I re-tried twice, and the second time I lost ssh.)
Heading to reboot and disable MGLRU and see if I can complete the job.
Are you able to say what the CMA settings were on the 5.15 and 6.1 kernels?
They would have been whatever they are in the 32 bit Arch Linux ARM kernel. I haven't compiled my own kernel or changed anything about it. I could grab the packages and check if you let me know where to look (in the packages, not for them).
Is everyone that is affected by this on a 4 GB or greater RPi 4?

Just bumped into it on a 2GB one. Was zeroing out my 10T USB HDD with dd and it happened after having written 4~5T, I think. Interestingly when I tried to re-run the dd command the OOM happened right away, even when free -h looked fine. (I re-tried twice, and the second time I lost ssh.) Heading to reboot and disable MGLRU and see if I can complete the job.
The latest 32-bit kernel for Pi 4 should already have MGLRU disabled by default. Are you using an old Pi 4 32-bit kernel or some other kernel?
Yeah, I am on 6.1.19-2-rpi-ARCH, since the latest Arch ARM linux-rpi has some build script problem and I decided to revert and stick to this older cached version I have for now, which has led me to confirm that it happens on a 2G model as well, heh. I am on the armv7 build btw.
alarmpi /boot # uname -a
Linux alarmpi 6.1.27-1-rpi-ARCH #1 SMP Tue May 2 18:51:06 MDT 2023 armv7l GNU/Linux
alarmpi /boot # free -h
total used free shared buff/cache available
Mem: 7.7Gi 334Mi 6.6Gi 0.0Ki 851Mi 7.3Gi
Swap: 0B 0B 0B
alarmpi /boot # cat /sys/kernel/mm/lru_gen/enabled
0x0000
alarmpi /boot # zgrep -i lru /proc/config.gz
CONFIG_LRU_GEN=y
# CONFIG_LRU_GEN_ENABLED is not set
# CONFIG_LRU_GEN_STATS is not set
CONFIG_LRU_CACHE=m
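For anyone checking their own system: /sys/kernel/mm/lru_gen/enabled is a bitmask of MGLRU features, so 0x0000 as shown above means MGLRU is fully disabled, and it can be toggled at runtime without a reboot. A sketch using the standard MGLRU sysfs interface:

```shell
# 0x0000 = MGLRU disabled; non-zero = bitmask of enabled features
cat /sys/kernel/mm/lru_gen/enabled
# Toggle at runtime (needs root): 'y' enables, 'n' disables
echo n > /sys/kernel/mm/lru_gen/enabled
```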
My 5 cents with an 8G Pi:
I quite often have OOMs during I/O (downloading stuff, converting video, watching over DLNA).
My 5 cents with 8G pi
Have quite often OOMs during io (downloading stuff, converting video, watching over DLNA)
This can happen with or without MGLRU (MGLRU only makes it worse).
The only real solution is to switch to the 64-bit kernel.
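On Raspberry Pi OS, switching to the 64-bit kernel (while keeping the existing 32-bit userspace) is a one-line change to the documented arm_64bit option in config.txt; Arch ARM instead ships separate aarch64 images, so there you would reinstall. A sketch, assuming a Pi model with a 64-bit SoC:

```shell
# /boot/config.txt (or /boot/firmware/config.txt on newer images)
arm_64bit=1
# After a reboot, confirm the running kernel architecture:
#   uname -m
```

Note that this gives the mixed userland/kernel setup mentioned at the top of the thread, which some code or drivers reportedly handle badly.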
@popcornmix - on Arch ARM armv7h this is still happening but only in RPi4B with 8G. The very same uSD card in RPi4B with 2G runs just fine. The kernel is the latest in your rpi-6.1.y branch. Again, only 8G running armv7h.
@graysky2 I can take a close look if you have the kernel log from the OOM killer. Thanks.
Here is the dmesg output when the OOM panic is triggered:
[ +0.002367] CPU: 0 PID: 34 Comm: kworker/0:1 Tainted: G C 6.1.66-2-rpi-ARCH #1
[ +0.002349] Hardware name: BCM2711
[ +0.001976] Workqueue: events request_firmware_work_func
[ +0.001780] unwind_backtrace from show_stack+0x18/0x1c
[ +0.001831] show_stack from dump_stack_lvl+0x90/0xac
[ +0.001839] dump_stack_lvl from warn_alloc+0x110/0x19c
[ +0.001805] warn_alloc from __alloc_pages+0xe9c/0xfe8
[ +0.001686] __alloc_pages from __kmalloc_large_node+0x70/0x14c
[ +0.002096] __kmalloc_large_node from kmalloc_large+0x24/0xcc
[ +0.001741] kmalloc_large from brcmf_fws_attach+0x38/0x39c [brcmfmac]
[ +0.001751] brcmf_fws_attach [brcmfmac] from brcmf_proto_bcdc_init_done+0x18/0x30 [brcmfmac]
[ +0.001722] brcmf_proto_bcdc_init_done [brcmfmac] from brcmf_attach+0x15c/0x4e8 [brcmfmac]
[ +0.001923] brcmf_attach [brcmfmac] from brcmf_sdio_firmware_callback+0x84c/0x974 [brcmfmac]
[ +0.002157] brcmf_sdio_firmware_callback [brcmfmac] from brcmf_fw_request_done+0x160/0x18c [brcmfmac]
[ +0.002742] brcmf_fw_request_done [brcmfmac] from request_firmware_work_func+0x58/0x9c
[ +0.002201] request_firmware_work_func from process_one_work+0x21c/0x4d0
[ +0.001950] process_one_work from worker_thread+0x58/0x560
[ +0.001826] worker_thread from kthread+0xd8/0xf4
[ +0.001835] kthread from ret_from_fork+0x14/0x30
[ +0.001974] Exception stack(0xf08c5fb0 to 0xf08c5ff8)
[ +0.002038] 5fa0: 00000000 00000000 00000000 00000000
[ +0.001936] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ +0.001972] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ +0.001907] Mem-Info:
[ +0.002220] active_anon:59 inactive_anon:4414 isolated_anon:0
active_file:2336 inactive_file:3449 isolated_file:0
unevictable:0 dirty:62 writeback:0
slab_reclaimable:2081 slab_unreclaimable:4542
mapped:4063 shmem:125 pagetables:592
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:1999138 free_pcp:277 free_cma:129541
[ +0.015478] Node 0 active_anon:236kB inactive_anon:17828kB active_file:9344kB inactive_file:13796kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:16252kB dirty:248kB writeback:0kB shmem:500kB writeback_tmp:0kB kernel_stack:1276kB pagetables:2540kB sec_pagetables:0kB all_unreclaimable? yes
[ +0.004466] DMA free:588112kB boost:0kB min:65536kB low:81920kB high:98304kB reserved_highatomic:4096KB active_anon:0kB inactive_anon:0kB active_file:472kB inactive_file:148kB unevictable:0kB writepending:0kB present:786432kB managed:626972kB mlocked:0kB bounce:0kB free_pcp:1052kB local_pcp:0kB free_cma:518164kB
[ +0.004757] lowmem_reserve[]: 0 0 7284 7284
[ +0.002203] DMA: 184*4kB (UMHC) 149*8kB (UMEHC) 67*16kB (UMEHC) 30*32kB (UMEHC) 18*64kB (UMEC) 10*128kB (UMEC) 7*256kB (UME) 8*512kB (MEC) 6*1024kB (MC) 2*2048kB (ME) 138*4096kB (MC) = 587768kB
[ +0.002268] 5943 total pagecache pages
[ +0.002379] 0 pages in swap cache
[ +0.002765] Free swap = 0kB
[ +0.002277] Total swap = 0kB
[ +0.002319] 2061312 pages RAM
[ +0.001997] 1864704 pages HighMem/MovableOnly
[ +0.001743] 39865 pages reserved
[ +0.001885] 131072 pages cma reserved
[ +0.001789] ieee80211 phy0: brcmf_bus_started: failed: -12
[ +0.001885] ieee80211 phy0: brcmf_attach: dongle is not responding: err=-12
[ +0.054322] FAT-fs (mmcblk0p6): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[ +0.015741] brcmfmac: brcmf_sdio_firmware_callback: brcmf_attach failed
[ +0.042698] bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay)
[ +0.005783] bcmgenet fd580000.ethernet end0: Link is Down
[ +0.009034] systemd invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
This one seems to be a legitimate OOM kill -- there was only about 600KB reclaimable memory left (active_file:472kB + inactive_file:148kB) from the only zone (DMA) where the allocation (GFP_KERNEL) can be from, and the 600KB reclaimable memory probably was all hot.
Do OOM kills happen often? If so, can you still reproduce them after echo 0 >/sys/kernel/mm/lru_gen/enabled? Thanks.
@yuzhaogoogle - Yes, OOM happens very frequently with armv7h (but not at all with aarch64). At the time this happened, the system was idle with really nothing running beyond sshd, ufw, rngd, and systemd. To trigger this, I was using the package manager to simply download new packages. When I glanced at htop, <100M of memory was used at the time of the OOM.
Regarding lru_gen, it is compiled into the kernel but disabled by default on armv7h: https://github.com/archlinuxarm/PKGBUILDs/blob/master/core/linux-rpi/PKGBUILD#L75
I had very similar OOM behavior with root on f2fs. Are you using that any chance?
No, it is ext4.
<100M of memory was used at the time of the OOM.
You only have about 100M memory that the kernel can use:
[ +0.002319] 2061312 pages RAM
[ +0.001997] 1864704 pages HighMem/MovableOnly
[ +0.001743] 39865 pages reserved
[ +0.001885] 131072 pages cma reserved
(2061312-1864704-39865-131072)*4KB=~100MB
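The arithmetic can be reproduced directly from the page counts in the quoted log (4 KB per page):

```shell
pages_ram=2061312       # total pages of RAM
pages_highmem=1864704   # HighMem/MovableOnly, unusable for kernel allocations
pages_reserved=39865    # reserved pages
pages_cma=131072        # CMA reserved pages
lowmem_kb=$(( (pages_ram - pages_highmem - pages_reserved - pages_cma) * 4 ))
echo "${lowmem_kb} kB"  # 102684 kB, i.e. roughly 100 MB of kernel-usable lowmem
```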
This is a limitation of the 32-bit kernel -- it can't map HighMem, where the majority of DRAM sits. So I'd recommend switching to the 64-bit kernel.
This is a limitation of the 32-bit kernel -- it can't map HighMem, where the majority of DRAM sits. So I'd recommend switching to the 64-bit kernel.
I cannot follow this argument. The Raspberry Pi has been running on 32-bit kernel for years without this issue. There is a simple rule for this: no regressions.
Edit: in case a new feature doesn't work for a specific architecture it shouldn't be selectable.
There is a simple rule for this: no regressions.
Edit: in case a new feature doesn't work for a specific architecture it shouldn't be selectable.
This is a wonderfully idealistic view of the world, where test coverage is perfect and all users have the same requirements and experience.
Linus has made his views of LPAE very clear, so are you suggesting that it should not be possible to build a 32-bit kernel for the Pi 4?
There is a simple rule for this: no regressions. Edit: in case a new feature doesn't work for a specific architecture it shouldn't be selectable.
This is a wonderfully idealistic view of the world, where test coverage is perfect and all users have the same requirements and experience.
This rule applies after a regression has been discovered. I didn't want to say every developer needs to test every possible setup.
Linus has made his views of LPAE very clear, so are you suggesting that it should not be possible to build a 32-bit kernel for the Pi 4?
Sorry, I don't know which statement of Linus Torvalds you are referring to, but it's a fact that armv7 LPAE has a dedicated defconfig. Please try with a recent kernel version:
make help
...
Architecture-specific targets (arm):
* zImage - Compressed kernel image (arch/arm/boot/zImage)
Image - Uncompressed kernel image (arch/arm/boot/Image)
* xipImage - XIP kernel image, if configured (arch/arm/boot/xipImage)
uImage - U-Boot wrapped zImage
bootpImage - Combined zImage and initial RAM disk
(supply initrd image via make variable INITRD=<path>)
install - Install uncompressed kernel
zinstall - Install compressed kernel
uinstall - Install U-Boot wrapped compressed kernel
Install using (your) ~/bin/installkernel or
(distribution) /sbin/installkernel or
install to $(INSTALL_PATH) and run lilo
multi_v7_lpae_defconfig - multi_v7_defconfig with CONFIG_ARM_LPAE enabled
In case CONFIG_LRU_GEN_ENABLED absolutely doesn't work with CONFIG_ARM_LPAE, then LRU_GEN_ENABLED shouldn't be selectable. But this isn't the right place to discuss.
Sorry, I don't know which statement of Linus Torvalds you are referring to
Likely this: https://cl4ssic4l.wordpress.com/2011/05/24/linus-torvalds-about-pae/
I can confirm this 32-bit kernel has zero unjustified OOM kills: linux-rpi 5.10.83 armv7. Why?
As a test, I built the armv7h kernel with LRU_GEN disabled and I still get the OOM on an RPi4B 8G. The same image runs fine on an RPi4B 2G.
% zgrep -i lru /proc/config.gz
# CONFIG_LRU_GEN is not set
CONFIG_LRU_CACHE=m
@graysky2 Could you please provide a complete config which shows the OOM?
This is a limitation of the 32-bit kernel -- it can't map HighMem, where the majority of DRAM sits. So I'd recommend switching to the 64-bit kernel.
I cannot follow this argument.
I wasn't presenting any argument. I stated a fact and made a recommendation based on that fact.
The Raspberry Pi has been running on 32-bit kernel for years without this issue.
Are you referring to 32-bit kernel on 32-bit or 64-bit Pi? I don't know about 32-bit kernel running on 64-bit Pi "for years".
There is a simple rule for this: no regressions.
Edit: in case a new feature doesn't work for a specific architecture it shouldn't be selectable.
Are you referring to any specific new feature? If you are referring to MGLRU, then as the reporter mentioned, it wasn't enabled in the first place.
The Raspberry Pi has been running on 32-bit kernel for years without this issue.
Are you referring to 32-bit kernel on 32-bit or 64-bit Pi? I don't know about 32-bit kernel running on 64-bit Pi "for years".
I don't know which Raspberry Pi you consider as 64-bit Pi. The BCM SoCs 2835, 2836, 2837 and 2711 are available in Mainline for both arm and arm64. And at least the Raspberry Pi 4 is available for 32-bit since 2019. Nobody complained during this process.
There is a simple rule for this: no regressions.
1. There is no "simple rule": this is a collective decision (h/w vendors, s/w developers, end users, etc) based on the ecosystem encompassing many other factors other than individual interest. (A good LWN article [here](https://lwn.net/Articles/838807/)).
Thanks for providing the link. But I think we are talking about different things. I was referring to this regression, and you were referring to the deprecation of arm platforms.
I can confirm this 32bit kernel has zero unjustified OOM kills: linux-rpi 5.10.83 armv7

Why? What was different back then that changed at some point later?
@josepmaria79 when did you start noticing OOM kills on the 32-bit 6.1 kernel? I was under the impression that premature OOM kills stopped once MGLRU was disabled on the 32-bit 6.1 kernel.
@graysky2 could you please try enabling MGLRU and see if you can still reproduce the OOM kills?
Some people say it still happens with that disabled. In my case it started when I changed from 4GB to 8GB.
@graysky2
Yes: https://github.com/archlinuxarm/PKGBUILDs/blob/master/core/linux-rpi/config
I tested this configuration with mainline Linux 6.7-rc5 and 6.1.67 on a Raspberry Pi 4 with 8 GB RAM. The rootfs was Raspberry Pi OS 32-bit (buster). I was able to boot and check out the Linux kernel via git. No OOM occurred.
Is there any specific scenario to trigger these OOMs without using ARCH-specific stuff?
@lategoodbye - just connecting over ssh triggers the panic. Other times booting itself will.
@josepmaria79 - it is built but not enabled, https://github.com/archlinuxarm/PKGBUILDs/blob/master/core/linux-rpi/config#L854
If I add /etc/tmpfiles.d/ok.conf like below and boot, the OOM is triggered:
w- /sys/kernel/mm/lru_gen/enabled - - - - 1
Is there a parameter I can add to /boot/cmdline.txt to have it enabled?
I built the latest from rpi-6.6.y (armv7h) and am still getting the OOM, with lru_gen enabled or disabled (compiled in, just not enabled by default).
I will say that if I leave it disabled, the OOM messages are displayed on screen but I can still ssh into the box. If it is enabled via that tmpfile shown above, the box is not reachable. If I try using programs, like installing a package, or do anything memory intensive, the OOM killer kills the process.
[ +0.000067] CPU: 3 PID: 351 Comm: sshd Tainted: G C 6.6.6-2-rpi-ARCH #1
[ +0.000034] Hardware name: BCM2711
[ +0.000025] unwind_backtrace from show_stack+0x18/0x1c
[ +0.000043] show_stack from dump_stack_lvl+0x90/0xac
[ +0.000037] dump_stack_lvl from dump_header+0x54/0x1fc
[ +0.000037] dump_header from oom_kill_process+0x23c/0x248
[ +0.000040] oom_kill_process from out_of_memory+0x100/0x344
[ +0.000039] out_of_memory from __alloc_pages+0xa30/0xf28
[ +0.000038] __alloc_pages from __pmd_alloc+0x44/0x224
[ +0.000037] __pmd_alloc from pgd_alloc+0x254/0x2a0
[ +0.000034] pgd_alloc from mm_init+0xf0/0x270
[ +0.000025] mm_init from copy_process+0xed4/0x1dfc
[ +0.000026] copy_process from kernel_clone+0xac/0x3a8
[ +0.000027] kernel_clone from sys_clone+0x78/0x9c
[ +0.000026] sys_clone from ret_fast_syscall+0x0/0x1c
[ +0.000026] Exception stack(0xf0cc5fa8 to 0xf0cc5ff0)
[ +0.000025] 5fa0: b6f1e308 00000001 01200011 00000000 00000000 00000000
[ +0.000032] 5fc0: b6f1e308 00000001 b699ae58 00000078 00000001 0062b008 beaa04d0 01219128
[ +0.000029] 5fe0: b6f1e820 beaa0338 b68cd260 b68cd684
[ +0.000023] Mem-Info:
[ +0.000016] active_anon:85 inactive_anon:5125 isolated_anon:0
active_file:2773 inactive_file:5095 isolated_file:0
unevictable:0 dirty:93 writeback:0
slab_reclaimable:813 slab_unreclaimable:5685
mapped:5897 shmem:195 pagetables:551
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:1992102 free_pcp:0 free_cma:125082
[ +0.000117] Node 0 active_anon:340kB inactive_anon:20500kB active_file:11092kB inactive_file:20380kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:23588kB dirty:372kB writeback:0kB shmem:780kB writeback_tmp:0kB kernel_stack:1480kB pagetables:2204kB sec_pagetables:0kB all_unreclaimable? no
[ +0.000085] DMA free:571032kB boost:0kB min:65536kB low:81920kB high:98304kB reserved_highatomic:8192KB active_anon:0kB inactive_anon:0kB active_file:1056kB inactive_file:140kB unevictable:0kB writepending:48kB present:786432kB managed:626936kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:500328kB
[ +0.000088] lowmem_reserve[]: 0 0 7284 7284
[ +0.000040] DMA: 146*4kB (UMEHC) 121*8kB (UMEHC) 67*16kB (UMEHC) 12*32kB (UMEHC) 7*64kB (UMHC) 9*128kB (UMEC) 8*256kB (MEC) 10*512kB (UMEHC) 10*1024kB (UMEHC) 6*2048kB (UMEHC) 131*4096kB (MC) = 570880kB
[ +0.000151] 8053 total pagecache pages
[ +0.000019] 0 pages in swap cache
[ +0.000018] Free swap = 0kB
[ +0.000016] Total swap = 0kB
[ +0.000016] 2061312 pages RAM
[ +0.000017] 1864704 pages HighMem/MovableOnly
[ +0.000020] 39874 pages reserved
[ +0.000018] 131072 pages cma reserved
[ +0.000018] Tasks state (memory values in pages):
[ +0.000021] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ +0.000048] [ 151] 0 151 7744 1689 61440 0 -250 systemd-journal
[ +0.000039] [ 152] 0 152 7293 1792 57344 0 -1000 systemd-udevd
[ +0.000035] [ 176] 0 176 7230 1449 69632 0 0 (udev-worker)
[ +0.000035] [ 177] 0 177 7195 1387 69632 0 0 (udev-worker)
[ +0.000036] [ 178] 0 178 8374 1577 77824 0 0 (udev-worker)
[ +0.000034] [ 179] 981 179 3951 1952 53248 0 0 systemd-network
[ +0.000036] [ 180] 0 180 7195 1296 65536 0 0 (udev-worker)
[ +0.000034] [ 182] 0 182 7195 1287 65536 0 0 (udev-worker)
[ +0.000034] [ 183] 0 183 7195 1257 61440 0 0 (udev-worker)
[ +0.000034] [ 184] 0 184 7212 1635 69632 0 0 (udev-worker)
[ +0.000034] [ 186] 0 186 7207 1313 69632 0 0 (udev-worker)
[ +0.000034] [ 187] 0 187 7229 1435 69632 0 0 (udev-worker)
[ +0.000035] [ 188] 0 188 7221 1672 69632 0 0 (udev-worker)
[ +0.000034] [ 189] 0 189 7195 1276 61440 0 0 (udev-worker)
[ +0.000034] [ 191] 0 191 7202 1389 69632 0 0 (udev-worker)
[ +0.000034] [ 193] 0 193 8425 1796 81920 0 0 (udev-worker)
[ +0.000035] [ 194] 0 194 7217 1602 69632 0 0 (udev-worker)
[ +0.000034] [ 196] 0 196 8424 1649 77824 0 0 (udev-worker)
[ +0.000034] [ 197] 0 197 7272 1491 69632 0 0 (udev-worker)
[ +0.000034] [ 199] 0 199 7195 1402 69632 0 0 (udev-worker)
[ +0.000034] [ 200] 0 200 7195 1306 65536 0 0 (udev-worker)
[ +0.000034] [ 201] 0 201 7274 1392 65536 0 0 (udev-worker)
[ +0.000034] [ 202] 0 202 7230 1661 77824 0 0 (udev-worker)
[ +0.000035] [ 203] 0 203 7228 1314 65536 0 0 (udev-worker)
[ +0.000034] [ 204] 0 204 7228 1297 61440 0 0 (udev-worker)
[ +0.000035] [ 205] 0 205 7229 1348 61440 0 0 (udev-worker)
[ +0.000035] [ 206] 0 206 7220 1354 65536 0 0 (udev-worker)
[ +0.000042] [ 311] 980 311 4564 2784 61440 0 0 systemd-resolve
[ +0.000036] [ 312] 979 312 6058 1632 61440 0 0 systemd-timesyn
[ +0.000035] [ 317] 81 317 2853 1120 49152 0 -900 dbus-daemon
[ +0.000036] [ 318] 0 318 2414 1504 45056 0 -1000 sshd
[ +0.000033] [ 319] 0 319 3630 1568 53248 0 0 systemd-logind
[ +0.000035] [ 322] 0 322 1098 384 36864 0 0 agetty
[ +0.000034] [ 326] 0 326 3573 1472 57344 0 0 systemd-hostnam
[ +0.000035] [ 351] 0 351 2414 1632 49152 0 0 sshd
[ +0.000033] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=systemd-resolve,pid=311,uid=980
[ +0.000090] Out of memory: Killed process 311 (systemd-resolve) total-vm:18256kB, anon-rss:1408kB, file-rss:9728kB, shmem-rss:0kB, UID:980 pgtables:60kB oom_score_adj:0
[ +0.000085] DMA free:571032kB boost:0kB min:65536kB low:81920kB high:98304kB reserved_highatomic:8192KB active_anon:0kB inactive_anon:0kB active_file:1056kB inactive_file:140kB unevictable:0kB writepending:48kB present:786432kB managed:626936kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:500328kB
There is free_cma:500328kB, which might be Pi specific and the reason why there are no OOM kills with mainline stable. If you don't use a UI, you could disable CMA by not setting CONFIG_CMA.
I think I found the reason this OOM is occurring... I had /etc/sysctl.d/settings.conf which contained:
vm.min_free_kbytes=65536
Removing that file and rebooting gives armv7h on an RPi4B with 8 GB that doesn't yet trigger the OOM problem.
In fact, I can trigger the OOM problem simply by running:
# echo 65536 > /proc/sys/vm/min_free_kbytes
Connection to rpi4.lan closed by remote host.
Connection to rpi4.lan closed.
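For what it's worth, the watermark values in the dmesg above (min:65536kB low:81920kB high:98304kB) are consistent with the kernel's usual derivation of the low and high watermarks from min_free_kbytes (low = min + min/4, high = min + min/2, ignoring watermark_scale_factor), so vm.min_free_kbytes=65536 reserves most of the ~100 MB of kernel-usable lowmem as watermark headroom:

```shell
min=65536                  # vm.min_free_kbytes from the sysctl file
low=$(( min + min / 4 ))   # low watermark: 81920 kB, matching the log
high=$(( min + min / 2 ))  # high watermark: 98304 kB, matching the log
echo "min=${min}kB low=${low}kB high=${high}kB"
```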
EDIT: further, I compiled a kernel from rpi-6.1.y setting the option to enable lru_gen and have not experienced a problem.
% zgrep -i lru /proc/config.gz
CONFIG_LRU_GEN=y
CONFIG_LRU_GEN_ENABLED=y
# CONFIG_LRU_GEN_STATS is not set
CONFIG_LRU_CACHE=m
@pelwell @popcornmix - you might want to revisit disabling CONFIG_LRU_GEN_ENABLED for armv7h.
Describe the bug
On armv7 builds (but not aarch64 ones), OOM killer triggers and kills numerous processes when memory usage goes above a certain threshold, but that threshold is far below the volatile memory available (the threshold is not consistent, I have experienced it anywhere from 10-50% of volatile memory). This seems to coincide with my distribution changing to a 6.1.x kernel in the last 2 months, and definitely did not happen for any 5.15.x kernel. (And to repeat, this does not occur for 6.1.x aarch64)
The threshold is really very low: just running a zsh shell with zimfw might trigger it. Just running a low-usage gitea service occasionally triggers it. It is impossible to run "heavier" software such as media servers as they always trigger a cascade of OOM kills. While creating this issue just now, I experienced it when running inxi -F, was locked out of the system (it's common for both systemd and ssh to be killed when it triggers), and had to hard reset power.

Steps to reproduce the behaviour

1. pacman -S linux-rpi linux-rpi-headers
2. pacman -S ffmpeg
3. Run several ffmpeg -i ./example.mkv ./example1.mp4 &, with a sleep 1 in between

Device(s)
Raspberry Pi 4 Mod. B
System
Archlinuxarm armv7h
Logs
From journalctl (wrapped)
Additional context
There are several other reports of this issue in this archlinuxarm forum post, along with lots more logs and /boot/*.txt, etc., and some info from one of the maintainers, who cannot find a distro-specific cause: https://archlinuxarm.org/forum/viewtopic.php?f=23&t=16377