rfjakob / earlyoom

earlyoom - Early OOM Daemon for Linux

MemAvailable inaccurate, causing earlyoom to not kill processes in low memory scenarios #320

Open L1Z3 opened 3 weeks ago

L1Z3 commented 3 weeks ago

I'm running a Fedora 40 system with 16 GiB of RAM, and have been running earlyoom instead of systemd-oomd for a long time now with no issues. However, recently I have begun to notice the system freezing up in low-memory situations, requiring a force shutdown or Alt+SysRq+F to recover. (earlyoom version 1.8.2)

Upon further investigation (setting -r 1 in the config, testing with tail /dev/zero, and watching the logs and MemAvailable with watch -n 1 systemctl status earlyoom and watch -n 0.1 free -m), it seems that my system freezes up entirely with MemAvailable still at around 1500-1700 MiB. Example line from the earlyoom log while the system is frozen: 1709 of 11556 MiB (14.80%), swap free: 0 of 8191 MiB ( 0.00%).

Is there some deeper issue going on here, or is the answer just to set -m to a much higher value?
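
For reference, the whole repro setup in one place (this assumes the Fedora package reads EARLYOOM_ARGS from /etc/default/earlyoom; adjust if your setup differs):

$ # add -r 1 to EARLYOOM_ARGS in /etc/default/earlyoom, then:
$ sudo systemctl restart earlyoom
$ watch -n 1 systemctl status earlyoom   # earlyoom log lines, refreshed every second
$ watch -n 0.1 free -m                   # memory as free sees it
$ tail /dev/zero                         # buffers memory until something kills it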

rfjakob commented 3 weeks ago

Tip: Temporarily do systemctl stop earlyoom and just run earlyoom on the command line to get the live output.

Fedora 40 here, too. Looks like this with tail /dev/zero in another terminal:

$ earlyoom 
earlyoom 1.8.2
mem total: 23888 MiB, user mem total: 21266 MiB, swap total: 8191 MiB
sending SIGTERM when mem avail <= 10.00% and swap free <= 10.00%,
        SIGKILL when mem avail <=  5.00% and swap free <=  5.00%
mem avail: 13158 of 21263 MiB (61.88%), swap free: 7248 of 8191 MiB (88.48%)
mem avail: 11332 of 21257 MiB (53.31%), swap free: 7248 of 8191 MiB (88.48%)
mem avail:  9358 of 21092 MiB (44.37%), swap free: 7248 of 8191 MiB (88.48%)
mem avail:  7759 of 21307 MiB (36.41%), swap free: 7248 of 8191 MiB (88.48%)
mem avail:  5698 of 21022 MiB (27.11%), swap free: 7248 of 8191 MiB (88.48%)
mem avail:  4113 of 21291 MiB (19.32%), swap free: 7248 of 8191 MiB (88.48%)
mem avail:  2446 of 21390 MiB (11.44%), swap free: 7195 of 8191 MiB (87.84%)
mem avail:  1581 of 21401 MiB ( 7.39%), swap free: 6849 of 8191 MiB (83.61%)
mem avail:  1407 of 21200 MiB ( 6.64%), swap free: 6248 of 8191 MiB (76.28%)
mem avail:  1265 of 20957 MiB ( 6.04%), swap free: 5584 of 8191 MiB (68.17%)
mem avail:   960 of 20824 MiB ( 4.61%), swap free: 5111 of 8191 MiB (62.40%)
mem avail:   686 of 20790 MiB ( 3.30%), swap free: 4629 of 8191 MiB (56.51%)
mem avail:   652 of 20565 MiB ( 3.17%), swap free: 3948 of 8191 MiB (48.20%)
mem avail:   554 of 20560 MiB ( 2.70%), swap free: 3278 of 8191 MiB (40.03%)
mem avail:   556 of 20507 MiB ( 2.71%), swap free: 2520 of 8191 MiB (30.77%)
mem avail:   550 of 20382 MiB ( 2.70%), swap free: 1597 of 8191 MiB (19.51%)
mem avail:   548 of 20171 MiB ( 2.72%), swap free:  783 of 8191 MiB ( 9.57%)
low memory! at or below SIGTERM limits: mem 10.00%, swap 10.00%
sending SIGTERM to process 107164 uid 1026 "tail": oom_score 1160, VmRSS 18739 MiB, cmdline "tail /dev/zero"
kill_release: pid=107164: process_mrelease pidfd=4 success
process 107164 exited after 0.100 seconds
mem avail: 19651 of 20540 MiB (95.67%), swap free: 2552 of 8191 MiB (31.16%)
mem avail: 19621 of 20569 MiB (95.39%), swap free: 2613 of 8191 MiB (31.91%)
mem avail: 19581 of 20545 MiB (95.31%), swap free: 2634 of 8191 MiB (32.16%)

rfjakob commented 3 weeks ago

No hangs here. What does your cat /proc/swaps say?

Looks like here zram got activated during some Fedora upgrade:

$ cat /proc/swaps 
Filename                Type        Size        Used        Priority
/dev/zram0                              partition   8388604     4928776     100
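
If you want to double-check the zram setup on your side, something like this gives a quick overview (the unit name assumes Fedora's zram-generator):

$ swapon --show        # active swap devices; zram shows up as /dev/zram0
$ zramctl              # zram device size, compression algorithm, usage
$ systemctl status systemd-zram-setup@zram0.service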

L1Z3 commented 3 weeks ago

Thanks for the quick response! I can't reproduce the issue at the moment (tail /dev/zero is killed by earlyoom just fine now). I have a hunch that it was related to what was running on my system at the time: I was restoring a cloud backup, so there was a lot of file I/O going on. I'll try to reproduce it again later under the same conditions.

But yep, looks like I have zram as well:

Filename                Type        Size        Used        Priority
/dev/zram0                              partition   8388604     6127364     100

L1Z3 commented 2 weeks ago

I have not been able to reproduce this since, so I'll go ahead and close it for now.

CanNuhlar commented 2 weeks ago

I've seen this happen on a device that I own; running echo 3 > /proc/sys/vm/drop_caches would make earlyoom behave correctly, so I ended up running it periodically.
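
For reference, one way to do that periodically is a root cron entry along these lines (just a sketch; drop_caches throws away clean caches, so it's a blunt workaround):

# example /etc/crontab entry: drop caches every 5 minutes
*/5 * * * *  root  echo 3 > /proc/sys/vm/drop_caches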

Check the output of free -m: it will show that you only have a couple of MB left, while earlyoom reports more available RAM.

You might be able to provoke this behaviour by forcing the page cache to fill up: run find / -type f -exec md5sum {} \; in a couple of shells. Depending on your hardware, you might need several instances to fill the cache.
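
While the cache is filling up, watching the raw values side by side should show the mismatch (assuming procps free, whose Mem: line has the available column on the right):

$ watch -n 1 'grep -E "MemFree|MemAvailable" /proc/meminfo; echo; free -m'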

cc @rfjakob, maybe this could help with further issues.

L1Z3 commented 2 weeks ago

Thanks for the info! I'm no longer using the same system configuration as when I first encountered the issue (I've switched from Fedora to NixOS), so this may be a factor, but I can't reproduce this with that find command even in an extreme scenario like:

$ for i in $(seq 1 100); do find / -type f -exec md5sum {} \; & done
$ tail /dev/zero

This seems to behave fine for me, with earlyoom killing tail like it should. I'll go ahead and reopen this for now though since this issue wasn't unique to me.

rfjakob commented 1 week ago

Hi @CanNuhlar, can you post what free -m says vs what earlyoom says?

Anything is possible if there are bugs, but in principle, earlyoom directly uses the MemAvailable value from /proc/meminfo.
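
So comparing the two directly should settle it, e.g. (both values in MiB; rounding aside they should agree):

$ awk '/MemAvailable/ {printf "MemAvailable: %d MiB\n", $2/1024}' /proc/meminfo
$ free -m | awk 'NR==2 {print "free available:", $7, "MiB"}'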