Open osleg opened 5 months ago
@osleg can you post /proc/spl/kmem/slab from before and after the OOM event? It doesn't need to be exact, but I'd like to see what happens as more files are deleted and the kernel attempts to reclaim memory, before it finally gives up and kills something.
@robn sorry, it took me a bit of time to get those. Here are 3 logs: the first is from before rm -f /mnt/dir2/* started, the second is from right after rm returned, and the third is the last one I was able to fetch before the kernel panic.
slab_1711966416_169.log
slab_1711966417_170.log
slab_1711966419_171.log
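Snapshots like the three above could be collected with a small polling loop. This is a minimal sketch, not the script actually used here: the function names, the output directory, and the slab_<epoch>_<sequence>.log naming (guessed from the file names above) are all assumptions.

```python
import time
from pathlib import Path


def snapshot(source, out_dir, seq):
    """Copy the current contents of `source` into a timestamped log file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: slab_<unix epoch>_<sequence number>.log
    target = out_dir / f"slab_{int(time.time())}_{seq}.log"
    target.write_text(source.read_text())
    return target


def collect(source, out_dir, count, interval=1.0):
    """Take `count` snapshots of `source`, sleeping `interval` seconds between them."""
    paths = []
    for seq in range(count):
        paths.append(snapshot(source, out_dir, seq))
        if seq + 1 < count:
            time.sleep(interval)
    return paths
```

Calling collect(Path("/proc/spl/kmem/slab"), Path("slab_logs"), count=600) before starting the rm would keep roughly ten minutes of one-second snapshots. On a machine that is about to panic, writing the logs somewhere other than the pool under test is probably safer.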
Got this issue as well... :-(
I've also hit this recently, and it looks similar to, if not the same as, https://github.com/openzfs/zfs/issues/6783
Problem
While testing OpenZFS versions 2.1.13-2.1.15 and 2.2.2-2.2.3 on CentOS 8 Stream with various kernel versions ranging from 4.18.0-408 to .547, on an Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz with 8GB ECC RAM, we encountered a memory consumption issue that leads to a kernel panic during disk usage stress testing.
Test setup
Utilizing zpool with multiple configurations:
The test involves running multiple writers to fill the disk with random-sized files ranging from 1KB to 2GB. Once the disks are filled, all files are removed, and the process is repeated.
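The fill-and-delete cycle described above can be approximated with a single writer as follows. This is a hedged sketch rather than the actual test harness: the function names, the file naming, and the `budget` parameter (a byte cap so the loop can also stop before the disk is full) are assumptions, and the real test ran several writers in parallel.

```python
import errno
import os
import random
from pathlib import Path

CHUNK = 1024 * 1024  # write in 1 MiB chunks to keep the writer's own memory small


def write_random_file(path, size):
    """Write `size` bytes of random data to `path`."""
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n


def fill(directory, min_size, max_size, budget, rng):
    """Write random-sized files until `budget` bytes are written or the disk is full.

    The last file is truncated to whatever is left of the budget, so it may be
    smaller than `min_size`. Returns the number of files fully written.
    """
    directory.mkdir(parents=True, exist_ok=True)
    written = 0
    count = 0
    while written < budget:
        size = min(rng.randint(min_size, max_size), budget - written)
        path = directory / f"file_{count:06d}.bin"
        try:
            write_random_file(path, size)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                break  # disk is full: this is the intended stop condition
            raise
        written += size
        count += 1
    return count


def remove_all(directory):
    """Delete every regular file in `directory`; return how many were removed."""
    removed = 0
    for entry in directory.iterdir():
        if entry.is_file():
            entry.unlink()
            removed += 1
    return removed
```

For the sizes in this report, the call would look like fill(Path("/mnt/dir2"), 1024, 2 * 1024**3, budget, random.Random()), with `budget` set above the pool capacity so that ENOSPC ends the fill phase, followed by remove_all(Path("/mnt/dir2")) and a repeat.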
Observed issue
In all tested versions, and most noticeably in versions prior to 2.2.3, memory consumption grows significantly when files are removed.
Memory usage spikes, consuming all available memory.
The OOM killer activates in an attempt to free memory, resulting in kernel panics when no further resources are available for the OOM killer to release.
With 8GB RAM, the issue consistently occurs in every test instance before version 2.2.3, with a decreased frequency in version 2.2.3 (5 out of 20 CentOS test instances experienced kernel panics).
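To put numbers on the memory spike, the share of available memory can be logged from /proc/meminfo while the files are being removed. The helper below is a small sketch under that assumption; the function names are hypothetical and it takes the file contents as a string so it can be tried on any sample.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their integer values.

    Most fields are reported in kB; a few (such as HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def available_fraction(text):
    """Return MemAvailable divided by MemTotal for the given meminfo contents."""
    info = parse_meminfo(text)
    return info["MemAvailable"] / info["MemTotal"]
```

Reading the file with open("/proc/meminfo").read() once per second during the rm and printing available_fraction() of the result should show how quickly the available share falls toward zero before the OOM killer starts.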
Logs
Machine info
The current instance is the only one I have left for testing right now:
issue demo
After reconnecting over SSH, the directory still has all the files
zpool status
zpool list
zpool config
zfs config
vmcore dmesg
dmesg.txt
maybe related:
#14732
#15776
#14914