Closed: Bronek closed this issue 7 years ago
Hi,
What are the system details, especially memory and any ZFS settings you might have adjusted? Please also post the ARC stats.
From the OOM output I can see some VMs running. Maybe this is too much for the system.
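For anyone wanting to gather the ARC stats requested above, here is a minimal sketch. It assumes ZFS on Linux exposes them at /proc/spl/kstat/zfs/arcstats as rows of "name type data"; the helper names parse_arcstats and summarize are made up for illustration and are not part of any ZFS tooling.

```python
from pathlib import Path

# kstat file exposed by the ZFS on Linux kernel module (assumed location)
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")


def parse_arcstats(text):
    """Parse 'name type data' rows into a {name: value} dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields with a numeric last field.
        # This skips the kstat header line and the column-title line.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def summarize(stats):
    """Return current ARC size and its min/max targets in GiB, if present."""
    gib = 1024 ** 3
    keys = ("size", "c_min", "c_max")
    return {k: round(stats[k] / gib, 2) for k in keys if k in stats}


if __name__ == "__main__":
    # Only meaningful on a host with the zfs module loaded.
    if ARCSTATS.exists():
        print(summarize(parse_arcstats(ARCSTATS.read_text())))
```

Comparing "size" against "c_max" shows whether the ARC is actually staying within the configured limit at the time of the problem.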
I have 128GB RAM, of which some 40GB is used by VMs (two machines with 16GB allocated each and one machine with 4GB, plus a little overhead). You can see these in total_vm, e.g. 4631679 pages of 4K each ~= 18092 MB. Also, all VMs are configured to use only hugepages, for which I have reserved some 56GB in total. This still normally leaves plenty of RAM to work with, especially since arc_max is limited to 16GB:
$ cat /etc/sysctl.d/80-hugepages.conf
# Reserve this many 2MB pages for virtual machine
vm.nr_hugepages = 28000
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           125G         73G         49G        2.0M        2.5G         51G
Swap:          159G          0B        159G
$ cat /etc/modprobe.d/zfs.conf
# Enforce max ZFS ARC size to 16GB = 16*1024*1024*1024 = 17179869184
options zfs zfs_arc_max=17179869184
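The figures above can be cross-checked with a bit of arithmetic. This is only a rough sketch: it takes the rounded 125G total from free -h as exact, assumes the ARC never exceeds zfs_arc_max, and treats 2MB hugepages as 2 MiB each. The function name pages_to_mib is made up for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def pages_to_mib(pages, page_size=4096):
    """Convert a page count (e.g. total_vm from the OOM report) to MiB."""
    return pages * page_size / MIB


# Values copied from the configuration and output quoted above.
nr_hugepages = 28000                 # vm.nr_hugepages
hugepage_bytes = nr_hugepages * 2 * MIB
arc_max_bytes = 17179869184          # zfs_arc_max
total_bytes = 125 * GIB              # "total" from free -h (rounded)

# What is left for everything else if hugepages and the ARC are both full.
remaining = total_bytes - hugepage_bytes - arc_max_bytes

print(f"total_vm of one VM : {pages_to_mib(4631679):.0f} MiB")
print(f"hugepage pool      : {hugepage_bytes / GIB:.1f} GiB")
print(f"ARC limit          : {arc_max_bytes / GIB:.1f} GiB")
print(f"left for the rest  : {remaining / GIB:.1f} GiB")
```

Since the VMs are backed only by hugepages, their memory comes out of the reserved pool and is not subtracted a second time here.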
There is also this part of the kernel configuration:
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
I will try to reproduce it with RC3 and then report on the ARC stats.
After upgrading to kernel 4.9.4 and current HEAD (which is zfsonlinux/zfs@2dbf1bf8296f66f24d5e404505c991bfbeec7808 at this time) I can no longer reproduce this problem.
@Bronek thanks for the update. It sounds as if the ABD changes which will appear in rc3 did address this issue. If you like we can leave this issue open for a while to verify that is in fact the case. When you're satisfied this issue has been resolved go ahead and close this issue.
Testing now with kernel 4.9.5 and RC3, looking good so far.
Did not see it again after upgrade to RC3, presumed fixed.
@Bronek great news. Thanks for following up in the issue.
NOTE: I'd be happy to present the kernel configuration in a more readable form, but I'm not sure what is available and preferred; please advise.
When testing the performance of a VM whose disk is stored in a ZVOL, the host machine crashed due to OOM. I tried to salvage the host by performing Alt+SysRq+f and then to ensure a graceful shutdown with Alt+SysRq+e, but this did not help.
The kernel configuration is