openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.63k stars 1.75k forks source link

swap zvol, high memory pressure, task blocked #5835

Closed bunder2015 closed 5 years ago

bunder2015 commented 7 years ago

System information

Type Version/Name
Distribution Name gentoo
Distribution Version
Linux Kernel 4.8.11
Architecture amd64
ZFS Version 0.7.0-rc2_68_ge8a2014
SPL Version 0.7.0-rc2_3_g4934925

Describe the problem you're observing

I've been testing swap on zvol and was finally able to get a stack trace out of it. My laptop has 4gb of memory, and with firefox, thunderbird and spotify open, I tend to wind up in swap. Sometimes when compiling packages in the background the system will completely hardlock and I'm forced to reboot. ABD helps, but sometimes this still happens. Today my big consumers all froze while compiling packages and for some reason the system was still responsive. The system was roughly 1.2gb into swap when this happened. I don't see anything zfs related on first glance, but if the system is going to give me something, I might as well pass it along. Thanks

a-pool/swap  primarycache          none                   local
a-pool/swap  secondarycache        none                   local
a-pool/swap  logbias               throughput             local
a-pool/swap  sync                  always                 local

Include any warning/errors/backtraces from the system logs

[1030693.478816] INFO: task conky:23179 blocked for more than 120 seconds.
[1030693.478820]       Tainted: P           O    4.8.11-gentoo #2
[1030693.478821] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1030693.478822] conky           D ffff88000c883cc8     0 23179   7760 0x00000000
[1030693.478827]  ffff88000c883cc8 ffffffff8110439e 0000000000000000 ffff88000c884000
[1030693.478830]  ffff8800abf6a600 ffff88000c883d00 0000000000000000 ffff880052978ca8
[1030693.478832]  ffff88000c883ce0 ffffffff81541612 ffff880052978ca8 ffff88000c883d60
[1030693.478835] Call Trace:
[1030693.478844]  [<ffffffff8110439e>] ? mntput+0x28/0x2a
[1030693.478849]  [<ffffffff81541612>] schedule+0x79/0x87
[1030693.478852]  [<ffffffff815430e4>] rwsem_down_read_failed+0xe2/0x102
[1030693.478858]  [<ffffffff8124b318>] call_rwsem_down_read_failed+0x18/0x30
[1030693.478861]  [<ffffffff8124b318>] ? call_rwsem_down_read_failed+0x18/0x30
[1030693.478863]  [<ffffffff81542acc>] down_read+0x12/0x23
[1030693.478866]  [<ffffffff81139913>] proc_pid_cmdline_read+0x93/0x3c0
[1030693.478872]  [<ffffffff810ea072>] __vfs_read+0x21/0xe1
[1030693.478874]  [<ffffffff810eab15>] vfs_read+0xa2/0x12d
[1030693.478876]  [<ffffffff810ebc41>] SyS_read+0x49/0x83
[1030693.478879]  [<ffffffff81543fdb>] entry_SYSCALL_64_fastpath+0x13/0x8f
[1030693.478887] INFO: task firefox:16602 blocked for more than 120 seconds.
[1030693.478888]       Tainted: P           O    4.8.11-gentoo #2
[1030693.478889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1030693.478890] firefox         D ffff8800116fbdc8     0 16602   7760 0x00000004
[1030693.478893]  ffff8800116fbdc8 ffff8800cc945f18 ffff8800116fbe00 ffff8800116fc000
[1030693.478896]  ffff88000f1c8940 ffff8800116fbe00 0000000000000000 ffff88000f1c8940
[1030693.478898]  ffff8800116fbde0 ffffffff81541612 ffff880052978ca8 ffff8800116fbe60
[1030693.478901] Call Trace:
[1030693.478903]  [<ffffffff81541612>] schedule+0x79/0x87
[1030693.478906]  [<ffffffff815430e4>] rwsem_down_read_failed+0xe2/0x102
[1030693.478909]  [<ffffffff8124b318>] call_rwsem_down_read_failed+0x18/0x30
[1030693.478911]  [<ffffffff8124b318>] ? call_rwsem_down_read_failed+0x18/0x30
[1030693.478913]  [<ffffffff81542acc>] down_read+0x12/0x23
[1030693.478918]  [<ffffffff8102efc3>] __do_page_fault+0x283/0x3ee
[1030693.478920]  [<ffffffff8102f15b>] do_page_fault+0xc/0xe
[1030693.478923]  [<ffffffff81545532>] page_fault+0x22/0x30
[1030693.478927] INFO: task JS Watchdog:16613 blocked for more than 120 seconds.
[1030693.478928]       Tainted: P           O    4.8.11-gentoo #2
[1030693.478929] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1030693.478930] JS Watchdog     D ffff8800c11a3dc8     0 16613   7760 0x00000000
[1030693.478933]  ffff8800c11a3dc8 ffff88000ea30540 ffff8800c11a3e00 ffff8800c11a4000
[1030693.478936]  ffff880064fce840 ffff8800c11a3e00 0000000000000000 ffff880064fce840
[1030693.478939]  ffff8800c11a3de0 ffffffff81541612 ffff880052978ca8 ffff8800c11a3e60
[1030693.478941] Call Trace:
[1030693.478944]  [<ffffffff81541612>] schedule+0x79/0x87
[1030693.478946]  [<ffffffff815430e4>] rwsem_down_read_failed+0xe2/0x102
[1030693.478949]  [<ffffffff8124b318>] call_rwsem_down_read_failed+0x18/0x30
[1030693.478951]  [<ffffffff8124b318>] ? call_rwsem_down_read_failed+0x18/0x30
[1030693.478953]  [<ffffffff81542acc>] down_read+0x12/0x23
[1030693.478955]  [<ffffffff8102efc3>] __do_page_fault+0x283/0x3ee
[1030693.478957]  [<ffffffff8102f15b>] do_page_fault+0xc/0xe
[1030693.478959]  [<ffffffff81545532>] page_fault+0x22/0x30
[1030693.478965] INFO: task Timer:16632 blocked for more than 120 seconds.
[1030693.478966]       Tainted: P           O    4.8.11-gentoo #2
[1030693.478967] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1030693.478967] Timer           D ffff88004168fdc8     0 16632   7760 0x00000000
[1030693.478971]  ffff88004168fdc8 ffffffff8107f20c 00000001b8e0f000 ffff880041690000
[1030693.478973]  ffff8801205fea00 ffff88004168fe00 0000000000000000 ffff8801205fea00
[1030693.478976]  ffff88004168fde0 ffffffff81541612 ffff880052978ca8 ffff88004168fe60
[1030693.478979] Call Trace:
[1030693.478984]  [<ffffffff8107f20c>] ? futex_wake+0x106/0x120
[1030693.478986]  [<ffffffff81541612>] schedule+0x79/0x87
[1030693.478988]  [<ffffffff815430e4>] rwsem_down_read_failed+0xe2/0x102
[1030693.478991]  [<ffffffff8124b318>] call_rwsem_down_read_failed+0x18/0x30
[1030693.478993]  [<ffffffff8124b318>] ? call_rwsem_down_read_failed+0x18/0x30
[1030693.478995]  [<ffffffff81542acc>] down_read+0x12/0x23
[1030693.478997]  [<ffffffff8102efc3>] __do_page_fault+0x283/0x3ee
[1030693.478999]  [<ffffffff810810c9>] ? SyS_futex+0xb8/0x11e
[1030693.479002]  [<ffffffff8102f15b>] do_page_fault+0xc/0xe
[1030693.479003]  [<ffffffff81545532>] page_fault+0x22/0x30
[1030693.479006] INFO: task Compositor:16669 blocked for more than 120 seconds.
[1030693.479008]       Tainted: P           O    4.8.11-gentoo #2
[1030693.479008] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1030693.479009] Compositor      D ffff8800c2107dc8     0 16669   7760 0x00000000
[1030693.479012]  ffff8800c2107dc8 ffffffff8107f20c 00000001a89fe000 ffff8800c2108000
[1030693.479015]  ffff88008ea0cb80 ffff8800c2107e00 0000000000000000 ffff88008ea0cb80
[1030693.479017]  ffff8800c2107de0 ffffffff81541612 ffff880052978ca8 ffff8800c2107e60
[1030693.479019] Call Trace:
[1030693.479022]  [<ffffffff8107f20c>] ? futex_wake+0x106/0x120
[1030693.479024]  [<ffffffff81541612>] schedule+0x79/0x87
[1030693.479026]  [<ffffffff815430e4>] rwsem_down_read_failed+0xe2/0x102
[1030693.479029]  [<ffffffff8124b318>] call_rwsem_down_read_failed+0x18/0x30
[1030693.479031]  [<ffffffff8124b318>] ? call_rwsem_down_read_failed+0x18/0x30
[1030693.479033]  [<ffffffff81542acc>] down_read+0x12/0x23
[1030693.479035]  [<ffffffff8102efc3>] __do_page_fault+0x283/0x3ee
[1030693.479038]  [<ffffffff8102f15b>] do_page_fault+0xc/0xe
[1030693.479039]  [<ffffffff81545532>] page_fault+0x22/0x30
[1030693.479043] INFO: task SoftwareVsyncTh:16676 blocked for more than 120 seconds.
[1030693.479044]       Tainted: P           O    4.8.11-gentoo #2
[1030693.479045] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1030693.479045] SoftwareVsyncTh D ffff88001df9fdc8     0 16676   7760 0x00000000
[1030693.479048]  ffff88001df9fdc8 ffff8800cc945f18 ffff88001df9fe00 ffff88001dfa0000
[1030693.479051]  ffff88000df7f1c0 ffff88001df9fe00 0000000000000000 ffff88000df7f1c0
[1030693.479053]  ffff88001df9fde0 ffffffff81541612 ffff880052978ca8 ffff88001df9fe60
[1030693.479056] Call Trace:
[1030693.479058]  [<ffffffff81541612>] schedule+0x79/0x87
[1030693.479060]  [<ffffffff815430e4>] rwsem_down_read_failed+0xe2/0x102
[1030693.479063]  [<ffffffff8124b318>] call_rwsem_down_read_failed+0x18/0x30
[1030693.479065]  [<ffffffff8124b318>] ? call_rwsem_down_read_failed+0x18/0x30
[1030693.479067]  [<ffffffff81542acc>] down_read+0x12/0x23
[1030693.479069]  [<ffffffff8102efc3>] __do_page_fault+0x283/0x3ee
[1030693.479071]  [<ffffffff8102f15b>] do_page_fault+0xc/0xe
[1030693.479073]  [<ffffffff81545532>] page_fault+0x22/0x30
[1030693.479076] INFO: task DOM Worker:16695 blocked for more than 120 seconds.
[1030693.479077]       Tainted: P           O    4.8.11-gentoo #2
[1030693.479078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1030693.479078] DOM Worker      D ffff8800b03bfdc8     0 16695   7760 0x00000000
[1030693.479081]  ffff8800b03bfdc8 ffff8800b871c840 ffff8800b03bfe00 ffff8800b03c0000
[1030693.479084]  ffff88003beaeb40 ffff8800b03bfe00 0000000000000000 ffff88003beaeb40
[1030693.479087]  ffff8800b03bfde0 ffffffff81541612 ffff880052978ca8 ffff8800b03bfe60
[1030693.479089] Call Trace:
[1030693.479092]  [<ffffffff81541612>] schedule+0x79/0x87
[1030693.479094]  [<ffffffff815430e4>] rwsem_down_read_failed+0xe2/0x102
[1030693.479097]  [<ffffffff8124b318>] call_rwsem_down_read_failed+0x18/0x30
[1030693.479099]  [<ffffffff8124b318>] ? call_rwsem_down_read_failed+0x18/0x30
[1030693.479101]  [<ffffffff81542acc>] down_read+0x12/0x23
[1030693.479103]  [<ffffffff8102efc3>] __do_page_fault+0x283/0x3ee
[1030693.479105]  [<ffffffff8102f15b>] do_page_fault+0xc/0xe
[1030693.479107]  [<ffffffff81545532>] page_fault+0x22/0x30
[1030693.479110] INFO: task ImageBridgeChil:16752 blocked for more than 120 seconds.
[1030693.479111]       Tainted: P           O    4.8.11-gentoo #2
[1030693.479112] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1030693.479112] ImageBridgeChil D ffff8800a5c63dc8     0 16752   7760 0x00000000
[1030693.479115]  ffff8800a5c63dc8 ffff8800cc945f18 ffff8800a5c63e00 ffff8800a5c64000
[1030693.479118]  ffff8800902ecd80 ffff8800a5c63e00 0000000000000000 ffff8800902ecd80
[1030693.479120]  ffff8800a5c63de0 ffffffff81541612 ffff880052978ca8 ffff8800a5c63e60
[1030693.479123] Call Trace:
[1030693.479125]  [<ffffffff81541612>] schedule+0x79/0x87
[1030693.479127]  [<ffffffff815430e4>] rwsem_down_read_failed+0xe2/0x102
[1030693.479130]  [<ffffffff8124b318>] call_rwsem_down_read_failed+0x18/0x30
[1030693.479132]  [<ffffffff8124b318>] ? call_rwsem_down_read_failed+0x18/0x30
[1030693.479134]  [<ffffffff81542acc>] down_read+0x12/0x23
[1030693.479137]  [<ffffffff8102efc3>] __do_page_fault+0x283/0x3ee
[1030693.479139]  [<ffffffff8102f15b>] do_page_fault+0xc/0xe
[1030693.479141]  [<ffffffff81545532>] page_fault+0x22/0x30
[1030693.479145] INFO: task DOM Worker:16769 blocked for more than 120 seconds.
[1030693.479146]       Tainted: P           O    4.8.11-gentoo #2
[1030693.479147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1030693.479148] DOM Worker      D ffff8800283d3dc8     0 16769   7760 0x00000000
[1030693.479151]  ffff8800283d3dc8 ffff88010dc6a300 ffff8800283d3e00 ffff8800283d4000
[1030693.479154]  ffff8800b871c840 ffff8800283d3e00 0000000000000000 ffff8800b871c840
[1030693.479156]  ffff8800283d3de0 ffffffff81541612 ffff880052978ca8 ffff8800283d3e60
[1030693.479158] Call Trace:
[1030693.479161]  [<ffffffff81541612>] schedule+0x79/0x87
[1030693.479163]  [<ffffffff815430e4>] rwsem_down_read_failed+0xe2/0x102
[1030693.479165]  [<ffffffff8124b318>] call_rwsem_down_read_failed+0x18/0x30
[1030693.479167]  [<ffffffff8124b318>] ? call_rwsem_down_read_failed+0x18/0x30
[1030693.479170]  [<ffffffff81542acc>] down_read+0x12/0x23
[1030693.479172]  [<ffffffff8102efc3>] __do_page_fault+0x283/0x3ee
[1030693.479174]  [<ffffffff8102f15b>] do_page_fault+0xc/0xe
[1030693.479175]  [<ffffffff81545532>] page_fault+0x22/0x30
[1030693.479179] INFO: task DOM Worker:17208 blocked for more than 120 seconds.
[1030693.479180]       Tainted: P           O    4.8.11-gentoo #2
[1030693.479180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1030693.479181] DOM Worker      D ffff88008a0abdc8     0 17208   7760 0x00000000
[1030693.479184]  ffff88008a0abdc8 000000000000386a ffff88008a0abe00 ffff88008a0ac000
[1030693.479186]  ffff88010dc6a300 ffff88008a0abe00 0000000000000000 ffff88010dc6a300
[1030693.479189]  ffff88008a0abde0 ffffffff81541612 ffff880052978ca8 ffff88008a0abe60
[1030693.479192] Call Trace:
[1030693.479194]  [<ffffffff81541612>] schedule+0x79/0x87
[1030693.479196]  [<ffffffff815430e4>] rwsem_down_read_failed+0xe2/0x102
[1030693.479199]  [<ffffffff8124b318>] call_rwsem_down_read_failed+0x18/0x30
[1030693.479201]  [<ffffffff8124b318>] ? call_rwsem_down_read_failed+0x18/0x30
[1030693.479204]  [<ffffffff81542acc>] down_read+0x12/0x23
[1030693.479205]  [<ffffffff8102efc3>] __do_page_fault+0x283/0x3ee
[1030693.479207]  [<ffffffff8102f15b>] do_page_fault+0xc/0xe
[1030693.479209]  [<ffffffff81545532>] page_fault+0x22/0x30
acertain commented 7 years ago

Workaround: set vm.swappiness = 0, use cgroups with memory.swappiness > 0

behlendorf commented 7 years ago

This was originally reported against 0.7.0-rc2. Is it still and issue with 0.7.2?

bunder2015 commented 7 years ago

I'm not sure. Since I posted this I doubled the memory in my laptop, and have since bought a new laptop. I rarely swap anymore. FWIW, I had already set swappiness to "0" when this happened, that doesn't change this issue if it still does exist.

ocehugo commented 6 years ago

I'm still getting this when memory pressure is high and swap is on and lying in the root disk/ZoL --> ZFS 0.7.8.

had to remove swap and/or put in another disk

bunder2015 commented 6 years ago

I was hoping this was behind us, but I had a hang three weeks ago (but couldn't get the logs at the time), and I had another one today. I assume its probably related to memory tuning parameters and my aversion to swap unless absolutely necessary.

https://gist.github.com/bunder2015/e6832aff9528467e5ccf0d288236dc87

vm.swappiness = 1
zfs_arc_max=2147483648 zfs_arc_pc_percent=500

System has 8gb of memory, had 2gb free when I kicked off a firefox upgrade. Typical suspects, firefox, thunderbird, spotify.

Luckily I was able to catch it before the system completely locked up, and I was able to get logs out of it again this time.

HRio commented 5 years ago

does sync=always really help in the swap on zvol case? is the data not just written to SLOG and then anyhow staying used in ram until the next TXG flush? i.e. sync=always might just create more work (and latency until the ram can be made available) and hence increase the risk of problems?

FreeBSD recommends: https://wiki.freebsd.org/RootOnZFS#ZFS_Swap_Volume

bunder2015 commented 5 years ago

i believe my thought process was that if you're swapping, you want the data on disk as soon as possible so the memory can be freed, any delay could mean more memory allocation, which you don't have, and you wind up in a situation where you need memory to get data out to swap.

HRio commented 5 years ago

@bunder2015 considering that the SLOG is write only (only read when recovering from a crash), it does not add any real value to write it in the SWAP case, as the data is still in ram until TXG flush.

I consider the FreeBSD recommendations for swap zvol, do be a bit more safe than the zfs on linux recommendations, FreeBSD:

zfs create -V 2G -o org.freebsd:swap=on -o checksum=off -o compression=off -o dedup=off -o sync=disabled -o primarycache=none zroot/swap

The only thing I would probably drop is -o checksum=off, as that nulls a big part of the reason to use ZFS for swap in Linux. If the server is using ECC memory I do not want to get bit errors in the swap.

bunder2015 commented 5 years ago

Closing as old/stale, I can't easily reproduce without removing half the memory in my laptop