ioquatix closed this issue 10 years ago.
Here is the process log at 11 AM; note that some processes appear to be stuck in ZFS-related functions: https://gist.github.com/ioquatix/2764a81d4d4928fb6c8b
Forgot to include kernel version:
uname -a
Linux hinoki 3.12.9-2-ARCH #1 SMP PREEMPT Fri Jan 31 10:22:54 CET 2014 x86_64 GNU/Linux
@ioquatix Thanks for opening a new issue for this. Can you include which version of ZFS you were using?
@behlendorf How do I find the ZFS version from the packages that have been installed?
I've set up a cron job to collect memory usage stats from the server every 10 minutes, so if something goes wrong again I'll have some records. Apart from free -h and cat /proc/spl/kstat/zfs/arcstats, what other details are useful for diagnosing a failure?
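For reference, the logging is just a cron entry along these lines (the log path is arbitrary):
*/10 * * * * { date; free -h; cat /proc/spl/kstat/zfs/arcstats; } >> /var/log/zfs-mem.log 2>&1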
Here is my pool configuration:
hinoki# zpool status
  pool: backup
 state: ONLINE
  scan: none requested
config:

        NAME                                        STATE     READ WRITE CKSUM
        backup                                      ONLINE       0     0     0
          ata-eSATA-2_ExternalRAID_WD-WCAZA6489285  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 22h47m with 0 errors on Fri Jan 31 02:57:05 2014
config:

        NAME                                          STATE     READ WRITE CKSUM
        tank                                          ONLINE       0     0     0
          raidz1-0                                    ONLINE       0     0     0
            ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2263503  ONLINE       0     0     0
            ata-WDC_WD30EZRX-00MMMB0_WD-WMAWZ0353148  ONLINE       0     0     0
            ata-WDC_WD30EZRX-00MMMB0_WD-WCAWZ2144377  ONLINE       0     0     0
            ata-WDC_WD30EZRX-00MMMB0_WD-WMAWZ0365295  ONLINE       0     0     0

errors: No known data errors
Here are the mount points:
hinoki# zfs list
NAME                                     USED  AVAIL  REFER  MOUNTPOINT
backup                                   232G  1.56T    31K  /backup
backup/home                              232G  1.56T   232G  /backup/home
tank                                    6.30T  1.48T   488K  /tank
tank/backup                             3.41T  1.48T  3.41T  /tank/backup
tank/home                                233G  1.48T   233G  /home
tank/incoming                           58.0G  1.48T  58.0G  /tank/incoming
tank/media                              2.59T  1.48T  2.59T  /tank/media
tank/servers                            12.7G  1.48T   418K  /tank/servers
tank/servers/foobar.oriontransfer.org   12.7G  1.48T  12.7G  /tank/servers/foobar.oriontransfer.org

hinoki# zfs list -t snapshot
NAME                                  USED  AVAIL  REFER  MOUNTPOINT
backup/home@backup-20140212-205910     45K      -   232G  -
tank/home@backup-20140212-205910     2.16M      -   233G  -
So, after running for a while, it seems like ZFS is forcing other parts of the system to use the swap disk:
total used free shared buffers cached
Mem: 7.8G 7.3G 538M 140K 22M 888M
-/+ buffers/cache: 6.4G 1.4G
Swap: 4.0G 18M 4.0G
Is this normal behaviour?
Can you post the output of /proc/spl/kstat/zfs/arcstats? It should shed some light on this.
arcstats
4 1 0x01 84 4032 8330400538 183157505535763
name type data
hits 4 83896589
misses 4 2176645
demand_data_hits 4 81129320
demand_data_misses 4 546977
demand_metadata_hits 4 2701975
demand_metadata_misses 4 1015622
prefetch_data_hits 4 12261
prefetch_data_misses 4 492329
prefetch_metadata_hits 4 53033
prefetch_metadata_misses 4 121717
mru_hits 4 1251853
mru_ghost_hits 4 216344
mfu_hits 4 82579492
mfu_ghost_hits 4 741426
deleted 4 1249828
recycle_miss 4 952021
mutex_miss 4 1336
evict_skip 4 120938268
evict_l2_cached 4 0
evict_l2_eligible 4 73377835008
evict_l2_ineligible 4 27560876544
hash_elements 4 283679
hash_elements_max 4 345251
hash_collisions 4 1467291
hash_chains 4 84187
hash_chain_max 4 11
p 4 613803485
c 4 4188841984
c_min 4 523605248
c_max 4 4188841984
size 4 4188596448
hdr_size 4 96258912
data_size 4 3594837504
other_size 4 497500032
anon_size 4 163840
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 613562368
mru_evict_data 4 288745984
mru_evict_metadata 4 37897728
mru_ghost_size 4 3575205376
mru_ghost_evict_data 4 2827847680
mru_ghost_evict_metadata 4 747357696
mfu_size 4 2981111296
mfu_evict_data 4 2852808192
mfu_evict_metadata 4 96002048
mfu_ghost_size 4 606542336
mfu_ghost_evict_data 4 241041408
mfu_ghost_evict_metadata 4 365500928
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_hdr_miss 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_asize 4 0
l2_hdr_size 4 0
l2_compress_successes 4 0
l2_compress_zeros 4 0
l2_compress_failures 4 0
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 149
memory_indirect_count 4 18495
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 789671
arc_meta_used 4 1047042272
arc_meta_limit 4 1047210496
arc_meta_max 4 1348345296
free -h
total used free shared buffers cached
Mem: 7.8G 6.1G 1.7G 140K 13M 62M
-/+ buffers/cache: 6.0G 1.8G
Swap: 4.0G 18M 4.0G
@ioquatix For current versions of master you can get the running ZFS version from /sys/module/zfs/version.
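If that file isn't present in an older build, one of the following usually works as well (exact output varies by distro and packaging):
cat /sys/module/zfs/version
modinfo zfs | grep -i '^version'
dmesg | grep 'ZFS: Loaded module'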
As for this issue, unfortunately I don't have a quick solution for you. The arcstats you posted don't show anything obviously wrong. We're going to have to dig in to this to find the root cause, and it may take us a while to get a chance to investigate.
Running rsync tonight, starting to see some strange numbers in arcstats:
arc_meta_used 4 2609155760
arc_meta_limit 4 1047210496
arc_meta_max 4 2754040000
Surely arc_meta_used <= arc_meta_limit under normal circumstances? Swap usage is slowly increasing.
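A quick way to watch just those counters (a plain awk filter over the kstat file, nothing ZFS-specific):
awk '$1 ~ /^arc_meta_(used|limit|max)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats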
I'm running a second rsync task from a remote server to increase the pressure. The kernel seems to be under significant memory pressure, judging by the churn visible when running free.
% free -h
total used free shared buffers cached
Mem: 7.8G 7.6G 170M 88K 468K 24M
-/+ buffers/cache: 7.6G 194M
Swap: 4.0G 52M 3.9G
Slowly but surely, over the past 10 minutes, arc_meta_max has grown significantly past the limit:
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 3698401
arc_meta_used 4 2813880960
arc_meta_limit 4 1047210496
arc_meta_max 4 2834768496
The system is now swapping to disk, and swap usage is growing by about 5 MB/minute.
% free -h
total used free shared buffers cached
Mem: 7.8G 7.7G 141M 88K 164K 18M
-/+ buffers/cache: 7.6G 160M
Swap: 4.0G 82M 3.9G
I found I could free up a significant chunk of memory by doing the following:
hinoki# echo 2 >/proc/sys/vm/drop_caches
hinoki# free -h
total used free shared buffers cached
Mem: 7.8G 5.3G 2.5G 52K 2.5M 16M
-/+ buffers/cache: 5.3G 2.5G
Swap: 4.0G 95M 3.9G
However, I get the feeling this will only delay the issue.
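For reference, the value written to drop_caches selects what gets dropped; it only releases clean, reclaimable caches:
echo 1 > /proc/sys/vm/drop_caches   # page cache only
echo 2 > /proc/sys/vm/drop_caches   # reclaimable slab objects (dentries and inodes)
echo 3 > /proc/sys/vm/drop_caches   # both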
Seems like this also forced arc_meta_used to drop:
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 4245664
arc_meta_used 4 1194139920
arc_meta_limit 4 1047210496
arc_meta_max 4 2866771904
@ioquatix Just out of curiosity, what does cat /sys/kernel/mm/transparent_hugepage/enabled show? If "always" is selected, it might be interesting to boot with the transparent_hugepage=madvise option to see whether it makes much difference. Are you still able to regularly reproduce the deadlock?
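If a reboot is inconvenient, the setting can also be changed at runtime (not persistent across reboots), or made persistent via the kernel command line; for example:
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
# persistent: add transparent_hugepage=madvise to GRUB_CMDLINE_LINUX in /etc/default/grub, then regenerate the grub config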
Can you post the full arcstats when the system is in this state? Pull request #2110 was in part designed to address some issues related to metadata-heavy workloads like rsync. It would be very helpful if you could run your workload with the patches in #2110 applied and log arcstats every 15 seconds; you should see a significant improvement.
These changes weren't designed to address the deadlock you're seeing, but the improved behavior of the ARC might avoid this issue.
If we can get some more real-world testing on these changes it would be helpful. The changes are safe to apply to your existing pool and I was hoping to merge them to master this week. A few more data points would be welcome.
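A simple shell loop is enough for the 15-second logging, e.g. (the log file name is arbitrary):
while sleep 15; do { date; cat /proc/spl/kstat/zfs/arcstats; } >> arcstats.log; done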
I'm logging arcstats and free -m every 10 minutes via email (cron).
I'm not sure I'm willing to apply the patch, as this is a reasonably important backup server. Is there anything else I can do [to help]?
I couldn't see the version file in my current build, sorry.
% cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
I haven't attempted to reproduce the deadlock; I'm carefully watching the free memory usage. It's a moderately important backup server, so I'd rather avoid deadlocks if possible.
@ioquatix My suggestion would be just to wait a bit until 0.6.3 comes out, then update to the official tag and we'll see if the issue remains. There's a decent chance this will have been addressed.
Seems like I've run into the same issue:
total used free shared buffers cached
Mem: 11G 11G 119M 424K 70M 1.3G
-/+ buffers/cache: 10G 1.5G
Swap: 8.0G 0B 8.0G
Probably can't do anything disk I/O related. dmesg looks pretty similar.
[2886698.041803] INFO: task kswapd0:62 blocked for more than 120 seconds.
[2886698.041840] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2886698.041895] kswapd0 D ffff88033fcd3ec0 0 62 2 0x00000000
[2886698.041950] ffff88032ff9e040 0000000000000046 ffff8803317f9140 0000000000013ec0
[2886698.042004] ffff88032ffc3fd8 ffff88032ffc3fd8 ffff88032ff9e040 ffff88032f164db8
[2886698.042059] ffff88032ff9e040 ffff88032f164dbc 0000000000000002 0000000000000000
[2886698.042113] Call Trace:
[2886698.042142] [<ffffffff81386427>] ? schedule_preempt_disabled+0x5/0x6
[2886698.042175] [<ffffffff813854ea>] ? __mutex_lock_common.isra.6+0x145/0x161
[2886698.042207] [<ffffffff8138531c>] ? mutex_lock+0x17/0x27
[2886698.042255] [<ffffffffa05608d4>] ? zfs_zinactive+0x4a/0xc6 [zfs]
[2886698.042298] [<ffffffffa055a4f5>] ? zfs_inactive+0xff/0x134 [zfs]
[2886698.042328] [<ffffffff8111cfaf>] ? evict+0xa3/0x153
[2886698.042358] [<ffffffff8111d086>] ? dispose_list+0x27/0x31
[2886698.042389] [<ffffffff8111e14d>] ? prune_icache_sb+0x2a9/0x2c4
[2886698.042422] [<ffffffff8110d3ee>] ? prune_super+0xc1/0x127
[2886698.042455] [<ffffffff810ce6da>] ? shrink_slab+0x19b/0x287
[2886698.042486] [<ffffffff810d104b>] ? balance_pgdat+0x2a1/0x443
[2886698.042518] [<ffffffff810d14ab>] ? kswapd+0x2be/0x30e
[2886698.042550] [<ffffffff81057ec0>] ? abort_exclusive_wait+0x79/0x79
[2886698.042583] [<ffffffff810d11ed>] ? balance_pgdat+0x443/0x443
[2886698.042614] [<ffffffff81057631>] ? kthread+0x7d/0x85
[2886698.042644] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.042677] [<ffffffff8138bd7c>] ? ret_from_fork+0x7c/0xb0
[2886698.042708] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.042741] INFO: task spl_kmem_cache/:750 blocked for more than 120 seconds.
[2886698.042787] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2886698.042835] spl_kmem_cache/ D ffff88033fc13ec0 0 750 2 0x00000000
[2886698.042885] ffff88032f64c7f0 0000000000000046 ffffffff81613400 0000000000013ec0
[2886698.042938] ffff880331af1fd8 ffff880331af1fd8 ffff88032f64c7f0 ffff88032f164db8
[2886698.042992] ffff88032f64c7f0 ffff88032f164dbc 0000000000000002 0000000000000000
[2886698.043046] Call Trace:
[2886698.043072] [<ffffffff81386427>] ? schedule_preempt_disabled+0x5/0x6
[2886698.043105] [<ffffffff813854ea>] ? __mutex_lock_common.isra.6+0x145/0x161
[2886698.043137] [<ffffffff8138531c>] ? mutex_lock+0x17/0x27
[2886698.043177] [<ffffffffa05608d4>] ? zfs_zinactive+0x4a/0xc6 [zfs]
[2886698.043220] [<ffffffffa055a4f5>] ? zfs_inactive+0xff/0x134 [zfs]
[2886698.043252] [<ffffffff8111cfaf>] ? evict+0xa3/0x153
[2886698.043282] [<ffffffff8111d086>] ? dispose_list+0x27/0x31
[2886698.043313] [<ffffffff8111e14d>] ? prune_icache_sb+0x2a9/0x2c4
[2886698.043345] [<ffffffff8110d3ee>] ? prune_super+0xc1/0x127
[2886698.043375] [<ffffffff810ce6da>] ? shrink_slab+0x19b/0x287
[2886698.043405] [<ffffffff810d0816>] ? do_try_to_free_pages+0x277/0x401
[2886698.043438] [<ffffffff810c6cbf>] ? __free_one_page+0x1dc/0x25e
[2886698.043470] [<ffffffff810d0c4e>] ? try_to_free_pages+0xd2/0x122
[2886698.043502] [<ffffffff810c7fb3>] ? __alloc_pages_nodemask+0x4ab/0x78e
[2886698.043535] [<ffffffff810f5488>] ? alloc_pages_current+0xb1/0xca
[2886698.043568] [<ffffffff810c481a>] ? __get_free_pages+0x5/0x3e
[2886698.043599] [<ffffffff810debbd>] ? __pte_alloc_kernel+0x11/0x97
[2886698.043631] [<ffffffff810ea6b1>] ? vmap_page_range_noflush+0x16f/0x27c
[2886698.043664] [<ffffffff810ea7e3>] ? map_vm_area+0x25/0x36
[2886698.043695] [<ffffffff810ec265>] ? __vmalloc_node_range+0x17a/0x1e4
[2886698.043730] [<ffffffffa03748c6>] ? kv_alloc.isra.8+0x36/0x37 [spl]
[2886698.043762] [<ffffffff810ec2fb>] ? __vmalloc_node+0x2c/0x31
[2886698.043795] [<ffffffffa03748c6>] ? kv_alloc.isra.8+0x36/0x37 [spl]
[2886698.043828] [<ffffffffa03748c6>] ? kv_alloc.isra.8+0x36/0x37 [spl]
[2886698.043862] [<ffffffffa037778a>] ? spl_cache_grow_work+0x36/0x39d [spl]
[2886698.043895] [<ffffffff81385f86>] ? __schedule+0x4dc/0x532
[2886698.043928] [<ffffffffa0379426>] ? taskq_thread+0x29a/0x417 [spl]
[2886698.043960] [<ffffffff8105f346>] ? mmdrop+0xd/0x1c
[2886698.043990] [<ffffffff810636cd>] ? try_to_wake_up+0x192/0x192
[2886698.044024] [<ffffffffa037918c>] ? task_done+0x115/0x115 [spl]
[2886698.044056] [<ffffffff81057631>] ? kthread+0x7d/0x85
[2886698.044086] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.044117] [<ffffffff8138bd7c>] ? ret_from_fork+0x7c/0xb0
[2886698.044148] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.044183] INFO: task z_wr_iss/0:929 blocked for more than 120 seconds.
[2886698.044214] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2886698.044258] z_wr_iss/0 D ffff88033fc13ec0 0 929 2 0x00000000
[2886698.044308] ffff88032b0237f0 0000000000000046 ffff88032f64c7f0 0000000000013ec0
[2886698.044362] ffff88032b055fd8 ffff88032b055fd8 ffff88032b0237f0 ffff880331fb3728
[2886698.044416] ffff88032b0237f0 ffff880331fb372c 0000000000000002 0000000000000000
[2886698.044469] Call Trace:
[2886698.044495] [<ffffffff81386427>] ? schedule_preempt_disabled+0x5/0x6
[2886698.044528] [<ffffffff813854ea>] ? __mutex_lock_common.isra.6+0x145/0x161
[2886698.044561] [<ffffffff8138531c>] ? mutex_lock+0x17/0x27
[2886698.044604] [<ffffffffa0539923>] ? vdev_queue_io+0x5d/0xd8 [zfs]
[2886698.044645] [<ffffffffa05679d9>] ? zio_vdev_io_start+0x1b0/0x215 [zfs]
[2886698.044687] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.044731] [<ffffffffa053b080>] ? vdev_raidz_io_start+0x368/0x524 [zfs]
[2886698.044777] [<ffffffffa0539c15>] ? vdev_raidz_asize+0x4d/0x4d [zfs]
[2886698.044818] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.044858] [<ffffffffa0538ed8>] ? vdev_mirror_io_start+0x3d3/0x3f0 [zfs]
[2886698.044904] [<ffffffffa05387b4>] ? vdev_config_sync+0x139/0x139 [zfs]
[2886698.044946] [<ffffffffa05648f2>] ? zio_execute+0xdb/0xfd [zfs]
[2886698.044980] [<ffffffffa0379426>] ? taskq_thread+0x29a/0x417 [spl]
[2886698.045012] [<ffffffff8105f346>] ? mmdrop+0xd/0x1c
[2886698.045042] [<ffffffff810636cd>] ? try_to_wake_up+0x192/0x192
[2886698.045076] [<ffffffffa037918c>] ? task_done+0x115/0x115 [spl]
[2886698.045107] [<ffffffff81057631>] ? kthread+0x7d/0x85
[2886698.045138] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.045169] [<ffffffff8138bd7c>] ? ret_from_fork+0x7c/0xb0
[2886698.045200] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.045232] INFO: task z_wr_iss/2:931 blocked for more than 120 seconds.
[2886698.045263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2886698.045311] z_wr_iss/2 D ffff88033fc93ec0 0 931 2 0x00000000
[2886698.045360] ffff880331aaa830 0000000000000046 ffff8803328ef7b0 0000000000013ec0
[2886698.045414] ffff88032b059fd8 ffff88032b059fd8 ffff880331aaa830 ffff880331fb3728
[2886698.045468] ffff880331aaa830 ffff880331fb372c 0000000000000002 0000000000000000
[2886698.045522] Call Trace:
[2886698.045548] [<ffffffff81386427>] ? schedule_preempt_disabled+0x5/0x6
[2886698.045581] [<ffffffff813854ea>] ? __mutex_lock_common.isra.6+0x145/0x161
[2886698.045614] [<ffffffff8138531c>] ? mutex_lock+0x17/0x27
[2886698.045714] [<ffffffffa0539923>] ? vdev_queue_io+0x5d/0xd8 [zfs]
[2886698.045759] [<ffffffffa05679d9>] ? zio_vdev_io_start+0x1b0/0x215 [zfs]
[2886698.045803] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.045847] [<ffffffffa053b11b>] ? vdev_raidz_io_start+0x403/0x524 [zfs]
[2886698.045891] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.045935] [<ffffffffa0538ed8>] ? vdev_mirror_io_start+0x3d3/0x3f0 [zfs]
[2886698.045980] [<ffffffffa05387b4>] ? vdev_config_sync+0x139/0x139 [zfs]
[2886698.046023] [<ffffffffa05648f2>] ? zio_execute+0xdb/0xfd [zfs]
[2886698.046057] [<ffffffffa0379426>] ? taskq_thread+0x29a/0x417 [spl]
[2886698.046089] [<ffffffff810636cd>] ? try_to_wake_up+0x192/0x192
[2886698.046123] [<ffffffffa037918c>] ? task_done+0x115/0x115 [spl]
[2886698.046154] [<ffffffff81057631>] ? kthread+0x7d/0x85
[2886698.046185] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.046216] [<ffffffff8138bd7c>] ? ret_from_fork+0x7c/0xb0
[2886698.046248] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.046279] INFO: task z_wr_iss/3:932 blocked for more than 120 seconds.
[2886698.046310] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2886698.046358] z_wr_iss/3 D ffff88033fcd3ec0 0 932 2 0x00000000
[2886698.046408] ffff880331aaa100 0000000000000046 ffff8803328ef080 0000000000013ec0
[2886698.051540] ffff88032b05bfd8 ffff88032b05bfd8 ffff880331aaa100 ffff8803308a6728
[2886698.051594] ffff880331aaa100 ffff8803308a672c 0000000000000002 0000000000000000
[2886698.051648] Call Trace:
[2886698.051674] [<ffffffff81386427>] ? schedule_preempt_disabled+0x5/0x6
[2886698.051706] [<ffffffff813854ea>] ? __mutex_lock_common.isra.6+0x145/0x161
[2886698.051740] [<ffffffff8138531c>] ? mutex_lock+0x17/0x27
[2886698.051782] [<ffffffffa0539923>] ? vdev_queue_io+0x5d/0xd8 [zfs]
[2886698.051824] [<ffffffffa05679d9>] ? zio_vdev_io_start+0x1b0/0x215 [zfs]
[2886698.051866] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.051908] [<ffffffffa053b080>] ? vdev_raidz_io_start+0x368/0x524 [zfs]
[2886698.051954] [<ffffffffa0539c15>] ? vdev_raidz_asize+0x4d/0x4d [zfs]
[2886698.051996] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.052039] [<ffffffffa0538ed8>] ? vdev_mirror_io_start+0x3d3/0x3f0 [zfs]
[2886698.052085] [<ffffffffa05387b4>] ? vdev_config_sync+0x139/0x139 [zfs]
[2886698.052128] [<ffffffffa05648f2>] ? zio_execute+0xdb/0xfd [zfs]
[2886698.052161] [<ffffffffa0379426>] ? taskq_thread+0x29a/0x417 [spl]
[2886698.052193] [<ffffffff8105f346>] ? mmdrop+0xd/0x1c
[2886698.052223] [<ffffffff810636cd>] ? try_to_wake_up+0x192/0x192
[2886698.052257] [<ffffffffa037918c>] ? task_done+0x115/0x115 [spl]
[2886698.052286] [<ffffffff81057631>] ? kthread+0x7d/0x85
[2886698.052315] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.052346] [<ffffffff8138bd7c>] ? ret_from_fork+0x7c/0xb0
[2886698.052376] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.052406] INFO: task z_wr_iss/5:934 blocked for more than 120 seconds.
[2886698.052436] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2886698.052484] z_wr_iss/5 D ffff88033fd53ec0 0 934 2 0x00000000
[2886698.052534] ffff88032b024870 0000000000000046 ffff88032b028040 0000000000013ec0
[2886698.052588] ffff88032b05ffd8 ffff88032b05ffd8 ffff88032b024870 ffff8803308a6728
[2886698.052642] ffff88032b024870 ffff8803308a672c 0000000000000002 0000000000000000
[2886698.052696] Call Trace:
[2886698.052722] [<ffffffff81386427>] ? schedule_preempt_disabled+0x5/0x6
[2886698.052754] [<ffffffff813854ea>] ? __mutex_lock_common.isra.6+0x145/0x161
[2886698.052787] [<ffffffff8138531c>] ? mutex_lock+0x17/0x27
[2886698.052830] [<ffffffffa0539923>] ? vdev_queue_io+0x5d/0xd8 [zfs]
[2886698.052871] [<ffffffffa05679d9>] ? zio_vdev_io_start+0x1b0/0x215 [zfs]
[2886698.052914] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.052958] [<ffffffffa053b11b>] ? vdev_raidz_io_start+0x403/0x524 [zfs]
[2886698.053000] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.053044] [<ffffffffa0538ed8>] ? vdev_mirror_io_start+0x3d3/0x3f0 [zfs]
[2886698.053090] [<ffffffffa05387b4>] ? vdev_config_sync+0x139/0x139 [zfs]
[2886698.053132] [<ffffffffa05648f2>] ? zio_execute+0xdb/0xfd [zfs]
[2886698.053166] [<ffffffffa0379426>] ? taskq_thread+0x29a/0x417 [spl]
[2886698.053198] [<ffffffff8105f346>] ? mmdrop+0xd/0x1c
[2886698.053228] [<ffffffff810636cd>] ? try_to_wake_up+0x192/0x192
[2886698.053262] [<ffffffffa037918c>] ? task_done+0x115/0x115 [spl]
[2886698.053294] [<ffffffff81057631>] ? kthread+0x7d/0x85
[2886698.053324] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.053355] [<ffffffff8138bd7c>] ? ret_from_fork+0x7c/0xb0
[2886698.053387] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.053418] INFO: task z_wr_iss/6:935 blocked for more than 120 seconds.
[2886698.053449] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2886698.053497] z_wr_iss/6 D ffff88033fd93ec0 0 935 2 0x00000000
[2886698.053546] ffff88032b024140 0000000000000046 ffff88032b02b040 0000000000013ec0
[2886698.053600] ffff88032b061fd8 ffff88032b061fd8 ffff88032b024140 ffff880331fb3728
[2886698.053734] ffff88032b024140 ffff880331fb372c 0000000000000002 0000000000000000
[2886698.053795] Call Trace:
[2886698.053821] [<ffffffff81386427>] ? schedule_preempt_disabled+0x5/0x6
[2886698.053854] [<ffffffff813854ea>] ? __mutex_lock_common.isra.6+0x145/0x161
[2886698.053887] [<ffffffff8138531c>] ? mutex_lock+0x17/0x27
[2886698.053930] [<ffffffffa0539923>] ? vdev_queue_io+0x5d/0xd8 [zfs]
[2886698.053971] [<ffffffffa05679d9>] ? zio_vdev_io_start+0x1b0/0x215 [zfs]
[2886698.054014] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.054057] [<ffffffffa053b080>] ? vdev_raidz_io_start+0x368/0x524 [zfs]
[2886698.054100] [<ffffffffa0539c15>] ? vdev_raidz_asize+0x4d/0x4d [zfs]
[2886698.054142] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.054186] [<ffffffffa0538ed8>] ? vdev_mirror_io_start+0x3d3/0x3f0 [zfs]
[2886698.054231] [<ffffffffa05387b4>] ? vdev_config_sync+0x139/0x139 [zfs]
[2886698.054274] [<ffffffffa05648f2>] ? zio_execute+0xdb/0xfd [zfs]
[2886698.054308] [<ffffffffa0379426>] ? taskq_thread+0x29a/0x417 [spl]
[2886698.054340] [<ffffffff8105f346>] ? mmdrop+0xd/0x1c
[2886698.054370] [<ffffffff810636cd>] ? try_to_wake_up+0x192/0x192
[2886698.054403] [<ffffffffa037918c>] ? task_done+0x115/0x115 [spl]
[2886698.054435] [<ffffffff81057631>] ? kthread+0x7d/0x85
[2886698.054465] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.054497] [<ffffffff8138bd7c>] ? ret_from_fork+0x7c/0xb0
[2886698.054528] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.054559] INFO: task z_wr_iss/7:936 blocked for more than 120 seconds.
[2886698.054591] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2886698.054638] z_wr_iss/7 D ffff88033fdd3ec0 0 936 2 0x00000000
[2886698.054688] ffff88032b026770 0000000000000046 ffff8803328f1100 0000000000013ec0
[2886698.054742] ffff88032b063fd8 ffff88032b063fd8 ffff88032b026770 ffff8803308a6728
[2886698.054795] ffff88032b026770 ffff8803308a672c 0000000000000002 0000000000000000
[2886698.054849] Call Trace:
[2886698.054875] [<ffffffff81386427>] ? schedule_preempt_disabled+0x5/0x6
[2886698.054908] [<ffffffff813854ea>] ? __mutex_lock_common.isra.6+0x145/0x161
[2886698.054941] [<ffffffff8138531c>] ? mutex_lock+0x17/0x27
[2886698.054984] [<ffffffffa0539923>] ? vdev_queue_io+0x5d/0xd8 [zfs]
[2886698.055025] [<ffffffffa05679d9>] ? zio_vdev_io_start+0x1b0/0x215 [zfs]
[2886698.055067] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.055111] [<ffffffffa053b080>] ? vdev_raidz_io_start+0x368/0x524 [zfs]
[2886698.055156] [<ffffffffa0539c15>] ? vdev_raidz_asize+0x4d/0x4d [zfs]
[2886698.055198] [<ffffffffa05668be>] ? zio_nowait+0xfe/0x120 [zfs]
[2886698.055239] [<ffffffffa0538ed8>] ? vdev_mirror_io_start+0x3d3/0x3f0 [zfs]
[2886698.055284] [<ffffffffa05387b4>] ? vdev_config_sync+0x139/0x139 [zfs]
[2886698.055327] [<ffffffffa05648f2>] ? zio_execute+0xdb/0xfd [zfs]
[2886698.055361] [<ffffffffa0379426>] ? taskq_thread+0x29a/0x417 [spl]
[2886698.055393] [<ffffffff8105f346>] ? mmdrop+0xd/0x1c
[2886698.055423] [<ffffffff810636cd>] ? try_to_wake_up+0x192/0x192
[2886698.055456] [<ffffffffa037918c>] ? task_done+0x115/0x115 [spl]
[2886698.055488] [<ffffffff81057631>] ? kthread+0x7d/0x85
[2886698.055518] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.055549] [<ffffffff8138bd7c>] ? ret_from_fork+0x7c/0xb0
[2886698.055581] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.055612] INFO: task z_wr_int/0:942 blocked for more than 120 seconds.
[2886698.055644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2886698.055691] z_wr_int/0 D ffff88033fc13ec0 0 942 2 0x00000000
[2886698.055741] ffff88032b027100 0000000000000046 ffffffff81613400 0000000000013ec0
[2886698.055795] ffff88032b071fd8 ffff88032b071fd8 ffff88032b027100 ffff880331fb3728
[2886698.055849] ffff88032b027100 ffff880331fb372c 0000000000000002 0000000000000000
[2886698.055902] Call Trace:
[2886698.055928] [<ffffffff81386427>] ? schedule_preempt_disabled+0x5/0x6
[2886698.055961] [<ffffffff813854ea>] ? __mutex_lock_common.isra.6+0x145/0x161
[2886698.055994] [<ffffffff8138531c>] ? mutex_lock+0x17/0x27
[2886698.056037] [<ffffffffa05399de>] ? vdev_queue_io_done+0x40/0xe6 [zfs]
[2886698.056078] [<ffffffffa05643a4>] ? zio_vdev_io_done+0x5b/0x135 [zfs]
[2886698.056120] [<ffffffffa05648f2>] ? zio_execute+0xdb/0xfd [zfs]
[2886698.056154] [<ffffffffa0379426>] ? taskq_thread+0x29a/0x417 [spl]
[2886698.056186] [<ffffffff8105f346>] ? mmdrop+0xd/0x1c
[2886698.056215] [<ffffffff810636cd>] ? try_to_wake_up+0x192/0x192
[2886698.056248] [<ffffffffa037918c>] ? task_done+0x115/0x115 [spl]
[2886698.056277] [<ffffffff81057631>] ? kthread+0x7d/0x85
[2886698.056305] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.056336] [<ffffffff8138bd7c>] ? ret_from_fork+0x7c/0xb0
[2886698.056366] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.056396] INFO: task z_wr_int/1:943 blocked for more than 120 seconds.
[2886698.056427] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2886698.056474] z_wr_int/1 D ffff88033fc53ec0 0 943 2 0x00000000
[2886698.056524] ffff88032f1e9080 0000000000000046 ffff8803328ed040 0000000000013ec0
[2886698.056578] ffff88032b073fd8 ffff88032b073fd8 ffff88032f1e9080 ffff8803308a6728
[2886698.056632] ffff88032f1e9080 ffff8803308a672c 0000000000000002 0000000000000000
[2886698.056686] Call Trace:
[2886698.056711] [<ffffffff81386427>] ? schedule_preempt_disabled+0x5/0x6
[2886698.056744] [<ffffffff813854ea>] ? __mutex_lock_common.isra.6+0x145/0x161
[2886698.056777] [<ffffffff8138531c>] ? mutex_lock+0x17/0x27
[2886698.056820] [<ffffffffa05399de>] ? vdev_queue_io_done+0x40/0xe6 [zfs]
[2886698.056859] [<ffffffffa05643a4>] ? zio_vdev_io_done+0x5b/0x135 [zfs]
[2886698.056901] [<ffffffffa05648f2>] ? zio_execute+0xdb/0xfd [zfs]
[2886698.056935] [<ffffffffa0379426>] ? taskq_thread+0x29a/0x417 [spl]
[2886698.056967] [<ffffffff8105f346>] ? mmdrop+0xd/0x1c
[2886698.056997] [<ffffffff810636cd>] ? try_to_wake_up+0x192/0x192
[2886698.057029] [<ffffffffa037918c>] ? task_done+0x115/0x115 [spl]
[2886698.057057] [<ffffffff81057631>] ? kthread+0x7d/0x85
[2886698.057085] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
[2886698.057117] [<ffffffff8138bd7c>] ? ret_from_fork+0x7c/0xb0
[2886698.057146] [<ffffffff810575b4>] ? __kthread_parkme+0x59/0x59
Arcstats:
4 1 0x01 84 4032 3406155346 3457633662274528
name type data
hits 4 2014466512
misses 4 192169903
demand_data_hits 4 8409642
demand_data_misses 4 190042
demand_metadata_hits 4 1374151137
demand_metadata_misses 4 117335857
prefetch_data_hits 4 727
prefetch_data_misses 4 898935
prefetch_metadata_hits 4 631905006
prefetch_metadata_misses 4 73745069
mru_hits 4 148802733
mru_ghost_hits 4 78755925
mfu_hits 4 1233758046
mfu_ghost_hits 4 23009589
deleted 4 95774879
recycle_miss 4 84777271
mutex_miss 4 119517
evict_skip 4 28492017134
evict_l2_cached 4 0
evict_l2_eligible 4 1510064494080
evict_l2_ineligible 4 1076965627904
hash_elements 4 376525
hash_elements_max 4 1057322
hash_collisions 4 98320978
hash_chains 4 110160
hash_chain_max 4 17
p 4 154279
c 4 5896273888
c_min 4 788982272
c_max 4 6311858176
size 4 5896261848
hdr_size 4 138468672
data_size 4 5184984064
other_size 4 572809112
anon_size 4 167511040
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 1335905280
mru_evict_data 4 1163052032
mru_evict_metadata 4 65536
mru_ghost_size 4 4526376960
mru_ghost_evict_data 4 2720257536
mru_ghost_evict_metadata 4 1806119424
mfu_size 4 3681567744
mfu_evict_data 4 3637477888
mfu_evict_metadata 4 278528
mfu_ghost_size 4 926946304
mfu_ghost_evict_data 4 12189696
mfu_ghost_evict_metadata 4 914756608
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_hdr_miss 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_asize 4 0
l2_hdr_size 4 0
l2_compress_successes 4 0
l2_compress_zeros 4 0
l2_compress_failures 4 0
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 360
memory_indirect_count 4 586728
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 1430192
arc_meta_used 4 928655576
arc_meta_limit 4 1577964544
arc_meta_max 4 1756215752
Kernel:
3.9-1-amd64 #1 SMP Debian 3.9.8-1 x86_64 GNU/Linux
cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
I haven't seen the issue happen again. It happened when I did a fresh rsync to a new ZFS filesystem. Since then, memory usage has stayed at around 1 GB free.
I've got a similar problem. I woke the system up from suspend, and after approximately 3 minutes it locked up completely. Note that I have not enabled any swap, so I wonder why the kswapd0 task appears in the stack trace.
Apr 28 00:48:35 localhost kernel: SPLError: 15886:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'external' has encountered an uncorrectable I/O failure and has been suspended.\x0a
Apr 28 00:48:35 localhost kernel: SPLError: 15886:0:(spl-err.c:67:vcmn_err()) Skipped 1 previous similar message
Apr 28 00:51:05 localhost kernel: INFO: task kswapd0:58 blocked for more than 120 seconds.
Apr 28 00:51:05 localhost kernel: Tainted: P O 3.14.1-1-ARCH #1
Apr 28 00:51:05 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:51:05 localhost kernel: kswapd0 D 0000000000000002 0 58 2 0x00000000
Apr 28 00:51:05 localhost kernel: ffff88030af739b8 0000000000000046 ffff88030d8aebf0 ffff88030af73fd8
Apr 28 00:51:05 localhost kernel: 00000000000142c0 00000000000142c0 ffff88030d8aebf0 ffff88030af73928
Apr 28 00:51:05 localhost kernel: ffffffff8109205b 000000031f4d42c0 ffff880200a3a740 0000000000000001
Apr 28 00:51:05 localhost kernel: Call Trace:
Apr 28 00:51:05 localhost kernel: [<ffffffff8109205b>] ? ttwu_stat+0x9b/0x110
Apr 28 00:51:05 localhost kernel: [<ffffffff81097d3a>] ? try_to_wake_up+0x1fa/0x2c0
Apr 28 00:51:05 localhost kernel: [<ffffffff81097e52>] ? default_wake_function+0x12/0x20
Apr 28 00:51:05 localhost kernel: [<ffffffff814d7eb9>] schedule+0x29/0x70
Apr 28 00:51:05 localhost kernel: [<ffffffffa06d0325>] cv_wait_common+0x105/0x1a0 [spl]
Apr 28 00:51:05 localhost kernel: [<ffffffff810aa350>] ? __wake_up_sync+0x20/0x20
Apr 28 00:51:05 localhost kernel: [<ffffffffa06d03d5>] __cv_wait+0x15/0x20 [spl]
Apr 28 00:51:05 localhost kernel: [<ffffffffa07d1c33>] txg_wait_synced+0xb3/0x190 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffffa0796d5c>] dmu_tx_wait+0x1bc/0x2a0 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffffa0796ecc>] dmu_tx_assign+0x8c/0x460 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffffa0794fab>] ? dmu_tx_count_dnode+0x5b/0xa0 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffffa0809493>] zfs_inactive+0x163/0x1f0 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffff81143729>] ? truncate_pagecache+0x59/0x60
Apr 28 00:51:05 localhost kernel: [<ffffffffa081f374>] zpl_evict_inode+0x24/0x30 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffff811bdf20>] evict+0xb0/0x1b0
Apr 28 00:51:05 localhost kernel: [<ffffffff811be059>] dispose_list+0x39/0x50
Apr 28 00:51:05 localhost kernel: [<ffffffff811bf017>] prune_icache_sb+0x47/0x60
Apr 28 00:51:05 localhost kernel: [<ffffffff811a7c09>] super_cache_scan+0xf9/0x160
Apr 28 00:51:05 localhost kernel: [<ffffffff811458b8>] shrink_slab_node+0x138/0x2e0
Apr 28 00:51:05 localhost kernel: [<ffffffff810eb7f4>] ? css_next_descendant_pre+0x24/0x60
Apr 28 00:51:05 localhost kernel: [<ffffffff811464ab>] shrink_slab+0x8b/0x160
Apr 28 00:51:05 localhost kernel: [<ffffffff81149ef6>] balance_pgdat+0x3d6/0x5f0
Apr 28 00:51:05 localhost kernel: [<ffffffff8114a272>] kswapd+0x162/0x420
Apr 28 00:51:05 localhost kernel: [<ffffffff810aa350>] ? __wake_up_sync+0x20/0x20
Apr 28 00:51:05 localhost kernel: [<ffffffff8114a110>] ? balance_pgdat+0x5f0/0x5f0
Apr 28 00:51:05 localhost kernel: [<ffffffff810872a2>] kthread+0xd2/0xf0
Apr 28 00:51:05 localhost kernel: [<ffffffff810871d0>] ? kthread_create_on_node+0x180/0x180
Apr 28 00:51:05 localhost kernel: [<ffffffff814e2ffc>] ret_from_fork+0x7c/0xb0
Apr 28 00:51:05 localhost kernel: [<ffffffff810871d0>] ? kthread_create_on_node+0x180/0x180
Apr 28 00:51:05 localhost kernel: INFO: task txg_sync:15968 blocked for more than 120 seconds.
Apr 28 00:51:05 localhost kernel: Tainted: P O 3.14.1-1-ARCH #1
Apr 28 00:51:05 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 28 00:51:05 localhost kernel: txg_sync D 0000000000000002 0 15968 2 0x00000000
Apr 28 00:51:05 localhost kernel: ffff8800659f5bd0 0000000000000046 ffff880200a3a740 ffff8800659f5fd8
Apr 28 00:51:05 localhost kernel: 00000000000142c0 00000000000142c0 ffff880200a3a740 ffff8800659f5b40
Apr 28 00:51:05 localhost kernel: ffffffffa06c92da ffffffffa0813e15 ffff88030c9341e0 ffffffffa08168a0
Apr 28 00:51:05 localhost kernel: Call Trace:
Apr 28 00:51:05 localhost kernel: [<ffffffffa06c92da>] ? taskq_dispatch_ent+0x6a/0x1d0 [spl]
Apr 28 00:51:05 localhost kernel: [<ffffffffa0813e15>] ? zil_sync+0xa5/0x4f0 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffffa08168a0>] ? zio_taskq_member.isra.4+0x90/0x90 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffffa07c283d>] ? spa_taskq_dispatch_ent+0x5d/0x80 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffff8101d3c9>] ? read_tsc+0x9/0x20
Apr 28 00:51:05 localhost kernel: [<ffffffff814d7eb9>] schedule+0x29/0x70
Apr 28 00:51:05 localhost kernel: [<ffffffff814d815f>] io_schedule+0x8f/0xe0
Apr 28 00:51:05 localhost kernel: [<ffffffffa06d02bd>] cv_wait_common+0x9d/0x1a0 [spl]
Apr 28 00:51:05 localhost kernel: [<ffffffff810aa350>] ? __wake_up_sync+0x20/0x20
Apr 28 00:51:05 localhost kernel: [<ffffffffa06d0418>] __cv_wait_io+0x18/0x20 [spl]
Apr 28 00:51:05 localhost kernel: [<ffffffffa0818873>] zio_wait+0x103/0x1c0 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffffa07aa661>] dsl_pool_sync+0xb1/0x460 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffffa07c1ec5>] spa_sync+0x425/0xb00 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffff810cbbf8>] ? ktime_get_ts+0x48/0xe0
Apr 28 00:51:05 localhost kernel: [<ffffffffa07d27ae>] txg_sync_thread+0x37e/0x5c0 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffffa07d2430>] ? txg_quiesce_thread+0x340/0x340 [zfs]
Apr 28 00:51:05 localhost kernel: [<ffffffffa06c89aa>] thread_generic_wrapper+0x7a/0x90 [spl]
Apr 28 00:51:05 localhost kernel: [<ffffffffa06c8930>] ? __thread_exit+0xa0/0xa0 [spl]
Apr 28 00:51:05 localhost kernel: [<ffffffff810872a2>] kthread+0xd2/0xf0
Apr 28 00:51:05 localhost kernel: [<ffffffff810871d0>] ? kthread_create_on_node+0x180/0x180
Apr 28 00:51:05 localhost kernel: [<ffffffff814e2ffc>] ret_from_fork+0x7c/0xb0
Apr 28 00:51:05 localhost kernel: [<ffffffff810871d0>] ? kthread_create_on_node+0x180/0x180
Apr 28 00:51:38 localhost kernel: SysRq : Emergency Sync
Apr 28 00:51:39 localhost kernel: Emergency Sync complete
Since when does ZFS support suspend? If so, when is it fully usable? I'm currently also using it on my laptop and have (out of fear of trouble) avoided suspending :/
Oh, so ZFS does not support suspend?
If I remember correctly it does not; it's mentioned in another issue and was classified as low priority. It was also mentioned that it's non-trivial to implement.
I haven't had this issue show up since upgrading to the latest Arch package, and I've been running lots of data through it over the past couple of months. I'm going to close it.
@behlendorf @ioquatix Not sure if this should be reopened, but I just saw very similar behavior on the following setup (note: I am really new to ZFS administration):
distro: Ubuntu 14.04.1 LTS
kernel: Linux ubuntu 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
zfs: 0.6.3-5~trusty
mem: 16GB
free -h (although I couldn't grab this until the system had come back; it was completely frozen)
total used free shared buffers cached
Mem: 15G 11G 4.1G 960K 1.8M 10M
-/+ buffers/cache: 11G 4.2G
Swap: 15G 972K 15G
This happened while doing a du -sch . on a 1.7 TB filesystem during a scrub.
[122524.243207] INFO: task z_wr_iss/4:6812 blocked for more than 120 seconds.
[122524.243224] Tainted: PF O 3.13.0-32-generic #57-Ubuntu
[122524.243239] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[122524.243258] z_wr_iss/4 D 0000000000000000 0 6812 2 0x00000000
[122524.243260] ffff880293edbb30 0000000000000202 ffff8800bc10dfc0 ffff880293edbfd8
[122524.243264] 0000000000014440 0000000000014440 ffff8800bc10dfc0 ffff880078c1c7d0
[122524.243269] ffff880078c1c7d4 ffff8800bc10dfc0 00000000ffffffff ffff880078c1c7d8
[122524.243273] Call Trace:
[122524.243277] [<ffffffff817205f9>] schedule_preempt_disabled+0x29/0x70
[122524.243280] [<ffffffff81722465>] __mutex_lock_slowpath+0x135/0x1b0
[122524.243283] [<ffffffff817224ff>] mutex_lock+0x1f/0x2f
[122524.243312] [<ffffffffa021e0f1>] vdev_queue_io+0x71/0x160 [zfs]
[122524.243344] [<ffffffffa0257417>] zio_vdev_io_start+0x1c7/0x2e0 [zfs]
[122524.243376] [<ffffffffa025a2d6>] zio_nowait+0xb6/0x180 [zfs]
[122524.243405] [<ffffffffa0220c8a>] vdev_raidz_io_start+0x21a/0x2f0 [zfs]
[122524.243434] [<ffffffffa021e5e0>] ? vdev_raidz_asize+0x60/0x60 [zfs]
[122524.243466] [<ffffffffa02572e4>] zio_vdev_io_start+0x94/0x2e0 [zfs]
[122524.243498] [<ffffffffa025a2d6>] zio_nowait+0xb6/0x180 [zfs]
[122524.243526] [<ffffffffa021d117>] vdev_mirror_io_start+0xa7/0x1a0 [zfs]
[122524.243553] [<ffffffffa021c730>] ? vdev_config_sync+0x150/0x150 [zfs]
[122524.243582] [<ffffffffa0257467>] zio_vdev_io_start+0x217/0x2e0 [zfs]
[122524.243611] [<ffffffffa02580b6>] zio_execute+0xa6/0x140 [zfs]
[122524.243617] [<ffffffffa008bb57>] taskq_thread+0x237/0x4b0 [spl]
[122524.243620] [<ffffffff81097437>] ? finish_task_switch+0x57/0x170
[122524.243623] [<ffffffff8109a800>] ? wake_up_state+0x20/0x20
[122524.243629] [<ffffffffa008b920>] ? taskq_cancel_id+0x1f0/0x1f0 [spl]
[122524.243632] [<ffffffff8108b3d2>] kthread+0xd2/0xf0
[122524.243634] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0
[122524.243637] [<ffffffff8172c5bc>] ret_from_fork+0x7c/0xb0
[122524.243639] [<ffffffff8108b300>] ? kthread_create_on_node+0x1d0/0x1d0
After reading through #1657, here are some suspect entries from zpool events -v:
Feb 23 2015 17:18:20.302526202 ereport.fs.zfs.delay
class = "ereport.fs.zfs.delay"
ena = 0xdc44da4c3d304801
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0x38f96646c5dbe967
vdev = 0x1336fc8fb8c2cf65
(end detector)
pool = "three"
pool_guid = 0x38f96646c5dbe967
pool_context = 0x0
pool_failmode = "wait"
vdev_guid = 0x1336fc8fb8c2cf65
vdev_type = "disk"
vdev_path = "/dev/disk/by-id/scsi-1ATA_Hitachi_HUA723020ALA641_YFG2JM4C-part1"
vdev_ashift = 0x9
vdev_complete_ts = 0xadc44da1b284
vdev_delta_ts = 0x10a8c0891b2
vdev_read_errors = 0x0
vdev_write_errors = 0x0
vdev_cksum_errors = 0x0
parent_guid = 0x50a30c2c6978ee9c
parent_type = "raidz"
vdev_spare_paths =
vdev_spare_guids =
zio_err = 0x0
zio_flags = 0x60460
zio_stage = 0x20000
zio_pipeline = 0x2e0000
zio_delay = 0x102d7afb9
zio_timestamp = 0xacb9c3062913
zio_delta = 0x0
zio_offset = 0x33ed902c00
zio_size = 0xc00
zio_objset = 0x56
zio_object = 0x67
zio_level = 0x2
zio_blkid = 0x0
time = 0x54ebd15c 0x12082efa
eid = 0x36
Feb 23 2015 17:25:27.316912820 ereport.fs.zfs.delay
class = "ereport.fs.zfs.delay"
ena = 0xe27b9af4aad00801
detector = (embedded nvlist)
version = 0x0
scheme = "zfs"
pool = 0xc32934c44572c952
vdev = 0xc514356861a945ba
(end detector)
pool = "jobdz"
pool_guid = 0xc32934c44572c952
pool_context = 0x0
pool_failmode = "wait"
vdev_guid = 0xc514356861a945ba
vdev_type = "disk"
vdev_path = "/dev/disk/by-id/scsi-SAdaptec_Morphed_JBOD_01_A181E1A7-part1"
vdev_ashift = 0x9
vdev_complete_ts = 0xaccb2c5fa309
vdev_delta_ts = 0x1e03efd
vdev_read_errors = 0x0
vdev_write_errors = 0x0
vdev_cksum_errors = 0x0
parent_guid = 0x43c37d246dc0521
parent_type = "raidz"
vdev_spare_paths =
vdev_spare_guids =
zio_err = 0x0
zio_flags = 0x60440
zio_stage = 0x20000
zio_pipeline = 0x2e0000
zio_delay = 0x102d950bb
zio_timestamp = 0xacd24481a3af
zio_delta = 0x0
zio_offset = 0xe2d2284800
zio_size = 0x400
zio_objset = 0x0
zio_object = 0x91
zio_level = 0x0
zio_blkid = 0x1
time = 0x54ebd307 0x12e3b4b4
eid = 0x37
Similar issue here (though the trace below shows XFS inode reclaim rather than ZFS):
Apr 22 20:06:57 AI02 kernel: [94488.814125] INFO: task kswapd0:139 blocked for more than 120 seconds.
Apr 22 20:06:57 AI02 kernel: [94488.814132] Tainted: P OE 4.15.0-96-generic #97-Ubuntu
Apr 22 20:06:57 AI02 kernel: [94488.814134] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 22 20:06:57 AI02 kernel: [94488.814137] kswapd0 D 0 139 2 0x80000000
Apr 22 20:06:57 AI02 kernel: [94488.814141] Call Trace:
Apr 22 20:06:57 AI02 kernel: [94488.814151] __schedule+0x24e/0x880
Apr 22 20:06:57 AI02 kernel: [94488.814158] ? blk_flush_plug_list+0xea/0x270
Apr 22 20:06:57 AI02 kernel: [94488.814161] schedule+0x2c/0x80
Apr 22 20:06:57 AI02 kernel: [94488.814165] schedule_timeout+0x1cf/0x350
Apr 22 20:06:57 AI02 kernel: [94488.814219] ? _xfs_buf_ioapply+0x396/0x4e0 [xfs]
Apr 22 20:06:57 AI02 kernel: [94488.814223] wait_for_completion+0xba/0x140
Apr 22 20:06:57 AI02 kernel: [94488.814227] ? wake_up_q+0x80/0x80
Apr 22 20:06:57 AI02 kernel: [94488.814267] ? xfs_bwrite+0x24/0x60 [xfs]
Apr 22 20:06:57 AI02 kernel: [94488.814302] xfs_buf_submit_wait+0x81/0x210 [xfs]
Apr 22 20:06:57 AI02 kernel: [94488.814335] xfs_bwrite+0x24/0x60 [xfs]
Apr 22 20:06:57 AI02 kernel: [94488.814373] xfs_reclaim_inode+0x327/0x350 [xfs]
Apr 22 20:06:57 AI02 kernel: [94488.814407] xfs_reclaim_inodes_ag+0x1eb/0x340 [xfs]
Apr 22 20:06:57 AI02 kernel: [94488.814416] ? check_preempt_curr+0x2d/0x90
Apr 22 20:06:57 AI02 kernel: [94488.814418] ? ttwu_do_wakeup+0x1e/0x140
Apr 22 20:06:57 AI02 kernel: [94488.814421] ? ttwu_do_activate+0x77/0x80
Apr 22 20:06:57 AI02 kernel: [94488.814424] ? try_to_wake_up+0x59/0x4a0
Apr 22 20:06:57 AI02 kernel: [94488.814427] ? wake_up_process+0x15/0x20
Apr 22 20:06:57 AI02 kernel: [94488.814462] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
Apr 22 20:06:57 AI02 kernel: [94488.814505] xfs_fs_free_cached_objects+0x19/0x20 [xfs]
Apr 22 20:06:57 AI02 kernel: [94488.814510] super_cache_scan+0x165/0x1b0
Apr 22 20:06:57 AI02 kernel: [94488.814514] shrink_slab.part.51+0x1e7/0x440
Apr 22 20:06:57 AI02 kernel: [94488.814518] shrink_slab+0x29/0x30
Apr 22 20:06:57 AI02 kernel: [94488.814521] shrink_node+0x11e/0x300
Apr 22 20:06:57 AI02 kernel: [94488.814525] kswapd+0x2ae/0x730
Apr 22 20:06:57 AI02 kernel: [94488.814529] kthread+0x121/0x140
Apr 22 20:06:57 AI02 kernel: [94488.814532] ? mem_cgroup_shrink_node+0x190/0x190
Apr 22 20:06:57 AI02 kernel: [94488.814534] ? kthread_create_worker_on_cpu+0x70/0x70
Apr 22 20:06:57 AI02 kernel: [94488.814538] ret_from_fork+0x35/0x40
Apr 22 20:06:57 AI02 kernel: [94488.814568] INFO: task kubelet:3361 blocked for more than 120 seconds.
Apr 22 20:06:57 AI02 kernel: [94488.814571] Tainted: P OE 4.15.0-96-generic #97-Ubuntu
Apr 22 20:06:57 AI02 kernel: [94488.814573] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 22 20:06:57 AI02 kernel: [94488.814575] kubelet D 0 3361 1 0x00000000
Apr 22 20:06:57 AI02 kernel: [94488.814578] Call Trace:
Apr 22 20:06:57 AI02 kernel: [94488.814582] __schedule+0x24e/0x880
Apr 22 20:06:57 AI02 kernel: [94488.814587] ? ___slab_alloc+0xf2/0x4b0
Apr 22 20:06:57 AI02 kernel: [94488.814589] schedule+0x2c/0x80
Apr 22 20:06:57 AI02 kernel: [94488.814592] schedule_preempt_disabled+0xe/0x10
Worth noting: I had disabled swap beforehand.
Hi, my apologies if this bug has already been reported; I looked through quite a few existing issues to find something related but wasn't really sure, so I decided to file a new bug report.
I was running a fairly large rsync task last night (20-30 GB) from a remote server to my local backup machine, which is running the latest ZFS available in Arch Linux. I have 4x 3 TB hard drives in a raidz1 and a 1x 2 TB removable backup drive, both formatted using ZFS. The OS drive is a separate spinning disk running ext4, and the system is an HP MicroServer with 8 GB of RAM.
The system uses the default limits for the ARC, which appear to be 4 GB of memory. There is also 4 GB of swap available.
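For reference, the ARC limit can be checked and, if needed, capped via the zfs_arc_max module parameter (0 means the built-in default of roughly half of RAM, which matches the ~4 GB above):
cat /sys/module/zfs/parameters/zfs_arc_max
# cap the ARC at e.g. 2 GiB, applied on the next module load:
echo "options zfs zfs_arc_max=2147483648" > /etc/modprobe.d/zfs.conf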
I've never had problems with rsync in the past, but last night the entire system ground to a halt and became completely unresponsive by the time I got to it at 11 AM. I'm still not particularly sure what the actual problem was; all my ssh sessions locked up and physical access to the computer was no better. I started the backup around 2:15 AM and it appears to have started having problems around 3 AM.
I've also got a process log from 11 AM; I'll try to attach it separately as it is quite long.