Closed DeHackEd closed 11 years ago
Unfortunately, I seem to have given you the wrong pull request number in IRC. The one that you should have tried was #618, not #669.
How did you manage to create pool version 26 with zfs-fuse?
Perhaps it is a typo and he meant pool version 23.
I was going to suggest that you try the patch in issue #618 but you beat me to it. I'm a bit surprised that it doesn't resolve the issue; clearly there's more to be done.
I haven't had the chance to apply #618 yet. I'll try it tomorrow, time permitting.
No love. With #618 applied, the same scenario still deadlocks after ignoring the ARC size limit.
These outputs are all taken after ZFS jammed up.
# free
             total       used       free     shared    buffers     cached
Mem:       8182236    7540728     641508          0      28528      43364
-/+ buffers/cache:    7468836     713400
Swap:      4194300      14492    4179808

# history | grep modprobe
modprobe zfs zfs_vdev_max_pending=20 zfs_arc_max=2147483648

# cat /proc/spl/kstat/zfs/arcstats
10 1 0x01 77 3696 141145250104682 144360035578750
name type data
hits 4 5947062
misses 4 939766
demand_data_hits 4 215008
demand_data_misses 4 13180
demand_metadata_hits 4 4365140
demand_metadata_misses 4 469195
prefetch_data_hits 4 20650
prefetch_data_misses 4 53521
prefetch_metadata_hits 4 1346264
prefetch_metadata_misses 4 403870
mru_hits 4 895846
mru_ghost_hits 4 277838
mfu_hits 4 3684335
mfu_ghost_hits 4 189079
deleted 4 248071
recycle_miss 4 488895
mutex_miss 4 8046
evict_skip 4 362916066
evict_l2_cached 4 0
evict_l2_eligible 4 14598742528
evict_l2_ineligible 4 5304508416
hash_elements 4 253442
hash_elements_max 4 253519
hash_collisions 4 240599
hash_chains 4 66173
hash_chain_max 4 7
p 4 134225408
c 4 2147483648
c_min 4 293601280
c_max 4 2147483648
size 4 3320870416
hdr_size 4 85214688
data_size 4 2363930112
other_size 4 871725616
anon_size 4 27773952
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 161863680
mru_evict_data 4 0
mru_evict_metadata 4 3620864
mru_ghost_size 4 1977015808
mru_ghost_evict_data 4 908544000
mru_ghost_evict_metadata 4 1068471808
mfu_size 4 2174292480
mfu_evict_data 4 0
mfu_evict_metadata 4 1738240
mfu_ghost_size 4 166193152
mfu_ghost_evict_data 4 0
mfu_ghost_evict_metadata 4 166193152
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_hdr_miss 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_hdr_size 4 0
memory_throttle_count 4 0
memory_direct_count 4 0
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 475136
arc_loaned_bytes 4 262144
arc_prune 4 2313
arc_meta_used 4 3309051920
arc_meta_limit 4 536870912
arc_meta_max 4 3309157328
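In case it helps anyone trying to reproduce this, the handful of arcstats fields that matter for this report can be pulled out without wading through the whole dump. A small shell sketch (nothing official, just awk over the kstat file shown above):

while sleep 5; do
    # print only the ARC size/limit counters relevant to this issue
    awk '$1 ~ /^(size|c|c_max|arc_meta_used|arc_meta_limit)$/ { printf "%-16s %14d\n", $1, $3 }' \
        /proc/spl/kstat/zfs/arcstats
    echo ----
done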
Stack traces, some are additional kernel threads.
z_wr_int/15 D ffff880113a49408 0 18763 2 0x00000000
 ffff880113a49160 0000000000000046 ffffffff811ddf2f ffff880100000000
 ffff88022e8e67d0 000000000000ffc0 ffff8801a8d05fd8 0000000000004000
 ffff8801a8d05fd8 000000000000ffc0 ffff880113a49160 ffff8801a8d04010
Call Trace:
 [ ] ? scsi_request_fn+0x314/0x3ef
 [ ] ? __make_request+0x22a/0x243
 [ ] ? generic_make_request+0x201/0x262
 [ ] ? __mutex_lock_slowpath+0xe2/0x128
 [ ] ? mutex_lock+0x12/0x25
 [ ] ? vdev_cache_write+0x57/0x11d [zfs]
 [ ] ? zio_vdev_io_done+0x6f/0x141 [zfs]
 [ ] ? zio_execute+0xad/0xd1 [zfs]
 [ ] ? taskq_thread+0x2c2/0x508 [spl]
 [ ] ? try_to_wake_up+0x1d9/0x1eb
 [ ] ? try_to_wake_up+0x1eb/0x1eb
 [ ] ? spl_taskq_init+0x159/0x159 [spl]
 [ ] ? spl_taskq_init+0x159/0x159 [spl]
 [ ] ? kthread+0x7a/0x82
 [ ] ? kernel_thread_helper+0x4/0x10
 [ ] ? kthread_worker_fn+0x139/0x139
 [ ] ? gs_change+0xb/0xb

txg_sync D ffff88018865b898 0 18781 2 0x00000000
 ffff88018865b5f0 0000000000000046 ffff88022fc10028 ffff88011160c248
 ffff88011160c200 000000000000ffc0 ffff88015dd89fd8 0000000000004000
 ffff88015dd89fd8 000000000000ffc0 ffff88018865b5f0 ffff88015dd88010
Call Trace:
 [ ] ? check_preempt_curr+0x25/0x62
 [ ] ? ttwu_do_wakeup+0x11/0x83
 [ ] ? try_to_wake_up+0x1d9/0x1eb
 [ ] ? __wake_up_common+0x41/0x78
 [ ] ? cv_wait_common+0xb8/0x141 [spl]
 [ ] ? wake_up_bit+0x23/0x23
 [ ] ? zio_wait+0xe8/0x11c [zfs]
 [ ] ? dsl_pool_sync+0xbf/0x428 [zfs]
 [ ] ? spa_sync+0x47d/0x829 [zfs]
 [ ] ? txg_sync_thread+0x29a/0x3f6 [zfs]
 [ ] ? set_user_nice+0x115/0x139
 [ ] ? txg_thread_exit+0x2b/0x2b [zfs]
 [ ] ? __thread_create+0x2df/0x2df [spl]
 [ ] ? thread_generic_wrapper+0x6a/0x75 [spl]
 [ ] ? kthread+0x7a/0x82
 [ ] ? kernel_thread_helper+0x4/0x10
 [ ] ? kthread_worker_fn+0x139/0x139
 [ ] ? gs_change+0xb/0xb

rsync D ffff88020b19f008 0 19506 19505 0x00000000
 ffff88020b19ed60 0000000000000082 0000000000000001 0000000000000007
 ffff880151257930 000000000000ffc0 ffff88015f65dfd8 0000000000004000
 ffff88015f65dfd8 000000000000ffc0 ffff88020b19ed60 ffff88015f65c010
Call Trace:
 [ ] ? extract_buf+0x76/0xc2
 [ ] ? invalidate_interrupt1+0xe/0x20
 [ ] ? __mutex_lock_slowpath+0xe2/0x128
 [ ] ? mutex_lock+0x12/0x25
 [ ] ? vdev_cache_read+0x92/0x2de [zfs]
 [ ] ? zio_vdev_io_start+0x1c1/0x228 [zfs]
 [ ] ? zio_nowait+0xd0/0xf4 [zfs]
 [ ] ? vdev_mirror_io_start+0x2fa/0x313 [zfs]
 [ ] ? vdev_config_generate+0xac7/0xac7 [zfs]
 [ ] ? zio_nowait+0xd0/0xf4 [zfs]
 [ ] ? arc_read_nolock+0x662/0x673 [zfs]
 [ ] ? arc_read+0xc2/0x146 [zfs]
 [ ] ? dnode_block_freed+0xfe/0x119 [zfs]
 [ ] ? dbuf_fill_done+0x61/0x61 [zfs]
 [ ] ? dbuf_read+0x3ce/0x5c0 [zfs]
 [ ] ? dnode_hold_impl+0x1a8/0x43c [zfs]
 [ ] ? remove_reference+0x93/0x9f [zfs]
 [ ] ? dmu_bonus_hold+0x22/0x26e [zfs]
 [ ] ? zfs_zget+0x5c/0x19f [zfs]
 [ ] ? zfs_dirent_lock+0x447/0x48f [zfs]
 [ ] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
 [ ] ? zfs_dirlook+0x20a/0x276 [zfs]
 [ ] ? zfs_lookup+0x26e/0x2b6 [zfs]
 [ ] ? zpl_lookup+0x47/0x80 [zfs]
 [ ] ? d_alloc_and_lookup+0x43/0x60
 [ ] ? do_lookup+0x1c9/0x2bb
 [ ] ? path_lookupat+0xe2/0x5af
 [ ] ? do_path_lookup+0x1d/0x5f
 [ ] ? user_path_at_empty+0x49/0x84
 [ ] ? tsd_exit+0x83/0x18d [spl]
 [ ] ? cp_new_stat+0xdf/0xf1
 [ ] ? vfs_fstatat+0x43/0x70
 [ ] ? sys_newlstat+0x11/0x2d
 [ ] ? system_call_fastpath+0x16/0x1b

rsync D ffff8801ba0c6cb8 0 19511 19510 0x00000000
 ffff8801ba0c6a10 0000000000000086 ffffffff8115539c 0000000000000001
 ffff8801512572a0 000000000000ffc0 ffff8801b1051fd8 0000000000004000
 ffff8801b1051fd8 000000000000ffc0 ffff8801ba0c6a10 ffff8801b1050010
Call Trace:
 [ ] ? cpumask_any_but+0x28/0x37
 [ ] ? __schedule+0x727/0x7b0
 [ ] ? rcu_implicit_dynticks_qs+0x3f/0x60
 [ ] ? force_quiescent_state+0x1c3/0x230
 [ ] ? __mutex_lock_slowpath+0xe2/0x128
 [ ] ? arc_buf_remove_ref+0xe6/0xf4 [zfs]
 [ ] ? mutex_lock+0x12/0x25
 [ ] ? zfs_zinactive+0x5a/0xd4 [zfs]
 [ ] ? zfs_inactive+0x106/0x19e [zfs]
 [ ] ? evict+0x78/0x117
 [ ] ? dispose_list+0x2c/0x36
 [ ] ? shrink_icache_memory+0x278/0x2a8
 [ ] ? shrink_slab+0xe3/0x153
 [ ] ? do_try_to_free_pages+0x253/0x3f0
 [ ] ? get_page_from_freelist+0x47a/0x4ad
 [ ] ? try_to_free_pages+0x79/0x7e
 [ ] ? __alloc_pages_nodemask+0x48b/0x6c8
 [ ] ? __get_free_pages+0x12/0x52
 [ ] ? spl_kmem_cache_alloc+0x236/0x975 [spl]
 [ ] ? dbuf_create+0x38/0x32e [zfs]
 [ ] ? dnode_hold_impl+0x3f0/0x43c [zfs]
 [ ] ? dbuf_create_bonus+0x16/0x1f [zfs]
 [ ] ? dmu_bonus_hold+0x10b/0x26e [zfs]
 [ ] ? zfs_zget+0x5c/0x19f [zfs]
 [ ] ? zfs_dirent_lock+0x447/0x48f [zfs]
 [ ] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
 [ ] ? zfs_dirlook+0x20a/0x276 [zfs]
 [ ] ? zfs_lookup+0x26e/0x2b6 [zfs]
 [ ] ? zpl_lookup+0x47/0x80 [zfs]
 [ ] ? d_alloc_and_lookup+0x43/0x60
 [ ] ? do_lookup+0x1c9/0x2bb
 [ ] ? path_lookupat+0xe2/0x5af
 [ ] ? strncpy_from_user+0x9/0x4e
 [ ] ? do_path_lookup+0x1d/0x5f
 [ ] ? user_path_at_empty+0x49/0x84
 [ ] ? tsd_exit+0x83/0x18d [spl]
 [ ] ? cp_new_stat+0xdf/0xf1
 [ ] ? vfs_fstatat+0x43/0x70
 [ ] ? sys_newlstat+0x11/0x2d
 [ ] ? system_call_fastpath+0x16/0x1b

rsync D ffff88022365e728 0 19522 19521 0x00000000
 ffff88022365e480 0000000000000082 ffffffff8115539c ffff8801b219b438
 ffff8802219bcea0 000000000000ffc0 ffff8801b219bfd8 0000000000004000
 ffff8801b219bfd8 000000000000ffc0 ffff88022365e480 ffff8801b219a010
Call Trace:
 [ ] ? cpumask_any_but+0x28/0x37
 [ ] ? __pagevec_release+0x19/0x22
 [ ] ? move_active_pages_to_lru+0x130/0x154
 [ ] ? select_task_rq_fair+0x35e/0x791
 [ ] ? common_interrupt+0xe/0x13
 [ ] ? sched_clock_local+0x13/0x76
 [ ] ? __mutex_lock_slowpath+0xe2/0x128
 [ ] ? check_preempt_curr+0x25/0x62
 [ ] ? mutex_lock+0x12/0x25
 [ ] ? zfs_zinactive+0x5a/0xd4 [zfs]
 [ ] ? zfs_inactive+0x106/0x19e [zfs]
 [ ] ? evict+0x78/0x117
 [ ] ? dispose_list+0x2c/0x36
 [ ] ? shrink_icache_memory+0x278/0x2a8
 [ ] ? shrink_slab+0xe3/0x153
 [ ] ? do_try_to_free_pages+0x253/0x3f0
 [ ] ? get_page_from_freelist+0x47a/0x4ad
 [ ] ? try_to_free_pages+0x79/0x7e
 [ ] ? __alloc_pages_nodemask+0x48b/0x6c8
 [ ] ? __get_free_pages+0x12/0x52
 [ ] ? spl_kmem_cache_alloc+0x236/0x975 [spl]
 [ ] ? dbuf_create+0x38/0x32e [zfs]
 [ ] ? dnode_hold_impl+0x3f0/0x43c [zfs]
 [ ] ? dbuf_create_bonus+0x16/0x1f [zfs]
 [ ] ? dmu_bonus_hold+0x10b/0x26e [zfs]
 [ ] ? zfs_zget+0x5c/0x19f [zfs]
 [ ] ? zfs_dirent_lock+0x447/0x48f [zfs]
 [ ] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
 [ ] ? zfs_dirlook+0x20a/0x276 [zfs]
 [ ] ? zfs_lookup+0x26e/0x2b6 [zfs]
 [ ] ? zpl_lookup+0x47/0x80 [zfs]
 [ ] ? d_alloc_and_lookup+0x43/0x60
 [ ] ? do_lookup+0x1c9/0x2bb
 [ ] ? path_lookupat+0xe2/0x5af
 [ ] ? do_path_lookup+0x1d/0x5f
 [ ] ? user_path_at_empty+0x49/0x84
 [ ] ? tsd_exit+0x83/0x18d [spl]
 [ ] ? cp_new_stat+0xdf/0xf1
 [ ] ? vfs_fstatat+0x43/0x70
 [ ] ? sys_newlstat+0x11/0x2d
 [ ] ? system_call_fastpath+0x16/0x1b

rsync D ffff8802216c93c8 0 19527 19526 0x00000000
 ffff8802216c9120 0000000000000082 000000000000000a ffff880000000000
 ffff88022e8a6750 000000000000ffc0 ffff88021c0f1fd8 0000000000004000
 ffff88021c0f1fd8 000000000000ffc0 ffff8802216c9120 ffff88021c0f0010
Call Trace:
 [ ] ? zap_get_leaf_byblk+0x1b5/0x249 [zfs]
 [ ] ? zap_leaf_array_match+0x166/0x197 [zfs]
 [ ] ? remove_reference+0x93/0x9f [zfs]
 [ ] ? arc_buf_remove_ref+0xe6/0xf4 [zfs]
 [ ] ? dbuf_rele_and_unlock+0x12b/0x19a [zfs]
 [ ] ? __mutex_lock_slowpath+0xe2/0x128
 [ ] ? mutex_lock+0x12/0x25
 [ ] ? zfs_zget+0x46/0x19f [zfs]
 [ ] ? zfs_dirent_lock+0x447/0x48f [zfs]
 [ ] ? zfs_zaccess_aces_check+0x1d5/0x203 [zfs]
 [ ] ? zfs_dirlook+0x20a/0x276 [zfs]
 [ ] ? zfs_lookup+0x26e/0x2b6 [zfs]
 [ ] ? zpl_lookup+0x47/0x80 [zfs]
 [ ] ? d_alloc_and_lookup+0x43/0x60
 [ ] ? do_lookup+0x1c9/0x2bb
 [ ] ? path_lookupat+0xe2/0x5af
 [ ] ? do_path_lookup+0x1d/0x5f
 [ ] ? user_path_at_empty+0x49/0x84
 [ ] ? tsd_exit+0x83/0x18d [spl]
 [ ] ? cp_new_stat+0xdf/0xf1
 [ ] ? vfs_fstatat+0x43/0x70
 [ ] ? sys_newlstat+0x11/0x2d
 [ ] ? system_call_fastpath+0x16/0x1b
zfs-fuse has pool v26 available in the Git repo.
[Edit: full arcstats included]
We're working on some SPL kmem improvements which may help with this. Expect patches in the next week or so. Two cases have been identified which can result in memory reclaim from the slab being less than optimal.
I'm also experiencing this exact issue when attempting an rsync. Has there been any progress; is there anything I should try? I'm running 0.6.0-rc8 on Debian wheezy/sid using the deb instructions on the website.
Several patches which should help with this have been merged into master post-rc8. Please try the latest master source. Additionally, you can try increasing /proc/sys/vm/min_free_kbytes to, say, 256m or so.
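For reference, one way to apply that suggestion (a sketch; min_free_kbytes is expressed in kilobytes, so 256 MB corresponds to 262144):

# take effect immediately
sysctl -w vm.min_free_kbytes=262144
# persist the setting across reboots
echo "vm.min_free_kbytes = 262144" >> /etc/sysctl.conf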
I was actually just looking at #154 and it seemed to be related, and was just looking for confirmation that I should indeed try the latest master. So, thanks, will do, and will report back.
@jspiros Is there anything useful being output to dmesg?
You might want to run sysctl kernel.hung_task_timeout_secs=15
to reduce the amount of time it takes for the hang check timer to report hangs to dmesg from 120 seconds. This requires that your kernel is compiled with CONFIG_HANGCHECK_TIMER=Y.
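Roughly, on the command line (a sketch; the sysctl is only available when the kernel's hung-task detection support is built in):

# shorten the hung-task report interval from the default 120 seconds
sysctl -w kernel.hung_task_timeout_secs=15
# then watch for the reports as they arrive
dmesg | grep -A 40 "blocked for more than"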
@ryao Indeed, there were many hung tasks, even with the default of 120 seconds. The first three were kswapd0, arc_reclaim, and rsync, though many others hung after that.
I'm currently running the latest master, and I set min_free_kbytes to 256MB as recommended, but if you tell me what information you want should the same problem occur again tonight (I have some rsync backups that run nightly), I'll try to get it here. I was able to run some commands as root, including dmesg, so just tell me what would be helpful and I'll be sure to get as much information as I can.
The hung task information in the dmesg output would be useful.
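If the box is still responsive when it wedges, a blocked-task dump can also be requested on demand instead of waiting for the 120-second timer. A sketch, assuming the magic SysRq interface is enabled in your kernel:

# enable SysRq if it is not already
echo 1 > /proc/sys/kernel/sysrq
# ask the kernel to dump every task stuck in the D state
echo w > /proc/sysrq-trigger
# capture the result
dmesg | tail -n 300 > blocked-tasks.txt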
Fresh git pulls from this morning on both spl and zfs. Versions are 2371321e8a3d29e19cbc3a280f48636211abb480 and ee191e802c515ae509aec382cac33c9d89235dc5
ARC max size is 2^31 bytes. It's the same setup as in my first post.
# cat /proc/spl/kstat/zfs/arcstats
10 1 0x01 77 3696 216989312967 3547642914356
name type data
hits 4 4820390
misses 4 1632473
demand_data_hits 4 2
demand_data_misses 4 58777
demand_metadata_hits 4 4437436
demand_metadata_misses 4 875202
prefetch_data_hits 4 0
prefetch_data_misses 4 0
prefetch_metadata_hits 4 382952
prefetch_metadata_misses 4 698494
mru_hits 4 676077
mru_ghost_hits 4 483254
mfu_hits 4 3761394
mfu_ghost_hits 4 483721
deleted 4 418327
recycle_miss 4 643047
mutex_miss 4 2093
evict_skip 4 96315269
evict_l2_cached 4 0
evict_l2_eligible 4 15802661888
evict_l2_ineligible 4 7292848128
hash_elements 4 267779
hash_elements_max 4 299201
hash_collisions 4 328936
hash_chains 4 71327
hash_chain_max 4 8
p 4 135680512
c 4 2147483648
c_min 4 293601280
c_max 4 2147483648
size 4 2306523480
hdr_size 4 90425088
data_size 4 1618198528
other_size 4 597899864
anon_size 4 3948544
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 199776256
mru_evict_data 4 0
mru_evict_metadata 4 0
mru_ghost_size 4 1946481152
mru_ghost_evict_data 4 16462336
mru_ghost_evict_metadata 4 1930018816
mfu_size 4 1414473728
mfu_evict_data 4 0
mfu_evict_metadata 4 0
mfu_ghost_size 4 200998912
mfu_ghost_evict_data 4 0
mfu_ghost_evict_metadata 4 200998912
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_hdr_miss 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_hdr_size 4 0
memory_throttle_count 4 0
memory_direct_count 4 0
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 1992
arc_meta_used 4 2306523480
arc_meta_limit 4 536870912
arc_meta_max 4 2657959776
I caught it before the OS threatened to lock up and SIGSTOP'd all my rsync processes. While they were paused the arc 'size' was slowly dropping, and I resumed once it got below 1.6G. Right now the size is 2,718,915,552 and still climbing unless I stop it again.
@behlendorf Your suggestions helped me make it through the nightly rsync, and everything seems to be working fine.
@ryao Due to this, I'm afraid I do not currently have any dmesg output for you (yet). If something goes wrong again, I'll provide it, of course.
@jspiros That's good news. The hope is that going forward some of Richard's VM work will remove the need for this sort of tuning, but we're not quite there yet.
I (partially?) retract my previous post. I left the rsync running so it would presumably ruin the system, but it didn't die. That was some hours ago and the ARC is now restraining itself at about 570-580 megabytes. I'm turning up the heat a bit to see if I can coerce a useful stack dump out of it or find out if it properly behaves now.
Edit: after turning up the pressure the ARC did eventually grow to the breaking point. It seems to fare better but still cracks under intense pressure.
Okay, well, this morning was doing alright until just now. It probably doesn't help that I decided to scrub my pool, which will take about three days (18TB in [usable] capacity), overlapping with the nightly rsync.
Here's what dmesg is looking like right now.
[161160.692017] INFO: task kswapd0:33 blocked for more than 120 seconds.
[161160.692055] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.692093] kswapd0 D ffff88042f388240 0 33 2 0x00000000
[161160.692098] ffff88042f388240 0000000000000046 ffff8801a8c8b600 ffff88042df12830
[161160.692103] 0000000000013540 ffff88042c3affd8 ffff88042c3affd8 ffff88042f388240
[161160.692108] ffff88042d645800 ffffffffa0e81aa4 00000000000080d0 ffff880412b89c20
[161160.692113] Call Trace:
[161160.692158] [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.692169] [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.692172] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[161160.692175] [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.692178] [<ffffffff810ea5e0>] ? kfree+0x5b/0x6c
[161160.692194] [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.692212] [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.692216] [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.692218] [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.692221] [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.692224] [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.692227] [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.692229] [<ffffffff810c3cea>] ? balance_pgdat+0x283/0x4b7
[161160.692232] [<ffffffff810c41e8>] ? kswapd+0x2ca/0x2e6
[161160.692235] [<ffffffff8105f533>] ? add_wait_queue+0x3c/0x3c
[161160.692237] [<ffffffff810c3f1e>] ? balance_pgdat+0x4b7/0x4b7
[161160.692239] [<ffffffff8105eeed>] ? kthread+0x76/0x7e
[161160.692243] [<ffffffff81350534>] ? kernel_thread_helper+0x4/0x10
[161160.692245] [<ffffffff8105ee77>] ? kthread_worker_fn+0x139/0x139
[161160.692248] [<ffffffff81350530>] ? gs_change+0x13/0x13
[161160.692258] INFO: task arc_adapt:954 blocked for more than 120 seconds.
[161160.692281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.692308] arc_adapt D ffff88042bf18140 0 954 2 0x00000000
[161160.692311] ffff88042bf18140 0000000000000046 ffff880300000001 ffff88041b5690a0
[161160.692315] 0000000000013540 ffff88040ed4dfd8 ffff88040ed4dfd8 ffff88042bf18140
[161160.692318] ffff88042d645800 00000001a0e81aa4 ffff880363073680 ffff880412b89c20
[161160.692321] Call Trace:
[161160.692324] [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.692326] [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.692345] [<ffffffffa0e914c2>] ? txg_rele_to_sync+0x48/0x55 [zfs]
[161160.692361] [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.692377] [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.692381] [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.692383] [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.692385] [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.692388] [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.692404] [<ffffffffa0eb47fa>] ? zfs_sb_prune+0x76/0x9a [zfs]
[161160.692420] [<ffffffffa0ecab51>] ? zpl_inode_alloc+0x66/0x66 [zfs]
[161160.692434] [<ffffffffa0ecab6a>] ? zpl_prune_sb+0x19/0x1e [zfs]
[161160.692438] [<ffffffff810fab4d>] ? iterate_supers_type+0x61/0xc2
[161160.692452] [<ffffffffa0ecab6f>] ? zpl_prune_sb+0x1e/0x1e [zfs]
[161160.692467] [<ffffffffa0ecab9d>] ? zpl_prune_sbs+0x2e/0x39 [zfs]
[161160.692479] [<ffffffffa0e517a3>] ? arc_adjust_meta+0x101/0x171 [zfs]
[161160.692490] [<ffffffffa0e51813>] ? arc_adjust_meta+0x171/0x171 [zfs]
[161160.692503] [<ffffffffa0e51864>] ? arc_adapt_thread+0x51/0xb4 [zfs]
[161160.692510] [<ffffffffa0440848>] ? thread_generic_wrapper+0x6a/0x75 [spl]
[161160.692517] [<ffffffffa04407de>] ? __thread_create+0x2be/0x2be [spl]
[161160.692519] [<ffffffff8105eeed>] ? kthread+0x76/0x7e
[161160.692522] [<ffffffff81350534>] ? kernel_thread_helper+0x4/0x10
[161160.692525] [<ffffffff8105ee77>] ? kthread_worker_fn+0x139/0x139
[161160.692527] [<ffffffff81350530>] ? gs_change+0x13/0x13
[161160.692546] INFO: task txg_quiesce:1178 blocked for more than 120 seconds.
[161160.692570] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.692597] txg_quiesce D ffff88043fc13540 0 1178 2 0x00000000
[161160.692599] ffff8804161a7020 0000000000000046 ffffffff00000000 ffffffff8160d020
[161160.692603] 0000000000013540 ffff88041a32dfd8 ffff88041a32dfd8 ffff8804161a7020
[161160.692606] 0000000000000202 0000000181349424 ffffc900066d12a8 ffffc900066d12a0
[161160.692609] Call Trace:
[161160.692616] [<ffffffffa0445ec6>] ? cv_wait_common+0xb8/0x141 [spl]
[161160.692619] [<ffffffff8105f533>] ? add_wait_queue+0x3c/0x3c
[161160.692638] [<ffffffffa0e92330>] ? txg_quiesce_thread+0x202/0x2ae [zfs]
[161160.692659] [<ffffffffa0e9212e>] ? txg_sync_thread+0x3c6/0x3c6 [zfs]
[161160.692666] [<ffffffffa0440848>] ? thread_generic_wrapper+0x6a/0x75 [spl]
[161160.692672] [<ffffffffa04407de>] ? __thread_create+0x2be/0x2be [spl]
[161160.692675] [<ffffffff8105eeed>] ? kthread+0x76/0x7e
[161160.692677] [<ffffffff81350534>] ? kernel_thread_helper+0x4/0x10
[161160.692680] [<ffffffff8105ee77>] ? kthread_worker_fn+0x139/0x139
[161160.692682] [<ffffffff81350530>] ? gs_change+0x13/0x13
[161160.692693] INFO: task avahi-daemon:3937 blocked for more than 120 seconds.
[161160.692718] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.692745] avahi-daemon D ffff88043fc13540 0 3937 1 0x00000000
[161160.692748] ffff8804247827b0 0000000000000082 ffff88033734d440 ffffffff8160d020
[161160.692751] 0000000000013540 ffff8804236f9fd8 ffff8804236f9fd8 ffff8804247827b0
[161160.692754] ffff88042d645800 ffffffffa0e81aa4 ffff88033734d440 ffff880412b89c20
[161160.692757] Call Trace:
[161160.692777] [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.692780] [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.692782] [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.692801] [<ffffffffa0e914c2>] ? txg_rele_to_sync+0x48/0x55 [zfs]
[161160.692816] [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.692832] [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.692836] [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.692838] [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.692840] [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.692843] [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.692845] [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.692848] [<ffffffff810c354d>] ? do_try_to_free_pages+0x25f/0x3de
[161160.692850] [<ffffffff810c3937>] ? try_to_free_pages+0xa9/0xe9
[161160.692853] [<ffffffff810baa2b>] ? __alloc_pages_nodemask+0x4a9/0x726
[161160.692856] [<ffffffff810e3dec>] ? alloc_pages_current+0xb2/0xcc
[161160.692859] [<ffffffff810b7631>] ? __get_free_pages+0x6/0x42
[161160.692862] [<ffffffff81106ebd>] ? __pollwait+0x56/0xce
[161160.692864] [<ffffffff810ff741>] ? pipe_poll+0x2a/0x7b
[161160.692866] [<ffffffff81107b8f>] ? do_sys_poll+0x1dc/0x347
[161160.692869] [<ffffffff81106e67>] ? poll_freewait+0x97/0x97
[161160.692871] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692873] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692876] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692878] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692880] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692883] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692885] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692887] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692889] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.692892] [<ffffffff8104b197>] ? timespec_add_safe+0x32/0x63
[161160.692895] [<ffffffff81107d85>] ? sys_poll+0x43/0xab
[161160.692897] [<ffffffff8134e3d2>] ? system_call_fastpath+0x16/0x1b
[161160.692917] INFO: task ntpd:4648 blocked for more than 120 seconds.
[161160.692939] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.692966] ntpd D ffff88041d332ee0 0 4648 1 0x00000004
[161160.692969] ffff88041d332ee0 0000000000000082 ffff880115471cc0 ffff88014956a080
[161160.692972] 0000000000013540 ffff880426907fd8 ffff880426907fd8 ffff88041d332ee0
[161160.692976] ffff88042d645800 ffffffffa0e81aa4 ffff880115471cc0 ffff880412b89c20
[161160.692979] Call Trace:
[161160.692998] [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.693001] [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.693003] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[161160.693006] [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.693009] [<ffffffff8107066d>] ? arch_local_irq_save+0x11/0x17
[161160.693024] [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.693041] [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.693045] [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.693047] [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.693049] [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.693052] [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.693054] [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.693056] [<ffffffff810c354d>] ? do_try_to_free_pages+0x25f/0x3de
[161160.693059] [<ffffffff810c3937>] ? try_to_free_pages+0xa9/0xe9
[161160.693061] [<ffffffff810baa2b>] ? __alloc_pages_nodemask+0x4a9/0x726
[161160.693064] [<ffffffff810e3dec>] ? alloc_pages_current+0xb2/0xcc
[161160.693066] [<ffffffff810b88a9>] ? get_pageblock_flags_group+0x3a/0x6e
[161160.693069] [<ffffffff810b7631>] ? __get_free_pages+0x6/0x42
[161160.693071] [<ffffffff81106ebd>] ? __pollwait+0x56/0xce
[161160.693074] [<ffffffff81283dff>] ? datagram_poll+0x1b/0xd0
[161160.693077] [<ffffffff812d0741>] ? udp_poll+0xf/0x42
[161160.693079] [<ffffffff81107394>] ? do_select+0x2b0/0x46a
[161160.693082] [<ffffffff81106e67>] ? poll_freewait+0x97/0x97
[161160.693084] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693086] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693089] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693091] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693093] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693096] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693098] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693100] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693102] [<ffffffff81106f35>] ? __pollwait+0xce/0xce
[161160.693105] [<ffffffff811076c3>] ? core_sys_select+0x175/0x21a
[161160.693107] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[161160.693110] [<ffffffff81055c48>] ? set_current_blocked+0x2d/0x43
[161160.693113] [<ffffffff8100e342>] ? do_signal+0x547/0x610
[161160.693116] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[161160.693118] [<ffffffff8134810f>] ? _cond_resched+0x7/0x1c
[161160.693121] [<ffffffff810159d8>] ? check_for_xstate+0x27/0x7c
[161160.693123] [<ffffffff81015cc5>] ? restore_i387_xstate+0xaa/0x15d
[161160.693125] [<ffffffff810566b3>] ? do_sigaltstack+0xaa/0x13e
[161160.693128] [<ffffffff811077f0>] ? sys_select+0x88/0xad
[161160.693130] [<ffffffff8134e8ec>] ? stub_rt_sigreturn+0x6c/0xc0
[161160.693133] [<ffffffff8134e3d2>] ? system_call_fastpath+0x16/0x1b
[161160.693135] INFO: task radvd:4719 blocked for more than 120 seconds.
[161160.693157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.693183] radvd D ffff88041d389610 0 4719 1 0x00000000
[161160.693186] ffff88041d389610 0000000000000082 ffff880288008f00 ffff880427534080
[161160.693189] 0000000000013540 ffff8804201d3fd8 ffff8804201d3fd8 ffff88041d389610
[161160.693193] ffff88042d645800 ffffffffa0e81aa4 ffff880288008f00 ffff880412b89c20
[161160.693196] Call Trace:
[161160.693214] [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.693217] [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.693220] [<ffffffff810363c7>] ? should_resched+0x5/0x23
[161160.693222] [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.693224] [<ffffffff810ea5e0>] ? kfree+0x5b/0x6c
[161160.693240] [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.693257] [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.693261] [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.693263] [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.693265] [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.693268] [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.693270] [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.693272] [<ffffffff810c354d>] ? do_try_to_free_pages+0x25f/0x3de
[161160.693275] [<ffffffff810c3937>] ? try_to_free_pages+0xa9/0xe9
[161160.693277] [<ffffffff810baa2b>] ? __alloc_pages_nodemask+0x4a9/0x726
[161160.693281] [<ffffffff8113e179>] ? proc_alloc_inode+0x96/0x96
[161160.693284] [<ffffffff810e9b55>] ? kmem_getpages+0x4c/0xd9
[161160.693287] [<ffffffff810eaef5>] ? fallback_alloc+0x123/0x1c2
[161160.693289] [<ffffffff810eb4c8>] ? kmem_cache_alloc_trace+0x79/0xce
[161160.693292] [<ffffffff8111160b>] ? seq_read+0xb5/0x34c
[161160.693294] [<ffffffff8113e6f3>] ? proc_reg_read+0x78/0x91
[161160.693297] [<ffffffff810f9321>] ? vfs_read+0x9f/0xe6
[161160.693300] [<ffffffff810d5933>] ? sys_mmap_pgoff+0x125/0x160
[161160.693302] [<ffffffff810f93ad>] ? sys_read+0x45/0x6b
[161160.693304] [<ffffffff8134e3d2>] ? system_call_fastpath+0x16/0x1b
[161160.693308] INFO: task dropbox:4920 blocked for more than 120 seconds.
[161160.693331] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.693357] dropbox D ffff88043fd13540 0 4920 1 0x00000000
[161160.693360] ffff880427534080 0000000000000086 ffff88018fc43600 ffff88042d074730
[161160.693363] 0000000000013540 ffff8803f479bfd8 ffff8803f479bfd8 ffff880427534080
[161160.693366] ffff88042d645800 ffffffffa0e81aa4 ffff88018fc43600 ffff880412b89c20
[161160.693369] Call Trace:
[161160.693388] [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.693391] [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.693394] [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.693413] [<ffffffffa0e914c2>] ? txg_rele_to_sync+0x48/0x55 [zfs]
[161160.693428] [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.693444] [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.693448] [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.693450] [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.693452] [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.693455] [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.693457] [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.693459] [<ffffffff810c354d>] ? do_try_to_free_pages+0x25f/0x3de
[161160.693462] [<ffffffff810c3937>] ? try_to_free_pages+0xa9/0xe9
[161160.693464] [<ffffffff810baa2b>] ? __alloc_pages_nodemask+0x4a9/0x726
[161160.693467] [<ffffffff810e5123>] ? alloc_pages_vma+0xe2/0xe7
[161160.693470] [<ffffffff810cff31>] ? handle_pte_fault+0x165/0x79f
[161160.693472] [<ffffffff810cd641>] ? pte_offset_kernel+0x16/0x35
[161160.693475] [<ffffffff8134c3c3>] ? do_page_fault+0x312/0x337
[161160.693477] [<ffffffff81037e30>] ? set_next_entity+0x32/0x55
[161160.693480] [<ffffffff8100d755>] ? __switch_to+0x1e5/0x258
[161160.693482] [<ffffffff810397ec>] ? finish_task_switch+0x88/0xba
[161160.693485] [<ffffffff813480b4>] ? __schedule+0x5ac/0x5c3
[161160.693487] [<ffffffff813499b5>] ? page_fault+0x25/0x30
[161160.693492] INFO: task irssi:7797 blocked for more than 120 seconds.
[161160.693514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.693540] irssi D ffff88043fc13540 0 7797 4834 0x00000000
[161160.693543] ffff8804269de300 0000000000000082 ffff880157971800 ffff88042d075510
[161160.693546] 0000000000013540 ffff880366853fd8 ffff880366853fd8 ffff8804269de300
[161160.693549] ffff88042d645800 ffffffffa0e81aa4 ffff880157971800 ffff880412b89c20
[161160.693553] Call Trace:
[161160.693572] [<ffffffffa0e81aa4>] ? sa_attr_register_sync.isra.6+0x19d/0x1ac [zfs]
[161160.693575] [<ffffffff81348a0f>] ? __mutex_lock_common.isra.5+0xff/0x164
[161160.693577] [<ffffffff813488fd>] ? mutex_lock+0x1a/0x2d
[161160.693596] [<ffffffffa0e914c2>] ? txg_rele_to_sync+0x48/0x55 [zfs]
[161160.693611] [<ffffffffa0ec0518>] ? zfs_zinactive+0x43/0xa7 [zfs]
[161160.693627] [<ffffffffa0eb72e6>] ? zfs_inactive+0x102/0x139 [zfs]
[161160.693631] [<ffffffff8110bb48>] ? evict+0x96/0x148
[161160.693633] [<ffffffff8110bdb2>] ? dispose_list+0x27/0x31
[161160.693636] [<ffffffff8110c8f2>] ? prune_icache_sb+0x24f/0x25e
[161160.693638] [<ffffffff810fb5cb>] ? prune_super+0xd5/0x13f
[161160.693640] [<ffffffff810c13f5>] ? shrink_slab+0x18f/0x24d
[161160.693643] [<ffffffff810c354d>] ? do_try_to_free_pages+0x25f/0x3de
[161160.693645] [<ffffffff810c3937>] ? try_to_free_pages+0xa9/0xe9
[161160.693648] [<ffffffff810baa2b>] ? __alloc_pages_nodemask+0x4a9/0x726
[161160.693651] [<ffffffff810e9b55>] ? kmem_getpages+0x4c/0xd9
[161160.693653] [<ffffffff810eaef5>] ? fallback_alloc+0x123/0x1c2
[161160.693656] [<ffffffff810ebb5c>] ? kmem_cache_alloc+0x73/0xea
[161160.693658] [<ffffffff811254c6>] ? fsnotify_create_event+0x37/0x141
[161160.693661] [<ffffffff81124bb5>] ? send_to_group.isra.1+0x130/0x177
[161160.693663] [<ffffffff81124d56>] ? fsnotify+0x15a/0x20c
[161160.693665] [<ffffffff81124f9f>] ? __fsnotify_parent+0x8a/0xa3
[161160.693668] [<ffffffff810f8a36>] ? fsnotify_modify+0x3b/0x5c
[161160.693670] [<ffffffff810f9254>] ? vfs_write+0xbb/0xe9
[161160.693672] [<ffffffff810f9418>] ? sys_write+0x45/0x6b
[161160.693674] [<ffffffff8134e3d2>] ? system_call_fastpath+0x16/0x1b
[161160.693679] INFO: task java:598 blocked for more than 120 seconds.
[161160.693700] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[161160.693727] java D ffff88043fc93540 0 598 1 0x00000000
[161160.693730] ffff880425297850 0000000000000082 0000000000000000 ffff88042f1140c0
[161160.693733] 0000000000013540 ffff8801062bdfd8 ffff8801062bdfd8 ffff880425297850
[161160.693736] 0000000000000001 00000001817a8dd0 ffffffff817a8770 ffff880369fe7be0
[161160.693739] Call Trace:
[161160.693742] [<ffffffff813493b7>] ? rwsem_down_failed_common+0xe0/0x114
[161160.693746] [<ffffffff811b1b93>] ? call_rwsem_down_write_failed+0x13/0x20
[161160.693748] [<ffffffff81348d24>] ? down_write+0x25/0x27
[161160.693751] [<ffffffff810d63dd>] ? sys_mprotect+0xc5/0x1fc
[161160.693753] [<ffffffff8134e3d2>] ? system_call_fastpath+0x16/0x1b
I will try to avoid rebooting the system, in case anyone has any suggestions that might resolve this problem online.
Update: Nevermind, I realized that with kswapd0 in deadlock it's highly unlikely that there's anything I could do to fix this problem without a reboot, so I rebooted. On an unrelated note, I'm happy to see that the scrub is continuing from where it left off.
While using ryao's "gentoo" branch of his own fork here on GitHub I killed the same machine again. Here are more data dumps.
Edit: must have been too much for github. Here's a pastebin of what I wanted: http://pastebin.com/303uhMen
After talking with ryao in IRC I've been asked to add a few points to this issue.
The vdev is a single device, /dev/sda, which is actually 13 drives in a hardware RAID-6 array. It can churn out pretty high IOPS for random reads, which is the significant portion of the system's job: lots of rsync instances (this box is a sync target) all scanning what amounts to tens of millions of files. It's also compressed with a ratio of about 1.7x, so it grows pretty fast. (No dedup.)
The 'size' field from the kstats for the ARC, during heavy IO, just grows past the c and c_max fields, but there also seems to be some kind of background job that is draining the ARC at all times down to around 512 megabytes. When completely idle, the size just shrinks mysteriously. When under relatively light load, it may float in place but still stay under c. It seems to stop shrinking at about 512 megabytes whether idle or not. I'd guess that under heavy load the background shrink just isn't keeping up with the foreground fill.
As a hack I can keep the system stable by SIGSTOP'ing all the rsync processes if the size gets dangerously high (my breaking point seems to be around 3.3 billion bytes) and SIGCONT'ing them once it's fallen back down again.
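A rough sketch of scripting that hack, for anyone who wants to limp along the same way (the thresholds are just the numbers mentioned in this thread, nothing more scientific):

HIGH=3000000000   # pause point, just under the ~3.3 billion byte breaking point
LOW=1600000000    # resume point used earlier in this thread

while sleep 10; do
    size=$(awk '$1 == "size" { print $3 }' /proc/spl/kstat/zfs/arcstats)
    if [ "$size" -gt "$HIGH" ]; then
        pkill -STOP -x rsync      # freeze every rsync while the ARC drains
    elif [ "$size" -lt "$LOW" ]; then
        pkill -CONT -x rsync      # let them run again
    fi
done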
In my most recent reports I've been setting sysctl vm.min_free_kbytes to 524288.
After switching vm.min_free_kbytes from 256M to 512M I've been able to survive the nightly rsyncs; however, I'm now running into exactly the same problem, except with CrashPlan (a Java application for remote backups) causing it instead of rsync. It's exactly the same problem in that kswapd0 deadlocks, followed by arc_reclaim and then everything else. I needed to get the system back online and neglected to copy the dmesg output, but I confirmed that it is essentially identical to the previous output from the rsync-caused problem. I am hesitant to turn CrashPlan back on until there's something new to try.
Still trying to collect useful information. I set the arc_max to 256 megabytes just for experimentation. It still deadlocked.
Stack traces and arcstats from kstat in pastebin: http://pastebin.com/cgKZwaSJ
Two sets of stack traces are provided, set about 2.5 hours apart. The first is from long after the system had hung - I don't know when that actually happened.
vm.min_free_kbytes = 67584 (bootup default)

# free -m
             total       used       free     shared    buffers     cached
Mem:          7990       7835        155          0         52        111
-/+ buffers/cache:        7671        319
Swap:         4095          1       4094

# history | grep zfs.ko
insmod zfs.ko zfs_arc_max=268435456 zfs_arc_min=134217728 zfs_arc_shrink_shift=4 zfs_vdev_max_pending=20 zfs_vdev_min_pending=8
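For anyone loading the module with modprobe rather than insmod, the same parameters can be made persistent through modprobe.d; a sketch using the values above (tune to taste):

cat > /etc/modprobe.d/zfs.conf <<'EOF'
options zfs zfs_arc_max=268435456 zfs_arc_min=134217728 zfs_arc_shrink_shift=4 zfs_vdev_max_pending=20 zfs_vdev_min_pending=8
EOF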
I turned on debugging and turned off a bunch of compiler optimizations and locked it up again the same way. These stack traces look a lot nicer to decode.
After speaking with @DeHackEd on IRC about the cause of this problem, I tried limiting the queue depth (to 16 from 32) on the SATA drives that make up my pool using hdparm. I experienced success with this, in that I survived the night with both CrashPlan and the nightly rsyncs running. This isn't a long term fix, as it does slow everything down, but if someone else is in a situation like mine, it might help until this issue is solved.
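Roughly what that workaround looks like, in case anyone else wants to try it (a sketch; the device names are examples only, substitute your pool members):

for dev in /dev/sd[b-e]; do
    hdparm -Q 16 "$dev"              # drop the NCQ queue depth from 32 to 16
done
# the same knob is also exposed through sysfs:
# echo 16 > /sys/block/sdb/device/queue_depth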
The crashing/deadlocking side of the issue is apparently fixed by https://bugs.gentoo.org/show_bug.cgi?id=416685 (which I applied and cleaned up for kernel 3.0.x). Now it can run for over an hour with nothing controlling its execution speed. I'll update if anything goes wrong on me.
While running in this mode the ARC appears to behave itself better with regard to following the c value. This is just a rough observation right now, as the ARC size is unlimited and generally can't reach the maximum due to memory usage. That said, I did witness c shrink below c_max and the size was generally following the new c value. This is just one brief observation though - more grinding is required.
Note: I am currently using ryao's 'gentoo' tree.
The memory management and swap improvements in issue #883 have now been merged into master and should remove the need for the kernel vmalloc() patch. If you could verify that these resolve your issue, I think we'll be able to close this issue.
I can try installing it probably on Tuesday, but it may be a few days before the result is yay/nay.
No great rush, I'm not going anywhere. But it would be nice to know. :)
I'm heading to DjangoCon for a week, so I'm going to wait until I get back to give this a test. Last thing I need is my server dying when I'm not physically around to reboot it. That said, I'm very excited, and look forward to testing it out when I get back. :)
To be clear (this bug has wandered a bit), the changes merged just address the deadlock, not any misbehavior by the ARC. Assuming the deadlock is resolved (I fully expect it will be), I'd suggest we close this bug and open a new one for any remaining bad ARC behavior.
Since I haven't heard otherwise, I'm closing this issue because the deadlock has been fixed. If someone is able to recreate the issue with the master source we can reopen this bug, or file a new one.
# cat /proc/spl/kstat/zfs/arcstats
4 1 0x01 77 3696 3042886422925 64409337601438
name type data
hits 4 248393341
misses 4 75936476
demand_data_hits 4 105856
demand_data_misses 4 10155
demand_metadata_hits 4 234545007
demand_metadata_misses 4 40492914
prefetch_data_hits 4 0
prefetch_data_misses 4 11009
prefetch_metadata_hits 4 13742478
prefetch_metadata_misses 4 35422398
mru_hits 4 18658796
mru_ghost_hits 4 8980156
mfu_hits 4 217641924
mfu_ghost_hits 4 33122819
deleted 4 34384069
recycle_miss 4 52654888
mutex_miss 4 171299
evict_skip 4 28471305530
evict_l2_cached 4 769955428864
evict_l2_eligible 4 286991601152
evict_l2_ineligible 4 286747104256
hash_elements 4 2731819
hash_elements_max 4 2731819
hash_collisions 4 20413265
hash_chains 4 506626
hash_chain_max 4 18
p 4 340169571
c 4 5442450994
c_min 4 680306374
c_max 4 5442450994
size 4 7394144008
hdr_size 4 211937584
data_size 4 4567342592
other_size 4 2046281288
anon_size 4 3329536
anon_evict_data 4 0
anon_evict_metadata 4 0
mru_size 4 555188224
mru_evict_data 4 0
mru_evict_metadata 4 688128
mru_ghost_size 4 4887883776
mru_ghost_evict_data 4 0
mru_ghost_evict_metadata 4 4887883776
mfu_size 4 4008824832
mfu_evict_data 4 0
mfu_evict_metadata 4 2129920
mfu_ghost_size 4 554135040
mfu_ghost_evict_data 4 0
mfu_ghost_evict_metadata 4 554135040
l2_hits 4 30877561
l2_misses 4 45058896
l2_feeds 4 61336
l2_rw_clash 4 1219
l2_read_bytes 4 520022814720
l2_write_bytes 4 40076504064
l2_writes_sent 4 48373
l2_writes_done 4 48373
l2_writes_error 4 0
l2_writes_hdr_miss 4 75
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 287468
l2_abort_lowmem 4 11
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 37175568896
l2_hdr_size 4 602028576
memory_throttle_count 4 0
memory_direct_count 4 55
memory_indirect_count 4 1127
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 106172
arc_meta_used 4 7394144008
arc_meta_limit 4 1360612748
arc_meta_max 4 8175635800
ZFS git checkout fc24f7c887a040b6dc9f2a3dd3d5ae0c03a5d639 compiled with only 'configure --with-spl=/path/to/spl-build/'
SPL git checkout ac8ca67a88bdb8be08456785116a8824fba735df compiled with 'configure --enable-debug'
Kernel is RHEL 6.3 up-to-date (2.6.32-279.5.2.el6.x86_64)
The point being that we appear to have exceeded the ARC c_max and arc_meta_limit? But no deadlocks, correct?
Correct, in spite of going over my set limits it was/is stable. The backup cycle has now finished and the ARC has shrunk naturally on its own from much reduced activity (currently floating around 1300 megabytes while it does some last minute work).
It's only been one cycle but I currently have no crashes/hangs/etc. Stability testing comes tonight when I turn the pressure up a bit (more concurrent threads) but that's not directly relevant to this issue.
After a bit of playing around I've come to a possible conclusion. When activity is low (load average ~15 or less... yeah, that's low by my standards) the ARC seems to behave normally. When activity is higher (currently ~23) it will go over its limit.
# cat /proc/spl/kstat/zfs/arcstats
4 1 0x01 77 3696 20842511127 88312299340944
name type data
p 4 340219235
c 4 5442450994
c_min 4 680306374
c_max 4 5442450994
size 4 6010999752
hdr_size 4 179171888
data_size 4 3746268672
other_size 4 1539191432
memory_throttle_count 4 0
memory_direct_count 4 5958
memory_indirect_count 4 12149
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 58588
arc_meta_used 4 5986955720
arc_meta_limit 4 1360612748
arc_meta_max 4 7889218664
arc_adapt D 0000000000000001 0 1671 2 0x00000080
 ffff880410659c30 0000000000000046 ffff880410659bc0 ffffffffa04fbb74
 ffff880410659bc0 ffff880024b4d800 0000000000000000 0000000000000001
 ffff880420ac9098 ffff880410659fd8 000000000000fb88 ffff880420ac9098
Call Trace:
 [ ] ? arc_buf_remove_ref+0xd4/0x120 [zfs]
 [ ] ? dmu_buf_rele+0x30/0x40 [zfs]
 [ ] __mutex_lock_slowpath+0x13e/0x180
 [ ] mutex_lock+0x2b/0x50
 [ ] zfs_zinactive+0x7e/0x110 [zfs]
 [ ] zfs_inactive+0x7f/0x220 [zfs]
 [ ] zpl_clear_inode+0xe/0x10 [zfs]
 [ ] clear_inode+0xac/0x140
 [ ] dispose_list+0x40/0x120
 [ ] shrink_icache_memory+0x274/0x2e0
 [ ] ? zpl_prune_sbs+0x0/0x70 [zfs]
 [ ] zpl_prune_sbs+0x57/0x70 [zfs]
 [ ] arc_adjust_meta+0x120/0x1e0 [zfs]
 [ ] ? arc_adapt_thread+0x0/0xd0 [zfs]
 [ ] ? arc_adapt_thread+0x0/0xd0 [zfs]
 [ ] arc_adapt_thread+0x6a/0xd0 [zfs]
 [ ] thread_generic_wrapper+0x71/0xd0 [spl]
 [ ] ? thread_generic_wrapper+0x0/0xd0 [spl]
 [ ] kthread+0x96/0xa0
 [ ] child_rip+0xa/0x20
 [ ] ? kthread+0x0/0xa0
 [ ] ? child_rip+0x0/0x20

txg_sync D 0000000000000000 0 1807 2 0x00000080
 ffff8804000b5ba0 0000000000000046 000000003e5c2a80 0000000000000001
 ffff88023e5c2700 0000000000000082 ffff8804000b5b60 ffff880420d67d48
 ffff880410769af8 ffff8804000b5fd8 000000000000fb88 ffff880410769af8
Call Trace:
 [ ] ? prepare_to_wait_exclusive+0x4e/0x80
 [ ] cv_wait_common+0xe5/0x3d0 [spl]
 [ ] ? zio_execute+0x0/0xf0 [zfs]
 [ ] ? autoremove_wake_function+0x0/0x40
 [ ] __cv_wait+0x13/0x20 [spl]
 [ ] zio_wait+0xeb/0x160 [zfs]
 [ ] dsl_pool_sync+0x2a7/0x480 [zfs]
 [ ] spa_sync+0x397/0x9a0 [zfs]
 [ ] txg_sync_thread+0x2c1/0x490 [zfs]
 [ ] ? set_user_nice+0xc9/0x130
 [ ] ? txg_sync_thread+0x0/0x490 [zfs]
 [ ] thread_generic_wrapper+0x71/0xd0 [spl]
 [ ] ? thread_generic_wrapper+0x0/0xd0 [spl]
 [ ] kthread+0x96/0xa0
 [ ] child_rip+0xa/0x20
 [ ] ? kthread+0x0/0xa0
 [ ] ? child_rip+0x0/0x20
More stack traces available if you want them.
Currently active jobs: 2 pools (independent disks involved), each scrubbing, and a send/receive in progress across them. The bigger pool (10 TB in a single disk) is running a small number of rsyncs (~5) and a dozen or so threads essentially doing the job of du.
Reopening issue. But for clarity, the deadlock has been resolved and the remaining issue here is the fact that the ARC limit isn't being honored.
I am experiencing similar problems. Under heavy load (~60 rsyncs) the ARC grows over its limit, and arc_adapt and kswapd1 use the CPU very heavily. The system freezes for short stretches and performance degradation occurs. In the end, the system hangs. Tested on the latest version from git.
load average: 49.08, 48.50, 43.52
c_max 4 13421772800
size 4 10599128312
arc_meta_used 4 10474886904
arc_meta_limit 4 3355443200

             total       used       free     shared    buffers     cached
Mem:      24736592   24458768     277824          0       2588      45360
-/+ buffers/cache:   24410820     325772
Swap:      7811064        304    7810760

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
24718 root       0 -20     0    0    0 R  100  0.0  21:24.07 arc_adapt
  144 root      20   0     0    0    0 R   75  0.0   0:17.07 kswapd1
  143 root      20   0     0    0    0 R   67  0.0   0:30.36 kswapd0
While running a lot of rsync instances (~45 at once... it makes sense in context) I found my ARC expanding beyond the set limit of 2 gigabytes until ZFS deadlocked - almost all rsync processes hung in the D state and some kernel threads as well. A reboot was necessary. It only took about 5-10 minutes of this kind of pressure to break.
Not too long after, all ZFS-involved processes froze up.
Stack dumps for some hung processes:
Machine specs: single-socket quad-core Xeon, 8 GB of RAM.
SPL version: b29012b99994ece46019b664d67dace29e5c2586
ZFS version: 409dc1a570a836737b2a5bb43658cdde703c935e
Kernel version: 3.0.28 vanilla custom build
ZPool version 26 (originally built/run by zfs-fuse)
I've also tried using the module/zfs/arc.c from https://github.com/zfsonlinux/zfs/pull/669 for testing and reducing the ARC size. RAM usage still exceeds the limits set.
Nevertheless it's been running for a few hours now reliably.
(Edit: I also raised vm.min_free_kbytes from its default up to 262144 as part of a shotgun attempt to make this more stable.)