openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Deadlock under load #1365

Closed: jstillwa closed this issue 8 years ago

jstillwa commented 11 years ago

Running a fairly large server compared to what I've seen posted, so I may be hitting issues that don't show up until you get to a setup of this size.

128GB RAM (ARC size is 64GB)
2 x Intel E5-2640
2 x LSI SAS2308
Intel 10Gb Ethernet
Zpool: 80 x 3TB drives, divided into 8 x RAIDZ2 vdevs (8+2 parity) + 2 x mirrored SSD ZIL + 2 x SSD cache + 4 spares
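For reference, a pool with roughly this layout could be created with something like the following. This is a sketch of the described topology, not the reporter's actual command, and all device names are hypothetical:

```sh
# Sketch: 8 raidz2 vdevs of 10 disks each (two shown, six more elided),
# a mirrored SSD log, two SSD cache devices, and four hot spares.
# Every device name below is made up for illustration.
zpool create tank \
  raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj \
  raidz2 sdk sdl sdm sdn sdo sdp sdq sdr sds sdt \
  log mirror ssd0 ssd1 \
  cache ssd2 ssd3 \
  spare sdu sdv sdw sdx
```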

Under heavy load (load average 40+), the system will seemingly randomly lock up. ZFS becomes unable to process commands, all disk activity to the pool stops, and the kernel logs hung-task timeout errors. Rebooting sometimes works, if it manages to unmount the busy zpools. Running the latest branch and zpool version.

Otherwise, works great. Mar 20 01:51:49 servername kernel: INFO: task java:17341 blocked for more than 120 seconds. Mar 20 01:51:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 20 01:51:49 servername kernel: java D 000000000000000b 0 17341 1 0x00000080 Mar 20 01:51:49 servername kernel: ffff880e3aefda78 0000000000000086 ffff881fb9162000 ffff8800a53074c0 Mar 20 01:51:49 servername kernel: ffff88161e8b3818 ffff880e3aefdbc4 0000000000000000 0000000000000001 Mar 20 01:51:49 servername kernel: ffff880e3aef65f8 ffff880e3aefdfd8 000000000000fb88 ffff880e3aef65f8 Mar 20 01:51:49 servername kernel: Call Trace: Mar 20 01:51:49 servername kernel: [] ? prepare_to_wait+0x4e/0x80 Mar 20 01:51:49 servername kernel: [] wait_on_freeing_inode+0x98/0xc0 Mar 20 01:51:49 servername kernel: [] ? wake_bit_function+0x0/0x50 Mar 20 01:51:49 servername kernel: [] ? unlock_buffer+0x17/0x20 Mar 20 01:51:49 servername kernel: [] find_inode_fast+0x58/0x80 Mar 20 01:51:49 servername kernel: [] ifind_fast+0x3c/0xb0 Mar 20 01:51:49 servername kernel: [] iget_locked+0x49/0x170 Mar 20 01:51:49 servername kernel: [] ext4_iget+0x37/0x7d0 [ext4] Mar 20 01:51:49 servername kernel: [] ? dput+0x9a/0x150 Mar 20 01:51:49 servername kernel: [] ext4_lookup+0xa5/0x140 [ext4] Mar 20 01:51:49 servername kernel: [] do_lookup+0x1a5/0x230 Mar 20 01:51:49 servername kernel: [] link_path_walk+0x734/0x1030 Mar 20 01:51:49 servername kernel: [] path_walk+0x6a/0xe0 Mar 20 01:51:49 servername kernel: [] do_path_lookup+0x5b/0xa0 Mar 20 01:51:49 servername kernel: [] user_path_at+0x57/0xa0 Mar 20 01:51:49 servername kernel: [] ? _atomic_dec_and_lock+0x55/0x80 Mar 20 01:51:49 servername kernel: [] ? cp_new_stat+0xe4/0x100 Mar 20 01:51:49 servername kernel: [] vfs_fstatat+0x3c/0x80 Mar 20 01:51:49 servername kernel: [] vfs_stat+0x1b/0x20 Mar 20 01:51:49 servername kernel: [] sys_newstat+0x24/0x50 Mar 20 01:51:49 servername kernel: [] ? 
audit_syscall_entry+0x1d7/0x200 Mar 20 01:51:49 servername kernel: [] ? audit_syscall_exit+0x265/0x290 Mar 20 01:51:49 servername kernel: [] system_call_fastpath+0x16/0x1b Mar 20 01:53:49 servername kernel: INFO: task arc_adapt:8730 blocked for more than 120 seconds. Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 20 01:53:49 servername kernel: arc_adapt D 0000000000000008 0 8730 2 0x00000000 Mar 20 01:53:49 servername kernel: ffff881fd585fb00 0000000000000046 0000000000000400 a2d25ae6bc4ce414 Mar 20 01:53:49 servername kernel: 00000000002baba5 ffff881fd58d6000 ffff8800616f6680 0000000000000000 Mar 20 01:53:49 servername kernel: ffff8820140ef058 ffff881fd585ffd8 000000000000fb88 ffff8820140ef058 Mar 20 01:53:49 servername kernel: Call Trace: Mar 20 01:53:49 servername kernel: [] ? mutex_lock_slowpath+0x70/0x180 Mar 20 01:53:49 servername kernel: [] mutex_lock_slowpath+0x13e/0x180 Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50 Mar 20 01:53:49 servername kernel: [] arc_buf_remove_ref+0xaf/0x120 [zfs] Mar 20 01:53:49 servername kernel: [] dbuf_rele_and_unlock+0x10f/0x210 [zfs] Mar 20 01:53:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs] Mar 20 01:53:49 servername kernel: [] dnode_rele+0x80/0x90 [zfs] Mar 20 01:53:49 servername kernel: [] dbuf_rele_and_unlock+0x1b0/0x210 [zfs] Mar 20 01:53:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs] Mar 20 01:53:49 servername kernel: [] sa_handle_destroy+0x68/0xa0 [zfs] Mar 20 01:53:49 servername kernel: [] zfs_znode_dmu_fini+0x1d/0x30 [zfs] Mar 20 01:53:49 servername kernel: [] zfs_zinactive+0xd0/0x110 [zfs] Mar 20 01:53:49 servername kernel: [] zfs_inactive+0x7f/0x220 [zfs] Mar 20 01:53:49 servername kernel: [] zpl_clear_inode+0xe/0x10 [zfs] Mar 20 01:53:49 servername kernel: [] clear_inode+0xac/0x140 Mar 20 01:53:49 servername kernel: [] dispose_list+0x40/0x120 Mar 20 01:53:49 servername kernel: [] shrink_icache_memory+0x274/0x2e0 Mar 20 
01:53:49 servername kernel: [] ? zpl_prune_sbs+0x0/0x60 [zfs] Mar 20 01:53:49 servername kernel: [] zpl_prune_sbs+0x53/0x60 [zfs] Mar 20 01:53:49 servername kernel: [] arc_adjust_meta+0x120/0x1e0 [zfs] Mar 20 01:53:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs] Mar 20 01:53:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs] Mar 20 01:53:49 servername kernel: [] arc_adapt_thread+0x6a/0xd0 [zfs] Mar 20 01:53:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl] Mar 20 01:53:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl] Mar 20 01:53:49 servername kernel: [] kthread+0x96/0xa0 Mar 20 01:53:49 servername kernel: [] child_rip+0xa/0x20 Mar 20 01:53:49 servername kernel: [] ? kthread+0x0/0xa0 Mar 20 01:53:49 servername kernel: [] ? child_rip+0x0/0x20 Mar 20 01:53:49 servername kernel: INFO: task txg_sync:9178 blocked for more than 120 seconds. Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 20 01:53:49 servername kernel: txg_sync D 000000000000000e 0 9178 2 0x00000000 Mar 20 01:53:49 servername kernel: ffff880ed59b1960 0000000000000046 0000000511782aa0 0000000000000001 Mar 20 01:53:49 servername kernel: 000000004b7342b0 0000000000000082 ffff882019de8060 ffff882017c37718 Mar 20 01:53:49 servername kernel: ffff881011783058 ffff880ed59b1fd8 000000000000fb88 ffff881011783058 Mar 20 01:53:49 servername kernel: Call Trace: Mar 20 01:53:49 servername kernel: [] ? wake_up_common+0x59/0x90 Mar 20 01:53:49 servername kernel: [] mutex_lock_slowpath+0x13e/0x180 Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50 Mar 20 01:53:49 servername kernel: [] buf_hash_find+0x87/0x110 [zfs] Mar 20 01:53:49 servername kernel: [] arc_read_nolock+0x6e/0x820 [zfs] Mar 20 01:53:49 servername kernel: [] ? kmalloc+0x20c/0x220 Mar 20 01:53:49 servername kernel: [] ? mutex_lock+0x1e/0x50 Mar 20 01:53:49 servername kernel: [] ? 
mutex_lock+0x1e/0x50 Mar 20 01:53:49 servername kernel: [] arc_read+0x82/0x170 [zfs] Mar 20 01:53:49 servername kernel: [] dsl_read+0x31/0x40 [zfs] Mar 20 01:53:49 servername kernel: [] dbuf_prefetch+0x1d1/0x2b0 [zfs] Mar 20 01:53:49 servername kernel: [] dmu_prefetch+0x200/0x220 [zfs] Mar 20 01:53:49 servername kernel: [] metaslab_sync_reassess+0x14f/0x180 [zfs] Mar 20 01:53:49 servername kernel: [] vdev_sync_done+0x6c/0x80 [zfs] Mar 20 01:53:49 servername kernel: [] spa_sync+0x54b/0xa00 [zfs] Mar 20 01:53:49 servername kernel: [] ? read_tsc+0x9/0x20 Mar 20 01:53:49 servername kernel: [] txg_sync_thread+0x307/0x590 [zfs] Mar 20 01:53:49 servername kernel: [] ? set_user_nice+0xc9/0x130 Mar 20 01:53:49 servername kernel: [] ? txg_sync_thread+0x0/0x590 [zfs] Mar 20 01:53:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl] Mar 20 01:53:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl] Mar 20 01:53:49 servername kernel: [] kthread+0x96/0xa0 Mar 20 01:53:49 servername kernel: [] child_rip+0xa/0x20 Mar 20 01:53:49 servername kernel: [] ? kthread+0x0/0xa0 Mar 20 01:53:49 servername kernel: [] ? child_rip+0x0/0x20 Mar 20 01:53:49 servername kernel: INFO: task java:17341 blocked for more than 120 seconds. Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 20 01:53:49 servername kernel: java D 000000000000000b 0 17341 1 0x00000080 Mar 20 01:53:49 servername kernel: ffff880e3aefda78 0000000000000086 ffff881fb9162000 ffff8800a53074c0 Mar 20 01:53:49 servername kernel: ffff88161e8b3818 ffff880e3aefdbc4 0000000000000000 0000000000000001 Mar 20 01:53:49 servername kernel: ffff880e3aef65f8 ffff880e3aefdfd8 000000000000fb88 ffff880e3aef65f8 Mar 20 01:53:49 servername kernel: Call Trace: Mar 20 01:53:49 servername kernel: [] ? prepare_to_wait+0x4e/0x80 Mar 20 01:53:49 servername kernel: [] wait_on_freeing_inode+0x98/0xc0 Mar 20 01:53:49 servername kernel: [] ? 
wake_bit_function+0x0/0x50 Mar 20 01:53:49 servername kernel: [] ? unlock_buffer+0x17/0x20 Mar 20 01:53:49 servername kernel: [] find_inode_fast+0x58/0x80 Mar 20 01:53:49 servername kernel: [] ifind_fast+0x3c/0xb0 Mar 20 01:53:49 servername kernel: [] iget_locked+0x49/0x170 Mar 20 01:53:49 servername kernel: [] ext4_iget+0x37/0x7d0 [ext4] Mar 20 01:53:49 servername kernel: [] ? dput+0x9a/0x150 Mar 20 01:53:49 servername kernel: [] ext4_lookup+0xa5/0x140 [ext4] Mar 20 01:53:49 servername kernel: [] do_lookup+0x1a5/0x230 Mar 20 01:53:49 servername kernel: [] link_path_walk+0x734/0x1030 Mar 20 01:53:49 servername kernel: [] path_walk+0x6a/0xe0 Mar 20 01:53:49 servername kernel: [] do_path_lookup+0x5b/0xa0 Mar 20 01:53:49 servername kernel: [] user_path_at+0x57/0xa0 Mar 20 01:53:49 servername kernel: [] ? _atomic_dec_and_lock+0x55/0x80 Mar 20 01:53:49 servername kernel: [] ? cp_new_stat+0xe4/0x100 Mar 20 01:53:49 servername kernel: [] vfs_fstatat+0x3c/0x80 Mar 20 01:53:49 servername kernel: [] vfs_stat+0x1b/0x20 Mar 20 01:53:49 servername kernel: [] sys_newstat+0x24/0x50 Mar 20 01:53:49 servername kernel: [] ? audit_syscall_entry+0x1d7/0x200 Mar 20 01:53:49 servername kernel: [] ? audit_syscall_exit+0x265/0x290 Mar 20 01:53:49 servername kernel: [] system_call_fastpath+0x16/0x1b Mar 20 01:53:49 servername kernel: INFO: task BackupPC_dump:58770 blocked for more than 120 seconds. Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Mar 20 01:53:49 servername kernel: BackupPC_dump D 0000000000000001 0 58770 55056 0x00000080 Mar 20 01:53:49 servername kernel: ffff880faed1f918 0000000000000086 0000000000000000 ffff8806a3164800 Mar 20 01:53:49 servername kernel: 0000000000000140 0000000000000000 ffff880faed1f8d8 0000000000000246 Mar 20 01:53:49 servername kernel: ffff8805fb71b058 ffff880faed1ffd8 000000000000fb88 ffff8805fb71b058 Mar 20 01:53:49 servername kernel: Call Trace: Mar 20 01:53:49 servername kernel: [] mutex_lock_slowpath+0x13e/0x180 Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50 Mar 20 01:53:49 servername kernel: [] zfs_zget+0xb5/0x1d0 [zfs] Mar 20 01:53:49 servername kernel: [] ? kmem_alloc_debug+0x13a/0x4c0 [spl] Mar 20 01:53:49 servername kernel: [] zfs_dirent_lock+0x47c/0x540 [zfs] Mar 20 01:53:49 servername kernel: [] zfs_dirlook+0x8b/0x270 [zfs] Mar 20 01:53:49 servername kernel: [] ? zfs_zaccess+0xa0/0x2d0 [zfs] Mar 20 01:53:49 servername kernel: [] ? down_read+0x16/0x30 Mar 20 01:53:49 servername kernel: [] zfs_lookup+0x2fe/0x350 [zfs] Mar 20 01:53:49 servername kernel: [] zpl_lookup+0x57/0xe0 [zfs] Mar 20 01:53:49 servername kernel: [] do_lookup+0x1a5/0x230 Mar 20 01:53:49 servername kernel: [] link_path_walk+0x734/0x1030 Mar 20 01:53:49 servername kernel: [] ? tsd_exit+0x5f/0x2b0 [spl] Mar 20 01:53:49 servername kernel: [] ? mutex_lock+0x1e/0x50 Mar 20 01:53:49 servername kernel: [] path_walk+0x6a/0xe0 Mar 20 01:53:49 servername kernel: [] do_path_lookup+0x5b/0xa0 Mar 20 01:53:49 servername kernel: [] ? get_empty_filp+0xa0/0x180 Mar 20 01:53:49 servername kernel: [] do_filp_open+0xfb/0xd60 Mar 20 01:53:49 servername kernel: [] ? zfs_getattr_fast+0xdd/0x160 [zfs] Mar 20 01:53:49 servername kernel: [] ? 
alloc_fd+0x92/0x160 Mar 20 01:53:49 servername kernel: [] do_sys_open+0x69/0x140 Mar 20 01:53:49 servername kernel: [] sys_open+0x20/0x30 Mar 20 01:53:49 servername kernel: [] system_call_fastpath+0x16/0x1b Mar 20 01:53:49 servername kernel: INFO: task BackupPC_dump:82327 blocked for more than 120 seconds. Mar 20 01:53:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 20 01:53:49 servername kernel: BackupPC_dump D 0000000000000011 0 82327 48176 0x00000080 Mar 20 01:53:49 servername kernel: ffff881e2b0bb918 0000000000000086 0000000000000000 ffff880820ab7800 Mar 20 01:53:49 servername kernel: 0000000000000140 0000000000000000 ffff881e2b0bb8d8 0000000000000246 Mar 20 01:53:49 servername kernel: ffff8820140efab8 ffff881e2b0bbfd8 000000000000fb88 ffff8820140efab8 Mar 20 01:53:49 servername kernel: Call Trace: Mar 20 01:53:49 servername kernel: [] mutex_lock_slowpath+0x13e/0x180 Mar 20 01:53:49 servername kernel: [] mutex_lock+0x2b/0x50 Mar 20 01:53:49 servername kernel: [] zfs_zget+0xb5/0x1d0 [zfs] Mar 20 01:53:49 servername kernel: [] zfs_dirent_lock+0x47c/0x540 [zfs] Mar 20 01:53:49 servername kernel: [] zfs_dirlook+0x8b/0x270 [zfs] Mar 20 01:53:49 servername kernel: [] ? zfs_zaccess+0xa0/0x2d0 [zfs] Mar 20 01:53:49 servername kernel: [] zfs_lookup+0x2fe/0x350 [zfs] Mar 20 01:53:49 servername kernel: [] zpl_lookup+0x57/0xe0 [zfs] Mar 20 01:53:49 servername kernel: [] do_lookup+0x1a5/0x230 Mar 20 01:53:49 servername kernel: [] link_path_walk+0x734/0x1030 Mar 20 01:53:49 servername kernel: [] ? sa_lookup+0x4d/0x60 [zfs] Mar 20 01:53:49 servername kernel: [] path_walk+0x6a/0xe0 Mar 20 01:53:49 servername kernel: [] do_path_lookup+0x5b/0xa0 Mar 20 01:53:49 servername kernel: [] ? get_empty_filp+0xa0/0x180 Mar 20 01:53:49 servername kernel: [] do_filp_open+0xfb/0xd60 Mar 20 01:53:49 servername kernel: [] ? tsd_exit+0x5f/0x2b0 [spl] Mar 20 01:53:49 servername kernel: [] ? 
_atomic_dec_and_lock+0x55/0x80 Mar 20 01:53:49 servername kernel: [] ? mntput_no_expire+0x30/0x110 Mar 20 01:53:49 servername kernel: [] ? alloc_fd+0x92/0x160 Mar 20 01:53:49 servername kernel: [] do_sys_open+0x69/0x140 Mar 20 01:53:49 servername kernel: [] sys_open+0x20/0x30 Mar 20 01:53:49 servername kernel: [] system_call_fastpath+0x16/0x1b Mar 20 02:03:49 servername kernel: INFO: task arc_adapt:8730 blocked for more than 120 seconds. Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 20 02:03:49 servername kernel: arc_adapt D 0000000000000000 0 8730 2 0x00000000 Mar 20 02:03:49 servername kernel: ffff881fd585fb00 0000000000000046 ffff8810789166e8 ffff881078916680 Mar 20 02:03:49 servername kernel: 0000000000000000 0000000000000006 ffff881fd585faa0 ffffffff81062074 Mar 20 02:03:49 servername kernel: ffff8820140ef058 ffff881fd585ffd8 000000000000fb88 ffff8820140ef058 Mar 20 02:03:49 servername kernel: Call Trace: Mar 20 02:03:49 servername kernel: [] ? enqueue_task_fair+0x64/0x100 Mar 20 02:03:49 servername kernel: [] ? 
check_preempt_curr+0x6d/0x90 Mar 20 02:03:49 servername kernel: [] mutex_lock_slowpath+0x13e/0x180 Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50 Mar 20 02:03:49 servername kernel: [] arc_buf_remove_ref+0xaf/0x120 [zfs] Mar 20 02:03:49 servername kernel: [] dbuf_rele_and_unlock+0x10f/0x210 [zfs] Mar 20 02:03:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs] Mar 20 02:03:49 servername kernel: [] dnode_rele+0x80/0x90 [zfs] Mar 20 02:03:49 servername kernel: [] dbuf_rele_and_unlock+0x1b0/0x210 [zfs] Mar 20 02:03:49 servername kernel: [] dmu_buf_rele+0x30/0x40 [zfs] Mar 20 02:03:49 servername kernel: [] sa_handle_destroy+0x68/0xa0 [zfs] Mar 20 02:03:49 servername kernel: [] zfs_znode_dmu_fini+0x1d/0x30 [zfs] Mar 20 02:03:49 servername kernel: [] zfs_zinactive+0xd0/0x110 [zfs] Mar 20 02:03:49 servername kernel: [] zfs_inactive+0x7f/0x220 [zfs] Mar 20 02:03:49 servername kernel: [] zpl_clear_inode+0xe/0x10 [zfs] Mar 20 02:03:49 servername kernel: [] clear_inode+0xac/0x140 Mar 20 02:03:49 servername kernel: [] dispose_list+0x40/0x120 Mar 20 02:03:49 servername kernel: [] shrink_icache_memory+0x274/0x2e0 Mar 20 02:03:49 servername kernel: [] ? zpl_prune_sbs+0x0/0x60 [zfs] Mar 20 02:03:49 servername kernel: [] zpl_prune_sbs+0x53/0x60 [zfs] Mar 20 02:03:49 servername kernel: [] arc_adjust_meta+0x120/0x1e0 [zfs] Mar 20 02:03:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs] Mar 20 02:03:49 servername kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs] Mar 20 02:03:49 servername kernel: [] arc_adapt_thread+0x6a/0xd0 [zfs] Mar 20 02:03:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl] Mar 20 02:03:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl] Mar 20 02:03:49 servername kernel: [] kthread+0x96/0xa0 Mar 20 02:03:49 servername kernel: [] child_rip+0xa/0x20 Mar 20 02:03:49 servername kernel: [] ? kthread+0x0/0xa0 Mar 20 02:03:49 servername kernel: [] ? 
child_rip+0x0/0x20 Mar 20 02:03:49 servername kernel: INFO: task txg_sync:9178 blocked for more than 120 seconds. Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 20 02:03:49 servername kernel: txg_sync D 0000000000000014 0 9178 2 0x00000000 Mar 20 02:03:49 servername kernel: ffff880ed59b1960 0000000000000046 ffff881011782aa0 ffff882000000000 Mar 20 02:03:49 servername kernel: 0000000086ff1eb8 0000000000000082 ffff882019de8060 ffff882017c37718 Mar 20 02:03:49 servername kernel: ffff881011783058 ffff880ed59b1fd8 000000000000fb88 ffff881011783058 Mar 20 02:03:49 servername kernel: Call Trace: Mar 20 02:03:49 servername kernel: [] ? wake_up_common+0x59/0x90 Mar 20 02:03:49 servername kernel: [] mutex_lock_slowpath+0x13e/0x180 Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50 Mar 20 02:03:49 servername kernel: [] buf_hash_find+0x87/0x110 [zfs] Mar 20 02:03:49 servername kernel: [] arc_read_nolock+0x6e/0x820 [zfs] Mar 20 02:03:49 servername kernel: [] ? kmalloc+0x20c/0x220 Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50 Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50 Mar 20 02:03:49 servername kernel: [] arc_read+0x82/0x170 [zfs] Mar 20 02:03:49 servername kernel: [] dsl_read+0x31/0x40 [zfs] Mar 20 02:03:49 servername kernel: [] dbuf_prefetch+0x1d1/0x2b0 [zfs] Mar 20 02:03:49 servername kernel: [] dmu_prefetch+0x200/0x220 [zfs] Mar 20 02:03:49 servername kernel: [] metaslab_sync_reassess+0x14f/0x180 [zfs] Mar 20 02:03:49 servername kernel: [] vdev_sync_done+0x6c/0x80 [zfs] Mar 20 02:03:49 servername kernel: [] spa_sync+0x54b/0xa00 [zfs] Mar 20 02:03:49 servername kernel: [] ? read_tsc+0x9/0x20 Mar 20 02:03:49 servername kernel: [] txg_sync_thread+0x307/0x590 [zfs] Mar 20 02:03:49 servername kernel: [] ? set_user_nice+0xc9/0x130 Mar 20 02:03:49 servername kernel: [] ? 
txg_sync_thread+0x0/0x590 [zfs] Mar 20 02:03:49 servername kernel: [] thread_generic_wrapper+0x68/0x80 [spl] Mar 20 02:03:49 servername kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl] Mar 20 02:03:49 servername kernel: [] kthread+0x96/0xa0 Mar 20 02:03:49 servername kernel: [] child_rip+0xa/0x20 Mar 20 02:03:49 servername kernel: [] ? kthread+0x0/0xa0 Mar 20 02:03:49 servername kernel: [] ? child_rip+0x0/0x20 Mar 20 02:03:49 servername kernel: INFO: task java:17122 blocked for more than 120 seconds. Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 20 02:03:49 servername kernel: java D 0000000000000013 0 17122 1 0x00000080 Mar 20 02:03:49 servername kernel: ffff880e3ad3f7e8 0000000000000086 0000000000000000 ffffffffa0374266 Mar 20 02:03:49 servername kernel: ffff880e00100001 ffff881a432fe090 ffff881a432fe2f8 ffff880e8e4c0000 Mar 20 02:03:49 servername kernel: ffff880e3ad38638 ffff880e3ad3ffd8 000000000000fb88 ffff880e3ad38638 Mar 20 02:03:49 servername kernel: Call Trace: Mar 20 02:03:49 servername kernel: [] ? dmu_object_size_from_db+0x66/0x90 [zfs] Mar 20 02:03:49 servername kernel: [] __mutex_lock_slowpath+0x13e/0x180 Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50 Mar 20 02:03:49 servername kernel: [] buf_hash_find+0x87/0x110 [zfs] Mar 20 02:03:49 servername kernel: [] arc_read_nolock+0x6e/0x820 [zfs] Mar 20 02:03:49 servername kernel: [] ? dbuf_read_done+0x0/0x110 [zfs] Mar 20 02:03:49 servername kernel: [] arc_read+0x82/0x170 [zfs] Mar 20 02:03:49 servername kernel: [] ? dnode_block_freed+0xdb/0x160 [zfs] Mar 20 02:03:49 servername kernel: [] ? dbuf_read_done+0x0/0x110 [zfs] Mar 20 02:03:49 servername kernel: [] dsl_read+0x31/0x40 [zfs] Mar 20 02:03:49 servername kernel: [] dbuf_read+0x1f9/0x720 [zfs] Mar 20 02:03:49 servername kernel: [] dmu_buf_hold+0x108/0x1d0 [zfs] Mar 20 02:03:49 servername kernel: [] zap_lockdir+0x57/0x730 [zfs] Mar 20 02:03:49 servername kernel: [] ? 
mutex_lock+0x1e/0x50 Mar 20 02:03:49 servername kernel: [] ? dbuf_find+0xf9/0x100 [zfs] Mar 20 02:03:49 servername kernel: [] zap_cursor_retrieve+0x1e4/0x2f0 [zfs] Mar 20 02:03:49 servername kernel: [] ? dmu_prefetch+0x169/0x220 [zfs] Mar 20 02:03:49 servername kernel: [] zfs_readdir+0x17a/0x510 [zfs] Mar 20 02:03:49 servername kernel: [] ? filldir+0x0/0xe0 Mar 20 02:03:49 servername kernel: [] ? dentry_open+0x23f/0x360 Mar 20 02:03:49 servername kernel: [] ? security_inode_permission+0x1f/0x30 Mar 20 02:03:49 servername kernel: [] ? nameidata_to_filp+0x54/0x70 Mar 20 02:03:49 servername kernel: [] ? do_filp_open+0x6da/0xd60 Mar 20 02:03:49 servername kernel: [] ? zfs_getattr_fast+0xdd/0x160 [zfs] Mar 20 02:03:49 servername kernel: [] ? filldir+0x0/0xe0 Mar 20 02:03:49 servername kernel: [] zpl_readdir+0x5c/0x90 [zfs] Mar 20 02:03:49 servername kernel: [] ? filldir+0x0/0xe0 Mar 20 02:03:49 servername kernel: [] vfs_readdir+0xc0/0xe0 Mar 20 02:03:49 servername kernel: [] sys_getdents+0x89/0xf0 Mar 20 02:03:49 servername kernel: [] system_call_fastpath+0x16/0x1b Mar 20 02:03:49 servername kernel: INFO: task java:17261 blocked for more than 120 seconds. Mar 20 02:03:49 servername kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Mar 20 02:03:49 servername kernel: java D 0000000000000009 0 17261 1 0x00000080 Mar 20 02:03:49 servername kernel: ffff880e3adcfab8 0000000000000086 0000000000000000 ffff880e3adcfa7c Mar 20 02:03:49 servername kernel: 0000000000000000 ffff88103fe82200 ffff881078856680 00000000000000ac Mar 20 02:03:49 servername kernel: ffff880e3adcdab8 ffff880e3adcffd8 000000000000fb88 ffff880e3adcdab8 Mar 20 02:03:49 servername kernel: Call Trace: Mar 20 02:03:49 servername kernel: [] mutex_lock_slowpath+0x13e/0x180 Mar 20 02:03:49 servername kernel: [] mutex_lock+0x2b/0x50 Mar 20 02:03:49 servername kernel: [] arc_buf_add_ref+0x9a/0x140 [zfs] Mar 20 02:03:49 servername kernel: [] dbuf_hold_impl+0x66/0x480 [zfs] Mar 20 02:03:49 servername kernel: [] dbuf_hold_impl+0x86/0xc0 [zfs] Mar 20 02:03:49 servername kernel: [] dbuf_hold+0x20/0x30 [zfs] Mar 20 02:03:49 servername kernel: [] dmu_buf_hold_array_by_dnode+0x155/0x570 [zfs] Mar 20 02:03:49 servername kernel: [] dmu_buf_hold_array+0x65/0x90 [zfs] Mar 20 02:03:49 servername kernel: [] ? avl_add+0x38/0x50 [zavl] Mar 20 02:03:49 servername kernel: [] dmu_read_uio+0x41/0xd0 [zfs] Mar 20 02:03:49 servername kernel: [] ? mutex_lock+0x1e/0x50 Mar 20 02:03:49 servername kernel: [] zfs_read+0x178/0x4c0 [zfs] Mar 20 02:03:49 servername kernel: [] ? tsd_exit+0x5f/0x2b0 [spl] Mar 20 02:03:49 servername kernel: [] zpl_read_common+0x52/0x70 [zfs] Mar 20 02:03:49 servername kernel: [] zpl_read+0x68/0xa0 [zfs] Mar 20 02:03:49 servername kernel: [] ? security_file_permission+0x16/0x20 Mar 20 02:03:49 servername kernel: [] vfs_read+0xb5/0x1a0 Mar 20 02:03:49 servername kernel: [] sys_read+0x51/0x90 Mar 20 02:03:49 servername kernel: [] ? __audit_syscall_exit+0x265/0x290 Mar 20 02:03:49 servername kernel: [] system_call_fastpath+0x16/0x1b

ryao commented 11 years ago

Are you using the latest GIT code or 0.6.0-rc14?

jstillwa commented 11 years ago

rc14

ryao commented 11 years ago

Would you try the git master head?

jstillwa commented 11 years ago

Sure. I can give it a try. May take a couple of days to get a result yay or nay. I'll check back in.

jstillwa commented 11 years ago

Popped in the master today. Will push the server pretty hard and see what's what.

jstillwa commented 11 years ago

It actually seems significantly less stable. I tried zfs_arc_memory_throttle_disable=1 and it crashed after about 4 hours; before, it would crash after 2 or 3 days. I removed that option and installed the latest git master, and it crashed after about the same amount of time at a load of around 35. I had also increased the max ARC size to about 75% of my RAM (there's 128GB installed), so I'm going to remove that option as well and let it drop back down to 64GB.
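For anyone following along, capping the ARC is done via the zfs_arc_max module parameter. A sketch (the 64GiB value matches the setup above; the file path follows the standard modprobe convention):

```sh
# /etc/modprobe.d/zfs.conf -- cap the ARC at 64 GiB (value is in bytes).
# Takes effect at module load; on a running system the same parameter
# can also be written to /sys/module/zfs/parameters/zfs_arc_max.
options zfs zfs_arc_max=68719476736
```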

jstillwa commented 11 years ago

While tailing the messages log, I saw these errors pop up again, but they don't seem to be taking down the server... yet. I suspect the issue is still there.

Any other debugging information I can provide to help find this issue?

behlendorf commented 11 years ago

@jstillwa It looks like you may have managed to find a deadlock related to the arc buffer hash table. Several of your threads are blocked waiting on this lock (see buf_hash_find, arc_buf_add_ref, arc_buf_remove_ref). But it's unclear to me from the stacks you were able to post what task is holding that lock. Getting a stack trace from the task holding that lock is the key.

To do this you'll need to reproduce the issue and then issue a SysRq "t" to dump all the task stacks to the console so they can be collected. If the system is still responsive, this can be done by running echo t > /proc/sysrq-trigger. Otherwise you'll need to configure a serial console for the system and issue the SysRq that way.
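A sketch of the capture procedure (assumes root, and that the kernel was built with SysRq support; the syslog path varies by distribution):

```sh
# Enable all SysRq functions ("t" = dump task states), then trigger the dump.
echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger
# The stacks land in the kernel ring buffer and syslog; save them off.
# Note the ring buffer may be too small to hold every stack.
dmesg > /tmp/sysrq-stacks.log
```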

jstillwa commented 11 years ago

Will do. Been trying not to stress it out, since I need the machine to actually do stuff, but I upped the load on it a bit. Also, switched over to the latest RPM release rather than running the git head. Ready to either be pleasantly surprised it doesn't lock or to trace.

jstillwa commented 11 years ago

How's this look? https://www.dropbox.com/s/j5ylh4v9zuxbmkj/session2.log

behlendorf commented 11 years ago

That may be enough. Although there are clearly some missing stacks which didn't fit in the kernel ring buffer. It's going to take me some time to digest this.

jstillwa commented 11 years ago

Just as an update, been running it under a load of 5-15 for the last week and it's been totally stable. So, it does seem connected to high load situations.

jstillwa commented 11 years ago

Or at least, the issue culminates slower at lower loads.

Stimpycat commented 11 years ago

Hi: I've been getting hit by this issue too. Running the latest Red Hat, 198GB RAM, 30TB pool, standard 6.1 zfsonlinux. I managed to issue a shutdown, which seems to be slowly taking effect; I'll post anything interesting I see in the logs. I have a test setup that I'm going to try to make crash, and I'll issue the echo t > /proc/sysrq-trigger there. Otherwise I'll try angering the production server after hours. Thanks, S

Stimpycat commented 11 years ago

Okay, here is some data. I issued a shutdown on the barely functioning system; it never did shut down, but it now seems okay. Below please find the first soft lockup, as well as the results of sysrq-trigger. I hope this isn't a different bug; I've noticed other posts for arc_adapt.

Apr 11 11:30:18 kiggly kernel: BUG: soft lockup - CPU#1 stuck for 67s! [arc_adapt:1483] Apr 11 11:30:18 kiggly kernel: Modules linked in: be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsitcp libiscsi tcp libiscsi scsi_transport_iscsi autofs4 sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ip6_tables ipv6 sr_m od cdrom microcode zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ioatdma sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp igb dca ptp pps_core i2c_i801 i2c_core acp i_pad sg ext4 mbcache jbd2 sd_mod crc_t10dif usb_storage ahci isci libsas scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] Apr 11 11:30:18 kiggly kernel: CPU 1 Apr 11 11:30:18 kiggly kernel: Modules linked in: be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsitcp libiscsi tcp libiscsi scsi_transport_iscsi autofs4 sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ip6_tables ipv6 sr_m od cdrom microcode zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate ioatdma sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp igb dca ptp pps_core i2c_i801 i2c_core acp i_pad sg ext4 mbcache jbd2 sd_mod crc_t10dif usb_storage ahci isci libsas scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] Apr 11 11:30:18 kiggly kernel: Apr 11 11:30:18 kiggly kernel: Pid: 1483, comm: arc_adapt Tainted: P --------------- 2.6.32-358.2.1.el6.x86_64 #1 Lenovo Lenovo ThinkServer RD430/Reno/Raleigh Apr 11 11:30:18 kiggly kernel: RIP: 0010:[] [] remove_vm_area+0x86/0xa0 Apr 11 11:30:18 kiggly kernel: 
RSP: 0018:ffff881834741c60 EFLAGS: 00000283 Apr 11 11:30:18 kiggly kernel: RAX: ffff881a667f11c0 RBX: ffff881834741c80 RCX: ffff88291232a340 Apr 11 11:30:18 kiggly kernel: RDX: ffff881dd23dfd40 RSI: 0000000000000001 RDI: ffffffff81ac2450 Apr 11 11:30:18 kiggly kernel: RBP: ffffffff8100bb8e R08: ffffea005ffd3520 R09: 0000000000000000 Apr 11 11:30:18 kiggly kernel: R10: ffff88188033c740 R11: 0000000000000000 R12: ffff881834741c30 Apr 11 11:30:18 kiggly kernel: R13: 0000000000000297 R14: ffff881880010dc0 R15: ffffea005ffd3530 Apr 11 11:30:18 kiggly kernel: FS: 0000000000000000(0000) GS:ffff880028280000(0000) knlGS:0000000000000000 Apr 11 11:30:18 kiggly kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Apr 11 11:30:18 kiggly kernel: CR2: 00007ff66f9a0000 CR3: 0000000001a85000 CR4: 00000000000407e0 Apr 11 11:30:18 kiggly kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Apr 11 11:30:18 kiggly kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Apr 11 11:30:18 kiggly kernel: Process arc_adapt (pid: 1483, threadinfo ffff881834740000, task ffff881857534040) Apr 11 11:30:18 kiggly kernel: Stack: Apr 11 11:30:18 kiggly kernel: ffff88290945e280 ffff881a667f11c0 ffffc91cdf657000 ffffc92504739000 Apr 11 11:30:18 kiggly kernel: ffff881834741cb0 ffffffff81150d0e ffff882948971440 ffff881859bb0000 Apr 11 11:30:18 kiggly kernel: ffffc924aa383000 0000000000000000 ffff881834741cc0 ffffffff81150e7a Apr 11 11:30:18 kiggly kernel: Call Trace: Apr 11 11:30:18 kiggly kernel: [] ? vunmap+0x2e/0x120 Apr 11 11:30:18 kiggly kernel: [] ? vfree+0x2a/0x40 Apr 11 11:30:18 kiggly kernel: [] ? kv_free+0x65/0x70 [spl] Apr 11 11:30:18 kiggly kernel: [] ? spl_slab_reclaim+0x2d9/0x3e0 [spl] Apr 11 11:30:18 kiggly kernel: [] ? spl_kmem_cache_reap_now+0x144/0x230 [spl] Apr 11 11:30:18 kiggly kernel: [] ? dispose_list+0xfc/0x120 Apr 11 11:30:18 kiggly kernel: [] ? zpl_prune_sbs+0x0/0x60 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? 
spl_kmem_cache_generic_shrinker+0x4b/0xe0 [spl] Apr 11 11:30:18 kiggly kernel: [] ? spl_kmem_reap+0x27/0x30 [spl] Apr 11 11:30:18 kiggly kernel: [] ? zpl_prune_sbs+0x58/0x60 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? arc_adjust_meta+0x120/0x1e0 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x6a/0xd0 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? thread_generic_wrapper+0x68/0x80 [spl] Apr 11 11:30:18 kiggly kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl] Apr 11 11:30:18 kiggly kernel: [] ? kthread+0x96/0xa0 Apr 11 11:30:18 kiggly kernel: [] ? child_rip+0xa/0x20 Apr 11 11:30:18 kiggly kernel: [] ? kthread+0x0/0xa0 Apr 11 11:30:18 kiggly kernel: [] ? child_rip+0x0/0x20 Apr 11 11:30:18 kiggly kernel: Code: 50 24 ac 81 48 89 45 e8 e8 48 f6 3b 00 48 8b 15 d9 69 e6 00 48 c7 c1 f8 75 fb 81 48 8b 45 e8 48 39 d3 74 0c 90 48 89 d1 48 8b 12 <48> 39 d3 75 f5 48 8b 13 48 89 11 f0 81 05 04 18 97 00 00 00 00 Apr 11 11:30:18 kiggly kernel: Call Trace: Apr 11 11:30:18 kiggly kernel: [] ? vunmap+0x2e/0x120 Apr 11 11:30:18 kiggly kernel: [] ? vfree+0x2a/0x40 Apr 11 11:30:18 kiggly kernel: [] ? kv_free+0x65/0x70 [spl] Apr 11 11:30:18 kiggly kernel: [] ? spl_slab_reclaim+0x2d9/0x3e0 [spl] Apr 11 11:30:18 kiggly kernel: [] ? spl_kmem_cache_reap_now+0x144/0x230 [spl] Apr 11 11:30:18 kiggly kernel: [] ? dispose_list+0xfc/0x120 Apr 11 11:30:18 kiggly kernel: [] ? zpl_prune_sbs+0x0/0x60 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? spl_kmem_cache_generic_shrinker+0x4b/0xe0 [spl] Apr 11 11:30:18 kiggly kernel: [] ? spl_kmem_reap+0x27/0x30 [spl] Apr 11 11:30:18 kiggly kernel: [] ? zpl_prune_sbs+0x58/0x60 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? arc_adjust_meta+0x120/0x1e0 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? arc_adapt_thread+0x0/0xd0 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? 
arc_adapt_thread+0x6a/0xd0 [zfs] Apr 11 11:30:18 kiggly kernel: [] ? thread_generic_wrapper+0x68/0x80 [spl] Apr 11 11:30:18 kiggly kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl] Apr 11 11:30:18 kiggly kernel: [] ? kthread+0x96/0xa0 Apr 11 11:30:18 kiggly kernel: [] ? child_rip+0xa/0x20 Apr 11 11:30:18 kiggly kernel: [] ? kthread+0x0/0xa0 Apr 11 11:30:18 kiggly kernel: [] ? child_rip+0x0/0x20

The sysrq-trigger output is at:

http://ge.tt/49PTtmd/v/0?c

That was after the system came back, though, so I'm not sure if it's useful. It seems to be happening quite a bit, so I should be able to capture it more quickly next time.

Please let me know if there is something else I can help with. -Shon

behlendorf commented 11 years ago

@Stimpycat Your issue looks a bit different, although the symptoms may be the same. It looks like you're suffering from contention on the virtual address space lock, which should be addressed by the page integration work. Unfortunately, that's still a work in progress.

Stimpycat commented 11 years ago
Okay, thank you. I've been suffering a few lockups, and they do appear slightly 'different'.

I'll try opening a new issue if I can catch a good sysrq-trigger. BTW, thanks again for the project! -s

byteharmony commented 11 years ago

We've had similar issues in the past, though not so much since rc12 with the following config. These systems run at a load of 40 for a few hours during Windows backups on Linux VMs.

[root@nas503 ~]# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=1073741824 zfs_scrub_limit=1 zvol_threads=8
[root@nas503 ~]#

There have been some threads here about RAM being used for both the ARC (ZFS) and the Linux buffer cache: why buffer twice?

I can't find another way to slow the scrub down and reduce its impact beyond the above config.
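To see whether a cap like the `zfs_arc_max` above is actually constraining the ARC, the `/proc/spl/kstat/zfs/arcstats` kstat file can be compared against its `c_max` and `arc_meta_limit` rows. A minimal sketch, assuming a simplified arcstats layout; `SAMPLE` and `parse_arcstats` are illustrative names, and on a real system you would read the file itself:

```python
# Sketch: compare ARC usage to its cap by parsing arcstats-style kstat text.
# SAMPLE is illustrative; on a live system read /proc/spl/kstat/zfs/arcstats.

SAMPLE = """\
name                            type data
size                            4    1073741824
c_max                           4    2147483648
arc_meta_used                   4    536870912
arc_meta_limit                  4    805306368
"""

def parse_arcstats(text):
    """Return {name: value} for the numeric kstat rows."""
    stats = {}
    for line in text.splitlines()[1:]:   # skip the "name type data" header
        parts = line.split()
        if len(parts) == 3:
            name, _type, value = parts
            stats[name] = int(value)
    return stats

stats = parse_arcstats(SAMPLE)
pct = 100 * stats["size"] / stats["c_max"]
print(f"ARC at {pct:.0f}% of c_max")     # -> ARC at 50% of c_max
```

The same dictionary gives `arc_meta_used` versus `arc_meta_limit`, which is the pair relevant to the metadata-heavy workloads discussed later in this thread.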

I'll be posting an issue tonight regarding SSD, SAS and SATA bus you may want to read.

BK

jstillwa commented 11 years ago

So, basically, you're killing the Arc with this setup, right?

jstillwa commented 11 years ago

Still seeing this with the latest package from the repo -

INFO: task arc_adapt:8299 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. arc_adapt D 0000000000000012 0 8299 2 0x00000000 ffff880ff8b83c40 0000000000000046 ffff880a158b52a0 ffff88203fc09340 ffff88103fc40100 ffff88103fc21180 0000000000000000 ffffffff8100bb8e ffff881024262638 ffff880ff8b83fd8 000000000000fb88 ffff881024262638 Call Trace: [] ? apic_timer_interrupt+0xe/0x20 [] __mutex_lock_slowpath+0x13e/0x180 [] mutex_lock+0x2b/0x50 [] zfs_zinactive+0x7b/0x110 [zfs] [] zfs_inactive+0x7f/0x220 [zfs] [] zpl_clear_inode+0xe/0x10 [zfs] [] clear_inode+0xac/0x140 [] dispose_list+0x40/0x120 [] shrink_icache_memory+0x274/0x2e0 [] ? zpl_prune_sbs+0x0/0x60 [zfs] [] zpl_prune_sbs+0x53/0x60 [zfs] [] arc_adjust_meta+0x120/0x1e0 [zfs] [] ? arc_adapt_thread+0x0/0xd0 [zfs] [] ? arc_adapt_thread+0x0/0xd0 [zfs] [] arc_adapt_thread+0x6a/0xd0 [zfs] [] thread_generic_wrapper+0x68/0x80 [spl] [] ? thread_generic_wrapper+0x0/0x80 [spl] [] kthread+0x96/0xa0 [] child_rip+0xa/0x20 [] ? kthread+0x0/0xa0 [] ? child_rip+0x0/0x20

etc etc

It seems like if you respond quickly enough and lower the load, you can "escape" before it deadlocks.

jstillwa commented 11 years ago

Also saw this pop up in syslog today after a night of fairly heavy work, at least compared to how I've kept the server running thus far.

kernel: shrink_slab: spl_kmem_cache_generic_shrinker+0x0/0x30 [spl] negative objects to delete nr=-2744668268
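That huge negative `nr` from shrink_slab looks like a large unsigned count printed through a signed 64-bit format (its magnitude is already too big for a signed 32-bit value). A minimal illustration of reinterpreting the reported number as unsigned 64-bit, assuming the arithmetic wrapped in a 64-bit variable:

```python
# Reinterpret the kernel's "negative objects to delete" value as an
# unsigned 64-bit integer to see the magnitude the shrinker computed.

def as_unsigned64(n):
    """Two's-complement reinterpretation of n as an unsigned 64-bit int."""
    return n & (2**64 - 1)

nr = -2744668268
print(as_unsigned64(nr))  # -> 18446744070964883348 (absurdly large count)
```

A count that large suggests an underflow in the shrinker's object accounting rather than a genuine number of objects to reclaim.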

jstillwa commented 11 years ago

So, we switched from 2x LSI SAS 9207-8e (LSI SAS2308 chipset) to 2x LSI SAS 9200-8e (LSI SAS2008), and so far things seem stable. I haven't gotten it to deadlock again and have been pushing it pretty hard, so this could be an issue with the SAS2308 chip/driver? I'll let it run hard for another week or so to confirm.

chrisrd commented 11 years ago

I'm surprised changing the HBAs makes a significant difference to this issue unless there's something significantly wrong with the 9207 HBA's drivers or firmware.

A grab bag of questions... Are both type of HBAs using the mpt2sas driver? What firmware revisions on the 9207 vs 9200 HBAs? What model hard drives? Were you getting any kern.log errors from the 9207 HBAs, e.g. timeouts etc?

jstillwa commented 11 years ago

Same default driver (mpt2sas), both on firmware 15. Half of the drives are WD4001FYYG-01SL3s; the other, just-added half are Hitachi HUS7230ALS640s. No errors outside of the ones posted above.

I have noticed a slow downward crawl in free memory, so I set a monitor on it to ping me when it drops below 20GB free. chart

Though I haven't been able to keep the load above 20 for any long period of time. A lot of spindles on that server now.

chrisrd commented 11 years ago

Ok, nothing standing out to me there. What kernel and mpt2sas version are you on? If you're not aware, firmware 16 is available on the LSI site - it could be worth checking the release notes to see if there's anything in there that might relate to your 9207 issue. (That's if you still have any interest in the 9207s!)

jstillwa commented 11 years ago

Kernel 2.6.32-358.6.1.el6.x86_64, mpt2sas 13.101.

Again, I haven't been able to get it to deadlock, but of course it still might. With the same workloads and half as many drives, it would have deadlocked by now on the previous setup. I'll check that changelog.

jstillwa commented 11 years ago

Well, it finally happened again. Or it could be something else, since I think that first slab error looks new?

Jun 7 21:17:09 forlong kernel: shrink_slab: spl_kmem_cache_generic_shrinker+0x0/0x30 [spl] negative objects to delete nr=-503011486 Jun 7 21:20:10 forlong kernel: INFO: task txg_quiesce:17697 blocked for more than 120 seconds. Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 7 21:20:10 forlong kernel: txg_quiesce D 0000000000000007 0 17697 2 0x00000000 Jun 7 21:20:10 forlong kernel: ffff880fdfa37d40 0000000000000046 0000000000000055 ffff882012c4a6a0 Jun 7 21:20:10 forlong kernel: ffff880fdfa37cf0 ffffffff8116829c ffff880fdfa37de0 ffffffff000080d0 Jun 7 21:20:10 forlong kernel: ffff881023e41058 ffff880fdfa37fd8 000000000000fb88 ffff881023e41058 Jun 7 21:20:10 forlong kernel: Call Trace: Jun 7 21:20:10 forlong kernel: [] ? kmalloc+0x20c/0x220 Jun 7 21:20:10 forlong kernel: [] ? prepare_to_wait_exclusive+0x4e/0x80 Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl] Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40 Jun 7 21:20:10 forlong kernel: [] ? __bitmap_weight+0x50/0xb0 Jun 7 21:20:10 forlong kernel: [] cv_wait+0x15/0x20 [spl] Jun 7 21:20:10 forlong kernel: [] txg_quiesce_thread+0x243/0x3a0 [zfs] Jun 7 21:20:10 forlong kernel: [] ? set_user_nice+0xc9/0x130 Jun 7 21:20:10 forlong kernel: [] ? txg_quiesce_thread+0x0/0x3a0 [zfs] Jun 7 21:20:10 forlong kernel: [] thread_generic_wrapper+0x68/0x80 [spl] Jun 7 21:20:10 forlong kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl] Jun 7 21:20:10 forlong kernel: [] kthread+0x96/0xa0 Jun 7 21:20:10 forlong kernel: [] child_rip+0xa/0x20 Jun 7 21:20:10 forlong kernel: [] ? kthread+0x0/0xa0 Jun 7 21:20:10 forlong kernel: [] ? child_rip+0x0/0x20 Jun 7 21:20:10 forlong kernel: INFO: task java:127837 blocked for more than 120 seconds. Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Jun 7 21:20:10 forlong kernel: java D 0000000000000000 0 127837 1 0x00000080 Jun 7 21:20:10 forlong kernel: ffff8818b6f43b28 0000000000000086 0000000000000000 ffff880037907840 Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000286 ffff8818b6f43bc8 ffffffffa022c9e7 Jun 7 21:20:10 forlong kernel: ffff881c395ddab8 ffff8818b6f43fd8 000000000000fb88 ffff881c395ddab8 Jun 7 21:20:10 forlong kernel: Call Trace: Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl] Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl] Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40 Jun 7 21:20:10 forlong kernel: [] cv_wait+0x15/0x20 [spl] Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs] Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs] Jun 7 21:20:10 forlong kernel: [] zfs_write+0x3be/0xca0 [zfs] Jun 7 21:20:10 forlong kernel: [] ? futex_wake+0x10e/0x120 Jun 7 21:20:10 forlong kernel: [] zpl_write_common+0x52/0x70 [zfs] Jun 7 21:20:10 forlong kernel: [] zpl_write+0x68/0xa0 [zfs] Jun 7 21:20:10 forlong kernel: [] ? security_file_permission+0x16/0x20 Jun 7 21:20:10 forlong kernel: [] vfs_write+0xb8/0x1a0 Jun 7 21:20:10 forlong kernel: [] sys_pwrite64+0x82/0xa0 Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b Jun 7 21:20:10 forlong kernel: INFO: task BackupPC_dump:81700 blocked for more than 120 seconds. Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 7 21:20:10 forlong kernel: BackupPC_dump D 0000000000000015 0 81700 80535 0x00000080 Jun 7 21:20:10 forlong kernel: ffff881a28137b28 0000000000000082 0000000000000000 ffff882018717260 Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000282 ffff881a28137bc8 ffffffffa022c9e7 Jun 7 21:20:10 forlong kernel: ffff8806e3145af8 ffff881a28137fd8 000000000000fb88 ffff8806e3145af8 Jun 7 21:20:10 forlong kernel: Call Trace: Jun 7 21:20:10 forlong kernel: [] ? 
spl_debug_msg+0x427/0x9d0 [spl] Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl] Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40 Jun 7 21:20:10 forlong kernel: [] cv_wait+0x15/0x20 [spl] Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs] Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs] Jun 7 21:20:10 forlong kernel: [] zfs_write+0x3be/0xca0 [zfs] Jun 7 21:20:10 forlong kernel: [] ? tsd_exit+0x5f/0x2b0 [spl] Jun 7 21:20:10 forlong kernel: [] zpl_write_common+0x52/0x70 [zfs] Jun 7 21:20:10 forlong kernel: [] zpl_write+0x68/0xa0 [zfs] Jun 7 21:20:10 forlong kernel: [] ? security_file_permission+0x16/0x20 Jun 7 21:20:10 forlong kernel: [] vfs_write+0xb8/0x1a0 Jun 7 21:20:10 forlong kernel: [] sys_write+0x51/0x90 Jun 7 21:20:10 forlong kernel: [] ? audit_syscall_exit+0x265/0x290 Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b Jun 7 21:20:10 forlong kernel: INFO: task BackupPC_dump:82759 blocked for more than 120 seconds. Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 7 21:20:10 forlong kernel: BackupPC_dump D 0000000000000007 0 82759 82640 0x00000080 Jun 7 21:20:10 forlong kernel: ffff8818b9787c18 0000000000000082 0000000000000055 ffff882018f53760 Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000282 ffff8818b9787cb8 ffffffffa022c9e7 Jun 7 21:20:10 forlong kernel: ffff88201e6fe5f8 ffff8818b9787fd8 000000000000fb88 ffff88201e6fe5f8 Jun 7 21:20:10 forlong kernel: Call Trace: Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl] Jun 7 21:20:10 forlong kernel: [] ? prepare_to_wait_exclusive+0x4e/0x80 Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl] Jun 7 21:20:10 forlong kernel: [] ? 
autoremove_wake_function+0x0/0x40 Jun 7 21:20:10 forlong kernel: [] cv_wait+0x15/0x20 [spl] Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs] Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs] Jun 7 21:20:10 forlong kernel: [] zfs_link+0x31e/0x3f0 [zfs] Jun 7 21:20:10 forlong kernel: [] zpl_link+0xa1/0xd0 [zfs] Jun 7 21:20:10 forlong kernel: [] vfs_link+0x102/0x170 Jun 7 21:20:10 forlong kernel: [] ? lookup_hash+0x3a/0x50 Jun 7 21:20:10 forlong kernel: [] sys_linkat+0x114/0x150 Jun 7 21:20:10 forlong kernel: [] ? sys_newstat+0x36/0x50 Jun 7 21:20:10 forlong kernel: [] ? audit_syscall_entry+0x1d7/0x200 Jun 7 21:20:10 forlong kernel: [] sys_link+0x1e/0x20 Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b Jun 7 21:20:10 forlong kernel: INFO: task BackupPC_dump:86430 blocked for more than 120 seconds. Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 7 21:20:10 forlong kernel: BackupPC_dump D 0000000000000006 0 86430 85213 0x00000080 Jun 7 21:20:10 forlong kernel: ffff881eef8f5c18 0000000000000086 0000000000000055 ffff88201be745a0 Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000286 ffff881eef8f5cb8 ffffffffa022c9e7 Jun 7 21:20:10 forlong kernel: ffff88009312c638 ffff881eef8f5fd8 000000000000fb88 ffff88009312c638 Jun 7 21:20:10 forlong kernel: Call Trace: Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl] Jun 7 21:20:10 forlong kernel: [] ? prepare_to_wait_exclusive+0x4e/0x80 Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl] Jun 7 21:20:10 forlong kernel: [] ? 
autoremove_wake_function+0x0/0x40 Jun 7 21:20:10 forlong kernel: [] cv_wait+0x15/0x20 [spl] Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs] Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs] Jun 7 21:20:10 forlong kernel: [] zfs_link+0x31e/0x3f0 [zfs] Jun 7 21:20:10 forlong kernel: [] zpl_link+0xa1/0xd0 [zfs] Jun 7 21:20:10 forlong kernel: [] vfs_link+0x102/0x170 Jun 7 21:20:10 forlong kernel: [] ? lookup_hash+0x3a/0x50 Jun 7 21:20:10 forlong kernel: [] sys_linkat+0x114/0x150 Jun 7 21:20:10 forlong kernel: [] ? vfs_read+0x12f/0x1a0 Jun 7 21:20:10 forlong kernel: [] ? audit_syscall_entry+0x1d7/0x200 Jun 7 21:20:10 forlong kernel: [] sys_link+0x1e/0x20 Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b Jun 7 21:20:10 forlong kernel: INFO: task BackupPC_dump:93050 blocked for more than 120 seconds. Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 7 21:20:10 forlong kernel: BackupPC_dump D 000000000000000c 0 93050 92304 0x00000080 Jun 7 21:20:10 forlong kernel: ffff8815d1f67bd8 0000000000000082 0000000000000000 ffff8800bd0dabe0 Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000286 ffff8815d1f67c78 ffffffffa022c9e7 Jun 7 21:20:10 forlong kernel: ffff8820251b85f8 ffff8815d1f67fd8 000000000000fb88 ffff8820251b85f8 Jun 7 21:20:10 forlong kernel: Call Trace: Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl] Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl] Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40 Jun 7 21:20:10 forlong kernel: [] cv_wait+0x15/0x20 [spl] Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs] Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs] Jun 7 21:20:10 forlong kernel: [] zfs_mkdir+0x1b3/0x590 [zfs] Jun 7 21:20:10 forlong kernel: [] zpl_mkdir+0x9c/0xe0 [zfs] Jun 7 21:20:10 forlong kernel: [] ? 
generic_permission+0x23/0xb0 Jun 7 21:20:10 forlong kernel: [] vfs_mkdir+0xa7/0x100 Jun 7 21:20:10 forlong kernel: [] sys_mkdirat+0xb7/0x130 Jun 7 21:20:10 forlong kernel: [] ? sys_newstat+0x36/0x50 Jun 7 21:20:10 forlong kernel: [] sys_mkdir+0x18/0x20 Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b Jun 7 21:20:10 forlong kernel: INFO: task BackupPC_dump:95544 blocked for more than 120 seconds. Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 7 21:20:10 forlong kernel: BackupPC_dump D 0000000000000006 0 95544 93759 0x00000080 Jun 7 21:20:10 forlong kernel: ffff881d8c4bdc18 0000000000000082 0000000000000000 ffff88201be745a0 Jun 7 21:20:10 forlong kernel: 0000000000000044 0000000000000286 ffff881d8c4bdcb8 ffffffffa022c9e7 Jun 7 21:20:10 forlong kernel: ffff881b8ebbdab8 ffff881d8c4bdfd8 000000000000fb88 ffff881b8ebbdab8 Jun 7 21:20:10 forlong kernel: Call Trace: Jun 7 21:20:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl] Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl] Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40 Jun 7 21:20:10 forlong kernel: [] cv_wait+0x15/0x20 [spl] Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs] Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs] Jun 7 21:20:10 forlong kernel: [] zfs_link+0x31e/0x3f0 [zfs] Jun 7 21:20:10 forlong kernel: [] zpl_link+0xa1/0xd0 [zfs] Jun 7 21:20:10 forlong kernel: [] vfs_link+0x102/0x170 Jun 7 21:20:10 forlong kernel: [] ? lookup_hash+0x3a/0x50 Jun 7 21:20:10 forlong kernel: [] sys_linkat+0x114/0x150 Jun 7 21:20:10 forlong kernel: [] ? vfs_read+0x12f/0x1a0 Jun 7 21:20:10 forlong kernel: [] ? audit_syscall_entry+0x1d7/0x200 Jun 7 21:20:10 forlong kernel: [] sys_link+0x1e/0x20 Jun 7 21:20:10 forlong kernel: [] system_call_fastpath+0x16/0x1b Jun 7 21:20:10 forlong kernel: INFO: task VeeamAgentb43d8:104889 blocked for more than 120 seconds. 
Jun 7 21:20:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 7 21:20:10 forlong kernel: VeeamAgentb43 D 0000000000000011 0 104889 104281 0x00020080 Jun 7 21:20:10 forlong kernel: ffff8818fcf75b28 0000000000000082 ffff8818fcf75af0 ffff8818fcf75aec Jun 7 21:20:10 forlong kernel: 0000000000000044 ffff88103fe83200 ffff880061936700 0000000000000202 Jun 7 21:20:10 forlong kernel: ffff88201efabaf8 ffff8818fcf75fd8 000000000000fb88 ffff88201efabaf8 Jun 7 21:20:10 forlong kernel: Call Trace: Jun 7 21:20:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl] Jun 7 21:20:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40 Jun 7 21:20:10 forlong kernel: [] cv_wait+0x15/0x20 [spl] Jun 7 21:20:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs] Jun 7 21:20:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs] Jun 7 21:20:10 forlong kernel: [] zfs_write+0x3be/0xca0 [zfs] Jun 7 21:20:10 forlong kernel: [] ? jbd2_log_wait_commit+0xf5/0x140 [jbd2] Jun 7 21:20:10 forlong kernel: [] zpl_write_common+0x52/0x70 [zfs] Jun 7 21:20:10 forlong kernel: [] zpl_write+0x68/0xa0 [zfs] Jun 7 21:20:10 forlong kernel: [] ? security_file_permission+0x16/0x20 Jun 7 21:20:10 forlong kernel: [] vfs_write+0xb8/0x1a0 Jun 7 21:20:10 forlong kernel: [] sys_write+0x51/0x90 Jun 7 21:20:10 forlong kernel: [] ia32_sysret+0x0/0x5 Jun 7 21:22:10 forlong kernel: INFO: task txg_quiesce:17697 blocked for more than 120 seconds. Jun 7 21:22:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Jun 7 21:22:10 forlong kernel: txg_quiesce D 0000000000000007 0 17697 2 0x00000000 Jun 7 21:22:10 forlong kernel: ffff880fdfa37d40 0000000000000046 0000000000000055 ffff882012c4a6a0 Jun 7 21:22:10 forlong kernel: ffff880fdfa37cf0 ffffffff8116829c ffff880fdfa37de0 ffffffff000080d0 Jun 7 21:22:10 forlong kernel: ffff881023e41058 ffff880fdfa37fd8 000000000000fb88 ffff881023e41058 Jun 7 21:22:10 forlong kernel: Call Trace: Jun 7 21:22:10 forlong kernel: [] ? kmalloc+0x20c/0x220 Jun 7 21:22:10 forlong kernel: [] ? prepare_to_wait_exclusive+0x4e/0x80 Jun 7 21:22:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl] Jun 7 21:22:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40 Jun 7 21:22:10 forlong kernel: [] ? bitmap_weight+0x50/0xb0 Jun 7 21:22:10 forlong kernel: [] cv_wait+0x15/0x20 [spl] Jun 7 21:22:10 forlong kernel: [] txg_quiesce_thread+0x243/0x3a0 [zfs] Jun 7 21:22:10 forlong kernel: [] ? set_user_nice+0xc9/0x130 Jun 7 21:22:10 forlong kernel: [] ? txg_quiesce_thread+0x0/0x3a0 [zfs] Jun 7 21:22:10 forlong kernel: [] thread_generic_wrapper+0x68/0x80 [spl] Jun 7 21:22:10 forlong kernel: [] ? thread_generic_wrapper+0x0/0x80 [spl] Jun 7 21:22:10 forlong kernel: [] kthread+0x96/0xa0 Jun 7 21:22:10 forlong kernel: [] child_rip+0xa/0x20 Jun 7 21:22:10 forlong kernel: [] ? kthread+0x0/0xa0 Jun 7 21:22:10 forlong kernel: [] ? child_rip+0x0/0x20 Jun 7 21:22:10 forlong kernel: INFO: task java:127837 blocked for more than 120 seconds. Jun 7 21:22:10 forlong kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
Jun 7 21:22:10 forlong kernel: java D 0000000000000000 0 127837 1 0x00000080 Jun 7 21:22:10 forlong kernel: ffff8818b6f43b28 0000000000000086 0000000000000000 ffff880037907840 Jun 7 21:22:10 forlong kernel: 0000000000000044 0000000000000286 ffff8818b6f43bc8 ffffffffa022c9e7 Jun 7 21:22:10 forlong kernel: ffff881c395ddab8 ffff8818b6f43fd8 000000000000fb88 ffff881c395ddab8 Jun 7 21:22:10 forlong kernel: Call Trace: Jun 7 21:22:10 forlong kernel: [] ? spl_debug_msg+0x427/0x9d0 [spl] Jun 7 21:22:10 forlong kernel: [] cv_wait_common+0x105/0x1c0 [spl] Jun 7 21:22:10 forlong kernel: [] ? autoremove_wake_function+0x0/0x40 Jun 7 21:22:10 forlong kernel: [] cv_wait+0x15/0x20 [spl] Jun 7 21:22:10 forlong kernel: [] txg_wait_open+0x8b/0x110 [zfs] Jun 7 21:22:10 forlong kernel: [] dmu_tx_wait+0xed/0xf0 [zfs] Jun 7 21:22:10 forlong kernel: [] zfs_write+0x3be/0xca0 [zfs] Jun 7 21:22:10 forlong kernel: [] ? futex_wake+0x10e/0x120 Jun 7 21:22:10 forlong kernel: [] zpl_write_common+0x52/0x70 [zfs] Jun 7 21:22:10 forlong kernel: [] zpl_write+0x68/0xa0 [zfs] Jun 7 21:22:10 forlong kernel: [] ? security_file_permission+0x16/0x20 Jun 7 21:22:10 forlong kernel: [] vfs_write+0xb8/0x1a0 Jun 7 21:22:10 forlong kernel: [] sys_pwrite64+0x82/0xa0 Jun 7 21:22:10 forlong kernel: [] system_call_fastpath+0x16/0x1b

ryao commented 11 years ago

Are you using swap on a zvol? If you are, then zfsonlinux/zfs@57f5a2008e2e6acf58934cf43c5fdca0faffa73e might fix this; it was committed to head about 6 weeks ago. zfsonlinux/zfs#1508 should also fix some deadlocks when doing swap on a zvol.

jstillwa commented 11 years ago

Nah, the boot vol/swap is all ext4. Just data stores are zfs.

behlendorf commented 11 years ago

@jstillwa This looks like something else. The system is blocked waiting on the txg_sync thread, although that thread is clearly still making forward progress, since no stack was dumped for it. If this happens again, grab the contents of the /proc/spl/kstat/zfs/dmu_tx file and a stack from txg_sync; that should provide some information about why.

Incidentally, there are some known reasons why this might happen which are being worked on, but without additional debugging it's hard to say whether you've hit any of those cases. For example, it's still possible for the txg_sync thread to take quite a long time to write out a txg if the pool is very full or very fragmented.
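The debug capture requested above (the dmu_tx kstat plus a kernel stack for txg_sync) can be scripted so it is ready the moment the hang recurs. A minimal sketch; the procfs paths are the standard locations, while `find_pid` and `collect_debug` are hypothetical helper names:

```python
# Sketch: gather /proc/spl/kstat/zfs/dmu_tx and the txg_sync thread's kernel
# stack (/proc/<pid>/stack). Helper names are illustrative, not a ZFS API.
import os

def find_pid(comm, proc="/proc"):
    """Return the pid of the first task whose comm matches, else None."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                if f.read().strip() == comm:
                    return int(entry)
        except OSError:
            continue                      # task exited while we scanned
    return None

def collect_debug(proc="/proc"):
    out = {}
    kstat = os.path.join(proc, "spl/kstat/zfs/dmu_tx")
    if os.path.exists(kstat):
        with open(kstat) as f:
            out["dmu_tx"] = f.read()      # tx assign/delay counters
    pid = find_pid("txg_sync", proc)
    if pid is not None:
        with open(os.path.join(proc, str(pid), "stack")) as f:
            out["stack"] = f.read()       # where txg_sync is blocked
    return out
```

Running this from cron (or by hand when hung tasks start appearing) preserves exactly the two pieces of state asked for, even if the box has to be rebooted afterwards.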

eborisch commented 11 years ago

I'm having a similar failure (see below) where free memory just seems to keep dropping until the shrinker error. At that point (as expected?) ZFS is locked up and I have to reboot to recover.

There was a single rsync job running at the time with minimal new data transfer, just lots of file/directory walking. I've set zfs_arc_meta_limit=16106127360, as this is an rsync backup server that mainly walks directory entries rather than reading file data.

Here's some memory history leading up to the error at 22:22. Sorry for the missing userdata (total minus meta ARC size); I'm still tweaking the monitoring. The main dataset being hit is configured with primarycache=metadata, so the metadata size is approximately the ARC size.

(screenshot: memory usage history, 2013-08-02)

Any suggestions greatly appreciated. I'm going to remove my zfs_arc_meta_limit setting for now to see if that helps.

Eric

Note: I've removed the date (Aug 1) and the ' kernel' from each line just to save some display space.

22:22:21: shrink_slab: spl_kmem_cache_generic_shrinker+0x0/0x30 [spl] negative objects to delete nr=-2666943616
22:25:04: INFO: task txg_quiesce:2661 blocked for more than 120 seconds.
22:25:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:25:04: txg_quiesce   D 000000000000000d     0  2661      2 0x00000000
22:25:04: ffff8804631ebd40 0000000000000046 0000000000000000 ffff8803136a5f20
22:25:04: ffff8804631ebcf0 ffffffff8116832c ffff8804631ebde0 ffffffff000080d0
22:25:04: ffff8804669f4638 ffff8804631ebfd8 000000000000fb88 ffff8804669f4638
22:25:04: Call Trace:
22:25:04: [<ffffffff8116832c>] ? __kmalloc+0x20c/0x220
22:25:04: [<ffffffffa01e65d5>] cv_wait_common+0x105/0x1c0 [spl]
22:25:04: [<ffffffff81096cc0>] ? autoremove_wake_function+0x0/0x40
22:25:04: [<ffffffff8128586c>] ? __bitmap_weight+0x8c/0xb0
22:25:04: [<ffffffffa01e66e5>] __cv_wait+0x15/0x20 [spl]
22:25:04: [<ffffffffa030dd03>] txg_quiesce_thread+0x243/0x3a0 [zfs]
22:25:04: [<ffffffff810560a9>] ? set_user_nice+0xc9/0x130
22:25:04: [<ffffffffa030dac0>] ? txg_quiesce_thread+0x0/0x3a0 [zfs]
22:25:04: [<ffffffffa01de4c8>] thread_generic_wrapper+0x68/0x80 [spl]
22:25:04: [<ffffffffa01de460>] ? thread_generic_wrapper+0x0/0x80 [spl]
22:25:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:25:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:25:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:25:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_iss/6:2570 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_iss/6    D 0000000000000006     0  2570      2 0x00000000
22:29:04: ffff880469c29a80 0000000000000046 0000000000000000 0000003000000212
22:29:04: ffff880469c36aa0 ffff880477c68080 0000000077c380a8 ffff880469c29a48
22:29:04: ffff880469c37058 ffff880469c29fd8 000000000000fb88 ffff880469c37058
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffffa01df9b4>] ? taskq_init_ent+0x34/0x80 [spl]
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318cd0>] vdev_queue_io+0x80/0x130 [zfs]
22:29:04: [<ffffffffa0352d29>] zio_vdev_io_start+0x1c9/0x2e0 [zfs]
22:29:04: [<ffffffffa0353703>] zio_nowait+0xb3/0x150 [zfs]
22:29:04: [<ffffffffa031b8d8>] vdev_raidz_io_start+0x5d8/0x6b0 [zfs]
22:29:04: [<ffffffffa0319260>] ? vdev_raidz_child_done+0x0/0x30 [zfs]
22:29:04: [<ffffffffa0352c07>] zio_vdev_io_start+0xa7/0x2e0 [zfs]
22:29:04: [<ffffffffa0353703>] zio_nowait+0xb3/0x150 [zfs]
22:29:04: [<ffffffffa031800e>] vdev_mirror_io_start+0x17e/0x3c0 [zfs]
22:29:04: [<ffffffffa03179c0>] ? vdev_mirror_child_done+0x0/0x30 [zfs]
22:29:04: [<ffffffffa0352d7f>] zio_vdev_io_start+0x21f/0x2e0 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/0:2593 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/0    D 0000000000000000     0  2593      2 0x00000000
22:29:04: ffff88046695bce0 0000000000000046 00000068088b0800 ffffc9001e89d000
22:29:04: 0001f2000000000f ffff880251431180 000000106695bc80 ffff880876dd3780
22:29:04: ffff880466959ab8 ffff88046695bfd8 000000000000fb88 ffff880466959ab8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/1:2594 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/1    D 0000000000000001     0  2594      2 0x00000000
22:29:04: ffff88046695fce0 0000000000000046 0000000000000000 ffff88046695fca4
22:29:04: 0000000000000000 ffff88047fc24500 ffff880028216700 000000000000000f
22:29:04: ffff880466959058 ffff88046695ffd8 000000000000fb88 ffff880466959058
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/2:2595 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/2    D 0000000000000002     0  2595      2 0x00000000
22:29:04: ffff880466961ce0 0000000000000046 0000000000000000 ffff880466961ca4
22:29:04: 0000000000000000 ffff88047fc24700 ffff880028216700 000000000000000f
22:29:04: ffff8804669585f8 ffff880466961fd8 000000000000fb88 ffff8804669585f8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/3:2596 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/3    D 0000000000000003     0  2596      2 0x00000000
22:29:04: ffff880466965ce0 0000000000000046 0000000000000000 ffff880466965ca4
22:29:04: 0000000000000000 ffff88047fc24900 ffff880028216700 000000000000000f
22:29:04: ffff880466963af8 ffff880466965fd8 000000000000fb88 ffff880466963af8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/4:2597 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/4    D 0000000000000004     0  2597      2 0x00000000
22:29:04: ffff880466969ce0 0000000000000046 0000000000000000 ffff880466969ca4
22:29:04: 0000000000000000 ffff88047fc24b00 ffff880028216700 000000000000000f
22:29:04: ffff880466963098 ffff880466969fd8 000000000000fb88 ffff880466963098
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/6:2599 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/6    D 0000000000000006     0  2599      2 0x00000000
22:29:04: ffff880466971ce0 0000000000000046 0000000000000000 ffff880466971ca4
22:29:04: 0000000000000000 ffff88047fc24f00 ffff880028216700 000000000000000f
22:29:04: ffff88046696fab8 ffff880466971fd8 000000000000fb88 ffff88046696fab8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/8:2601 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/8    D 0000000000000008     0  2601      2 0x00000000
22:29:04: ffff880466977ce0 0000000000000046 0000000000000000 ffff880466977ca4
22:29:04: 0000000000000000 ffff88047fc25300 ffff880028216700 000000000000000f
22:29:04: ffff88046696e5f8 ffff880466977fd8 000000000000fb88 ffff88046696e5f8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
22:29:04: INFO: task z_wr_int/9:2602 blocked for more than 120 seconds.
22:29:04: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
22:29:04: z_wr_int/9    D 0000000000000009     0  2602      2 0x00000000
22:29:04: ffff88046697bce0 0000000000000046 0000000000000000 0000000000016700
22:29:04: 0000000000016700 ffff880466979540 ffff880879c88ae0 ffffffff8160b780
22:29:04: ffff880466979af8 ffff88046697bfd8 000000000000fb88 ffff880466979af8
22:29:04: Call Trace:
22:29:04: [<ffffffff8150f3de>] __mutex_lock_slowpath+0x13e/0x180
22:29:04: [<ffffffff8150f27b>] mutex_lock+0x2b/0x50
22:29:04: [<ffffffffa0318bb0>] vdev_queue_io_done+0x30/0xd0 [zfs]
22:29:04: [<ffffffffa0351928>] zio_vdev_io_done+0x88/0x190 [zfs]
22:29:04: [<ffffffffa03527d3>] zio_execute+0xb3/0x130 [zfs]
22:29:04: [<ffffffffa01df6e8>] taskq_thread+0x218/0x4b0 [spl]
22:29:04: [<ffffffff8150dd80>] ? thread_return+0x4e/0x76e
22:29:04: [<ffffffff81063330>] ? default_wake_function+0x0/0x20
22:29:04: [<ffffffffa01df4d0>] ? taskq_thread+0x0/0x4b0 [spl]
22:29:04: [<ffffffff81096956>] kthread+0x96/0xa0
22:29:04: [<ffffffff8100c0ca>] child_rip+0xa/0x20
22:29:04: [<ffffffff810968c0>] ? kthread+0x0/0xa0
22:29:04: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
eborisch commented 11 years ago

FWIW: Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Tue Jul 16 23:51:20 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux (CentOS 6.4)

Installed Packages
spl.x86_64                    0.6.1-1.el6           @zfs                        
spl-dkms.noarch               0.6.1-2.el6           @zfs                        
zfs.x86_64                    0.6.1-1.el6           @zfs                        
zfs-dkms.noarch               0.6.1-2.el6           @zfs                        
zfs-dracut.x86_64             0.6.1-1.el6           @zfs                        
zfs-release.noarch            1-2.el6               @/zfs-release-1-2.el6.noarch
zfs-test.x86_64               0.6.1-1.el6           @zfs                        
byteharmony commented 10 years ago

Sorry for disappearing, had some business I needed to attend to.

Here are a few answers, plus information on our latest experience. Yes, in releases from six months to a year ago we were virtually disabling the ARC, for two reasons. First, there was a concern about double buffering: Linux already caches this data in the kernel, so why buffer it twice? I don't know if that's been addressed, but it was a concern. Second, we needed the RAM to run virtual machines on the servers running ZFS.

On our current stable larger servers we're still running RC14, and we've increased the ARC from 1 GB to 10 GB on servers with 64 GB of RAM. It's been good.

Next tonight I upgraded to: zfs-0.6.2-1.el6.x86_64

and kernel: Linux nas200.domain.local 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

the latest stable kernel for CentOS 6.4

It was thought there were some DKMS issues with 0.6.1; perhaps that was causing problems, I'm not sure, but I didn't have time to look into it. Tonight I wanted to get back onto a stable release. Brian had spoiled me so much with a master build that gave me hidden snapshots and information stored in the zvols that I really didn't NEED much else; the rest could wait.

I found that the old module parameter I was using to throttle scrub operations had been removed and caused errors during the DKMS build. Hence, any upgrade with the parameter zfs_scrub_limit in /etc/modprobe.d/zfs.conf would no longer have a working ZFS after the upgrade and reboot to the new kernel :(

To fix this, just remove the parameter and run a zfs list command. Poof, fixed. Future reboots show no errors.
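A minimal sketch of that cleanup, demonstrated on a scratch copy of the file (the option values here are made up for illustration; the same sed edit would apply to the real /etc/modprobe.d/zfs.conf):

```shell
# Work on a scratch copy so nothing real is touched in this demo.
conf=$(mktemp)
printf 'options zfs zfs_arc_max=1073741824 zfs_scrub_limit=10\n' > "$conf"

# Strip the removed zfs_scrub_limit tunable; other options are left alone.
sed -i 's/[[:space:]]*zfs_scrub_limit=[0-9]*//' "$conf"

cat "$conf"   # -> options zfs zfs_arc_max=1073741824
rm -f "$conf"
```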

Although posts suggest this parameter did nothing, its removal leaves me with the problem of high load during scrubs. On Solaris machines we control this using advice from a very prominent paid ZFS provider (not sure I should mention them).

Resilver priority:

Show the current value (default is 2):

echo zfs_resilver_delay/D | mdb -k

Change the value to 1:

echo zfs_resilver_delay/W1 | mdb -kw

Show the current value (default is 3000):

echo zfs_resilver_min_time_ms/D | mdb -k

Change the value to 4000:

echo zfs_resilver_min_time_ms/W0t4000 | mdb -kw

I'm not sure where I would set these parameters in ZOL?

Regarding the kernel error: I've seen that same error when machine load is too high. An average load over 5 is asking for trouble. The answer I've gotten in the past on the 120-second timeout is basically that the disks are so busy that requests time out. Possible solutions:

  1. Consider splitting disks into separate zpools when multiple workloads hit the same spindles at the same time; seeks on the disks take forever and kill performance.
  2. Get faster disks, maybe SSDs. (This worked for us, but I'll warn you: LSI controllers, Dell's specifically in our experience, are unstable with SSDs; onboard controllers have been fine.) We just rolled out six new Supermicro machines with LSI controllers, but they are all running spindles (4K spindles :). So far solid as a rock. Will get the chipset details if needed.
  3. Use more smaller machines rather than one monster. This is along the same lines as #1: splitting the load will also help you isolate what is hitting the machine and causing the performance issue.
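Suggestion #1 can be sketched as below. All device names and pool names here are hypothetical, chosen only to illustrate the idea of one pool per busy workload; this is not a layout recommendation.

```shell
# One pool per workload, so a busy backup stream can't starve
# another workload's spindles with seeks.
zpool create db-pool  mirror /dev/sdb /dev/sdc
zpool create bku-pool raidz2 /dev/sdd /dev/sde /dev/sdf /dev/sdg

# Place each workload's datasets/zvols on its own set of spindles.
zfs create -V 500G db-pool/exchange-os
zfs create bku-pool/nightly
```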

We still do have some machines that push a load over 20, but only during backups. The worst problem I may have just licked: a big Exchange server was seeing Windows disk queue lengths up to 10 on the OS disk, which took the Exchange server offline during a backup even though it was still running; it was just CRAZY slow. Moving each zvol that was getting pounded to a separate set of spindles has so far resolved the issue.

Possible performance tuning I didn't try that could also have resolved the issue: switching the zvol from latency to throughput, or perhaps vice versa.

Hoping for smooth sailing with 0.6.2, thanks again ZOL team! Brian

ryao commented 10 years ago

@eborisch Your issue looks like a different bug. It might have been fixed by this:

https://github.com/zfsonlinux/zfs/commit/34e143323e359b42bc9d06dd19cc4b1f13091283

ryao commented 10 years ago

@byteharmony Double buffering only occurs when you use mmap(). It is not a problem for any other file-based activity. It might also happen on zvols, although I have not made time to verify that is the case.

As for zfs_resilver_delay and zfs_resilver_min_time_ms, they are module parameters that can be set via a file in modprobe.d. Alternatively, you can set them at runtime by echoing new values into /sys/module/zfs/parameters/{zfs_resilver_delay,zfs_resilver_min_time_ms}.
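For example (the values below are just the ones discussed earlier in the thread, not recommendations, and this assumes the ZFS module is loaded):

```shell
# Persistent: applied whenever the zfs module loads.
# Put this line in /etc/modprobe.d/zfs.conf:
#   options zfs zfs_resilver_delay=1 zfs_resilver_min_time_ms=4000

# Runtime: effective immediately, but lost on reboot.
echo 1    > /sys/module/zfs/parameters/zfs_resilver_delay
echo 4000 > /sys/module/zfs/parameters/zfs_resilver_min_time_ms

# Read back the current values.
cat /sys/module/zfs/parameters/zfs_resilver_delay
cat /sys/module/zfs/parameters/zfs_resilver_min_time_ms
```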

jstillwa commented 10 years ago

Still seeing this issue regularly with heavy IO.

ryao commented 10 years ago

This should be fixed by the following commits:

https://github.com/zfsonlinux/zfs/commit/8ac67298b175f98de07e040456d0fe7b1841a5eb https://github.com/zfsonlinux/zfs/commit/6f9548c487dbcf958f2f226c5f1eac2b85f8f78e

FransUrbo commented 10 years ago

@jstillwa Is this still a problem with latest GIT? Both the commits mentioned by @ryao are there...

jstillwa commented 10 years ago

We will be able to check it out after the next release, which I think is coming very soon. Our current setup doesn't deadlock, but we've found that's because we have enough horsepower that we can't really overload the disks (we're running 90+ drives per server), which is when we'd see the deadlocks pop up. I'll put it through its paces, though, and see if anything odd happens.


behlendorf commented 9 years ago

@jstillwa did the latest release resolve this issue for you?

jstillwa commented 9 years ago

We are still seeing this error. It's not killing the server, but during heavy load we do get kernel messages saying "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. and soft lockup - CPU#10 stuck for 67s! [kswapd1:260].

behlendorf commented 8 years ago

This is believed to be resolved in master.