openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Deadlock in zpool, txg_sync and udev/vol_id when importing the pool #1862

Closed FransUrbo closed 10 years ago

FransUrbo commented 10 years ago

This is a fork of the #1848 issue.

When importing the pool without the 'zvol_inhibit_dev' parameter, I get (after a few minutes) an oops first from zpool, then txg_sync, and then vol_id (from udev, NOT zvol_id), and all vol_id processes are tagged Dead (177 of them, although I only have 64 ZVOLs). The import is hung and I can no longer issue any zfs command...

[ 1935.855833] SPL: Loaded module v0.6.2-1.20131110_crypto
[ 1935.856315] zunicode: module license 'CDDL' taints kernel.
[ 1935.856318] Disabling lock debugging due to kernel taint
[ 1935.883882] ZFS: Loaded module v0.6.2-1.20131113_crypto, ZFS pool version 5000, ZFS filesystem version 5
[ 1972.592902] SPL: using hostid 0xa8c00845
[ 1974.925020]  zd0: unknown partition table
[ 1975.171192]  zd16: p1 p2 p3
[ 1975.380597]  zd32: p1 p2 p3
[ 1975.425234]  zd48: unknown partition table
[ 1975.469974]  zd64: unknown partition table
[ 1975.536150]  zd80: unknown partition table
[ 1975.811834]  zd96: p1
[ 1975.811834]  p1: <solaris: [s0] p5 [s2] p6 [s8] p7 >
[ 1976.009411]  zd112: unknown partition table
[ 1976.304591]  zd128: p1
[ 1976.619496]  zd144: p1
[ 1976.619496]  p1: <solaris: [s0] p5 [s2] p6 [s8] p7 >
[ 1976.909568]  zd160: p1 p2 p3
[ 1977.038737]  zd176: unknown partition table
[ 1977.345266]  zd192: p1 p2 < p5 >
[ 1977.624824]  zd208: p1 p2 < p5 >
[ 1977.828811]  zd224: p1
[ 1978.041224]  zd240: p1
[ 1978.248323]  zd256: p1
[ 1978.364182]  zd272: unknown partition table
[ 1978.618128]  zd288: p1
[ 1978.854242]  zd304: p1
[ 1979.071706]  zd320: p1 p2 p3
[ 1979.257299]  zd336: p1 p2 p3
[ 1979.440590]  zd352: p1
[ 1979.589259]  zd368: p1
[ 1979.809191]  zd384: p1 p2 p3
[ 1980.091709]  zd400: p1 p2 < p5 >
[ 1980.396174]  zd416: p1 p2 < p5 >
[ 1980.708611]  zd432: p1 p2 p3
[ 1980.896684]  zd448: p1 p2 < p5 >
[ 1981.201037]  zd464: p1 p2 < p5 >
[ 1981.381464]  zd480: p1
[ 1981.584330]  zd496: unknown partition table
[ 1981.837717]  zd512: p1
[ 1982.139536]  zd528: p1 p2 < p5 >
[ 1982.340551]  zd544: p1
[ 1982.432025]  zd560: p1
[ 1982.658010]  zd576: p1 p2
[ 1982.750527]  zd592: p1
[ 1982.835009]  zd608: p1 p2
[ 1983.062772]  zd624: p1
[ 1983.379149]  zd640: p1 p2 < p5 >
[ 1983.671411]  zd656: p1 p2 < p5 >
[ 1983.953109]  zd672: p1 p2 p3
[ 1984.134006]  zd688: p1 p2 p3
[ 1984.338892]  zd704: unknown partition table
[ 1984.525932]  zd720: p1
[ 1984.741452]  zd736: p1 p2 p3
[ 1985.062658]  zd752: p1 p2 < p5 >
[ 1985.275333]  zd768: p1 p2 < p5 >
[ 1985.404177]  zd784: unknown partition table
[ 1985.550725]  zd800: unknown partition table
[ 1985.634266]  zd816: unknown partition table
[ 1985.937100]  zd832: p1 p2 p3
[ 1986.033612]  zd848: unknown partition table
[ 1986.322119]  zd864: p1 p2 < p5 >
[ 1986.510430]  zd880: p1 p2
[ 1986.833831]  zd896: p1
[ 1987.015856]  zd912: p1
[ 1987.232678]  zd928: p1 p2
[ 1987.451998]  zd944: unknown partition table
[ 1987.563576]  zd960: unknown partition table
[ 1987.816374]  zd976: p1 p2 p3
[ 1988.165110]  zd992: p1 p2 p3 p4
[ 1988.165110]  p1: <bsd: >
[ 1988.535175]  zd1008: p1 p2 p3 p4
[ 1988.535175]  p1: <bsd: >
[ 2159.714988] INFO: task zpool:23852 blocked for more than 120 seconds.
[ 2159.715077]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.715157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.715269] zpool           D ffff88041fcd4ac0     0 23852      1 0x20020000
[ 2159.715273]  ffff88040e029000 0000000000200086 0000000000000007 0000000000000008
[ 2159.715277]  ffff8803ef1ca010 0000000000014ac0 0000000000014ac0 ffff88040e029000
[ 2159.715279]  0000000000014ac0 ffff8803ef1cbfd8 ffff8803ef1cbfd8 0000000000014ac0
[ 2159.715282] Call Trace:
[ 2159.715288]  [<ffffffff8111ee39>] ? __kmalloc+0x138/0x1b5
[ 2159.715294]  [<ffffffffa0767034>] ? nvs_native_nvpair+0x1f/0x7a [znvpair]
[ 2159.715298]  [<ffffffffa0766a1b>] ? nvs_operation+0x7e/0x280 [znvpair]
[ 2159.715302]  [<ffffffffa0767009>] ? nvs_native_nvp_op+0x1e0/0x1ec [znvpair]
[ 2159.715305]  [<ffffffff811fdad7>] ? clear_page_c+0x7/0x10
[ 2159.715309]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.715311]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.715314]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.715316]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.715337]  [<ffffffffa07d404f>] ? spa_open_common+0x51/0x2f4 [zfs]
[ 2159.715339]  [<ffffffff8111ee39>] ? __kmalloc+0x138/0x1b5
[ 2159.715344]  [<ffffffffa06e65b6>] ? kmem_alloc_debug+0x11d/0x2c9 [spl]
[ 2159.715362]  [<ffffffffa07d5d3a>] ? spa_get_stats+0x35/0x228 [zfs]
[ 2159.715365]  [<ffffffff810ef59f>] ? release_pages+0x197/0x1a6
[ 2159.715380]  [<ffffffffa07ffe4d>] ? zfs_ioc_pool_stats+0x21/0x57 [zfs]
[ 2159.715395]  [<ffffffffa0802efa>] ? zfsdev_ioctl+0x335/0x3c0 [zfs]
[ 2159.715398]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.715413]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.715416]  [<ffffffff81104fd7>] ? handle_mm_fault+0x9c3/0x9f8
[ 2159.715418]  [<ffffffff8110fe8f>] ? unlink_anon_vmas+0xee/0x17b
[ 2159.715421]  [<ffffffff81045eed>] ? do_page_fault+0x3ed/0x439
[ 2159.715424]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
[ 2159.715426]  [<ffffffff8110a038>] ? remove_vma+0x44/0x4b
[ 2159.715428]  [<ffffffff8110b590>] ? do_munmap+0x2dd/0x2f9
[ 2159.715430]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.715443] INFO: task zpool:24089 blocked for more than 120 seconds.
[ 2159.715525]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.715605] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.715716] zpool           D ffff88041fd14ac0     0 24089  24068 0x20020000
[ 2159.715718]  ffff88040da28000 0000000000200086 ffff88041fd14ac0 ffff88041d8f6010
[ 2159.715721]  ffff8803e7c60010 0000000000014ac0 0000000000014ac0 ffff88040da28000
[ 2159.715723]  0000000000014ac0 ffff8803e7c61fd8 ffff8803e7c61fd8 0000000000014ac0
[ 2159.715725] Call Trace:
[ 2159.715728]  [<ffffffffa0767034>] ? nvs_native_nvpair+0x1f/0x7a [znvpair]
[ 2159.715731]  [<ffffffffa07665fc>] ? nvs_native_nvlist+0x53/0x7d [znvpair]
[ 2159.715734]  [<ffffffffa0766a1b>] ? nvs_operation+0x7e/0x280 [znvpair]
[ 2159.715738]  [<ffffffffa06e4bc3>] ? kmem_free_debug+0xc5/0x10d [spl]
[ 2159.715740]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[ 2159.715744]  [<ffffffffa06ee2d2>] ? cv_wait_common+0xd4/0x164 [spl]
[ 2159.715759]  [<ffffffffa07dbcc3>] ? spa_lookup+0x4b/0x50 [zfs]
[ 2159.715762]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[ 2159.715776]  [<ffffffffa07ca570>] ? rrw_enter_read+0x40/0x130 [zfs]
[ 2159.715791]  [<ffffffffa07bd572>] ? dsl_pool_hold+0x43/0x4c [zfs]
[ 2159.715792]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.715805]  [<ffffffffa07a49c9>] ? dmu_objset_hold+0x1d/0x78 [zfs]
[ 2159.715809]  [<ffffffffa06e65b6>] ? kmem_alloc_debug+0x11d/0x2c9 [spl]
[ 2159.715822]  [<ffffffffa07ff64d>] ? zfs_ioc_dataset_list_next+0x48/0x140 [zfs]
[ 2159.715835]  [<ffffffffa0802efa>] ? zfsdev_ioctl+0x335/0x3c0 [zfs]
[ 2159.715838]  [<ffffffff810e4b89>] ? find_get_page+0x18/0x6c
[ 2159.715840]  [<ffffffff810e63d2>] ? generic_file_aio_read+0x557/0x5f6
[ 2159.715842]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
[ 2159.715845]  [<ffffffff8112967e>] ? vfs_read+0xa8/0x14c
[ 2159.715847]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.715849] INFO: task txg_sync:24095 blocked for more than 120 seconds.
[ 2159.715932]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.716012] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.716123] txg_sync        D ffff88041fcd4ac0     0 24095      2 0x00000000
[ 2159.716125]  ffff8803ed34f000 0000000000000046 ffff8803e6aae010 0000000000014ac0
[ 2159.716127]  ffff8803e6aae010 0000000000014ac0 0000000000014ac0 ffff8803ed34f000
[ 2159.716129]  0000000000014ac0 ffff8803e6aaffd8 ffff8803e6aaffd8 0000000000014ac0
[ 2159.716132] Call Trace:
[ 2159.716134]  [<ffffffff810185f9>] ? sched_clock+0x13/0x16
[ 2159.716136]  [<ffffffff810767b1>] ? update_rq_clock+0x1e/0xec
[ 2159.716139]  [<ffffffff810711fa>] ? check_preempt_curr+0x36/0x62
[ 2159.716141]  [<ffffffff81071302>] ? ttwu_do_wakeup+0xf/0xab
[ 2159.716142]  [<ffffffff8101865f>] ? read_tsc+0x18/0x2e
[ 2159.716144]  [<ffffffff81077cfd>] ? try_to_wake_up+0x22d/0x23f
[ 2159.716147]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[ 2159.716150]  [<ffffffffa06ee2d2>] ? cv_wait_common+0xd4/0x164 [spl]
[ 2159.716153]  [<ffffffffa06ee30e>] ? cv_wait_common+0x110/0x164 [spl]
[ 2159.716155]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[ 2159.716170]  [<ffffffffa07ca4ed>] ? rrw_enter_write+0x29/0x6c [zfs]
[ 2159.716185]  [<ffffffffa07c4905>] ? dsl_sync_task_sync+0x7e/0xca [zfs]
[ 2159.716199]  [<ffffffffa07be2de>] ? dsl_pool_sync+0x37c/0x4d9 [zfs]
[ 2159.716201]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[ 2159.716216]  [<ffffffffa07ce7a0>] ? spa_sync+0x506/0x941 [zfs]
[ 2159.716218]  [<ffffffff8106acd9>] ? autoremove_wake_function+0x9/0x2e
[ 2159.716221]  [<ffffffff8107370c>] ? __wake_up+0x30/0x44
[ 2159.716235]  [<ffffffffa07df0b6>] ? txg_sync_thread+0x334/0x54b [zfs]
[ 2159.716239]  [<ffffffffa06e4bc3>] ? kmem_free_debug+0xc5/0x10d [spl]
[ 2159.716253]  [<ffffffffa07ded82>] ? txg_thread_exit+0x2b/0x2b [zfs]
[ 2159.716257]  [<ffffffffa06e76dc>] ? __thread_create+0x2e4/0x2e4 [spl]
[ 2159.716260]  [<ffffffffa06e7746>] ? thread_generic_wrapper+0x6a/0x73 [spl]
[ 2159.716262]  [<ffffffff8106a781>] ? kthread+0xb0/0xba
[ 2159.716264]  [<ffffffff8106a6d1>] ? __init_kthread_worker+0x2d/0x2d
[ 2159.716266]  [<ffffffff813c590c>] ? ret_from_fork+0x7c/0xb0
[ 2159.716268]  [<ffffffff8106a6d1>] ? __init_kthread_worker+0x2d/0x2d
[ 2159.716270] INFO: task vol_id:24103 blocked for more than 120 seconds.
[ 2159.716352]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.716432] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.716543] vol_id          D ffff88041fc94ac0     0 24103      1 0x20020000
[ 2159.716545]  ffff8803ef277000 0000000000000086 ffffffff8107c64f ffffffff810eb3ea
[ 2159.716547]  ffff8803e66be010 0000000000014ac0 0000000000014ac0 ffff8803ef277000
[ 2159.716549]  0000000000014ac0 ffff8803e66bffd8 ffff8803e66bffd8 0000000000014ac0
[ 2159.716551] Call Trace:
[ 2159.716554]  [<ffffffff8107c64f>] ? dequeue_task_fair+0x4a3/0xa4d
[ 2159.716556]  [<ffffffff810eb3ea>] ? __alloc_pages_nodemask+0x190/0x910
[ 2159.716558]  [<ffffffff813c3d63>] ? __schedule+0x754/0x7ea
[ 2159.716560]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.716562]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[ 2159.716564]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[ 2159.716567]  [<ffffffffa06ee2d2>] ? cv_wait_common+0xd4/0x164 [spl]
[ 2159.716582]  [<ffffffffa07dbcc3>] ? spa_lookup+0x4b/0x50 [zfs]
[ 2159.716584]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[ 2159.716599]  [<ffffffffa07ca570>] ? rrw_enter_read+0x40/0x130 [zfs]
[ 2159.716613]  [<ffffffffa07bd572>] ? dsl_pool_hold+0x43/0x4c [zfs]
[ 2159.716625]  [<ffffffffa07a48dd>] ? dmu_objset_own+0x2e/0xfd [zfs]
[ 2159.716638]  [<ffffffffa0826d67>] ? zvol_open+0xaf/0x276 [zfs]
[ 2159.716641]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.716643]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.716645]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.716647]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.716649]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.716651]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.716653]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.716655]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.716657]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.716659]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.716661]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.716663]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.716665]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.716667]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.716669]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.716671]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.716673]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.716675]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.716677]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.716679] INFO: task vol_id:24112 blocked for more than 120 seconds.
[ 2159.716761]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.716840] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.716952] vol_id          D ffff88041fd54ac0     0 24112      1 0x20020000
[ 2159.716953]  ffff8803e65c0000 0000000000000086 0000000000000029 ffff88040affe080
[ 2159.716955]  ffff8803e7f9e010 0000000000014ac0 0000000000014ac0 ffff8803e65c0000
[ 2159.716958]  0000000000014ac0 ffff8803e7f9ffd8 ffff8803e7f9ffd8 0000000000014ac0
[ 2159.716960] Call Trace:
[ 2159.716962]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.716964]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[ 2159.716966]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[ 2159.716968]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[ 2159.716970]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.716972]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.716975]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.716977]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.716989]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.716991]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.716993]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.716995]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.716997]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.716999]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717001]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.717003]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717005]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.717006]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.717008]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.717010]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.717012]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.717014]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.717016]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.717018]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.717020]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717022]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.717023]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.717025]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.717027] INFO: task vol_id:24120 blocked for more than 120 seconds.
[ 2159.717109]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.717189] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.717300] vol_id          D ffff88041fd54ac0     0 24120      1 0x20020000
[ 2159.717302]  ffff8803e6753000 0000000000000086 ffff8803e6776000 000088040ab18780
[ 2159.717304]  ffff8803e6776010 0000000000014ac0 0000000000014ac0 ffff8803e6753000
[ 2159.717306]  0000000000014ac0 ffff8803e6777fd8 ffff8803e6777fd8 0000000000014ac0
[ 2159.717308] Call Trace:
[ 2159.717310]  [<ffffffff810eb3ea>] ? __alloc_pages_nodemask+0x190/0x910
[ 2159.717312]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[ 2159.717315]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.717317]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.717319]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.717320]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.717332]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.717335]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.717336]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.717339]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.717340]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.717342]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717344]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.717346]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717348]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.717350]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.717352]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.717354]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.717356]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.717358]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.717360]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.717361]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.717363]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717365]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.717367]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.717369]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.717371] INFO: task vol_id:24128 blocked for more than 120 seconds.
[ 2159.717453]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.717533] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.717644] vol_id          D ffff88041fc14ac0     0 24128      1 0x20020000
[ 2159.717645]  ffff880409ec6000 0000000000000086 ffff8803e6756000 000088040ccb17c0
[ 2159.717648]  ffff8803e6756010 0000000000014ac0 0000000000014ac0 ffff880409ec6000
[ 2159.717650]  0000000000014ac0 ffff8803e6757fd8 ffff8803e6757fd8 0000000000014ac0
[ 2159.717652] Call Trace:
[ 2159.717654]  [<ffffffff810eb3ea>] ? __alloc_pages_nodemask+0x190/0x910
[ 2159.717656]  [<ffffffff81104fd7>] ? handle_mm_fault+0x9c3/0x9f8
[ 2159.717658]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.717660]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.717662]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.717664]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.717676]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.717678]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.717680]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.717682]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.717684]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.717686]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717688]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.717690]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717692]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.717694]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.717695]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.717697]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.717699]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.717701]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.717703]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.717705]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.717707]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.717709]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.717711]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.717713]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.717714] INFO: task vol_id:24136 blocked for more than 120 seconds.
[ 2159.717797]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.717876] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.717987] vol_id          D ffff88041fc14ac0     0 24136      1 0x20020000
[ 2159.717989]  ffff8803e676f000 0000000000000086 0000000000000029 ffff88040bd15e80
[ 2159.717991]  ffff8803e605c010 0000000000014ac0 0000000000014ac0 ffff8803e676f000
[ 2159.717993]  0000000000014ac0 ffff8803e605dfd8 ffff8803e605dfd8 0000000000014ac0
[ 2159.717995] Call Trace:
[ 2159.717997]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.717999]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[ 2159.718001]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[ 2159.718003]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[ 2159.718006]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.718008]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.718010]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.718012]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.718024]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.718026]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.718028]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.718030]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.718031]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.718033]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718035]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.718037]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718039]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.718041]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.718043]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.718045]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.718047]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.718049]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.718050]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.718052]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.718054]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718056]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.718058]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.718060]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.718062] INFO: task vol_id:24144 blocked for more than 120 seconds.
[ 2159.718144]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.718223] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.718334] vol_id          D ffff88041fcd4ac0     0 24144      1 0x20020000
[ 2159.718336]  ffff8803e605f000 0000000000000086 0000000000000029 ffff88040cd0aac0
[ 2159.718338]  ffff8803e661e010 0000000000014ac0 0000000000014ac0 ffff8803e605f000
[ 2159.718340]  0000000000014ac0 ffff8803e661ffd8 ffff8803e661ffd8 0000000000014ac0
[ 2159.718343] Call Trace:
[ 2159.718345]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.718347]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[ 2159.718349]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[ 2159.718351]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[ 2159.718353]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.718355]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.718357]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.718359]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.718371]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.718373]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.718375]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.718377]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.718379]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.718381]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718383]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.718385]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718387]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.718388]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.718390]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.718392]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.718394]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.718396]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.718398]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.718400]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.718402]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718404]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.718405]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.718407]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[ 2159.718409] INFO: task vol_id:24152 blocked for more than 120 seconds.
[ 2159.718492]       Tainted: P           O 3.12.0+scst+tf.1 #5
[ 2159.718572] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2159.718683] vol_id          D ffff88041fc14ac0     0 24152      1 0x20020000
[ 2159.718685]  ffff8803e6750000 0000000000000086 0000000000000029 ffff88040ccb16c0
[ 2159.718687]  ffff8803e61c8010 0000000000014ac0 0000000000014ac0 ffff8803e6750000
[ 2159.718689]  0000000000014ac0 ffff8803e61c9fd8 ffff8803e61c9fd8 0000000000014ac0
[ 2159.718691] Call Trace:
[ 2159.718693]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[ 2159.718695]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[ 2159.718697]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[ 2159.718699]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[ 2159.718701]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[ 2159.718703]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[ 2159.718705]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[ 2159.718707]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[ 2159.718719]  [<ffffffffa0826d05>] ? zvol_open+0x4d/0x276 [zfs]
[ 2159.718721]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[ 2159.718723]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[ 2159.718725]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[ 2159.718727]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[ 2159.718729]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718731]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[ 2159.718733]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718735]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[ 2159.718737]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[ 2159.718738]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[ 2159.718740]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[ 2159.718742]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[ 2159.718744]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[ 2159.718746]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[ 2159.718748]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[ 2159.718750]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[ 2159.718752]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[ 2159.718754]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[ 2159.718756]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13

Yes, this is zfs-crypto, but it is now completely in line with ZoL/master, so I'm a little unsure whether it has something to do with the crypto code or with ZoL...

Since I only have one filesystem that is currently encrypted, I'm prepared to 'downgrade' (i.e. remove the crypto feature) to do some testing with ZoL/master if needed.

behlendorf commented 10 years ago

This looks like some sort of deadlock between pool import and zvol device creation. The fact that suppressing zvol creation with zvol_inhibit_dev avoids the hang would support that. Also, just to be clear, this isn't an OOPS, it's just a deadlock which triggered the slow task watchdog. Still, it's absolutely not good.
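
A minimal way to test that theory, assuming the stock zvol_inhibit_dev module parameter and the 'share' pool from this thread, is to load the module with zvol device creation suppressed and try the import again:

modprobe zfs zvol_inhibit_dev=1    # skip creating the /dev/zd* minors
zpool import share                 # if this no longer hangs, the zvol creation path is implicated

(or the equivalent 'options zfs zvol_inhibit_dev=1' line in /etc/modprobe.d/, as used elsewhere in this thread).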

FransUrbo commented 10 years ago

@behlendorf Any idea how I should proceed [with finding out what/where the deadlock might be]?

Setting zvol_inhibit_dev to 0 and creating a new filesystem works fine. The ZVOL device is where it's supposed to be in /dev. However, accessing it doesn't seem possible.

I ran fdisk on it after creating the ZVOL, and that hung, as did 'txg_sync'. I got that not-an-oops again, and all writes to the/a filesystem froze...

FransUrbo commented 10 years ago

Doing some more debugging, after cherry-picking some stuff that looks fine, I have:

 4287 ?        S<s    0:01 udevd --daemon
10882 ?        S<     0:00  \_ udevd --daemon
10883 ?        D<     0:01  |   \_ /sbin/zpool list

Doing an strace on these three processes, I see:

[pid  4287] stat64("/dev/.udev/failed", {st_mode=S_IFDIR|0755, st_size=2420, ...}) = 0
[pid  4287] rename("/dev/.udev/queue/4319", "/dev/.udev/failed/\\x2fblock\\x2fzd1024") = 0
[pid  4287] rmdir("/dev/.udev/queue")   = -1 ENOTEMPTY (Directory not empty)
[pid  4287] waitpid(-1, 0xffd91a70, WNOHANG) = 0
[pid  4287] select(7, [3 4 5 6], NULL, NULL, NULL

It then hangs on the select.

Looking in /dev/.udev/:

celia:~# ll /dev/.udev/
total 4
drwxr-xr-x    2 root root 18160 Nov 18 08:39 db/
drwxr-xr-x    2 root root  4700 Nov 18 08:46 failed/
drwxr-xr-x 1944 root root 38880 Nov 18 08:39 names/
drwxr-xr-x    2 root root    60 Nov 18 08:37 rules.d/
-rw-r--r--    1 root root     5 Nov 18 08:40 uevent_seqnum
celia:~# ll /dev/.udev/failed/ | head
total 0
lrwxrwxrwx 1 root root 10 Nov 18 08:39 \x2fblock\x2fzd0 -> /block/zd0
lrwxrwxrwx 1 root root 13 Nov 18 08:40 \x2fblock\x2fzd1008 -> /block/zd1008
lrwxrwxrwx 1 root root 22 Nov 18 08:40 \x2fblock\x2fzd1008\x2fzd1008p1 -> /block/zd1008/zd1008p1
lrwxrwxrwx 1 root root 22 Nov 18 08:40 \x2fblock\x2fzd1008\x2fzd1008p2 -> /block/zd1008/zd1008p2
lrwxrwxrwx 1 root root 22 Nov 18 08:40 \x2fblock\x2fzd1008\x2fzd1008p3 -> /block/zd1008/zd1008p3
lrwxrwxrwx 1 root root 22 Nov 18 08:40 \x2fblock\x2fzd1008\x2fzd1008p4 -> /block/zd1008/zd1008p4
lrwxrwxrwx 1 root root 13 Nov 18 08:40 \x2fblock\x2fzd1024 -> /block/zd1024
lrwxrwxrwx 1 root root 12 Nov 18 08:39 \x2fblock\x2fzd112 -> /block/zd112
lrwxrwxrwx 1 root root 12 Nov 18 08:39 \x2fblock\x2fzd128 -> /block/zd128
celia:~# ll /dev/.udev/failed/|tail
lrwxrwxrwx 1 root root 32 Nov 18 08:37 \x2fdevices\x2fpci0000:00\x2f0000:00:18.1 -> /devices/pci0000:00/0000:00:18.1
lrwxrwxrwx 1 root root 32 Nov 18 08:37 \x2fdevices\x2fpci0000:00\x2f0000:00:18.2 -> /devices/pci0000:00/0000:00:18.2
lrwxrwxrwx 1 root root 32 Nov 18 08:37 \x2fdevices\x2fpci0000:00\x2f0000:00:18.4 -> /devices/pci0000:00/0000:00:18.4
lrwxrwxrwx 1 root root 34 Nov 18 08:37 \x2fdevices\x2fplatform\x2fSP5100 TCO timer -> /devices/platform/SP5100 TCO timer
lrwxrwxrwx 1 root root 40 Nov 18 08:37 \x2fdevices\x2fplatform\x2fplatform-framebuffer.0 -> /devices/platform/platform-framebuffer.0
lrwxrwxrwx 1 root root 25 Nov 18 08:39 \x2fdevices\x2fvirtual\x2fmisc\x2fzfs -> /devices/virtual/misc/zfs
lrwxrwxrwx 1 root root 57 Nov 18 08:37 \x2fdevices\x2fvirtual\x2fwmi\x2f05901221-D566-11D1-B2F0-00A0C9062910 -> /devices/virtual/wmi/05901221-D566-11D1-B2F0-00A0C9062910
lrwxrwxrwx 1 root root 57 Nov 18 08:37 \x2fdevices\x2fvirtual\x2fwmi\x2f466747A0-70EC-11DE-8A39-0800200C9A66 -> /devices/virtual/wmi/466747A0-70EC-11DE-8A39-0800200C9A66
lrwxrwxrwx 1 root root 57 Nov 18 08:37 \x2fdevices\x2fvirtual\x2fwmi\x2f97845ED0-4E6D-11DE-8A39-0800200C9A66 -> /devices/virtual/wmi/97845ED0-4E6D-11DE-8A39-0800200C9A66
lrwxrwxrwx 1 root root 57 Nov 18 08:37 \x2fdevices\x2fvirtual\x2fwmi\x2fABBC0F72-8EA1-11D1-00A0-C90629100000 -> /devices/virtual/wmi/ABBC0F72-8EA1-11D1-00A0-C90629100000
celia:~# ll /dev/.udev/db/| head
total 3624
-rw-r--r-- 1 root root  30 Nov 18 08:37 \x2fblock\x2floop0
-rw-r--r-- 1 root root  30 Nov 18 08:37 \x2fblock\x2floop1
-rw-r--r-- 1 root root  33 Nov 18 08:37 \x2fblock\x2floop10
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop100
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop101
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop102
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop103
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop104
-rw-r--r-- 1 root root  36 Nov 18 08:37 \x2fblock\x2floop105
celia:~# ll /dev/.udev/names/| head
total 0
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:0/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:1/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:10/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:11/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:12/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:13/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:14/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:15/
drwxr-xr-x 2 root root 60 Nov 18 08:37 block\x2f1:2/
celia:~# ll /dev/.udev/names/block\\x2f1\:0/
total 0
-rw-r--r-- 1 root root 0 Nov 18 08:37 \x2fblock\x2fram0

Since the process list indicates that it's the 'zpool list' that hangs, running this under strace shows:

[....]
access("/usr/bin/net", X_OK)            = 0
access("/proc/net/iet/volume", F_OK)    = -1 ENOENT (No such file or directory)
stat64("/sys/kernel/scst_tgt", 0xfff16794) = -1 ENOENT (No such file or directory)
access("/sys/module/zfs", F_OK)         = 0
open("/dev/zfs", O_RDWR|O_LARGEFILE)    = 3
open("/etc/mtab", O_RDONLY|O_LARGEFILE) = 4
open("/etc/dfs/sharetab", O_RDONLY|O_LARGEFILE) = 5
open("/dev/zfs", O_RDWR|O_LARGEFILE)    = 6
fstat64(5, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf772b000
_llseek(5, 0, [0], SEEK_SET)            = 0
read(5, ""..., 4096)                    = 0
ioctl(3, ITE_GPIO_OUT_STATUS

and then it hangs at the ioctl...
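
One way to see where that ioctl is actually stuck in the kernel (a sketch, assuming a kernel built with CONFIG_STACKTRACE so /proc/<pid>/stack exists):

cat /proc/$(pidof zpool)/stack   # kernel-side stack of the hung zpool process (pick the right PID if there are several)
echo t > /proc/sysrq-trigger     # or dump all task states to the kernel log (the sysrq 't' dump)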

FransUrbo commented 10 years ago

After a restart (so I could try again):

 4313 ?        S<s    0:01 udevd --daemon
11816 ?        S<     0:00  \_ udevd --daemon
11819 ?        D<     0:00  |   \_ /lib/udev/vol_id --export /dev/.tmp-230-17

and running that vol_id under strace, I see:

set_thread_area({entry_number:-1 -> 12, base_addr:0xf75afad0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xf76e8000, 4096, PROT_READ)   = 0
munmap(0xf76f6000, 89847)               = 0
open("/dev/.tmp-230-17", O_RDONLY|O_LARGEFILE

So maybe it's LARGEFILE support that's the problem?!

celia:/dev# df -h .
Filesystem            Size  Used Avail Use% Mounted on
udev                   10M  3.6M  6.5M  36% /dev
FransUrbo commented 10 years ago

Well, I couldn't find anything that supports that, but what's (also) weird:

celia:~# ll /dev/.udev/db/*zd*
-rw-r--r-- 1 root root 29 Nov 18 10:47 /dev/.udev/db/\x2fdevices\x2fvirtual\x2ftty\x2fptyzd
-rw-r--r-- 1 root root 29 Nov 18 10:47 /dev/.udev/db/\x2fdevices\x2fvirtual\x2ftty\x2fttyzd

Shouldn't the zd* devices be there? I CAN see that udev IS reading/loading the ZoL rules...

FransUrbo commented 10 years ago

Might be something wrong with udev:

celia:~# udevinfo --export-db | grep zd
P: /devices/virtual/tty/ptyzd
N: ptyzd
P: /devices/virtual/tty/ttyzd
N: ttyzd
celia:~# 
FransUrbo commented 10 years ago

Running zvol_id under strace:

stat64("/dev/.tmp-230-1008", {st_mode=S_IFBLK|0600, st_rdev=makedev(230, 1008), ...}) = 0
open("/dev/.tmp-230-1008", O_RDONLY|O_LARGEFILE
FransUrbo commented 10 years ago

A full log of a strace of the original (mother) udev, a child and the zpool list processes can be found at http://bayour.com/misc/strace_zpool_import-full.txt.

I'm currently looking at it myself, but it's 18M and I can't see anything obvious :(

I also put the strace log from a manual zpool list at http://bayour.com/misc/strace_zpool_import.txt just for completeness and there I can see that the ioctl() hangs on /dev/zfs.

FransUrbo commented 10 years ago

Any access to a ZVOL, even with 'zvol_inhibit_dev=1' hangs txg_sync. Creating one works, but any access to it hangs...

FransUrbo commented 10 years ago

@behlendorf do you have any input? Any idea how to proceed?

FransUrbo commented 10 years ago

Loading the module without zvol_inhibit_dev and then restarting udevd gives:

celia:~# /etc/init.d/udev start
Starting the hotplug events dispatcher: udevd.
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...

After the udevadm settle timeout, the events queue contains:

5065: /devices/virtual/misc/zfs
5699: /devices/virtual/block/zd0
5701: /devices/virtual/block/zd16
5702: /devices/virtual/block/zd16/zd16p1
5703: /devices/virtual/block/zd16/zd16p2
5704: /devices/virtual/block/zd16/zd16p3
[....]
5934: /devices/virtual/block/zd1008
5935: /devices/virtual/block/zd1008/zd1008p1
5936: /devices/virtual/block/zd1008/zd1008p2
5937: /devices/virtual/block/zd1008/zd1008p3
5938: /devices/virtual/block/zd1008/zd1008p4

done (timeout).

Then trying to run 'zpool list' hangs zpool, txg_sync and vol_id as usual.
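
As a hedged aside, the queue can also be watched directly while those events drain, assuming a udev version that provides udevadm monitor and settle (the settle timeout above suggests it does):

udevadm monitor --kernel --udev &   # print events as the kernel emits them and as udev finishes processing them
udevadm settle --timeout=180        # block until the queue is empty, or give up after 180 seconds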

FransUrbo commented 10 years ago

Disabling the ZoL udev rules completely does not change anything. Still get the hang...

FransUrbo commented 10 years ago

The syslog after module loaded (includes output from sysrq 't' key) - http://bayour.com/misc/syslog_zpool_import.txt.

FransUrbo commented 10 years ago

Now I can't even mount all filesystems! It fails after a while.

[  191.386413] ZFS: Loaded module v0.6.2-1.20131123_crypto_iscsi_mine_cherry, ZFS pool version 5000, ZFS filesystem version 5
[  393.786148] SPL: using hostid 0xa8c00845
[  721.064113] INFO: task mount.zfs:11173 blocked for more than 120 seconds.
[  721.064202]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  721.064282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  721.064395] mount.zfs       D ffff88041fc54ac0     0 11173  11172 0x20020000
[  721.064399]  ffff8803f0f3e000 0000000000200086 ffff8803f08d7bb8 00000000000009fc
[  721.064402]  ffff8803f08d6010 0000000000014ac0 0000000000014ac0 ffff8803f0f3e000
[  721.064405]  0000000000014ac0 ffff8803f08d7fd8 ffff8803f08d7fd8 0000000000014ac0
[  721.064407] Call Trace:
[  721.064415]  [<ffffffff81110ddf>] ? __insert_vmap_area+0x75/0xc1
[  721.064420]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[  721.064427]  [<ffffffffa07bf2d2>] ? cv_wait_common+0xd4/0x164 [spl]
[  721.064431]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[  721.064435]  [<ffffffffa07b72fc>] ? vmem_alloc_debug+0x104/0x2a1 [spl]
[  721.064456]  [<ffffffffa08afa33>] ? txg_wait_synced+0x134/0x156 [zfs]
[  721.064470]  [<ffffffffa08f56a3>] ? zpl_mount+0x21/0x21 [zfs]
[  721.064484]  [<ffffffffa08ebad9>] ? zil_replay+0x79/0xe8 [zfs]
[  721.064508]  [<ffffffffa08dc3ba>] ? zfs_sb_setup+0xe6/0x109 [zfs]
[  721.064521]  [<ffffffffa08dc9c0>] ? zfs_domount+0x247/0x2c3 [zfs]
[  721.064523]  [<ffffffff8112a52b>] ? get_anon_bdev+0xd9/0xd9
[  721.064535]  [<ffffffffa08f56ac>] ? zpl_fill_super+0x9/0xd [zfs]
[  721.064537]  [<ffffffff8112bb28>] ? mount_nodev+0x46/0x7f
[  721.064549]  [<ffffffffa08f569e>] ? zpl_mount+0x1c/0x21 [zfs]
[  721.064551]  [<ffffffff8112b968>] ? mount_fs+0x6a/0x14b
[  721.064554]  [<ffffffff81142cee>] ? vfs_kern_mount+0x60/0xde
[  721.064556]  [<ffffffff81143577>] ? do_mount+0x7e2/0x953
[  721.064558]  [<ffffffff8116836f>] ? compat_sys_mount+0x1b4/0x1f0
[  721.064561]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[  840.951724] INFO: task mount.zfs:11173 blocked for more than 120 seconds.
[  840.951811]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  840.951892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  840.952003] mount.zfs       D ffff88041fc54ac0     0 11173  11172 0x20020000
[  840.952007]  ffff8803f0f3e000 0000000000200086 ffff8803f08d7bb8 00000000000009fc
[  840.952010]  ffff8803f08d6010 0000000000014ac0 0000000000014ac0 ffff8803f0f3e000
[  840.952013]  0000000000014ac0 ffff8803f08d7fd8 ffff8803f08d7fd8 0000000000014ac0
[  840.952015] Call Trace:
[  840.952021]  [<ffffffff81110ddf>] ? __insert_vmap_area+0x75/0xc1
[  840.952034]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[  840.952039]  [<ffffffffa07bf2d2>] ? cv_wait_common+0xd4/0x164 [spl]
[  840.952042]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[  840.952046]  [<ffffffffa07b72fc>] ? vmem_alloc_debug+0x104/0x2a1 [spl]
[  840.952064]  [<ffffffffa08afa33>] ? txg_wait_synced+0x134/0x156 [zfs]
[  840.952077]  [<ffffffffa08f56a3>] ? zpl_mount+0x21/0x21 [zfs]
[  840.952090]  [<ffffffffa08ebad9>] ? zil_replay+0x79/0xe8 [zfs]
[  840.952103]  [<ffffffffa08dc3ba>] ? zfs_sb_setup+0xe6/0x109 [zfs]
[  840.952116]  [<ffffffffa08dc9c0>] ? zfs_domount+0x247/0x2c3 [zfs]
[  840.952118]  [<ffffffff8112a52b>] ? get_anon_bdev+0xd9/0xd9
[  840.952131]  [<ffffffffa08f56ac>] ? zpl_fill_super+0x9/0xd [zfs]
[  840.952133]  [<ffffffff8112bb28>] ? mount_nodev+0x46/0x7f
[  840.952145]  [<ffffffffa08f569e>] ? zpl_mount+0x1c/0x21 [zfs]
[  840.952147]  [<ffffffff8112b968>] ? mount_fs+0x6a/0x14b
[  840.952149]  [<ffffffff81142cee>] ? vfs_kern_mount+0x60/0xde
[  840.952151]  [<ffffffff81143577>] ? do_mount+0x7e2/0x953
[  840.952153]  [<ffffffff8116836f>] ? compat_sys_mount+0x1b4/0x1f0
[  840.952156]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13

Output from a sysrq 't' key: http://bayour.com/misc/syslog_zfs_mount_all.txt.

Since I see a 'txg_wait_synced' in there, I checked the 'zfs_txg_timeout' value, and it's at the default '5'. I'll reboot and increase that to ... 10?
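
For reference, zfs_txg_timeout can also be inspected and changed at runtime through the module parameter interface (a sketch, assuming sysfs is mounted in the usual place), so a reboot shouldn't strictly be needed for that:

cat /sys/module/zfs/parameters/zfs_txg_timeout         # current value, default 5
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout   # new value takes effect immediately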

FransUrbo commented 10 years ago

Trying to remove the cache device (after a successful import with zfs_txg_timeout=10):

Nov 27 12:54:29 Celia kernel: [  481.292664] INFO: task l2arc_feed:10711 blocked for more than 120 seconds.
Nov 27 12:54:29 Celia kernel: [  481.292753]       Tainted: P           O 3.12.0+scst+tf.1 #5
Nov 27 12:54:29 Celia kernel: [  481.292834] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 27 12:54:29 Celia kernel: [  481.292947] l2arc_feed      D ffff88041fd14ac0     0 10711      2 0x00000000
Nov 27 12:54:29 Celia kernel: [  481.292950]  ffff8800dc95d000 0000000000000046 ffff880037ef9fd8 ffff880037ef9fd8
Nov 27 12:54:29 Celia kernel: [  481.292954]  ffff880037ef8010 0000000000014ac0 0000000000014ac0 ffff8800dc95d000
Nov 27 12:54:29 Celia kernel: [  481.292957]  0000000000014ac0 ffff880037ef9fd8 ffff880037ef9fd8 0000000000014ac0
Nov 27 12:54:29 Celia kernel: [  481.292959] Call Trace:
Nov 27 12:54:29 Celia kernel: [  481.292967]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
Nov 27 12:54:29 Celia kernel: [  481.292972]  [<ffffffff81059d71>] ? lock_timer_base+0x26/0x4b
Nov 27 12:54:29 Celia kernel: [  481.292974]  [<ffffffff81059de0>] ? try_to_del_timer_sync+0x4a/0x54
Nov 27 12:54:29 Celia kernel: [  481.292976]  [<ffffffff81059e24>] ? del_timer_sync+0x3a/0x44
Nov 27 12:54:29 Celia kernel: [  481.292979]  [<ffffffff813c1a61>] ? schedule_timeout+0x191/0x1af
Nov 27 12:54:29 Celia kernel: [  481.292982]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
Nov 27 12:54:29 Celia kernel: [  481.292984]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
Nov 27 12:54:29 Celia kernel: [  481.292987]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
Nov 27 12:54:29 Celia kernel: [  481.292989]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
Nov 27 12:54:29 Celia kernel: [  481.293007]  [<ffffffffa076d589>] ? l2arc_feed_thread+0xa3/0x998 [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293020]  [<ffffffffa076d4e6>] ? arc_freed+0x91/0x91 [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293025]  [<ffffffffa06c36dc>] ? __thread_create+0x2e4/0x2e4 [spl]
Nov 27 12:54:29 Celia kernel: [  481.293029]  [<ffffffffa06c3746>] ? thread_generic_wrapper+0x6a/0x73 [spl]
Nov 27 12:54:29 Celia kernel: [  481.293031]  [<ffffffff8106a781>] ? kthread+0xb0/0xba
Nov 27 12:54:29 Celia kernel: [  481.293034]  [<ffffffff8106a6d1>] ? __init_kthread_worker+0x2d/0x2d
Nov 27 12:54:29 Celia kernel: [  481.293036]  [<ffffffff813c590c>] ? ret_from_fork+0x7c/0xb0
Nov 27 12:54:29 Celia kernel: [  481.293038]  [<ffffffff8106a6d1>] ? __init_kthread_worker+0x2d/0x2d
Nov 27 12:54:29 Celia kernel: [  481.293056] INFO: task zpool:11009 blocked for more than 120 seconds.
Nov 27 12:54:29 Celia kernel: [  481.293138]       Tainted: P           O 3.12.0+scst+tf.1 #5
Nov 27 12:54:29 Celia kernel: [  481.293218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 27 12:54:29 Celia kernel: [  481.293329] zpool           D ffff88041fd54ac0     0 11009  10625 0x20020000
Nov 27 12:54:29 Celia kernel: [  481.293331]  ffff8803f60c9000 0000000000200086 0000000000200282 0000000000000018
Nov 27 12:54:29 Celia kernel: [  481.293333]  ffff88040b21e010 0000000000014ac0 0000000000014ac0 ffff8803f60c9000
Nov 27 12:54:29 Celia kernel: [  481.293336]  0000000000014ac0 ffff88040b21ffd8 ffff88040b21ffd8 0000000000014ac0
Nov 27 12:54:29 Celia kernel: [  481.293338] Call Trace:
Nov 27 12:54:29 Celia kernel: [  481.293342]  [<ffffffffa0743034>] ? nvs_native_nvpair+0x1f/0x7a [znvpair]
Nov 27 12:54:29 Celia kernel: [  481.293353]  [<ffffffffa076ce65>] ? arc_buf_remove_ref+0xde/0xea [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293356]  [<ffffffffa0742a1b>] ? nvs_operation+0x7e/0x280 [znvpair]
Nov 27 12:54:29 Celia kernel: [  481.293359]  [<ffffffffa0743009>] ? nvs_native_nvp_op+0x1e0/0x1ec [znvpair]
Nov 27 12:54:29 Celia kernel: [  481.293362]  [<ffffffffa0743034>] ? nvs_native_nvpair+0x1f/0x7a [znvpair]
Nov 27 12:54:29 Celia kernel: [  481.293365]  [<ffffffff810e4b89>] ? find_get_page+0x18/0x6c
Nov 27 12:54:29 Celia kernel: [  481.293367]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
Nov 27 12:54:29 Celia kernel: [  481.293371]  [<ffffffffa06ca2d2>] ? cv_wait_common+0xd4/0x164 [spl]
Nov 27 12:54:29 Celia kernel: [  481.293373]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
Nov 27 12:54:29 Celia kernel: [  481.293374]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
Nov 27 12:54:29 Celia kernel: [  481.293391]  [<ffffffffa07b7d47>] ? spa_config_enter+0x9b/0xf0 [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293405]  [<ffffffffa07b811f>] ? spa_vdev_config_enter+0x16/0x22 [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293420]  [<ffffffffa07ac8a4>] ? spa_vdev_remove+0x40/0x330 [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293424]  [<ffffffffa06c25da>] ? kmem_alloc_debug+0x141/0x2c9 [spl]
Nov 27 12:54:29 Celia kernel: [  481.293437]  [<ffffffffa07dc651>] ? zfs_ioc_vdev_remove+0x33/0x4f [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293451]  [<ffffffffa07dee62>] ? zfsdev_ioctl+0x326/0x3ba [zfs]
Nov 27 12:54:29 Celia kernel: [  481.293454]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
Nov 27 12:54:29 Celia kernel: [  481.293457]  [<ffffffff8110a038>] ? remove_vma+0x44/0x4b
Nov 27 12:54:29 Celia kernel: [  481.293459]  [<ffffffff8110b590>] ? do_munmap+0x2dd/0x2f9
Nov 27 12:54:29 Celia kernel: [  481.293461]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
FransUrbo commented 10 years ago

Has #938 resurfaced? My txg_sync process is also at 100% after a call trace like https://github.com/zfsonlinux/zfs/issues/1862#issuecomment-29376714, which hangs 'zfs mount -av' at 12/628.

During the import:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10716 root      20   0  8864 2320 1676 R  101  0.0  27:56.87 zpool
10937 root       0 -20     0    0    0 R  101  0.0  27:57.50 txg_sync

101%!? Cool! :)

Trying options wildly.

celia:~# grep ^options /etc/modprobe.d/zfs
options zfs zfs_arc_max=4294967296
options zfs zfs_arc_min=536870912
options zfs zvol_threads=64
FransUrbo commented 10 years ago

A zpool import (no mount) is still running after almost an hour. Increasing arc_max to 10GB doesn't seem to change anything (as yet). No call trace or hung process though, that's good. I'm going to let it run and see what happens...

Load average is at 2.10/2.20/2.11, an strace on the zpool process doesn't show anything (and yet it runs at 100% CPU), and I don't know how to trace a kernel process (txg_sync, also at 100%)...
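
A sketch of how a kernel thread like txg_sync can be inspected without strace (assuming /proc/<pid>/stack is available, i.e. CONFIG_STACKTRACE, and that perf is installed):

pid=$(pgrep -x txg_sync)   # PID of the kernel thread
cat /proc/$pid/stack       # its current kernel-side stack
perf top -p $pid           # sample where it is burning its 100% CPU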

FransUrbo commented 10 years ago

After running zpool import for 80 minutes with a huge ARC max, memory is barely touched:

celia:~# free
             total       used       free     shared    buffers     cached
Mem:      16407640     665372   15742268          0      83192     176252
-/+ buffers/cache:     405928   16001712
Swap:      8393956          0    8393956
FransUrbo commented 10 years ago

A 'zdb share' gives

celia:~# zdb share
zdb: can't open 'share': Input/output error

An strace output of this can be found at http://bayour.com/misc/strace_zdb.txt.

dweeezil commented 10 years ago

@FransUrbo Not much information there. Try doing an strace -o <file> -ff. You'll find that it generates a ton of subprocesses.

FransUrbo commented 10 years ago

@dweeezil which process do you want me to strace? I think I've done it on every one I could think of...

FransUrbo commented 10 years ago
celia:~# strace -o /tmp/x -ff zdb share
zdb: can't open 'share': Input/output error
upeek: ptrace(PTRACE_PEEKUSER,4587,44,0): No such process

And there's no /tmp/x file...

Trying again:

celia:~# strace -o /tmp/x -ff zdb share
zdb: can't open 'share': Input/output error
celia:~# ll /tmp/x
/bin/ls: cannot access /tmp/x: No such file or directory
dweeezil commented 10 years ago

@FransUrbo The zdb command forks/clones itself a lot, and most of the interesting action that might help figure out your Input/output error is in one of the forked processes. The processes are all zdb. When you use "-o /tmp/x -ff", the output files will be named "/tmp/x.<pid>" and there should be a whole lot of them. Did you look in /tmp to see if there were any files there?

# strace -o /tmp/x -ff zdb junk1 > /dev/null
# ls /tmp/x.*
/tmp/x.21772  /tmp/x.21797  /tmp/x.21822  /tmp/x.21847 ...

I'd suspect that one of those trace files should give a clue as to where your I/O error is occurring.

FransUrbo commented 10 years ago

Ah! I've never used double F's, single yes, but not double. Much simpler than getting it all in one file :)

Uploaded to http://bayour.com/misc/zdb_share/.

dweeezil commented 10 years ago

@FransUrbo The strace output doesn't show anything terribly interesting because a lot of the work is being done in the kernel. The first thing I'd do is to rule out a corrupted cache file. I'd also suggest backing off to a simple zdb -d share which won't try to do so much.

In order to check your cache file, I'd suggest renaming it with mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.save. When running zdb with no cache file, you need to add the "-e" option and I also recommend an appropriate "-p" option which from the looks of your strace output would end up as zdb -e -p /dev/disk/by-id .... However, once the cache file is renamed, it would probably be worth backing off to testing with zpool import. Either try a plain zpool import or a zpool import -d /dev/disk/by-id and see whether it can find the pool. Once that works, you can start trying various zdb commands as indicated above.
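
Condensed into commands, the sequence above would look roughly like this (a sketch; the pool name and search path are taken from this thread and may need adjusting):

mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.save   # take the cache file out of the picture
zpool import -d /dev/disk/by-id                     # does it find the pool at all?
zpool import -d /dev/disk/by-id share               # then actually import it
zdb -e -p /dev/disk/by-id -d share                  # zdb without relying on a cache file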

Your strace output shows the following devices being opened:

/dev/disk/by-id/ata-ST3000DM001-9YN166_Z1F0X41F-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4XK4T-part1
/dev/disk/by-id/ata-ST3000DM001-9YN166_Z1F0X3P4-part1
/dev/disk/by-id/scsi-SATA_ST1500DL003-9VT_5YD1F2KF-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4VT5X-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS3SAWS-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4WM30-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS0H3A9-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4WWPA-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4Q38C-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4VT5R-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4R2MJ-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS4Q3F4-part1
/dev/disk/by-id/scsi-SATA_ST31500341AS_9VS13W11-part1
/dev/disk/by-id/scsi-SATA_ST3000DM001-9YN_Z1F0WJSN-part1

Are those the only devices in your pool?

FransUrbo commented 10 years ago

I'm going to try the different zdb command lines, but in the meantime - those disks look like all of them: 15 disks (3x3TB + 12x1.5TB). I also have a 120GB Corsair SSD as a cache device, but I've removed that from the system, though not from the pool (can't - any zpool remove/offline etc. gives me a lockup).

        NAME                                            STATE     READ WRITE CKSUM
        share                                           ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            scsi-SATA_ST3000DM001-9YN_Z1F0WJSN          ONLINE       0     0     0
            ata-ST3000DM001-9YN166_Z1F0X41F             ONLINE       0     0     0
            ata-ST3000DM001-9YN166_Z1F0X3P4             ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4XK4T             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4Q3F4             ONLINE       0     0     0
            scsi-SATA_ST1500DL003-9VT_5YD1F2KF          ONLINE       0     0     0
          raidz1-2                                      ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS3SAWS             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4R2MJ             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS13W11             ONLINE       0     0     0
          raidz1-3                                      ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4VT5R             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4Q38C             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4WM30             ONLINE       0     0     0
          raidz1-4                                      ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4VT5X             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS4WWPA             ONLINE       0     0     0
            scsi-SATA_ST31500341AS_9VS0H3A9             ONLINE       0     0     0
        cache
          ata-Corsair_Force_3_SSD_11486508000008952122  UNAVAIL      0     0     0

Strangely enough (and probably unrelated to the issue), all the disks that are now prefixed with ata- used to be prefixed with scsi- as well a few days ago.

FransUrbo commented 10 years ago

Running 'zdb -d share' gave an I/O error:

celia(SSD1):~# strace -o /tmp/strace_-d_share -ff zdb -d share
zdb: can't open 'share': Input/output error

The strace output can be found at http://bayour.com/misc/strace_zdb_-d_share/.

Loading the module, importing the pool, removing the cache file, rebooting (just to be sure :), and loading the module again shows 'no pools available'. Importing the pool with 'zpool import -d /dev/disk/by-id -aNf' (had to force it for some reason) worked just fine.

Exporting the pool and then running 'zdb -e -p /dev/disk/by-id' gave http://bayour.com/misc/strace_zdb_-e_-p/zdb_-e_-p.out, and the strace output is at http://bayour.com/misc/strace_zdb_-e_-p

dweeezil commented 10 years ago

I had expected the plain zdb -d share to give the same I/O error.

Your second set of steps is a bit confusing. You said "Loading the module, importing the pool": was that a "modprobe zfs" followed by a "zpool import" or were you referring to the auto-import caused by the presence of the cache file? After removing the cache file and rebooting, and loading the module, it would be expected behavior for "zpool status" (I presume that's what you ran which said "no pools available") to say "no pools available" because there's no cache file and no pools have yet been imported.

Next, you imported the pool which appeared to work. When did importing the pool start working? The original problem being reported in this thread was a hang during import. I re-read the thread completely and didn't notice any explicit mention that imports started working.

In any case, your "zdb -e -p..." is interesting in that it shows the zpool configuration as missing the disks in the "raidz1-3" and "raidz1-4" vdevs. Were those vdevs always part of the pool or were they added after the pool was created?

Based on that output, I'd suggest dumping the labels and uberblocks of each disk (partition) with commands like:

zdb -lu /dev/disk/by-id/ata-ST3000DM001-9YN166_Z1F0X41F-part1

That might give some clue as to why the pool is not being assembled correctly, and it will also make sure the devices are all actually readable.
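
To run that over every device in one go, a small loop along these lines might help (a sketch - the globs are an assumption and should be checked against the by-id names listed above):

for dev in /dev/disk/by-id/*ST3*-part1 /dev/disk/by-id/*ST1500DL003*-part1; do
    echo "=== $dev ==="
    zdb -lu "$dev"
done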

FransUrbo commented 10 years ago

You said "Loading the module, importing the pool": was that a "modprobe zfs" followed by a "zpool import"

Yes. or were you referring to the auto-import caused by the presence of the cache file?

No auto-import - I've set zfs_autoimport_disable=1 to make sure I have to do it individually. I presume that's what you ran which said "no pools available"

Correct. Next, you imported the pool which appeared to work.

Yes. When did importing the pool start working?

It's been very intermittent. Sometimes it works, sometimes it doesn't. But it seems removing the cache device makes it more prone to work than not. The original problem being reported in this thread was a hang during import. I re-read the thread completely and didn't notice any explicit mention that imports started working.

Actually, this issue is about getting a hang whenever I do NOT set zfs zvol_inhibit_dev=1. If I do, any import (wether I use auto import or do it individually) gives a hang/dead lock. In any case, your "zdb -e -p..." is interesting in that it shows the zpool configuration as missing the disks in the "raidz1-3" and "raidz1-4" vdevs. Were those vdevs always part of the pool or were they added after the pool was created?

I'm fairly certain that they was always there. Doing a zpool history share (to triple check) gives me:

2011-11-03.22:41:47 zpool create -f -o ashift=12 share raidz1 scsi-SATA_ST31500341AS_9VS3S9YD scsi-SATA_ST31500341AS_9VS08THF scsi-SATA_ST31500341AS_9VS16S63
2011-11-03.22:41:50 zpool add -f share raidz1 scsi-SATA_ST31500341AS_9VS4XK4T scsi-SATA_ST31500341AS_9VS4Q3F4 scsi-SATA_ST1500DL003-9VT_5YD1F2KF
2011-11-03.22:41:54 zpool add -f share raidz1 scsi-SATA_ST31500341AS_9VS3SAWS scsi-SATA_ST31500341AS_9VS0DR98 scsi-SATA_ST31500341AS_9VS13W11
2011-11-03.22:41:58 zpool add -f share raidz1 scsi-SATA_ST31500341AS_9VS4VT5R scsi-SATA_ST31500341AS_9VS4Q38C scsi-SATA_ST31500341AS_9VS4WM30
2011-11-03.22:42:00 zpool add -f share raidz1 scsi-SATA_ST31500341AS_9VS4VT5X scsi-SATA_ST31500341AS_9VS4WWPA scsi-SATA_ST31500341AS_9VS0H3A9
2011-11-03.22:42:02 zpool add -f share cache scsi-SATA_INTEL_SSDSA2CW0CVPR13500014080BGN
[...]

So, yes, they've always been there.

Initially I had 15x1.5TB disks in 5 vdevs. But a couple of months ago (in April), I replaced all three 1.5TB disks in the first vdev (raidz1-0) with 3TB disks (I needed more space AND the disks were close to the end of their warranty).

[...]
2013-04-20.14:05:44 zpool offline share scsi-SATA_ST31500341AS_9VS08THF
2013-04-20.14:08:08 zpool replace -f share scsi-SATA_ST31500341AS_9VS08THF scsi-SATA_ST3000DM001-9YN_Z1F0X41F
[...]
2013-04-23.09:40:23 zpool replace share scsi-SATA_ST31500341AS_9VS3S9YD scsi-SATA_ST3000DM001-9YN_Z1F0WJSN
[...]
2013-04-24.21:55:34 zpool replace share scsi-SATA_ST31500341AS_9VS16S63 scsi-SATA_ST3000DM001-9YN_Z1F0X3P4
[...]

See http://bayour.com/misc/screenlog_zpool_history.txt for the full history of the pool.

I'd suggest dumping the labels and uberblocks of each disk (partition)

Have a look at http://bayour.com/misc/screenlog_zdb_-lu.txt. I only did the first partition (ZFS has full access to the raw devices, so it also created a part9 on each, but I don't get any output from zdb -lu on those).

When I say 'intermittent', I mean that in every sense of the word! :).

Sometimes, commands that have worked almost all the time just stop working! The import is one such thing - when I got indications that it had something to do with the cache device (an strace of some command I can't remember now indicated that it tried to open the device file of the cache disk and hung there), I removed it (with echo 1 > /sys/block/<dev>/device/delete).

I have not been able to do a zpool remove or zpool offline on the device - that gives a deadlock.

So I did that before loading the module and importing the pool (individually), and it still didn't work for 'a couple of' reboots. But then it started working again, and so far it works...

Also, sometimes a modprobe zfs would import the pool (when I had zfs_autoimport_disable=0), and sometimes it didn't. When it didn't (and zpool status showed 'no pools available'), a zpool import .... sometimes hung (most often it DID hang), but sometimes (very rarely - every ten/fifteen reboots or so) it managed to import, and then the zfs mount -av hung instead.

So it's been an extremely difficult problem to debug, because no matter what I do or did, after a couple of reboots and 'unknown things I did' (things that shouldn't matter, such as rebooting into a full amd64 system I have on an external USB disk on the same machine, trying to import the pool there, and then going back to my mixed amd64/i386 system) something that previously failed just started working again!!

I've tried to use this issue as a step-by-step record of everything I do or did, but you know how impossible that is :)...

The current status as of this minute (in this order of tests):

[set zfs_autoimport_disable=1] [set zfs zvol_inhibit_dev=1]

  1. Loading the module WORKS ('no pools available', which is correct behavior due to autoimport_disable)
  2. Importing the pool WORKS
  3. Mounting all the filesystems WORKS (which didn't work yesterday or the day before; I do know I've tried this with the cache device removed, which it is now). [zpool export + reboot] [set zfs_autoimport_disable=0]
  4. Loading the module WORKS ('no pools available', which is correct due to the previous export)
  5. Importing the pool WORKS [reboot]
  6. Loading the module WORKS (auto imports the pool correctly)
  7. Importing the pool (as in 'zpool import -d /dev/disk/by-id -aN') WORKS [reboot] [set zfs_autoimport_disable=1] [set zfs zvol_inhibit_dev=0]
  8. Loading the module WORKS (no pools available, which is correct behavior)
  9. Importing the pool (had to use -f) !!FAILS!! (the zd* devices are output in the syslog, but not created in /dev) [reboot] [set zfs_autoimport_disable=0] [set zfs zvol_inhibit_dev=0]
  10. Loading the module WORKS (pool auto imported). However, the following 'zpool status' hangs with the specified deadlock (see point 9).

When I say '[set ....]', I mean add that value to the '/etc/modprobe.d/zfs' file...
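
For example, the combination used for points 1-3 would look something like this (a sketch of the standard 'options <module> <parameter>=<value>' syntax; the exact set of options changes between the bracketed steps above):

# /etc/modprobe.d/zfs
options zfs zfs_autoimport_disable=1 zvol_inhibit_dev=1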

Those last points (9 and 10) are really what this issue is about - any access to a ZVOL (including a zfs send/receive) gives a deadlock.

When importing the pool without the 'zvol_inhibit_dev' parameter, I get (after a few minutes) an oops from first zpool, txg_sync and then vol_id (from udev, NOT the zvol_id) and all vol_id processes are tagged Dead (177 of them, although I only have 64 VDEVs). And the import is hung and I can no longer issue any zfs command...
dweeezil commented 10 years ago

It sounds like you're back to your original issue. I jumped into this thread when I spotted the I/O error on zdb, thinking it would respond to a rational debugging regimen, but clearly that's not going to happen :) It looks like all your disks' labels are just fine so there doesn't appear to be any obvious reason for an import to fail.

What most zdb commands and minor creation upon import have in common is that both operations try to iterate through all datasets (filesystems and zvols) within the pool. There must be something in the pool configuration that's messed up.

When the pool's imported, could you try zdb -d share 0 and also zdb -dddd share 4 (object 4 is usually the top-level child map)? I'm not sure if those forms of zdb need to do a full-blown iteration. If so, then they'll likely segfault, too, and your only further recourse is likely to start debugging the module by hand. If those commands do work, you could walk through your pool until you find out which object is causing the problem.

I don't believe you mentioned yet whether your machine has ECC memory. If not, then I suspect some corrupted data has been written to the pool. In that case, there are all sorts of problems that can occur and most of them aren't handled very gracefully within zfs.

You might get more information by compiling the zfs and spl modules with --enable-debug (an argument to configure).
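
Roughly, the debug build looks something like the following (a sketch only; the source locations and the --with-spl path are assumptions, adjust them to your tree):

cd /usr/src/spl
./configure --enable-debug
make && make install
cd /usr/src/zfs
./configure --enable-debug --with-spl=/usr/src/spl
make && make install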

FransUrbo commented 10 years ago

It sounds like you're back to your original issue.

Yes, it seems so... However, I still can't run most zdb commands (among them '-d', including the commands you just showed - zdb -d share 0 and zdb -dddd share 4).

Still get the I/O error on both of these...

I don't believe you mentioned yet whether your machine has ECC memory.

Unfortunately not :(. It seems the motherboard (ASUS SABERTOOTH 990FX v1.00) does support it, but I was not able to afford enough ECC memory when I built the machine, so I settled for non-ECC.

You might get more information by compiling the zfs and spl modules with --enabled-debug

I tried that, but unfortunately I'm using zfs-crypto by zfsrogue and that doesn't respond well to --enable-debug (it just won't compile). I've tried to work around that, but eventually I hit a problem I was not knowledgeable enough to fix.

And since it's not possible to disable a feature once it's been enabled, I can't go back either. I was planning on building a new backup machine, but I can't afford ~20TB at the moment...

FransUrbo commented 10 years ago

@dweeezil I've spent most of the day trying to fix the compile problems I got when enabling debugging. Had to do some ugly hacks, but nothing major.

So I now have spl/zfs configured and compiled with debugging. Let me know how you want me to proceed.

FransUrbo commented 10 years ago

With debug-enabled modules and zvol_inhibit_dev=0, I get a slightly different stack trace when importing the pool:

[  361.440244] INFO: task zpool:11116 blocked for more than 120 seconds.
[  361.440331]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.440412] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.440523] zpool           D ffff88041fc14ac0     0 11116      1 0x20020000
[  361.440526]  ffff8800dd29e000 0000000000200086 0000000000000007 0000000000000008
[  361.440529]  ffff8800da112010 0000000000014ac0 0000000000014ac0 ffff8800dd29e000
[  361.440532]  0000000000014ac0 ffff8800da113fd8 ffff8800da113fd8 0000000000014ac0
[  361.440534] Call Trace:
[  361.440540]  [<ffffffff810eac59>] ? get_page_from_freelist+0x48e/0x590
[  361.440543]  [<ffffffff810eb3ea>] ? __alloc_pages_nodemask+0x190/0x910
[  361.440547]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[  361.440549]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[  361.440552]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[  361.440554]  [<ffffffff810ef656>] ? pagevec_lru_move_fn+0xa8/0xbd
[  361.440574]  [<ffffffffa07cd027>] ? spa_open_common+0x51/0x2f4 [zfs]
[  361.440577]  [<ffffffff8111ee39>] ? __kmalloc+0x138/0x1b5
[  361.440582]  [<ffffffffa06d8ca7>] ? kmem_alloc_debug+0x189/0x311 [spl]
[  361.440597]  [<ffffffffa07ced12>] ? spa_get_stats+0x35/0x228 [zfs]
[  361.440611]  [<ffffffffa07f8e4d>] ? zfs_ioc_pool_stats+0x21/0x57 [zfs]
[  361.440625]  [<ffffffffa07fbe62>] ? zfsdev_ioctl+0x326/0x3ba [zfs]
[  361.440627]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
[  361.440630]  [<ffffffff8112967e>] ? vfs_read+0xa8/0x14c
[  361.440632]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[  361.440644] INFO: task zpool:11330 blocked for more than 120 seconds.
[  361.440727]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.440807] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.440918] zpool           D ffff88041fcd4ac0     0 11330  10942 0x20020000
[  361.440920]  ffff8803ffc17000 0000000000200086 0000000200000030 0000000000000001
[  361.440922]  ffff8803fd5a6010 0000000000014ac0 0000000000014ac0 ffff8803ffc17000
[  361.440925]  0000000000014ac0 ffff8803fd5a7fd8 ffff8803fd5a7fd8 0000000000014ac0
[  361.440927] Call Trace:
[  361.440930]  [<ffffffff81139b69>] ? __d_instantiate+0x8b/0xef
[  361.440932]  [<ffffffff8113b9ed>] ? dentry_kill+0x1b1/0x1bc
[  361.440934]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.440937]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[  361.440939]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[  361.440942]  [<ffffffffa06e35c8>] ? cv_wait_common+0x24e/0x344 [spl]
[  361.440946]  [<ffffffffa06e488c>] ? tsd_hash_search+0xcb/0x1a5 [spl]
[  361.440961]  [<ffffffffa07d4ca7>] ? spa_lookup+0x4b/0x50 [zfs]
[  361.440964]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[  361.440980]  [<ffffffffa07c3548>] ? rrw_enter_read+0x40/0x130 [zfs]
[  361.440994]  [<ffffffffa07b654a>] ? dsl_pool_hold+0x43/0x4c [zfs]
[  361.441006]  [<ffffffffa079d9c9>] ? dmu_objset_hold+0x1d/0x78 [zfs]
[  361.441010]  [<ffffffffa06d8ca7>] ? kmem_alloc_debug+0x189/0x311 [spl]
[  361.441023]  [<ffffffffa07f864d>] ? zfs_ioc_dataset_list_next+0x48/0x140 [zfs]
[  361.441037]  [<ffffffffa07fbe62>] ? zfsdev_ioctl+0x326/0x3ba [zfs]
[  361.441039]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
[  361.441041]  [<ffffffff8104b24a>] ? sys32_lstat64+0x11/0x29
[  361.441043]  [<ffffffff8112967e>] ? vfs_read+0xa8/0x14c
[  361.441045]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[  361.441047] INFO: task txg_sync:11332 blocked for more than 120 seconds.
[  361.441130]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.441210] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.441321] txg_sync        D ffff88041fd54ac0     0 11332      2 0x00000000
[  361.441323]  ffff88040a6ee000 0000000000000046 ffff8803fbdfe010 0000000000014ac0
[  361.441325]  ffff8803fbdfe010 0000000000014ac0 0000000000014ac0 ffff88040a6ee000
[  361.441327]  0000000000014ac0 ffff8803fbdfffd8 ffff8803fbdfffd8 0000000000014ac0
[  361.441330] Call Trace:
[  361.441332]  [<ffffffff810185f9>] ? sched_clock+0x13/0x16
[  361.441335]  [<ffffffff810767b1>] ? update_rq_clock+0x1e/0xec
[  361.441337]  [<ffffffff810711fa>] ? check_preempt_curr+0x36/0x62
[  361.441339]  [<ffffffff81071302>] ? ttwu_do_wakeup+0xf/0xab
[  361.441341]  [<ffffffff8101865f>] ? read_tsc+0x18/0x2e
[  361.441343]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[  361.441346]  [<ffffffffa06e35c8>] ? cv_wait_common+0x24e/0x344 [spl]
[  361.441349]  [<ffffffffa06e366a>] ? cv_wait_common+0x2f0/0x344 [spl]
[  361.441352]  [<ffffffff8107370c>] ? __wake_up+0x30/0x44
[  361.441353]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[  361.441368]  [<ffffffffa07c34c5>] ? rrw_enter_write+0x29/0x6c [zfs]
[  361.441383]  [<ffffffffa07bd8dd>] ? dsl_sync_task_sync+0x7e/0xca [zfs]
[  361.441397]  [<ffffffffa07b72b6>] ? dsl_pool_sync+0x37c/0x4d9 [zfs]
[  361.441399]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[  361.441414]  [<ffffffffa07c7778>] ? spa_sync+0x506/0x941 [zfs]
[  361.441416]  [<ffffffff8106acd9>] ? autoremove_wake_function+0x9/0x2e
[  361.441418]  [<ffffffff8107370c>] ? __wake_up+0x30/0x44
[  361.441433]  [<ffffffffa07d809a>] ? txg_sync_thread+0x334/0x54b [zfs]
[  361.441437]  [<ffffffffa06d65ca>] ? kmem_free_debug+0x124/0x16c [spl]
[  361.441451]  [<ffffffffa07d7d66>] ? txg_thread_exit+0x2b/0x2b [zfs]
[  361.441455]  [<ffffffffa06d9f97>] ? __thread_create+0x32f/0x32f [spl]
[  361.441458]  [<ffffffffa06da04d>] ? thread_generic_wrapper+0xb6/0xc1 [spl]
[  361.441460]  [<ffffffff8106a781>] ? kthread+0xb0/0xba
[  361.441462]  [<ffffffff8106a6d1>] ? __init_kthread_worker+0x2d/0x2d
[  361.441464]  [<ffffffff813c590c>] ? ret_from_fork+0x7c/0xb0
[  361.441466]  [<ffffffff8106a6d1>] ? __init_kthread_worker+0x2d/0x2d
[  361.441467] INFO: task vol_id:11342 blocked for more than 120 seconds.
[  361.441549]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.441629] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.441740] vol_id          D ffff88041fc54ac0     0 11342      1 0x20020000
[  361.441742]  ffff880037bfd000 0000000000000086 ffff880037bfd488 ffff88041fc54b48
[  361.441744]  ffff8800d927e010 0000000000014ac0 0000000000014ac0 ffff880037bfd000
[  361.441746]  0000000000014ac0 ffff8800d927ffd8 ffff8800d927ffd8 0000000000014ac0
[  361.441748] Call Trace:
[  361.441750]  [<ffffffff81076563>] ? finish_task_switch+0x5c/0xc3
[  361.441752]  [<ffffffff813c3d63>] ? __schedule+0x754/0x7ea
[  361.441754]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[  361.441757]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[  361.441759]  [<ffffffff813c4df4>] ? _raw_spin_lock_irqsave+0x33/0x38
[  361.441762]  [<ffffffffa06e35c8>] ? cv_wait_common+0x24e/0x344 [spl]
[  361.441765]  [<ffffffffa06e48f6>] ? tsd_hash_search+0x135/0x1a5 [spl]
[  361.441780]  [<ffffffffa07d4ca7>] ? spa_lookup+0x4b/0x50 [zfs]
[  361.441782]  [<ffffffff8106acd0>] ? wake_up_bit+0x22/0x22
[  361.441797]  [<ffffffffa07c3548>] ? rrw_enter_read+0x40/0x130 [zfs]
[  361.441811]  [<ffffffffa07b654a>] ? dsl_pool_hold+0x43/0x4c [zfs]
[  361.441823]  [<ffffffffa079d8dd>] ? dmu_objset_own+0x2e/0xfd [zfs]
[  361.441836]  [<ffffffffa08200ab>] ? zvol_open+0xaf/0x276 [zfs]
[  361.441838]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[  361.441840]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[  361.441843]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[  361.441845]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[  361.441847]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.441849]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[  361.441850]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.441852]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[  361.441854]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[  361.441856]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[  361.441858]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[  361.441860]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[  361.441862]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[  361.441864]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[  361.441866]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[  361.441868]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.441870]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[  361.441872]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[  361.441874]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[  361.441875] INFO: task vol_id:11352 blocked for more than 120 seconds.
[  361.441958]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.442038] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.442149] vol_id          D ffff88041fc94ac0     0 11352  11347 0x20020000
[  361.442151]  ffff8803fb9a6000 0000000000000086 ffff8803fba26000 000088040c39ce80
[  361.442153]  ffff8803fba26010 0000000000014ac0 0000000000014ac0 ffff8803fb9a6000
[  361.442155]  0000000000014ac0 ffff8803fba27fd8 ffff8803fba27fd8 0000000000014ac0
[  361.442157] Call Trace:
[  361.442160]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[  361.442162]  [<ffffffff810eb3ea>] ? __alloc_pages_nodemask+0x190/0x910
[  361.442164]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[  361.442166]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[  361.442168]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[  361.442171]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[  361.442172]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[  361.442185]  [<ffffffffa0820049>] ? zvol_open+0x4d/0x276 [zfs]
[  361.442187]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[  361.442189]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[  361.442191]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[  361.442193]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[  361.442195]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.442197]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[  361.442199]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.442201]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[  361.442202]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[  361.442204]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[  361.442206]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[  361.442208]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[  361.442210]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[  361.442212]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[  361.442214]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[  361.442216]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.442218]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[  361.442219]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[  361.442221]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[  361.442223] INFO: task vol_id:11362 blocked for more than 120 seconds.
[  361.442305]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.442385] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.442496] vol_id          D ffff88041fcd4ac0     0 11362  11357 0x20020000
[  361.442498]  ffff8803fce38000 0000000000000086 0000000000000029 ffff88040a9a87c0
[  361.442500]  ffff8803fba64010 0000000000014ac0 0000000000014ac0 ffff8803fce38000
[  361.442502]  0000000000014ac0 ffff8803fba65fd8 ffff8803fba65fd8 0000000000014ac0
[  361.442504] Call Trace:
[  361.442506]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[  361.442508]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[  361.442510]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[  361.442513]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[  361.442515]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[  361.442517]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[  361.442519]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[  361.442520]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[  361.442533]  [<ffffffffa0820049>] ? zvol_open+0x4d/0x276 [zfs]
[  361.442535]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[  361.442537]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[  361.442539]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[  361.442541]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[  361.442543]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.442545]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[  361.442547]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.442549]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[  361.442550]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[  361.442552]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[  361.442554]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[  361.442556]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[  361.442558]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[  361.442560]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[  361.442562]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[  361.442564]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.442566]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[  361.442567]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[  361.442569]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[  361.442571] INFO: task vol_id:11372 blocked for more than 120 seconds.
[  361.442654]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.442734] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.442845] vol_id          D ffff88041fcd4ac0     0 11372  11366 0x20020000
[  361.442846]  ffff8803fb9c5000 0000000000000086 ffff8803fbade000 000088040aa770c0
[  361.442849]  ffff8803fbade010 0000000000014ac0 0000000000014ac0 ffff8803fb9c5000
[  361.442851]  0000000000014ac0 ffff8803fbadffd8 ffff8803fbadffd8 0000000000014ac0
[  361.442853] Call Trace:
[  361.442855]  [<ffffffff810eb3ea>] ? __alloc_pages_nodemask+0x190/0x910
[  361.442857]  [<ffffffff81104fd7>] ? handle_mm_fault+0x9c3/0x9f8
[  361.442859]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[  361.442861]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[  361.442863]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[  361.442865]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[  361.442877]  [<ffffffffa0820049>] ? zvol_open+0x4d/0x276 [zfs]
[  361.442879]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[  361.442881]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[  361.442883]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[  361.442885]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[  361.442887]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.442889]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[  361.442891]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.442893]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[  361.442895]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[  361.442896]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[  361.442898]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[  361.442900]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[  361.442902]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[  361.442904]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[  361.442906]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[  361.442908]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.442910]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[  361.442912]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[  361.442914]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[  361.442915] INFO: task vol_id:11382 blocked for more than 120 seconds.
[  361.442998]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.443078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.443189] vol_id          D ffff88041fcd4ac0     0 11382  11377 0x20020000
[  361.443190]  ffff8803fbac4000 0000000000000086 0000000000000029 ffff88040ee82480
[  361.443193]  ffff8803fba6c010 0000000000014ac0 0000000000014ac0 ffff8803fbac4000
[  361.443195]  0000000000014ac0 ffff8803fba6dfd8 ffff8803fba6dfd8 0000000000014ac0
[  361.443197] Call Trace:
[  361.443199]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[  361.443201]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[  361.443204]  [<ffffffff812a2de5>] ? extract_buf+0x97/0x14d
[  361.443205]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[  361.443208]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[  361.443209]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[  361.443211]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[  361.443213]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[  361.443226]  [<ffffffffa0820049>] ? zvol_open+0x4d/0x276 [zfs]
[  361.443228]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[  361.443230]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[  361.443232]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[  361.443233]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[  361.443235]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.443237]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[  361.443239]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.443241]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[  361.443243]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[  361.443245]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[  361.443246]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[  361.443248]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[  361.443251]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[  361.443252]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[  361.443254]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[  361.443256]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.443258]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[  361.443260]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[  361.443262]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[  361.443264] INFO: task vol_id:11392 blocked for more than 120 seconds.
[  361.443346]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.443426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.443537] vol_id          D ffff88041fcd4ac0     0 11392  11387 0x20020000
[  361.443539]  ffff8803fbac8000 0000000000000086 0000000000000029 ffff88040a5d2880
[  361.443541]  ffff8803fbb2e010 0000000000014ac0 0000000000014ac0 ffff8803fbac8000
[  361.443543]  0000000000014ac0 ffff8803fbb2ffd8 ffff8803fbb2ffd8 0000000000014ac0
[  361.443545] Call Trace:
[  361.443547]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[  361.443549]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[  361.443551]  [<ffffffff8110328c>] ? __do_fault+0x3ed/0x40a
[  361.443554]  [<ffffffff81104bc0>] ? handle_mm_fault+0x5ac/0x9f8
[  361.443556]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[  361.443558]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[  361.443560]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[  361.443562]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[  361.443574]  [<ffffffffa0820049>] ? zvol_open+0x4d/0x276 [zfs]
[  361.443576]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[  361.443578]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[  361.443580]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[  361.443582]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[  361.443584]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.443586]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[  361.443588]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.443590]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[  361.443591]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[  361.443593]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[  361.443595]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[  361.443597]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[  361.443599]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[  361.443601]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[  361.443603]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[  361.443605]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.443607]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[  361.443608]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[  361.443610]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
[  361.443612] INFO: task vol_id:11402 blocked for more than 120 seconds.
[  361.443695]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.443774] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.443886] vol_id          D ffff88041fcd4ac0     0 11402  11397 0x20020000
[  361.443887]  ffff8803fb9bf000 0000000000000086 0000000000000029 ffff88040c6e0780
[  361.443889]  ffff8803fbbca010 0000000000014ac0 0000000000014ac0 ffff8803fb9bf000
[  361.443891]  0000000000014ac0 ffff8803fbbcbfd8 ffff8803fbbcbfd8 0000000000014ac0
[  361.443893] Call Trace:
[  361.443895]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[  361.443897]  [<ffffffff810e4d2b>] ? unlock_page+0xd/0x1b
[  361.443900]  [<ffffffff8110e8a7>] ? page_add_new_anon_rmap+0x68/0x12a
[  361.443902]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.443904]  [<ffffffff81104fd7>] ? handle_mm_fault+0x9c3/0x9f8
[  361.443906]  [<ffffffff813c41ab>] ? schedule_preempt_disabled+0x9/0xc
[  361.443908]  [<ffffffff813c1fd0>] ? __mutex_lock_slowpath+0x1ab/0x1f6
[  361.443910]  [<ffffffff811e454b>] ? get_disk+0x36/0x52
[  361.443911]  [<ffffffff813c1d72>] ? mutex_lock+0x12/0x22
[  361.443924]  [<ffffffffa0820049>] ? zvol_open+0x4d/0x276 [zfs]
[  361.443926]  [<ffffffff81155819>] ? __blkdev_get+0x10a/0x3df
[  361.443928]  [<ffffffff8113e249>] ? inode_init_always+0xff/0x1b5
[  361.443930]  [<ffffffff81155cd2>] ? blkdev_get+0x1e4/0x2f1
[  361.443932]  [<ffffffff8113d2c4>] ? unlock_new_inode+0x4e/0x59
[  361.443934]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.443935]  [<ffffffff81154ef8>] ? bdget+0x122/0x12a
[  361.443937]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.443939]  [<ffffffff81155ddf>] ? blkdev_get+0x2f1/0x2f1
[  361.443941]  [<ffffffff81127082>] ? do_dentry_open+0x16c/0x22b
[  361.443943]  [<ffffffff8112720f>] ? finish_open+0x2c/0x35
[  361.443945]  [<ffffffff81134d9d>] ? do_last+0x9fd/0xc4a
[  361.443947]  [<ffffffff81135249>] ? path_openat+0x25f/0x5bd
[  361.443949]  [<ffffffff81141561>] ? mntput_no_expire+0x1b/0x16c
[  361.443951]  [<ffffffff811356a2>] ? do_filp_open+0x2d/0x75
[  361.443952]  [<ffffffff8111e32c>] ? kmem_cache_alloc+0x114/0x194
[  361.443954]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.443956]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[  361.443958]  [<ffffffff81126cfa>] ? do_sys_open+0x146/0x1d6
[  361.443960]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
FransUrbo commented 10 years ago

A spl-debug dump (done just after a stack trace) can be found at http://bayour.com/misc/spl-log.1386019137.12181.

FransUrbo commented 10 years ago

I now seem to have access to a working dtrace4linux if that helps...

FransUrbo commented 10 years ago

Sorry, oops...

behlendorf commented 10 years ago

@FransUrbo I think there's enough information in http://bayour.com/misc/syslog_zpool_import.txt to unwind the deadlock. I've started working my way through it; I'll let you know if I need any more.

FransUrbo commented 10 years ago

GREAT!! Thanx!

behlendorf commented 10 years ago

@FransUrbo As near as I can tell everything is blocked on the dp_config_rwlock rrwlock because there is a writer holding it and not releasing it. Unfortunately, I'm not able to tell which process is holding the lock.

I've written a debug patch behlendorf/zfs@97e460204d39db39e4231c513f1c7e2f02b0d097 (lightly tested) which will print to the console the process name and pid of the holder after 60 seconds. If you can recreate the issue with the patch applied, hopefully we can identify the offending process and get its stack. Just be careful to build with --enable-debug to enable the patch; by default it's disabled.

Also from what I've seen the zvol threads are all victims of this deadlock. But that doesn't really jibe with your setting zvol_inhibit_dev to avoid the issue. Is it still true that setting that module option prevents the issue?

--- zpool:23852 ---
mutex_lock+0x12/0x22                          <- spa_namespace_lock
spa_open_common+0x51/0x2f4 [zfs]
spa_get_stats+0x35/0x228 [zfs]
zfs_ioc_pool_stats+0x21/0x57 [zfs]
zfsdev_ioctl+0x335/0x3c0 [zfs]
compat_sys_ioctl+0x1bf/0xf30

--- zpool:24089 ---
cv_wait_common+0xd4/0x164 [spl]
rrw_enter_read+0x40/0x130 [zfs]               <- dp_config_rwlock (READER) dsl_pool_config_enter
dsl_pool_hold+0x43/0x4c [zfs]
dmu_objset_hold+0x1d/0x78 [zfs]
zfs_ioc_dataset_list_next+0x48/0x140 [zfs]
zfsdev_ioctl+0x335/0x3c0 [zfs]
compat_sys_ioctl+0x1bf/0xf30

--- txg_sync ---
cv_wait_common+0xd4/0x164 [spl]
rrw_enter_write+0x29/0x6c [zfs]               <- dp_config_rwlock (WRITER)
dsl_sync_task_sync+0x7e/0xca [zfs]
dsl_pool_sync+0x37c/0x4d9 [zfs]
spa_sync+0x506/0x941 [zfs]                    <- spa_config SCL_CONFIG
txg_sync_thread+0x334/0x54b [zfs]
__thread_create+0x2e4/0x2e4 [spl]
thread_generic_wrapper+0x6a/0x73 [spl]
kthread+0xb0/0xba

--- vol_id ---
cv_wait_common+0xd4/0x164 [spl]
rrw_enter_read+0x40/0x130 [zfs]               <- dp_config_rwlock (READER)
dsl_pool_hold+0x43/0x4c [zfs]
dmu_objset_own+0x2e/0xfd [zfs]
zvol_open+0xaf/0x276 [zfs]                    <- spa_namespace_lock, zvol_state_lock
__blkdev_get+0x10a/0x3df
blkdev_get+0x1e4/0x2f1
bdget+0x122/0x12a
blkdev_get+0x2f1/0x2f1
do_dentry_open+0x16c/0x22b
finish_open+0x2c/0x35
do_last+0x9fd/0xc4a
path_openat+0x25f/0x5bd
do_filp_open+0x2d/0x75
do_sys_open+0x146/0x1d6

--- vol_id ---
mutex_lock+0x12/0x22                          <- zvol_state_lock
zvol_open+0x4d/0x276 [zfs]
__blkdev_get+0x10a/0x3df
blkdev_get+0x1e4/0x2f1
bdget+0x122/0x12a
blkdev_get+0x2f1/0x2f1
do_dentry_open+0x16c/0x22b
finish_open+0x2c/0x35
do_last+0x9fd/0xc4a
path_openat+0x25f/0x5bd
do_filp_open+0x2d/0x75
do_sys_open+0x146/0x1d6

FransUrbo commented 10 years ago

Also from what I've seen the zvol threads are all victims of this deadlock. But that doesn't really jibe with your setting zvol_inhibit_dev to avoid the issue. Is it still true that setting that module option prevents the issue?

That is partly correct. Setting this to 1 allows the pool to be imported and filesystems to be mounted and accessed, but does NOT allow me to access (send/receive etc) any volumes.

As soon as I do, I get a deadlock...

I've written a debug patch behlendorf/zfs@97e4602 (lightly tested) which will print to the console the process name and pid of the holder after 60 seconds.

celia(SSD1):~# modprobe zfs
celia(SSD1):~# zpool import -d /dev/disk/by-id -aNf

Message from syslogd@Celia at Dec  5 09:48:49 ...
 kernel:[  140.100530] SPLError: 11062:0:(txg.c:666:txg_wait_synced()) ASSERTION(!dsl_pool_config_held(dp)) failed

Message from syslogd@Celia at Dec  5 09:48:49 ...
 kernel:[  140.100652] SPLError: 11062:0:(txg.c:666:txg_wait_synced()) SPL PANIC

And the call trace is in the log. After waiting a minute, I get:

Dec  5 09:49:51 Celia kernel: [  202.053577] ZFS: rrl->rr_writer=          (null) (/-1)

Which is then repeated every minute... The whole log at http://bayour.com/misc/debug_zpool_import.txt.

dweeezil commented 10 years ago

I've got a feeling that the likely culprit here is the port of illumos 3464 and its effect on the zvol code, and/or the interaction between the zvol code and the DSL pool configuration lock as a result of the sync task restructuring.

dweeezil commented 10 years ago

@FransUrbo Have you tried setting zil_replay_disable=1?
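
(For reference, a sketch of how that can be set: either in /etc/modprobe.d/zfs before loading the module, or at runtime via sysfs, assuming the parameter is exported as writable there like most ZFS tunables.)

options zfs zil_replay_disable=1                          # /etc/modprobe.d/zfs
echo 1 > /sys/module/zfs/parameters/zil_replay_disable    # at runtime, if writable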

FransUrbo commented 10 years ago

@dweeezil I have now - no change. Just for the record, my problem originates before any of the Illumos commits...

dweeezil commented 10 years ago

OK, I didn't realize that. When I saw dmu_objset_own() and zvol together, I thought this might have been related to the illumos 3464 work because the zvol code under ZoL is very Linux specific. It sounds like you're running into a deadlock, the cause of which has been around for a while. I suggested disabling ZIL replay because that's one of the things minor creation does.

FransUrbo commented 10 years ago

@dweeezil I'm not saying that it ISN'T related to those commits, it's just that my problem(s) originated/started before them. To get around the problems I had then, I had to upgrade (to a version that HAS those commits). But that problem is still not resolved satisfactorily...

Maybe I shouldn't have said anything about that. It is perfectly possible that whatever happened about a month ago is different from what's happening now. Or not related at all (two different problems?).

But I DID try to disable the ZIL replay as I said, and it didn't change anything; I still got the deadlock...

behlendorf commented 10 years ago

@dweeezil I believe you're right. The deadlock was introduced by the Illumos 3464 work. It just went unnoticed because it requires either a) debugging be enabled to hit the ASSERT, or b) a large number of zvols to make hitting this deadlock reasonably likely.

@FransUrbo the ASSERT in the last log you posted makes everything clear. The issue is that we are holding the dsl_pool_config lock while calling txg_wait_synced(). This can cause a deadlock because the txg_sync thread may need to acquire the dsl_pool_config lock.

spl_debug_bug+0x8d/0xd6 [spl]
txg_wait_synced+0x60/0x257 [zfs]              <- Blocks on sync, holding dsl_pool_config lock
zil_destroy+0x28/0x3b8 [zfs]
__zvol_create_minor+0x565/0x7bc [zfs]
zvol_create_minors_cb+0x1d/0x21 [zfs]
dmu_objset_find_impl+0x473/0x489 [zfs]        <- Takes dsl_pool_config lock
dmu_objset_find+0x44/0x66 [zfs]
zvol_create_minors+0xa8/0x17f [zfs]
spa_import+0x709/0x71a [zfs]
zfs_ioc_pool_import+0xad/0xf4 [zfs]
zfsdev_ioctl+0x39a/0x42e [zfs]

This isn't an issue under Illumos because they don't create all minors for the zvols at the end of spa_import() with a zvol_create_minors() function.

FreeBSD does use a zvol_create_minors() function, but it's structured to avoid this case. Their version does not rely on dmu_objset_find(), which holds the dsl_pool_config lock over the callback while iterating through the datasets. Instead they've implemented similar logic but are careful to ensure they're not holding the dsl_pool_config lock during minor creation.

As an aside, setting 'zvol_inhibit_dev' avoids the issue in ZoL by ensuring we never call dmu_objset_find().
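
To make the shape of that deadlock concrete, here is a minimal stand-alone sketch (not ZFS code - plain pthreads with hypothetical names) of the same inversion: the import path takes the config lock as a reader and then waits for the sync thread, while the sync thread can never finish because it needs the same lock as a writer:

/* deadlock_sketch.c - illustrative only, intentionally hangs.
 * Build with: cc -pthread deadlock_sketch.c -o deadlock_sketch */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t config_lock = PTHREAD_RWLOCK_INITIALIZER; /* stands in for dp_config_rwlock */
static pthread_mutex_t  sync_mtx    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t   sync_cv     = PTHREAD_COND_INITIALIZER;
static int              txg_synced  = 0;

/* Stand-in for txg_sync: must take config_lock as WRITER before it can
 * complete the sync and wake up anyone waiting for it. */
static void *sync_thread(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&config_lock);      /* blocks forever: import path holds it as reader */
    pthread_mutex_lock(&sync_mtx);
    txg_synced = 1;
    pthread_cond_broadcast(&sync_cv);
    pthread_mutex_unlock(&sync_mtx);
    pthread_rwlock_unlock(&config_lock);
    return NULL;
}

int main(void)
{
    pthread_t tid;

    /* Import path: the dataset iteration takes config_lock as READER ... */
    pthread_rwlock_rdlock(&config_lock);

    pthread_create(&tid, NULL, sync_thread, NULL);
    sleep(1);   /* give the sync thread time to block on the write lock */

    /* ... then, still holding it, waits for the sync thread - which is
     * itself waiting for config_lock.  Deadlock. */
    pthread_mutex_lock(&sync_mtx);
    while (!txg_synced)
        pthread_cond_wait(&sync_cv, &sync_mtx);   /* never returns */
    pthread_mutex_unlock(&sync_mtx);

    pthread_rwlock_unlock(&config_lock);
    pthread_join(tid, NULL);
    printf("no deadlock\n");                      /* never reached */
    return 0;
}

With zvol_inhibit_dev=1 the reader path (the dmu_objset_find() iteration) is never entered at all, which matches the behaviour described above; the proper fix is to perform minor creation without holding the config lock, as FreeBSD does.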

We're going to need to do something very similar to the FreeBSD approach to resolve this. I'll try and put together an initial patch tomorrow based on the FreeBSD code. It looks like it should be pretty straightforward. @FransUrbo thanks for providing the debugging on this! I'm thrilled we caught this before cutting a new tag; this has to be fixed before 0.6.3 gets tagged.

behlendorf commented 10 years ago

@FransUrbo @dweeezil I've posted a patch which should resolve your issue. It's a much bigger change than is required, but it brings us back in sync with FreeBSD and Illumos in regard to handling minors. It's largely based on the current FreeBSD implementation and appears to work correctly, although I've only lightly tested it. Could you guys please give it a look and let me know if you encounter any issues? It's not quite 100% done but it's close.

FransUrbo commented 10 years ago

Still get both an SPLError and a stack trace, although this one looks slightly different:

[  139.834623] ZFS: Loaded module v0.6.2-1.20131207_crypto_iscsi_mine_cherry (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[  147.374640] SPL: using hostid 0xa8c00845
[  186.531659] SPLError: 11114:0:(txg.c:667:txg_wait_synced()) ASSERTION(!dsl_pool_config_held(dp)) failed
[  186.531787] SPLError: 11114:0:(txg.c:667:txg_wait_synced()) SPL PANIC
[  186.531870] SPL: Showing stack for process 11114
[  186.531872] CPU: 3 PID: 11114 Comm: zpool Tainted: P           O 3.12.0+scst+tf.1 #5
[  186.531873] Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX, BIOS 0901 11/24/2011
[  186.531875]  0000000000000003 ffffffff813c18b9 0000000000000000 ffffffffa071fb18
[  186.531879]  ffffffffa0736b34 ffff8804020e2800 0000000000000001 0000000000000000
[  186.531881]  ffff8803f5fe8000 ffffffffa085db14 ffffffffa0901e49 0000000000000000
[  186.531884] Call Trace:
[  186.531889]  [<ffffffff813c18b9>] ? dump_stack+0x41/0x58
[  186.531895]  [<ffffffffa071fb18>] ? spl_debug_bug+0x8d/0xd6 [spl]
[  186.531914]  [<ffffffffa085db14>] ? txg_wait_synced+0x60/0x257 [zfs]
[  186.531928]  [<ffffffffa08b3d5c>] ? zil_destroy+0x28/0x3b8 [zfs]
[  186.531940]  [<ffffffffa08b43d3>] ? zil_replay+0x33/0x138 [zfs]
[  186.531944]  [<ffffffff8107362e>] ? complete_all+0x31/0x41
[  186.531946]  [<ffffffff812b8096>] ? device_pm_sleep_init+0x3b/0x5d
[  186.531959]  [<ffffffffa08ca619>] ? __zvol_create_minor+0x565/0x7bc [zfs]
[  186.531974]  [<ffffffffa083f09f>] ? refcount_remove_many+0x2ba/0x2ff [zfs]
[  186.531987]  [<ffffffffa08ca8f7>] ? zvol_create_minor+0x87/0x9d [zfs]
[  186.531999]  [<ffffffffa08cc0a7>] ? zvol_create_minors+0x94/0x2ec [zfs]
[  186.532011]  [<ffffffffa08cc29b>] ? zvol_create_minors+0x288/0x2ec [zfs]
[  186.532024]  [<ffffffffa08cc29b>] ? zvol_create_minors+0x288/0x2ec [zfs]
[  186.532039]  [<ffffffffa0852087>] ? spa_import+0x709/0x71a [zfs]
[  186.532043]  [<ffffffffa07235ca>] ? kmem_free_debug+0x124/0x16c [spl]
[  186.532047]  [<ffffffffa07aea42>] ? nvlist_lookup_common+0x4d/0x94 [znvpair]
[  186.532061]  [<ffffffffa0890cce>] ? zfs_ioc_pool_import+0xad/0xf4 [zfs]
[  186.532064]  [<ffffffffa07c8e6f>] ? pool_namecheck+0xd1/0x112 [zcommon]
[  186.532078]  [<ffffffffa0893b68>] ? zfsdev_ioctl+0x39a/0x42e [zfs]
[  186.532081]  [<ffffffff81045eed>] ? do_page_fault+0x3ed/0x439
[  186.532083]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
[  186.532086]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  186.532089]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[  186.532091]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
FransUrbo commented 10 years ago

... and the rest of it:

[  251.092164] ZFS: rrl->rr_writer=          (null) (/-1)
[  311.035916] ZFS: rrl->rr_writer=          (null) (/-1)
[  361.436692] INFO: task zpool:11114 blocked for more than 120 seconds.
[  361.436779]       Tainted: P           O 3.12.0+scst+tf.1 #5
[  361.436860] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.436973] zpool           D ffff88041fcd4ac0     0 11114  10964 0x20020002
[  361.436977]  ffff88040c39b000 0000000000200086 ffffffff810a2924 ffffffff8106814c
[  361.436980]  ffff88040d034010 0000000000014ac0 0000000000014ac0 ffff88040c39b000
[  361.436983]  0000000000014ac0 ffff88040d035fd8 ffff88040d035fd8 0000000000014ac0
[  361.436986] Call Trace:
[  361.436992]  [<ffffffff810a2924>] ? is_module_text_address+0x5/0xc
[  361.436996]  [<ffffffff8106814c>] ? __kernel_text_address+0x5b/0x61
[  361.436999]  [<ffffffff8101396d>] ? dump_trace+0x249/0x258
[  361.437002]  [<ffffffff81013a92>] ? show_stack_log_lvl+0xf8/0x107
[  361.437008]  [<ffffffffa071fb5f>] ? spl_debug_bug+0xd4/0xd6 [spl]
[  361.437029]  [<ffffffffa085db14>] ? txg_wait_synced+0x60/0x257 [zfs]
[  361.437044]  [<ffffffffa08b3d5c>] ? zil_destroy+0x28/0x3b8 [zfs]
[  361.437058]  [<ffffffffa08b43d3>] ? zil_replay+0x33/0x138 [zfs]
[  361.437061]  [<ffffffff8107362e>] ? complete_all+0x31/0x41
[  361.437064]  [<ffffffff812b8096>] ? device_pm_sleep_init+0x3b/0x5d
[  361.437078]  [<ffffffffa08ca619>] ? __zvol_create_minor+0x565/0x7bc [zfs]
[  361.437095]  [<ffffffffa083f09f>] ? refcount_remove_many+0x2ba/0x2ff [zfs]
[  361.437109]  [<ffffffffa08ca8f7>] ? zvol_create_minor+0x87/0x9d [zfs]
[  361.437127]  [<ffffffffa08cc0a7>] ? zvol_create_minors+0x94/0x2ec [zfs]
[  361.437139]  [<ffffffffa08cc29b>] ? zvol_create_minors+0x288/0x2ec [zfs]
[  361.437152]  [<ffffffffa08cc29b>] ? zvol_create_minors+0x288/0x2ec [zfs]
[  361.437167]  [<ffffffffa0852087>] ? spa_import+0x709/0x71a [zfs]
[  361.437171]  [<ffffffffa07235ca>] ? kmem_free_debug+0x124/0x16c [spl]
[  361.437175]  [<ffffffffa07aea42>] ? nvlist_lookup_common+0x4d/0x94 [znvpair]
[  361.437189]  [<ffffffffa0890cce>] ? zfs_ioc_pool_import+0xad/0xf4 [zfs]
[  361.437192]  [<ffffffffa07c8e6f>] ? pool_namecheck+0xd1/0x112 [zcommon]
[  361.437206]  [<ffffffffa0893b68>] ? zfsdev_ioctl+0x39a/0x42e [zfs]
[  361.437208]  [<ffffffff81045eed>] ? do_page_fault+0x3ed/0x439
[  361.437211]  [<ffffffff81169163>] ? compat_sys_ioctl+0x1bf/0xf30
[  361.437214]  [<ffffffff813c4d93>] ? _raw_spin_unlock+0x9/0xb
[  361.437217]  [<ffffffff8113fa09>] ? __alloc_fd+0xfa/0x10c
[  361.437219]  [<ffffffff813c7909>] ? ia32_do_call+0x13/0x13
behlendorf commented 10 years ago

@FransUrbo Thanks for the quick turnaround; clearly I missed something in the locking. I wasn't able to reproduce this issue in any of my initial testing, even with debugging enabled. Is there anything special about the way your zvols are configured? The stack suggests yours are three levels deep (pool/dataset/zvol), is that so?