Closed disaster123 closed 8 years ago
Well, crap. This v2 was actually supposed to fix a deadlock, not cause one. It hasn't happened for me yet, but it's clear from the trace that the interaction between the cleaner and balance is still broken. It could even be that the patch only works on newer kernels, where the locks behave slightly differently. This is not something I'm prepared to investigate, so I've just reverted it again. I'll also send mail to Wang Xiaoguang; maybe the trace tells him something.
btrfs balance fails / deadlocks with:
[ 1080.347007] INFO: task btrfs-cleaner:4934 blocked for more than 120 seconds.
[ 1080.389695] Tainted: G O 4.4.19+52-ph #1
[ 1080.432553] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.477583] btrfs-cleaner D ffff8807a7677d08 0 4934 2 0x00080000
[ 1080.522884] ffff8807a7677d08 ffff88017cb12500 ffff88083a730000 ffff8807a7678000
[ 1080.569107] ffff88085e6527b8 ffff88085e6527a0 ffffffff00000000 ffffffff00000001
[ 1080.615501] ffff8807a7677d20 ffffffff8d698ce5 ffff88083a730000 ffff8807a7677da0
[ 1080.662157] Call Trace:
[ 1080.707754] [] schedule+0x35/0x90
[ 1080.754545] [] rwsem_down_write_failed+0x1f3/0x350
[ 1080.802679] [] call_rwsem_down_write_failed+0x17/0x30
[ 1080.850044] [] down_write+0x24/0x40
[ 1080.898598] [] btrfs_delete_unused_bgs+0x10d/0x5b0 [btrfs]
[ 1080.947295] [] ? __schedule+0x34a/0x860
[ 1080.996082] [] cleaner_kthread+0x193/0x1d0 [btrfs]
[ 1081.045930] [] ? btrfs_destroy_pinned_extent+0xb0/0xb0 [btrfs]
[ 1081.096428] [] kthread+0xdb/0x100
[ 1081.146250] [] ? kthread_park+0x60/0x60
[ 1081.197154] [] ret_from_fork+0x3f/0x70
[ 1081.247365] [] ? kthread_park+0x60/0x60
[ 1081.297592] INFO: task btrfs-transacti:4935 blocked for more than 120 seconds.
[ 1081.349202] Tainted: G O 4.4.19+52-ph #1
[ 1081.400795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1081.454142] btrfs-transacti D ffff88082c003dc0 0 4935 2 0x00080000
[ 1081.508614] ffff88082c003dc0 ffff88085df94a00 ffff88083a734a00 ffff88082c004000
[ 1081.563823] ffff88085f2461f0 ffff88085f246000 ffff88085f2461f0 0000000000000000
[ 1081.619051] ffff88082c003dd8 ffffffff8d698ce5 ffff8807a59bfcc0 ffff88082c003e20
[ 1081.674528] Call Trace:
[ 1081.729432] [] schedule+0x35/0x90
[ 1081.785004] [] wait_current_trans.isra.21+0xa1/0x100 [btrfs]
[ 1081.841173] [] ? wait_woken+0x90/0x90
[ 1081.897038] [] start_transaction+0x2aa/0x4f0 [btrfs]
[ 1081.954660] [] btrfs_attach_transaction+0x17/0x20 [btrfs]
[ 1082.013213] [] transaction_kthread+0x19e/0x200 [btrfs]
[ 1082.071554] [] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs]
[ 1082.130514] [] kthread+0xdb/0x100
[ 1082.190059] [] ? kthread_park+0x60/0x60
[ 1082.249265] [] ret_from_fork+0x3f/0x70
[ 1082.307801] [] ? kthread_park+0x60/0x60
[ 1082.366666] INFO: task btrfs:5056 blocked for more than 120 seconds.
[ 1082.427172] Tainted: G O 4.4.19+52-ph #1
[ 1082.486946] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1082.549700] btrfs D ffff8807a761fa38 0 5056 4969 0x00080000
[ 1082.613173] ffff8807a761fa38 ffffffff8dc0e500 ffff88085b882500 ffff8807a7620000
[ 1082.677373] ffff8807a761fa70 ffff8807a59bfcc0 ffff88085c0801f0 ffff88085ba4d000
[ 1082.742097] ffff8807a761fa50 ffffffff8d698ce5 ffff88085c2f5360 ffff8807a761fac0
[ 1082.806215] Call Trace:
[ 1082.869114] [] schedule+0x35/0x90
[ 1082.931851] [] btrfs_commit_transaction+0x275/0xa50 [btrfs]
[ 1082.995089] [] ? start_transaction+0x9a/0x4f0 [btrfs]
[ 1083.057452] [] ? wait_woken+0x90/0x90
[ 1083.118271] [] prepare_to_relocate+0xfe/0x130 [btrfs]
[ 1083.178688] [] relocate_block_group+0x3e/0x5f0 [btrfs]
[ 1083.240171] [] ? btrfs_wait_ordered_range+0xa4/0x120 [btrfs]
[ 1083.302626] [] btrfs_relocate_block_group+0x1b8/0x2a0 [btrfs]
[ 1083.364903] [] btrfs_relocate_chunk.isra.38+0x49/0xd0 [btrfs]
[ 1083.426716] [] __btrfs_balance+0x5a1/0xc30 [btrfs]
[ 1083.488788] [] btrfs_balance+0x2a9/0x630 [btrfs]
[ 1083.549880] [] btrfs_ioctl_balance+0x172/0x380 [btrfs]
[ 1083.611466] [] btrfs_ioctl+0x5b4/0x2b60 [btrfs]
[ 1083.672560] [] ? mem_cgroup_try_charge+0x9c/0x1b0
[ 1083.733353] [] ? lru_cache_add_active_or_unevictable+0x27/0xa0
[ 1083.794509] [] ? handle_mm_fault+0xd2a/0x1890
[ 1083.855169] [] ? acct_account_cputime+0x1c/0x20
[ 1083.915838] [] ? account_user_time+0x5f/0x80
[ 1083.976570] [] do_vfs_ioctl+0x2ba/0x490
[ 1084.037331] [] ? context_tracking_exit+0x1d/0x20
[ 1084.098228] [] ? enter_from_user_mode+0x1f/0x50
[ 1084.158428] [] ? syscall_trace_enter_phase1+0xbc/0x110
[ 1084.218444] [] SyS_ioctl+0x41/0x70
[ 1084.277619] [] entry_SYSCALL_64_fastpath+0x12/0x71
[ 1204.344663] INFO: task btrfs-cleaner:4934 blocked for more than 120 seconds.
[ 1204.344664] Tainted: G O 4.4.19+52-ph #1
[ 1204.344664] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1204.344666] btrfs-cleaner D ffff8807a7677d08 0 4934 2 0x00080000
[ 1204.344668] ffff8807a7677d08 ffff88017cb12500 ffff88083a730000 ffff8807a7678000
[ 1204.344669] ffff88085e6527b8 ffff88085e6527a0 ffffffff00000000 ffffffff00000001
[ 1204.344669] ffff8807a7677d20 ffffffff8d698ce5 ffff88083a730000 ffff8807a7677da0
[ 1204.344670] Call Trace:
[ 1204.344674] [] schedule+0x35/0x90
[ 1204.344676] [] rwsem_down_write_failed+0x1f3/0x350
[ 1204.344678] [] call_rwsem_down_write_failed+0x17/0x30
[ 1204.344679] [] down_write+0x24/0x40
[ 1204.344709] [] btrfs_delete_unused_bgs+0x10d/0x5b0 [btrfs]
[ 1204.344710] [] ? __schedule+0x34a/0x860
[ 1204.344718] [] cleaner_kthread+0x193/0x1d0 [btrfs]
[ 1204.344726] [] ? btrfs_destroy_pinned_extent+0xb0/0xb0 [btrfs]
[ 1204.344728] [] kthread+0xdb/0x100
[ 1204.344729] [] ? kthread_park+0x60/0x60
[ 1204.344731] [] ret_from_fork+0x3f/0x70
[ 1204.344732] [] ? kthread_park+0x60/0x60
[ 1204.344733] INFO: task btrfs-transacti:4935 blocked for more than 120 seconds.
[ 1204.344733] Tainted: G O 4.4.19+52-ph #1
[ 1204.344733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1204.344735] btrfs-transacti D ffff88082c003dc0 0 4935 2 0x00080000
[ 1204.344735] ffff88082c003dc0 ffff88085df94a00 ffff88083a734a00 ffff88082c004000
[ 1204.344736] ffff88085f2461f0 ffff88085f246000 ffff88085f2461f0 0000000000000000
[ 1204.344737] ffff88082c003dd8 ffffffff8d698ce5 ffff8807a59bfcc0 ffff88082c003e20
[ 1204.344737] Call Trace:
[ 1204.344738] [] schedule+0x35/0x90
[ 1204.344746] [] wait_current_trans.isra.21+0xa1/0x100 [btrfs]
[ 1204.344747] [] ? wait_woken+0x90/0x90
[ 1204.344756] [] start_transaction+0x2aa/0x4f0 [btrfs]
[ 1204.344763] [] btrfs_attach_transaction+0x17/0x20 [btrfs]
[ 1204.344770] [] transaction_kthread+0x19e/0x200 [btrfs]
[ 1204.344777] [] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs]
[ 1204.344778] [] kthread+0xdb/0x100
[ 1204.344779] [] ? kthread_park+0x60/0x60
[ 1204.344780] [] ret_from_fork+0x3f/0x70
[ 1204.344781] [] ? kthread_park+0x60/0x60
[ 1204.344783] INFO: task btrfs:5056 blocked for more than 120 seconds.
[ 1204.344783] Tainted: G O 4.4.19+52-ph #1
[ 1204.344783] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1204.344784] btrfs D ffff8807a761fa38 0 5056 4969 0x00080000
[ 1204.344785] ffff8807a761fa38 ffffffff8dc0e500 ffff88085b882500 ffff8807a7620000
[ 1204.344786] ffff8807a761fa70 ffff8807a59bfcc0 ffff88085c0801f0 ffff88085ba4d000
[ 1204.344786] ffff8807a761fa50 ffffffff8d698ce5 ffff88085c2f5360 ffff8807a761fac0
[ 1204.344786] Call Trace:
[ 1204.344787] [] schedule+0x35/0x90
[ 1204.344794] [] btrfs_commit_transaction+0x275/0xa50 [btrfs]
[ 1204.344801] [] ? start_transaction+0x9a/0x4f0 [btrfs]
[ 1204.344802] [] ? wait_woken+0x90/0x90
[ 1204.344811] [] prepare_to_relocate+0xfe/0x130 [btrfs]
[ 1204.344820] [] relocate_block_group+0x3e/0x5f0 [btrfs]
[ 1204.344829] [] ? btrfs_wait_ordered_range+0xa4/0x120 [btrfs]
[ 1204.344837] [] btrfs_relocate_block_group+0x1b8/0x2a0 [btrfs]
[ 1204.344846] [] btrfs_relocate_chunk.isra.38+0x49/0xd0 [btrfs]
[ 1204.344854] [] __btrfs_balance+0x5a1/0xc30 [btrfs]
[ 1204.344862] [] btrfs_balance+0x2a9/0x630 [btrfs]
[ 1204.344871] [] btrfs_ioctl_balance+0x172/0x380 [btrfs]
[ 1204.344879] [] btrfs_ioctl+0x5b4/0x2b60 [btrfs]
[ 1204.344881] [] ? mem_cgroup_try_charge+0x9c/0x1b0
[ 1204.344883] [] ? lru_cache_add_active_or_unevictable+0x27/0xa0
[ 1204.344884] [] ? handle_mm_fault+0xd2a/0x1890
[ 1204.344886] [] ? acct_account_cputime+0x1c/0x20
[ 1204.344888] [] ? account_user_time+0x5f/0x80
[ 1204.344891] [] do_vfs_ioctl+0x2ba/0x490
[ 1204.344893] [] ? context_tracking_exit+0x1d/0x20
[ 1204.344894] [] ? enter_from_user_mode+0x1f/0x50
[ 1204.344895] [] ? syscall_trace_enter_phase1+0xbc/0x110
[ 1204.344896] [] SyS_ioctl+0x41/0x70
[ 1204.344897] [] entry_SYSCALL_64_fastpath+0x12/0x71
[ 1324.351978] INFO: task btrfs-cleaner:4934 blocked for more than 120 seconds.
[ 1324.465568] Tainted: G O 4.4.19+52-ph #1
[ 1324.512239] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1324.560597] btrfs-cleaner D ffff8807a7677d08 0 4934 2 0x00080000
[ 1324.609132] ffff8807a7677d08 ffff88017cb12500 ffff88083a730000 ffff8807a7678000
[ 1324.658439] ffff88085e6527b8 ffff88085e6527a0 ffffffff00000000 ffffffff00000001
[ 1324.707093] ffff8807a7677d20 ffffffff8d698ce5 ffff88083a730000 ffff8807a7677da0
[ 1324.755894] Call Trace:
[ 1324.803194] [] schedule+0x35/0x90
[ 1324.852174] [] rwsem_down_write_failed+0x1f3/0x350
[ 1324.900393] [] call_rwsem_down_write_failed+0x17/0x30
[ 1324.949516] [] down_write+0x24/0x40
[ 1324.997415] [] btrfs_delete_unused_bgs+0x10d/0x5b0 [btrfs]
[ 1325.047090] [] ? __schedule+0x34a/0x860
[ 1325.095979] [] cleaner_kthread+0x193/0x1d0 [btrfs]
[ 1325.145282] [] ? btrfs_destroy_pinned_extent+0xb0/0xb0 [btrfs]
[ 1325.195504] [] kthread+0xdb/0x100
[ 1325.245458] [] ? kthread_park+0x60/0x60
[ 1325.295174] [] ret_from_fork+0x3f/0x70
[ 1325.344522] [] ? kthread_park+0x60/0x60
[ 1325.393559] INFO: task btrfs-transacti:4935 blocked for more than 120 seconds.
[ 1325.443710] Tainted: G O 4.4.19+52-ph #1
[ 1325.494341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1325.546291] btrfs-transacti D ffff88082c003dc0 0 4935 2 0x00080000
[ 1325.599179] ffff88082c003dc0 ffff88085df94a00 ffff88083a734a00 ffff88082c004000
[ 1325.652999] ffff88085f2461f0 ffff88085f246000 ffff88085f2461f0 0000000000000000
[ 1325.706312] ffff88082c003dd8 ffffffff8d698ce5 ffff8807a59bfcc0 ffff88082c003e20
[ 1325.759922] Call Trace:
[ 1325.812530] [] schedule+0x35/0x90
[ 1325.865164] [] wait_current_trans.isra.21+0xa1/0x100 [btrfs]
[ 1325.919795] [] ? wait_woken+0x90/0x90
[ 1325.973947] [] start_transaction+0x2aa/0x4f0 [btrfs]
[ 1326.029055] [] btrfs_attach_transaction+0x17/0x20 [btrfs]
[ 1326.085454] [] transaction_kthread+0x19e/0x200 [btrfs]
[ 1326.141747] [] ? btrfs_cleanup_transaction+0x540/0x540 [btrfs]
[ 1326.198500] [] kthread+0xdb/0x100
[ 1326.255931] [] ? kthread_park+0x60/0x60
[ 1326.313867] [] ret_from_fork+0x3f/0x70
[ 1326.371424] [] ? kthread_park+0x60/0x60
[ 1326.429558] INFO: task btrfs:5056 blocked for more than 120 seconds.
[ 1326.488241] Tainted: G O 4.4.19+52-ph #1
[ 1326.548042] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1326.609739] btrfs D ffff8807a761fa38 0 5056 4969 0x00080000
[ 1326.672491] ffff8807a761fa38 ffffffff8dc0e500 ffff88085b882500 ffff8807a7620000
[ 1326.736520] ffff8807a761fa70 ffff8807a59bfcc0 ffff88085c0801f0 ffff88085ba4d000
[ 1326.800730] ffff8807a761fa50 ffffffff8d698ce5 ffff88085c2f5360 ffff8807a761fac0
[ 1326.864090] Call Trace:
[ 1326.927625] [] schedule+0x35/0x90
[ 1326.990810] [] btrfs_commit_transaction+0x275/0xa50 [btrfs]
[ 1327.053897] [] ? start_transaction+0x9a/0x4f0 [btrfs]
[ 1327.116552] [] ? wait_woken+0x90/0x90
[ 1327.178074] [] prepare_to_relocate+0xfe/0x130 [btrfs]
[ 1327.239454] [] relocate_block_group+0x3e/0x5f0 [btrfs]
[ 1327.301737] [] ? btrfs_wait_ordered_range+0xa4/0x120 [btrfs]
[ 1327.364502] [] btrfs_relocate_block_group+0x1b8/0x2a0 [btrfs]
[ 1327.426902] [] btrfs_relocate_chunk.isra.38+0x49/0xd0 [btrfs]
[ 1327.488863] [] __btrfs_balance+0x5a1/0xc30 [btrfs]
[ 1327.550433] [] btrfs_balance+0x2a9/0x630 [btrfs]
[ 1327.611484] [] btrfs_ioctl_balance+0x172/0x380 [btrfs]
[ 1327.672394] [] btrfs_ioctl+0x5b4/0x2b60 [btrfs]
[ 1327.732483] [] ? mem_cgroup_try_charge+0x9c/0x1b0
[ 1327.792942] [] ? lru_cache_add_active_or_unevictable+0x27/0xa0
[ 1327.854017] [] ? handle_mm_fault+0xd2a/0x1890
[ 1327.914841] [] ? acct_account_cputime+0x1c/0x20
[ 1327.975319] [] ? account_user_time+0x5f/0x80
[ 1328.035877] [] do_vfs_ioctl+0x2ba/0x490
[ 1328.096154] [] ? context_tracking_exit+0x1d/0x20
[ 1328.156838] [] ? enter_from_user_mode+0x1f/0x50
[ 1328.217183] [] ? syscall_trace_enter_phase1+0xbc/0x110
[ 1328.277246] [] SyS_ioctl+0x41/0x70
[ 1328.336080] [] entry_SYSCALL_64_fastpath+0x12/0x71
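Since the same three tasks repeat in every 120-second report, it can help to boil the log down to "which task is stuck in which btrfs function". Below is a minimal sketch for doing that, assuming the log has the line layout shown above; the function name `summarize_hung_tasks` and the `module` parameter are made up for illustration and are not part of any btrfs tooling.

```python
import re

# Header line printed by the kernel's hung-task detector.
HUNG_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than \d+ seconds")
# One stack frame: optional "?" (unreliable frame), symbol+offset/size,
# and an optional trailing [module] tag.
FRAME_RE = re.compile(
    r"\]\s+\[[^\]]*\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_hung_tasks(log_text, module="btrfs"):
    """Map (task name, pid) to the innermost reliable frame inside `module`.

    Repeated reports for the same task keep the frame from the first report.
    """
    summary = {}
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = (header.group(1), int(header.group(2)))
            summary.setdefault(current, None)
            continue
        frame = FRAME_RE.search(line)
        if frame is None or current is None or summary[current] is not None:
            continue
        unreliable, symbol, frame_module = frame.groups()
        # Skip "?" frames and frames outside the module of interest.
        if unreliable is None and frame_module == module:
            summary[current] = symbol
    return summary
```

Run against the log in this issue (for example `summarize_hung_tasks(open("dmesg.txt").read())`, with `dmesg.txt` being a hypothetical saved copy), this should report btrfs-cleaner in `btrfs_delete_unused_bgs`, btrfs-transacti in `wait_current_trans.isra.21`, and the balance process in `btrfs_commit_transaction`. It only summarizes the report; it does not say which lock or transaction state each task is actually waiting on.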