Open kdave opened 3 years ago
And because the background reclaiming runs all the time I can't start the balance manually.
So after some time the system chunks get reclaimed but it's still too slow.
On one filesytem I removed all the data and let it finish the autoreclaim, numbers were decreasing. I managed to sneak in one balance command that sped it up and now all the numbers are almost zero (expected values).
There's a final lockup after all block groups are filled:
# btrfs fi df /mnt
Data, single: total=7.35GiB, used=194.12MiB, zone_unusable=7.16GiB
System, single: total=178.00MiB, used=688.00KiB, zone_unusable=176.08MiB
Metadata, single: total=2.47GiB, used=3.27MiB, zone_unusable=2.44GiB
GlobalReserve, single: total=3.25MiB, used=0.00B
[ 7379.192958] INFO: task kworker/u16:8:2479 blocked for more than 491 seconds.
[ 7379.200225] Tainted: G W 5.13.0-git+ #779
[ 7379.206198] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7379.214240] task:kworker/u16:8 state:D stack:10216 pid: 2479 ppid: 2 flags:0x00004000
[ 7379.214264] Workqueue: events_unbound btrfs_reclaim_bgs_work [btrfs]
[ 7379.222870] Call Trace:
[ 7379.222879] __schedule+0x27d/0x730
[ 7379.222897] ? lock_acquire+0xa0/0x150
[ 7379.222910] ? btrfs_wait_ordered_roots+0x66/0x240 [btrfs]
[ 7379.223075] schedule+0x59/0xc0
[ 7379.223085] schedule_preempt_disabled+0x11/0x20
[ 7379.223091] __mutex_lock+0x2d0/0x910
[ 7379.223105] btrfs_wait_ordered_roots+0x66/0x240 [btrfs]
[ 7379.223269] ? btrfs_wait_nocow_writers+0x33/0xc0 [btrfs]
[ 7379.223461] btrfs_relocate_block_group+0x165/0x340 [btrfs]
[ 7379.223655] btrfs_relocate_chunk+0x38/0x120 [btrfs]
[ 7379.223849] btrfs_reclaim_bgs_work.cold+0x73/0x167 [btrfs]
[ 7379.224073] process_one_work+0x262/0x600
[ 7379.224097] worker_thread+0x4c/0x320
[ 7379.224107] ? process_one_work+0x600/0x600
[ 7379.224116] kthread+0x135/0x160
[ 7379.224127] ? set_kthread_struct+0x40/0x40
[ 7379.224135] ret_from_fork+0x1f/0x30
[ 7379.224173] INFO: task kworker/u16:7:16155 blocked for more than 491 seconds.
[ 7379.224178] Tainted: G W 5.13.0-git+ #779
[ 7379.224182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7379.224186] task:kworker/u16:7 state:D stack: 9608 pid:16155 ppid: 2 flags:0x00004000
[ 7379.224203] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
[ 7379.224412] Call Trace:
[ 7379.224418] __schedule+0x27d/0x730
[ 7379.224431] ? usleep_range+0x90/0x90
[ 7379.224438] schedule+0x59/0xc0
[ 7379.224450] schedule_timeout+0xed/0x130
[ 7379.224460] ? lock_acquire+0xa0/0x150
[ 7379.224466] ? __wait_for_common+0x4f/0x170
[ 7379.224477] ? lock_release+0x9c/0x140
[ 7379.224486] ? trace_hardirqs_on+0x1b/0xf0
[ 7379.224495] __wait_for_common+0xc2/0x170
[ 7379.224509] btrfs_wait_ordered_extents+0x3d6/0x470 [btrfs]
[ 7379.224707] btrfs_wait_ordered_roots+0x175/0x240 [btrfs]
[ 7379.394482] shrink_delalloc+0x12b/0x1f0 [btrfs]
[ 7379.394598] flush_space+0x1bf/0x2f0 [btrfs]
[ 7379.403802] btrfs_async_reclaim_data_space+0xb2/0x150 [btrfs]
[ 7379.409866] process_one_work+0x262/0x600
[ 7379.409875] worker_thread+0x4c/0x320
[ 7379.409878] ? process_one_work+0x600/0x600
[ 7379.409883] kthread+0x135/0x160
[ 7379.409887] ? set_kthread_struct+0x40/0x40
[ 7379.409892] ret_from_fork+0x1f/0x30
[ 7379.409935] INFO: task kworker/u16:5:2964 blocked for more than 491 seconds.
[ 7379.440534] Tainted: G W 5.13.0-git+ #779
[ 7379.440537] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7379.440539] task:kworker/u16:5 state:D stack:10472 pid: 2964 ppid: 2 flags:0x00004000
[ 7379.462995] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
[ 7379.463113] Call Trace:
[ 7379.463116] __schedule+0x27d/0x730
[ 7379.475870] schedule+0x59/0xc0
[ 7379.475875] io_schedule+0x42/0x70
[ 7379.482723] wait_on_page_bit_common+0x105/0x380
[ 7379.482733] ? page_cache_delete_batch+0x160/0x160
[ 7379.482739] extent_write_cache_pages+0x339/0x450 [btrfs]
[ 7379.497966] extent_writepages+0x54/0xc0 [btrfs]
[ 7379.502800] do_writepages+0x35/0xd0
[ 7379.502808] ? lock_release+0x9c/0x140
[ 7379.510440] __filemap_fdatawrite_range+0xce/0x110
[ 7379.510447] ? __smp_call_single_queue+0x39/0x70
[ 7379.520158] btrfs_start_ordered_extent+0x98/0x1d0 [btrfs]
[ 7379.525850] ? lock_acquire+0xa0/0x150
[ 7379.529701] ? btrfs_work_helper+0x5e/0x1d0 [btrfs]
[ 7379.534767] btrfs_run_ordered_extent_work+0x1a/0x30 [btrfs]
[ 7379.540600] btrfs_work_helper+0xd5/0x1d0 [btrfs]
[ 7379.545477] process_one_work+0x262/0x600
[ 7379.549581] worker_thread+0x4c/0x320
[ 7379.553334] ? process_one_work+0x600/0x600
[ 7379.557613] kthread+0x135/0x160
[ 7379.560946] ? set_kthread_struct+0x40/0x40
[ 7379.565230] ret_from_fork+0x1f/0x30
[ 7379.568950] INFO: task kworker/u16:12:25822 blocked for more than 491 seconds.
[ 7379.576304] Tainted: G W 5.13.0-git+ #779
[ 7379.576306] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7379.576308] task:kworker/u16:12 state:D stack:10032 pid:25822 ppid: 2 flags:0x00004000
[ 7379.576320] Workqueue: writeback wb_workfn (flush-btrfs-6)
[ 7379.576328] Call Trace:
[ 7379.576331] __schedule+0x27d/0x730
[ 7379.576338] schedule+0x59/0xc0
[ 7379.576342] io_schedule+0x42/0x70
[ 7379.576346] wait_on_page_bit_common+0x105/0x380
[ 7379.576353] ? page_cache_delete_batch+0x160/0x160
[ 7379.626731] extent_write_cache_pages+0x339/0x450 [btrfs]
[ 7379.632332] extent_writepages+0x54/0xc0 [btrfs]
[ 7379.632418] do_writepages+0x35/0xd0
[ 7379.640932] ? lock_acquired+0xa6/0x150
[ 7379.640936] ? lock_release+0x9c/0x140
[ 7379.640939] ? lock_acquire+0xa0/0x150
[ 7379.652643] ? writeback_sb_inodes+0x152/0x550
[ 7379.657194] __writeback_single_inode+0x61/0x400
[ 7379.661918] writeback_sb_inodes+0x20b/0x550
[ 7379.666297] __writeback_inodes_wb+0x4c/0xe0
[ 7379.670669] wb_writeback+0x2ee/0x4c0
[ 7379.674438] wb_do_writeback+0x21b/0x230
[ 7379.678470] wb_workfn+0x7a/0x2c0
[ 7379.681889] ? put_prev_task_fair+0x21/0x30
[ 7379.686173] ? pick_next_task+0xbb6/0xdd0
[ 7379.690288] ? lock_acquire+0xa0/0x150
[ 7379.694138] ? finish_task_switch.isra.0+0x76/0x380
[ 7379.699124] ? lock_acquire+0xa0/0x150
[ 7379.702975] ? process_one_work+0x1df/0x600
[ 7379.707259] process_one_work+0x262/0x600
[ 7379.711377] worker_thread+0x4c/0x320
[ 7379.715142] ? process_one_work+0x600/0x600
[ 7379.719425] kthread+0x135/0x160
[ 7379.722750] ? set_kthread_struct+0x40/0x40
[ 7379.727045] ret_from_fork+0x1f/0x30
[ 7379.730748] INFO: lockdep is turned off.
...
With 2MB zone size and a git workload, I'm observing that the system block group does not get reclaimed:
The log is full of the reclaiming messages but it's just metadata or data, so system will fill the whole system sooner or later (total size is 10G).
Same on a fs with 4MB zones:
The initial offset of the system bg is 8388608 so the relocation would not pick it soon enough and it may need to be prioritized. After the test ran a few more minutes I found first occurence of the system chunk relocation, but just for one of the filesystems.