Open MaisenbacherD opened 3 years ago
Hi Dennis,
Can you do me a favour? Can you dump the content of /sys/kernel/debug/block/nvme0n1/*
Hi Johannes,
Sure here is the dump:
rg "" /sys/kernel/debug/block/nvme0n1/
poll_stat:read (512 Bytes): samples=0
poll_stat:write (512 Bytes): samples=0
poll_stat:read (1024 Bytes): samples=0
poll_stat:write (1024 Bytes): samples=0
poll_stat:read (2048 Bytes): samples=0
poll_stat:write (2048 Bytes): samples=0
poll_stat:read (4096 Bytes): samples=0
poll_stat:write (4096 Bytes): samples=0
poll_stat:read (8192 Bytes): samples=0
poll_stat:write (8192 Bytes): samples=0
poll_stat:read (16384 Bytes): samples=0
poll_stat:write (16384 Bytes): samples=0
poll_stat:read (32768 Bytes): samples=0
poll_stat:write (32768 Bytes): samples=0
poll_stat:read (65536 Bytes): samples=0
poll_stat:write (65536 Bytes): samples=0
state:SAME_COMP|NONROT|IO_STAT|DISCARD|INIT_DONE|STATS|REGISTERED|PCI_P2PDMA|ZONE_RESETALL|NOWAIT
write_hints:hint0: 0
write_hints:hint1: 0
write_hints:hint2: 0
write_hints:hint3: 0
write_hints:hint4: 0
pm_only:0
hctx1/sched_tags_bitmap:00000000: 0000 0000 0000 0000 0000 0000 0000 0000
hctx1/sched_tags_bitmap:00000010: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/state:
hctx0/flags:alloc_policy=FIFO SHOULD_MERGE
hctx0/cpu0/merged:0
hctx0/cpu0/dispatched:676 24
hctx0/cpu0/completed:676 24
hctx0/ctx_map:00000000: 00
hctx0/tags:nr_tags=1023
hctx0/tags:nr_reserved_tags=0
hctx0/tags:active_queues=0
hctx0/tags:
hctx0/tags:bitmap_tags:
hctx0/tags:depth=1023
hctx0/tags:busy=0
hctx0/tags:cleared=20
hctx0/tags:bits_per_word=64
hctx0/tags:map_nr=16
hctx0/tags:alloc_hint={595, 645, 437, 333}
hctx0/tags:wake_batch=8
hctx0/tags:wake_index=0
hctx0/tags:ws_active=0
hctx0/tags:ws={
hctx0/tags: {.wait_cnt=8, .wait=inactive},
hctx0/tags: {.wait_cnt=8, .wait=inactive},
hctx0/tags: {.wait_cnt=8, .wait=inactive},
hctx0/tags: {.wait_cnt=8, .wait=inactive},
hctx0/tags: {.wait_cnt=8, .wait=inactive},
hctx0/tags: {.wait_cnt=8, .wait=inactive},
hctx0/tags: {.wait_cnt=8, .wait=inactive},
hctx0/tags: {.wait_cnt=8, .wait=inactive},
hctx0/tags:}
hctx0/tags:round_robin=0
hctx0/tags:min_shallow_depth=4294967295
hctx0/tags_bitmap:00000000: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/tags_bitmap:00000010: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/tags_bitmap:00000020: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/tags_bitmap:00000030: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/tags_bitmap:00000040: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/tags_bitmap:00000050: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/tags_bitmap:00000060: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/tags_bitmap:00000070: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/sched_tags_bitmap:00000000: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/sched_tags_bitmap:00000010: 0000 0000 0000 0000 0000 0000 0000 0000
hctx0/io_poll:considered=0
hctx0/io_poll:invoked=0
hctx0/io_poll:success=0
hctx0/sched_tags:nr_tags=256
hctx0/sched_tags:nr_reserved_tags=0
hctx0/sched_tags:active_queues=0
hctx0/sched_tags:
hctx0/sched_tags:bitmap_tags:
hctx0/sched_tags:depth=256
hctx0/sched_tags:busy=0
hctx0/sched_tags:cleared=29
hctx0/sched_tags:bits_per_word=64
hctx0/sched_tags:map_nr=4
hctx0/sched_tags:alloc_hint={220, 30, 105, 40}
hctx0/sched_tags:wake_batch=8
hctx0/sched_tags:wake_index=0
hctx0/sched_tags:ws_active=0
hctx0/sched_tags:ws={
hctx0/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx0/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx0/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx0/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx0/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx0/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx0/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx0/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx0/sched_tags:}
hctx0/sched_tags:round_robin=0
hctx0/sched_tags:min_shallow_depth=192
hctx0/queued:700
hctx0/dispatched: 0 0
hctx0/dispatched: 1 617
hctx0/dispatched: 2 0
hctx0/dispatched: 4 0
hctx0/dispatched: 8 0
hctx0/dispatched: 16 0
hctx0/dispatched: 32+ 0
hctx0/run:670
hctx0/dispatch_busy:0
hctx0/type:default
hctx0/active:0
hctx1/state:
hctx1/flags:alloc_policy=FIFO SHOULD_MERGE
hctx1/ctx_map:00000000: 00
hctx1/tags:nr_tags=1023
hctx1/tags:nr_reserved_tags=0
hctx1/tags:active_queues=0
hctx1/tags:
hctx1/tags:bitmap_tags:
hctx1/tags:depth=1023
hctx1/tags:busy=0
hctx1/tags:cleared=29
hctx1/tags:bits_per_word=64
hctx1/tags:map_nr=16
hctx1/tags:alloc_hint={865, 860, 824, 140}
hctx1/tags:wake_batch=8
hctx1/tags:wake_index=0
hctx1/tags:ws_active=0
hctx1/tags:ws={
hctx1/tags: {.wait_cnt=8, .wait=inactive},
hctx1/tags: {.wait_cnt=8, .wait=inactive},
hctx1/tags: {.wait_cnt=8, .wait=inactive},
hctx1/tags: {.wait_cnt=8, .wait=inactive},
hctx1/tags: {.wait_cnt=8, .wait=inactive},
hctx1/tags: {.wait_cnt=8, .wait=inactive},
hctx1/tags: {.wait_cnt=8, .wait=inactive},
hctx1/tags: {.wait_cnt=8, .wait=inactive},
hctx1/tags:}
hctx1/tags:round_robin=0
hctx1/tags:min_shallow_depth=4294967295
hctx1/tags_bitmap:00000000: 0000 0000 0000 0000 0000 0000 0000 0000
hctx1/tags_bitmap:00000010: 0000 0000 0000 0000 0000 0000 0000 0000
hctx1/tags_bitmap:00000020: 0000 0000 0000 0000 0000 0000 0000 0000
hctx1/tags_bitmap:00000030: 0000 0000 0000 0000 0000 0000 0000 0000
hctx1/tags_bitmap:00000040: 0000 0000 0000 0000 0000 0000 0000 0000
hctx1/tags_bitmap:00000050: 0000 0000 0000 0000 0000 0000 0000 0000
hctx1/tags_bitmap:00000060: 0000 0000 0000 0000 0000 0000 0000 0000
hctx1/tags_bitmap:00000070: 0000 0000 0000 0000 0000 0000 0000 0000
hctx1/io_poll:considered=0
hctx1/io_poll:invoked=0
hctx1/io_poll:success=0
hctx1/sched_tags:nr_tags=256
hctx1/sched_tags:nr_reserved_tags=0
hctx1/sched_tags:active_queues=0
hctx1/sched_tags:
hctx1/sched_tags:bitmap_tags:
hctx1/sched_tags:depth=256
hctx1/sched_tags:busy=0
hctx1/sched_tags:cleared=29
hctx1/sched_tags:bits_per_word=64
hctx1/sched_tags:map_nr=4
hctx1/sched_tags:alloc_hint={25, 94, 77, 111}
hctx1/sched_tags:wake_batch=8
hctx1/sched_tags:wake_index=0
hctx1/sched_tags:ws_active=0
hctx1/sched_tags:ws={
hctx1/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx1/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx1/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx1/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx1/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx1/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx1/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx1/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx1/sched_tags:}
hctx1/sched_tags:round_robin=0
hctx1/sched_tags:min_shallow_depth=192
hctx1/run:3949
hctx1/active:0
hctx1/queued:3691
hctx1/dispatch_busy:0
hctx1/dispatched: 0 0
hctx1/dispatched: 1 3678
hctx1/dispatched: 2 0
hctx1/dispatched: 4 0
hctx1/dispatched: 8 0
hctx1/dispatched: 16 0
hctx1/dispatched: 32+ 0
hctx1/type:default
hctx1/cpu1/dispatched:3690 1
hctx2/queued:4312
hctx2/state:
hctx2/flags:alloc_policy=FIFO SHOULD_MERGE
hctx2/dispatched: 0 0
hctx2/dispatched: 1 4296
hctx2/dispatched: 2 2
hctx2/dispatched: 4 0
hctx2/dispatched: 8 1
hctx2/dispatched: 16 0
hctx2/dispatched: 32+ 0
hctx2/tags_bitmap:00000000: 0000 0000 0000 0000 0000 0000 0000 0000
hctx2/tags_bitmap:00000010: 0000 0000 0000 0000 0000 0000 0000 0000
hctx2/tags_bitmap:00000020: 0000 0000 0000 0000 0000 0000 0000 0000
hctx2/tags_bitmap:00000030: 0000 0000 0000 0000 0000 0000 0000 0000
hctx2/tags_bitmap:00000040: 0000 0000 0000 0000 0000 0000 0000 0000
hctx2/tags_bitmap:00000050: 0000 0000 0000 0000 0000 0000 0000 0000
hctx2/tags_bitmap:00000060: 0000 0000 0000 0000 0000 0000 0000 0000
hctx2/tags_bitmap:00000070: 0000 0000 0000 0000 0000 0000 0000 0000
hctx2/ctx_map:00000000: 00
hctx2/tags:nr_tags=1023
hctx2/tags:nr_reserved_tags=0
hctx2/tags:active_queues=0
hctx2/tags:
hctx2/tags:bitmap_tags:
hctx2/tags:depth=1023
hctx2/tags:busy=0
hctx2/tags:cleared=22
hctx2/tags:bits_per_word=64
hctx2/tags:map_nr=16
hctx2/tags:alloc_hint={487, 117, 149, 543}
hctx2/tags:wake_batch=8
hctx2/tags:wake_index=0
hctx2/tags:ws_active=0
hctx2/tags:ws={
hctx2/tags: {.wait_cnt=8, .wait=inactive},
hctx2/tags: {.wait_cnt=8, .wait=inactive},
hctx2/tags: {.wait_cnt=8, .wait=inactive},
hctx2/tags: {.wait_cnt=8, .wait=inactive},
hctx2/tags: {.wait_cnt=8, .wait=inactive},
hctx2/tags: {.wait_cnt=8, .wait=inactive},
hctx2/tags: {.wait_cnt=8, .wait=inactive},
hctx2/tags: {.wait_cnt=8, .wait=inactive},
hctx2/tags:}
hctx2/tags:round_robin=0
hctx2/tags:min_shallow_depth=4294967295
hctx2/run:4655
hctx2/active:0
hctx2/sched_tags:nr_tags=256
hctx2/sched_tags:nr_reserved_tags=0
hctx2/sched_tags:active_queues=0
hctx2/sched_tags:
hctx2/sched_tags:bitmap_tags:
hctx2/sched_tags:depth=256
hctx2/sched_tags:busy=0
hctx2/sched_tags:cleared=57
hctx2/sched_tags:bits_per_word=64
hctx2/sched_tags:map_nr=4
hctx2/sched_tags:alloc_hint={204, 18, 84, 249}
hctx2/sched_tags:wake_batch=8
hctx2/sched_tags:wake_index=0
hctx2/sched_tags:ws_active=0
hctx2/sched_tags:ws={
hctx2/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx2/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx2/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx2/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx2/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx2/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx2/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx2/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx2/sched_tags:}
hctx2/sched_tags:round_robin=0
hctx2/sched_tags:min_shallow_depth=192
hctx2/io_poll:considered=0
hctx2/io_poll:invoked=0
hctx2/io_poll:success=0
hctx2/type:default
hctx2/sched_tags_bitmap:00000000: 0000 0000 0000 0000 0000 0000 0000 0000
hctx2/sched_tags_bitmap:00000010: 0000 0000 0000 0000 0000 0000 0000 0000
hctx2/dispatch_busy:0
hctx1/cpu1/merged:0
hctx1/cpu1/completed:3690 1
hctx2/cpu2/dispatched:4295 17
hctx2/cpu2/completed:4295 17
hctx2/cpu2/merged:0
hctx3/active:0
hctx3/dispatched: 0 0
hctx3/dispatched: 1 2660
hctx3/dispatched: 2 0
hctx3/dispatched: 4 0
hctx3/dispatched: 8 0
hctx3/dispatched: 16 0
hctx3/dispatched: 32+ 0
sched/owned_by_driver:0 0 0
hctx3/sched_tags_bitmap:00000000: 0000 0000 0000 0000 0000 0000 0000 0000
hctx3/sched_tags_bitmap:00000010: 0000 0000 0000 0000 0000 0000 0000 0000
hctx3/state:
sched/batching:1
hctx3/ctx_map:00000000: 00
hctx3/flags:alloc_policy=FIFO SHOULD_MERGE
sched/starved:0
hctx3/tags:nr_tags=1023
hctx3/tags:nr_reserved_tags=0
hctx3/tags:active_queues=0
hctx3/tags:
hctx3/tags:bitmap_tags:
hctx3/tags:depth=1023
hctx3/tags:busy=0
hctx3/tags:cleared=16
hctx3/tags:bits_per_word=64
hctx3/tags:map_nr=16
hctx3/tags:alloc_hint={22, 494, 538, 271}
hctx3/tags:wake_batch=8
hctx3/tags:wake_index=0
hctx3/tags:ws_active=0
hctx3/tags:ws={
hctx3/tags: {.wait_cnt=8, .wait=inactive},
hctx3/tags: {.wait_cnt=8, .wait=inactive},
hctx3/tags: {.wait_cnt=8, .wait=inactive},
hctx3/tags: {.wait_cnt=8, .wait=inactive},
hctx3/tags: {.wait_cnt=8, .wait=inactive},
hctx3/tags: {.wait_cnt=8, .wait=inactive},
hctx3/tags: {.wait_cnt=8, .wait=inactive},
hctx3/tags: {.wait_cnt=8, .wait=inactive},
hctx3/tags:}
hctx3/tags:round_robin=0
hctx3/tags:min_shallow_depth=4294967295
sched/async_depth:192
hctx3/tags_bitmap:00000000: 0000 0000 0000 0000 0000 0000 0000 0000
hctx3/tags_bitmap:00000010: 0000 0000 0000 0000 0000 0000 0000 0000
hctx3/tags_bitmap:00000020: 0000 0000 0000 0000 0000 0000 0000 0000
hctx3/tags_bitmap:00000030: 0000 0000 0000 0000 0000 0000 0000 0000
hctx3/tags_bitmap:00000040: 0000 0000 0000 0000 0000 0000 0000 0000
hctx3/tags_bitmap:00000050: 0000 0000 0000 0000 0000 0000 0000 0000
hctx3/tags_bitmap:00000060: 0000 0000 0000 0000 0000 0000 0000 0000
hctx3/tags_bitmap:00000070: 0000 0000 0000 0000 0000 0000 0000 0000
sched/queued:0 0 0
hctx3/io_poll:considered=0
hctx3/io_poll:invoked=0
hctx3/io_poll:success=0
hctx3/queued:2660
hctx3/sched_tags:nr_tags=256
hctx3/sched_tags:nr_reserved_tags=0
hctx3/sched_tags:active_queues=0
hctx3/sched_tags:
hctx3/sched_tags:bitmap_tags:
hctx3/sched_tags:depth=256
hctx3/sched_tags:busy=0
hctx3/sched_tags:cleared=16
hctx3/sched_tags:bits_per_word=64
hctx3/sched_tags:map_nr=4
hctx3/sched_tags:alloc_hint={104, 142, 94, 177}
hctx3/sched_tags:wake_batch=8
hctx3/sched_tags:wake_index=0
hctx3/sched_tags:ws_active=0
hctx3/sched_tags:ws={
hctx3/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx3/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx3/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx3/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx3/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx3/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx3/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx3/sched_tags: {.wait_cnt=8, .wait=inactive},
hctx3/sched_tags:}
hctx3/sched_tags:round_robin=0
hctx3/sched_tags:min_shallow_depth=192
hctx3/dispatch_busy:0
rqos/wbt/curr_win_nsec:100000000
rqos/wbt/id:0
hctx3/run:2874
rqos/wbt/min_lat_nsec:2000000
rqos/wbt/enabled:1
rqos/wbt/wb_normal:8
rqos/wbt/inflight:0: inflight 0
rqos/wbt/inflight:1: inflight 0
rqos/wbt/inflight:2: inflight 0
rqos/wbt/unknown_cnt:0
hctx3/type:default
rqos/wbt/wb_background:4
hctx3/cpu3/merged:0
hctx3/cpu3/dispatched:2655 5
hctx3/cpu3/completed:2655 5
@morbidrsa @naota @damien-lemoal
Do you know if ZNS should work now? Which branch should Dennis test ?
@MatiasBjorling @MaisenbacherD I think it is the same problem @kawasaki is hitting on null_blk with xfstests' generic/127, so it's not ZNS specific but something different.
It looks like page write back is stuck somewhere (or the write back bit is not getting cleared).
@MaisenbacherD I think I do have a temporary workaround for you:
echo 0 > /sys/fs/btrfs/$FSID/bg_reclaim_threshold
Hope it helps, fingers crossed
Or even better, can you try with this patch:
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index c917867a4261..9cdef5e8f6b7 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1915,6 +1915,7 @@ int btrfs_zone_finish(struct btrfs_block_group *block_group)
block_group->alloc_offset = block_group->zone_capacity;
block_group->free_space_ctl->free_space = 0;
btrfs_clear_treelog_bg(block_group);
+ btrfs_clear_data_reloc_bg(block_group);
spin_unlock(&block_group->lock);
ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
@@ -1997,6 +1998,7 @@ void btrfs_zone_finish_endio(struct btrfs_fs_info *fs_info, u64 logical, u64 len
ASSERT(block_group->alloc_offset == block_group->zone_capacity);
ASSERT(block_group->free_space_ctl->free_space == 0);
btrfs_clear_treelog_bg(block_group);
+ btrfs_clear_data_reloc_bg(block_group);
spin_unlock(&block_group->lock);
map = block_group->physical_map;
This hunk is also needed:
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a79001caffae..212c94c9ce97 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3798,9 +3798,8 @@ static int do_allocation_zoned(struct btrfs_block_group *block_group,
*/
spin_lock(&fs_info->relocation_bg_lock);
data_reloc_bytenr = fs_info->data_reloc_bg;
- if (data_reloc_bytenr &&
- ((ffe_ctl->for_data_reloc && bytenr != data_reloc_bytenr) ||
- (!ffe_ctl->for_data_reloc && bytenr == data_reloc_bytenr)))
+ if ((ffe_ctl->for_data_reloc && bytenr != data_reloc_bytenr) ||
+ (!ffe_ctl->for_data_reloc && bytenr == data_reloc_bytenr))
skip = true;
spin_unlock(&fs_info->relocation_bg_lock);
if (skip)
Full verification using fstests is still ongoing from my side.
@morbidrsa Thank you for the suggestions! I applied your two patches on top of https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/log/?h=for-next at commit 279373dee83ee91b65a30c10792ab8c96114a201
.
The patches (as well as setting bg_reclaim_threshold
to 0
) are not fixing the issue. Getting a hung_task with a slightly different trace:
[ 726.328648] INFO: task rocksdb:high0:11085 blocked for more than 241 seconds.
[ 726.329839] Not tainted 5.16.0-rc1+ #1
[ 726.330484] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.331603] task:rocksdb:high0 state:D stack: 0 pid:11085 ppid: 11082 flags:0x00000000
[ 726.331608] Call Trace:
[ 726.331611] <TASK>
[ 726.331614] __schedule+0x2e5/0x9d0
[ 726.331622] schedule+0x58/0xd0
[ 726.331626] io_schedule+0x3f/0x70
[ 726.331629] __folio_lock+0x125/0x200
[ 726.331634] ? find_get_entries+0x1bc/0x240
[ 726.331638] ? filemap_invalidate_unlock_two+0x40/0x40
[ 726.331642] truncate_inode_pages_range+0x5b2/0x770
[ 726.331649] truncate_inode_pages_final+0x44/0x50
[ 726.331653] btrfs_evict_inode+0x67/0x480
[ 726.331658] evict+0xd0/0x180
[ 726.331661] iput+0x13f/0x200
[ 726.331664] do_unlinkat+0x1c0/0x2b0
[ 726.331668] __x64_sys_unlink+0x23/0x30
[ 726.331670] do_syscall_64+0x3b/0xc0
[ 726.331674] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 726.331677] RIP: 0033:0x7fb9490a171b
[ 726.331681] RSP: 002b:00007fb943ffac68 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[ 726.331684] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb9490a171b
[ 726.331686] RDX: 00007fb943ffb040 RSI: 000055a6bbe6ec20 RDI: 00007fb94400d300
[ 726.331687] RBP: 00007fb943ffad00 R08: 0000000000000000 R09: 0000000000000000
[ 726.331688] R10: 0000000000000031 R11: 0000000000000246 R12: 00007fb943ffb000
[ 726.331690] R13: 00007fb943ffb040 R14: 0000000000000000 R15: 00007fb943ffd260
[ 726.331693] </TASK>
Before the introduction of folios, the hang was on wait_on_page_bit_common
. Now it is on __folio_lock
. This probably means that this is the same deadlock pattern: one of the pages to be evicted is marked for writeback (locked), but never unlocked. Is this a race with rebalance ? The extents of the file that is being deleted are being moved to another zone, but the extent pages never unlocked once the write finishes ? This means that the pattern can only happen if some of the extents of the file have not yet been written out, but still being looked at by rebalance... Or is rebalance ignoring unwritten extents with invalid pages, but does not unlock the pages ? Some race of this sort seems the likely root cause.
Balance is just using the normal buffered IO path (it litterly calls into readpages and writepages), so no extra deadlock there. And as Dennis deactivated auto reclaim I suspect it's not guilty this time.
But I have no idea either.
Can you check current Linus' master + https://lore.kernel.org/linux-btrfs/20211213034338.949507-1-naohiro.aota@wdc.com/
I have checkout out Linus' master at a7904a538933c525096ca2ccde1e60d0ee62c08e and applied Naohiro's patch. Can't test if the hung_task is resolved. I am getting a "No space left on device" now.
I did an blkzone report
and it looks like the maximum number of open zones is reached.
Also getting a "No space left on device" with a simple fio workload on a 4TB ZNS drive with btrfs:
sudo fio --filename=/mnt/btrfs/testrun/testfile --size=800G --direct=1 --rw=randwrite --bs=64K --iodepth=1 --verify=crc32c --name=test
However after doing a blkzone report
there are just 5 zones implicitly open. 152 zones are full and 3533 zones are empty.
Did not have time to investigate deeper than that till now. Could also be that this is a discussion for another thread.
When giving a RocksDB workload on a Btrfs formated ZNS SSD a hung_task issue occours:
The steps to reproduce this behavior:
Kernel in use: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/log/?h=for-next at commit 23b461ade5edb7861021c6cddd5bca9b6c9285c2
Installing RocksDB's db_bench:
Format ZNS SSD (here /dev/nvme0n1) with Btrfs and prepare it for the RocksDB benchmark:
Now run the quick_performance test, which causes the hung_task: