deadlock when running xfstests generic/013

problame commented 3 years ago

ZFS master deadlocks when running xfstests test generic/013.

dmesg


[    0.000000] Linux version 5.9.0-0.bpo.5-amd64 (debian-kernel@lists.debian.org) (gcc-8 (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP Debian 5.9.15-1~bpo10+1 (2020-12-31)
...
[  567.808888] ZFS: Loaded module v2.0.0-rc1_313_g06346cc5b, ZFS pool version 5000, ZFS filesystem version 5
[  669.487279] EXT4-fs (nvme1n1p2): mounted filesystem with ordered data mode. Opts: (null)
[  678.239314] run fstests generic/013 at 2021-03-24 17:35:59
[  846.934445] INFO: task fsstress:19007 blocked for more than 120 seconds.
[  846.934504]       Tainted: P           OE     5.9.0-0.bpo.5-amd64 #1 Debian 5.9.15-1~bpo10+1
[  846.934559] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.934610] task:fsstress        state:D stack:    0 pid:19007 ppid: 19006 flags:0x00000000
[  846.934615] Call Trace:
[  846.934625]  __schedule+0x2be/0x780
[  846.934631]  schedule+0x40/0xb0
[  846.934636]  io_schedule+0x12/0x40
[  846.934641]  wait_on_page_bit+0x127/0x230
[  846.934647]  ? xas_find+0x174/0x1b0
[  846.934652]  ? __page_cache_alloc+0x90/0x90
[  846.934658]  truncate_inode_pages_range+0x39e/0x800
[  846.934761]  ? dmu_free_long_range+0x3cf/0x4c0 [zfs]
[  846.934920]  zfs_freesp+0x403/0x430 [zfs]
[  846.935081]  ? zfs_zaccess+0x216/0x410 [zfs]
[  846.935087]  ? _cond_resched+0x15/0x30
[  846.935237]  zfs_space+0xb9/0x220 [zfs]
[  846.935386]  zpl_fallocate_common+0x102/0x240 [zfs]
[  846.935394]  ? common_file_perm+0x64/0x170
[  846.935398]  vfs_fallocate+0x13b/0x260
[  846.935402]  ksys_fallocate+0x3c/0x80
[  846.935405]  __x64_sys_fallocate+0x1a/0x20
[  846.935410]  do_syscall_64+0x33/0x80
[  846.935414]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.935419] RIP: 0033:0x7f9a85611437
[  846.935421] RSP: 002b:00007ffdf7f23488 EFLAGS: 00000246 ORIG_RAX: 000000000000011d
[  846.935424] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f9a85611437
[  846.935426] RDX: 0000000000046fa1 RSI: 0000000000000003 RDI: 0000000000000004
[  846.935428] RBP: 000000000000001d R08: 000000000000007c R09: 00007ffdf7f231c2
[  846.935429] R10: 00000000000896da R11: 0000000000000246 R12: 0000000000000003
[  846.935431] R13: 00000000000896da R14: 0000000000046fa1 R15: 000055804f250c40
[  846.935436] INFO: task fsstress:19008 blocked for more than 120 seconds.
[  846.935483]       Tainted: P           OE     5.9.0-0.bpo.5-amd64 #1 Debian 5.9.15-1~bpo10+1
[  846.935537] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.935588] task:fsstress        state:D stack:    0 pid:19008 ppid: 19006 flags:0x00000000
[  846.935590] Call Trace:
[  846.935596]  __schedule+0x2be/0x780
[  846.935601]  schedule+0x40/0xb0
[  846.935606]  schedule_preempt_disabled+0xa/0x10
[  846.935611]  __mutex_lock.isra.9+0x2b5/0x4a0
[  846.935778]  zil_commit_impl+0x147/0xe00 [zfs]
[  846.935793]  ? spl_kmem_alloc+0xb6/0x100 [spl]
[  846.935806]  ? tsd_hash_search.isra.2+0x42/0x90 [spl]
[  846.935818]  ? tsd_set+0x21c/0x480 [spl]
[  846.935823]  ? _cond_resched+0x15/0x30
[  846.935988]  zfs_fsync+0x71/0x100 [zfs]
[  846.936135]  zpl_fsync+0x67/0xa0 [zfs]
[  846.936140]  do_fsync+0x38/0x70
[  846.936144]  __x64_sys_fdatasync+0x13/0x20
[  846.936149]  do_syscall_64+0x33/0x80
[  846.936153]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.936155] RIP: 0033:0x7f9a856132c4
[  846.936157] RSP: 002b:00007ffdf7f23948 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[  846.936160] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f9a856132c4
[  846.936161] RDX: 00007ffdf7f238b0 RSI: 00007ffdf7f238b0 RDI: 0000000000000004
[  846.936163] RBP: 0000000000000077 R08: 00007f9a856ddc40 R09: 00007ffdf7f23596
[  846.936164] R10: fffffffffffff1be R11: 0000000000000246 R12: 00000000000001f4
[  846.936166] R13: 0000000051eb851f R14: 00007ffdf7f239f0 R15: 000055804f24a740
[  846.936170] INFO: task fsstress:19009 blocked for more than 120 seconds.
[  846.936215]       Tainted: P           OE     5.9.0-0.bpo.5-amd64 #1 Debian 5.9.15-1~bpo10+1
[  846.936269] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.936319] task:fsstress        state:D stack:    0 pid:19009 ppid: 19006 flags:0x00004000
[  846.936322] Call Trace:
[  846.936327]  __schedule+0x2be/0x780
[  846.936332]  schedule+0x40/0xb0
[  846.936336]  schedule_preempt_disabled+0xa/0x10
[  846.936341]  __mutex_lock.isra.9+0x2b5/0x4a0
[  846.936506]  zil_commit_impl+0x147/0xe00 [zfs]
[  846.936512]  ? __schedule+0x2c6/0x780
[  846.936517]  ? __wake_up_common_lock+0x87/0xc0
[  846.936520]  ? prepare_to_wait_event+0xb6/0x140
[  846.936524]  ? _raw_spin_lock_irqsave+0x30/0x40
[  846.936527]  ? finish_wait+0x3f/0x80
[  846.936531]  ? _cond_resched+0x15/0x30
[  846.936535]  ? __ia32_sys_fdatasync+0x20/0x20
[  846.936682]  zfs_sync+0x73/0x1a0 [zfs]
[  846.936829]  zpl_sync_fs+0x4b/0x70 [zfs]
[  846.936834]  iterate_supers+0x98/0x100
[  846.936838]  ksys_sync+0x60/0xb0
[  846.936841]  __do_sys_sync+0xa/0x10
[  846.936846]  do_syscall_64+0x33/0x80
[  846.936849]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.936852] RIP: 0033:0x7f9a85613287
[  846.936853] RSP: 002b:00007ffdf7f23978 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
[  846.936856] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f9a85613287
[  846.936857] RDX: 0000000000000000 RSI: 000000004800ce16 RDI: 000000000000000d
[  846.936859] RBP: 000055804f259560 R08: 0000000000000046 R09: 0000558051124410
[  846.936861] R10: fffffffffffff1be R11: 0000000000000206 R12: 00000000000001f4
[  846.936862] R13: 0000000051eb851f R14: 00007ffdf7f239f0 R15: 000055804f247450
[  846.936868] INFO: task fsstress:19012 blocked for more than 120 seconds.
[  846.936912]       Tainted: P           OE     5.9.0-0.bpo.5-amd64 #1 Debian 5.9.15-1~bpo10+1
[  846.936966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.937016] task:fsstress        state:D stack:    0 pid:19012 ppid: 19006 flags:0x00004000
[  846.937019] Call Trace:
[  846.937024]  __schedule+0x2be/0x780
[  846.937029]  schedule+0x40/0xb0
[  846.937033]  schedule_preempt_disabled+0xa/0x10
[  846.937037]  __mutex_lock.isra.9+0x2b5/0x4a0
[  846.937202]  zil_commit_impl+0x147/0xe00 [zfs]
[  846.937207]  ? __schedule+0x2c6/0x780
[  846.937211]  ? __wake_up_common_lock+0x87/0xc0
[  846.937215]  ? prepare_to_wait_event+0xb6/0x140
[  846.937218]  ? finish_wait+0x3f/0x80
[  846.937223]  ? _cond_resched+0x15/0x30
[  846.937226]  ? __ia32_sys_fdatasync+0x20/0x20
[  846.937376]  zfs_sync+0x73/0x1a0 [zfs]
[  846.937525]  zpl_sync_fs+0x4b/0x70 [zfs]
[  846.937528]  iterate_supers+0x98/0x100
[  846.937532]  ksys_sync+0x60/0xb0
[  846.937536]  __do_sys_sync+0xa/0x10
[  846.937540]  do_syscall_64+0x33/0x80
[  846.937544]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.937546] RIP: 0033:0x7f9a85613287
[  846.937547] RSP: 002b:00007ffdf7f23978 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
[  846.937549] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f9a85613287
[  846.937551] RDX: 00000000ffffffff RSI: 000000005bca59f9 RDI: 0000000000000005
[  846.937552] RBP: 000055804f259560 R08: 000000000000003a R09: 00007ffdf7f23377
[  846.937554] R10: fffffffffffff1be R11: 0000000000000206 R12: 00000000000001f4
[  846.937556] R13: 0000000051eb851f R14: 00007ffdf7f239f0 R15: 000055804f247450
[  846.937559] INFO: task fsstress:19015 blocked for more than 120 seconds.
[  846.937604]       Tainted: P           OE     5.9.0-0.bpo.5-amd64 #1 Debian 5.9.15-1~bpo10+1
[  846.937657] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.937707] task:fsstress        state:D stack:    0 pid:19015 ppid: 19006 flags:0x00004000
[  846.937709] Call Trace:
[  846.937714]  __schedule+0x2be/0x780
[  846.937719]  schedule+0x40/0xb0
[  846.937723]  schedule_preempt_disabled+0xa/0x10
[  846.937727]  __mutex_lock.isra.9+0x2b5/0x4a0
[  846.937894]  zil_commit_impl+0x147/0xe00 [zfs]
[  846.937899]  ? __schedule+0x2c6/0x780
[  846.937903]  ? __wake_up_common_lock+0x87/0xc0
[  846.937906]  ? prepare_to_wait_event+0xb6/0x140
[  846.937910]  ? finish_wait+0x3f/0x80
[  846.937914]  ? _cond_resched+0x15/0x30
[  846.937918]  ? __ia32_sys_fdatasync+0x20/0x20
[  846.938067]  zfs_sync+0x73/0x1a0 [zfs]
[  846.938218]  zpl_sync_fs+0x4b/0x70 [zfs]
[  846.938221]  iterate_supers+0x98/0x100
[  846.938225]  ksys_sync+0x60/0xb0
[  846.938228]  __do_sys_sync+0xa/0x10
[  846.938233]  do_syscall_64+0x33/0x80
[  846.938236]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.938238] RIP: 0033:0x7f9a85613287
[  846.938239] RSP: 002b:00007ffdf7f23978 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
[  846.938242] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f9a85613287
[  846.938243] RDX: 0000000000000000 RSI: 0000000074a68820 RDI: 000000000000000a
[  846.938245] RBP: 000055804f259560 R08: 0000000000000032 R09: 0000558051124410
[  846.938246] R10: fffffffffffff1be R11: 0000000000000206 R12: 00000000000001f4
[  846.938248] R13: 0000000051eb851f R14: 00007ffdf7f239f0 R15: 000055804f247450
[  846.938253] INFO: task fsstress:19019 blocked for more than 120 seconds.
[  846.938328]       Tainted: P           OE     5.9.0-0.bpo.5-amd64 #1 Debian 5.9.15-1~bpo10+1
[  846.938386] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.938441] task:fsstress        state:D stack:    0 pid:19019 ppid: 19006 flags:0x00000000
[  846.938448] Call Trace:
[  846.938456]  __schedule+0x2be/0x780
[  846.938465]  schedule+0x40/0xb0
[  846.938474]  schedule_preempt_disabled+0xa/0x10
[  846.938482]  __mutex_lock.isra.9+0x2b5/0x4a0
[  846.938651]  zil_commit_impl+0x147/0xe00 [zfs]
[  846.938671]  ? spl_kmem_alloc+0xb6/0x100 [spl]
[  846.938689]  ? tsd_hash_search.isra.2+0x42/0x90 [spl]
[  846.938706]  ? tsd_set+0x21c/0x480 [spl]
[  846.938715]  ? _cond_resched+0x15/0x30
[  846.938885]  zfs_fsync+0x71/0x100 [zfs]
[  846.939039]  zpl_fsync+0x67/0xa0 [zfs]
[  846.939050]  do_fsync+0x38/0x70
[  846.939057]  __x64_sys_fdatasync+0x13/0x20
[  846.939066]  do_syscall_64+0x33/0x80
[  846.939074]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.939080] RIP: 0033:0x7f9a856132c4
[  846.939085] RSP: 002b:00007ffdf7f23948 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[  846.939097] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f9a856132c4
[  846.939103] RDX: 00007ffdf7f238b0 RSI: 00007ffdf7f238b0 RDI: 0000000000000004
[  846.939109] RBP: 000000000000001e R08: 0000000000000000 R09: 00007ffdf7f23597
[  846.939114] R10: fffffffffffff1be R11: 0000000000000246 R12: 00000000000001f4
[  846.939119] R13: 0000000051eb851f R14: 00007ffdf7f239f0 R15: 000055804f24a740
[  846.939124] INFO: task fsstress:19024 blocked for more than 120 seconds.
[  846.939169]       Tainted: P           OE     5.9.0-0.bpo.5-amd64 #1 Debian 5.9.15-1~bpo10+1
[  846.939223] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.939274] task:fsstress        state:D stack:    0 pid:19024 ppid: 19006 flags:0x00004000
[  846.939277] Call Trace:
[  846.939282]  __schedule+0x2be/0x780
[  846.939286]  schedule+0x40/0xb0
[  846.939290]  schedule_preempt_disabled+0xa/0x10
[  846.939295]  __mutex_lock.isra.9+0x2b5/0x4a0
[  846.939460]  zil_commit_impl+0x147/0xe00 [zfs]
[  846.939471]  ? __schedule+0x2c6/0x780
[  846.939480]  ? __wake_up_common_lock+0x87/0xc0
[  846.939486]  ? prepare_to_wait_event+0xb6/0x140
[  846.939494]  ? _raw_spin_lock_irqsave+0x30/0x40
[  846.939502]  ? finish_wait+0x3f/0x80
[  846.939511]  ? _cond_resched+0x15/0x30
[  846.939521]  ? __ia32_sys_fdatasync+0x20/0x20
[  846.939674]  zfs_sync+0x73/0x1a0 [zfs]
[  846.939827]  zpl_sync_fs+0x4b/0x70 [zfs]
[  846.939836]  iterate_supers+0x98/0x100
[  846.939844]  ksys_sync+0x60/0xb0
[  846.939852]  __do_sys_sync+0xa/0x10
[  846.939860]  do_syscall_64+0x33/0x80
[  846.939867]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.939873] RIP: 0033:0x7f9a85613287
[  846.939880] RSP: 002b:00007ffdf7f23978 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
[  846.939890] RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00007f9a85613287
[  846.939897] RDX: 00000000ffffffff RSI: 000000002afcc2ab RDI: 0000000000000017
[  846.939903] RBP: 000055804f259560 R08: 000000000000003b R09: 00007ffdf7f23296
[  846.939909] R10: fffffffffffff1be R11: 0000000000000206 R12: 00000000000001f4
[  846.939915] R13: 0000000051eb851f R14: 00007ffdf7f239f0 R15: 000055804f247450
[  846.939924] INFO: task fsstress:19028 blocked for more than 120 seconds.
[  846.939972]       Tainted: P           OE     5.9.0-0.bpo.5-amd64 #1 Debian 5.9.15-1~bpo10+1
[  846.940025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.940076] task:fsstress        state:D stack:    0 pid:19028 ppid: 19006 flags:0x00000000
[  846.940078] Call Trace:
[  846.940083]  __schedule+0x2be/0x780
[  846.940088]  schedule+0x40/0xb0
[  846.940092]  schedule_preempt_disabled+0xa/0x10
[  846.940097]  __mutex_lock.isra.9+0x2b5/0x4a0
[  846.940262]  zil_commit_impl+0x147/0xe00 [zfs]
[  846.940279]  ? spl_kmem_alloc+0xb6/0x100 [spl]
[  846.940295]  ? tsd_hash_search.isra.2+0x42/0x90 [spl]
[  846.940312]  ? tsd_set+0x21c/0x480 [spl]
[  846.940323]  ? _cond_resched+0x15/0x30
[  846.940489]  zfs_fsync+0x71/0x100 [zfs]
[  846.940637]  zpl_fsync+0x67/0xa0 [zfs]
[  846.940643]  do_fsync+0x38/0x70
[  846.940646]  __x64_sys_fsync+0x10/0x20
[  846.940651]  do_syscall_64+0x33/0x80
[  846.940654]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.940656] RIP: 0033:0x7f9a856f9a71
[  846.940657] RSP: 002b:00007ffdf7f23948 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[  846.940660] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f9a856f9a71
[  846.940661] RDX: 00007ffdf7f238b0 RSI: 00007ffdf7f238b0 RDI: 0000000000000004
[  846.940663] RBP: 0000000000000012 R08: 00007f9a856ddc40 R09: 00007ffdf7f23597
[  846.940664] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000001f4
[  846.940666] R13: 0000000051eb851f R14: 00007ffdf7f239f0 R15: 000055804f24a480
[  846.940671] INFO: task fsstress:19034 blocked for more than 120 seconds.
[  846.940718]       Tainted: P           OE     5.9.0-0.bpo.5-amd64 #1 Debian 5.9.15-1~bpo10+1
[  846.940773] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.940823] task:fsstress        state:D stack:    0 pid:19034 ppid: 19006 flags:0x00004000
[  846.940825] Call Trace:
[  846.940830]  __schedule+0x2be/0x780
[  846.940836]  ? __switch_to_asm+0x36/0x70
[  846.940840]  schedule+0x40/0xb0
[  846.940844]  schedule_preempt_disabled+0xa/0x10
[  846.940848]  __mutex_lock.isra.9+0x2b5/0x4a0
[  846.940853]  ? finish_wait+0x3f/0x80
[  846.940858]  ? wb_wait_for_completion+0x75/0x80
[  846.940861]  ? __x64_sys_tee+0xc0/0xc0
[  846.940864]  sync_inodes_sb+0x111/0x2c0
[  846.940869]  ? __x64_sys_tee+0xc0/0xc0
[  846.940871]  iterate_supers+0x98/0x100
[  846.940874]  ksys_sync+0x40/0xb0
[  846.940878]  __do_sys_sync+0xa/0x10
[  846.940882]  do_syscall_64+0x33/0x80
[  846.940885]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.940887] RIP: 0033:0x7f9a85613287
[  846.940888] RSP: 002b:00007ffdf7f23978 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
[  846.940891] RAX: ffffffffffffffda RBX: 0000000000000023 RCX: 00007f9a85613287
[  846.940892] RDX: 0000000000000000 RSI: 000000006e603b4c RDI: 0000000000000023
[  846.940894] RBP: 000055804f259560 R08: 0000000000000026 R09: 0000558051124560
[  846.940896] R10: fffffffffffff1be R11: 0000000000000206 R12: 00000000000001f4
[  846.940897] R13: 0000000051eb851f R14: 00007ffdf7f239f0 R15: 000055804f247450
[  846.940903] INFO: task fsstress:19038 blocked for more than 120 seconds.
[  846.940947]       Tainted: P           OE     5.9.0-0.bpo.5-amd64 #1 Debian 5.9.15-1~bpo10+1
[  846.941002] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  846.941052] task:fsstress        state:D stack:    0 pid:19038 ppid: 19006 flags:0x00000000
[  846.941054] Call Trace:
[  846.941059]  __schedule+0x2be/0x780
[  846.941064]  schedule+0x40/0xb0
[  846.941068]  schedule_preempt_disabled+0xa/0x10
[  846.941072]  __mutex_lock.isra.9+0x2b5/0x4a0
[  846.941237]  zil_commit_impl+0x147/0xe00 [zfs]
[  846.941249]  ? spl_kmem_alloc+0xb6/0x100 [spl]
[  846.941264]  ? tsd_hash_search.isra.2+0x42/0x90 [spl]
[  846.941276]  ? tsd_set+0x21c/0x480 [spl]
[  846.941281]  ? _cond_resched+0x15/0x30
[  846.941442]  zfs_fsync+0x71/0x100 [zfs]
[  846.941588]  zpl_fsync+0x67/0xa0 [zfs]
[  846.941594]  do_fsync+0x38/0x70
[  846.941597]  __x64_sys_fsync+0x10/0x20
[  846.941602]  do_syscall_64+0x33/0x80
[  846.941605]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  846.941607] RIP: 0033:0x7f9a856f9a71
[  846.941608] RSP: 002b:00007ffdf7f23948 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[  846.941611] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f9a856f9a71
[  846.941612] RDX: 00007ffdf7f238b0 RSI: 00007ffdf7f238b0 RDI: 0000000000000004
[  846.941613] RBP: 000000000000002e R08: 0000000000000000 R09: 00007ffdf7f23597
[  846.941615] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000001f4
[  846.941616] R13: 0000000051eb851f R14: 00007ffdf7f239f0 R15: 000055804f24a480

for i in $(pgrep fsstress); do cat /proc/$i/stack; done


19006
[<0>] do_wait+0x1c3/0x230
[<0>] kernel_wait4+0xa6/0x140
[<0>] __do_sys_wait4+0x83/0x90
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19007
[<0>] wait_on_page_bit+0x127/0x230
[<0>] truncate_inode_pages_range+0x39e/0x800
[<0>] zfs_freesp+0x403/0x430 [zfs]
[<0>] zfs_space+0xb9/0x220 [zfs]
[<0>] zpl_fallocate_common+0x102/0x240 [zfs]
[<0>] vfs_fallocate+0x13b/0x260
[<0>] ksys_fallocate+0x3c/0x80
[<0>] __x64_sys_fallocate+0x1a/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19008
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fdatasync+0x13/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19009
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_sync+0x73/0x1a0 [zfs]
[<0>] zpl_sync_fs+0x4b/0x70 [zfs]
[<0>] iterate_supers+0x98/0x100
[<0>] ksys_sync+0x60/0xb0
[<0>] __do_sys_sync+0xa/0x10
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19012
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_sync+0x73/0x1a0 [zfs]
[<0>] zpl_sync_fs+0x4b/0x70 [zfs]
[<0>] iterate_supers+0x98/0x100
[<0>] ksys_sync+0x60/0xb0
[<0>] __do_sys_sync+0xa/0x10
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19015
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_sync+0x73/0x1a0 [zfs]
[<0>] zpl_sync_fs+0x4b/0x70 [zfs]
[<0>] iterate_supers+0x98/0x100
[<0>] ksys_sync+0x60/0xb0
[<0>] __do_sys_sync+0xa/0x10
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19019
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fdatasync+0x13/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19024
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_sync+0x73/0x1a0 [zfs]
[<0>] zpl_sync_fs+0x4b/0x70 [zfs]
[<0>] iterate_supers+0x98/0x100
[<0>] ksys_sync+0x60/0xb0
[<0>] __do_sys_sync+0xa/0x10
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19028
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19034
[<0>] sync_inodes_sb+0x111/0x2c0
[<0>] iterate_supers+0x98/0x100
[<0>] ksys_sync+0x40/0xb0
[<0>] __do_sys_sync+0xa/0x10
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19038
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19042
[<0>] cv_wait_common+0xf8/0x130 [spl]
[<0>] zfs_rangelock_enter_impl+0x467/0x560 [zfs]
[<0>] zfs_get_data+0xbb/0x350 [zfs]
[<0>] zil_commit_impl+0x9bb/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fdatasync+0x13/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19045
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19049
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_sync+0x73/0x1a0 [zfs]
[<0>] zpl_sync_fs+0x4b/0x70 [zfs]
[<0>] iterate_supers+0x98/0x100
[<0>] ksys_sync+0x60/0xb0
[<0>] __do_sys_sync+0xa/0x10
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19053
[<0>] wait_on_page_bit+0x127/0x230
[<0>] __filemap_fdatawait_range+0x89/0xf0
[<0>] filemap_fdatawait_keep_errors+0x1a/0x40
[<0>] sync_inodes_sb+0x226/0x2c0
[<0>] iterate_supers+0x98/0x100
[<0>] ksys_sync+0x40/0xb0
[<0>] __do_sys_sync+0xa/0x10
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19056
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19060
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zpl_writepages+0xd3/0x160 [zfs]
[<0>] do_writepages+0x41/0xd0
[<0>] __filemap_fdatawrite_range+0xcb/0x100
[<0>] filemap_write_and_wait_range+0x38/0x90
[<0>] zpl_fsync+0x37/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fdatasync+0x13/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19065
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fdatasync+0x13/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19069
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19073
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_sync+0x73/0x1a0 [zfs]
[<0>] zpl_sync_fs+0x4b/0x70 [zfs]
[<0>] iterate_supers+0x98/0x100
[<0>] ksys_sync+0x60/0xb0
[<0>] __do_sys_sync+0xa/0x10
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
19077
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Analysis

The various zil_commit_impl() calls are blocked waiting on a mutex, presumable zl_lock. They showed up over time and are presumably irrelevant for this bug.

This thread is blocked waiting for a rangelock:

19042
[<0>] cv_wait_common+0xf8/0x130 [spl]
[<0>] zfs_rangelock_enter_impl+0x467/0x560 [zfs]
[<0>] zfs_get_data+0xbb/0x350 [zfs]
[<0>] zil_commit_impl+0x9bb/0xe00 [zfs]
[<0>] zfs_fsync+0x71/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fdatasync+0x13/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

This thread is holding a rangelock and not making progress.

19007
[<0>] wait_on_page_bit+0x127/0x230
[<0>] truncate_inode_pages_range+0x39e/0x800
[<0>] zfs_freesp+0x403/0x430 [zfs]
[<0>] zfs_space+0xb9/0x220 [zfs]
[<0>] zpl_fallocate_common+0x102/0x240 [zfs]
[<0>] vfs_fallocate+0x13b/0x260
[<0>] ksys_fallocate+0x3c/0x80
[<0>] __x64_sys_fallocate+0x1a/0x20
[<0>] do_syscall_64+0x33/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Describe how to reproduce the problem

Clone xfstests from https://github.com/kdave/xfstests.git

Apply the following patch:

diff --git a/common/config b/common/config
index d4cf8089..66e56bb4 100644
--- a/common/config
+++ b/common/config
@@ -494,6 +494,8 @@ _check_device()
    # 9p and virtiofs mount tags are just plain strings, so anything is allowed
    # tmpfs doesn't use mount source, ignore
    ;;
+    zfs)
+        ;;
overlay)
    if [ ! -d "$dev" ]; then
        _fatal "common/config: $name ($dev) is not a directory for overlay"
diff --git a/common/rc b/common/rc
index 0ce3cb0d..dc3a7d0d 100644
--- a/common/rc
+++ b/common/rc
@@ -1644,7 +1644,7 @@ _require_test()
        _notrun "this test requires a valid \$TEST_DIR"
    fi
    ;;
-   9p|virtiofs)
+   9p|virtiofs|zfs)
    if [ -z "$TEST_DEV" ]; then
        _notrun "this test requires a valid \$TEST_DEV"
    fi

Build xfstests. No need to install it. Just run make.

Create the following local.config in the checkout directory

export TEST_DEV=dut
export TEST_DIR=/dut
export FSTYP=zfs

Run the test that induces the crash like so:

$ZFS_BUILDDIR/bin/zpool create -O mountpoint=legacy -O compression=off -O recordsize=4k dut /dev/nvme1n1p1
mkdir -p /dut
bin/mount.zfs dut /dut
pushd /home/cs/xfstests
./check 'generic/013'
popd

problame commented 3 years ago

I familiarized myself with what truncate_inode_pages_range actually does. ISTM that the following sequence might lead to a deadlock:

Thread1 calls fallocate() ... zfs_freesp() zfs_free_range() truncate_inode_pages_range() wait_on_page_bit()
- The bit we're waiting for here is, most likely, the writeback bit
- We hold the rangelock from zfs_free_range(): https://github.com/zfsonlinux/zfs/blob/1a0b4f566c4e2948e8df0ce43880ebf6123bad8c/module/os/linux/zfs/zfs_znode.c#L1608
The kernel's writeback thread(s?) call znode's i_mapping->writepages which is zpl_writepages(). That calls zpl_putpage() zfs_putpage() which also wants to enter the rangelock: https://github.com/zfsonlinux/zfs/blob/1c2358c12a673759845f70c57dade601cc12ed99/module/os/linux/zfs/zfs_vnops_os.c#L3522-L3523
Deadlock.

behlendorf commented 3 years ago

@problame your analysis looks correct to me, that's exactly how this deadlock can be hit. Perhaps one reasonable way to handle this is to just release the range lock in zfs_free_range() prior to truncating the page cache pages. The comment here regarding why we're taking the range lock seems overly strict. While it's true we need to keep the ARC and page cache in sync this is already handled in zfs_read() and zfs_write(). Both of which are careful to take the needed page locks when accessing/updating their respective memory mapped pages. Furthermore, any memory mapped page writes which happen concurrent with the truncate should behave exactly as before. They will be written when the dirtied address space in written out via zpl_writepages().

problame commented 3 years ago

While it's true we need to keep the ARC and page cache in sync this is already handled in zfs_read() and zfs_write() Both of which are careful to take the needed page locks when accessing/updating their respective memory mapped pages.

Just to be clear, you are referring to this code?

https://github.com/zfsonlinux/zfs/blob/1c2358c12a673759845f70c57dade601cc12ed99/module/zfs/zfs_vnops.c#L587-L590

Furthermore, any memory mapped page writes which happen concurrent with the truncate should behave exactly as before. They will be written when the dirtied address space in written out via zpl_writepages().

Seems reasonable. A truncate syscall would call truncate_inode_pages_range() so things are kept in sync.

behlendorf commented 3 years ago

Just to be clear, you are referring to this code?

Yes, and here for reads:

https://github.com/openzfs/zfs/blob/1c2358c12a673759845f70c57dade601cc12ed99/module/zfs/zfs_vnops.c#L268-L270

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

problame commented 2 years ago

I wouldn't be surprised if this is still an issue. Don't have a setup handy to repro this. @behlendorf would it be worth investigating and fixing this?

behlendorf commented 2 years ago

Yes, I suspect this is still an issue and one worth investigating. Particularly if we have a consistent reproducer.

openzfs / zfs

deadlock when running xfstests generic/013 #11792

Analysis

Describe how to reproduce the problem