openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Process stuck at zfs_fsync #10440

Closed sebu06 closed 1 year ago

sebu06 commented 4 years ago

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  Buster
Linux Kernel          5.4.41-1-pve
Architecture          x86_64
ZFS Version           0.8.4-pve1
SPL Version           0.8.4-pve1

I'm using the tool backy2 to back up a Ceph cluster to a raidz ZFS volume. The backy2 process uses a SQLite database stored in a dataset on the backup pool. Sometimes during a backup, the process gets stuck (D state). The stack trace shows that it is stuck in a ZFS task:

root@zds:/proc/2958595$ cat stack
[<0>] cv_wait_common+0x104/0x130 [spl]
[<0>] __cv_wait+0x15/0x20 [spl]
[<0>] zil_commit_impl+0x241/0xd90 [zfs]
[<0>] zil_commit+0x3d/0x60 [zfs]
[<0>] zfs_fsync+0x77/0xe0 [zfs]
[<0>] zpl_fsync+0x68/0xa0 [zfs]
[<0>] vfs_fsync_range+0x48/0x80
[<0>] do_fsync+0x3d/0x70
[<0>] __x64_sys_fdatasync+0x17/0x20
[<0>] do_syscall_64+0x57/0x190
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

The problem does not happen every day, but it occurs on a regular basis. It is quite annoying because it requires a reboot of the cluster node. The cluster is a Proxmox 6 cluster. The same setup worked for a long time on a Proxmox 5 cluster (kernel 4.15.18-29-pve, ZFS 0.7.13-pve1~bpo2, Debian Stretch), and the problems only started after the update.
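
For anyone trying to trigger this on demand: the stack above shows the hang inside fdatasync() on a file in the affected dataset, which is the same write-then-sync pattern SQLite uses. A minimal sketch of that pattern is below; the file path is hypothetical, and since the hang is timing-dependent this is only a starting point for a reproducer, not a guaranteed one.

/* Crude reproducer sketch: repeatedly append to and fdatasync() a file on
 * the affected dataset, mimicking the SQLite-style write+sync pattern that
 * was observed hanging in zil_commit_impl().  The path is a placeholder. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/backup/fsync-test.db";   /* hypothetical dataset path */
    char buf[4096];
    int fd;

    memset(buf, 'x', sizeof(buf));

    fd = open(path, O_CREAT | O_WRONLY | O_APPEND, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (;;) {
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("write");
            break;
        }
        /* This is the call that enters zpl_fsync -> zfs_fsync -> zil_commit. */
        if (fdatasync(fd) != 0) {
            perror("fdatasync");
            break;
        }
    }

    close(fd);
    return 0;
}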

sebu06 commented 4 years ago

By the way, there is a process stuck at the moment which I did not kill, so if any additional data can be obtained from it, I'm happy to help.

behlendorf commented 4 years ago

This looks very similar to openzfsonosx/openzfs#3. There's a tiny proposed fix in https://github.com/openzfsonosx/openzfs/commit/adeb4523bef7aaf8258d57a48b5898f6d1187864. This code was added in the 0.8.x releases, so that's consistent with your observation that it didn't happen with the earlier release.

sebu06 commented 4 years ago

thanks, I'll try recompiling with that fix and let you know if it occurs again
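
For readers not familiar with the ZIL internals: the innermost frames of the opening stack trace are cv_wait_common/__cv_wait, meaning the fsync caller is a ZIL commit waiter parked on a condition variable until a completion path marks it done. Below is a rough userspace analogue of that handshake in plain pthreads; the zcw_* names are used only as labels, this is not the kernel code. If the completion side never runs (or never signals), the waiter stays blocked forever, which is what a D-state process stuck in cv_wait_common looks like from userspace.

/* Generic condition-variable wait pattern, loosely analogous to the ZIL
 * commit waiter: the waiter sleeps until a "done" flag is set under the
 * same mutex and the CV is broadcast. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct commit_waiter {
    pthread_mutex_t lock;   /* plays the role of zcw->zcw_lock */
    pthread_cond_t  cv;     /* plays the role of zcw->zcw_cv   */
    bool            done;   /* plays the role of zcw->zcw_done */
};

/* Caller side: blocks until the completion path marks us done. */
static void waiter_wait(struct commit_waiter *w)
{
    pthread_mutex_lock(&w->lock);
    while (!w->done)                 /* predicate re-checked after every wakeup */
        pthread_cond_wait(&w->cv, &w->lock);
    pthread_mutex_unlock(&w->lock);
}

/* Completion side: must set the flag *and* broadcast while holding the lock. */
static void waiter_done(struct commit_waiter *w)
{
    pthread_mutex_lock(&w->lock);
    w->done = true;
    pthread_cond_broadcast(&w->cv);
    pthread_mutex_unlock(&w->lock);
}

static void *completion_thread(void *arg)
{
    usleep(100 * 1000);              /* pretend the write just completed */
    waiter_done(arg);
    return NULL;
}

int main(void)
{
    struct commit_waiter w = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .cv   = PTHREAD_COND_INITIALIZER,
        .done = false,
    };
    pthread_t t;

    pthread_create(&t, NULL, completion_thread, &w);
    waiter_wait(&w);                 /* this is where the hung fsync sits */
    pthread_join(t, NULL);
    puts("waiter released");
    return 0;
}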

snajpa commented 4 years ago

This bug is still relevant for me with the current ZFS master (as of today).

lockdep reveals the following, which is probably related; I'm investigating further:

[  140.025805] ======================================================
[  140.025806] WARNING: possible circular locking dependency detected
[  140.025808] 5.9.0-rc1+ #51 Tainted: G           O     
[  140.025810] ------------------------------------------------------
[  140.025811] systemd-journal/3974 is trying to acquire lock:
[  140.025813] ffff94ba81261070 (&zilog->zl_lock){+.+.}-{3:3}, at: zil_alloc_lwb+0x1df/0x3e0 [zfs]
[  140.025970] 
               but task is already holding lock:
[  140.025971] ffff94ba9ba504f8 (&zcw->zcw_lock){+.+.}-{3:3}, at: zil_commit_impl+0x52b/0x1850 [zfs]
[  140.026100] 
               which lock already depends on the new lock.

[  140.026100] 
               the existing dependency chain (in reverse order) is:
[  140.026101] 
               -> #1 (&zcw->zcw_lock){+.+.}-{3:3}:
[  140.026110]        __mutex_lock+0xac/0x9e0
[  140.026113]        mutex_lock_nested+0x1b/0x20
[  140.026171]        zil_commit_waiter_link_lwb+0x51/0x1a0 [zfs]
[  140.026271]        zil_commit_impl+0x12b0/0x1850 [zfs]
[  140.026366]        zil_commit+0x43/0x60 [zfs]
[  140.026478]        zpl_writepages+0xf8/0x1a0 [zfs]
[  140.026540]        do_writepages+0x43/0xf0
[  140.026544]        __filemap_fdatawrite_range+0xd5/0x110
[  140.026546]        filemap_write_and_wait_range+0x4b/0xb0
[  140.026601]        zpl_fsync+0x4d/0xb0 [zfs]
[  140.026648]        vfs_fsync_range+0x49/0x80
[  140.026650]        do_fsync+0x3d/0x70
[  140.026653]        __x64_sys_fsync+0x14/0x20
[  140.026658]        do_syscall_64+0x38/0x90
[  140.026662]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  140.026663] 
               -> #0 (&zilog->zl_lock){+.+.}-{3:3}:
[  140.026670]        __lock_acquire+0x1298/0x2460
[  140.026673]        lock_acquire+0xab/0x380
[  140.026675]        __mutex_lock+0xac/0x9e0
[  140.026676]        mutex_lock_nested+0x1b/0x20
[  140.026719]        zil_alloc_lwb+0x1df/0x3e0 [zfs]
[  140.026803]        zil_lwb_write_issue+0x265/0x3f0 [zfs]
[  140.026886]        zil_commit_impl+0x577/0x1850 [zfs]
[  140.026969]        zil_commit+0x43/0x60 [zfs]
[  140.027051]        zpl_writepages+0xf8/0x1a0 [zfs]
[  140.027116]        do_writepages+0x43/0xf0
[  140.027119]        __filemap_fdatawrite_range+0xd5/0x110
[  140.027122]        filemap_write_and_wait_range+0x4b/0xb0
[  140.027172]        zpl_fsync+0x4d/0xb0 [zfs]
[  140.027215]        vfs_fsync_range+0x49/0x80
[  140.027217]        do_fsync+0x3d/0x70
[  140.027220]        __x64_sys_fsync+0x14/0x20
[  140.027223]        do_syscall_64+0x38/0x90
[  140.027226]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  140.027227] 
               other info that might help us debug this:

[  140.027228]  Possible unsafe locking scenario:

[  140.027228]        CPU0                    CPU1
[  140.027229]        ----                    ----
[  140.027229]   lock(&zcw->zcw_lock);
[  140.027231]                                lock(&zilog->zl_lock);
[  140.027232]                                lock(&zcw->zcw_lock);
[  140.027233]   lock(&zilog->zl_lock);
[  140.027234] 
                *** DEADLOCK ***

[  140.027236] 2 locks held by systemd-journal/3974:
[  140.027236]  #0: ffff94ba812612b8 (&zilog->zl_issuer_lock){+.+.}-{3:3}, at: zil_commit_impl+0x4dc/0x1850 [zfs]
[  140.027342]  #1: ffff94ba9ba504f8 (&zcw->zcw_lock){+.+.}-{3:3}, at: zil_commit_impl+0x52b/0x1850 [zfs]
[  140.027452] 
               stack backtrace:
[  140.027455] CPU: 13 PID: 25 Comm: systemd-journal Tainted: G           O      5.9.0-rc1+ #51
[  140.027457] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[  140.027459] Call Trace:
[  140.027467]  dump_stack+0x9f/0xda
[  140.027472]  print_circular_bug.isra.0.cold+0x13d/0x142
[  140.027476]  check_noncircular+0x169/0x180
[  140.027483]  __lock_acquire+0x1298/0x2460
[  140.027489]  lock_acquire+0xab/0x380
[  140.027545]  ? zil_alloc_lwb+0x1df/0x3e0 [zfs]
[  140.027609]  __mutex_lock+0xac/0x9e0
[  140.027657]  ? zil_alloc_lwb+0x1df/0x3e0 [zfs]
[  140.027742]  ? zil_alloc_lwb+0x1df/0x3e0 [zfs]
[  140.027792]  ? spl_kmem_cache_alloc+0xc2/0x840 [spl]
[  140.027798]  mutex_lock_nested+0x1b/0x20
[  140.027840]  ? zio_buf_alloc+0x30/0x60 [zfs]
[  140.027882]  ? mutex_lock_nested+0x1b/0x20
[  140.027936]  zil_alloc_lwb+0x1df/0x3e0 [zfs]
[  140.028021]  zil_lwb_write_issue+0x265/0x3f0 [zfs]
[  140.028113]  zil_commit_impl+0x577/0x1850 [zfs]
[  140.028199]  zil_commit+0x43/0x60 [zfs]
[  140.028281]  zpl_writepages+0xf8/0x1a0 [zfs]
[  140.028325]  do_writepages+0x43/0xf0
[  140.028327]  ? _raw_spin_unlock+0x23/0x30
[  140.028330]  __filemap_fdatawrite_range+0xd5/0x110
[  140.028333]  filemap_write_and_wait_range+0x4b/0xb0
[  140.028376]  zpl_fsync+0x4d/0xb0 [zfs]
[  140.028418]  vfs_fsync_range+0x49/0x80
[  140.028421]  do_fsync+0x3d/0x70
[  140.028424]  __x64_sys_fsync+0x14/0x20
[  140.028428]  do_syscall_64+0x38/0x90
[  140.028430]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  140.028434] RIP: 0033:0x7f48b7837214
[  140.028436] Code: 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 f1 96 2d 00 8b 00 85 c0 75 13 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c f3 c3 66 90 53 89 fb 48 83 ec 10 e8 b4 94
[  140.028438] RSP: 002b:00007ffc919c1ef8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[  140.028440] RAX: ffffffffffffffda RBX: 000056389b01cb30 RCX: 00007f48b7837214
[  140.028441] RDX: 00007f48b2f36000 RSI: 000056389b01cd20 RDI: 0000000000000012
[  140.028442] RBP: 00007ffc919c2210 R08: 0005ad650b223649 R09: 000000005f3feab5
[  140.028444] R10: 00007ffc919f6000 R11: 0000000000000246 R12: 00000000ffffffff
[  140.028445] R13: 0000000000000000 R14: 000056389b01cb30 R15: 00007ffc919c21d0
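
The "Possible unsafe locking scenario" above is the classic ABBA ordering hazard: one code path nests zcw_lock then zl_lock, the other nests them the other way round. Stripped of all ZFS specifics, the shape of the problem lockdep is warning about looks like the toy program below. It is deliberately broken (the zcw/zl names are only labels, not ZFS code) and will usually hang when run.

/* ABBA lock-ordering hazard, the shape of the lockdep report above:
 * thread 1 holds A and wants B, thread 2 holds B and wants A.
 * With unlucky timing both block forever. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* think zcw->zcw_lock */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* think zilog->zl_lock */

static void *path_one(void *arg)   /* nests A -> B */
{
    (void)arg;
    pthread_mutex_lock(&lock_a);
    usleep(1000);                  /* widen the race window for the demo */
    pthread_mutex_lock(&lock_b);
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

static void *path_two(void *arg)   /* nests B -> A: opposite order, hence the cycle */
{
    (void)arg;
    pthread_mutex_lock(&lock_b);
    usleep(1000);
    pthread_mutex_lock(&lock_a);
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, path_one, NULL);
    pthread_create(&t2, NULL, path_two, NULL);
    pthread_join(t1, NULL);        /* with the sleeps above this usually deadlocks */
    pthread_join(t2, NULL);
    puts("no deadlock this time");
    return 0;
}
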
snajpa commented 4 years ago

Proposed fix, testing:

diff --git a/module/zfs/zil.c b/module/zfs/zil.c
index 9dc20ba14..cf4d6d9dc 100644
--- a/module/zfs/zil.c
+++ b/module/zfs/zil.c
@@ -2513,8 +2513,8 @@ zil_commit_waiter_timeout(zilog_t *zilog, zil_commit_waiter_t *zcw)
         * elsewhere.
         */
        mutex_exit(&zcw->zcw_lock);
-       mutex_enter(&zilog->zl_issuer_lock);
        mutex_enter(&zcw->zcw_lock);
+       mutex_enter(&zilog->zl_issuer_lock);
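
For context, the hunk above changes which of the two locks is taken first after zcw_lock has been dropped. Whatever order is chosen, two things have to hold: every code path must agree on the same order, and any state that was protected by a lock that had to be dropped must be re-checked once it is re-taken, because it may have changed in between. Below is a generic sketch of that drop/re-lock/re-check shape in plain pthreads; the lock names are only labels for zcw_lock and zl_issuer_lock, and the real zil.c logic differs in detail.

/* Generic "drop, re-acquire in the agreed order, re-check" pattern.
 * lock_b is assumed to come before lock_a in the global order, so a path
 * that already holds lock_a has to release it first; once lock_a is
 * re-taken, the condition that justified the work must be tested again. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;  /* "zcw_lock"       */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;  /* "zl_issuer_lock" */
static bool work_already_done;

static void do_work_with_both_locks(void)
{
    /* ... the actual work, with lock_b and lock_a both held ... */
}

/* Entered with lock_a held; returns with lock_a still held. */
static void take_issuer_lock_and_work(void)
{
    pthread_mutex_unlock(&lock_a);        /* respect the agreed order      */
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);

    if (!work_already_done)               /* state may have changed while  */
        do_work_with_both_locks();        /* lock_a was not held           */

    pthread_mutex_unlock(&lock_b);
}

int main(void)
{
    pthread_mutex_lock(&lock_a);
    take_issuer_lock_and_work();
    pthread_mutex_unlock(&lock_a);
    return 0;
}
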
snajpa commented 4 years ago

Oh, it's not going to be that simple :-D

snajpa commented 4 years ago

After marking all functions in zil.c as noinline:

[   59.868232] ======================================================
[   59.868233] WARNING: possible circular locking dependency detected
[   59.868236] 5.9.0-rc1+ #51 Tainted: G           O     
[   59.868237] ------------------------------------------------------
[   59.868239] systemd-journal/3685 is trying to acquire lock:
[   59.868241] ffff989c8b1bc070 (&zilog->zl_lock){+.+.}-{3:3}, at: zil_alloc_lwb+0x1f8/0x4a0 [zfs]
[   59.868405] 
               but task is already holding lock:
[   59.868406] ffff989c6da5b1f8 (&zcw->zcw_lock){+.+.}-{3:3}, at: zil_commit_waiter_timeout+0x1ca/0x670 [zfs]
[   59.868542] 
               which lock already depends on the new lock.

[   59.868543] 
               the existing dependency chain (in reverse order) is:
[   59.868544] 
               -> #1 (&zcw->zcw_lock){+.+.}-{3:3}:
[   59.868552]        __mutex_lock+0xac/0x9e0
[   59.868555]        mutex_lock_nested+0x1b/0x20
[   59.868622]        zil_commit_waiter_skip+0x58/0x210 [zfs]
[   59.868765]        zil_prune_commit_list+0x1c9/0x250 [zfs]
[   59.868895]        zil_commit_writer.isra.0+0x118/0x280 [zfs]
[   59.869023]        zil_commit_impl+0x4a/0x150 [zfs]
[   59.869151]        zil_commit+0x5f/0x170 [zfs]
[   59.869278]        zfs_fsync+0x7c/0x110 [zfs]
[   59.869405]        zpl_fsync+0x93/0x120 [zfs]
[   59.869475]        vfs_fsync_range+0x49/0x80
[   59.869478]        do_fsync+0x3d/0x70
[   59.869482]        __x64_sys_fsync+0x14/0x20
[   59.869488]        do_syscall_64+0x38/0x90
[   59.869493]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   59.869495] 
               -> #0 (&zilog->zl_lock){+.+.}-{3:3}:
[   59.869503]        __lock_acquire+0x1298/0x2460
[   59.869507]        lock_acquire+0xab/0x380
[   59.869510]        __mutex_lock+0xac/0x9e0
[   59.869512]        mutex_lock_nested+0x1b/0x20
[   59.869577]        zil_alloc_lwb+0x1f8/0x4a0 [zfs]
[   59.869715]        zil_lwb_write_issue+0x2e8/0x580 [zfs]
[   59.869843]        zil_commit_waiter_timeout+0x236/0x670 [zfs]
[   59.869972]        zil_commit_waiter+0x2e9/0x4d0 [zfs]
[   59.870100]        zil_commit_impl+0x55/0x150 [zfs]
[   59.870227]        zil_commit+0x5f/0x170 [zfs]
[   59.870354]        zpl_writepages+0xf8/0x1a0 [zfs]
[   59.870423]        do_writepages+0x43/0xf0
[   59.870427]        __filemap_fdatawrite_range+0xd5/0x110
[   59.870430]        filemap_write_and_wait_range+0x4b/0xb0
[   59.870495]        zpl_fsync+0x4d/0x120 [zfs]
[   59.870561]        vfs_fsync_range+0x49/0x80
[   59.870565]        do_fsync+0x3d/0x70
[   59.870568]        __x64_sys_fsync+0x14/0x20
[   59.870571]        do_syscall_64+0x38/0x90
[   59.870575]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   59.870577] 
               other info that might help us debug this:

[   59.870578]  Possible unsafe locking scenario:

[   59.870579]        CPU0                    CPU1
[   59.870580]        ----                    ----
[   59.870580]   lock(&zcw->zcw_lock);
[   59.870583]                                lock(&zilog->zl_lock);
[   59.870585]                                lock(&zcw->zcw_lock);
[   59.870587]   lock(&zilog->zl_lock);
[   59.870589] 
                *** DEADLOCK ***

[   59.870591] 2 locks held by systemd-journal/3685:
[   59.870592]  #0: ffff989c8b1bc2b8 (&zilog->zl_issuer_lock){+.+.}-{3:3}, at: zil_commit_waiter_timeout+0x15c/0x670 [zfs]
[   59.870750]  #1: ffff989c6da5b1f8 (&zcw->zcw_lock){+.+.}-{3:3}, at: zil_commit_waiter_timeout+0x1ca/0x670 [zfs]
[   59.870881] 
               stack backtrace:
[   59.870884] CPU: 11 PID: 22 Comm: systemd-journal Tainted: G           O      5.9.0-rc1+ #51
[   59.870887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[   59.870889] Call Trace:
[   59.870896]  dump_stack+0x9f/0xda
[   59.870900]  print_circular_bug.isra.0.cold+0x13d/0x142
[   59.870904]  check_noncircular+0x169/0x180
[   59.870911]  __lock_acquire+0x1298/0x2460
[   59.870917]  lock_acquire+0xab/0x380
[   59.870983]  ? zil_alloc_lwb+0x1f8/0x4a0 [zfs]
[   59.871052]  __mutex_lock+0xac/0x9e0
[   59.871117]  ? zil_alloc_lwb+0x1f8/0x4a0 [zfs]
[   59.871204]  ? spl_kmem_cache_alloc+0xda/0xd50 [spl]
[   59.871286]  ? zil_alloc_lwb+0x1f8/0x4a0 [zfs]
[   59.871355]  mutex_lock_nested+0x1b/0x20
[   59.871357]  ? mutex_lock_nested+0x1b/0x20
[   59.871423]  zil_alloc_lwb+0x1f8/0x4a0 [zfs]
[   59.871553]  zil_lwb_write_issue+0x2e8/0x580 [zfs]
[   59.871715]  zil_commit_waiter_timeout+0x236/0x670 [zfs]
[   59.871844]  zil_commit_waiter+0x2e9/0x4d0 [zfs]
[   59.871984]  zil_commit_impl+0x55/0x150 [zfs]
[   59.872109]  zil_commit+0x5f/0x170 [zfs]
[   59.872257]  zpl_writepages+0xf8/0x1a0 [zfs]
[   59.872323]  do_writepages+0x43/0xf0
[   59.872327]  ? _raw_spin_unlock+0x23/0x30
[   59.872331]  __filemap_fdatawrite_range+0xd5/0x110
[   59.872336]  filemap_write_and_wait_range+0x4b/0xb0
[   59.872415]  zpl_fsync+0x4d/0x120 [zfs]
[   59.872482]  vfs_fsync_range+0x49/0x80
[   59.872487]  do_fsync+0x3d/0x70
[   59.872491]  __x64_sys_fsync+0x14/0x20
[   59.872495]  do_syscall_64+0x38/0x90
[   59.872499]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   59.872503] RIP: 0033:0x7fe5bf0a6214
[   59.872507] Code: 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8d 05 f1 96 2d 00 8b 00 85 c0 75 13 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c f3 c3 66 90 53 89 fb 48 83 ec 10 e8 b4 94
[   59.872509] RSP: 002b:00007ffde604ec88 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[   59.872513] RAX: ffffffffffffffda RBX: 00005585ac058640 RCX: 00007fe5bf0a6214
[   59.872515] RDX: 00007fe5ba7a5000 RSI: 0000000000000003 RDI: 0000000000000015
[   59.872517] RBP: 00007fe5becb4ea0 R08: 0000000000000005 R09: 0000000000000002
[   59.872518] R10: 00005585ac048010 R11: 0000000000000246 R12: 0000000000000002
[   59.872520] R13: 00000000fffffff0 R14: 00005585ac058640 R15: 00007ffde604f0d0

snajpa commented 3 years ago

It took a while, but the problem has manifested again, running with a roughly two-month-old git HEAD:

[root@node15.prg.vpsfree.cz:~]# for pid in `ct ps -o pid 18205`; do pid=$(ps aux | grep journald | grep $pid | awk '{ print $2; }'); if [ "$pid" != "" ]; then ps aux | grep $pid; cat /proc/$pid/stack; fi; done
5071175+   12166  0.0  0.0  27432  4504 ?        Ds   11:51   0:00 /lib/systemd/systemd-journald
90308608  121662  0.0  0.0 217132   616 pts/3    Ss+   2020   0:00 agetty --login-program /nix/store/1l5yw8hyfpvcn2a5lzds5nvgaz519n67-shadow-4.8/bin/login --noclear --keep-baud pts/3 115200,38400,9600 vt220
root     3166910  0.0  0.0   6348  2268 pts/194  S+   13:07   0:00 grep 12166
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x72/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
5071175+  784154  0.0  0.0  27432  4456 ?        Ds   10:27   0:00 /lib/systemd/systemd-journald
root     3169606  0.0  0.0   6348  2272 pts/194  S+   13:08   0:00 grep 784154
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x72/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
5071175+  857385  0.0  0.0  27432  4692 ?        Ds   12:12   0:00 /lib/systemd/systemd-journald
root     3171751  0.0  0.0   6348  2236 pts/194  S+   13:08   0:00 grep 857385
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x72/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
5071175+ 1651466  0.0  0.0  27432  4460 ?        Ds   10:48   0:00 /lib/systemd/systemd-journald
root     3174521  0.0  0.0   6348  2176 pts/194  S+   13:08   0:00 grep 1651466
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x72/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
5071175+ 1703898  0.0  0.0  27432  4680 ?        Ds   12:34   0:00 /lib/systemd/systemd-journald
root     3174676  0.0  0.0   6348  2180 pts/194  S+   13:08   0:00 grep 1703898
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x72/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
5071175+ 2510625  0.0  0.0  27432  4432 ?        Ds   11:09   0:00 /lib/systemd/systemd-journald
root     3187083  0.0  0.0   6348  2272 pts/194  S+   13:08   0:00 grep 2510625
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x72/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
5071175+ 2545314  0.0  0.0  27432  4340 ?        Ds   09:24   0:00 /lib/systemd/systemd-journald
root     3188253  0.0  0.0   6348  2264 pts/194  S+   13:08   0:00 grep 2545314
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zpl_writepages+0xdd/0x180 [zfs]
[<0>] do_writepages+0x43/0xd0
[<0>] __filemap_fdatawrite_range+0xce/0x110
[<0>] filemap_write_and_wait_range+0x26/0x70
[<0>] zpl_fsync+0x37/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
5071175+ 2641300  0.0  0.0  27432  4612 ?        Ds   12:55   0:00 /lib/systemd/systemd-journald
root     3189963  0.0  0.0   6348  2276 pts/194  S+   13:08   0:00 grep 2641300
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x72/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
root     3191574  0.0  0.0   6348  2264 pts/194  S+   13:08   0:00 grep 3293685
5071175+ 3293685  0.0  0.0  27432  4440 ?        Ds   11:30   0:00 /lib/systemd/systemd-journald
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x72/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
root     3191923  0.0  0.0   6348  2272 pts/194  S+   13:08   0:00 grep 3337412
5071175+ 3337412  0.0  0.0  27432  4432 ?        Ds   09:45   0:00 /lib/systemd/systemd-journald
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x72/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
root     3195197  0.0  0.0   6348  2236 pts/194  S+   13:08   0:00 grep 4063111
5071175+ 4063111  0.0  0.0 213280 10808 ?        Dsl   2020  21:40 /lib/systemd/systemd-journald
[<0>] do_coredump+0x33f/0x1070
[<0>] get_signal+0x166/0x8f0
[<0>] arch_do_signal+0x30/0x700
[<0>] exit_to_user_mode_prepare+0xf9/0x160
[<0>] syscall_exit_to_user_mode+0x2c/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
root     3199445  0.0  0.0   6348  2268 pts/194  S+   13:08   0:00 grep 4188468
5071175+ 4188468  0.0  0.0  27432  4440 ?        Ds   10:06   0:00 /lib/systemd/systemd-journald
[<0>] zil_commit_impl+0x147/0xe00 [zfs]
[<0>] zfs_fsync+0x72/0x100 [zfs]
[<0>] zpl_fsync+0x67/0xa0 [zfs]
[<0>] do_fsync+0x38/0x70
[<0>] __x64_sys_fsync+0x10/0x20
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
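
A small standalone helper that does roughly what the shell pipeline above does: walk /proc, find tasks in uninterruptible sleep (state D) and print their kernel stacks. It has to run as root, since /proc/<pid>/stack is root-readable only; this is generic diagnostics, not ZFS-specific code.

/* Print the kernel stack of every task in uninterruptible sleep ('D'). */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static char task_state(const char *pid)
{
    char path[64], line[512];
    FILE *f;
    char *p;

    snprintf(path, sizeof(path), "/proc/%s/stat", pid);
    f = fopen(path, "r");
    if (!f)
        return 0;
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return 0;
    }
    fclose(f);

    p = strrchr(line, ')');          /* state is two chars after the comm's ')' */
    return (p && p[1] && p[2]) ? p[2] : 0;
}

static void dump_stack(const char *pid)
{
    char path[64], line[512];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%s/stack", pid);
    f = fopen(path, "r");
    if (!f)
        return;
    printf("--- pid %s (D state) ---\n", pid);
    while (fgets(line, sizeof(line), f))
        fputs(line, stdout);
    fclose(f);
}

int main(void)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;

    if (!proc) {
        perror("/proc");
        return 1;
    }
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                /* only numeric entries are processes */
        if (task_state(de->d_name) == 'D')
            dump_stack(de->d_name);
    }
    closedir(proc);
    return 0;
}
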
snajpa commented 3 years ago

Well, our issue seems to be only relevant when core-dumping - https://github.com/openzfs/zfs/pull/10875#issuecomment-767880997

AndyLavr commented 2 years ago

> Well, our issue seems to be only relevant when core-dumping - #10875 (comment)

The real problem does exist, though - linux-next 20211118 + current zfs:

You can catch this deadlock when debugging; use this kernel debug config:

# The config is based on running daily CI for enterprise Linux distros to
# seek regressions on linux-next builds on different bare-metal and virtual
# platforms. It can be used for example,
#
# $ make ARCH=amd64 defconfig debug.config
#
# Keep alphabetically sorted inside each section.
#
# printk and dmesg options
#
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_PRINTK_CALLER=y
CONFIG_PRINTK_TIME=y
CONFIG_SYMBOLIC_ERRNAME=y
#
# Compile-time checks and compiler options
#
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_SECTION_MISMATCH=y
CONFIG_FRAME_WARN=2048
CONFIG_SECTION_MISMATCH_WARN_ONLY=y
#
# Generic Kernel Debugging Instruments
#
# CONFIG_UBSAN_ALIGNMENT is not set
# CONFIG_UBSAN_DIV_ZERO is not set
# CONFIG_UBSAN_TRAP is not set
# CONFIG_WARN_ALL_UNSEEDED_RANDOM is not set
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
CONFIG_DEBUG_IRQFLAGS=y
CONFIG_UBSAN=y
CONFIG_UBSAN_BOOL=y
CONFIG_UBSAN_BOUNDS=y
CONFIG_UBSAN_ENUM=y
CONFIG_UBSAN_SHIFT=y
CONFIG_UBSAN_UNREACHABLE=y
#
# Memory Debugging
#
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
# CONFIG_DEBUG_RODATA_TEST is not set
# CONFIG_DEBUG_WX is not set
# CONFIG_KFENCE is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_SLUB_STATS is not set
CONFIG_PAGE_EXTENSION=y
CONFIG_PAGE_OWNER=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_PER_CPU_MAPS=y
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_VIRTUAL=y
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_VM_RB=y
CONFIG_DEBUG_VM_VMACACHE=y
CONFIG_GENERIC_PTDUMP=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_INLINE=y
CONFIG_KASAN_VMALLOC=y
CONFIG_PTDUMP_DEBUGFS=y
CONFIG_SCHED_STACK_END_CHECK=y
CONFIG_SLUB_DEBUG_ON=y
#
# Debug Oops, Lockups and Hangs
#
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_TIMEOUT=0
CONFIG_SOFTLOCKUP_DETECTOR=y
#
# Lock Debugging (spinlocks, mutexes, etc...)
#
# CONFIG_PROVE_RAW_LOCK_NESTING is not set
CONFIG_PROVE_LOCKING=y
#
# Debug kernel data structures
#
CONFIG_BUG_ON_DATA_CORRUPTION=y
#
# RCU Debugging
#
CONFIG_PROVE_RCU=y
CONFIG_PROVE_RCU_LIST=y
#
# Tracers
#
CONFIG_BRANCH_PROFILE_NONE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y

[  +0,010730] ======================================================
[  +0,000003] WARNING: possible circular locking dependency detected
[  +0,000003] 5.16.0-generic #20211119 Not tainted
[  +0,000004] ------------------------------------------------------
[  +0,000001] systemd-random-/883 is trying to acquire lock:
[  +0,000005] ffff88812d2ec068 (&zilog->zl_lock){+.+.}-{3:3}, at: zil_alloc_lwb+0x4a3/0xd20 [zfs]
[  +0,000321] 
              but task is already holding lock:
[  +0,000002] ffff888131d5d0a0 (&zcw->zcw_lock){+.+.}-{3:3}, at: zil_commit_waiter_timeout+0x26d/0xbe0 [zfs]
[  +0,000299] 
              which lock already depends on the new lock.

[  +0,000003] 
              the existing dependency chain (in reverse order) is:
[  +0,000002] 
              -> #1 (&zcw->zcw_lock){+.+.}-{3:3}:
[  +0,000012]        __lock_acquire+0xaaf/0x1750
[  +0,000010]        lock_acquire.part.0+0x105/0x350
[  +0,000006]        lock_acquire+0xf1/0x2b0
[  +0,000005]        __mutex_lock+0x151/0x13f0
[  +0,000006]        mutex_lock_nested+0x1b/0x20
[  +0,000006]        zil_commit_waiter_skip+0x6f/0x440 [zfs]
[  +0,000289]        zil_commit_impl+0xf80/0x20e0 [zfs]
[  +0,000279]        zil_commit+0xbc/0x120 [zfs]
[  +0,000284]        zfs_fsync+0x1db/0x3c0 [zfs]
[  +0,000286]        zpl_fsync+0x17e/0x290 [zfs]
[  +0,000282]        vfs_fsync_range+0xf6/0x220
[  +0,000007]        __x64_sys_fsync+0x56/0x90
[  +0,000005]        do_syscall_64+0x5c/0xc0
[  +0,000008]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  +0,000006] 
              -> #0 (&zilog->zl_lock){+.+.}-{3:3}:
[  +0,000012]        check_prev_add+0x162/0x20b0
[  +0,000007]        validate_chain+0xa9c/0xfc0
[  +0,000005]        __lock_acquire+0xaaf/0x1750
[  +0,000006]        lock_acquire.part.0+0x105/0x350
[  +0,000005]        lock_acquire+0xf1/0x2b0
[  +0,000005]        __mutex_lock+0x151/0x13f0
[  +0,000005]        mutex_lock_nested+0x1b/0x20
[  +0,000005]        zil_alloc_lwb+0x4a3/0xd20 [zfs]
[  +0,000282]        zil_lwb_write_issue+0x830/0x12c0 [zfs]
[  +0,000286]        zil_commit_waiter_timeout+0x403/0xbe0 [zfs]
[  +0,000289]        zil_commit_impl+0x8c3/0x20e0 [zfs]
[  +0,000287]        zil_commit+0xbc/0x120 [zfs]
[  +0,000290]        zfs_fsync+0x1db/0x3c0 [zfs]
[  +0,000284]        zpl_fsync+0x17e/0x290 [zfs]
[  +0,000283]        vfs_fsync_range+0xf6/0x220
[  +0,000007]        __x64_sys_fsync+0x56/0x90
[  +0,000005]        do_syscall_64+0x5c/0xc0
[  +0,000005]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  +0,000006] 
              other info that might help us debug this:

[  +0,000003]  Possible unsafe locking scenario:

[  +0,000002]        CPU0                    CPU1
[  +0,000001]        ----                    ----
[  +0,000002]   lock(&zcw->zcw_lock);
[  +0,000007]                                lock(&zilog->zl_lock);
[  +0,000005]                                lock(&zcw->zcw_lock);
[  +0,000006]   lock(&zilog->zl_lock);
[  +0,000005] 
               *** DEADLOCK ***

[  +0,000002] 2 locks held by systemd-random-/883:
[  +0,000004]  #0: ffff88812d2ec288 (&zilog->zl_issuer_lock){+.+.}-{3:3}, at: zil_commit_waiter_timeout+0x1b5/0xbe0 [zfs]
[  +0,000300]  #1: ffff888131d5d0a0 (&zcw->zcw_lock){+.+.}-{3:3}, at: zil_commit_waiter_timeout+0x26d/0xbe0 [zfs]
[  +0,000298] 
              stack backtrace:
[  +0,000003] CPU: 7 PID: 883 Comm: systemd-random- Not tainted 5.16.0-generic #20211119
[  +0,000007] Hardware name: Dell Inc. Precision M6600/04YY4M, BIOS A18 09/14/2018
[  +0,000004] Call Trace:
[  +0,000003]  <TASK>
[  +0,000004]  show_stack+0x52/0x58
[  +0,000008]  dump_stack_lvl+0x74/0xa6
[  +0,000007]  dump_stack+0x10/0x12
[  +0,000005]  print_circular_bug.cold+0x15d/0x16c
[  +0,000012]  check_noncircular+0x268/0x310
[  +0,000008]  ? print_circular_bug+0x1d0/0x1d0
[  +0,000007]  ? lock_acquire.part.0+0x105/0x350
[  +0,000006]  ? _raw_spin_lock_irqsave+0xe/0x10
[  +0,000009]  ? check_prev_add+0x20b0/0x20b0
[  +0,000013]  check_prev_add+0x162/0x20b0
[  +0,000005]  ? add_chain_cache+0x26a/0x860
[  +0,000006]  ? call_rcu_zapped+0xb0/0xb0
[  +0,000012]  validate_chain+0xa9c/0xfc0
[  +0,000011]  ? check_prev_add+0x20b0/0x20b0
[  +0,000013]  __lock_acquire+0xaaf/0x1750
[  +0,000015]  lock_acquire.part.0+0x105/0x350
[  +0,000007]  ? zil_alloc_lwb+0x4a3/0xd20 [zfs]
[  +0,000288]  ? lock_acquire.part.0+0x105/0x350
[  +0,000007]  ? _raw_spin_lock_irqsave+0xe/0x10
[  +0,000006]  ? rcu_read_unlock+0x50/0x50
[  +0,000010]  ? __kasan_check_read+0x11/0x20
[  +0,000013]  lock_acquire+0xf1/0x2b0
[  +0,000006]  ? zil_alloc_lwb+0x4a3/0xd20 [zfs]
[  +0,000291]  __mutex_lock+0x151/0x13f0
[  +0,000006]  ? zil_alloc_lwb+0x4a3/0xd20 [zfs]
[  +0,000290]  ? zil_alloc_lwb+0x4a3/0xd20 [zfs]
[  +0,000289]  ? _raw_spin_unlock_irqrestore+0xe/0x10
[  +0,000007]  ? create_object.isra.0+0x5a7/0xae0
[  +0,000009]  ? mutex_lock_io_nested+0x1240/0x1240
[  +0,000009]  ? rcu_read_lock_sched_held+0x46/0x80
[  +0,000008]  ? kmem_cache_alloc+0x326/0x3f0
[  +0,000012]  ? spl_kmem_cache_alloc+0x2a1/0x3d0 [spl]
[  +0,000030]  mutex_lock_nested+0x1b/0x20
[  +0,000005]  ? zio_buf_alloc+0x4f/0xa0 [zfs]
[  +0,000286]  ? mutex_lock_nested+0x1b/0x20
[  +0,000007]  zil_alloc_lwb+0x4a3/0xd20 [zfs]
[  +0,000289]  ? dsl_dataset_dirty+0x122/0x390 [zfs]
[  +0,232318]  zil_lwb_write_issue+0x830/0x12c0 [zfs]
[  +0,001872] systemd[1]: Started Journal Service.
[  -0,000061]  ? zil_lwb_add_block+0x8e0/0x8e0 [zfs]
[  +0,003970]  zil_commit_waiter_timeout+0x403/0xbe0 [zfs]
[  +0,001912]  zil_commit_impl+0x8c3/0x20e0 [zfs]
[  +0,000175]  ? __mutex_unlock_slowpath+0x160/0x3c0
[  +0,000012]  ? zil_itx_assign+0x1180/0x1180 [zfs]
[  +0,000165]  ? _raw_spin_unlock+0x23/0x40
[  +0,000005]  ? rrw_enter_read_impl+0x3e9/0x830 [zfs]
[  +0,000160]  zil_commit+0xbc/0x120 [zfs]
[  +0,000163]  zfs_fsync+0x1db/0x3c0 [zfs]
[  +0,000171]  zpl_fsync+0x17e/0x290 [zfs]
[  +0,000163]  ? zpl_iter_write+0x5e0/0x5e0 [zfs]
[  +0,000152]  vfs_fsync_range+0xf6/0x220
[  +0,000005]  ? __fget_light+0x57/0x240
[  +0,000003]  ? syscall_enter_from_user_mode+0x21/0x60
[  +0,000006]  __x64_sys_fsync+0x56/0x90
[  +0,000004]  do_syscall_64+0x5c/0xc0
[  +0,000005]  ? lockdep_hardirqs_on_prepare.part.0+0x1be/0x370
[  +0,000006]  ? irqentry_exit_to_user_mode+0x29/0x30
[  +0,000003]  ? irqentry_exit+0x63/0x90
[  +0,000003]  ? exc_page_fault+0x86/0xf0
[  +0,000003]  ? asm_exc_page_fault+0x8/0x30
[  +0,000005]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  +0,000004] RIP: 0033:0x7f86c54468e7
[  +0,000005] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 e3 5f f7 ff
[  +0,000003] RSP: 002b:00007ffecef47df8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[  +0,000005] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f86c54468e7
[  +0,000003] RDX: 0000000000000200 RSI: 0000000000000200 RDI: 0000000000000004
[  +0,000002] RBP: 0000000000000004 R08: 000055be174c1a00 R09: 0000000000000000
[  +0,000002] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000005
[  +0,000002] R13: 0000000000000000 R14: 000055be174c1cf0 R15: 0000000000000200
[  +0,000007]  </TASK>

I will report the result after testing this fix: "Fix zfs_fsync deadlock".

AndyLavr commented 2 years ago

The situation changed after the patch: the TRAP disappeared, but in the end the deadlock warning remained.

[   46.278264] ======================================================
[   46.278268] WARNING: possible circular locking dependency detected
[   46.278270] 5.16.0-generic #20211121 Not tainted
[   46.278272] ------------------------------------------------------
[   46.278273] z_ioctl_int/427 is trying to acquire lock:
[   46.278276] ffff88814a9ecd88 (&zcw->zcw_lock){+.+.}-{3:3}, at: zil_lwb_flush_vdevs_done+0x67e/0x1270 [zfs]
[   46.302996] 
               but task is already holding lock:
[   46.303001] ffff88812d6f1068 (&zilog->zl_lock){+.+.}-{3:3}, at: zil_lwb_flush_vdevs_done+0x204/0x1270 [zfs]
[   46.315210] 
               which lock already depends on the new lock.

[   46.315222] 
               the existing dependency chain (in reverse order) is:
[   46.315223] 
               -> #1 (&zilog->zl_lock){+.+.}-{3:3}:
[   46.315232]        __lock_acquire+0xaaf/0x1750
[   46.315241]        lock_acquire.part.0+0x105/0x350
[   46.315244]        lock_acquire+0xf1/0x2b0
[   46.315248]        __mutex_lock+0x151/0x13f0
[   46.346712]        mutex_lock_nested+0x1b/0x20
[   46.346729]        zil_commit_waiter_link_lwb+0x137/0x930 [zfs]
[   46.352404]        zil_lwb_commit+0x8a7/0xc30 [zfs]
[   46.352595]        zil_process_commit_list+0x3df/0x1700 [zfs]
[   46.352758]        zil_commit_impl+0x1755/0x22b0 [zfs]
[   46.352919]        zil_commit+0xbc/0x120 [zfs]
[   46.353081]        zfs_fsync+0x1db/0x3c0 [zfs]
[   46.353242]        zpl_fsync+0x17e/0x290 [zfs]
[   46.353411]        vfs_fsync_range+0xf6/0x220
[   46.353416]        __x64_sys_fsync+0x56/0x90
[   46.353419]        do_syscall_64+0x5c/0xc0
[   46.353424]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[   46.353429] 
               -> #0 (&zcw->zcw_lock){+.+.}-{3:3}:
[   46.353438]        check_prev_add+0x162/0x20b0
[   46.353442]        validate_chain+0xa9c/0xfc0
[   46.353446]        __lock_acquire+0xaaf/0x1750
[   46.353449]        lock_acquire.part.0+0x105/0x350
[   46.353452]        lock_acquire+0xf1/0x2b0
[   46.353455]        __mutex_lock+0x151/0x13f0
[   46.353458]        mutex_lock_nested+0x1b/0x20
[   46.353461]        zil_lwb_flush_vdevs_done+0x67e/0x1270 [zfs]
[   46.353623]        zio_done+0xcbb/0x5600 [zfs]
[   46.353785]        zio_execute+0x1c5/0x450 [zfs]
[   46.353947]        taskq_thread+0x9de/0x15d0 [spl]
[   46.353963]        kthread+0x35d/0x420
[   46.353966]        ret_from_fork+0x22/0x30
[   46.353971] 
               other info that might help us debug this:

[   46.353973]  Possible unsafe locking scenario:

[   46.353974]        CPU0                    CPU1
[   46.353975]        ----                    ----
[   46.353976]   lock(&zilog->zl_lock);
[   46.353980]                                lock(&zcw->zcw_lock);
[   46.353983]                                lock(&zilog->zl_lock);
[   46.353986]   lock(&zcw->zcw_lock);
[   46.353989] 
                *** DEADLOCK ***

[   46.353990] 1 lock held by z_ioctl_int/427:
[   46.353993]  #0: ffff88812d6f1068 (&zilog->zl_lock){+.+.}-{3:3}, at: zil_lwb_flush_vdevs_done+0x204/0x1270 [zfs]
[   46.354163] 
               stack backtrace:
[   46.354165] CPU: 1 PID: 427 Comm: z_ioctl_int Not tainted 5.16.0-generic #20211121
[   46.354169] Hardware name: Dell Inc. Precision M6600/04YY4M, BIOS A18 09/14/2018
[   46.354172] Call Trace:
[   46.354174]  <TASK>
[   46.354176]  show_stack+0x52/0x58
[   46.354182]  dump_stack_lvl+0x74/0xa6
[   46.354187]  dump_stack+0x10/0x12
[   46.354190]  print_circular_bug.cold+0x15d/0x16c
[   46.354197]  check_noncircular+0x268/0x310
[   46.354202]  ? print_circular_bug+0x1d0/0x1d0
[   46.354208]  ? ret_from_fork+0x22/0x30
[   46.354215]  check_prev_add+0x162/0x20b0
[   46.354219]  ? add_chain_cache+0x26a/0x860
[   46.354223]  ? call_rcu_zapped+0xb0/0xb0
[   46.354229]  validate_chain+0xa9c/0xfc0
[   46.354235]  ? check_prev_add+0x20b0/0x20b0
[   46.354242]  __lock_acquire+0xaaf/0x1750
[   46.354246]  ? zio_data_buf_free+0x52/0xa0 [zfs]
[   46.354410]  ? zil_lwb_flush_vdevs_done+0x59d/0x1270 [zfs]
[   46.354572]  ? zio_done+0xcbb/0x5600 [zfs]
[   46.354752]  lock_acquire.part.0+0x105/0x350
[   46.354757]  ? zil_lwb_flush_vdevs_done+0x67e/0x1270 [zfs]
[   46.354922]  ? rcu_read_unlock+0x50/0x50
[   46.354926]  ? _raw_spin_unlock_irqrestore+0xe/0x10
[   46.354932]  ? mark_held_locks+0xa4/0xf0
[   46.354936]  ? __kasan_slab_free+0xde/0x110
[   46.354942]  lock_acquire+0xf1/0x2b0
[   46.354946]  ? zil_lwb_flush_vdevs_done+0x67e/0x1270 [zfs]
[   46.355112]  __mutex_lock+0x151/0x13f0
[   46.355116]  ? zil_lwb_flush_vdevs_done+0x67e/0x1270 [zfs]
[   46.355279]  ? __kasan_slab_free+0xde/0x110
[   46.355284]  ? zil_lwb_flush_vdevs_done+0x67e/0x1270 [zfs]
[   46.355447]  ? spl_kmem_cache_free+0x3ad/0x6a0 [spl]
[   46.355463]  ? mutex_lock_io_nested+0x1240/0x1240
[   46.355466]  ? kmem_cache_free+0x1bd/0x3d0
[   46.355472]  ? lockdep_hardirqs_on_prepare.part.0+0x1be/0x370
[   46.355479]  ? spl_kmem_cache_free+0x3f0/0x6a0 [spl]
[   46.355495]  mutex_lock_nested+0x1b/0x20
[   46.355498]  ? mutex_lock_nested+0x1b/0x20
[   46.355502]  zil_lwb_flush_vdevs_done+0x67e/0x1270 [zfs]
[   46.511317]  ? zil_create+0x670/0x670 [zfs]
[   46.513140]  ? zio_wait_for_children+0x1cb/0x4d0 [zfs]
[   46.513348] systemd[1]: Started Journal Service.
[   46.514977]  zio_done+0xcbb/0x5600 [zfs]
[   46.518333]  zio_execute+0x1c5/0x450 [zfs]
[   46.520161]  taskq_thread+0x9de/0x15d0 [spl]
[   46.520182]  ? taskq_thread_spawn+0x170/0x170 [spl]
[   46.520197]  ? wake_up_q+0x100/0x100
[   46.520203]  ? lock_acquire.part.0+0x105/0x350
[   46.520208]  ? _raw_spin_lock_irqsave+0xe/0x10
[   46.520213]  ? find_held_lock+0x35/0x130
[   46.520225]  ? zio_gang_tree_free+0xf0/0xf0 [zfs]
[   46.532048]  ? __kasan_check_read+0x11/0x20
[   46.532055]  ? __kthread_parkme+0xcb/0x1e0
[   46.532067]  ? __kasan_check_read+0x11/0x20
[   46.532070]  ? taskq_thread_spawn+0x170/0x170 [spl]
[   46.532086]  kthread+0x35d/0x420
[   46.540627]  ? set_kthread_struct+0x110/0x110
[   46.540632]  ret_from_fork+0x22/0x30
[   46.540650]  </TASK>

snajpa commented 2 years ago

@AndyLavr I'm not sure the issues are actually related; it's been a while, and all I recall now is a jumbled salad of the info I had at the time :D

Are you experiencing this with any Linux revisions other than the pretty recent -next?

Btw, do you think you'd be able to distill a reproducer from the workload you have?

AndyLavr commented 2 years ago

@snajpa

> Are you experiencing this with any Linux revisions other than the pretty recent -next?

I was debugging another issue and came across this by accident. I only use -next.

> Btw, do you think you'd be able to distill a reproducer from the workload you have?

This is a single message that appears during boot. I will send you the dmesg by email.

snajpa commented 2 years ago

Ah, OK, I see, that's a lockdep warning about a potential deadlock; if you're not really hitting it, I guess you can ignore that one :)

I haven't seen the actual deadlock occur again, so it must have been resolved for quite some time now. :)

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.