Open satabdidas opened 2 years ago
2.1.3 and 2.1.4 contain a couple of fixes for deadlocks, you might want to try 2.1.4 and see if it magically goes away.
We reproduced this issue with 2.1.3. Trying with 2.1.4 is a good idea. Will try. Thanks
We are seeing the issue with ZFS 2.1.4 as well.
We upgraded the Linux kernel to 5.15.44 and used ZFS 2.1.4. We see almost the same stack traces, but most importantly we see memory corruption -
Jun 14 18:59:17 ip-198-19-23-132 kernel: list_add corruption. prev->next should be next (ffff000592c924c8), but was 0000000000000000. (prev=ffff0005c22ce870).
Jun 14 18:59:17 ip-198-19-23-132 kernel: ------------[ cut here ]------------
Jun 14 18:59:17 ip-198-19-23-132 kernel: kernel BUG at lib/list_debug.c:26!
Jun 14 18:59:18 ip-198-19-23-132 kernel: Internal error: Oops - BUG: 0 [#1] SMP
Jun 14 18:59:18 ip-198-19-23-132 kernel: Modules linked in: rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl lockd grace raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq xt_LOG nf_log_syslog xt_limit iptable_filter iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw iptable_security bpfilter bcache crc64 sunrpc nls_ascii nls_cp437 vfat fat dm_mirror dm_region_hash dm_log dm_mod dax aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce gf128mul sha2_ce sha256_arm64 sha1_ce button zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE) icp(POE) spl(OE) nbd binfmt_misc ena efivarfs
Jun 14 18:59:18 ip-198-19-23-132 kernel: CPU: 13 PID: 9342 Comm: nfsd Tainted: P OE 5.15.44-21.123.amzn2.aarch64 #1
Jun 14 18:59:18 ip-198-19-23-132 kernel: Hardware name: Amazon EC2 c6gn.16xlarge/, BIOS 1.0 11/1/2018
Jun 14 18:59:18 ip-198-19-23-132 kernel: pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Jun 14 18:59:18 ip-198-19-23-132 kernel: pc : __list_add_valid+0x88/0x8c
Jun 14 18:59:18 ip-198-19-23-132 kernel: lr : __list_add_valid+0x88/0x8c
Jun 14 18:59:18 ip-198-19-23-132 kernel: sp : ffff80000ea6b600
Jun 14 18:59:18 ip-198-19-23-132 kernel: x29: ffff80000ea6b600 x28: 0000000000000003 x27: ffff800000f21290
Jun 14 18:59:18 ip-198-19-23-132 kernel: x26: 0000000000000000 x25: ffff0005c22ce870 x24: ffff000592c924c8
Jun 14 18:59:18 ip-198-19-23-132 kernel: x23: ffff000592c924c4 x22: ffff80000ea6b668 x21: 0000000000000002
Jun 14 18:59:18 ip-198-19-23-132 kernel: x20: ffff000592c924b0 x19: ffffffffffffff00 x18: 0000000000000030
Jun 14 18:59:18 ip-198-19-23-132 kernel: x17: 3863343239633239 x16: 3530303066666666 x15: 0720072007200720
Jun 14 18:59:18 ip-198-19-23-132 kernel: x14: 072e072907300737 x13: 0720072007200720 x12: 072e072907300737
Jun 14 18:59:18 ip-198-19-23-132 kernel: x11: ffff800009de8ed8 x10: 00000000ffff0000 x9 : ffff800008237338
Jun 14 18:59:18 ip-198-19-23-132 kernel: x8 : 0000000000000000 x7 : ffff800009868ed8 x6 : 000000000002fffd
Jun 14 18:59:18 ip-198-19-23-132 kernel: x5 : 0000000000000000 x4 : ffff00226fc091c8 x3 : 0000000000000000
Jun 14 18:59:18 ip-198-19-23-132 kernel: x2 : 0000000000000000 x1 : ffff000422f1c180 x0 : 0000000000000075
Jun 14 18:59:18 ip-198-19-23-132 kernel: Call trace:
Jun 14 18:59:18 ip-198-19-23-132 kernel: __list_add_valid+0x88/0x8c
Jun 14 18:59:18 ip-198-19-23-132 kernel: rwsem_down_read_slowpath+0xe4/0x4d0
Jun 14 18:59:18 ip-198-19-23-132 kernel: down_read+0x4c/0xd0
Jun 14 18:59:18 ip-198-19-23-132 kernel: zap_lockdir_impl+0x29c/0x340 [zfs]
Jun 14 18:59:18 ip-198-19-23-132 kernel: zap_lockdir+0xc0/0xe0 [zfs]
Jun 14 18:59:18 ip-198-19-23-132 kernel: zap_lookup+0x68/0x118 [zfs]
Jun 14 18:59:18 ip-198-19-23-132 kernel: zfs_id_overblockquota+0xcc/0x168 [zfs]
Jun 14 18:59:18 ip-198-19-23-132 kernel: zfs_write+0x238/0xabc [zfs]
Jun 14 18:59:18 ip-198-19-23-132 kernel: zpl_iter_write+0xf8/0x180 [zfs]
Jun 14 18:59:18 ip-198-19-23-132 kernel: do_iter_readv_writev+0x108/0x1ac
Jun 14 18:59:18 ip-198-19-23-132 kernel: do_iter_write+0x98/0x188
Jun 14 18:59:18 ip-198-19-23-132 kernel: vfs_iter_write+0x28/0x3c
Jun 14 18:59:18 ip-198-19-23-132 kernel: nfsd_vfs_write+0x2b0/0x518 [nfsd]
Jun 14 18:59:18 ip-198-19-23-132 kernel: nfsd4_write+0x154/0x1e4 [nfsd]
Jun 14 18:59:18 ip-198-19-23-132 kernel: nfsd4_proc_compound+0x47c/0x6b0 [nfsd]
Jun 14 18:59:18 ip-198-19-23-132 kernel: nfsd_dispatch+0x108/0x238 [nfsd]
Jun 14 18:59:18 ip-198-19-23-132 kernel: svc_process_common+0x4a4/0x6f0 [sunrpc]
Jun 14 18:59:18 ip-198-19-23-132 kernel: svc_process+0x88/0xe0 [sunrpc]
Jun 14 18:59:18 ip-198-19-23-132 kernel: nfsd+0x100/0x15c [nfsd]
Jun 14 18:59:18 ip-198-19-23-132 kernel: kthread+0x114/0x118
I looked at the call chain from rwsem_down_read_slowpath down to __list_add_valid:

rwsem_down_read_slowpath
  |_ rwsem_add_waiter
    |_ list_add_tail
      |_ __list_add
        |_ __list_add_valid

__list_add_valid throws the error because prev->next in the list does not point to the node it is being linked against. The call chain above is trying to add a waiter entry to krwlock_t->rw_rwlock->wait_list, and the kernel detects memory corruption in that wait_list.
This could be a kernel issue or a ZFS issue or something else.
I am looking for a way to debug ZFS and find out whether the memory corruption was caused by ZFS. Is there any way I can do that?
I don't recall any exciting fixes that would be relevant, but I'd suggest trying on 2.1.5 before you do much else.
You could try playing with kASAN, though I don't know if it'd catch this kind of problem, depending on precisely what it's doing. I believe that's supported on aarch64...
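For reference, a generic-KASAN kernel build typically needs options along these lines (a sketch; option names are from the upstream Kconfig, so verify them against your kernel tree and version):

```
# Hypothetical .config fragment for generic KASAN on arm64; exact option
# availability depends on the kernel version.
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_INLINE=y
```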
I tried KASAN with the 5.10 kernel but it didn't report anything related. I'll try 2.1.5.
Here's what I got from kasan -
[Thu Jun 2 14:16:25 2022] ==================================================================
[Thu Jun 2 14:16:25 2022] BUG: KASAN: slab-out-of-bounds in raid_status+0x64/0x1120 [dm_raid]
[Thu Jun 2 14:16:25 2022] Read of size 4 at addr ffff0004b29b8348 by task vgs/3983
[Thu Jun 2 14:16:25 2022] CPU: 20 PID: 3983 Comm: vgs Tainted: P OE 5.10.112-108.499.amzn2.aarch64 #1
[Thu Jun 2 14:16:25 2022] Hardware name: Amazon EC2 c6gn.16xlarge/, BIOS 1.0 11/1/2018
[Thu Jun 2 14:16:25 2022] Call trace:
[Thu Jun 2 14:16:25 2022] dump_backtrace+0x0/0x300
[Thu Jun 2 14:16:25 2022] show_stack+0x24/0x30
[Thu Jun 2 14:16:25 2022] dump_stack+0x100/0x170
[Thu Jun 2 14:16:25 2022] print_address_description.constprop.0+0x68/0x20c
[Thu Jun 2 14:16:25 2022] __kasan_report+0xe0/0x138
[Thu Jun 2 14:16:25 2022] kasan_report+0x44/0xd0
[Thu Jun 2 14:16:25 2022] __asan_load4+0x94/0xcc
[Thu Jun 2 14:16:25 2022] raid_status+0x64/0x1120 [dm_raid]
[Thu Jun 2 14:16:25 2022] retrieve_status+0xd4/0x2c8 [dm_mod]
[Thu Jun 2 14:16:25 2022] table_status+0x140/0x1e0 [dm_mod]
[Thu Jun 2 14:16:25 2022] ctl_ioctl+0x240/0x448 [dm_mod]
[Thu Jun 2 14:16:25 2022] dm_ctl_ioctl+0x1c/0x38 [dm_mod]
[Thu Jun 2 14:16:25 2022] __arm64_sys_ioctl+0xdc/0x120
[Thu Jun 2 14:16:25 2022] el0_svc_common.constprop.0+0xbc/0x308
[Thu Jun 2 14:16:25 2022] do_el0_svc+0x90/0xb8
[Thu Jun 2 14:16:25 2022] el0_svc+0x20/0x30
[Thu Jun 2 14:16:25 2022] el0_sync_handler+0x9c/0x120
[Thu Jun 2 14:16:25 2022] Allocated by task 3757:
[Thu Jun 2 14:16:25 2022] kasan_save_stack+0x28/0x60
[Thu Jun 2 14:16:25 2022] __kasan_kmalloc.constprop.0+0xc8/0xe8
[Thu Jun 2 14:16:25 2022] kasan_kmalloc+0x10/0x1c
[Thu Jun 2 14:16:25 2022] kmem_cache_alloc_trace+0x1bc/0x328
[Thu Jun 2 14:16:25 2022] create_strip_zones+0xb4/0xf40 [raid0]
[Thu Jun 2 14:16:25 2022] raid0_run+0x358/0x3f4 [raid0]
[Thu Jun 2 14:16:25 2022] md_run+0x588/0xecc
[Thu Jun 2 14:16:25 2022] raid_ctr+0x638/0x15b4 [dm_raid]
[Thu Jun 2 14:16:25 2022] dm_table_add_target+0x284/0x580 [dm_mod]
[Thu Jun 2 14:16:25 2022] table_load+0x21c/0x4a8 [dm_mod]
[Thu Jun 2 14:16:25 2022] ctl_ioctl+0x240/0x448 [dm_mod]
[Thu Jun 2 14:16:25 2022] dm_ctl_ioctl+0x1c/0x38 [dm_mod]
[Thu Jun 2 14:16:25 2022] __arm64_sys_ioctl+0xdc/0x120
[Thu Jun 2 14:16:25 2022] el0_svc_common.constprop.0+0xbc/0x308
[Thu Jun 2 14:16:25 2022] do_el0_svc+0x90/0xb8
[Thu Jun 2 14:16:25 2022] el0_svc+0x20/0x30
[Thu Jun 2 14:16:25 2022] el0_sync_handler+0x9c/0x120
[Thu Jun 2 14:16:25 2022] The buggy address belongs to the object at ffff0004b29b8300
which belongs to the cache kmalloc-128 of size 128
[Thu Jun 2 14:16:25 2022] The buggy address is located 72 bytes inside of
128-byte region [ffff0004b29b8300, ffff0004b29b8380)
[Thu Jun 2 14:16:25 2022] The buggy address belongs to the page:
[Thu Jun 2 14:16:25 2022] page:000000003e79ac1c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4f29b8
[Thu Jun 2 14:16:25 2022] head:000000003e79ac1c order:1 compound_mapcount:0
[Thu Jun 2 14:16:25 2022] flags: 0x2fffff0000010200(slab|head)
[Thu Jun 2 14:16:25 2022] raw: 2fffff0000010200 dead000000000100 dead000000000122 ffff0003c000fc00
[Thu Jun 2 14:16:25 2022] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[Thu Jun 2 14:16:25 2022] page dumped because: kasan: bad access detected
One more point to mention: when we turn off ZSTD, the frametest workload runs fine, but when we turn it on, we see the hang within a few seconds.
Tried running with zfs 2.1.5 and I could still recreate the deadlock with frametest.
It may change the behavior because of timing and extra memory use, but if you create a debug build (./configure --enable-debug) and, once the module is loaded but before running the test, run
echo "1" > /sys/module/zfs/parameters/reference_tracking_enable
then ZFS will record the name of the function that took the rwlock but never released it. Examine rc_list using crash.
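End to end, the suggested workflow might look like this (a sketch; the commands assume a standard OpenZFS source checkout and root privileges, and may differ on your system):

```shell
# Sketch of the debug-build workflow described above.
./configure --enable-debug
make -j"$(nproc)"
sudo make install

sudo modprobe zfs
# Enable reference tracking before starting the workload:
echo 1 | sudo tee /sys/module/zfs/parameters/reference_tracking_enable
# ...run the frametest workload, then inspect rc_list from the crash dump.
```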
Thank you for the pointer. Will do.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information

Describe the issue you are experiencing
Exposing ZFS on Linux over NFS on an AWS EC2 instance and doing rsync (or another sustained write-only load) appears to intermittently lead to a deadlock on zap->zap_rwlock. The server becomes unresponsive and we can't ssh to it; we have to reboot the host to bring it back online.

Describe how to reproduce the issue
Run cp commands in parallel from the client. It takes some time, but the host surely becomes unresponsive.

Including any warning/errors/backtraces from the system logs
We see the following stack trace when the issue occurs -
By commenting out the call to zfs_id_overblockquota from zfs_write, the deadlock does not reproduce (with the likely side effect that quota handling will be broken).

From the crash dump we suspect that another process is the owner of the lock zap_rwlock and never released it. Here is how we arrived at that suspicion. We inspected the zap_rwlock structure, which is of type krwlock_t; here is how we reached it.

Here's a snippet of bt -FFx output -

We then inspected ffff0017ab8ced00 as a dmu_buf_impl_t.

Then we inspected db_user, since it gets assigned to zap here - zap_t *zap = dmu_buf_get_user(db);

Then we checked the owner field, which is -281453659007423 (0xFFFF0004F6A25641), and we found a task for 0xFFFF0004F6A25640 -

And here's the current stack trace of the above task -

Since the above stack trace is not within any ZFS functions, we suspect this thread took the lock and exited without releasing it.
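The inspection steps above could, for example, be retraced in a crash(8) session along these lines (a hypothetical sketch; the addresses are the ones quoted above, and the struct names assume ZFS module debuginfo has been loaded):

```
crash> mod -S                                  # load module debuginfo (zfs)
crash> bt -FFx                                 # stacks with raw frame contents
crash> struct dmu_buf_impl ffff0017ab8ced00    # the dbuf; note its db_user field
crash> struct zap <db_user address>            # zap_t, which embeds zap_rwlock
```

As a side note, recent kernels use the low bits of the rw_semaphore owner field as flag bits (bit 0 marks reader-owned), which would explain why the owner value ends in 0x41 while the matching task_struct was found at the address ending in 0x40.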