jjakob opened this issue 3 years ago
This happened to me this morning on a recently built Proxmox VE 7 server running sanoid/syncoid every minute. It's a FreeNAS Mini server that uses an ASRock Rack C2750D4I board. I've had two of these servers running on FreeBSD 12.2 for years without issue and only recently converted them to Proxmox/ZFS on Linux.
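For context, a minimal sketch of the kind of schedule involved; the actual cron/timer configuration isn't in this report, and the paths and target names below are hypothetical:

# /etc/cron.d/sanoid (hypothetical), matching the every-minute schedule described
* * * * * root /usr/sbin/sanoid --cron
* * * * * root /usr/sbin/syncoid --recursive storage root@backuphost:backup/storage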
Almost all zfs/zpool commands froze completely (Ctrl-C did not work), though I was able to run "zpool status". I have three pools, and for some reason my main storage pool reported this:
  pool: storage
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 1.45T in 05:41:01 with 0 errors on Mon Sep 13 21:55:24 2021
config:

        NAME                            STATE     READ WRITE CKSUM
        storage                         ONLINE       0     0     0
          raidz2-0                      ONLINE       0     0     0
            ata-WDC_WD60EFZX-68B3FN0_1  ONLINE       0     0     0
            ata-WDC_WD60EFRX-68MYMN1_2  ONLINE       0     0     0
            ata-WDC_WD60EFRX-68L0BN1_3  ONLINE       0     0     0
            ata-WDC_WD60EFRX-68MYMN1_4  ONLINE       0     0     0
            ata-WDC_WD60EFRX-68L0BN1_5  ONLINE       0     0     0
            ata-WDC_WD60EFRX-68L0BN1_6  ONLINE       0     0     0
            ata-WDC_WD60EFRX-68L0BN1_7  ONLINE       0     0     0
            ata-WDC_WD60EFRX-68MYMN1_8  ONLINE       0     0     0

errors: 1 data errors, use '-v' for a list
This was a freshly built pool as of a few weeks ago. The only thing I've done to it since I built it was to swap out what I found out to be an SMR drive (all drives are now CMR). As you can see, the resilver completed successfully with no errors found.
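For reference, the drive swap described above would have been done with something like the following; the old SMR device name is a placeholder, and the assumption that the WD60EFZX is the replacement is only inferred from it being the one non-EFRX model in the pool:

zpool replace storage <old-SMR-disk> ata-WDC_WD60EFZX-68B3FN0_1
zpool status storage    # wait for "resilvered ... with 0 errors"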
The filesystems could still be accessed, though accessing a snapshot directory hung the ls command. When I ran "zpool status -v", it hung indefinitely and I had to reboot the box. When the box came back, "zpool status" reported no errors whatsoever. I'm running a scrub now.
I could find no other errors in any of the logs.
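For reference, the commands involved in the sequence above (pool name as in the status output; nothing here beyond what the report describes):

zpool status -v storage   # list the affected files; this is what hung before the reboot
zpool scrub storage       # full scrub started after the reboot
zpool status storage      # check scrub progress and any remaining errors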
# uname -a
Linux vmh01 5.11.22-3-pve #1 SMP PVE 5.11.22-7 (Wed, 18 Aug 2021 15:06:12 +0200) x86_64 GNU/Linux
# zfs version
zfs-2.0.5-pve1
zfs-kmod-2.0.5-pve1
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 2 PID: 2767864 Comm: zfs Tainted: P O 5.11.22-3-pve #1
Hardware name: iXsystems FREENAS-MINI-2.0/C2750D4I, BIOS P3.20 03/26/2018
RIP: 0010:zap_lockdir_impl+0x2a6/0x7b0 [zfs]
Code: 86 d8 00 00 00 00 00 00 00 e8 f6 7e f3 e2 4d 89 ae 98 00 00 00 e9 47 fe ff ff 48 8b 43 18 b9 a3 01 00 00 31 f6 bf 28 01 00 00 <48> 8b 10 48 8b 40 08 48 89 95 58 ff ff ff 48 c7 c2 18 65 8a c0 48
RSP: 0018:ffffa5b24dcfb830 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8db0fe204d80 RCX: 00000000000001a3
RDX: ffff8db056839700 RSI: 0000000000000000 RDI: 0000000000000128
RBP: ffffa5b24dcfb8f0 R08: 0000000000000001 R09: 0000000000000000
R10: 2000000000000000 R11: 0000008000000000 R12: 0000000000000002
R13: ffff8db1e3d64800 R14: 0000000000000000 R15: ffffa5b24dcfb968
FS: 00007fa1d9c747c0(0000) GS:ffff8db69fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000058b98e000 CR4: 00000000001026e0
Call Trace:
? dbuf_read+0x128/0x550 [zfs]
zap_lockdir+0x8c/0xb0 [zfs]
zap_lookup+0x50/0x100 [zfs]
zfs_get_zplprop+0xb7/0x1a0 [zfs]
dmu_send_impl+0xf01/0x1550 [zfs]
? dnode_rele+0x3d/0x50 [zfs]
? dbuf_rele_and_unlock+0x1fd/0x610 [zfs]
dmu_send_obj+0x24c/0x360 [zfs]
zfs_ioc_send+0x11d/0x2c0 [zfs]
? zfs_ioc_send+0x2c0/0x2c0 [zfs]
zfsdev_ioctl_common+0x72c/0x940 [zfs]
? __check_object_size+0x5d/0x150
zfsdev_ioctl+0x57/0xe0 [zfs]
__x64_sys_ioctl+0x91/0xc0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fa1da24ecc7
Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffdc3efca18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000056233d62caa0 RCX: 00007fa1da24ecc7
RDX: 00007ffdc3efca50 RSI: 0000000000005a1c RDI: 0000000000000003
RBP: 00007ffdc3f00440 R08: 000000000000000f R09: 0000000000000028
R10: 0000000000000080 R11: 0000000000000246 R12: 00007ffdc3efca50
R13: 00007ffdc3f05270 R14: 000056233d62cab0 R15: 0000000000000000
Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp ip6_udp_tunnel udp_tunnel iptable_filter bpfilter softdog nfnetlink_log nfnetlink ipmi_ssif intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper intel_cstate pcspkr at24 joydev input_leds ast drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core nfsd iscsi_tcp libiscsi_tcp libiscsi auth_rpcgss scsi_transport_iscsi nfs_acl lockd grace drm sunrpc nfs_ssc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbkbd usbmouse usbhid hid gpio_ich mpt3sas i2c_i801 ehci_pci igb raid_class
ahci i2c_algo_bit crc32_pclmul i2c_smbus lpc_ich ehci_hcd i2c_ismt dca libahci scsi_transport_sas
CR2: 0000000000000000
---[ end trace bfdfd0e448c4e64f ]---
RIP: 0010:zap_lockdir_impl+0x2a6/0x7b0 [zfs]
Code: 86 d8 00 00 00 00 00 00 00 e8 f6 7e f3 e2 4d 89 ae 98 00 00 00 e9 47 fe ff ff 48 8b 43 18 b9 a3 01 00 00 31 f6 bf 28 01 00 00 <48> 8b 10 48 8b 40 08 48 89 95 58 ff ff ff 48 c7 c2 18 65 8a c0 48
RSP: 0018:ffffa5b24dcfb830 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8db0fe204d80 RCX: 00000000000001a3
RDX: ffff8db056839700 RSI: 0000000000000000 RDI: 0000000000000128
RBP: ffffa5b24dcfb8f0 R08: 0000000000000001 R09: 0000000000000000
R10: 2000000000000000 R11: 0000008000000000 R12: 0000000000000002
R13: ffff8db1e3d64800 R14: 0000000000000000 R15: ffffa5b24dcfb968
FS: 00007fa1d9c747c0(0000) GS:ffff8db69fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000058b98e000 CR4: 00000000001026e0
Hi, I seem to have hit a similar issue while doing a send. The system is Ubuntu 22.04 with stock ZFS (2.1.4-0ubuntu0.1), and encryption is enabled.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 9 PID: 2985410 Comm: zfs Tainted: P O 5.15.0-48-generic #54-Ubuntu
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./E3C246D4U2-2T, BIOS L2.02P 10/19/2020
RIP: 0010:mzap_open+0x37/0x330 [zfs]
Code: e5 41 57 49 89 f7 31 f6 41 56 41 55 41 54 53 48 89 d3 48 83 ec 10 48 8b 42 18 48 89 7d d0 48 c7 c2 a8 d3 e6 c0 bf 28 01 00 00 <4c> 8b 30 48 8b 40 08 48 89 45 c8 e8 89 1b 6a ff 48 c7 c2 0c 8d f2
RSP: 0018:ffffa6dd8ed937c8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8dc0d6897000 RCX: 00000000000001a1
RDX: ffffffffc0e6d3a8 RSI: 0000000000000000 RDI: 0000000000000128
RBP: ffffa6dd8ed93800 R08: 0000000000000000 R09: 0020000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8dbfc4cef800
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
FS: 00007f60a5e8f7c0(0000) GS:ffff8dc67f640000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000b129fa003 CR4: 00000000003726e0
Call Trace:
<TASK>
zap_lockdir_impl+0x2cd/0x3a0 [zfs]
zap_lockdir+0x92/0xb0 [zfs]
zap_lookup_norm+0x5c/0xd0 [zfs]
? dnode_special_open+0x4d/0x90 [zfs]
zap_lookup+0x16/0x20 [zfs]
zfs_get_zplprop+0xb7/0x1b0 [zfs]
setup_featureflags+0x21b/0x260 [zfs]
dmu_send_impl+0xdd/0xbf0 [zfs]
? dnode_rele+0x39/0x50 [zfs]
? dmu_buf_rele+0xe/0x20 [zfs]
? zap_unlockdir+0x46/0x60 [zfs]
dmu_send_obj+0x265/0x340 [zfs]
zfs_ioc_send+0xe8/0x2c0 [zfs]
? dump_bytes_cb+0x30/0x30 [zfs]
zfsdev_ioctl_common+0x683/0x740 [zfs]
? __check_object_size.part.0+0x4a/0x150
? _copy_from_user+0x2e/0x70
zfsdev_ioctl+0x57/0xf0 [zfs]
__x64_sys_ioctl+0x92/0xd0
do_syscall_64+0x59/0xc0
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f60a670eaff
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
RSP: 002b:00007ffe33c53e70 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000055d027f23bc0 RCX: 00007f60a670eaff
RDX: 00007ffe33c54300 RSI: 0000000000005a1c RDI: 0000000000000003
RBP: 00007ffe33c578f0 R08: 000055d027f27ba0 R09: 0000000000000000
R10: 0000000000000009 R11: 0000000000000246 R12: 000055d027f22660
R13: 00007ffe33c54300 R14: 000055d027f23bd0 R15: 0000000000000000
</TASK>
Modules linked in: tls nvme_fabrics xt_nat vhost_net vhost vhost_iotlb tap ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntr>
i2c_algo_bit drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul crc32_pclmul ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops aesni_intel cec rc_core crypto_simd cryptd ixgbe nvme intel_lpss_pci drm xfrm_algo i2c_i801 a>
CR2: 0000000000000000
---[ end trace 63e074585d96fd8a ]---
RIP: 0010:mzap_open+0x37/0x330 [zfs]
Code: e5 41 57 49 89 f7 31 f6 41 56 41 55 41 54 53 48 89 d3 48 83 ec 10 48 8b 42 18 48 89 7d d0 48 c7 c2 a8 d3 e6 c0 bf 28 01 00 00 <4c> 8b 30 48 8b 40 08 48 89 45 c8 e8 89 1b 6a ff 48 c7 c2 0c 8d f2
RSP: 0018:ffffa6dd8ed937c8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8dc0d6897000 RCX: 00000000000001a1
RDX: ffffffffc0e6d3a8 RSI: 0000000000000000 RDI: 0000000000000128
RBP: ffffa6dd8ed93800 R08: 0000000000000000 R09: 0020000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8dbfc4cef800
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
FS: 00007f60a5e8f7c0(0000) GS:ffff8dc67f640000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000b129fa003 CR4: 00000000003726e0
Four days ago I upgraded from Ubuntu 20.04 with stock ZFS 0.8, which never had any issue in more than 2 years of heavy usage (snapshots of VM zvol devices sent to a backup server every 15 minutes).
After the upgrade, I ran "zpool upgrade -a". Everything seemed OK until this morning, when I found the machine stuck while doing a regular "send".
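For reference, a sketch of the upgrade step described above; note that "zpool upgrade -a" enables all supported feature flags on every imported pool and cannot be undone:

zpool upgrade -a                       # enable all supported feature flags on all pools
zpool get all rpool | grep feature@    # verify which features are now enabled/active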
Additional information: for two days prior to the crash, syncoid was failing to send snapshots of a particular path; more precisely, the receiver was closing the connection because of an "invalid backup stream":
sanoid[1559281]: Sending incremental rpool/ROOT/ubuntu_wrbx68/var/log@autosnap_2022-10-11_01:00:06_hourly ... syncoid_marseille1_XXXovh1_2022-10-11:01:02:04-GMT00:00 (~ 27.7 MB):
sanoid[1562717]: warning: cannot send 'rpool/ROOT/ubuntu_wrbx68/var/log@syncoid_marseille1_XXXovh1_2022-10-11:01:00:08-GMT00:00': Input/output error
zed[1562857]: eid=16005 class=data pool='rpool' priority=2 err=5 flags=0x180 bookmark=55451:0:0:869
zed[1562860]: eid=16006 class=data pool='rpool' priority=2 err=5 flags=0x180 bookmark=55451:0:0:12
zed[1562862]: eid=16007 class=authentication pool='rpool' priority=2 err=5 flags=0x80 bookmark=55451:0:0:12
zed[1562864]: eid=16008 class=data pool='rpool' priority=2 err=5 flags=0x80 bookmark=55451:0:0:12
sanoid[1562715]: cannot receive incremental stream: invalid backup stream
sanoid[1562719]: zstd: error 25 : Write error : Broken pipe (cannot write compressed block)
sanoid[1559281]: CRITICAL ERROR: zfs send -I 'rpool/ROOT/ubuntu_wrbx68/var/log'@'autosnap_2022-10-11_01:00:06_hourly' 'rpool/ROOT/ubuntu_wrbx68/var/log'@'syncoid_marseille1_XXXovh1_2022-10-11:01:02:04-GMT00:00' | pv -p -t -e -r -b -s 29023416 | zstd -3 | mbuffer -q -s 128k -m 16M 2>/dev/null | ssh -S /tmp/syncoid-ovh1backup@marseille1-1665450113 ovh1backup@marseille1 ' mbuffer -q -s 128k -m 16M 2>/dev/null | zstd -dc | sudo zfs receive -F '"'"'tank/CRYPT/OVH1/rpool/ROOT/ubuntu_wrbx68/var/log'"'"' 2>&1' failed: 256 at /usr/sbin/syncoid line 817.
Indeed, "zpool status" was reporting one snapshot of this path to be corrupted. I have since then started a scrub and while "zpool status" still reports a permanent error, it's now on an anonymous file and not anymore linked to the snapshot, furthermore syncoid does not fail anymore. Hopefully the interrupted "send" was the trigger of the kernel crash and I won't get it anymore anytime soon.
Encountered this today during a raw encrypted zfs send/recv using syncoid. File operations still work, but zpool/zfs commands are unresponsive.
Debian bullseye, using ZFS 2.1.9 and Linux kernel 6.0.12 from backports
zfs-2.1.9-1~bpo11+1
zfs-kmod-2.1.9-1~bpo11+1
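For context, a minimal sketch of the kind of raw encrypted replication involved; the dataset and host names are placeholders, not from this report:

# syncoid passing -w (raw) through to zfs send, so data stays encrypted on the wire
syncoid --sendoptions=w tank/encrypted root@backuphost:backup/encrypted
# roughly equivalent plain pipeline
zfs send -w tank/encrypted@snap | ssh root@backuphost zfs receive backup/encrypted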
[72358.039249] BUG: kernel NULL pointer dereference, address: 0000000000000000
[72358.039266] #PF: supervisor read access in kernel mode
[72358.039273] #PF: error_code(0x0000) - not-present page
[72358.039280] PGD 0 P4D 0
[72358.039286] Oops: 0000 [#1] PREEMPT SMP NOPTI
[72358.039293] CPU: 2 PID: 1039602 Comm: zfs Tainted: P OE 6.0.0-0.deb11.6-amd64 #1 Debian 6.0.12-1~bpo11+1
[72358.039305] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 6042 04/28/2022
[72358.039315] RIP: 0010:zap_lockdir_impl+0x29e/0x750 [zfs]
[72358.039383] Code: 86 d8 00 00 00 00 00 00 00 e8 ce a6 14 db 4d 89 a6 98 00 00 00 e9 48 fe ff ff 48 8b 43 18 b9 a1 01 00 00 31 f6 bf 28 01 00 00 <48> 8b 10 48 8b 40 08 48 89 54 24 20 48 c7 c2 a8 58 69 c1 48 89 44
[72358.039398] RSP: 0018:ffffa970d57e3850 EFLAGS: 00010246
[72358.039406] RAX: 0000000000000000 RBX: ffff96927bb6c780 RCX: 00000000000001a1
[72358.039414] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000128
[72358.039422] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
[72358.039430] R10: 2000000000000000 R11: 0000008000000000 R12: ffff9692a17cb800
[72358.039439] R13: 0000000000000001 R14: 0000000000000000 R15: ffffa970d57e3980
[72358.039448] FS: 00007f218d70a7c0(0000) GS:ffff9697ae480000(0000) knlGS:0000000000000000
[72358.039457] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[72358.039464] CR2: 0000000000000000 CR3: 000000084dd04000 CR4: 0000000000750ee0
[72358.039472] PKRU: 55555554
[72358.039477] Call Trace:
[72358.039483] <TASK>
[72358.039491] ? zio_wait+0x1fc/0x2b0 [zfs]
[72358.039544] zap_lockdir+0x90/0xc0 [zfs]
[72358.039584] ? _raw_spin_unlock+0x15/0x30
[72358.039592] zap_lookup+0x48/0x100 [zfs]
[72358.039629] zfs_get_zplprop+0xb3/0x1a0 [zfs]
[72358.039671] dmu_send_impl+0xe56/0x1450 [zfs]
[72358.039708] ? _raw_spin_unlock_irqrestore+0x23/0x40
[72358.039716] ? dsl_dataset_disown+0x90/0x90 [zfs]
[72358.039753] ? taskq_dispatch_ent+0x101/0x1b0 [spl]
[72358.039763] ? preempt_count_add+0x70/0xa0
[72358.039771] ? _raw_spin_lock+0x13/0x40
[72358.039777] ? _raw_spin_unlock+0x15/0x30
[72358.039783] ? dbuf_rele_and_unlock+0x20a/0x690 [zfs]
[72358.039818] dmu_send_obj+0x252/0x340 [zfs]
[72358.039852] ? ttwu_queue_wakelist+0xbc/0x100
[72358.039862] zfs_ioc_send+0xe8/0x2d0 [zfs]
[72358.039905] ? zfs_ioc_send+0x2d0/0x2d0 [zfs]
[72358.039941] zfsdev_ioctl_common+0x750/0x9b0 [zfs]
[72358.039977] ? kmalloc_large_node+0x70/0x80
[72358.039985] ? __kmalloc_node+0x2cc/0x390
[72358.039991] zfsdev_ioctl+0x53/0xe0 [zfs]
[72358.040032] __x64_sys_ioctl+0x8b/0xc0
[72358.040039] do_syscall_64+0x3b/0xc0
[72358.040050] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[72358.040054] RIP: 0033:0x7f218d9f85f7
[72358.040059] Code: 00 00 00 48 8b 05 99 c8 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 c8 0d 00 f7 d8 64 89 01 48
[72358.040074] RSP: 002b:00007ffd2e8f2ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[72358.040083] RAX: ffffffffffffffda RBX: 0000563e40b1bcf0 RCX: 00007f218d9f85f7
[72358.040091] RDX: 00007ffd2e8f2be0 RSI: 0000000000005a1c RDI: 0000000000000003
[72358.040099] RBP: 00007ffd2e8f65d0 R08: 000000000000000f R09: 0000000000000000
[72358.040407] R10: 0000000000000009 R11: 0000000000000246 R12: 0000563e40ba7910
[72358.040673] R13: 00007ffd2e8f2be0 R14: 0000563e40b1bd00 R15: 0000000000000000
[72358.040917] </TASK>
[72358.041144] Modules linked in: wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha sctp ip6_udp_tunnel udp_tunnel veth xt_nat xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype br_netfilter macvtap macvlan vhost_net vhost vhost_iotlb tap tun ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge lz4 lz4_compress zram stp llc overlay zsmalloc intel_rapl_msr intel_rapl_common amdgpu nls_ascii nls_cp437 vfat amd64_edac fat edac_mce_amd snd_usb_audio snd_hda_codec_hdmi gpu_sched drm_buddy snd_hda_intel zfs(POE) snd_usbmidi_lib drm_display_helper snd_intel_dspcfg snd_intel_sdw_acpi snd_rawmidi cec zunicode(POE) zzstd(OE) snd_hda_codec zlua(OE) kvm_amd snd_seq_device zavl(POE) rc_core mc snd_hda_core eeepc_wmi icp(POE) kvm snd_hwdep zcommon(POE) asus_wmi platform_profile battery ccp znvpair(POE) drm_ttm_helper snd_pcm
[72358.041177] sparse_keymap ledtrig_audio sp5100_tco ttm rapl rfkill spl(OE) joydev wmi_bmof pcspkr mxm_wmi efi_pstore k10temp watchdog snd_timer rng_core drm_kms_helper snd sg soundcore evdev button drm fuse configfs efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear vfio_pci vfio_pci_core irqbypass vfio_virqfd vfio_iommu_type1 vfio hid_generic raid1 usbhid md_mod hid crc32_pclmul crc32c_intel sd_mod ghash_clmulni_intel nvme nvme_core t10_pi xhci_pci ahci libahci xhci_hcd aesni_intel crc64_rocksoft_generic libata crypto_simd e1000e igb cryptd usbcore i2c_algo_bit scsi_mod dca crc64_rocksoft ptp crc_t10dif i2c_piix4 crct10dif_generic crct10dif_pclmul crc64 pps_core crct10dif_common usb_common scsi_common video gpio_amdpt wmi gpio_generic
[72358.043553] CR2: 0000000000000000
[72358.044164] ---[ end trace 0000000000000000 ]---
[72358.741371] RIP: 0010:zap_lockdir_impl+0x29e/0x750 [zfs]
[72358.741371] Code: 86 d8 00 00 00 00 00 00 00 e8 ce a6 14 db 4d 89 a6 98 00 00 00 e9 48 fe ff ff 48 8b 43 18 b9 a1 01 00 00 31 f6 bf 28 01 00 00 <48> 8b 10 48 8b 40 08 48 89 54 24 20 48 c7 c2 a8 58 69 c1 48 89 44
[72358.741371] RSP: 0018:ffffa970d57e3850 EFLAGS: 00010246
[72358.741371] RAX: 0000000000000000 RBX: ffff96927bb6c780 RCX: 00000000000001a1
[72358.741371] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000128
[72358.741371] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
[72358.741371] R10: 2000000000000000 R11: 0000008000000000 R12: ffff9692a17cb800
[72358.741371] R13: 0000000000000001 R14: 0000000000000000 R15: ffffa970d57e3980
[72358.741371] FS: 00007f218d70a7c0(0000) GS:ffff9697ae480000(0000) knlGS:0000000000000000
[72358.741371] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[72358.741371] CR2: 0000000000000000 CR3: 000000084dd04000 CR4: 0000000000750ee0
[72358.741371] PKRU: 55555554
FWIW, I haven't had any more panics since my last report. Currently running zfs v2.1.13-pve1, kernel 6.2.16-19-pve. Still running sanoid taking snapshots every 15 minutes and syncoid once a day.
System information
Describe the problem you're observing
The bug happened after a nightly syncoid run (which uses zfs send and zfs receive) and froze the filesystem until the system was rebooted.
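A sketch of how such a hang typically shows up, assuming the stuck zfs send is left in uninterruptible D state after the oops:

ps axo pid,stat,wchan:32,args | awk '$2 ~ /D/'   # hung tasks in uninterruptible sleep
dmesg | grep -A 5 'BUG: kernel NULL pointer'     # the oops itself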
Describe how to reproduce the problem
I could not reproduce the bug on a newer kernel/ZFS version (kernel 5.4.106-1-pve, zfs 2.0.4-pve1), but the bug only happens sometimes, so I'll see over time whether it still occurs.
Include any warning/errors/backtraces from the system logs