openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Linux 'kernel NULL pointer dereference' on zfs send of a specific encrypted snapshot #12275

Closed t-m-w closed 2 years ago

t-m-w commented 3 years ago

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  21.04
Linux Kernel          5.11.0-22-generic
Architecture          amd64
ZFS Version           2.0.2-1ubuntu5
SPL Version           2.0.2-1ubuntu5

UPDATE: The ZFS version listed above is the official Ubuntu package. I originally posted this while using a PPA; I have since returned to the official Ubuntu ZFS packages and experience the same problem. The logs and info in this post were replaced after changing versions, and I have also provided a download of an affected zpool.

Describe the problem you're observing

Every time I run a zfs send that involves a particular encrypted snapshot, the kernel hits a NULL pointer dereference. If the send is piped to zfs receive, every zfs command executed afterwards stops responding and the computer cannot reboot cleanly. In some cases the send process is reported as 'Killed'. The pool remains busy and cannot be exported, and further instability or hangs may follow.

Describe how to reproduce the problem

Download the compressed sparse image of my affected zpool here:
https://drive.google.com/file/d/1SFL-OMob8DRTJJIsq2O77SkLy3knptaJ/view?usp=sharing
sha256sum: afbc980346d75808daf8cc7a3d85a27290c09c35010b39670103d91c43afcfc3
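Before extracting, the download can be checked against the sha256sum given above. The following is a minimal sketch of that check; the helper names `sha256_of` and `matches_expected` are made up for illustration, and the file name in the usage comment assumes the archive is saved as rpool_borked.tar.xz (the name used in the extraction step).

```python
import hashlib

# Expected digest, copied from the report above.
EXPECTED_SHA256 = "afbc980346d75808daf8cc7a3d85a27290c09c35010b39670103d91c43afcfc3"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file at `path` hashes to the expected digest."""
    return sha256_of(path) == expected.lower()


# Usage (assumed file name; adjust to wherever the download was saved):
#   matches_expected("rpool_borked.tar.xz")
```

Reading in chunks keeps memory use flat regardless of the archive size.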

  1. Ensure you are prepared for a possible crash/hang/downtime. Use on a test system if possible.
  2. Extract the image to a filesystem that supports sparse files and to a location only writable by root, e.g.: sudo tar -C /root -xvf rpool_borked.tar.xz rpool_borked.img
  3. Import the pool: sudo zpool import -N rpool_borked -o readonly=on -d /root/rpool_borked.img
  4. (recommended) Monitor dmesg in another terminal: sudo dmesg --follow
  5. Attempt to use zfs send on an affected snapshot: sudo zfs send -v -w rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348 > send.bin
  6. You may see repeated progress messages showing "16.0E" as having been sent (obviously not true), and you should see the BUG in dmesg.
  7. Cancel the send if it continues running, and try to export the pool: zpool export rpool_borked
  8. The pool is busy (stuck).
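On the "16.0E" in step 6: 2^64 bytes is exactly 16 EiB, so a progress counter that shows 16.0E is at or near the top of the unsigned 64-bit range. One plausible reading, which is an assumption and not something confirmed in this report, is that a small negative or error value is being displayed as an unsigned 64-bit byte count. The sketch below only illustrates the arithmetic; `humanize` is a hypothetical helper, not the formatter that zfs itself uses.

```python
UNITS = ["B", "K", "M", "G", "T", "P", "E"]


def humanize(nbytes):
    """Format a byte count with 1024-based suffixes (illustrative only)."""
    value = float(nbytes)
    idx = 0
    while value >= 1024 and idx < len(UNITS) - 1:
        value /= 1024
        idx += 1
    return f"{value:.1f}{UNITS[idx]}"


def as_u64(n):
    """Reinterpret a Python int as an unsigned 64-bit value."""
    return n & 0xFFFFFFFFFFFFFFFF


# -1 viewed as an unsigned 64-bit counter is 2**64 - 1 bytes.
print(humanize(as_u64(-1)))  # prints 16.0E
```

For comparison, the estimated size of the full send in this report is 71.9M, so the displayed figure cannot be a real byte count.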

Further notes

Include any warning/errors/backtraces from the system logs

Kernel output, original server, from a zfs send performed from within dropbear-initramfs (minimal environment):

BUG: kernel NULL pointer dereference, address: 0000000000000030
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] SMP NOPTI
CPU: 10 PID: 1768 Comm: zfs Tainted: P           OE     5.11.0-22-generic #23-Ubuntu
Hardware name: System manufacturer System Product Name/TUF B450M-PLUS GAMING, BIOS 3002 03/11/2021
RIP: 0010:dmu_dump_write+0x24a/0x320 [zfs]
Code: 8b 45 78 48 89 43 50 e9 9f fe ff ff 45 85 c0 75 19 41 8b 45 34 45 89 ce 83 e0 7f 88 43 32 49 63 c1 48 89 43 60 e9 6d fe ff ff <49> 83 7d 30 00 78 04 80 4b 31 02 48 8d 53 70 48 8d 73 68 4c 89 ef
RSP: 0018:ffff9c82ce06b7b0 EFLAGS: 00010206
RAX: 7171c62bac0f4992 RBX: ffff8d215a07da00 RCX: 0000000000000000
RDX: 000000000000001d RSI: 0000000000000013 RDI: ffff8d215a07db38
RBP: ffff9c82ce06b7f8 R08: 0000000001000000 R09: 0000000000020000
R10: 000000000000001d R11: 0000000000020000 R12: ffff9c82ce06b930
R13: 0000000000000000 R14: 0000000000020000 R15: 0000000000000000
FS:  00007f43b58da7c0(0000) GS:ffff8d284ea80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 000000012c8f6000 CR4: 00000000003506e0
Call Trace:
 ? kfree+0x3bc/0x3e0
 ? spl_kmem_free_impl+0x25/0x30 [spl]
 do_dump+0x372/0x510 [zfs]
 ? test_ti_thread_flag+0x12/0x20 [zfs]
 ? test_tsk_thread_flag+0x19/0x20 [zfs]
 dmu_send_impl+0x6e9/0xbf0 [zfs]
 ? dbuf_rele+0x39/0x50 [zfs]
 dmu_send+0x4f7/0x830 [zfs]
 ? _cond_resched+0x1a/0x50
 ? _cond_resched+0x1a/0x50
 ? slab_pre_alloc_hook.constprop.0+0x96/0xe0
 ? __kmalloc_node+0x144/0x2b0
 ? kvmalloc_node+0x79/0x80
 ? i_get_value_size+0x1d/0x1c0 [znvpair]
 ? nvpair_value_common+0x9a/0x160 [znvpair]
 zfs_ioc_send_new+0x170/0x1b0 [zfs]
 ? dump_bytes_cb+0x30/0x30 [zfs]
 zfsdev_ioctl_common+0x25f/0x710 [zfs]
 zfsdev_ioctl+0x57/0xe0 [zfs]
 __x64_sys_ioctl+0x91/0xc0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f43b60a7ecb
Code: ff ff ff 85 c0 79 8b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d 1f 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc4570b4f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43b60a7ecb
RDX: 00007ffc4570b510 RSI: 0000000000005a40 RDI: 0000000000000005
RBP: 0000000000005a40 R08: 0000000000000001 R09: 000055aff5607090
R10: 000055aff5616f10 R11: 0000000000000246 R12: 000055aff5603100
R13: 00007ffc4570b510 R14: 0000000000005a40 R15: 000055aff5607090
Modules linked in: dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) r8169 realtek dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c raid1 uas hid_generic usb_storage usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper r8168(OE) nvme ahci xhci_pci gpio_amdpt i2c_piix4 nvme_core libahci xhci_pci_renesas wmi gpio_generic
CR2: 0000000000000030
---[ end trace 888052c159ddad61 ]---
RIP: 0010:dmu_dump_write+0x24a/0x320 [zfs]
Code: 8b 45 78 48 89 43 50 e9 9f fe ff ff 45 85 c0 75 19 41 8b 45 34 45 89 ce 83 e0 7f 88 43 32 49 63 c1 48 89 43 60 e9 6d fe ff ff <49> 83 7d 30 00 78 04 80 4b 31 02 48 8d 53 70 48 8d 73 68 4c 89 ef
RSP: 0018:ffff9c82ce06b7b0 EFLAGS: 00010206
RAX: 7171c62bac0f4992 RBX: ffff8d215a07da00 RCX: 0000000000000000
RDX: 000000000000001d RSI: 0000000000000013 RDI: ffff8d215a07db38
RBP: ffff9c82ce06b7f8 R08: 0000000001000000 R09: 0000000000020000
R10: 000000000000001d R11: 0000000000020000 R12: ffff9c82ce06b930
R13: 0000000000000000 R14: 0000000000020000 R15: 0000000000000000
FS:  00007f43b58da7c0(0000) GS:ffff8d284ea80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 000000012c8f6000 CR4: 00000000003506e0
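The oops can be cross-checked by hand. In the Code: line, the byte in angle brackets marks the instruction at the faulting RIP, here the bytes 49 83 7d 30 00. The sketch below decodes only that one encoding (REX.W, opcode 0x83, a [base + disp8] operand, imm8); it is not a general x86 decoder, and the function name is made up for illustration. It shows that the instruction reads memory at R13 + 0x30, which with R13 = 0 gives the address reported in CR2. Which structure R13 was supposed to point to inside dmu_dump_write is not determined here.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def decode_group1_imm8(code):
    """Decode `REX.W 83 /r disp8 imm8` only; raise on anything else."""
    rex, opcode, modrm, disp8, imm8 = code[:5]
    if (rex & 0xF0) != 0x40 or not (rex & 0x08):
        raise ValueError("expected a REX prefix with W set")
    if opcode != 0x83:
        raise ValueError("expected opcode 0x83")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [base + disp8] without SIB is handled")
    base = REG64[rm | ((rex & 0x01) << 3)]  # REX.B extends the base register
    disp = disp8 - 256 if disp8 >= 128 else disp8
    imm = imm8 - 256 if imm8 >= 128 else imm8
    return GROUP1[reg], base, disp, imm


# Values copied from the oops above.
FAULT_BYTES = bytes.fromhex("49837d3000")
REGS = {"r13": 0x0}
CR2 = 0x30

op, base, disp, imm = decode_group1_imm8(FAULT_BYTES)
addr = REGS[base] + disp
print(f"{op}q ${imm:#x}, {disp:#x}(%{base}) -> address {addr:#x}")
# prints: cmpq $0x0, 0x30(%r13) -> address 0x30
```

So the faulting access is a read of a field at offset 0x30 through a pointer that is NULL, which matches the "supervisor read access" and "not-present page" lines.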

On a different, fully-booted system, I also experienced this problem using the provided image as described in the reproduction steps, but piped to zstreamdump:

zfs send -v -w rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348 | zstreamdump -vvvvv

Output of that command is here:

full send of rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348 estimated size is 71.9M
total estimated size is 71.9M
BEGIN record
    hdrtype = 1
    features = 1420004
    magic = 2f5bacbac
    creation_time = 6097e858
    type = 2
    flags = 0xc
    toguid = 7171c62bac0f4992
    fromguid = 0
    toname = rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348
    payloadlen = 1028

nvlist version: 0
    crypt_keydata = (embedded nvlist)
    nvlist version: 0
        DSL_CRYPTO_SUITE = 0x8
        DSL_CRYPTO_GUID = 0xb80816514de8587d
        DSL_CRYPTO_VERSION = 0x1
        DSL_CRYPTO_MASTER_KEY_1 = 0xe3 0x62 0x66 0x92 0x94 0xfc 0x1c 0xb7 0xdf 0xc1 0xa2 0x84 0xd6 0x35 0x33 0xa8 0xcd 0xc8 0x0 0x27 0x61 0x38 0xe6 0xca 0xcb 0x9c 0xe1 0x5b 0x40 0xcf 0xff 0x8a
        DSL_CRYPTO_HMAC_KEY_1 = 0x57 0x82 0xa9 0x6 0x20 0xe5 0xd3 0x66 0xd5 0xd 0xd5 0xa0 0x16 0x60 0x35 0xaf 0xb 0x76 0x88 0x93 0x44 0x79 0x4c 0x7d 0xfa 0xc6 0x39 0xb2 0x61 0x6b 0xc2 0xf 0xdf 0x76 0xcf 0x27 0x68 0xfe 0xcb 0x46 0x77 0x50 0x5a 0xec 0xac 0x2a 0xe3 0xa3 0xfd 0x9b 0xe 0x29 0x31 0x10 0x79 0x9 0xff 0x65 0x35 0x93 0x5e 0x1e 0x5f 0xb1
        DSL_CRYPTO_IV = 0xe8 0x82 0x47 0x5 0x8a 0x31 0x6f 0x3a 0xa 0x7 0x2a 0xed
        DSL_CRYPTO_MAC = 0xc8 0x58 0x7d 0x21 0x3a 0xea 0x6b 0xe9 0xcf 0x12 0x8e 0x37 0xd9 0x84 0xda 0x51
        portable_mac = 0xea 0x53 0x9f 0x72 0x86 0xea 0xef 0x42 0x3f 0x5d 0x36 0x9e 0x1c 0x13 0x62 0x7c 0xfb 0x2f 0x7f 0xbd 0xbd 0x5b 0x35 0x7e 0xdb 0x49 0x80 0x78 0x71 0x81 0x34 0x42
        keyformat = 0x3
        pbkdf2iters = 0x14dc9380
        pbkdf2salt = 0x24d5add546f84cac
        mdn_checksum = 0x0
        mdn_compress = 0x0
        mdn_nlevels = 0x6
        mdn_blksz = 0x4000
        mdn_indblkshift = 0x11
        mdn_nblkptr = 0x3
        mdn_maxblkid = 0x1c
        to_ivset_guid = 0x49bc398dde1c52
        from_ivset_guid = 0x0
    (end crypt_keydata)

OBJECT_RANGE firstobj = 0 numslots = 32 flags = 0 salt = 46607763babffc46 iv = 34ad819587cbef7f7c1b4a7c mac = 8487d105b30bbea8c108e3764608f027
    checksum = 384d689c38/3139fc7db3d6/19b9da0b79baf7/9feae235d15572c
OBJECT object = 1 type = 21 bonustype = 0 blksz = 512 bonuslen = 0 dn_slots = 1 raw_bonuslen = 0 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 3
    checksum = 3b277baabc/4326e3804e9d/2b7a855c3a1107/146ef0b54ca78d8a
FREE object = 1 offset = 512 length = -1
    checksum = 400d750301/56761feb6918/42e47f5f717217/2524a89810d0e178
WRITE object = 1 type = 21 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 0000000000000000 iv = 000000000000000000000000 mac = 547abf58d088ea6bfd3f39a74e2d5f7d
    checksum = 472867d026/6b69b6254dcf/6066e6d6401bd2/3dea28c449cf1fe0
FREE object = 1 offset = 512 length = 1024
    checksum = 51d28ab827/aafc4693a693/d00accf192fccb/b52d56b9ab6ec1fb
OBJECT object = 2 type = 20 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 565c2bf77a/c52a86f4759a/10829885607258b/fd056031ce4eb2a1
FREE object = 2 offset = 512 length = -1
    checksum = 84aedf362d/10ec0434e6014/19726d05389cd44/ca145cb48ec14bcc
WRITE object = 2 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = ff7d41efd84ba802 iv = 70742bd9538349f5807cbbf9 mac = bbc1a475739c8c5022dec82ab978a933
    checksum = 9022e51b79/139907acede6a/1f026f511c5d37f/5388961b5e191f2c
OBJECT object = 3 type = 20 bonustype = 44 blksz = 512 bonuslen = 168 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = d66159a732/1d2c0aab4edb4/3260a7561492fcc/57ac1d8bd1a402e3
FREE object = 3 offset = 512 length = -1
    checksum = 106b2641bd0/26c0fde23eedd/47414a1ac580c63/abf80dce12dbc989
WRITE object = 3 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = ff7d41efd84ba802 iv = ee618a7e611085d9bfa715dd mac = e82f58c6f179231793b67c0e1d86e770
    checksum = 1111b6516da/2be504f530a62/53d9c96fa73c466/25cbacf81f4d6855
OBJECT object = 4 type = 20 bonustype = 44 blksz = 512 bonuslen = 168 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 1560c39977f/3be3629120111/7d5ebcf98203c02/5c067ded70354dbf
FREE object = 4 offset = 512 length = -1
    checksum = 17ea153e4d4/4a28ba048ea00/a6ac7fb7a8e6d69/f7fdfa00dbe9f10d
WRITE object = 4 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = ff7d41efd84ba802 iv = dd01361ce31dcc921195f099 mac = 0996a1294aa4cbffab735aeadf293972
    checksum = 18bb45189f1/519e3ca927504/be69c895ea9bd7d/5e3581bf88066798
OBJECT object = 5 type = 19 bonustype = 44 blksz = 512 bonuslen = 168 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 1d086366ade/67d1d85c53d2d/108cf96c2f5827ac/bfc20d2551a95096
FREE object = 5 offset = 512 length = -1
    checksum = 1fc732b0204/7aece8c58113b/14eb9ede6770bd04/4868c14c69d15514
WRITE object = 5 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = ff7d41efd84ba802 iv = 424ac76c0a9e29af5fe13d0f mac = fae90754bdfcaad02bb05511deb2ce2d
    checksum = 205d3f23f03/84bb5fe5790a0/175b031146ad7d77/72c223f24623af0
OBJECT object = 6 type = 19 bonustype = 44 blksz = 512 bonuslen = 168 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 24a33e62f5d/a10891a531ac8/1ebad1a74fb43ea7/b517f3b56e483acf
FREE object = 6 offset = 512 length = -1
    checksum = 276a36d21d2/b8d5fd694eef0/2565bd191ebf6821/b256da994a4314a6
WRITE object = 6 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = ff7d41efd84ba802 iv = 7da2467122025725273408d9 mac = 70122a51ca23ada5101794465f08eaa4
    checksum = 2817c1a693d/c4fcb22321ade/2908b84512ce7fa7/a57af1dfb1d7b5d4
OBJECT object = 7 type = 20 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 2be82987611/e736ba1f5863c/33ca82cd3793211c/e63449772a2d5855
FREE object = 7 offset = 512 length = -1
    checksum = 2e7d4e77712/10369e6810977e/3d40675078fb6186/beefeb0f82b70e67
WRITE object = 7 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = ff7d41efd84ba802 iv = ad6d42a62ebba2c77b96e4c1 mac = abe76e31e3e6c613f0ae34483f1996f4
    checksum = 2f379fd8e4b/111bb805d43b5d/42544c34a6ea353a/2f5b3c9545801327
OBJECT object = 8 type = 20 bonustype = 44 blksz = 512 bonuslen = 168 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 32fca7f6943/139bccfe9a64a9/5117bbe67eca4dfd/648f15a56364a6b3
FREE object = 8 offset = 512 length = -1
    checksum = 35ead11d9a6/15a79f15da4b6b/5dd2f7b97297b516/54ad5d10d43ec202
WRITE object = 8 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = ff7d41efd84ba802 iv = 286692f4f6821816b6e7b229 mac = 5fc1c8bbde9e33d9c04de97f7e3d0c57
    checksum = 36ab57a87ab/16b0f85d62ca6b/6494e08c58049faf/f3d278ebbfa0459b
OBJECT object = 9 type = 19 bonustype = 44 blksz = 512 bonuslen = 168 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 3ac5d946f5c/1994154e68b704/77fe298b9b672432/94b0a352238258af
FREE object = 9 offset = 512 length = -1
    checksum = 3ddac37df79/1bed5983725fb3/888089799af90c3a/b0e8ce8c463ce73a
WRITE object = 9 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = ff7d41efd84ba802 iv = 7b5dfdfa541eb355a5775492 mac = b0229b881a091fd4f5d74aa9014ef7db
    checksum = 3eb49b5547e/1d1dafebf54b92/9131b1a647b668cf/9d0d65ae073b8211
OBJECT object = 10 type = 19 bonustype = 44 blksz = 512 bonuslen = 168 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 42df8be29c6/20679e4f1e2689/a9efccf4e32ef83b/465e06daa0e83c6e
FREE object = 10 offset = 512 length = -1
    checksum = 45afe2d05a1/230fc59c1e3a28/bec17945ad1208b9/89d8986d5a405e0a
WRITE object = 10 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = ff7d41efd84ba802 iv = 0f8f07ff3d9b298e52ddac42 mac = e0b1ed3151271d11f90cf9e83cf32e39
    checksum = 46739097b85/24662e7875ae9a/c9a5043315fe67c9/544119c057b5274a
OBJECT object = 11 type = 20 bonustype = 44 blksz = 16384 bonuslen = 184 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 1 indblkshift = 17 nlevels = 2 nblkptr = 1
    checksum = 4abda2cb41d/28154c47dffabb/e86850f7ba8afb22/d3fef9900a801f2f
FREE object = 11 offset = 32768 length = -1
    checksum = 4d845eb7d22/2b0a9294373318/20f1e7ddc858b02/239909b29c14f9a3
WRITE object = 11 type = 20 checksum type = 7 compression type = 15 flags = 0 offset = 0 logical_size = 16384 compressed_size = 4096 payload_size = 4096 props = 8f0007001f salt = 46607763babffc46 iv = 636e721c51ad7b18ab260b92 mac = 269445ad389b71675ba164405a2867db
    checksum = 4e34009a14a/2c86c404c3d59f/f66e87718a74e6c/d04ed0bf88561b58
WRITE object = 11 type = 20 checksum type = 7 compression type = 15 flags = 0 offset = 16384 logical_size = 16384 compressed_size = 4096 payload_size = 4096 props = 8f0007001f salt = 46607763babffc46 iv = 11e72402d96a4e45b3122667 mac = bac5146efa3e6e14de55d765c2b22520
    checksum = 6ef45aa67f9/4623b90c0b565b/2f94ac2f6670076/4d1cb6c578c0b96f
FREE object = 11 offset = 32768 length = 16744448
    checksum = 8f7cb512f37/688aa2e8a2498a/77ad116477ecfcec/e5fc473a36d3ed0a
OBJECT object = 12 type = 20 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 8fca99a9330/6b477c75a0e4be/97f36e47de5a3989/5226f85d754cbd77
FREE object = 12 offset = 512 length = -1
    checksum = 9288b71fab2/70e6444b622caf/dbe6d8f453c8081f/6c61906907956b8
WRITE object = 12 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 7f31ff01ee1d2f3a iv = 331c17e7acecb221fb8c38c1 mac = 1cd7b417392083d217b633609894f966
    checksum = 932bdd1c537/73b2c5f1d575c5/febb79cb7981b662/615fef47ca97cc35
OBJECT object = 13 type = 19 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 9753f957976/7b3d8ca1d3d7a8/5ede052cb7d5b4bf/d1e70329cc37f996
FREE object = 13 offset = 512 length = -1
    checksum = 9a724a30628/81296d5a5cd578/acc3048f13e54da9/5f5facd23d424d87
WRITE object = 13 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 7f31ff01ee1d2f3a iv = 4ce2020751e3d73085a0c9b0 mac = 440345775e1e1eca502470822f72cf0c
    checksum = 9af9346a8e8/841c1b009887c5/d49202bc5e700829/2164a5cb54b174e5
OBJECT object = 14 type = 19 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 9effb16c33b/8c0994cc333cd8/42112de3d1e140b2/62cd926b0d3a559
FREE object = 14 offset = 512 length = -1
    checksum = a21bd20357f/92417bb5751a71/9a6bc6ac134e0f30/a54472115716f50
WRITE object = 14 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 7f31ff01ee1d2f3a iv = 60d812e280465eb09d4769ac mac = cfeaa090a1e10206efd0b7d5f0e3c08c
    checksum = a2bfcab2a8e/9559dc25b0c042/c775e6643324c6a7/4196e1dbda4df9b
OBJECT object = 15 type = 20 bonustype = 44 blksz = 16384 bonuslen = 184 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 1 indblkshift = 17 nlevels = 2 nblkptr = 1
    checksum = a6e569d9cec/9dac7cd989f78d/42fd88e975471ae9/3396357e98a0800
FREE object = 15 offset = 32768 length = -1
    checksum = a9cc56912b1/a430800e3bccce/a652630a52c77ee5/ff8f9b9db2be31b5
WRITE object = 15 type = 20 checksum type = 7 compression type = 15 flags = 0 offset = 0 logical_size = 16384 compressed_size = 4096 payload_size = 4096 props = 8f0007001f salt = 7f31ff01ee1d2f3a iv = dde9907eec9090193839ee8b mac = c3cf1a60e05226de14721033b91b237e
    checksum = aa72eff68e3/a76e8958e06a98/d8d92775275ffcb6/720930e4cfdcc1b2
WRITE object = 15 type = 20 checksum type = 7 compression type = 15 flags = 0 offset = 16384 logical_size = 16384 compressed_size = 4096 payload_size = 4096 props = 8f0007001f salt = 7f31ff01ee1d2f3a iv = 268a9ba3f2a32dd61bb77e49 mac = 23cf45ddcc0f4023763801f92c51d8d7
    checksum = ca9668691c4/d9c54585dd81e4/12c6d47f5e0bd3fd/755d20cefbeb4913
FREE object = 15 offset = 32768 length = 16744448
    checksum = ea6d085a6e9/114cdd1c8ad5347/382a971160081b3b/ac008b27805c1b0d
OBJECT object = 16 type = 20 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = eab4cdfeb1f/11945e2cc82d145/8d31cf8a65caef9d/debe398762b15d59
FREE object = 16 offset = 512 length = -1
    checksum = ed74b8ff3d2/12265e61c812551/3d9ec82ccb287ade/8322778a5c77c4ec
WRITE object = 16 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 7f31ff01ee1d2f3a iv = 8e44890d1e320193743fb2b4 mac = 37ee80a4986689434aede5ba3f516797
    checksum = ee1707dac4a/126ed9591b6beea/96ccad02cbe423cc/31fa0c0d5e5f0aa
OBJECT object = 17 type = 19 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = f2064fc8417/133086ed3ee8046/8900f9a5d29ea4b0/a0a9a0ade08d656a
FREE object = 17 offset = 512 length = -1
    checksum = f5483b74e10/13c74c74aa21f89/496b85d95be19cea/a3ba7d750289c1f3
WRITE object = 17 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 7f31ff01ee1d2f3a iv = c1b656f68c55453c13030722 mac = 0b2718d236174684f62f4fd803d5520c
    checksum = f5e508deee9/141228c0a1206d0/aa8fd449930efc23/f653561cf7dc789a
OBJECT object = 18 type = 19 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = fa0ee57192c/14da401539d147c/b203e74892089b4e/c9f83ae84d5893a
FREE object = 18 offset = 512 length = -1
    checksum = fd1ae2453a4/1575e3672f9dbf7/82f2bffcc88d9bbd/7e1c7b93afc0c022
WRITE object = 18 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 7f31ff01ee1d2f3a iv = 32d1f45b030d15d9a55c9b70 mac = e1f09825da2f07759429bbe853c68293
    checksum = fdc998b08ff/15c327266fbabb6/ec501da2fe7857c9/9c6ab900c44a65ee
OBJECT object = 19 type = 19 bonustype = 44 blksz = 131072 bonuslen = 184 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 10 indblkshift = 17 nlevels = 2 nblkptr = 1
    checksum = 102153d6e781/1691b3ef33999eb/9b4cba12b12350b/74a5a1f6a04f13f7
FREE object = 19 offset = 1441792 length = -1
    checksum = 104e05e6c85a/17323560634f158/ebaf99bee02cf334/4e0d8b41f4fc7154
WRITE object = 19 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 131072 compressed_size = 131072 payload_size = 131072 props = 8200ff00ff salt = 7f31ff01ee1d2f3a iv = 1d4df7f86b07f2f6d4cbb3ca mac = 4ae2f8e065f7cf3bb2812f52f1f2ac3f
    checksum = 105b78823dcd/1781dee60b18bed/5d88ea47555ce721/a4b9f9d2002f8bc6
WRITE object = 19 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 131072 logical_size = 131072 compressed_size = 131072 payload_size = 131072 props = 8200ff00ff salt = 7f31ff01ee1d2f3a iv = bf77190558870cef16705417 mac = c9661cbc35cb809541868931d052e204
    checksum = 50846e604423/19cdadfb5f2ccabf/7005886b138c84cb/b71c48da704d678e
WRITE object = 19 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 262144 logical_size = 131072 compressed_size = 131072 payload_size = 131072 props = 8200ff00ff salt = 7f31ff01ee1d2f3a iv = 627bd3cb3f6705e4f15967e7 mac = 7bbc66f698b870404db1df7bfe1d43e5
    checksum = 906df3638e84/523ddeb07fea2bea/87c43493a5386b52/d64c05742cd08508
WRITE object = 19 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 393216 logical_size = 131072 compressed_size = 131072 payload_size = 131072 props = 8200ff00ff salt = 7f31ff01ee1d2f3a iv = 2017de0cd2de95041edc60d0 mac = 9f61535f8ba75086e39658cd98f33014
    checksum = d050e97d75d4/aaad193005106bed/848a4726dad639c7/d6dcfa1ecc19e226
WRITE object = 19 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 524288 logical_size = 131072 compressed_size = 131072 payload_size = 131072 props = 8200ff00ff salt = 7f31ff01ee1d2f3a iv = 1bc9c18d4026cba2aa970c0a mac = 2c461e7309bec4d0106031186bfb1e4a
    checksum = 110487b61f50f/231b90a6ec77991e/6a2aaf277902ad69/7723415e11024a0c
WRITE object = 19 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 655360 logical_size = 131072 compressed_size = 131072 payload_size = 131072 props = 8200ff00ff salt = 7f31ff01ee1d2f3a iv = a91ae31f67c34e3cc4d4d739 mac = 8cef0ecbf1f98a37d27a0021c79a1faf
    checksum = 1505e40384f72/bba4c2a10f1be0b0/1d77f7a13d01664b/44afad62e8ae35b2
WRITE object = 19 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 786432 logical_size = 131072 compressed_size = 131072 payload_size = 131072 props = 8200ff00ff salt = 7f31ff01ee1d2f3a iv = 94f3ba2367877675cade1ec2 mac = 97f1cadb9be94a42cbcec6cdf57d3aac
    checksum = 190519187e99d/744a1b576d938427/4f7890adc1a180e3/5f20685517321cd9
WRITE object = 19 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 917504 logical_size = 131072 compressed_size = 131072 payload_size = 131072 props = 8200ff00ff salt = 7f31ff01ee1d2f3a iv = 2f1ee6417f8e037ce80739fa mac = c41f8762cc215d764c45758473f84764
    checksum = 1d0d3d63e0658/4d260ed53a9dda65/fda7ce667ffabb04/dbb0b99cf969b1c8
WRITE object = 19 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 1048576 logical_size = 131072 compressed_size = 131072 payload_size = 131072 props = 8200ff00ff salt = 7f31ff01ee1d2f3a iv = 8e0e31eb5bb273c98faeee16 mac = 50dca7f5b6278816c69d4c891bff8930
    checksum = 210f27ae9c6f6/463650576bfc7982/12fcd5a7788f01ab/3cf7fc83db4dec06
WRITE object = 19 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 1179648 logical_size = 131072 compressed_size = 131072 payload_size = 131072 props = 8200ff00ff salt = 7f31ff01ee1d2f3a iv = 00f0e1a820f1494cfb2994c4 mac = 03af5218dc4b5379128883c720bfb3e4
    checksum = 250ec8337f91d/5f7567e7addda505/d54b6ec9eb5ebea6/91254b9e65b87101
WRITE object = 19 type = 19 checksum type = 7 compression type = 15 flags = 0 offset = 1310720 logical_size = 131072 compressed_size = 24576 payload_size = 24576 props = 8f002f00ff salt = 7f31ff01ee1d2f3a iv = 6b9f1c75b12aab9345650b31 mac = 57672839dc0ce8e6317d95ad1319ae53
    checksum = 290dc2418e652/98af5726cd0f0992/77b16773089ab56d/94ea2415ea652f1b
OBJECT object = 20 type = 20 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 29ce2bffec91d/d7a0a0e03218ca21/1fa8ca7dbe83de9d/285d42a39ce6fa4d
FREE object = 20 offset = 512 length = -1
    checksum = 29d12b53612ed/d93d89a820a65075/b504bbf4d3223bb8/79405fb1e2ee0387
WRITE object = 20 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 7f31ff01ee1d2f3a iv = cfd4b6b1de787956815896fa mac = 8da443a6f13abb7c0f6a81e7f2ce4613
    checksum = 29d1f84f58ad2/da09682be74ae873/5396954d39431aa/6d21575894db4ed5
OBJECT object = 21 type = 19 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 29d6003350377/dc23f9fee741e601/52855a8968ad2ce2/d1e8f4b7883a5f97
FREE object = 21 offset = 512 length = -1
    checksum = 29d8adc60395f/ddc12df91c740df2/b109db8b1143bd82/fca4211e99e91398
WRITE object = 21 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 7f31ff01ee1d2f3a iv = 5efc00d499398198e34f7274 mac = 5ea9b91a35823166c520496b5ab2ef15
    checksum = 29d94c3398130/de8d308a18240cc9/616030ddd3617f89/efed28fb38c240c
OBJECT object = 22 type = 19 bonustype = 44 blksz = 1024 bonuslen = 184 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 29dd9f4a4b324/e0a824350184c757/50debbabc19e19e1/a9213185c4409588
FREE object = 22 offset = 1024 length = -1
    checksum = 29e05eed613a7/e245a3949913b548/790ca0f4bd630f86/6cbcdec064a5e84f
WRITE object = 22 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 1024 compressed_size = 1024 payload_size = 1024 props = 8200010001 salt = 7f31ff01ee1d2f3a iv = 1bc2f17960d96cf95d82ad79 mac = 9fa3e9bf60f18c0d28607e2a5b8e9286
    checksum = 29e11b4332e38/e311cc08d07501bf/89c4a02d79578c5a/ceb65ca2fd507a68
OBJECT object = 23 type = 20 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 29e98d9984682/e67c716e5e948b28/72e88365846b33c/8a2791f763861f96
FREE object = 23 offset = 512 length = -1
    checksum = 29ec809840570/e81a680468127e2b/c888fc44bd35e5be/e08fb0c837baa29a
WRITE object = 23 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 7f31ff01ee1d2f3a iv = ad3c2fda150263a090e71c78 mac = 4fb06a82aa9f112230a9a93545a33ead
    checksum = 29ed41814293a/e8e6cb9d449d0b8a/a01df35d3f67a7f4/a91014362f046f8e
OBJECT object = 24 type = 19 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 29f18dd2b83a9/eb02bf5f77c17ada/e41eba382dcf3b3b/dcbcb4a299d1b6ad
FREE object = 24 offset = 512 length = -1
    checksum = 29f4a98a6e030/eca10652d020cfc9/707630d98d745717/96684146e29f2541
WRITE object = 24 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = 7f31ff01ee1d2f3a iv = bec8d2947e731826848e310d mac = 98ce82bf10e93ba5f62518e0c8dac329
    checksum = 29f57979dbd11/ed6d91f8af8cd020/a915917bcfc64f8c/1faa294c5aa70af
OBJECT object = 25 type = 20 bonustype = 44 blksz = 512 bonuslen = 184 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 29f9c524373b0/ef89f04b33dd8ea2/91b540358729fb4d/75686787cf94aed5
FREE object = 25 offset = 512 length = -1
    checksum = 29fc9fe51c04c/f12886c941d127c7/e995c2eb5d8dcd87/8483f8f7825f7ed2
WRITE object = 25 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = a5f9e07196575f1a iv = ed035dfe8eee61731414deb6 mac = f0f6f75b3a64f48320f904fa8f2bc975
    checksum = 29fd819e82acb/f1f53961fa7bd7ab/838446287620e063/5b1d93ec4365300a
OBJECT object = 26 type = 20 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 2a01f0891bb3a/f411ff68db954f57/11764593cfb1d278/e4ce6b82dca29724
FREE object = 26 offset = 512 length = -1
    checksum = 2a04b238a6933/f5b0e58642c886d8/3568de9018e65dec/2a2400fec4114cd7
WRITE object = 26 type = 20 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = a5f9e07196575f1a iv = 8ad9118810b70b17f6bac47d mac = eeb2228edce00d5687e4b431fae82a31
    checksum = 2a053cf72e90b/f67dbe4d36b472d1/30ea2e472eee4669/a8eb1870f6a5b2d9
OBJECT object = 27 type = 19 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 2a097ae100bc6/f89ae5c74a448b97/64dea7ba775d5f38/535d8c50476d5558
FREE object = 27 offset = 512 length = -1
    checksum = 2a0ceb51b9a4f/fa3a1b155b0ba2e7/55679e28a27c2b42/994a231ea2f261c7
WRITE object = 27 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = a5f9e07196575f1a iv = b5f9b8af221422d56e58527e mac = 5c32440334f37365ed89b254df6b3d28
    checksum = 2a0d8de15ad73/fb071c5ff2dd8ac2/b2bd7f433a480ff8/6fabce71776dcd0b
OBJECT object = 28 type = 19 bonustype = 44 blksz = 512 bonuslen = 176 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 2a11ba90fbb3f/fd24ae9096024bde/8d66ef3342080786/72921cb8a83b7a62
FREE object = 28 offset = 512 length = -1
    checksum = 2a14bc6a4726d/fec43294458b759e/4b128cdb54ac05ba/6e603ad16791c7ff
WRITE object = 28 type = 19 checksum type = 7 compression type = 2 flags = 0 offset = 0 logical_size = 512 compressed_size = 512 payload_size = 512 props = 8200000000 salt = a5f9e07196575f1a iv = 963ba1b58556de7c91586f6f mac = b5fd1a5437fbc6940fd4f50a5aee3da9
    checksum = 2a156f20f47ff/ff915a1a339de41c/a8178b2b497000e/c1680a1935c6b1a0
OBJECT object = 29 type = 19 bonustype = 44 blksz = 131072 bonuslen = 184 dn_slots = 1 raw_bonuslen = 320 flags = 0 maxblkid = 0 indblkshift = 17 nlevels = 1 nblkptr = 1
    checksum = 2a195886545a9/1af4fb655f88bf3/8c9107777c0da2f5/8c6d618a9c57fbcc
FREE object = 29 offset = 193536 length = -1
    checksum = 2a1c57d270ee4/34f1e7d97d7d62c/17e34402e803ac3d/df15603f45d9b553
TIME        SENT   SNAPSHOT rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348
09:53:16   16.0E   rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348
09:53:17   16.0E   rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348
09:53:18   16.0E   rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348
09:53:19   16.0E   rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348
09:53:20   16.0E   rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348
^C

Its dmesg output is below:

[  330.763624] BUG: kernel NULL pointer dereference, address: 0000000000000030
[  330.763634] #PF: supervisor read access in kernel mode
[  330.763637] #PF: error_code(0x0000) - not-present page
[  330.763639] PGD 0 P4D 0 
[  330.763645] Oops: 0000 [#1] SMP PTI
[  330.763649] CPU: 0 PID: 8803 Comm: zfs Tainted: P           O      5.11.0-22-generic #23-Ubuntu
[  330.763653] Hardware name: LENOVO 2306CTO/2306CTO, BIOS CBET4000 4.12-3071-g8053595370-dirty 09/30/2020
[  330.763656] RIP: 0010:dmu_dump_write+0x24a/0x320 [zfs]
[  330.763818] Code: 8b 45 78 48 89 43 50 e9 9f fe ff ff 45 85 c0 75 19 41 8b 45 34 45 89 ce 83 e0 7f 88 43 32 49 63 c1 48 89 43 60 e9 6d fe ff ff <49> 83 7d 30 00 78 04 80 4b 31 02 48 8d 53 70 48 8d 73 68 4c 89 ef
[  330.763821] RSP: 0018:ffffc04b64e277b0 EFLAGS: 00010206
[  330.763823] RAX: 7171c62bac0f4992 RBX: ffff9c3e993c0000 RCX: 0000000000000000
[  330.763825] RDX: 000000000000001d RSI: 0000000000000013 RDI: ffff9c3e993c0138
[  330.763827] RBP: ffffc04b64e277f8 R08: 0000000001000000 R09: 0000000000020000
[  330.763828] R10: 000000000000001d R11: 0000000000020000 R12: ffffc04b64e27930
[  330.763830] R13: 0000000000000000 R14: 0000000000020000 R15: 0000000000000000
[  330.763832] FS:  00007f22a09df7c0(0000) GS:ffff9c3f75200000(0000) knlGS:0000000000000000
[  330.763834] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  330.763835] CR2: 0000000000000030 CR3: 000000019c1d8004 CR4: 00000000001706f0
[  330.763837] Call Trace:
[  330.763839]  ? wait_woken+0x80/0x80
[  330.763846]  do_dump+0x372/0x510 [zfs]
[  330.763949]  ? test_ti_thread_flag+0x12/0x20 [zfs]
[  330.764048]  ? test_tsk_thread_flag+0x19/0x20 [zfs]
[  330.764142]  dmu_send_impl+0x6e9/0xbf0 [zfs]
[  330.764235]  ? dbuf_rele+0x39/0x50 [zfs]
[  330.764317]  dmu_send+0x4f7/0x830 [zfs]
[  330.764406]  ? _cond_resched+0x1a/0x50
[  330.764409]  ? _cond_resched+0x1a/0x50
[  330.764411]  ? slab_pre_alloc_hook.constprop.0+0x96/0xe0
[  330.764415]  ? __kmalloc_node+0x144/0x2b0
[  330.764418]  ? kvmalloc_node+0x79/0x80
[  330.764422]  ? i_get_value_size+0x1d/0x1c0 [znvpair]
[  330.764432]  ? nvpair_value_common+0x9a/0x160 [znvpair]
[  330.764439]  zfs_ioc_send_new+0x170/0x1b0 [zfs]
[  330.764548]  ? dump_bytes_cb+0x30/0x30 [zfs]
[  330.764654]  zfsdev_ioctl_common+0x25f/0x710 [zfs]
[  330.764760]  zfsdev_ioctl+0x57/0xe0 [zfs]
[  330.764863]  __x64_sys_ioctl+0x91/0xc0
[  330.764867]  do_syscall_64+0x38/0x90
[  330.764870]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  330.764872] RIP: 0033:0x7f22a11acecb
[  330.764875] Code: ff ff ff 85 c0 79 8b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d 1f 0d 00 f7 d8 64 89 01 48
[  330.764877] RSP: 002b:00007fffc7fc5628 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  330.764880] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f22a11acecb
[  330.764881] RDX: 00007fffc7fc5640 RSI: 0000000000005a40 RDI: 0000000000000005
[  330.764883] RBP: 0000000000005a40 R08: 0000000000000001 R09: 000055d4296fee10
[  330.764884] R10: 0000000000000006 R11: 0000000000000246 R12: 000055d4296fef00
[  330.764886] R13: 00007fffc7fc5640 R14: 0000000000005a40 R15: 000055d4296fee10
[  330.764888] Modules linked in: ccm rfcomm xt_CHECKSUM xt_MASQUERADE nft_chain_nat nf_nat bridge stp llc cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg intel_rapl_msr soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core snd_hwdep soundwire_bus intel_rapl_common snd_soc_core x86_pkg_temp_thermal intel_powerclamp coretemp snd_compress ac97_bus kvm_intel snd_pcm_dmaengine snd_pcm kvm snd_seq_midi snd_seq_midi_event snd_rawmidi btusb rapl btrtl intel_cstate snd_seq btbcm btintel iwlmvm bluetooth mac80211 ecdh_generic ecc libarc4 snd_seq_device snd_timer serio_raw joydev input_leds iwlwifi cfg80211 thinkpad_acpi at24 nvram ledtrig_audio snd mei_me mei soundcore mac_hid nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 xt_hl ip6_tables ip6t_rt nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack
[  330.764946]  nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter sch_fq_codel msr nf_tables libcrc32c parport_pc nfnetlink ppdev lp parport ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) dm_crypt r8152 usbnet mii uas usb_storage hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd i915 glue_helper psmouse i2c_i801 i2c_smbus i2c_algo_bit drm_kms_helper ahci libahci lpc_ich syscopyarea sysfillrect sysimgblt fb_sys_fops sdhci_pci cec cqhci rc_core xhci_pci sdhci e1000e drm xhci_pci_renesas video
[  330.764992] CR2: 0000000000000030
[  330.765006] ---[ end trace 63f11c04ef693c3f ]---
[  330.765008] RIP: 0010:dmu_dump_write+0x24a/0x320 [zfs]
[  330.765096] Code: 8b 45 78 48 89 43 50 e9 9f fe ff ff 45 85 c0 75 19 41 8b 45 34 45 89 ce 83 e0 7f 88 43 32 49 63 c1 48 89 43 60 e9 6d fe ff ff <49> 83 7d 30 00 78 04 80 4b 31 02 48 8d 53 70 48 8d 73 68 4c 89 ef
[  330.765099] RSP: 0018:ffffc04b64e277b0 EFLAGS: 00010206
[  330.765102] RAX: 7171c62bac0f4992 RBX: ffff9c3e993c0000 RCX: 0000000000000000
[  330.765105] RDX: 000000000000001d RSI: 0000000000000013 RDI: ffff9c3e993c0138
[  330.765107] RBP: ffffc04b64e277f8 R08: 0000000001000000 R09: 0000000000020000
[  330.765110] R10: 000000000000001d R11: 0000000000020000 R12: ffffc04b64e27930
[  330.765112] R13: 0000000000000000 R14: 0000000000020000 R15: 0000000000000000
[  330.765114] FS:  00007f22a09df7c0(0000) GS:ffff9c3f75200000(0000) knlGS:0000000000000000
[  330.765117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  330.765120] CR2: 0000000000000030 CR3: 000000019c1d8004 CR4: 00000000001706f0
[  336.754689] BUG: unable to handle page fault for address: ffffc04b64e27de0
[  336.754703] #PF: supervisor write access in kernel mode
[  336.754708] #PF: error_code(0x0002) - not-present page
[  336.754712] PGD 100000067 P4D 100000067 PUD 1001de067 PMD 15ece8067 PTE 0
[  336.754726] Oops: 0002 [#2] SMP PTI
[  336.754735] CPU: 0 PID: 8807 Comm: zfs Tainted: P      D    O      5.11.0-22-generic #23-Ubuntu
[  336.754742] Hardware name: LENOVO 2306CTO/2306CTO, BIOS CBET4000 4.12-3071-g8053595370-dirty 09/30/2020
[  336.754747] RIP: 0010:arch_atomic64_cmpxchg.constprop.0+0x2/0x10 [zfs]
[  336.755108] Code: c6 40 18 a7 c0 41 bc 03 00 00 00 e8 78 85 f3 ff e9 98 fe ff ff e8 2e 7a e8 d3 e9 df d9 0a 00 e9 da d9 0a 00 0f 1f 40 00 31 c0 <f0> 48 0f b1 07 c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 c7
[  336.755115] RSP: 0018:ffffc04b400d7e10 EFLAGS: 00010246
[  336.755121] RAX: 0000000000000000 RBX: ffff9c3f0d810000 RCX: 0000000000000000
[  336.755126] RDX: ffff9c3edcf89840 RSI: ffffffffc0a689e0 RDI: ffffc04b64e27de0
[  336.755130] RBP: ffffc04b400d7e58 R08: 0000000000000007 R09: ffff9c3f753ac3f0
[  336.755133] R10: 0000000000000027 R11: 00000000000001a3 R12: ffff9c3ed924d4c0
[  336.755137] R13: 0000000000000000 R14: ffff9c3f0da5f538 R15: 0000000000000001
[  336.755141] FS:  00007f22a046b640(0000) GS:ffff9c3f75200000(0000) knlGS:0000000000000000
[  336.755147] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  336.755151] CR2: ffffc04b64e27de0 CR3: 000000019c1d8006 CR4: 00000000001706f0
[  336.755156] Call Trace:
[  336.755162]  ? zfs_ioc_send_progress+0x10f/0x1c0 [zfs]
[  336.755498]  zfsdev_ioctl_common+0x65a/0x710 [zfs]
[  336.755816]  zfsdev_ioctl+0x57/0xe0 [zfs]
[  336.756130]  __x64_sys_ioctl+0x91/0xc0
[  336.756145]  do_syscall_64+0x38/0x90
[  336.756152]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  336.756160] RIP: 0033:0x7f22a11acecb
[  336.756168] Code: ff ff ff 85 c0 79 8b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6d 1f 0d 00 f7 d8 64 89 01 48
[  336.756174] RSP: 002b:00007f22a0465748 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  336.756181] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f22a11acecb
[  336.756185] RDX: 00007f22a0465750 RSI: 0000000000005a3e RDI: 0000000000000003
[  336.756189] RBP: 000055d4296fa8f0 R08: 000000000000ffff R09: 0000000000000000
[  336.756192] R10: 00007f22a0468d00 R11: 0000000000000246 R12: 00007f22a0468d70
[  336.756195] R13: 00007f22a0468d78 R14: 00007f22a0465750 R15: 00007f22a0468d70
[  336.756202] Modules linked in: ccm rfcomm xt_CHECKSUM xt_MASQUERADE nft_chain_nat nf_nat bridge stp llc cmac algif_hash algif_skcipher af_alg bnep binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg intel_rapl_msr soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core snd_hwdep soundwire_bus intel_rapl_common snd_soc_core x86_pkg_temp_thermal intel_powerclamp coretemp snd_compress ac97_bus kvm_intel snd_pcm_dmaengine snd_pcm kvm snd_seq_midi snd_seq_midi_event snd_rawmidi btusb rapl btrtl intel_cstate snd_seq btbcm btintel iwlmvm bluetooth mac80211 ecdh_generic ecc libarc4 snd_seq_device snd_timer serio_raw joydev input_leds iwlwifi cfg80211 thinkpad_acpi at24 nvram ledtrig_audio snd mei_me mei soundcore mac_hid nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 xt_hl ip6_tables ip6t_rt nf_log_ipv4 nf_log_common ipt_REJECT nf_reject_ipv4 xt_LOG nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack
[  336.756334]  nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nft_counter sch_fq_codel msr nf_tables libcrc32c parport_pc nfnetlink ppdev lp parport ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) dm_crypt r8152 usbnet mii uas usb_storage hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd i915 glue_helper psmouse i2c_i801 i2c_smbus i2c_algo_bit drm_kms_helper ahci libahci lpc_ich syscopyarea sysfillrect sysimgblt fb_sys_fops sdhci_pci cec cqhci rc_core xhci_pci sdhci e1000e drm xhci_pci_renesas video
[  336.756436] CR2: ffffc04b64e27de0
[  336.756442] ---[ end trace 63f11c04ef693c40 ]---
[  336.756446] RIP: 0010:dmu_dump_write+0x24a/0x320 [zfs]
[  336.756716] Code: 8b 45 78 48 89 43 50 e9 9f fe ff ff 45 85 c0 75 19 41 8b 45 34 45 89 ce 83 e0 7f 88 43 32 49 63 c1 48 89 43 60 e9 6d fe ff ff <49> 83 7d 30 00 78 04 80 4b 31 02 48 8d 53 70 48 8d 73 68 4c 89 ef
[  336.756723] RSP: 0018:ffffc04b64e277b0 EFLAGS: 00010206
[  336.756729] RAX: 7171c62bac0f4992 RBX: ffff9c3e993c0000 RCX: 0000000000000000
[  336.756732] RDX: 000000000000001d RSI: 0000000000000013 RDI: ffff9c3e993c0138
[  336.756736] RBP: ffffc04b64e277f8 R08: 0000000001000000 R09: 0000000000020000
[  336.756740] R10: 000000000000001d R11: 0000000000020000 R12: ffffc04b64e27930
[  336.756743] R13: 0000000000000000 R14: 0000000000020000 R15: 0000000000000000
[  336.756747] FS:  00007f22a046b640(0000) GS:ffff9c3f75200000(0000) knlGS:0000000000000000
[  336.756752] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  336.756756] CR2: ffffc04b64e27de0 CR3: 000000019c1d8006 CR4: 00000000001706f0

zpool properties for rpool_borked:

size    2.72T   -
capacity    0%  -
altroot -   default
health  ONLINE  -
guid    1124856219309159331 -
version -   default
bootfs  -   default
delegation  on  default
autoreplace off default
cachefile   -   default
failmode    wait    default
listsnapshots   off default
autoexpand  off default
dedupratio  1.00x   -
free    2.72T   -
allocated   113M    -
readonly    on  -
ashift  12  local
comment -   default
expandsize  -   -
freeing 0   -
fragmentation   0%  -
leaked  0   -
multihost   off default
checkpoint  -   -
load_guid   9338383059261438715 -
autotrim    on  local
feature@async_destroy   enabled local
feature@empty_bpobj active  local
feature@lz4_compress    active  local
feature@multi_vdev_crash_dump   enabled local
feature@spacemap_histogram  active  local
feature@enabled_txg active  local
feature@hole_birth  active  local
feature@extensible_dataset  active  local
feature@embedded_data   active  local
feature@bookmarks   enabled local
feature@filesystem_limits   enabled local
feature@large_blocks    enabled local
feature@large_dnode enabled local
feature@sha512  enabled local
feature@skein   enabled local
feature@edonr   enabled local
feature@userobj_accounting  active  local
feature@encryption  active  local
feature@project_quota   active  local
feature@device_removal  enabled local
feature@obsolete_counts enabled local
feature@zpool_checkpoint    enabled local
feature@spacemap_v2 active  local
feature@allocation_classes  enabled local
feature@resilver_defer  enabled local
feature@bookmark_v2 enabled local
feature@redaction_bookmarks enabled local
feature@redacted_datasets   enabled local
feature@bookmark_written    enabled local
feature@log_spacemap    active  local
feature@livelist    enabled local
feature@device_rebuild  enabled local
feature@zstd_compress   enabled local

dataset properties for rpool_borked/CORRUPTED_media:

type    filesystem  -
creation    Sat Jun 19 22:23 2021   -
used    80.5M   -
available   2.63T   -
referenced  79.9M   -
compressratio   1.06x   -
mounted no  -
quota   none    default
reservation none    default
recordsize  1M  received
mountpoint  /srv/media_old  local
sharenfs    off default
checksum    on  default
compression lz4 inherited from rpool_borked
atime   off inherited from rpool_borked
devices on  default
exec    on  default
setuid  on  default
readonly    off default
zoned   off default
snapdir hidden  default
aclmode discard default
aclinherit  restricted  default
createtxg   17028   -
canmount    off local
xattr   on  default
copies  1   default
version 5   -
utf8only    on  -
normalization   formD   -
casesensitivity sensitive   -
vscan   off default
nbmand  off default
sharesmb    off default
refquota    none    default
refreservation  none    default
guid    4947032233573722053 -
primarycache    all default
secondarycache  all default
usedbysnapshots 608K    -
usedbydataset   79.9M   -
usedbychildren  0B  -
usedbyrefreservation    0B  -
logbias latency default
objsetid    51063   -
dedup   off default
mlslabel    none    default
sync    standard    default
dnodesize   legacy  default
refcompressratio    1.06x   -
written 7.63M   -
logicalused 84.1M   -
logicalreferenced   83.8M   -
volmode default default
filesystem_limit    none    default
snapshot_limit  none    default
filesystem_count    none    default
snapshot_count  none    default
snapdev hidden  default
acltype posix   inherited from rpool_borked
context none    default
fscontext   none    default
defcontext  none    default
rootcontext none    default
relatime    off default
redundant_metadata  all default
overlay on  inherited from rpool_borked
encryption  aes-256-gcm -
keylocation file:///dev/shm/random.key  local
keyformat   passphrase  -
pbkdf2iters 350000000   -
encryptionroot  rpool_borked/CORRUPTED_media    -
keystatus   unavailable -
special_small_blocks    0   default
com.sun:auto-snapshot   true    inherited from rpool_borked

snapshot properties for rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348:

type    snapshot    -
creation    Sun May  9  6:49 2021   -
used    8K  -
referenced  72.6M   -
compressratio   1.03x   -
devices on  default
exec    on  default
setuid  on  default
createtxg   17071   -
xattr   on  default
version 5   -
utf8only    on  -
normalization   formD   -
casesensitivity sensitive   -
nbmand  off default
guid    8174532689526737298 -
primarycache    all default
secondarycache  all default
defer_destroy   off -
userrefs    0   -
objsetid    51480   -
mlslabel    none    default
refcompressratio    1.03x   -
written 71.0M   -
clones      -
logicalreferenced   74.6M   -
acltype posix   inherited from rpool_borked
context none    default
fscontext   none    default
defcontext  none    default
rootcontext none    default
encryption  aes-256-gcm -
encryptionroot  rpool_borked/CORRUPTED_media    -
keystatus   unavailable -
com.sun:auto-snapshot-desc  -   received
com.sun:auto-snapshot   true    inherited from rpool_borked

Full output of /dev/kmsg in the test from initramfs, including echo t > /proc/sysrq-trigger: kmsg-full.log

Sorry if more of this should have been an attachment rather than inline; I just figured inline would be more convenient.

pcd1193182 commented 3 years ago

I tried to copy your setup to see if I could identify the line of code that's crashing, but my addresses don't seem to match up to yours; I must have failed to replicate your setup in some way. Could you send me either your zfs kernel module or the contents of objdump -D of it?

t-m-w commented 3 years ago

Could you send me either your zfs kernel module or the contents of objdump -D of it?

The kernel module is from https://launchpad.net/~jonathonf/+archive/ubuntu/zfs -- here is a zip of everything in /lib/modules/5.11.0-18-generic/kernel/zfs: zfs.zip

t-m-w commented 3 years ago

Actually, I think I shared the wrong modules and you need the stuff from the updates/dkms folder...?

I took the opportunity to try and build the zfs package with --enable-debug. I'm not sure if it worked. All I did was an apt source zfs-linux to get the source from the ppa, then edited debian/rules to add --enable-debug after every ./configure line (2 of them) and after a dh_auto_configure line, then ran dpkg-buildpackage -us -uc and installed the resulting packages, except zfs-dracut. Also, I'm on kernel 5.11.0-22-generic now, so I repeated the process.

Here's those modules again (hopefully this is what you need): zfs.zip

Here's the full output of /dev/kmsg including after running echo t > /proc/sysrq-trigger: zfs-error-kmsg.log

t-m-w commented 3 years ago

OK, this'll be my last of several posts in a row in a short time, just to say that I've updated the issue with new information and with a link to download the affected zpool as a sparse image, with unrelated datasets removed, which will hopefully be helpful in solving this (and allow me to remove the zpool/dataset/snapshot from my system if needed). I recommend disregarding my previous two comments.

gamanakis commented 3 years ago

There seem to be multiple problems here. dmu_dump_write() can be called with bp=NULL if it cannot send large_blocks and the buffer size is more than 128 KiB (so it tries to split the data into chunks). That is what causes the panic in your case: when sending raw data, dmu_dump_write() tries to dereference the NULL bp.
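The failing combination described above can be sketched in isolation. This is a toy model, not the code in dmu_send.c: `toy_bp_t`, `toy_must_split()` and `toy_dump_write()` are hypothetical names, and the function returns -1 at the point where the kernel code would read through the NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as SPA_OLD_MAXBLOCKSIZE (128 KiB). */
#define TOY_OLD_MAXBLOCKSIZE (128ULL * 1024ULL)

/* Hypothetical stand-in for blkptr_t; it only gives us a field to read. */
typedef struct toy_bp {
	int byteswap;
} toy_bp_t;

/* The block is split only when it is large and the stream lacks the flag. */
static bool
toy_must_split(uint64_t datablksz, bool large_blocks_flag)
{
	return (datablksz > TOY_OLD_MAXBLOCKSIZE && !large_blocks_flag);
}

/*
 * Toy version of the branch in dmu_dump_write(). Returns 0 on success and
 * -1 where the kernel code would dereference a NULL bp.
 */
static int
toy_dump_write(bool raw, uint64_t lsize, uint64_t psize, const toy_bp_t *bp)
{
	assert(psize > 0);
	if (raw || lsize != psize) {
		if (bp == NULL)
			return (-1);	/* the real code reads bp fields here */
		(void) bp->byteswap;
	}
	return (0);
}
```

In the split path each chunk is passed with lsize equal to psize and bp set to NULL, so only a raw send enters the branch. That matches the observation that a non-raw send of a split block would not hit this dereference.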

Now if we circumvent that and instrument dmu_send.c using this patch:

diff --git a/module/zfs/dmu_send.c b/module/zfs/dmu_send.c
index d65438223..4f2e7c581 100644
--- a/module/zfs/dmu_send.c
+++ b/module/zfs/dmu_send.c
@@ -487,7 +487,7 @@ dmu_dump_write(dmu_send_cookie_t *dscp, dmu_object_type_t type, uint64_t object,
        drrw->drr_logical_size = lsize;

        /* only set the compression fields if the buf is compressed or raw */
-       if (raw || lsize != psize) {
+       if (lsize != psize || raw) {
                ASSERT(raw || dscp->dsc_featureflags &
                    DMU_BACKUP_FEATURE_COMPRESSED);
                ASSERT(!BP_IS_EMBEDDED(bp));
@@ -1007,12 +1007,14 @@ do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
                 * don't allow us to send large blocks, we split the data from
                 * the arc buf into chunks.
                 */
+               cmn_err(CE_NOTE, "datablksz=%lu, datasz=%lu", srdp->datablksz, srdp->datasz);
                if (srdp->datablksz > SPA_OLD_MAXBLOCKSIZE &&
                    !(dscp->dsc_featureflags &
                    DMU_BACKUP_FEATURE_LARGE_BLOCKS)) {
                        while (srdp->datablksz > 0 && err == 0) {
                                int n = MIN(srdp->datablksz,
                                    SPA_OLD_MAXBLOCKSIZE);
+                               cmn_err(CE_NOTE, "n=%d", n);
                                err = dmu_dump_write(dscp, srdp->obj_type,
                                    range->object, offset, n, n, NULL, data);
                                offset += n;

then fletcher_native() panics:

[  491.511987] NOTICE: datablksz=1048576, datasz=393216
[  491.511989] NOTICE: n=131072
[  491.512112] NOTICE: n=131072
[  491.512347] NOTICE: n=131072
[  491.512503] NOTICE: n=131072
[  491.512512] BUG: unable to handle page fault for address: ffffaefa898be000
[  491.513186] #PF: supervisor read access in kernel mode
[  491.513624] #PF: error_code(0x0000) - not-present page
[  491.514120] PGD 100000067 P4D 100000067 PUD 1001b1067 PMD 1203a4067 PTE 0
[  491.514651] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  491.515001] CPU: 8 PID: 3210 Comm: lt-zfs Tainted: P           OE     5.12.15-arch1-1 #1
[  491.515713] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ELITE/Z390 AORUS ELITE-CF, BIOS F10g 09/16/2020
[  491.516677] RIP: 0010:fletcher_4_avx2_native+0x38/0x80 [zcommon]
[  491.517225] Code: 16 53 48 89 f3 e8 28 ff ff ff c4 c1 7e 6f 04 24 c4 c1 7e 6f 4c 24 20 c4 c1 7e 6f 54 24 40 c4 c1 7e 6f 5c 24 60 48 39 eb 73 1e <c4> e2 7d 35 23 c5 fd d4 c4 c5 f5 d4 c8 c5 ed d4 d1 c5 e5 d4 da 48
[  491.519221] RSP: 0018:ffffaefa893a3520 EFLAGS: 00010006
[  491.519706] RAX: 0000000000000000 RBX: ffffaefa898be000 RCX: 0000000000000000
[  491.520364] RDX: 00000000ffffffff RSI: ffffaefa898bd000 RDI: ffff98288f295000
[  491.521008] RBP: ffffaefa898dd000 R08: 0000000000020000 R09: ffff982882bdb020
[  491.521618] R10: 0000000000000002 R11: ffff98288a42c710 R12: ffffaefa893a3540
[  491.522179] R13: ffffaefa898bd000 R14: 0000000000020000 R15: ffffaefa893a3960
[  491.522640] FS:  00007fd9df5327c0(0000) GS:ffff9828fbc00000(0000) knlGS:0000000000000000
[  491.523203] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  491.523593] CR2: ffffaefa898be000 CR3: 000000011b43e005 CR4: 0000000000370ee0
[  491.524080] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  491.524567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  491.525049] Call Trace:
[  491.525229]  fletcher_4_native_impl+0x62/0xa0 [zcommon]
[  491.525580]  ? __kernel_write+0x2c3/0x310
[  491.525907]  fletcher_4_native+0xd0/0xf0 [zcommon]
[  491.526375]  fletcher_4_incremental_impl+0x57/0xc0 [zcommon]
[  491.526789]  fletcher_4_incremental_native+0x2f/0x40 [zcommon]
[  491.527157]  dump_record+0xbb/0x200 [zfs]

Notice the: [ 491.511987] NOTICE: datablksz=1048576, datasz=393216

Is it possible to split in chunks when raw sending an encrypted dataset?
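The arithmetic behind the NOTICE line can be checked with a short sketch. It assumes the data buffer holds exactly datasz bytes while the loop walks datablksz bytes in 128 KiB steps; the helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CHUNK (128ULL * 1024ULL) /* same value as SPA_OLD_MAXBLOCKSIZE */

/* Number of chunk-sized pieces needed to cover nbytes (rounding up). */
static uint64_t
toy_chunks_needed(uint64_t nbytes)
{
	return ((nbytes + TOY_CHUNK - 1) / TOY_CHUNK);
}

/*
 * 1-based index of the first chunk whose end lies beyond the buffer,
 * or 0 if walking walk_len bytes never leaves a buffer of buf_len bytes.
 */
static uint64_t
toy_first_overrun(uint64_t walk_len, uint64_t buf_len)
{
	uint64_t offset = 0;
	uint64_t idx = 0;

	assert(walk_len > 0);
	while (offset < walk_len) {
		uint64_t n = walk_len - offset;
		if (n > TOY_CHUNK)
			n = TOY_CHUNK;
		idx++;
		if (offset + n > buf_len)
			return (idx);
		offset += n;
	}
	return (0);
}
```

With datablksz=1048576 the loop plans 8 chunks, but datasz=393216 covers only 3 of them, so under this assumption the fourth chunk is the first to read past the buffer. That is consistent with the four "n=131072" lines printed before the page fault.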

gamanakis commented 3 years ago

It seems that if we split in chunks based on datasz instead of datablksz (in do_dump()) in case of a raw encrypted send, then the send succeeds.

gamanakis commented 3 years ago

After more careful consideration of the current codebase it seems it's not possible to raw send in small chunks.

gamanakis commented 3 years ago

@t-m-w if this issue is still relevant to you, could you try a normal (non-raw) zfs send rpool_borked/CORRUPTED_media@zfs-auto-snap_weekly-2021-05-09-1348 with the keys loaded? That should probably work.

t-m-w commented 3 years ago

@gamanakis Yes, I can confirm that a non-raw send works just fine and does not cause this problem. If you'd like me to privately share any of the decrypted data for testing or troubleshooting, let me know, or if there's anything else you need me to try, I'd be happy to do so.

peterjeremy commented 3 years ago

I have the same problem affecting 3 (out of 81) filesystems on my FreeBSD system - see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259093

I've been doing some digging through both my crashdumps and using dtrace on my live system. The immediate trigger path is dmu_send_obj() calls dsl_dataset_hold_obj_flags() calls dsl_dataset_hold_obj(), which iterates through load_zfeature() for each per-dataset feature (which are org.open-zfs:large_blocks, org.zfsonlinux:large_dnode, org.illumos:sha512, org.illumos:skein, org.illumos:edonr, org.zfsonlinux:userobj_accounting, com.datto:encryption, org.zfsonlinux:project_quota, org.freebsd:zstd_compress and com.datto:ivset_guid) to initialise dspp.to_ds->ds_feature. dmu_send_obj() then calls dmu_send_impl(), which calls setup_featureflags() to translate the to_ds.ds_feature array into flag bits in featureflags. For the affected filesystem, org.open-zfs:large_blocks is incorrectly false on the dataset (though it's correctly "active" on the pool) and therefore SPA_FEATURE_LARGE_BLOCKS is not set in the to_ds.ds_feature array, hence DMU_BACKUP_FEATURE_LARGE_BLOCKS is not set in featureflags even though large blocks exist in the dataset.

Working the other way, the only place where SPA_FEATURE_LARGE_BLOCKS is set is in dsl_dataset_block_born(), which marks it as needing activation if (BP_GET_LSIZE(bp) > SPA_OLD_MAXBLOCKSIZE). The "activation" is then converted to "active" in dsl_dataset_sync() if it's not already marked active. I haven't yet traced the actual path to see where that code is failing.
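The two-step activation described above can be modelled as a small state machine. This is a simplified sketch, not the OpenZFS data structures: `toy_ds_t` and both functions are hypothetical, and the real code tracks the state per feature rather than in two booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_OLD_MAXBLOCKSIZE (128ULL * 1024ULL) /* as SPA_OLD_MAXBLOCKSIZE */

/* Hypothetical, heavily reduced stand-in for dsl_dataset_t. */
typedef struct toy_ds {
	bool large_blocks_pending;	/* marked as needing activation */
	bool large_blocks_active;	/* what the send code later consults */
} toy_ds_t;

/* Models the size check described for dsl_dataset_block_born(). */
static void
toy_block_born(toy_ds_t *ds, uint64_t lsize)
{
	assert(ds != NULL);
	if (lsize > TOY_OLD_MAXBLOCKSIZE)
		ds->large_blocks_pending = true;
}

/* Models the promotion step described for dsl_dataset_sync(). */
static void
toy_sync(toy_ds_t *ds)
{
	assert(ds != NULL);
	if (ds->large_blocks_pending && !ds->large_blocks_active)
		ds->large_blocks_active = true;
}
```

In this model the feature becomes active only if the promotion step runs after the mark is set. If the last large block is born and no later sync follows, the dataset keeps a pending mark that the send code never sees, which would leave the dataset in the state reported here.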

pcd1193182 commented 3 years ago

If that's the case, the bug here isn't in the send logic at all, it would appear to be in the per-dataset feature logic or the large block logic somewhere. I don't know as much about those codepaths... @ahrens or @don-brady might know more?

peterjeremy commented 3 years ago

I have spent some time with dtrace and, whilst I can see ds_feature_activation being set in dsl_dataset_block_born(), I can't find the matching dsl_dataset_sync(). Possibly this is just lack of dtrace-fu but it looks very much like there are code paths where dsl_dataset_sync() is missing.

ahrens commented 3 years ago

@peterjeremy Nice analysis of where the problem is occurring. Are you saying that after ds_feature_activation is set, dsl_dataset_sync() is not called on this dataset? Or that it's called but it doesn't call dsl_dataset_activate_feature()?

If dsl_dataset_block_born() is called, the dataset should already be "dirty" (i.e. on dp_dirty_datasets), and therefore dsl_pool_sync() should call dsl_dataset_sync() on it. If that doesn't seem to be the case, then we could try adding an assertion to dsl_dataset_block_born() asserting that the dataset is dirty. There might be some corner cases where that isn't true, some of which might be fine but it's possible that there's a bug here.

@gamanakis

After more careful consideration of the current codebase it seems it's not possible to raw send in small chunks

That's right. We should be either failing that as an error, or (perhaps better) saying that raw send implies --large-blocks (whether encrypted or not).
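The two options mentioned above can be sketched side by side. Everything here is hypothetical: the flag bits are illustrative values rather than the real DMU_BACKUP_FEATURE_* constants, and `toy_featureflags()` and `toy_plan()` are made-up names that only show the shape of each choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_OLD_MAXBLOCKSIZE (128ULL * 1024ULL) /* as SPA_OLD_MAXBLOCKSIZE */

/* Illustrative bits only; not the real stream feature flag values. */
#define TOY_FLAG_LARGE_BLOCKS (1u << 0)
#define TOY_FLAG_RAW (1u << 1)

typedef enum toy_action {
	TOY_WRITE_WHOLE,	/* emit one record for the block */
	TOY_WRITE_CHUNKS,	/* split into 128 KiB records */
	TOY_FAIL		/* refuse instead of splitting */
} toy_action_t;

/* Option 1: a raw send behaves as if large blocks had been requested. */
static uint32_t
toy_featureflags(bool raw, bool large_block_ok, bool ds_large_blocks_active)
{
	uint32_t flags = 0;

	if (raw) {
		flags |= TOY_FLAG_RAW;
		large_block_ok = true;
	}
	if (large_block_ok && ds_large_blocks_active)
		flags |= TOY_FLAG_LARGE_BLOCKS;
	return (flags);
}

/* Option 2: a raw send that would need splitting fails with an error. */
static toy_action_t
toy_plan(uint64_t datablksz, uint32_t flags)
{
	assert(datablksz > 0);
	if (datablksz > TOY_OLD_MAXBLOCKSIZE &&
	    !(flags & TOY_FLAG_LARGE_BLOCKS)) {
		return ((flags & TOY_FLAG_RAW) ? TOY_FAIL : TOY_WRITE_CHUNKS);
	}
	return (TOY_WRITE_WHOLE);
}
```

In this sketch the first option alone does not cover a dataset whose large_blocks feature is wrongly recorded as inactive, because the flag still depends on the per-dataset state. The second option turns that case into an error rather than a crash.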

gamanakis commented 2 years ago

There is the following codepath: dsl_dataset_sync() -> dmu_objset_sync() -> zio_nowait()

In the zio_nowait() of dmu_objset_sync(), the callback function is: dmu_objset_write_done() -> dsl_dataset_block_born().

My question is, is it possible here that dsl_dataset_sync() completes before zio_nowait() calls dsl_dataset_block_born()? There seems to be a race between dsl_dataset_sync() and the callback dmu_objset_write_done() -> dsl_dataset_block_born().