openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

ZFS on Linux null pointer dereference #11679

Closed segdy closed 1 month ago

segdy commented 3 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Debian |
| Distribution Version | 10 buster |
| Linux Kernel | 4.19.0-14 (SMP Debian 4.19.171-2) |
| Architecture | amd64 |
| ZFS Version | 2.0.3-1~bpo10+1 |
| SPL Version | 2.0.3-1~bpo10+1 |

Describe the problem you're observing

When I start sending raw ZFS snapshots to a different system, my Linux system (4.19.0-14-amd64) starts to hang. I can ping it and I can run a few commands (such as dmesg), but most commands hang (incl. zfs, zpool, htop, ps, ...). The entire system hangs completely.

Dmesg shows the following entries at the time of the occurrence:

[ 2293.134071] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 2293.149707] PGD 0 P4D 0
[ 2293.154752] Oops: 0000 [#1] SMP PTI
[ 2293.161701] CPU: 1 PID: 12576 Comm: receive_writer Tainted: P           OE     4.19.0-14-amd64 #1 Debian 4.19.171-2
[ 2293.182517] Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0a 12/21/2015
[ 2293.196819] RIP: 0010:abd_verify+0x5/0x60 [zfs]
[ 2293.205865] Code: 0f 1f 44 00 00 0f 1f 44 00 00 8b 07 c1 e8 05 83 e0 01 c3 66 90 0f 1f 44 00 00 8b 07 c1 e8 06 83 e0 01 c3 66 90 0f 1f 44 00 00 <8b> 07 a8 01 74 01 c3 a8 40 74 43 41 54 4c 8d 67 68 55 53 48 8b 47
[ 2293.243325] RSP: 0018:ffffb12e4b6d7a28 EFLAGS: 00010246
[ 2293.253741] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2293.267974] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[ 2293.282205] RBP: 0000000000004000 R08: ffff935ec10b70b0 R09: 0000000000000000
[ 2293.296434] R10: 0000000000007130 R11: ffff935d75f984e0 R12: 0000000000004000
[ 2293.310664] R13: 0000000000000000 R14: ffffffffc0fea550 R15: 0000000000000020
[ 2293.324900] FS:  0000000000000000(0000) GS:ffff935ecfb00000(0000) knlGS:0000000000000000
[ 2293.341053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2293.352510] CR2: 0000000000000000 CR3: 000000001340a001 CR4: 00000000000606e0
[ 2293.366743] Call Trace:
[ 2293.371704]  abd_borrow_buf+0x12/0x40 [zfs]
[ 2293.380104]  abd_borrow_buf_copy+0x28/0x70 [zfs]
[ 2293.389377]  zio_crypt_copy_dnode_bonus+0x36/0x130 [zfs]
[ 2293.400041]  arc_buf_fill+0x3ff/0xb60 [zfs]
[ 2293.408449]  ? zfs_btree_add_idx+0xd0/0x200 [zfs]
[ 2293.417889]  arc_untransform+0x1c/0x70 [zfs]
[ 2293.426461]  dbuf_read_verify_dnode_crypt+0xec/0x160 [zfs]
[ 2293.437466]  dbuf_read_impl.constprop.29+0x4ad/0x6b0 [zfs]
[ 2293.448423]  ? kmem_cache_alloc+0x167/0x1d0
[ 2293.456776]  ? __cv_init+0x3d/0x60 [spl]
[ 2293.464671]  ? dbuf_cons+0xa7/0xc0 [zfs]
[ 2293.472497]  ? spl_kmem_cache_alloc+0x108/0x7a0 [spl]
[ 2293.482583]  ? _cond_resched+0x15/0x30
[ 2293.490071]  ? _cond_resched+0x15/0x30
[ 2293.497542]  ? mutex_lock+0xe/0x30
[ 2293.504402]  ? aggsum_add+0x17a/0x190 [zfs]
[ 2293.512810]  dbuf_read+0x1b2/0x520 [zfs]
[ 2293.520672]  ? dnode_hold_impl+0x350/0xc20 [zfs]
[ 2293.529904]  dmu_bonus_hold_by_dnode+0x126/0x1a0 [zfs]
[ 2293.540186]  receive_object+0x403/0xc70 [zfs]
[ 2293.548906]  ? receive_freeobjects.isra.10+0x9d/0x120 [zfs]
[ 2293.560049]  receive_writer_thread+0x279/0xa00 [zfs]
[ 2293.569962]  ? set_curr_task_fair+0x26/0x50
[ 2293.578319]  ? receive_process_write_record+0x190/0x190 [zfs]
[ 2293.589793]  ? __thread_exit+0x20/0x20 [spl]
[ 2293.598317]  ? thread_generic_wrapper+0x6f/0x80 [spl]
[ 2293.608410]  ? receive_process_write_record+0x190/0x190 [zfs]
[ 2293.619882]  thread_generic_wrapper+0x6f/0x80 [spl]
[ 2293.629609]  kthread+0x112/0x130
[ 2293.636053]  ? kthread_bind+0x30/0x30
[ 2293.643351]  ret_from_fork+0x35/0x40
[ 2293.650473] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter veth pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) nf_tables nfnetlink vboxdrv(OE) bridge binfmt_misc zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) intel_rapl x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm irqbypass crct10dif_pclmul ib_iser joydev crc32_pclmul rdma_cm ghash_clmulni_intel iw_cm intel_cstate ib_cm intel_uncore ib_core intel_rapl_perf configfs ipmi_si sg ipmi_devintf iTCO_wdt iTCO_vendor_support pcc_cpufreq intel_pch_thermal iscsi_tcp ipmi_msghandler libiscsi_tcp libiscsi evdev scsi_transport_iscsi pcspkr tun nfsd auth_rpcgss nfs_acl lockd grace sunrpc lm85 dme1737 hwmon_vid iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack
[ 2293.793008]  nf_defrag_ipv6 nf_defrag_ipv4 fuse loop 8021q garp stp mrp llc ecryptfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 raid10 uas usb_storage raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear hid_generic raid1 usbhid hid md_mod sd_mod ast ahci ttm libahci libata drm_kms_helper drm crc32c_intel igb i2c_i801 dca i2c_algo_bit scsi_mod lpc_ich mfd_core e1000e xhci_pci ehci_pci xhci_hcd ehci_hcd usbcore usb_common thermal fan video button
[ 2293.895677] CR2: 0000000000000000
[ 2293.902280] ---[ end trace 164c64ca87be80af ]---
[ 2294.020926] RIP: 0010:abd_verify+0x5/0x60 [zfs]
[ 2294.029975] Code: 0f 1f 44 00 00 0f 1f 44 00 00 8b 07 c1 e8 05 83 e0 01 c3 66 90 0f 1f 44 00 00 8b 07 c1 e8 06 83 e0 01 c3 66 90 0f 1f 44 00 00 <8b> 07 a8 01 74 01 c3 a8 40 74 43 41 54 4c 8d 67 68 55 53 48 8b 47
[ 2294.067433] RSP: 0018:ffffb12e4b6d7a28 EFLAGS: 00010246
[ 2294.077850] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2294.092082] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[ 2294.106312] RBP: 0000000000004000 R08: ffff935ec10b70b0 R09: 0000000000000000
[ 2294.120542] R10: 0000000000007130 R11: ffff935d75f984e0 R12: 0000000000004000
[ 2294.134774] R13: 0000000000000000 R14: ffffffffc0fea550 R15: 0000000000000020
[ 2294.149006] FS:  0000000000000000(0000) GS:ffff935ecfb00000(0000) knlGS:0000000000000000
[ 2294.165144] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2294.176600] CR2: 0000000000000000 CR3: 000000001340a001 CR4: 00000000000606e0

Interestingly, the transfer continues happily, but everything else in the system hangs. The only way to recover is resetting the machine (since not even reboot works).

Describe how to reproduce the problem

It's a tough one. It seems to me that the issue might be load-related in some sense, since it only occurs if I have two zfs sends (via syncoid) running in parallel that involve encrypted datasets.

Transfer 1

The first one sends datasets from an unencrypted dataset into an encrypted one (I am migrating to encryption).

I use syncoid with the command:

syncoid -r --skip-parent --no-sync-snap zpradix1imain/sys/vz zpradix1imain/sys/vz_enc

This translates into:

zfs send -I 'zpradix1imain/sys/vz/main'@'zfs-auto-snap_hourly-2021-03-02-1917' 'zpradix1imain/sys/vz/main'@'zfs-auto-snap_frequent-2021-03-02-1932' | mbuffer -q -s 128k -m 16M 2>/dev/null | pv -s 16392592 | zfs receive -s -F 'zpradix1imain/sys/vz_enc/main'

Transfer 2

I transfer data raw from an encrypted dataset to a secondary server. The syncoid command is:

syncoid -r --skip-parent --no-sync-snap --sendoptions=w --exclude=zfs-auto-snap_hourly --exclude=zfs-auto-snap_frequent zpradix1imain/data root@192.168.200.12:zpzetta/radix/data

This translates into:

zfs send -w 'zpradix1imain/data/home'@'vicari-prev' | pv -s 179222507064 | lzop | mbuffer -q -s 128k -m 16M 2>/dev/null | ssh ...

In summary:

pquan commented 3 years ago

I am experiencing the same or a similar issue: a hang-up during zfs send/receive of an encrypted filesystem. The two filesystems are encrypted with different keys. I'm also using syncoid. My stack trace is slightly different, but the null pointer dereference is in the same region, in abd_borrow_buf_copy.

My system is a standard Proxmox 6.3 install.

OS: Debian Buster 10
Kernel: 5.4.78-2-pve
ZFS Module Version: v0.8.5-pve1
Pool: 4x2 two-drive mirrors of 2TB and 4TB vdev devices. zpool scrub returns no errors.

[54105.343179] BUG: kernel NULL pointer dereference, address: 0000000000000000
[54105.343185] #PF: supervisor read access in kernel mode
[54105.343187] #PF: error_code(0x0000) - not-present page
[54105.343188] PGD 0 P4D 0
[54105.343190] Oops: 0000 [#1] SMP PTI
[54105.343193] CPU: 1 PID: 13831 Comm: receive_writer Tainted: P           O      5.4.78-2-pve #1
[54105.343195] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H81 Pro BTC R2.0, BIOS P1.40 03/09/2018
[54105.343253] RIP: 0010:abd_borrow_buf+0x13/0x70 [zfs]
[54105.343255] Code: 49 8b 44 24 18 41 5c 41 5d 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 f5 41 54 53 <44> 8b 27 48 89 fb 41 83 e4 01 75 30 44 8b 77 1c 48 8b 47 20 4d 85
[54105.343258] RSP: 0018:ffff96ae063a39d8 EFLAGS: 00010246
[54105.343260] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[54105.343261] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[54105.343263] RBP: ffff96ae063a39f8 R08: 00000000000000a5 R09: 0000000000000000
[54105.343264] R10: 0000000000000706 R11: 0000000000000000 R12: 0000000000000000
[54105.343265] R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc07ee470
[54105.343267] FS:  0000000000000000(0000) GS:ffff88c9af500000(0000) knlGS:0000000000000000
[54105.343269] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54105.343271] CR2: 0000000000000000 CR3: 000000001980a004 CR4: 00000000000606e0
[54105.343272] Call Trace:
[54105.343292]  abd_borrow_buf_copy+0x2c/0x80 [zfs]
[54105.343323]  zio_crypt_copy_dnode_bonus+0x31/0x140 [zfs]
[54105.343327]  ? preempt_schedule_common+0xe/0x20
[54105.343345]  arc_buf_fill+0x402/0xb60 [zfs]
[54105.343364]  arc_untransform+0x21/0x90 [zfs]
[54105.343383]  dbuf_read_verify_dnode_crypt+0xee/0x160 [zfs]
[54105.343402]  dbuf_read+0x19d/0xb80 [zfs]
[54105.343425]  dmu_tx_check_ioerr+0x68/0xd0 [zfs]
[54105.343448]  dmu_tx_hold_free_impl+0x132/0x250 [zfs]
[54105.343470]  dmu_tx_hold_free+0x3c/0x50 [zfs]
[54105.343491]  dmu_free_long_range+0x234/0x4b0 [zfs]
[54105.343511]  dmu_free_long_object+0x26/0xc0 [zfs]
[54105.343533]  receive_freeobjects.isra.10+0x7a/0x100 [zfs]
[54105.343555]  receive_writer_thread+0x43c/0xb80 [zfs]
[54105.343558]  ? set_next_task_fair+0x73/0xb0
[54105.343565]  ? spl_kmem_free+0x33/0x40 [spl]
[54105.343586]  ? receive_read_prefetch+0x140/0x140 [zfs]
[54105.343591]  thread_generic_wrapper+0x74/0x90 [spl]
[54105.343613]  ? receive_read_prefetch+0x140/0x140 [zfs]
[54105.343617]  ? thread_generic_wrapper+0x74/0x90 [spl]
[54105.343620]  kthread+0x120/0x140
[54105.343624]  ? __thread_exit+0x20/0x20 [spl]
[54105.343626]  ? kthread_park+0x90/0x90
[54105.343629]  ret_from_fork+0x35/0x40
[54105.343631] Modules linked in: ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp sch_ingress iptable_filter bpfilter bonding 8021q garp mrp openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 softdog nfnetlink_log nfnetlink intel_rapl_msr mei_hdcp ppdev intel_rapl_common x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd rapl intel_cstate snd_hda_codec_hdmi pcspkr i915 drm_kms_helper snd_hda_codec_realtek mei_me snd_hda_codec_generic ledtrig_audio mei drm snd_hda_intel snd_intel_dspcfg snd_hda_codec i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect snd_hwdep sysimgblt snd_pcm snd_timer snd soundcore parport_pc parport mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid coretemp sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO)
[54105.343657]  zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq libcrc32c r8169 realtek e1000e i2c_i801 lpc_ich ahci libahci ehci_pci ehci_hcd xhci_pci xhci_hcd mpt3sas raid_class scsi_transport_sas video
[54105.343675] CR2: 0000000000000000
[54105.343677] ---[ end trace 61f4d52a93ce31c9 ]---
[54105.343695] RIP: 0010:abd_borrow_buf+0x13/0x70 [zfs]
[54105.343696] Code: 49 8b 44 24 18 41 5c 41 5d 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 f5 41 54 53 <44> 8b 27 48 89 fb 41 83 e4 01 75 30 44 8b 77 1c 48 8b 47 20 4d 85
[54105.343699] RSP: 0018:ffff96ae063a39d8 EFLAGS: 00010246
[54105.343700] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[54105.343702] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[54105.343703] RBP: ffff96ae063a39f8 R08: 00000000000000a5 R09: 0000000000000000
[54105.343705] R10: 0000000000000706 R11: 0000000000000000 R12: 0000000000000000
[54105.343706] R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc07ee470
[54105.343708] FS:  0000000000000000(0000) GS:ffff88c9af500000(0000) knlGS:0000000000000000
[54105.343709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54105.343710] CR2: 0000000000000000 CR3: 000000001980a004 CR4: 00000000000606e0
[89614.646434] hrtimer: interrupt took 14015 ns

cserem commented 3 years ago

Similar hung system, on the same distribution. The problem has been around ever since reinstalling the system and enabling encryption with ZFS 0.8.5, and persists with 5.4.114-1-pve / 2.0.4-pve1.

dmesg

```
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0000 [#1] SMP PTI
kernel: CPU: 1 PID: 23147 Comm: receive_writer Tainted: P O 5.4.114-1-pve #1
kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.60 03/01/2018
kernel: RIP: 0010:abd_verify+0x5/0x70 [zfs]
kernel: Code: 00 00 00 66 90 0f 1f 44 00 00 55 8b 07 c1 e8 06 48 89 e5 83 e0 01 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 <8b> 07 a8 01 75 4f 55 48 89 e5 41 55 41 54 49 89 fc 53 a8 40 74 40
kernel: RSP: 0018:ffffbd418d5b7930 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
kernel: RBP: ffffbd418d5b7948 R08: 0000000000000037 R09: 0000000000000000
kernel: R10: 000000000000b1e5 R11: 0000000000000000 R12: 0000000000004000
kernel: R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc06b19d0
kernel: FS: 0000000000000000(0000) GS:ffff97e2efe80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000003150e4000 CR4: 00000000001026e0
kernel: Call Trace:
kernel: ? abd_borrow_buf+0x17/0x40 [zfs]
kernel: abd_borrow_buf_copy+0x2c/0x80 [zfs]
kernel: zio_crypt_copy_dnode_bonus+0x31/0x140 [zfs]
kernel: ? preempt_schedule_common+0xe/0x20
kernel: arc_buf_fill+0x402/0xb40 [zfs]
kernel: arc_untransform+0x21/0x80 [zfs]
kernel: dbuf_read_verify_dnode_crypt+0xee/0x160 [zfs]
kernel: ? __cv_init+0x42/0x60 [spl]
kernel: dbuf_read_impl.constprop.33+0x10c/0x6e0 [zfs]
kernel: ? arc_space_consume+0x4f/0x120 [zfs]
kernel: ? dbuf_create+0x3e8/0x580 [zfs]
kernel: ? mutex_lock+0x12/0x30
kernel: ? dbuf_find+0x111/0x1d0 [zfs]
kernel: ? _cond_resched+0x19/0x30
kernel: ? down_read+0x12/0xa0
kernel: dbuf_read+0x1b2/0x510 [zfs]
kernel: dmu_tx_check_ioerr+0x68/0xd0 [zfs]
kernel: dmu_tx_hold_free_impl+0x132/0x250 [zfs]
kernel: dmu_tx_hold_free+0x3c/0x50 [zfs]
kernel: dmu_free_long_range+0x22a/0x4d0 [zfs]
kernel: dmu_free_long_object+0x26/0xc0 [zfs]
kernel: receive_freeobjects.isra.15+0x8d/0x120 [zfs]
kernel: receive_writer_thread+0x46a/0xa00 [zfs]
kernel: ? spl_kmem_free+0x33/0x40 [spl]
kernel: ? receive_process_write_record+0x180/0x180 [zfs]
kernel: thread_generic_wrapper+0x74/0x90 [spl]
kernel: ? receive_process_write_record+0x180/0x180 [zfs]
kernel: ? thread_generic_wrapper+0x74/0x90 [spl]
kernel: kthread+0x120/0x140
kernel: ? __thread_exit+0x20/0x20 [spl]
kernel: ? kthread_park+0x90/0x90
kernel: ret_from_fork+0x35/0x40
kernel: Modules linked in: xt_multiport binfmt_misc veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_rapl_msr intel_rapl_common intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp i915 kvm_intel snd_hda_intel drm_kms_helper snd_intel_dspcfg kvm snd_hda_codec hci_uart irqbypass snd_hda_core punit_atom_debug btqca btrtl btbcm drm btintel crct10dif_pclmul i2c_algo_bit bluetooth crc32_pclmul fb_sys_fops syscopyarea sysfillrect ghash_clmulni_intel mei_hdcp snd_hwdep snd_pcm sysimgblt mei_txe ppdev ecdh_generic cryptd snd_timer ecc mei snd pcspkr parport_pc soundcore intel_cstate parport rfkill_gpio mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid sunrpc coretemp ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O)
kernel: zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress hid_generic raid6_pq libcrc32c usbhid r8169 ahci xhci_pci libahci realtek i2c_i801 lpc_ich xhci_hcd video i2c_hid hid
kernel: CR2: 0000000000000000
kernel: ---[ end trace 737a4d6fa99905a7 ]---
kernel: RIP: 0010:abd_verify+0x5/0x70 [zfs]
kernel: Code: 00 00 00 66 90 0f 1f 44 00 00 55 8b 07 c1 e8 06 48 89 e5 83 e0 01 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 <8b> 07 a8 01 75 4f 55 48 89 e5 41 55 41 54 49 89 fc 53 a8 40 74 40
kernel: RSP: 0018:ffffbd418d5b7930 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
kernel: RBP: ffffbd418d5b7948 R08: 0000000000000037 R09: 0000000000000000
kernel: R10: 000000000000b1e5 R11: 0000000000000000 R12: 0000000000004000
kernel: R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc06b19d0
kernel: FS: 0000000000000000(0000) GS:ffff97e2efe80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000003150e4000 CR4: 00000000001026e0
```

I too suspected that parallel execution of receives was causing the problem, but it turned out that with serialized execution the problem still occurs (less often, I think, but it's kind of hard to match a pattern here).

My problem is triggered by receiving snapshots into an encrypted dataset; I am not sending data anywhere from this node.

This machine is a consumer of snapshots for multiple machines (all different versions of ZFS). The kernel NULL pointer dereference always happens when a zfs receive runs, never when idle (most of the time there is a running receive, so this might not be valuable information). All sources are non-encrypted, the destination dataset is encrypted and opened with a key.

The commands being run are something along the lines of:

zfs send -I dataset1@from dataset1@to |compression|buffering|ssh| zfs receive -s -F 'encrypted-dataset/backups/dataset1'

I have tried checking /proc/spl/kstat/zfs/dbgmsg (zfs_flags = 256; turning it higher, I can't manage the amount of data it produces), but nothing interesting showed up.
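
For reference, enabling and reading that debug log looks roughly like this (a sketch; the paths are the standard module-parameter and kstat locations, and the flags value is the one mentioned above):

```
# enable the debug categories mentioned above (zfs_flags=256)
echo 256 > /sys/module/zfs/parameters/zfs_flags

# read the in-kernel debug ring buffer while a receive is running
cat /proc/spl/kstat/zfs/dbgmsg

# writing 0 clears the buffer between runs
echo 0 > /proc/spl/kstat/zfs/dbgmsg
```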

Perhaps someone could tell us how to diagnose this further?

On a side note, it seems all three of us are using relatively old architectures, from the same era:

lheckemann commented 3 years ago

Same problem here, on NixOS, Linux 5.4.114, ZFS 2.0.4, also (probably) when receiving into an encrypted dataset.

Kernel log: https://gist.github.com/lheckemann/1c11ab6865c44a192c9d2597d17ba72b

This is the second time these symptoms have shown up, though I didn't think to get logs the first time around.

I'm on a Celeron G3900 (Skylake, Q4'15).

EDIT: I'm receiving non-raw sends.

pquan commented 3 years ago
  • segdy: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0a 12/21/2015 - C220 released Q2'13
  • pquan: H81 Pro BTC R2.0 - H81 released Q3'13
  • me: Pentium J2900, Bay Trail-D released Q4'13

I'm not sure if this is related to the age of the architecture, but let me add another of my systems: an Intel(R) Xeon(R) CPU E3-1231 running on an ASUS P9D-M board. This one is relatively "new" compared to the others.

Incidentally, I also have an X8SIL running an X3440 CPU and 32GB of RAM. That one has not experienced the issue (yet). All these systems are, as you define them, consumers of ZFS snapshots (i.e. backup servers receiving the snapshots from production). The lockups seem related to load, too: the more you receive, the more likely the issue occurs, and parallel streams seem to trigger it more easily.

pquan commented 3 years ago

Today I had corruption on the receiving system. The load average was over 3000 when I had to hard-reset it. On boot, the zfs import took an unusually long time, and zpool status is reporting a permanent data error:

errors: Permanent errors have been detected in the following files:

        <0x1e39d>:<0x0>

But I'm not sure what that file is. If data corruption is involved, this would be a serious issue.

rincebrain commented 3 years ago

Could you share the output of these two commands?

cat /sys/module/icp/parameters/icp_aes_impl
cat /sys/module/icp/parameters/icp_gcm_impl

I have a vague suspicion that all the CPUs mentioned here don't believe in aesni, and am wondering if there's something awry in the codepath they're taking.

(Note that I am not remotely an expert on this code, just a random person musing about what could be going so awry for you and yet not causing much more screaming by encryption users.)

(If you'd like to more aggressively experiment with this theory, I suspect the speed order for this goes aesni > x86_64 > generic, so if you are indeed using x86_64, you could try using the "generic" implementation and see if the problem still reproduces.)
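
For anyone wanting to try that, the selection appears to be switchable at runtime via the icp module parameters already mentioned; a minimal sketch (assuming the icp module is loaded and the parameter files are writable):

```
# the bracketed entry is the current selection ("fastest" by default)
cat /sys/module/icp/parameters/icp_aes_impl
cat /sys/module/icp/parameters/icp_gcm_impl

# force the generic (pure C) implementations for testing
echo generic > /sys/module/icp/parameters/icp_aes_impl
echo generic > /sys/module/icp/parameters/icp_gcm_impl
```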

cserem commented 3 years ago
# cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 
# cat /sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] generic pclmulqdq

I have switched both parameters to generic now. Will report back as things advance.

rincebrain commented 3 years ago

One further thought - I don't see a way to validate which implementation it was currently using on "fastest", so it's also possible you were already using generic for one or both of those. So if it still NULL derefs, you could try explicitly setting x86_64 and pclmulqdq.

(I thought of this because if #12106 is indeed related to this, it definitely can't be using x86_64, so...)

pquan commented 3 years ago

Same output here, on all the CPUs I use (AES-NI aware, and in use not only for ZFS but for many other things).

# cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 
# cat /sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] generic pclmulqdq

Are we suggesting that the aes-ni implementation in the kernel is "buggy"? That is most certainly not the case.

Not sure if I pointed this out, but in my case I'm using different encryption keys on the sending and the receiving side. Also, some of my senders are not encrypted, but the receiver is (via a parent fs which is encrypted).

rincebrain commented 3 years ago

No, I was suggesting precisely the opposite - that the reason people here might be seeing an issue and others aren't is the absence of use of aesni.

edit: Just for reference, on my (Broadwell) Xeon:

$ for i in aes gcm;do cat /sys/module/icp/parameters/icp_${i}_impl;echo "";done;
cycle [fastest] generic x86_64 aesni
cycle [fastest] avx generic pclmulqdq

So I was hypothesizing the absence of use of one or both of those could be related to this, and was more suspicious of aesni than avx.

pquan commented 3 years ago

Understood. Not sure how to test whether ZFS is using AES-NI, but a quick check of openssl with the EVP and non-EVP APIs shows a roughly three-fold increase when using EVP, suggesting that the AES-NI hardware is being exercised.
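
For what it's worth, that kind of comparison can be done with openssl's built-in benchmark; the EVP path is the one that uses AES-NI when the CPU supports it (a sketch, not necessarily the exact invocation used above):

```
# non-EVP (software) AES throughput
openssl speed aes-256-cbc

# EVP API, which picks up AES-NI where available
openssl speed -evp aes-256-cbc
```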

I'm now going to explicitly turn on aesni and avx on one of the servers, just to test the opposite case of generic.

cycle fastest generic x86_64 [aesni]
cycle fastest [avx] generic pclmulqdq

rincebrain commented 3 years ago

AFAICS from quickly skimming the code, there's no equivalent of vdev_raidz_bench for the crypto bits. Maybe I'll see how hard implementing one would be after this.

rincebrain commented 3 years ago

Just to be clear I'm trying to reproduce the right thing, could somebody share which values of "encryption/keyformat" they're using on the dataset that things are being received under? (And of course, if they turn out to all be the same/different, that's another data point...)
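
Something like the following shows the relevant properties on the receiving side (the dataset name is a placeholder):

```
zfs get encryption,keyformat,keylocation,encryptionroot pool/encrypted-dataset
```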

cserem commented 3 years ago

My consumer is:

rpool/backups  encryption                  aes-256-gcm                                    -
rpool/backups  keylocation                 file:///rpool/other-encrypted/zfs.KEY              local
rpool/backups  keyformat                   raw     

(After changing icp_[aes|gcm]_impl to "generic" a couple of hours ago, the next backup iteration caused the same null pointer/stack trace.

After restarting and re-applying the "generic" settings, it has yet to crash again.)

rincebrain commented 3 years ago

Well, unless generic was mysteriously calculated to be faster before and you were always using that, there goes that theory. Rats.

edit: Well, I guess it's technically not impossible for both the non-aesni/avx implementations to cause similar misbehavior, but I'm going to assume for now that it's not specific to those...somehow.

AttilaFueloep commented 3 years ago

@rincebrain The fastest implementation is statically selected according to the available hardware. See here and here. To summarize:

GCM: generic < pclmulqdq < avx
AES: generic < x86_64 < aesni

I'm still working on supporting more instruction set extensions for GCM and am thinking of adding a bench as part of that work. Currently the static selection should always select the fastest one.

Further, it's quite unlikely that the implementations differ in behavior, since ztest uses the cycle implementation, which cycles through all available implementations per encryption call/block, and should've caught any discrepancies. I think it's more likely that the load encryption puts on the system triggers some race condition. Just a guess though.

rincebrain commented 3 years ago

Yeah, I'd buy that. The only reason I proposed the selection as an idea before was, as I said, that I noticed all of them happened to be using CPUs that did not believe in aesni, combined with the assumption that most people are not running into this regularly, so something about their setups must be different.

I also, annoyingly, have not been able to reproduce this in a VM no matter how I've varied the amount of computational power or cores, so far. :/

pquan commented 3 years ago

I must say, this is the first time I've heard someone talk about CPU beliefs. This is a serious issue. It happens on multiple CPU generations, mainboards, etc. All my hardware is server grade, using ECC memory and enterprise SATA disks. No errors are logged for the disks, memory, or CPU anywhere. Yet now I'm starting to develop data errors on multiple servers. These seem related to errors in snapshots, even on "source" snapshots, i.e. those that were not transferred to a backup server but are native to a server. It started when I started using encrypted storage and OpenZFS v2.0.4 (specifically the Proxmox kernel pve-kernel-5.4.114-1-pve).

It might be related to Proxmox. I thought I had never seen it with the 0.8.5 zfs module, but it turns out it was there. Now I have the same issue with 2.0.4. The thing that changed was encryption.

For now the data corruption affects only snapshots. I hope it won't start affecting live files.

There also seems to be some silent corruption going on. On one of the servers with data errors, I never had this null pointer dereference, yet there are corrupted snapshots, and many of them:

errors: Permanent errors have been detected in the following files:

        <0x16c01>:<0x0>
        <0x17506>:<0x1>
        <0x16220>:<0x0>
        <0x1742a>:<0x0>
        <0x1742a>:<0x1>
        <0x1742c>:<0x1>
        <0x1612f>:<0x0>
        <0x17435>:<0x0>
        <0x17439>:<0x0>
        <0x16151>:<0x0>
        <0x17451>:<0x0>
        <0x17451>:<0x1>
        <0x16d66>:<0x0>
        rpool/vm/vm-1234-disk-0@autosnap_2021-05-22_20:00:03_hourly:<0x0>
        <0x16280>:<0x0>
        <0x1678c>:<0x0>
        rpool/vm/vm-1234-disk-0@autosnap_2021-05-22_22:00:01_hourly:<0x0>
        <0x165ad>:<0x0>
        <0x165ad>:<0x1>
        <0x171bd>:<0x0>
        <0x15fcb>:<0x0>
        <0x165d8>:<0x0>
        <0x169f9>:<0x0>

Those hex ids are deleted snapshots (most likely). It can happen that a snapshot is created/deleted during the send/receive. They are asynchronous processes.

I'm now running a scrub on all servers and will see whether those errors (deleted snapshots) go away or not. But the "vm-1234" snapshots above cannot be deleted: they're not visible when issuing zfs list -t snapshot, yet they do have a name in the error list.

# zfs destroy rpool/vm/vm-1234-disk-0@autosnap_2021-05-22_22:00:01_hourly
could not find any snapshots to destroy; check snapshot names.

There's also more:

# zfs list -t snapshot rpool/vm/vm-1234-disk-0|grep 05-22
cannot iterate filesystems: I/O error
#
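
(As an aside, the <0x...>:<0x...> entries in the error list are objset:object pairs, so one can presumably map the objset number back to a dataset name with zdb, assuming the dataset still exists; if nothing matches, the objset was most likely already destroyed. A hypothetical example for the <0x1742a> entry, 0x1742a being 95274 in decimal:)

```
# list every dataset in the pool together with its object ID and grep for the number
zdb -d rpool | grep -w 95274
```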
pquan commented 3 years ago

OK, I was woken up this morning by a (perhaps related) issue on one of the "sending" servers. The stack trace does not show the encryption part and seems more related to the ARC. I haven't seen one of those for years, so since it's a very rare thing, I decided to post it here in the hope it would help.

I was doing some things on this server tonight, namely:

The server was completely locked up, the load average was around 6800, and I found the following stack trace:

[1214732.919868] VERIFY3(0 == remove_reference(hdr, NULL, tag)) failed (0 == 1)
[1214732.919930] PANIC at arc.c:3791:arc_buf_destroy()
[1214732.919985] Showing stack for process 1114
[1214732.920040] CPU: 2 PID: 1114 Comm: z_rd_int Tainted: P           O      5.4.114-1-pve #1
[1214732.920114] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.3.V1 02/04/2020
[1214732.920193] Call Trace:
[1214732.920251]  dump_stack+0x6d/0x8b
[1214732.920309]  spl_dumpstack+0x29/0x2b [spl]
[1214732.920365]  spl_panic+0xd3/0xfb [spl]
[1214732.920420]  ? __wake_up_common+0x78/0x140
[1214732.920521]  ? zei_add_range+0x140/0x140 [zfs]
[1214732.920576]  ? __wake_up+0x13/0x20
[1214732.920630]  ? __cv_broadcast+0x2a/0x40 [spl]
[1214732.920715]  ? zfs_zevent_post+0x223/0x280 [zfs]
[1214732.920792]  arc_buf_destroy+0x101/0x110 [zfs]
[1214732.920869]  arc_read_done+0x243/0x480 [zfs]
[1214732.920957]  zio_done+0x43f/0x1020 [zfs]
[1214732.921045]  zio_execute+0x99/0xf0 [zfs]
[1214732.921101]  taskq_thread+0x2f7/0x4e0 [spl]
[1214732.921157]  ? wake_up_q+0x80/0x80
[1214732.921243]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1214732.921302]  kthread+0x120/0x140
[1214732.921357]  ? task_done+0xb0/0xb0 [spl]
[1214732.921410]  ? kthread_park+0x90/0x90
[1214732.921465]  ret_from_fork+0x35/0x40
[1214871.316671] INFO: task z_rd_int:1114 blocked for more than 120 seconds.
[1214871.316733]       Tainted: P           O      5.4.114-1-pve #1
[1214871.316790] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214871.316864] z_rd_int        D    0  1114      2 0x80004000
[1214871.316921] Call Trace:
[1214871.316978]  __schedule+0x2e6/0x700
[1214871.317040]  schedule+0x33/0xa0
[1214871.317096]  spl_panic+0xf9/0xfb [spl]
[1214871.317195]  ? zei_add_range+0x140/0x140 [zfs]
[1214871.317252]  ? __wake_up+0x13/0x20
[1214871.317307]  ? __cv_broadcast+0x2a/0x40 [spl]
[1214871.317389]  ? zfs_zevent_post+0x223/0x280 [zfs]
[1214871.317466]  arc_buf_destroy+0x101/0x110 [zfs]
[1214871.317542]  arc_read_done+0x243/0x480 [zfs]
[1214871.317628]  zio_done+0x43f/0x1020 [zfs]
[1214871.317727]  zio_execute+0x99/0xf0 [zfs]
[1214871.317783]  taskq_thread+0x2f7/0x4e0 [spl]
[1214871.317839]  ? wake_up_q+0x80/0x80
[1214871.317925]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1214871.317984]  kthread+0x120/0x140
[1214871.318050]  ? task_done+0xb0/0xb0 [spl]
[1214871.318104]  ? kthread_park+0x90/0x90
[1214871.318157]  ret_from_fork+0x35/0x40
[1214871.318215] INFO: task dp_sync_taskq:1157 blocked for more than 120 seconds.
[1214871.318287]       Tainted: P           O      5.4.114-1-pve #1
[1214871.318344] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214871.318417] dp_sync_taskq   D    0  1157      2 0x80004000
[1214871.318473] Call Trace:
[1214871.318525]  __schedule+0x2e6/0x700
[1214871.318577]  schedule+0x33/0xa0
[1214871.318631]  cv_wait_common+0x104/0x130 [spl]
[1214871.318685]  ? wait_woken+0x80/0x80
[1214871.318751]  __cv_wait+0x15/0x20 [spl]
[1214871.318834]  dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1214871.318892]  taskq_thread+0x2f7/0x4e0 [spl]
[1214871.318947]  ? wake_up_q+0x80/0x80
[1214871.319001]  kthread+0x120/0x140
[1214871.319066]  ? task_done+0xb0/0xb0 [spl]
[1214871.319119]  ? kthread_park+0x90/0x90
[1214871.319173]  ret_from_fork+0x35/0x40
[1214871.319228] INFO: task txg_sync:1325 blocked for more than 120 seconds.
[1214871.319286]       Tainted: P           O      5.4.114-1-pve #1
[1214871.319342] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214871.319416] txg_sync        D    0  1325      2 0x80004000
[1214871.319472] Call Trace:
[1214871.319523]  __schedule+0x2e6/0x700
[1214871.319607]  ? vdev_mirror_io_start+0xb0/0x1a0 [zfs]
[1214871.319662]  schedule+0x33/0xa0
[1214871.319715]  schedule_timeout+0x152/0x330
[1214871.319778]  ? __next_timer_interrupt+0xd0/0xd0
[1214871.319833]  io_schedule_timeout+0x1e/0x50
[1214871.319888]  __cv_timedwait_common+0x138/0x170 [spl]
[1214871.319944]  ? wait_woken+0x80/0x80
[1214871.319998]  __cv_timedwait_io+0x19/0x20 [spl]
[1214871.320086]  zio_wait+0x139/0x280 [zfs]
[1214871.320162]  arc_read+0xd68/0x1210 [zfs]
[1214871.320249]  ? arc_hdr_set_compress+0x50/0x50 [zfs]
[1214871.320332]  dsl_scan_visitbp+0x77d/0xcd0 [zfs]
[1214871.320388]  ? __cv_broadcast+0x2a/0x40 [spl]
[1214871.320470]  ? dsl_scan_prefetch+0x222/0x260 [zfs]
[1214871.320552]  dsl_scan_visit_rootbp+0xe8/0x150 [zfs]
[1214871.320648]  dsl_scan_visitds+0x1ae/0x510 [zfs]
[1214871.320790]  ? dnode_rele_and_unlock+0xb6/0xe0 [zfs]
[1214871.320874]  ? dnode_rele+0x3b/0x40 [zfs]
[1214871.320951]  ? dbuf_rele_and_unlock+0x306/0x6a0 [zfs]
[1214871.321039]  ? dsl_dataset_hold_obj+0x68c/0x9e0 [zfs]
[1214871.321130]  ? dbuf_rele+0x3b/0x40 [zfs]
[1214871.321246]  dsl_scan_sync+0x90e/0x1320 [zfs]
[1214871.321337]  spa_sync+0x610/0xfe0 [zfs]
[1214871.321392]  ? mutex_lock+0x12/0x30
[1214871.321478]  ? spa_txg_history_init_io+0x104/0x110 [zfs]
[1214871.321568]  txg_sync_thread+0x2e1/0x4a0 [zfs]
[1214871.321656]  ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[1214871.321714]  thread_generic_wrapper+0x74/0x90 [spl]
[1214871.321770]  kthread+0x120/0x140
[1214871.321825]  ? __thread_exit+0x20/0x20 [spl]
[1214871.321879]  ? kthread_park+0x90/0x90
[1214871.321933]  ret_from_fork+0x35/0x40
[1214992.151145] INFO: task z_rd_int:1114 blocked for more than 241 seconds.
[1214992.151211]       Tainted: P           O      5.4.114-1-pve #1
[1214992.151268] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214992.151342] z_rd_int        D    0  1114      2 0x80004000
[1214992.151399] Call Trace:
[1214992.151456]  __schedule+0x2e6/0x700
[1214992.151518]  schedule+0x33/0xa0
[1214992.151584]  spl_panic+0xf9/0xfb [spl]
[1214992.151687]  ? zei_add_range+0x140/0x140 [zfs]
[1214992.151743]  ? __wake_up+0x13/0x20
[1214992.151798]  ? __cv_broadcast+0x2a/0x40 [spl]
[1214992.151882]  ? zfs_zevent_post+0x223/0x280 [zfs]
[1214992.151966]  arc_buf_destroy+0x101/0x110 [zfs]
[1214992.152042]  arc_read_done+0x243/0x480 [zfs]
[1214992.152128]  zio_done+0x43f/0x1020 [zfs]
[1214992.152213]  zio_execute+0x99/0xf0 [zfs]
[1214992.152281]  taskq_thread+0x2f7/0x4e0 [spl]
[1214992.152337]  ? wake_up_q+0x80/0x80
[1214992.152423]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1214992.152482]  kthread+0x120/0x140
[1214992.152537]  ? task_done+0xb0/0xb0 [spl]
[1214992.152591]  ? kthread_park+0x90/0x90
[1214992.152645]  ret_from_fork+0x35/0x40
[1214992.152702] INFO: task dp_sync_taskq:1157 blocked for more than 241 seconds.
[1214992.152774]       Tainted: P           O      5.4.114-1-pve #1
[1214992.152831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214992.152905] dp_sync_taskq   D    0  1157      2 0x80004000
[1214992.152968] Call Trace:
[1214992.153019]  __schedule+0x2e6/0x700
[1214992.153072]  schedule+0x33/0xa0
[1214992.153126]  cv_wait_common+0x104/0x130 [spl]
[1214992.153180]  ? wait_woken+0x80/0x80
[1214992.153235]  __cv_wait+0x15/0x20 [spl]
[1214992.153315]  dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1214992.153374]  taskq_thread+0x2f7/0x4e0 [spl]
[1214992.153428]  ? wake_up_q+0x80/0x80
[1214992.153482]  kthread+0x120/0x140
[1214992.153536]  ? task_done+0xb0/0xb0 [spl]
[1214992.153589]  ? kthread_park+0x90/0x90
[1214992.153654]  ret_from_fork+0x35/0x40
[1214992.153709] INFO: task txg_sync:1325 blocked for more than 241 seconds.
[1214992.153767]       Tainted: P           O      5.4.114-1-pve #1
[1214992.153823] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214992.153897] txg_sync        D    0  1325      2 0x80004000
[1214992.153953] Call Trace:
[1214992.154016]  __schedule+0x2e6/0x700
[1214992.154099]  ? vdev_mirror_io_start+0xb0/0x1a0 [zfs]
[1214992.159404]  schedule+0x33/0xa0
[1214992.159463]  schedule_timeout+0x152/0x330
[1214992.159521]  ? __next_timer_interrupt+0xd0/0xd0
[1214992.159579]  io_schedule_timeout+0x1e/0x50
[1214992.159640]  __cv_timedwait_common+0x138/0x170 [spl]
[1214992.159699]  ? wait_woken+0x80/0x80
[1214992.159754]  __cv_timedwait_io+0x19/0x20 [spl]
[1214992.159842]  zio_wait+0x139/0x280 [zfs]
[1214992.159918]  arc_read+0xd68/0x1210 [zfs]
[1214992.159994]  ? arc_hdr_set_compress+0x50/0x50 [zfs]
[1214992.160079]  dsl_scan_visitbp+0x77d/0xcd0 [zfs]
[1214992.160136]  ? __cv_broadcast+0x2a/0x40 [spl]
[1214992.160219]  ? dsl_scan_prefetch+0x222/0x260 [zfs]
[1214992.160304]  dsl_scan_visit_rootbp+0xe8/0x150 [zfs]
[1214992.160388]  dsl_scan_visitds+0x1ae/0x510 [zfs]
[1214992.160469]  ? dnode_rele_and_unlock+0xb6/0xe0 [zfs]
[1214992.160551]  ? dnode_rele+0x3b/0x40 [zfs]
[1214992.160628]  ? dbuf_rele_and_unlock+0x306/0x6a0 [zfs]
[1214992.160711]  ? dsl_dataset_hold_obj+0x68c/0x9e0 [zfs]
[1214992.160791]  ? dbuf_rele+0x3b/0x40 [zfs]
[1214992.160874]  dsl_scan_sync+0x90e/0x1320 [zfs]
[1214992.160959]  spa_sync+0x610/0xfe0 [zfs]
[1214992.161013]  ? mutex_lock+0x12/0x30
[1214992.161098]  ? spa_txg_history_init_io+0x104/0x110 [zfs]
[1214992.161186]  txg_sync_thread+0x2e1/0x4a0 [zfs]
[1214992.161273]  ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[1214992.161332]  thread_generic_wrapper+0x74/0x90 [spl]
[1214992.161388]  kthread+0x120/0x140
[1214992.161442]  ? __thread_exit+0x20/0x20 [spl]
[1214992.161497]  ? kthread_park+0x90/0x90
[1214992.161550]  ret_from_fork+0x35/0x40
[1215112.985599] INFO: task z_rd_int:1114 blocked for more than 362 seconds.
[1215112.985661]       Tainted: P           O      5.4.114-1-pve #1
[1215112.985718] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215112.985792] z_rd_int        D    0  1114      2 0x80004000
[1215112.985849] Call Trace:
[1215112.985905]  __schedule+0x2e6/0x700
[1215112.985960]  schedule+0x33/0xa0
[1215112.986017]  spl_panic+0xf9/0xfb [spl]
[1215112.986119]  ? zei_add_range+0x140/0x140 [zfs]
[1215112.986175]  ? __wake_up+0x13/0x20
[1215112.986229]  ? __cv_broadcast+0x2a/0x40 [spl]
[1215112.986314]  ? zfs_zevent_post+0x223/0x280 [zfs]
[1215112.986395]  arc_buf_destroy+0x101/0x110 [zfs]
[1215112.986473]  arc_read_done+0x243/0x480 [zfs]
[1215112.986561]  zio_done+0x43f/0x1020 [zfs]
[1215112.986648]  zio_execute+0x99/0xf0 [zfs]
[1215112.986705]  taskq_thread+0x2f7/0x4e0 [spl]
[1215112.986761]  ? wake_up_q+0x80/0x80
[1215112.986847]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1215112.986906]  kthread+0x120/0x140
[1215112.986961]  ? task_done+0xb0/0xb0 [spl]
[1215112.987015]  ? kthread_park+0x90/0x90
[1215112.987069]  ret_from_fork+0x35/0x40
[1215112.987126] INFO: task dp_sync_taskq:1157 blocked for more than 362 seconds.
[1215112.987199]       Tainted: P           O      5.4.114-1-pve #1
[1215112.987255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215112.987329] dp_sync_taskq   D    0  1157      2 0x80004000
[1215112.987385] Call Trace:
[1215112.987436]  __schedule+0x2e6/0x700
[1215112.987490]  schedule+0x33/0xa0
[1215112.987543]  cv_wait_common+0x104/0x130 [spl]
[1215112.987598]  ? wait_woken+0x80/0x80
[1215112.987653]  __cv_wait+0x15/0x20 [spl]
[1215112.987735]  dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1215112.987794]  taskq_thread+0x2f7/0x4e0 [spl]
[1215112.987849]  ? wake_up_q+0x80/0x80
[1215112.987902]  kthread+0x120/0x140
[1215112.987957]  ? task_done+0xb0/0xb0 [spl]
[1215112.988010]  ? kthread_park+0x90/0x90
[1215112.988064]  ret_from_fork+0x35/0x40
[1215233.820082] INFO: task z_rd_int:1114 blocked for more than 483 seconds.
[1215233.820144]       Tainted: P           O      5.4.114-1-pve #1
[1215233.820201] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215233.820275] z_rd_int        D    0  1114      2 0x80004000
[1215233.820332] Call Trace:
[1215233.820389]  __schedule+0x2e6/0x700
[1215233.820443]  schedule+0x33/0xa0
[1215233.820501]  spl_panic+0xf9/0xfb [spl]
[1215233.820603]  ? zei_add_range+0x140/0x140 [zfs]
[1215233.820660]  ? __wake_up+0x13/0x20
[1215233.820714]  ? __cv_broadcast+0x2a/0x40 [spl]
[1215233.820799]  ? zfs_zevent_post+0x223/0x280 [zfs]
[1215233.820876]  arc_buf_destroy+0x101/0x110 [zfs]
[1215233.820953]  arc_read_done+0x243/0x480 [zfs]
[1215233.821041]  zio_done+0x43f/0x1020 [zfs]
[1215233.821129]  zio_execute+0x99/0xf0 [zfs]
[1215233.821185]  taskq_thread+0x2f7/0x4e0 [spl]
[1215233.821241]  ? wake_up_q+0x80/0x80
[1215233.821327]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1215233.821386]  kthread+0x120/0x140
[1215233.821441]  ? task_done+0xb0/0xb0 [spl]
[1215233.821494]  ? kthread_park+0x90/0x90
[1215233.821549]  ret_from_fork+0x35/0x40
[1215233.821606] INFO: task dp_sync_taskq:1157 blocked for more than 483 seconds.
[1215233.821679]       Tainted: P           O      5.4.114-1-pve #1
[1215233.821735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215233.821809] dp_sync_taskq   D    0  1157      2 0x80004000
[1215233.821865] Call Trace:
[1215233.821917]  __schedule+0x2e6/0x700
[1215233.821970]  schedule+0x33/0xa0
[1215233.822024]  cv_wait_common+0x104/0x130 [spl]
[1215233.822078]  ? wait_woken+0x80/0x80
[1215233.822133]  __cv_wait+0x15/0x20 [spl]
[1215233.822216]  dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1215233.822274]  taskq_thread+0x2f7/0x4e0 [spl]
[1215233.822329]  ? wake_up_q+0x80/0x80
[1215233.822383]  kthread+0x120/0x140
[1215233.822437]  ? task_done+0xb0/0xb0 [spl]
[1215233.822491]  ? kthread_park+0x90/0x90
[1215233.822545]  ret_from_fork+0x35/0x40

This is an Intel(R) Xeon(R) CPU E5-1620 v2 (4 cores, 8 threads) with 32GB of ECC RAM, running on a Supermicro X9SRE-3F motherboard.

No other errors were logged. The system was responsive thanks to a separate ext4 root unrelated to ZFS, but I was unable to unmount, mount, or touch anything ZFS-related. I had to issue an unmount/sync/reboot magic SysRq to reboot it.

Maybe the reason we're able to trigger this issue is that these are server systems doing heavy lifting on real parallel threads, as opposed to "simulated" ones under a VM?

I hope this helps. Let me know if I can do anything to help diagnose further.

Edit: I'm reading the metrics on the server and can give some more details. The issue started around 2 o'clock in the morning; the total lockup of the server arrived around 4 o'clock, two hours later. I had the following symptoms:

All other parameters are OK. This last lock-up seems unrelated to the crypto path above. It looks more like the RAM-starvation issues we used to have a few years ago.

rincebrain commented 3 years ago

A data point: I left my Xeon D-1540 (8c16t, Broadwell; Debian buster, 4.19.0-16-amd64, 2.0.3-1~bpo10+1) running a loop for 24 hours of 16 parallel zfs receives into an encrypted parent, each from an unencrypted source send (no flags) of a ~39GiB dataset I have, then destroying them, and so on. I used no VM, just in case this was indeed something that couldn't be triggered in one. No excitement ensued.
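
The loop was along these lines (a sketch with placeholder names, not the exact script):

```
# one full, unencrypted send captured once, then replayed into 16 children
# of an encrypted parent in parallel, destroyed, and repeated
zfs send tank/source@snap > /tmp/fullsend

while true; do
    for i in $(seq 1 16); do
        dd if=/tmp/fullsend bs=1M | zfs recv encpool/enc/$i &
    done
    wait
    for i in $(seq 1 16); do zfs destroy -r encpool/enc/$i; done
done
```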

Can you share the output of zfs get all [one of these source datasets] and zpool get all [source pool], and the same for the destination pool and dataset? I'd like to eliminate as many possible differences as I can.

cserem commented 3 years ago

I have randomly picked two sources with two datasets, and the destination: zfs get all output. It might be worth noting:

single thread, icp_aes_impl icp_gcm_impl on fastest:

Apr 02 08:16:26 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Apr 14 09:05:41 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Apr 17 04:33:22 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
May 01 16:02:09 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
May 06 11:46:07 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
May 12 05:49:43 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
May 12 21:07:13 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000

(Since the last two crashes were so close to each other, I've implemented a watchdog and automatic unlocking, and went full berserk, re-enabling concurrency.)

4 threads, icp_aes_impl icp_gcm_impl on fastest:

May 21 23:25:21 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 (Uptime: 39:46:02)
May 24 03:31:06 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 (Uptime: 52:05:45)

4 threads, icp_aes_impl icp_gcm_impl on generic:

May 24 14:00:40 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 (Uptime: 10:29:34)
May 26 03:29:24 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 (Uptime: 37:28:44)

4 threads, icp_aes_impl icp_gcm_impl on x86_64 / pclmulqdq:

No crash yet, uptime is 3 days, 7:47. (Not even close to the ~2 week best uptime with single thread, but with 4 threads it is the best so far.)

rincebrain commented 3 years ago

This time, I left a loop of 4 sends and rollbacks running, after creating an incremental (on the same source I used for my prior send loop):

Then zfs send -I snap1 snap2 > incsend, then a loop of:

while true; do for i in $(seq 1 4); do dd if=incsend bs=1M | zfs recv encroot/$i & done; wait; for i in $(seq 1 4); do zfs rollback encroot/$i@snap1; done; done

(boy, my server loved 4 gzip-9 streams at once)

Unfortunately, as you might guess from my not leading with a success statement, it did not reproduce after 24 hours of looping.

I'll probably explore including a zvol next, unless someone else comes up with a brighter idea.

pquan commented 3 years ago

I think I described my setup before, but I'll try to explain it in more depth:

It does not lock up easily; only about once a week or every 10 days.

Now, some meat. In the past few days, I detected some corruption on one of the source servers. ZFS could mount the dataset, but the data in it was inconsistent (missing most directories under the root). Also, I could not unmount it once mounted. I had this issue on two related datasets: the same LXC root and data (1TB of mail server data).

Weirdly enough, sending this corrupted dataset to another dataset on the same machine seemed to fix the issue for one of them. The other (the mail spool) I had to roll back to a known good state, and then I had to resync all missing mail from an off-site server before putting it back into production (this is a multisite, geo-replicated mail spool setup, so I had some live data to sync in addition).

I was able to delete the corrupted datasets. A full scrub fixed the data errors in the already deleted snapshots.

I know all of the above does not give any hints, other than anecdotal evidence of a possible problem. I'm only sure there was corruption to start with and that it was silent. It probably ended up breaking some algorithm somewhere. I'd start with a full scrub on all affected pools.
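
In practice that amounts to something like the following per pool, re-checking the error list once the scrub finishes:

```
zpool scrub rpool
zpool status -v rpool
```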

Here are some get all statistics. Please note that despite the name of the rpool (Proxmox root pool), this is not an actual root pool containing the boot files. I just use the same name for historic reasons, but the pool is mounted inside an ext4 root.

First: sending server pool:

# zpool get all rpool
NAME   PROPERTY                       VALUE                          SOURCE
rpool  size                           3.38T                          -
rpool  capacity                       49%                            -
rpool  altroot                        -                              default
rpool  health                         ONLINE                         -
rpool  guid                           10861474846250979260           -
rpool  version                        -                              default
rpool  bootfs                         -                              default
rpool  delegation                     on                             default
rpool  autoreplace                    off                            default
rpool  cachefile                      -                              default
rpool  failmode                       wait                           default
rpool  listsnapshots                  off                            default
rpool  autoexpand                     off                            default
rpool  dedupratio                     1.00x                          -
rpool  free                           1.71T                          -
rpool  allocated                      1.67T                          -
rpool  readonly                       off                            -
rpool  ashift                         12                             local
rpool  comment                        -                              default
rpool  expandsize                     -                              -
rpool  freeing                        0                              -
rpool  fragmentation                  35%                            -
rpool  leaked                         0                              -
rpool  multihost                      off                            default
rpool  checkpoint                     -                              -
rpool  load_guid                      8733562770359337761            -
rpool  autotrim                       off                            default
rpool  feature@async_destroy          enabled                        local
rpool  feature@empty_bpobj            active                         local
rpool  feature@lz4_compress           active                         local
rpool  feature@multi_vdev_crash_dump  enabled                        local
rpool  feature@spacemap_histogram     active                         local
rpool  feature@enabled_txg            active                         local
rpool  feature@hole_birth             active                         local
rpool  feature@extensible_dataset     active                         local
rpool  feature@embedded_data          active                         local
rpool  feature@bookmarks              enabled                        local
rpool  feature@filesystem_limits      enabled                        local
rpool  feature@large_blocks           enabled                        local
rpool  feature@large_dnode            active                         local
rpool  feature@sha512                 enabled                        local
rpool  feature@skein                  enabled                        local
rpool  feature@edonr                  enabled                        local
rpool  feature@userobj_accounting     active                         local
rpool  feature@encryption             active                         local
rpool  feature@project_quota          active                         local
rpool  feature@device_removal         enabled                        local
rpool  feature@obsolete_counts        enabled                        local
rpool  feature@zpool_checkpoint       enabled                        local
rpool  feature@spacemap_v2            active                         local
rpool  feature@allocation_classes     enabled                        local
rpool  feature@resilver_defer         enabled                        local
rpool  feature@bookmark_v2            enabled                        local
rpool  feature@redaction_bookmarks    disabled                       local
rpool  feature@redacted_datasets      disabled                       local
rpool  feature@bookmark_written       disabled                       local
rpool  feature@log_spacemap           disabled                       local
rpool  feature@livelist               disabled                       local
rpool  feature@device_rebuild         disabled                       local
rpool  feature@zstd_compress          disabled                       local

Sending server dataset. This is an example; all my datasets are configured the same way (lz4, xattr=sa, etc.).

# zfs get all rpool/vm/subvol-4242-disk-0
NAME                         PROPERTY              VALUE                         SOURCE
rpool/vm/subvol-4242-disk-0  type                  filesystem                    -
rpool/vm/subvol-4242-disk-0  creation              Wed Mar 10  8:45 2021         -
rpool/vm/subvol-4242-disk-0  used                  11.7G                         -
rpool/vm/subvol-4242-disk-0  available             1.60T                         -
rpool/vm/subvol-4242-disk-0  referenced            5.54G                         -
rpool/vm/subvol-4242-disk-0  compressratio         2.30x                         -
rpool/vm/subvol-4242-disk-0  mounted               yes                           -
rpool/vm/subvol-4242-disk-0  quota                 none                          default
rpool/vm/subvol-4242-disk-0  reservation           none                          default
rpool/vm/subvol-4242-disk-0  recordsize            128K                          default
rpool/vm/subvol-4242-disk-0  mountpoint            /rpool/vm/subvol-4242-disk-0  default
rpool/vm/subvol-4242-disk-0  sharenfs              off                           default
rpool/vm/subvol-4242-disk-0  checksum              on                            default
rpool/vm/subvol-4242-disk-0  compression           lz4                           inherited from rpool
rpool/vm/subvol-4242-disk-0  atime                 off                           inherited from rpool
rpool/vm/subvol-4242-disk-0  devices               on                            default
rpool/vm/subvol-4242-disk-0  exec                  on                            default
rpool/vm/subvol-4242-disk-0  setuid                on                            default
rpool/vm/subvol-4242-disk-0  readonly              off                           default
rpool/vm/subvol-4242-disk-0  zoned                 off                           default
rpool/vm/subvol-4242-disk-0  snapdir               hidden                        default
rpool/vm/subvol-4242-disk-0  aclmode               discard                       default
rpool/vm/subvol-4242-disk-0  aclinherit            restricted                    default
rpool/vm/subvol-4242-disk-0  createtxg             3675833                       -
rpool/vm/subvol-4242-disk-0  canmount              on                            default
rpool/vm/subvol-4242-disk-0  xattr                 sa                            inherited from rpool
rpool/vm/subvol-4242-disk-0  copies                1                             default
rpool/vm/subvol-4242-disk-0  version               5                             -
rpool/vm/subvol-4242-disk-0  utf8only              off                           -
rpool/vm/subvol-4242-disk-0  normalization         none                          -
rpool/vm/subvol-4242-disk-0  casesensitivity       sensitive                     -
rpool/vm/subvol-4242-disk-0  vscan                 off                           default
rpool/vm/subvol-4242-disk-0  nbmand                off                           default
rpool/vm/subvol-4242-disk-0  sharesmb              off                           default
rpool/vm/subvol-4242-disk-0  refquota              none                          default
rpool/vm/subvol-4242-disk-0  refreservation        none                          default
rpool/vm/subvol-4242-disk-0  guid                  13076753325802082524          -
rpool/vm/subvol-4242-disk-0  primarycache          all                           default
rpool/vm/subvol-4242-disk-0  secondarycache        all                           default
rpool/vm/subvol-4242-disk-0  usedbysnapshots       6.13G                         -
rpool/vm/subvol-4242-disk-0  usedbydataset         5.54G                         -
rpool/vm/subvol-4242-disk-0  usedbychildren        0B                            -
rpool/vm/subvol-4242-disk-0  usedbyrefreservation  0B                            -
rpool/vm/subvol-4242-disk-0  logbias               latency                       default
rpool/vm/subvol-4242-disk-0  objsetid              53904                         -
rpool/vm/subvol-4242-disk-0  dedup                 off                           default
rpool/vm/subvol-4242-disk-0  mlslabel              none                          default
rpool/vm/subvol-4242-disk-0  sync                  standard                      inherited from rpool/vm
rpool/vm/subvol-4242-disk-0  dnodesize             auto                          inherited from rpool
rpool/vm/subvol-4242-disk-0  refcompressratio      2.06x                         -
rpool/vm/subvol-4242-disk-0  written               8K                            -
rpool/vm/subvol-4242-disk-0  logicalused           24.6G                         -
rpool/vm/subvol-4242-disk-0  logicalreferenced     10.8G                         -
rpool/vm/subvol-4242-disk-0  volmode               default                       default
rpool/vm/subvol-4242-disk-0  filesystem_limit      none                          default
rpool/vm/subvol-4242-disk-0  snapshot_limit        none                          default
rpool/vm/subvol-4242-disk-0  filesystem_count      none                          default
rpool/vm/subvol-4242-disk-0  snapshot_count        none                          default
rpool/vm/subvol-4242-disk-0  snapdev               hidden                        default
rpool/vm/subvol-4242-disk-0  acltype               posix                         inherited from rpool
rpool/vm/subvol-4242-disk-0  context               none                          default
rpool/vm/subvol-4242-disk-0  fscontext             none                          default
rpool/vm/subvol-4242-disk-0  defcontext            none                          default
rpool/vm/subvol-4242-disk-0  rootcontext           none                          default
rpool/vm/subvol-4242-disk-0  relatime              off                           default
rpool/vm/subvol-4242-disk-0  redundant_metadata    all                           default
rpool/vm/subvol-4242-disk-0  overlay               on                            default
rpool/vm/subvol-4242-disk-0  encryption            aes-256-gcm                   -
rpool/vm/subvol-4242-disk-0  keylocation           none                          default
rpool/vm/subvol-4242-disk-0  keyformat             passphrase                    -
rpool/vm/subvol-4242-disk-0  pbkdf2iters           350000                        -
rpool/vm/subvol-4242-disk-0  encryptionroot        rpool                         -
rpool/vm/subvol-4242-disk-0  keystatus             available                     -
rpool/vm/subvol-4242-disk-0  special_small_blocks  0                             default

Here's a zpool get all from the receiver:

# zpool get all rpool
NAME   PROPERTY                       VALUE                          SOURCE
rpool  size                           3.62T                          -
rpool  capacity                       70%                            -
rpool  altroot                        -                              default
rpool  health                         ONLINE                         -
rpool  guid                           16014196203982227614           -
rpool  version                        -                              default
rpool  bootfs                         rpool/ROOT/pve-1               local
rpool  delegation                     on                             default
rpool  autoreplace                    off                            default
rpool  cachefile                      -                              default
rpool  failmode                       wait                           default
rpool  listsnapshots                  off                            default
rpool  autoexpand                     off                            default
rpool  dedupratio                     1.00x                          -
rpool  free                           1.07T                          -
rpool  allocated                      2.56T                          -
rpool  readonly                       off                            -
rpool  ashift                         12                             local
rpool  comment                        -                              default
rpool  expandsize                     -                              -
rpool  freeing                        0                              -
rpool  fragmentation                  29%                            -
rpool  leaked                         0                              -
rpool  multihost                      off                            default
rpool  checkpoint                     -                              -
rpool  load_guid                      9824040886009751730            -
rpool  autotrim                       off                            default
rpool  feature@async_destroy          enabled                        local
rpool  feature@empty_bpobj            active                         local
rpool  feature@lz4_compress           active                         local
rpool  feature@multi_vdev_crash_dump  enabled                        local
rpool  feature@spacemap_histogram     active                         local
rpool  feature@enabled_txg            active                         local
rpool  feature@hole_birth             active                         local
rpool  feature@extensible_dataset     active                         local
rpool  feature@embedded_data          active                         local
rpool  feature@bookmarks              enabled                        local
rpool  feature@filesystem_limits      enabled                        local
rpool  feature@large_blocks           enabled                        local
rpool  feature@large_dnode            active                         local
rpool  feature@sha512                 enabled                        local
rpool  feature@skein                  enabled                        local
rpool  feature@edonr                  enabled                        local
rpool  feature@userobj_accounting     active                         local
rpool  feature@encryption             active                         local
rpool  feature@project_quota          active                         local
rpool  feature@device_removal         enabled                        local
rpool  feature@obsolete_counts        enabled                        local
rpool  feature@zpool_checkpoint       enabled                        local
rpool  feature@spacemap_v2            active                         local
rpool  feature@allocation_classes     enabled                        local
rpool  feature@resilver_defer         enabled                        local
rpool  feature@bookmark_v2            enabled                        local
rpool  feature@redaction_bookmarks    disabled                       local
rpool  feature@redacted_datasets      disabled                       local
rpool  feature@bookmark_written       disabled                       local
rpool  feature@log_spacemap           disabled                       local
rpool  feature@livelist               disabled                       local
rpool  feature@device_rebuild         disabled                       local
rpool  feature@zstd_compress          disabled                       local

And a dataset on the receiver:

NAME                                       PROPERTY              VALUE                                       SOURCE
rpool/backup/vm/subvol-3140-disk-1  type                  filesystem                                  -
rpool/backup/vm/subvol-3140-disk-1  creation              Thu Apr  8 10:28 2021                       -
rpool/backup/vm/subvol-3140-disk-1  used                  20.6G                                       -
rpool/backup/vm/subvol-3140-disk-1  available             970G                                        -
rpool/backup/vm/subvol-3140-disk-1  referenced            7.66G                                       -
rpool/backup/vm/subvol-3140-disk-1  compressratio         2.13x                                       -
rpool/backup/vm/subvol-3140-disk-1  mounted               no                                          -
rpool/backup/vm/subvol-3140-disk-1  quota                 none                                        default
rpool/backup/vm/subvol-3140-disk-1  reservation           none                                        default
rpool/backup/vm/subvol-3140-disk-1  recordsize            128K                                        default
rpool/backup/vm/subvol-3140-disk-1  mountpoint            /rpool/backup/vm/subvol-3140-disk-1  default
rpool/backup/vm/subvol-3140-disk-1  sharenfs              off                                         default
rpool/backup/vm/subvol-3140-disk-1  checksum              on                                          default
rpool/backup/vm/subvol-3140-disk-1  compression           lz4                                         inherited from rpool
rpool/backup/vm/subvol-3140-disk-1  atime                 off                                         inherited from rpool
rpool/backup/vm/subvol-3140-disk-1  devices               on                                          default
rpool/backup/vm/subvol-3140-disk-1  exec                  on                                          default
rpool/backup/vm/subvol-3140-disk-1  setuid                on                                          default
rpool/backup/vm/subvol-3140-disk-1  readonly              off                                         default
rpool/backup/vm/subvol-3140-disk-1  zoned                 off                                         default
rpool/backup/vm/subvol-3140-disk-1  snapdir               hidden                                      default
rpool/backup/vm/subvol-3140-disk-1  aclmode               discard                                     default
rpool/backup/vm/subvol-3140-disk-1  aclinherit            restricted                                  default
rpool/backup/vm/subvol-3140-disk-1  createtxg             10724                                       -
rpool/backup/vm/subvol-3140-disk-1  canmount              on                                          default
rpool/backup/vm/subvol-3140-disk-1  xattr                 on                                          default
rpool/backup/vm/subvol-3140-disk-1  copies                1                                           default
rpool/backup/vm/subvol-3140-disk-1  version               5                                           -
rpool/backup/vm/subvol-3140-disk-1  utf8only              off                                         -
rpool/backup/vm/subvol-3140-disk-1  normalization         none                                        -
rpool/backup/vm/subvol-3140-disk-1  casesensitivity       sensitive                                   -
rpool/backup/vm/subvol-3140-disk-1  vscan                 off                                         default
rpool/backup/vm/subvol-3140-disk-1  nbmand                off                                         default
rpool/backup/vm/subvol-3140-disk-1  sharesmb              off                                         default
rpool/backup/vm/subvol-3140-disk-1  refquota              none                                        default
rpool/backup/vm/subvol-3140-disk-1  refreservation        none                                        default
rpool/backup/vm/subvol-3140-disk-1  guid                  5702550096622166170                         -
rpool/backup/vm/subvol-3140-disk-1  primarycache          all                                         default
rpool/backup/vm/subvol-3140-disk-1  secondarycache        all                                         default
rpool/backup/vm/subvol-3140-disk-1  usedbysnapshots       12.9G                                       -
rpool/backup/vm/subvol-3140-disk-1  usedbydataset         7.66G                                       -
rpool/backup/vm/subvol-3140-disk-1  usedbychildren        0B                                          -
rpool/backup/vm/subvol-3140-disk-1  usedbyrefreservation  0B                                          -
rpool/backup/vm/subvol-3140-disk-1  logbias               latency                                     default
rpool/backup/vm/subvol-3140-disk-1  objsetid              185                                         -
rpool/backup/vm/subvol-3140-disk-1  dedup                 off                                         default
rpool/backup/vm/subvol-3140-disk-1  mlslabel              none                                        default
rpool/backup/vm/subvol-3140-disk-1  sync                  standard                                    inherited from rpool
rpool/backup/vm/subvol-3140-disk-1  dnodesize             legacy                                      default
rpool/backup/vm/subvol-3140-disk-1  refcompressratio      1.19x                                       -
rpool/backup/vm/subvol-3140-disk-1  written               0                                           -
rpool/backup/vm/subvol-3140-disk-1  logicalused           43.2G                                       -
rpool/backup/vm/subvol-3140-disk-1  logicalreferenced     8.96G                                       -
rpool/backup/vm/subvol-3140-disk-1  volmode               default                                     default
rpool/backup/vm/subvol-3140-disk-1  filesystem_limit      none                                        default
rpool/backup/vm/subvol-3140-disk-1  snapshot_limit        none                                        default
rpool/backup/vm/subvol-3140-disk-1  filesystem_count      none                                        default
rpool/backup/vm/subvol-3140-disk-1  snapshot_count        none                                        default
rpool/backup/vm/subvol-3140-disk-1  snapdev               hidden                                      default
rpool/backup/vm/subvol-3140-disk-1  acltype               off                                         default
rpool/backup/vm/subvol-3140-disk-1  context               none                                        default
rpool/backup/vm/subvol-3140-disk-1  fscontext             none                                        default
rpool/backup/vm/subvol-3140-disk-1  defcontext            none                                        default
rpool/backup/vm/subvol-3140-disk-1  rootcontext           none                                        default
rpool/backup/vm/subvol-3140-disk-1  relatime              off                                         default
rpool/backup/vm/subvol-3140-disk-1  redundant_metadata    all                                         default
rpool/backup/vm/subvol-3140-disk-1  overlay               on                                          default
rpool/backup/vm/subvol-3140-disk-1  encryption            aes-256-gcm                                 -
rpool/backup/vm/subvol-3140-disk-1  keylocation           none                                        default
rpool/backup/vm/subvol-3140-disk-1  keyformat             passphrase                                  -
rpool/backup/vm/subvol-3140-disk-1  pbkdf2iters           350000                                      -
rpool/backup/vm/subvol-3140-disk-1  encryptionroot        rpool/backup                          -
rpool/backup/vm/subvol-3140-disk-1  keystatus             available                                   -
rpool/backup/vm/subvol-3140-disk-1  special_small_blocks  0                                           default

And a last thing I almost forgot: on some of the systems I'm seeing misaligned-write errors. In the past these seemed related to hardware problems such as bad SATA cables and/or controller issues, but I'm now getting them on servers with onboard SATA ports and very short cables. Again, all of this started after the upgrade to 0.8.x, and I see it most prominently on v2.0.4. It happens on both SSDs and spinning SATA disks. I'll reduce the link speed on the spinning disks to 1.5 Gbit/s as a check, but doing that on the SSDs would not be feasible.
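
For reference, one way to cap the SATA link speed is the libata.force kernel boot parameter; this is just a sketch (it assumes a libata-driven controller, and the port number is hypothetical and would have to be looked up in dmesg first):

# limit every libata port to 1.5 Gbit/s (added to the kernel command line, e.g. via GRUB)
libata.force=1.5Gbps
# or limit only a single port, e.g. ata3
libata.force=3:1.5Gbps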

Sorry to drop all this meat on the table; I hope I'm not causing confusion. I haven't been able to isolate the issue any better than this.

cserem commented 3 years ago

Update: 4 threads, icp_aes_impl / icp_gcm_impl set to x86_64 / pclmulqdq; crashed after 5 days, 14:47. Trying again with the above parameters.
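
For anyone repeating this, a sketch of how the ICP implementations can be switched at runtime, assuming the parameters are exposed under /sys/module/icp/parameters (which is where they live on my systems):

# show the available implementations; the active one is printed in brackets
cat /sys/module/icp/parameters/icp_aes_impl
cat /sys/module/icp/parameters/icp_gcm_impl
# pin specific implementations for a test run
echo x86_64 > /sys/module/icp/parameters/icp_aes_impl
echo pclmulqdq > /sys/module/icp/parameters/icp_gcm_impl
# revert to the default auto-selection
echo fastest > /sys/module/icp/parameters/icp_aes_impl
echo fastest > /sys/module/icp/parameters/icp_gcm_impl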

I was able to capture the zfs-related processes:

root     20454  0.0  0.0   2384   700 ?        S    18:25   0:00 sh -c ssh  -o ServerAliveCountMax=3 -o ServerAliveInterval=30   -S /tmp/syncoid-root@source1-1622478259 root@source1 ' zfs send  -I '"'"'rpool/rpool-ssd-backup/subvol-120-disk-2'"'"'@'"'"'2021-05-31-180000'"'"' '"'"'rpool/rpool-ssd-backup/subvol-120-disk-2'"'"'@'"'"'2021-05-31-181500'"'"' | pigz -3 | mbuffer  -q -s 128k -m 16M 2>/dev/null' | mbuffer  -q -s 128k -m 16M 2>/dev/null | pigz -dc | pv -p -t -e -r -b -s 52098832 |  zfs receive  -s -F 'rpool/backups/SOURCE1/rpool/rpool-ssd-backup/subvol-120-disk-2' 2>&1
root     20459  0.0  0.0  15708  4172 ?        S    18:25   0:00 zfs receive -s -F rpool/backups/SOURCE1/rpool/rpool-ssd-backup/subvol-120-disk-2
root     28671  0.0  0.0   2384   764 ?        S    18:25   0:00 sh -c ssh  -o ServerAliveCountMax=3 -o ServerAliveInterval=30   -S /tmp/syncoid-root@SOURCE2-1622478239 root@SOURCE2 ' zfs send  -I '"'"'rpool/data/subvol-117-disk-0'"'"'@'"'"'syncoid_pve_2021-05-31:18:12:10-GMT02:00'"'"' '"'"'rpool/data/subvol-117-disk-0'"'"'@'"'"'syncoid_pve_2021-05-31:18:25:51-GMT02:00'"'"' | pigz -3 | mbuffer  -q -s 128k -m 16M 2>/dev/null' | mbuffer  -q -s 128k -m 16M 2>/dev/null | pigz -dc | pv -p -t -e -r -b -s 495575408 |  zfs receive  -s -F 'rpool/backups/SOURCE2/rpool/data/subvol-117-disk-0' 2>&1
root     28680  0.3  0.0  15708  4424 ?        D    18:25   0:00 zfs receive -s -F rpool/backups/SOURCE2/rpool/data/subvol-117-disk-0
root     31831  0.0  0.0   2384   760 ?        S    18:26   0:00 sh -c zfs rollback -R 'rpool/backups/SOURCE3/rpool-ssd/data/subvol-104-disk-1'@'2021-05-31-180000'
root     31832  0.0  0.0  15704  4144 ?        D    18:26   0:00 zfs rollback -R rpool/backups/SOURCE3/rpool-ssd/data/subvol-104-disk-1@2021-05-31-180000
root     31863  0.0  0.0   2384   764 ?        S    18:26   0:00 sh -c zfs destroy 'rpool/backups/SOURCE4/rpool/data/subvol-106-disk-2'@syncoid_pve_2021-05-31:18:12:03-GMT02:00
root     31865  0.0  0.0  15708  4012 ?        S    18:26   0:00 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-31:18:12:03-GMT02:00

Since the destroys (those are some of the receives where I deliberately make a snapshot for syncing, and they are cleaned up after the transfer) have the highest PIDs, perhaps the crash is related to that? Then again, the rollback is in 'D' state... I will hopefully have more ps outputs in the future.

cserem commented 3 years ago

4 threads, icp_aes_impl / icp_gcm_impl set to x86_64 / pclmulqdq; crashed after 1 day, 6:44. I am switching back to the default (fastest).

Unfortunately I couldn't capture a process list this time.

Here are some events from previous crashes up to the point the system is rebooted and the pool is imported:

zpool history:

2021-05-24.03:30:46 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1@syncoid_pve_2021-05-24:03:15:33-GMT02:00
2021-05-24.03:30:51 zfs rollback -R rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-24:03:15:50-GMT02:00
2021-05-24.03:30:52 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-24:03:15:46-GMT02:00
2021-05-24.03:30:55 zfs receive -s -F rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2
2021-05-24.03:30:57 zfs destroy rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-24:03:15:50-GMT02:00
2021-05-24.03:31:00 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1
2021-05-24.03:31:02 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-24:03:15:46-GMT02:00
2021-05-24.03:31:03 zfs receive -s -F rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-2
2021-05-24.03:31:04 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/vm-102-disk-5
2021-05-24.03:31:04 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/ROOT/pve-1@2021-05-24-030000
>>>>May 24 03:31:06 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-24.07:11:22 zpool import -N rpool --
2021-05-24.14:00:33 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/vm-101-disk-0
2021-05-24.14:00:34 zfs receive -F rpool/backups/SOURCE5/rpool/data/vm-100-disk-2
2021-05-24.14:00:34 zfs receive -s -F rpool/backups/SOURCE4/rpool/ROOT/pve-1
2021-05-24.14:00:35 zfs destroy rpool/backups/SOURCE5/rpool/data/vm-100-disk-2@syncoid_pve_2021-05-24:13:51:04-GMT02:00
2021-05-24.14:00:35 zfs rollback -R rpool/backups/SOURCE7/rpool-hdd/data/vm-102-disk-0@2021-05-24-134500
2021-05-24.14:00:36 zfs receive -s -F rpool/backups/SOURCE6/rpool/data/vm-101-disk-0
2021-05-24.14:00:37 zfs destroy rpool/backups/SOURCE6/rpool/data/vm-101-disk-0@syncoid_pve_2021-05-24:13:51:14-GMT02:00
2021-05-24.14:00:37 zfs destroy rpool/backups/SOURCE4/rpool/ROOT/pve-1@syncoid_pve_2021-05-24:13:50:54-GMT02:00
2021-05-24.14:00:38 zfs rollback -R rpool/backups/SOURCE6/rpool/data/vm-102-disk-0@syncoid_pve_2021-05-24:13:51:24-GMT02:00
2021-05-24.14:00:39 zfs rollback -R rpool/backups/SOURCE3/rpool/ROOT/pve-1@2021-05-24-134500
>>>>May 24 14:00:40 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-24.16:03:19 zpool import -N rpool --
2021-05-26.03:29:07 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-101-disk-2
2021-05-26.03:29:10 zfs rollback -R rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-1@2021-05-26-030000
2021-05-26.03:29:10 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-101-disk-2@syncoid_pve_2021-05-26:03:14:09-GMT02:00
2021-05-26.03:29:11 zfs receive -s -F rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-1
2021-05-26.03:29:13 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-102-disk-1@syncoid_pve_2021-05-26:03:14:20-GMT02:00
2021-05-26.03:29:14 zfs rollback -R rpool/backups/SOURCE8/rpool/ROOT/pve-1@syncoid_pve_2021-05-26:03:14:38-GMT02:00
2021-05-26.03:29:16 zfs rollback -R rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-2@2021-05-26-030000
2021-05-26.03:29:20 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-102-disk-1
2021-05-26.03:29:22 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-102-disk-1@syncoid_pve_2021-05-26:03:14:20-GMT02:00
>>>>May 26 03:29:24 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-26.03:29:26 zfs receive -s -F rpool/backups/SOURCE8/rpool/ROOT/pve-1
2021-05-26.03:40:17 zpool import -N rpool --
2021-05-31.18:25:37 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1@syncoid_pve_2021-05-31:18:11:48-GMT02:00
2021-05-31.18:25:38 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-1@2021-05-31-180000
2021-05-31.18:25:42 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1
2021-05-31.18:25:42 zfs receive -s -F rpool/backups/SOURCE8/rpool/data/subvol-106-disk-0
2021-05-31.18:25:43 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-1
2021-05-31.18:25:43 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1@syncoid_pve_2021-05-31:18:11:48-GMT02:00
2021-05-31.18:25:44 zfs destroy rpool/backups/SOURCE8/rpool/data/subvol-106-disk-0@syncoid_pve_2021-05-31:18:11:39-GMT02:00
>>>>May 31 18:25:44 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-31.18:25:45 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-2@2021-05-31-180000
2021-05-31.18:25:47 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-31:18:12:00-GMT02:00
2021-05-31.18:25:47 zfs rollback -R rpool/backups/SOURCE8/rpool/data/subvol-106-disk-1@syncoid_pve_2021-05-31:18:11:51-GMT02:00
2021-05-31.18:25:48 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-2
2021-05-31.18:25:49 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-0@2021-05-31-180000
2021-05-31.18:25:49 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1
2021-05-31.18:25:50 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-0
2021-05-31.18:25:51 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-31:18:12:00-GMT02:00
2021-05-31.18:25:51 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-1@2021-05-31-180000
2021-05-31.18:25:52 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-1
2021-05-31.18:25:54 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-117-disk-0@syncoid_pve_2021-05-31:18:12:10-GMT02:00
2021-05-31.18:25:54 zfs receive -s -F rpool/backups/SOURCE8/rpool/data/subvol-106-disk-1
2021-05-31.18:25:55 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-2@2021-05-31-180000
2021-05-31.18:26:04 zfs destroy rpool/backups/SOURCE8/rpool/data/subvol-106-disk-1@syncoid_pve_2021-05-31:18:11:51-GMT02:00
2021-05-31.18:26:05 zfs rollback -R rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-31:18:12:03-GMT02:00
2021-05-31.18:26:09 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-2
2021-05-31.18:26:11 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-104-disk-0@2021-05-31-180000
2021-05-31.18:26:12 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-104-disk-0
2021-05-31.18:37:02 zpool import -N rpool --
2021-06-02.01:07:35 zfs destroy rpool/backups/SOURCE4/rpool/ROOT@syncoid_pve_2021-06-02:00:39:31-GMT02:00
2021-06-02.01:07:37 zfs receive -F rpool/backups/SOURCE5/rpool/data/subvol-101-disk-1
2021-06-02.01:07:37 zfs destroy rpool/backups/SOURCE5/rpool/data/subvol-101-disk-1@syncoid_pve_2021-06-02:00:39:37-GMT02:00
2021-06-02.01:07:39 zfs rollback -R rpool/backups/SOURCE4/rpool/ROOT/pve-1@syncoid_pve_2021-06-02:00:39:44-GMT02:00
2021-06-02.01:07:41 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/subvol-103-disk-0
2021-06-02.01:07:42 zfs rollback -R rpool/backups/SOURCE7/rpool-hdd/data/vm-101-disk-0@2021-06-02-003000
2021-06-02.01:07:45 zfs rollback -R rpool/backups/SOURCE6/rpool/data/vm-100-disk-0@syncoid_pve_2021-06-02:00:40:18-GMT02:00
2021-06-02.01:07:45 zfs rollback -R rpool/backups/SOURCE5/rpool/data/vm-100-disk-1@syncoid_pve_2021-06-02:00:39:46-GMT02:00
>>>>Jun 02 01:07:46 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-06-02.01:07:48 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/vm-101-disk-0
2021-06-02.01:07:49 zfs rollback -R rpool/backups/SOURCE7/rpool-hdd/data/vm-102-disk-0@2021-06-02-003000
2021-06-02.01:21:17 zpool import -N rpool

Are there any commands I could run after the exception to gather more useful data?

pquan commented 3 years ago

It seems that "reflowing" some of the problematic datasets on different mount point in the same pool has had some beneficial effects in my case. I mean just moving datasets with syncoid from rpool/mount1/dataset[x] -> rpool/mount2/dataset[x] and deleting some older snapshots and scrubbing the pool. I'm moving said datasets back to their original position now to see if the fix is permanent. This is all empiric work, but it would support the theory of some type of corruption occurring undetected. It does not seem to affect data tough, just metadata (I hope).

BTW: zfs send/receive is painfully slow. A 2x2 mirror of spinning disks often manages only 20-30 MB/s on average, with lots of time spent in the 5-10 MB/s range.

rincebrain commented 3 years ago

You might benefit, if you're not using it, from zfs send -c, since you're using gzip-9, and I would find it difficult to overstate how slow it is.

edit: I'm terribly sorry, I was thinking of someone else in the thread who is using gzip-9. My suggestion remains as stated, but assuming its properties work for you, you might want to explore using -w to avoid paying the de+re-encryption overhead.

Because, for example, I get much faster throughput than that on my single-disk pools sans encryption.

You might also want to use e.g. perf top and see if it is indeed spending a whole bunch of CPU cycles in the encryption code, or somewhere else.
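
To make that concrete, a rough sketch (the dataset and snapshot names are made up; pv shows throughput and keeps send's stdout from being /dev/null directly):

# plain send: every block is decrypted/decompressed before it hits the stream
zfs send -i pool/ds@old pool/ds@new | pv > /dev/null
# compressed send (-c): blocks leave the pool as stored, skipping the recompression step
zfs send -c -i pool/ds@old pool/ds@new | pv > /dev/null
# raw send (-w): encrypted blocks are sent exactly as on disk, no decrypt/re-encrypt
zfs send -w -i pool/ds@old pool/ds@new | pv > /dev/null
# in another terminal, check where the CPU time actually goes
perf top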

pquan commented 3 years ago

edit: I'm terribly sorry, I was thinking of someone else in the thread who is using gzip-9. My suggestion remains as stated, but assuming its properties work for you, you might want to explore using -w to avoid paying the de+re-encryption overhead.

Because, for example, I get much faster throughput than that on my single-disk pools sans encryption.

You might also want to use e.g. perf top and see if it is indeed spending a whole bunch of CPU cycles in the encryption code, or somewhere else.

I'm spending ~50% of CPU time in encryption/decryption and lz4 compression/decompression. However, running with -w increases the speed only marginally, so I guess I'm I/O bound.

pquan commented 3 years ago

While doing the above test, I managed to lock up my server, transferring the single dataset from one place to another. Below is the stack trace. I was running a zfs send/receive with -w, so no encryption/decryption was taking place. There was some snapshotting, listing, and deleting mixed in during the transfer, for sure. Other than that, just 90-100% I/O usage for a few hours. I should mention that the dataset is around 800 GB in size. It contains tens of millions of small files (maildir storage) and about 100 snapshots.

Am I pushing ZFS too hard?

[822539.204672] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.3.V1 02/04/2020
[822539.204756] Call Trace:
[822539.204813]  dump_stack+0x6d/0x8b
[822539.204874]  spl_dumpstack+0x29/0x2b [spl]
[822539.204931]  spl_panic+0xd3/0xfb [spl]
[822539.205027]  ? zfs_btree_insert_leaf_impl.isra.9+0x4c/0x60 [zfs]
[822539.205121]  ? abd_iter_advance+0x2d/0x60 [zfs]
[822539.205200]  ? zfs_btree_find_parent_idx+0x80/0xd0 [zfs]
[822539.205291]  ? zfs_btree_find+0x160/0x2d0 [zfs]
[822539.205381]  range_tree_remove_impl+0x67e/0x1000 [zfs]
[822539.205469]  ? metaslab_df_alloc+0xc6/0x5e0 [zfs]
[822539.205527]  ? ktime_get_raw_ts64+0x38/0xd0
[822539.205614]  ? multilist_sublist_unlock+0x2b/0x40 [zfs]
[822539.205701]  range_tree_remove+0x10/0x20 [zfs]
[822539.205787]  metaslab_alloc_dva+0x264/0x1170 [zfs]
[822539.205873]  metaslab_alloc+0xb4/0x240 [zfs]
[822539.205963]  zio_dva_allocate+0xd5/0x820 [zfs]
[822539.206019]  ? _cond_resched+0x19/0x30
[822539.206074]  ? mutex_lock+0x12/0x30
[822539.206169]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822539.206230]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822539.206326]  zio_execute+0x99/0xf0 [zfs]
[822539.206382]  taskq_thread+0x2f7/0x4e0 [spl]
[822539.206438]  ? wake_up_q+0x80/0x80
[822539.206523]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822539.206584]  kthread+0x120/0x140
[822539.206640]  ? task_done+0xb0/0xb0 [spl]
[822539.206694]  ? kthread_park+0x90/0x90
[822539.206747]  ret_from_fork+0x35/0x40
[822765.000993] INFO: task z_wr_iss:761 blocked for more than 120 seconds.
[822765.001055]       Tainted: P           O      5.4.114-1-pve #1
[822765.001111] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.001185] z_wr_iss        D    0   761      2 0x80004000
[822765.001242] Call Trace:
[822765.001299]  __schedule+0x2e6/0x700
[822765.001352]  schedule+0x33/0xa0
[822765.001404]  schedule_preempt_disabled+0xe/0x10
[822765.001459]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.001513]  __mutex_lock_slowpath+0x13/0x20
[822765.001567]  mutex_lock+0x2c/0x30
[822765.001672]  metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.001759]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.001847]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.001934]  ? zio_crypt_do_indirect_mac_checksum_abd+0x5d/0x70 [zfs]
[822765.001993]  ? _cond_resched+0x19/0x30
[822765.002046]  ? mutex_lock+0x12/0x30
[822765.002129]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.002190]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.002279]  zio_execute+0x99/0xf0 [zfs]
[822765.002335]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.002391]  ? wake_up_q+0x80/0x80
[822765.002477]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.002536]  kthread+0x120/0x140
[822765.002590]  ? task_done+0xb0/0xb0 [spl]
[822765.002644]  ? kthread_park+0x90/0x90
[822765.002697]  ret_from_fork+0x35/0x40
[822765.002751] INFO: task z_wr_iss:762 blocked for more than 120 seconds.
[822765.002809]       Tainted: P           O      5.4.114-1-pve #1
[822765.002865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.002938] z_wr_iss        D    0   762      2 0x80004000
[822765.002995] Call Trace:
[822765.003046]  __schedule+0x2e6/0x700
[822765.003099]  schedule+0x33/0xa0
[822765.003151]  schedule_preempt_disabled+0xe/0x10
[822765.003205]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.003262]  __mutex_lock_slowpath+0x13/0x20
[822765.003317]  mutex_lock+0x2c/0x30
[822765.003400]  metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.003485]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.003573]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.003628]  ? _cond_resched+0x19/0x30
[822765.003681]  ? mutex_lock+0x12/0x30
[822765.003763]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.003822]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.003911]  zio_execute+0x99/0xf0 [zfs]
[822765.003966]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.004021]  ? wake_up_q+0x80/0x80
[822765.004107]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.004165]  kthread+0x120/0x140
[822765.004219]  ? task_done+0xb0/0xb0 [spl]
[822765.004273]  ? kthread_park+0x90/0x90
[822765.004326]  ret_from_fork+0x35/0x40
[822765.004380] INFO: task z_wr_iss:763 blocked for more than 120 seconds.
[822765.004437]       Tainted: P           O      5.4.114-1-pve #1
[822765.004493] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.004568] z_wr_iss        D    0   763      2 0x80004000
[822765.004624] Call Trace:
[822765.004676]  __schedule+0x2e6/0x700
[822765.004728]  schedule+0x33/0xa0
[822765.004781]  schedule_preempt_disabled+0xe/0x10
[822765.004836]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.004890]  __mutex_lock_slowpath+0x13/0x20
[822765.004944]  mutex_lock+0x2c/0x30
[822765.005103]  metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.005206]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.005305]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.005406]  ? zio_crypt_do_indirect_mac_checksum_abd+0x5d/0x70 [zfs]
[822765.005468]  ? _cond_resched+0x19/0x30
[822765.005525]  ? mutex_lock+0x12/0x30
[822765.005623]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.005688]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.005791]  zio_execute+0x99/0xf0 [zfs]
[822765.005852]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.005910]  ? wake_up_q+0x80/0x80
[822765.006010]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.006072]  kthread+0x120/0x140
[822765.006132]  ? task_done+0xb0/0xb0 [spl]
[822765.006188]  ? kthread_park+0x90/0x90
[822765.006245]  ret_from_fork+0x35/0x40
[822765.006302] INFO: task z_wr_iss:764 blocked for more than 120 seconds.
[822765.006363]       Tainted: P           O      5.4.114-1-pve #1
[822765.006422] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.006498] z_wr_iss        D    0   764      2 0x80004000
[822765.006558] Call Trace:
[822765.006613]  __schedule+0x2e6/0x700
[822765.006670]  schedule+0x33/0xa0
[822765.006726]  schedule_preempt_disabled+0xe/0x10
[822765.006784]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.006843]  __mutex_lock_slowpath+0x13/0x20
[822765.006900]  mutex_lock+0x2c/0x30
[822765.006996]  metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.007097]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.007201]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.007259]  ? _cond_resched+0x19/0x30
[822765.007316]  ? mutex_lock+0x12/0x30
[822765.007412]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.007475]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.007579]  zio_execute+0x99/0xf0 [zfs]
[822765.012866]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.012929]  ? wake_up_q+0x80/0x80
[822765.013056]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.013118]  kthread+0x120/0x140
[822765.013177]  ? task_done+0xb0/0xb0 [spl]
[822765.013234]  ? kthread_park+0x90/0x90
[822765.013292]  ret_from_fork+0x35/0x40
[822765.013346] INFO: task z_wr_iss:765 blocked for more than 120 seconds.
[822765.013404]       Tainted: P           O      5.4.114-1-pve #1
[822765.013460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.013534] z_wr_iss        D    0   765      2 0x80004000
[822765.013590] Call Trace:
[822765.013642]  __schedule+0x2e6/0x700
[822765.013695]  schedule+0x33/0xa0
[822765.013747]  schedule_preempt_disabled+0xe/0x10
[822765.013802]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.013856]  __mutex_lock_slowpath+0x13/0x20
[822765.013910]  mutex_lock+0x2c/0x30
[822765.013994]  metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.014081]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.014169]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.014224]  ? _cond_resched+0x19/0x30
[822765.014277]  ? mutex_lock+0x12/0x30
[822765.014360]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.014419]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.014508]  zio_execute+0x99/0xf0 [zfs]
[822765.014563]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.014618]  ? wake_up_q+0x80/0x80
[822765.014704]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.014763]  kthread+0x120/0x140
[822765.014817]  ? task_done+0xb0/0xb0 [spl]
[822765.014871]  ? kthread_park+0x90/0x90
[822765.014924]  ret_from_fork+0x35/0x40
[822765.014978] INFO: task z_wr_iss:766 blocked for more than 120 seconds.
[822765.015036]       Tainted: P           O      5.4.114-1-pve #1
[822765.015092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.015165] z_wr_iss        D    0   766      2 0x80004000
[822765.015222] Call Trace:
[822765.015273]  __schedule+0x2e6/0x700
[822765.015326]  schedule+0x33/0xa0
[822765.015380]  spl_panic+0xf9/0xfb [spl]
[822765.015467]  ? abd_iter_advance+0x2d/0x60 [zfs]
[822765.015544]  ? zfs_btree_find_parent_idx+0x80/0xd0 [zfs]
[822765.015624]  ? zfs_btree_find+0x160/0x2d0 [zfs]
[822765.015709]  range_tree_remove_impl+0x67e/0x1000 [zfs]
[822765.015794]  ? metaslab_df_alloc+0xc6/0x5e0 [zfs]
[822765.015850]  ? ktime_get_raw_ts64+0x38/0xd0
[822765.015934]  ? multilist_sublist_unlock+0x2b/0x40 [zfs]
[822765.016020]  range_tree_remove+0x10/0x20 [zfs]
[822765.016104]  metaslab_alloc_dva+0x264/0x1170 [zfs]
[822765.016201]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.016289]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.016345]  ? _cond_resched+0x19/0x30
[822765.016398]  ? mutex_lock+0x12/0x30
[822765.016485]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.016548]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.016638]  zio_execute+0x99/0xf0 [zfs]
[822765.016694]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.016749]  ? wake_up_q+0x80/0x80
[822765.016835]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.016893]  kthread+0x120/0x140
[822765.016948]  ? task_done+0xb0/0xb0 [spl]
[822765.017037]  ? kthread_park+0x90/0x90
[822765.017096]  ret_from_fork+0x35/0x40
[822765.017156] INFO: task z_wr_int:769 blocked for more than 120 seconds.
[822765.017216]       Tainted: P           O      5.4.114-1-pve #1
[822765.017282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.017359] z_wr_int        D    0   769      2 0x80004000
[822765.017420] Call Trace:
[822765.017476]  __schedule+0x2e6/0x700
[822765.017533]  schedule+0x33/0xa0
[822765.017589]  schedule_preempt_disabled+0xe/0x10
[822765.017647]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.017706]  __mutex_lock_slowpath+0x13/0x20
[822765.017763]  mutex_lock+0x2c/0x30
[822765.017869]  metaslab_free_concrete+0xb3/0x240 [zfs]
[822765.017970]  metaslab_free_impl+0xad/0xe0 [zfs]
[822765.018070]  metaslab_free_dva+0x5e/0x80 [zfs]
[822765.018170]  metaslab_free+0x196/0x1c0 [zfs]
[822765.018270]  zio_free_sync+0xe4/0x100 [zfs]
[822765.018374]  zio_free+0xb9/0xf0 [zfs]
[822765.018472]  dsl_free+0x11/0x20 [zfs]
[822765.018572]  dsl_dataset_block_kill+0x2a9/0x4f0 [zfs]
[822765.018665]  dbuf_write_done+0x1ff/0x220 [zfs]
[822765.018752]  arc_write_done+0x8f/0x410 [zfs]
[822765.018855]  zio_done+0x43f/0x1020 [zfs]
[822765.018958]  zio_execute+0x99/0xf0 [zfs]
[822765.019019]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.019078]  ? wake_up_q+0x80/0x80
[822765.019178]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.019240]  kthread+0x120/0x140
[822765.019299]  ? task_done+0xb0/0xb0 [spl]
[822765.019356]  ? kthread_park+0x90/0x90
[822765.019412]  ret_from_fork+0x35/0x40
[822765.019470] INFO: task z_wr_int:772 blocked for more than 120 seconds.
[822765.019530]       Tainted: P           O      5.4.114-1-pve #1
[822765.019589] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.019666] z_wr_int        D    0   772      2 0x80004000
[822765.019725] Call Trace:
[822765.019781]  __schedule+0x2e6/0x700
[822765.019838]  schedule+0x33/0xa0
[822765.019893]  schedule_preempt_disabled+0xe/0x10
[822765.019952]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.020011]  __mutex_lock_slowpath+0x13/0x20
[822765.020068]  mutex_lock+0x2c/0x30
[822765.020163]  metaslab_free_concrete+0xb3/0x240 [zfs]
[822765.020264]  metaslab_free_impl+0xad/0xe0 [zfs]
[822765.020365]  metaslab_free_dva+0x5e/0x80 [zfs]
[822765.020464]  metaslab_free+0x196/0x1c0 [zfs]
[822765.020567]  zio_free_sync+0xe4/0x100 [zfs]
[822765.020671]  zio_free+0xb9/0xf0 [zfs]
[822765.020768]  dsl_free+0x11/0x20 [zfs]
[822765.020863]  dsl_dataset_block_kill+0x2a9/0x4f0 [zfs]
[822765.020956]  dbuf_write_done+0x1ff/0x220 [zfs]
[822765.021076]  arc_write_done+0x8f/0x410 [zfs]
[822765.021180]  zio_done+0x43f/0x1020 [zfs]
[822765.021283]  zio_execute+0x99/0xf0 [zfs]
[822765.021344]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.021403]  ? wake_up_q+0x80/0x80
[822765.021504]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.021565]  kthread+0x120/0x140
[822765.021624]  ? task_done+0xb0/0xb0 [spl]
[822765.021681]  ? kthread_park+0x90/0x90
[822765.021738]  ret_from_fork+0x35/0x40
[822765.021797] INFO: task z_wr_int:775 blocked for more than 120 seconds.
[822765.021858]       Tainted: P           O      5.4.114-1-pve #1
[822765.021917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.021993] z_wr_int        D    0   775      2 0x80004000
[822765.022053] Call Trace:
[822765.022108]  __schedule+0x2e6/0x700
[822765.022164]  schedule+0x33/0xa0
[822765.022220]  schedule_preempt_disabled+0xe/0x10
[822765.022278]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.022337]  __mutex_lock_slowpath+0x13/0x20
[822765.022394]  mutex_lock+0x2c/0x30
[822765.022492]  metaslab_free_concrete+0xb3/0x240 [zfs]
[822765.022590]  metaslab_free_impl+0xad/0xe0 [zfs]
[822765.022687]  metaslab_free_dva+0x5e/0x80 [zfs]
[822765.022787]  metaslab_free+0x196/0x1c0 [zfs]
[822765.022890]  zio_free_sync+0xe4/0x100 [zfs]
[822765.022993]  zio_free+0xb9/0xf0 [zfs]
[822765.023090]  dsl_free+0x11/0x20 [zfs]
[822765.023182]  dsl_dataset_block_kill+0x2a9/0x4f0 [zfs]
[822765.023276]  dbuf_write_done+0x1ff/0x220 [zfs]
[822765.023366]  arc_write_done+0x8f/0x410 [zfs]
[822765.023469]  zio_done+0x43f/0x1020 [zfs]
[822765.023572]  zio_execute+0x99/0xf0 [zfs]
[822765.023632]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.023691]  ? wake_up_q+0x80/0x80
[822765.023791]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.023853]  kthread+0x120/0x140
[822765.023913]  ? task_done+0xb0/0xb0 [spl]
[822765.023969]  ? kthread_park+0x90/0x90
[822765.024026]  ret_from_fork+0x35/0x40
[822765.024089] INFO: task txg_sync:1350 blocked for more than 120 seconds.
[822765.024150]       Tainted: P           O      5.4.114-1-pve #1
[822765.024209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.024286] txg_sync        D    0  1350      2 0x80004000
[822765.024346] Call Trace:
[822765.024401]  __schedule+0x2e6/0x700
[822765.024458]  schedule+0x33/0xa0
[822765.024514]  schedule_timeout+0x152/0x330
[822765.024573]  ? __next_timer_interrupt+0xd0/0xd0
[822765.024631]  io_schedule_timeout+0x1e/0x50
[822765.024690]  __cv_timedwait_common+0x138/0x170 [spl]
[822765.024750]  ? wait_woken+0x80/0x80
[822765.024808]  __cv_timedwait_io+0x19/0x20 [spl]
[822765.024912]  zio_wait+0x139/0x280 [zfs]
[822765.025009]  ? _cond_resched+0x19/0x30
[822765.025109]  dsl_pool_sync+0xdc/0x510 [zfs]
[822765.025211]  spa_sync+0x5a4/0xfe0 [zfs]
[822765.025268]  ? mutex_lock+0x12/0x30
[822765.025368]  ? spa_txg_history_init_io+0x104/0x110 [zfs]
[822765.025472]  txg_sync_thread+0x2e1/0x4a0 [zfs]
[822765.030783]  ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[822765.030846]  thread_generic_wrapper+0x74/0x90 [spl]
[822765.030905]  kthread+0x120/0x140
[822765.030960]  ? __thread_exit+0x20/0x20 [spl]
[822765.031014]  ? kthread_park+0x90/0x90
[822765.031068]  ret_from_fork+0x35/0x40
rincebrain commented 3 years ago

In general: no, you're not pushing it too much. I was passing around a 2 TB snapshot stream while I was trying to reproduce this with a bunch of gzip-9 sends on loop.

It would be useful if you could share the lines immediately preceding what you pasted, as they should include the specific panic message (which I'm guessing is different, since the stack is).

I have been periodically spending time considering what I'd like to know for this and how to get it; I know the former but not the latter. (What I'd like to know, at the moment, is where the NULLs that are being dereferenced in the initial panics are coming from; short of somewhat invasive patching, I don't see a good way to answer that question.)

Of course, this isn't helped by my not being at all familiar with these codepaths. Someone more familiar might be able to pull out an answer much quicker.

Here's another question - did you turn on encryption while your ZFS version was 0.8.X? I just noticed both your sending and receiving pools are on the feature flags from there, and am wondering if the reason I can't reproduce it is that my pool only had encryption turned on recently...

(One more experiment you could play with, if you wish, would be setting zfs_abd_scatter_enabled to 0 - it may help or hurt performance to do so, but it's just a thought; it's possible it won't help at all.)
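
(For reference, a sketch of flipping that at runtime; it assumes the usual /sys/module/zfs/parameters location and reverts when the module is reloaded:)

# 0 = use linear ABD buffers instead of scatter/gather pages
echo 0 > /sys/module/zfs/parameters/zfs_abd_scatter_enabled
# confirm the current value
cat /sys/module/zfs/parameters/zfs_abd_scatter_enabled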

pquan commented 3 years ago

This time there was no panic. I only had a lot of locked processes in D state. The only panic was from a mysqld process due to "disk starvation" (there's a bug there for sure, but not relevant here). It came after the above locked process warnings.

Perhaps one detail that has been missing until now is that syncoid uses resumable transfers. Here's the process with the full command line:

root     10967  0.0  0.0   2384   764 pts/8    S+   11:26   0:00 sh -c zfs send -w  -I 'rpool/newdataset/subvol-1002-disk-1'@'autosnap_2021-03-01_00:00:01_monthly' 'rpool/newdataset/subvol-1002-disk-1'@'syncoid_hostname_2021-06-06:07:42:37' | mbuffer  -q -s 128k -m 16M 2>/dev/null | pv -p -t -e -r -b -s 315006434464 |  zfs receive  -s -F 'rpool/olddataset/subvol-1002-disk-1'
root     10968  0.8  0.0  14740  5904 pts/8    S+   11:26   1:38 zfs send -w -I rpool/newdataset/subvol-1002-disk-1@autosnap_2021-03-01_00:00:01_monthly rpool/newdataset/subvol-1002-disk-1@syncoid_hostname_2021-06-06:07:42:37
root     10971  1.0  0.0  13948  5300 pts/8    S+   11:26   2:03 zfs receive -s -F rpool/olddataset/subvol-1002-disk-1
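
(For context, "resumable" means syncoid passes -s to zfs receive; an interrupted transfer is picked up again via the receive_resume_token property. A minimal sketch, reusing the dataset names above:)

# on the receiving side: read the token left behind by the interrupted receive
zfs get -H -o value receive_resume_token rpool/olddataset/subvol-1002-disk-1
# on the sending side: restart the stream from that point and pipe it back into receive
zfs send -t <token> | zfs receive -s rpool/olddataset/subvol-1002-disk-1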

Other than that, I can paste you the full list of kernel processes here: https://pastebin.com/j8MygKX8. The other processes I filtered out are a bunch of userspace crons locked in D state and a lot of defunct daemons (sshd primarily).

Here's another question - did you turn on encryption while your ZFS version was 0.8.X? I just noticed both your sending and receiving pool are on the feature flags from there, and am wondering if the reason I can't reproduce it is that my pool only had encryption turned on recently...

If memory serves me, this server was installed about 10 months ago, so it must have started on ZFS 0.8.5. Encryption was turned on fairly early in its life cycle, so I'd say yes. You would know much better than I whether something changed in the encryption path. I should point out that no encrypt/decrypt was happening on this last run, as it was done with the -w option (see the zfs send -w above).

I'll try another transfer in the next 24 hours and see if it locks again. If it does, I'll save the full dmesg output and post the relevant parts (if any).

rincebrain commented 3 years ago

Were there other messages in dmesg that weren't just "stuck process for X seconds"?

I saw the use of resumable transfers and had been using them; perhaps I'll try constructing something to artificially interrupt the sends and resume them, though that complicates things a bit.

My question about the lines immediately prior to what you pasted from the prior dmesg output stands, I think.

pquan commented 3 years ago

Yes, I only had stuck-process warnings. I pasted them all; nothing else was there, except the normal kernel stuff. The load was above 3000, but again, the root filesystem is not on ZFS, so I was able to log in and do normal administration.

My question about the lines immediately prior to what you pasted from the prior dmesg output stands, I think.

Ok, I'll save them next time. I did not copy them because they were irrelevant (in my eyes); they were network bridge and AppArmor notifications for running containers.

billy-mar commented 3 years ago

Hi Everybody,

I have the same problem on Debian 10 with ZFS 2.0.3-1 on kernel 5.10. The only difference is that I always get BUG: kernel NULL pointer dereference, address: 0000000000000030 (instead of the all-zero address). The root dataset is configured with aes-256-gcm, skein checksums, and xattr=sa; recordsize and compression differ between datasets, ranging from 16k to 1M and lz4 or gzip.
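
For reference, a rough sketch of how a dataset with this kind of configuration gets created (the pool/dataset name and the 16k recordsize are placeholders):

zfs create \
  -o encryption=aes-256-gcm \
  -o keyformat=passphrase \
  -o checksum=skein \
  -o xattr=sa \
  -o compression=lz4 \
  -o recordsize=16k \
  pool/dataset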

I get BUG: kernel NULL... every time I run zfs send -R -v -w, even when I redirect the send stream to a file: zfs send -R -v -w foo@bar > /tmp/file. I'm currently trying to isolate the datasets that have this problem. It seems that I might be able to zfs destroy -r them so that the send can complete.

I have several servers and laptops with exactly the same configuration and I've never had an issue like this. On this particular server, zpool scrub has several times repaired a small amount of data on one disk with no data loss. In the end I discovered that I had an issue with the LSI storage controller (I verified that the problem was not a cable or the HDD).

Could you please tell me how can I help you to isolate the problem?

Btw. Is it possible to test by zfs send -R -v -w foo@bar > /dev/null? For some reason ZFS doesn't want to send to /dev/null...

Billy

rincebrain commented 3 years ago

The /dev/null thing is a result of a Linux change in (IIRC) 5.10 and newer; #11445 is a bug about it, and there's a pending PR to add a more informative note when it happens plus a more invasive PR to work around it.

zfs send [...] | cat > /dev/null should work around it, if you like, in the interim - or pv or anything else where the stdout of send is not directly /dev/null.

edit: If you have a snapshot that reproduces this every time, please keep it around; everyone else, as far as I can tell, has this problem intermittently. As far as providing more information, I've periodically been thinking about it, but it's slightly tricky as one of the primary tools I'd like to use to inspect this won't provide a bunch of information on encrypted datasets, unfortunately.

One question I have which I think I know the answer to (I expect it to be "no") is "if you do a no-w send of a snapshot which reproduces this reliably with -w, does it still panic?"

Also, as an additional datapoint, could you share what CPU model you have? (The idea of it being CPU-dependent seems unlikely at this point, but since everyone else provided those datapoints already...)

billy-mar commented 3 years ago

zfs send [...] | cat > /dev/null

Thank you, I'll use it to test the zpools on my other devices.

Btw, one thing I haven't mentioned is that I created the pool and enabled encryption with the first stable ZFS on Linux version that had the encryption feature. As far as I remember that was 0.8.0.

edit: If you have a snapshot that reproduces this every time

I don't have the snapshots anymore; I deleted the whole datasets yesterday, but I might still have a whole HDD. I was thinking of dd'ing the HDD and sending you the image, but it's too large for my limited internet connection, and I might be breaking the law by sending you the data anyway...

if you do a no-w send of a snapshot which reproduces this reliably with -w, does it still panic

I will test that. I know which datasets panicked, and some of them had no snapshots until I ran zfs snapshot -r root_of_the_pool in order to zfs send -R -w.

But I tested something else.

I tested importing the pool without loading keys and without mounting. In other words, it panics regardless of whether I import the pool using zpool import -l pool or zpool import -N pool.

Also, as an additional datapoint, could you share what CPU model you have?

Supermicro X9 motherboard with ECC RAM and 2x Intel Xeon 2660 v2.

Thank you Rich for digging into this. ;-)

billy-mar commented 3 years ago

For some reason I couldn't reproduce the panic with zfs send ... | cat > /dev/null. Is it possible that piping into cat does not panic but sending into a file does? Can anybody please test/reproduce?

I might be wrong, maybe I'm just sending something that doesn't panic...

rincebrain commented 3 years ago

Nobody else in the thread has (I think; there's a fair amount of it, and I might have missed or forgotten it) reported the same send doing it every time, so it's possible you were just getting unlucky, if you're sending the same streams. (If you're not, the same statement doubly applies.)

I can't say it's impossible for that to be true without a root cause, but my guess would be that you're just not losing whatever roulette is involved here. If you can reproduce it a fair number of times in a large sample in one config and never in the other, that would be a compelling argument (and also pretty confusing for me, since I think people have reported this happening when only one side of a send/recv is on the panicking system? I could see it only happening on recv, but I think someone in here has reported it on the sending side?).

billy-mar commented 3 years ago

were just getting unlucky, if you're sending the same streams

Ouh, I see. I'll reboot a few times and try it again and again. Thank you.

people have reported this happening when only one side of a send/recv is on the panicing system

I can tell you for sure it's not a panicking system; it's a panicking drive/pool, since I managed to remove a disk from a panicking PC, connect it to a laptop using a USB enclosure, and it panicked there too. Except it didn't panic yesterday, so I just need to reboot the laptop a few times and I'm assuming it'll panic again.

And it's definitely a sending problem, not a receiving issue, since I tested it on that laptop by sending to a file.

rincebrain commented 3 years ago

Was the panic the same on the laptop? Because, like the panic, the couple of data issues people have reported so far have been somewhat unpredictable, not a predictable "pool no longer imports" outcome every time.

Well, that says "it can be caused by just sending", not "it can only be caused by just sending" - the original post in this bug has a backtrace in the receive code, so I think the conclusion is "it can happen on either side" (!)

billy-mar commented 3 years ago

Was the panic the same on the laptop?

Yes, I moved the hard drive from the PC to a USB enclosure connected to the laptop, sent the panicking dataset with send -R -w, and it panicked in exactly the same way. It's important to note that the laptop and the PC have 100% the same config and 100% the same packages installed.

pool no longer imports

Speaking of that, I sometimes hit an issue where, on boot, some datasets are not mounted. The pool is imported and the encryption keys are loaded, but sometimes I need to execute zfs mount -alv to mount everything. I didn't have that problem with 0.8.x; it started happening on 2.0. The same thing sometimes happens when importing the pool with zpool import -l - I get a "cannot allocate memory" error for a few datasets, but when I then execute zfs mount -alv it's all good again.

it can happen on either side

So far in my testing it has never happened with zfs receive < file_that_i_sent_to, but of course, as you mentioned, it's a roulette.

Thank you for looking into this Rich ;-).

rincebrain commented 3 years ago

With my extremely unskilled tools, at least.

I broke out my old Haswell system and 0.8.3 to see if I could reproduce this by writing some data there and then moving forward; I've been running 9 parallel zfs send loops on it for the past couple of days, and it seems to panic about once every 24h, but so far kdump has failed to run properly every time, and the system just hung until manual intervention (this machine very much does not have a BMC).

I think I've fixed that, though, so next time it dies, I'll see if this panic was a known one fixed later in the 0.8 series or this fun.
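
For reference, the parallel send loops described above amount to something like this, with made-up pool and dataset names:

# Run nine endless raw-send loops in parallel, discarding output, to try to
# shake out the intermittent panic.
POOL=testpool
for i in $(seq 1 9); do
    while true; do
        zfs send -w "$POOL/ds$i@snap" | cat > /dev/null
    done &
done
wait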

billy-mar commented 3 years ago

I've never run multiple sends in parallel, and in my case ZFS always panicked with the same datasets (until I destroyed and recreated them). You might consider running it in tmux with one pane running send -v ... receive -v and another with journalctl --dmesg --follow; then, when it panics, just note the dataset you were sending, reboot, and try sending again only the one you noted plus the one before and the one after it. I bet it'll panic immediately, so you won't need to spend 24h to reproduce. A sketch of that workflow follows below.
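
A rough sketch of that one-snapshot-at-a-time approach, with hypothetical names, so the last line logged before a panic points at the culprit:

# Pane 1: send each snapshot individually and log which one is in flight.
for snap in $(zfs list -H -o name -t snapshot -r tank); do
    echo "$(date) sending $snap" >> /root/send-progress.log
    zfs send -w "$snap" | cat > /dev/null
done
# Pane 2: watch the kernel log for the panic.
journalctl --dmesg --follow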

rincebrain commented 3 years ago

I appreciate that you're trying to help, but I have used those tools before.

I don't have a whole bunch of encrypted datasets with complicated creation histories lying around, just ones I created for trying to reproduce this. So it's not getting to a specific dataset that's doing it for me - in fact, it's not doing it at all, as what I was hitting was an old 0.8 bug, so I've upgraded to 2.0.4 and am trying again (I was trying to test whether it's datasets originating on 0.8).

pquan commented 3 years ago

I have not been active here for a few days. I was not getting more errors, but last night I had another lockup (trace below). I now think that this is data-dependent. Perhaps load-dependent too, and I would include issues caused by unresponsive or heavily loaded disks (spinning disks, in my case). I'm saying this because one of my lock-ups was on a sender (trace included above, 8 days ago); this one was on a receiver. There are no errors/messages in dmesg before the panic. However, zpool status reports that one of the hard disks experienced an error, even though the error counters on all disks are 0 and there are no kernel logs of an unresponsive disk. As last time, the trace does not have any encryption-related parts. In this case, I was receiving an unencrypted dataset into an encrypted backup dataset. There's a possibility that we're dealing with two different issues here.

[1052344.672906] VERIFY3(0 == dmu_bonus_hold_by_dnode(dn, FTAG, &db, flags)) failed (0 == 5)
[1052344.674471] PANIC at dmu_recv.c:1811:receive_object()
[1052344.676133] Showing stack for process 30854
[1052344.677665] CPU: 5 PID: 30854 Comm: receive_writer Tainted: P           O      5.4.114-1-pve #1
[1052344.678987] Hardware name: ASUSTeK COMPUTER INC. P9D-M Series/P9D-M Series, BIOS 2201 07/17/2019
[1052344.680185] Call Trace:
[1052344.681393]  dump_stack+0x6d/0x8b
[1052344.682411]  spl_dumpstack+0x29/0x2b [spl]
[1052344.683317]  spl_panic+0xd3/0xfb [spl]
[1052344.684284]  ? _cond_resched+0x19/0x30
[1052344.685194]  ? down_read+0x12/0xa0
[1052344.686059]  ? dbuf_read+0x1c9/0x510 [zfs]
[1052344.686947]  ? dmu_bonus_hold_by_dnode+0x152/0x1b0 [zfs]
[1052344.687887]  ? kmem_asprintf+0x90/0xb0 [spl]
[1052344.688766]  receive_object+0x981/0xd10 [zfs]
[1052344.689621]  ? _cond_resched+0x19/0x30
[1052344.690468]  ? mutex_lock+0x12/0x30
[1052344.691276]  receive_writer_thread+0x279/0xa00 [zfs]
[1052344.692102]  ? spl_kmem_free+0x33/0x40 [spl]
[1052344.692906]  ? receive_process_write_record+0x180/0x180 [zfs]
[1052344.693873]  thread_generic_wrapper+0x74/0x90 [spl]
[1052344.694858]  ? receive_process_write_record+0x180/0x180 [zfs]
[1052344.695853]  ? thread_generic_wrapper+0x74/0x90 [spl]
[1052344.696562]  kthread+0x120/0x140
[1052344.697256]  ? __thread_exit+0x20/0x20 [spl]
[1052344.697934]  ? kthread_park+0x90/0x90
[1052344.698597]  ret_from_fork+0x35/0x40
[1052566.547031] INFO: task txg_quiesce:545 blocked for more than 120 seconds.
[1052566.547727]       Tainted: P           O      5.4.114-1-pve #1
[1052566.548397] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1052566.549001] txg_quiesce     D    0   545      2 0x80004000
[1052566.549584] Call Trace:
[1052566.550152]  __schedule+0x2e6/0x700
[1052566.550703]  schedule+0x33/0xa0
[1052566.551291]  cv_wait_common+0x104/0x130 [spl]
[1052566.551924]  ? wait_woken+0x80/0x80
[1052566.552436]  __cv_wait+0x15/0x20 [spl]
[1052566.552975]  txg_quiesce_thread+0x2ba/0x3b0 [zfs]
[1052566.553487]  ? txg_sync_thread+0x4a0/0x4a0 [zfs]
[1052566.553960]  thread_generic_wrapper+0x74/0x90 [spl]
[1052566.554418]  kthread+0x120/0x140
[1052566.554861]  ? __thread_exit+0x20/0x20 [spl]
[1052566.555353]  ? kthread_park+0x90/0x90
[1052566.555808]  ret_from_fork+0x35/0x40
[1052566.556290] INFO: task zfs:866 blocked for more than 120 seconds.
[1052566.556684]       Tainted: P           O      5.4.114-1-pve #1
[1052566.557080] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1052566.557459] zfs             D    0   866    863 0x80004000
[1052566.557826] Call Trace:
[1052566.558182]  __schedule+0x2e6/0x700
[1052566.558537]  schedule+0x33/0xa0
[1052566.558886]  io_schedule+0x16/0x40
[1052566.559260]  cv_wait_common+0xb5/0x130 [spl]
[1052566.559630]  ? wait_woken+0x80/0x80
[1052566.559964]  __cv_wait_io+0x18/0x20 [spl]
[1052566.560325]  txg_wait_synced_impl+0xc9/0x110 [zfs]
[1052566.560706]  txg_wait_synced+0x10/0x40 [zfs]
[1052566.561080]  dsl_sync_task_common+0x1b5/0x290 [zfs]
[1052566.561456]  ? dmu_recv_end_sync+0x580/0x580 [zfs]
[1052566.561869]  ? dmu_recv_cleanup_ds+0x140/0x140 [zfs]
[1052566.562287]  ? dmu_recv_end_sync+0x580/0x580 [zfs]
[1052566.562642]  ? dmu_recv_cleanup_ds+0x140/0x140 [zfs]
[1052566.563074]  dsl_sync_task+0x1a/0x20 [zfs]
[1052566.563489]  dmu_recv_existing_end+0x61/0x80 [zfs]
[1052566.563854]  ? rrw_exit+0x6a/0x160 [zfs]
[1052566.564190]  ? _cond_resched+0x19/0x30
[1052566.564526]  ? down_read+0x12/0xa0
[1052566.564886]  ? zvol_find_by_name_hash+0xcf/0x250 [zfs]
[1052566.565238]  dmu_recv_end+0x8e/0xb0 [zfs]
[1052566.565613]  zfs_ioc_recv_impl+0xa3c/0x1030 [zfs]
[1052566.565989]  zfs_ioc_recv+0x18b/0x330 [zfs]
[1052566.566330]  ? ns_capable_common+0x2f/0x50
[1052566.566670]  ? capable+0x19/0x20
[1052566.567077]  ? _cond_resched+0x19/0x30
[1052566.567464]  ? __kmalloc_node+0x1e0/0x330
[1052566.567783]  ? spl_kmem_alloc+0xdc/0x130 [spl]
[1052566.568126]  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
[1052566.568434]  ? __kmalloc_node+0x267/0x330
[1052566.568736]  ? spl_kmem_free+0x33/0x40 [spl]
[1052566.569065]  zfsdev_ioctl+0x54/0xe0 [zfs]
[1052566.569366]  do_vfs_ioctl+0xa9/0x640
[1052566.569666]  ? __vfs_read+0x29/0x40
[1052566.569963]  ? vfs_read+0x99/0x160
[1052566.570257]  ksys_ioctl+0x67/0x90
[1052566.570594]  __x64_sys_ioctl+0x1a/0x20
[1052566.570887]  do_syscall_64+0x57/0x190
[1052566.571251]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1052566.571604] RIP: 0033:0x7ffaa1a67427
[1052566.571898] Code: Bad RIP value.
[1052566.572186] RSP: 002b:00007ffeb18fe628 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1052566.572490] RAX: ffffffffffffffda RBX: 00007ffeb1901d80 RCX: 00007ffaa1a67427
[1052566.572799] RDX: 00007ffeb18fe7d0 RSI: 0000000000005a1b RDI: 0000000000000005
[1052566.573107] RBP: 00007ffeb1902dc0 R08: 0000000000000003 R09: 00007ffaa1b32f10
[1052566.573412] R10: 000055d3489e9010 R11: 0000000000000246 R12: 00007ffeb18fe7d0
[1052566.573711] R13: 00007ffeb1908c20 R14: 0000000000000000 R15: 0000000000000000
[1052566.574009] INFO: task receive_writer:30854 blocked for more than 120 seconds.
[1052566.574308]       Tainted: P           O      5.4.114-1-pve #1
[1052566.574609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1052566.574922] receive_writer  D    0 30854      2 0x80004000
[1052566.575310] Call Trace:
[1052566.575689]  __schedule+0x2e6/0x700
[1052566.576004]  schedule+0x33/0xa0
[1052566.576319]  spl_panic+0xf9/0xfb [spl]
[1052566.576631]  ? _cond_resched+0x19/0x30
[1052566.576943]  ? down_read+0x12/0xa0
[1052566.577273]  ? dbuf_read+0x1c9/0x510 [zfs]
[1052566.577599]  ? dmu_bonus_hold_by_dnode+0x152/0x1b0 [zfs]
[1052566.577913]  ? kmem_asprintf+0x90/0xb0 [spl]
[1052566.578244]  receive_object+0x981/0xd10 [zfs]
[1052566.578558]  ? _cond_resched+0x19/0x30
[1052566.578869]  ? mutex_lock+0x12/0x30
[1052566.579276]  receive_writer_thread+0x279/0xa00 [zfs]
[1052566.579656]  ? spl_kmem_free+0x33/0x40 [spl]
[1052566.579987]  ? receive_process_write_record+0x180/0x180 [zfs]
[1052566.580306]  thread_generic_wrapper+0x74/0x90 [spl]
[1052566.580642]  ? receive_process_write_record+0x180/0x180 [zfs]
[1052566.580963]  ? thread_generic_wrapper+0x74/0x90 [spl]
[1052566.581283]  kthread+0x120/0x140
[1052566.581603]  ? __thread_exit+0x20/0x20 [spl]
[1052566.581923]  ? kthread_park+0x90/0x90
[1052566.582243]  ret_from_fork+0x35/0x40
[1052687.378767] INFO: task txg_quiesce:545 blocked for more than 241 seconds.
[1052687.379153]       Tainted: P           O      5.4.114-1-pve #1
[1052687.379530] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1052687.379913] txg_quiesce     D    0   545      2 0x80004000
[1052687.380283] Call Trace:
[1052687.380627]  __schedule+0x2e6/0x700
[1052687.380969]  schedule+0x33/0xa0
[1052687.381313]  cv_wait_common+0x104/0x130 [spl]
[1052687.381659]  ? wait_woken+0x80/0x80
[1052687.382008]  __cv_wait+0x15/0x20 [spl]
[1052687.382410]  txg_quiesce_thread+0x2ba/0x3b0 [zfs]
[1052687.382865]  ? txg_sync_thread+0x4a0/0x4a0 [zfs]
[1052687.383297]  thread_generic_wrapper+0x74/0x90 [spl]
[1052687.383653]  kthread+0x120/0x140
[1052687.384004]  ? __thread_exit+0x20/0x20 [spl]
[1052687.384356]  ? kthread_park+0x90/0x90
[1052687.384702]  ret_from_fork+0x35/0x40
[1052687.385079] INFO: task journal-offline:877 blocked for more than 120 seconds.
[1052687.385425]       Tainted: P           O      5.4.114-1-pve #1
[1052687.385773] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1052687.386136] journal-offline D    0   877  15571 0x00000320
[1052687.386496] Call Trace:
[1052687.386889]  __schedule+0x2e6/0x700
[1052687.387323]  schedule+0x33/0xa0
[1052687.387661]  io_schedule+0x16/0x40
[1052687.387992]  wait_on_page_bit+0x141/0x210
[1052687.388323]  ? file_fdatawait_range+0x30/0x30
[1052687.388657]  wait_on_page_writeback+0x43/0x90
[1052687.388989]  __filemap_fdatawait_range+0xae/0x120
[1052687.389327]  filemap_write_and_wait_range+0x5c/0xb0
[1052687.389708]  zpl_fsync+0x3c/0xa0 [zfs]
[1052687.390048]  vfs_fsync_range+0x48/0x80
[1052687.390388]  ? __fget_light+0x59/0x70
[1052687.390743]  do_fsync+0x3d/0x70
[1052687.391158]  __x64_sys_fsync+0x14/0x20
[1052687.391492]  do_syscall_64+0x57/0x190
[1052687.391823]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1052687.392160] RIP: 0033:0x7f3c470b3017
[1052687.392496] Code: Bad RIP value.
[1052687.392828] RSP: 002b:00007f3c3e7d5cb0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[1052687.393171] RAX: ffffffffffffffda RBX: 0000000000000018 RCX: 00007f3c470b3017
[1052687.393515] RDX: 0000000000000000 RSI: 00007f3c46cb8b92 RDI: 0000000000000018
[1052687.393861] RBP: 00007f3c46cba3a0 R08: 00007f3c3e7d6700 R09: 00007f3c3e7d6700
[1052687.394205] R10: 0000000000000012 R11: 0000000000000293 R12: 0000000000000002
[1052687.394550] R13: 0000000000000000 R14: 0000560145e954f0 R15: 00007ffd438ac490
[1052687.395036] INFO: task zfs:866 blocked for more than 241 seconds.
[1052687.395407]       Tainted: P           O      5.4.114-1-pve #1
[1052687.395760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1052687.396118] zfs             D    0   866    863 0x80004000
[1052687.396473] Call Trace:
[1052687.396818]  __schedule+0x2e6/0x700
[1052687.397157]  schedule+0x33/0xa0
[1052687.397494]  io_schedule+0x16/0x40
[1052687.397833]  cv_wait_common+0xb5/0x130 [spl]
[1052687.398172]  ? wait_woken+0x80/0x80
[1052687.398512]  __cv_wait_io+0x18/0x20 [spl]
[1052687.398945]  txg_wait_synced_impl+0xc9/0x110 [zfs]
[1052687.399384]  txg_wait_synced+0x10/0x40 [zfs]
[1052687.399747]  dsl_sync_task_common+0x1b5/0x290 [zfs]
[1052687.400106]  ? dmu_recv_end_sync+0x580/0x580 [zfs]
[1052687.400465]  ? dmu_recv_cleanup_ds+0x140/0x140 [zfs]
[1052687.400823]  ? dmu_recv_end_sync+0x580/0x580 [zfs]
[1052687.401181]  ? dmu_recv_cleanup_ds+0x140/0x140 [zfs]
[1052687.401558]  dsl_sync_task+0x1a/0x20 [zfs]
[1052687.401916]  dmu_recv_existing_end+0x61/0x80 [zfs]
[1052687.402279]  ? rrw_exit+0x6a/0x160 [zfs]
[1052687.402609]  ? _cond_resched+0x19/0x30
[1052687.403001]  ? down_read+0x12/0xa0
[1052687.403404]  ? zvol_find_by_name_hash+0xcf/0x250 [zfs]
[1052687.403742]  dmu_recv_end+0x8e/0xb0 [zfs]
[1052687.404090]  zfs_ioc_recv_impl+0xa3c/0x1030 [zfs]
[1052687.404445]  zfs_ioc_recv+0x18b/0x330 [zfs]
[1052687.404783]  ? ns_capable_common+0x2f/0x50
[1052687.405120]  ? capable+0x19/0x20
[1052687.405454]  ? _cond_resched+0x19/0x30
[1052687.405788]  ? __kmalloc_node+0x1e0/0x330
[1052687.406122]  ? spl_kmem_alloc+0xdc/0x130 [spl]
[1052687.406486]  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
[1052687.406872]  ? __kmalloc_node+0x267/0x330
[1052687.407262]  ? spl_kmem_free+0x33/0x40 [spl]
[1052687.407614]  zfsdev_ioctl+0x54/0xe0 [zfs]
[1052687.407953]  do_vfs_ioctl+0xa9/0x640
[1052687.408286]  ? __vfs_read+0x29/0x40
[1052687.408618]  ? vfs_read+0x99/0x160
[1052687.408948]  ksys_ioctl+0x67/0x90
[1052687.409272]  __x64_sys_ioctl+0x1a/0x20
[1052687.409599]  do_syscall_64+0x57/0x190
[1052687.409918]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1052687.410234] RIP: 0033:0x7ffaa1a67427
[1052687.410549] Code: Bad RIP value.
[1052687.410910] RSP: 002b:00007ffeb18fe628 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1052687.411280] RAX: ffffffffffffffda RBX: 00007ffeb1901d80 RCX: 00007ffaa1a67427
[1052687.411594] RDX: 00007ffeb18fe7d0 RSI: 0000000000005a1b RDI: 0000000000000005
[1052687.411902] RBP: 00007ffeb1902dc0 R08: 0000000000000003 R09: 00007ffaa1b32f10
[1052687.412207] R10: 000055d3489e9010 R11: 0000000000000246 R12: 00007ffeb18fe7d0
[1052687.412508] R13: 00007ffeb1908c20 R14: 0000000000000000 R15: 0000000000000000
[1052687.412806] INFO: task receive_writer:30854 blocked for more than 241 seconds.
[1052687.413105]       Tainted: P           O      5.4.114-1-pve #1
[1052687.413407] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1052687.413720] receive_writer  D    0 30854      2 0x80004000
[1052687.414037] Call Trace:
[1052687.414351]  __schedule+0x2e6/0x700
[1052687.414665]  schedule+0x33/0xa0
[1052687.415020]  spl_panic+0xf9/0xfb [spl]
[1052687.415412]  ? _cond_resched+0x19/0x30
[1052687.415726]  ? down_read+0x12/0xa0
[1052687.416054]  ? dbuf_read+0x1c9/0x510 [zfs]
[1052687.416381]  ? dmu_bonus_hold_by_dnode+0x152/0x1b0 [zfs]
[1052687.416695]  ? kmem_asprintf+0x90/0xb0 [spl]
[1052687.417026]  receive_object+0x981/0xd10 [zfs]
[1052687.417339]  ? _cond_resched+0x19/0x30
[1052687.417648]  ? mutex_lock+0x12/0x30
[1052687.417975]  receive_writer_thread+0x279/0xa00 [zfs]
[1052687.418287]  ? spl_kmem_free+0x33/0x40 [spl]
[1052687.418617]  ? receive_process_write_record+0x180/0x180 [zfs]
[1052687.418996]  thread_generic_wrapper+0x74/0x90 [spl]
[1052687.419399]  ? receive_process_write_record+0x180/0x180 [zfs]
[1052687.419723]  ? thread_generic_wrapper+0x74/0x90 [spl]
[1052687.420043]  kthread+0x120/0x140
[1052687.420389]  ? __thread_exit+0x20/0x20 [spl]
[1052687.420710]  ? kthread_park+0x90/0x90
[1052687.421031]  ret_from_fork+0x35/0x40
[1052808.210503] INFO: task txg_quiesce:545 blocked for more than 362 seconds.
[1052808.210893]       Tainted: P           O      5.4.114-1-pve #1
[1052808.211272] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1052808.211658] txg_quiesce     D    0   545      2 0x80004000
[1052808.212046] Call Trace:
[1052808.212426]  __schedule+0x2e6/0x700
[1052808.212768]  schedule+0x33/0xa0
[1052808.213116]  cv_wait_common+0x104/0x130 [spl]
[1052808.213463]  ? wait_woken+0x80/0x80
[1052808.213813]  __cv_wait+0x15/0x20 [spl]
[1052808.214217]  txg_quiesce_thread+0x2ba/0x3b0 [zfs]
[1052808.214674]  ? txg_sync_thread+0x4a0/0x4a0 [zfs]
[1052808.215119]  thread_generic_wrapper+0x74/0x90 [spl]
[1052808.215473]  kthread+0x120/0x140
[1052808.215824]  ? __thread_exit+0x20/0x20 [spl]
[1052808.216176]  ? kthread_park+0x90/0x90
[1052808.216521]  ret_from_fork+0x35/0x40
[1052808.216898] INFO: task journal-offline:877 blocked for more than 241 seconds.
[1052808.217246]       Tainted: P           O      5.4.114-1-pve #1
[1052808.217594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1052808.217953] journal-offline D    0   877  15571 0x00000320
[1052808.218314] Call Trace:
[1052808.218699]  __schedule+0x2e6/0x700
[1052808.219122]  schedule+0x33/0xa0
[1052808.219462]  io_schedule+0x16/0x40
[1052808.219797]  wait_on_page_bit+0x141/0x210
[1052808.220127]  ? file_fdatawait_range+0x30/0x30
[1052808.220460]  wait_on_page_writeback+0x43/0x90
[1052808.220793]  __filemap_fdatawait_range+0xae/0x120
[1052808.221128]  filemap_write_and_wait_range+0x5c/0xb0
[1052808.221497]  zpl_fsync+0x3c/0xa0 [zfs]
[1052808.221852]  vfs_fsync_range+0x48/0x80
[1052808.222207]  ? __fget_light+0x59/0x70
[1052808.222592]  do_fsync+0x3d/0x70
[1052808.223010]  __x64_sys_fsync+0x14/0x20
[1052808.223344]  do_syscall_64+0x57/0x190
[1052808.223677]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1052808.224013] RIP: 0033:0x7f3c470b3017
[1052808.224350] Code: Bad RIP value.
[1052808.224682] RSP: 002b:00007f3c3e7d5cb0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[1052808.225026] RAX: ffffffffffffffda RBX: 0000000000000018 RCX: 00007f3c470b3017
[1052808.225370] RDX: 0000000000000000 RSI: 00007f3c46cb8b92 RDI: 0000000000000018
[1052808.225738] RBP: 00007f3c46cba3a0 R08: 00007f3c3e7d6700 R09: 00007f3c3e7d6700
[1052808.226083] R10: 0000000000000012 R11: 0000000000000293 R12: 0000000000000002
[1052808.226430] R13: 0000000000000000 R14: 0000560145e954f0 R15: 00007ffd438ac490
[1052808.226896] INFO: task zfs:866 blocked for more than 362 seconds.
[1052808.227297]       Tainted: P           O      5.4.114-1-pve #1
[1052808.227651] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1052808.228009] zfs             D    0   866    863 0x80004000
[1052808.228365] Call Trace:
[1052808.228710]  __schedule+0x2e6/0x700
[1052808.229050]  schedule+0x33/0xa0
[1052808.229386]  io_schedule+0x16/0x40
[1052808.229724]  cv_wait_common+0xb5/0x130 [spl]
[1052808.230064]  ? wait_woken+0x80/0x80
[1052808.230403]  __cv_wait_io+0x18/0x20 [spl]
[1052808.230813]  txg_wait_synced_impl+0xc9/0x110 [zfs]
[1052808.231255]  txg_wait_synced+0x10/0x40 [zfs]
[1052808.231617]  dsl_sync_task_common+0x1b5/0x290 [zfs]
[1052808.231976]  ? dmu_recv_end_sync+0x580/0x580 [zfs]
[1052808.232335]  ? dmu_recv_cleanup_ds+0x140/0x140 [zfs]
[1052808.232694]  ? dmu_recv_end_sync+0x580/0x580 [zfs]
[1052808.233051]  ? dmu_recv_cleanup_ds+0x140/0x140 [zfs]
[1052808.233411]  dsl_sync_task+0x1a/0x20 [zfs]
[1052808.233769]  dmu_recv_existing_end+0x61/0x80 [zfs]
[1052808.234129]  ? rrw_exit+0x6a/0x160 [zfs]
[1052808.234519]  ? _cond_resched+0x19/0x30
[1052808.234908]  ? down_read+0x12/0xa0
[1052808.235251]  ? zvol_find_by_name_hash+0xcf/0x250 [zfs]
[1052808.235587]  dmu_recv_end+0x8e/0xb0 [zfs]
[1052808.235933]  zfs_ioc_recv_impl+0xa3c/0x1030 [zfs]
[1052808.236283]  zfs_ioc_recv+0x18b/0x330 [zfs]
[1052808.236604]  ? ns_capable_common+0x2f/0x50
[1052808.236923]  ? capable+0x19/0x20
[1052808.237239]  ? _cond_resched+0x19/0x30
[1052808.237560]  ? __kmalloc_node+0x1e0/0x330
[1052808.237878]  ? spl_kmem_alloc+0xdc/0x130 [spl]
[1052808.238225]  zfsdev_ioctl_common+0x5b2/0x820 [zfs]
[1052808.238597]  ? __kmalloc_node+0x267/0x330
[1052808.238982]  ? spl_kmem_free+0x33/0x40 [spl]
[1052808.239332]  zfsdev_ioctl+0x54/0xe0 [zfs]
[1052808.239669]  do_vfs_ioctl+0xa9/0x640
[1052808.240002]  ? __vfs_read+0x29/0x40
[1052808.240334]  ? vfs_read+0x99/0x160
[1052808.240662]  ksys_ioctl+0x67/0x90
[1052808.240986]  __x64_sys_ioctl+0x1a/0x20
[1052808.241312]  do_syscall_64+0x57/0x190
[1052808.241630]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1052808.241946] RIP: 0033:0x7ffaa1a67427
[1052808.242259] Code: Bad RIP value.
[1052808.242611] RSP: 002b:00007ffeb18fe628 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1052808.242982] RAX: ffffffffffffffda RBX: 00007ffeb1901d80 RCX: 00007ffaa1a67427
[1052808.243292] RDX: 00007ffeb18fe7d0 RSI: 0000000000005a1b RDI: 0000000000000005
[1052808.243600] RBP: 00007ffeb1902dc0 R08: 0000000000000003 R09: 00007ffaa1b32f10
[1052808.243904] R10: 000055d3489e9010 R11: 0000000000000246 R12: 00007ffeb18fe7d0
[1052808.244201] R13: 00007ffeb1908c20 R14: 0000000000000000 R15: 0000000000000000
rincebrain commented 3 years ago

That does seem like a different issue or issues, yes. Perhaps go open a separate issue?

pquan commented 3 years ago

And one more, again on another system - a receiver. This is an Atom N2800 whose only purpose in life is backup storage. The system is responsive, but the load average is 5 and the CPU is doing nothing but waiting. I think it's time to ditch encryption on ZFS for the time being.

[ 5531.803779] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 5531.803842] #PF: supervisor read access in kernel mode
[ 5531.803882] #PF: error_code(0x0000) - not-present page
[ 5531.803924] PGD 0 P4D 0
[ 5531.803964] Oops: 0000 [#1] SMP NOPTI
[ 5531.804009] CPU: 2 PID: 2276 Comm: receive_writer Tainted: P           O      5.4.114-1-pve #1
[ 5531.804056] Hardware name:  /DN2800MT, BIOS MTCDT10N.86A.0158.2012.0808.1446 08/08/2012
[ 5531.804307] RIP: 0010:abd_verify+0x5/0x70 [zfs]
[ 5531.804359] Code: 00 00 00 66 90 0f 1f 44 00 00 55 8b 07 c1 e8 06 48 89 e5 83 e0 01 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 <8b> 07 a8 01 75 4f 55 48 89 e5 41 55 41 54 49 89 fc 53 a8 40 74 40
[ 5531.804432] RSP: 0018:ffffaf6cda4639d8 EFLAGS: 00010246
[ 5531.804493] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 5531.804553] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[ 5531.804614] RBP: ffffaf6cda4639f0 R08: 0000000000000094 R09: 0000000000000000
[ 5531.804676] R10: 0000000000000f86 R11: 0000000000000000 R12: 0000000000004000
[ 5531.804734] R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc089de90
[ 5531.804794] FS:  0000000000000000(0000) GS:ffff8fd3abb00000(0000) knlGS:0000000000000000
[ 5531.804875] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5531.804928] CR2: 0000000000000000 CR3: 0000000126fe2000 CR4: 00000000000006e0
[ 5531.804990] Call Trace:
[ 5531.805235]  ? abd_borrow_buf+0x17/0x40 [zfs]
[ 5531.805502]  abd_borrow_buf_copy+0x2c/0x80 [zfs]
[ 5531.805785]  zio_crypt_copy_dnode_bonus+0x31/0x140 [zfs]
[ 5531.805854]  ? preempt_schedule_common+0xe/0x20
[ 5531.806097]  arc_buf_fill+0x402/0xb40 [zfs]
[ 5531.806374]  arc_untransform+0x21/0x80 [zfs]
[ 5531.806653]  dbuf_read_verify_dnode_crypt+0xee/0x160 [zfs]
[ 5531.806921]  dbuf_read_impl.constprop.33+0x4bd/0x6e0 [zfs]
[ 5531.807187]  ? dbuf_cons+0xab/0xc0 [zfs]
[ 5531.807258]  ? _cond_resched+0x19/0x30
[ 5531.807317]  ? mutex_lock+0x12/0x30
[ 5531.807573]  ? aggsum_add+0x188/0x1b0 [zfs]
[ 5531.807639]  ? _cond_resched+0x19/0x30
[ 5531.807693]  ? down_read+0x12/0xa0
[ 5531.807967]  dbuf_read+0x1b2/0x510 [zfs]
[ 5531.808041]  ? _cond_resched+0x19/0x30
[ 5531.808096]  ? mutex_lock+0x12/0x30
[ 5531.808348]  dmu_bonus_hold_by_dnode+0x134/0x1b0 [zfs]
[ 5531.808620]  receive_object+0x434/0xd10 [zfs]
[ 5531.808896]  ? receive_freeobjects.isra.15+0xa1/0x120 [zfs]
[ 5531.808975]  ? _cond_resched+0x19/0x30
[ 5531.809032]  ? mutex_lock+0x12/0x30
[ 5531.809290]  receive_writer_thread+0x279/0xa00 [zfs]
[ 5531.809368]  ? set_user_nice.part.73+0xf9/0x210
[ 5531.809653]  ? receive_process_write_record+0x180/0x180 [zfs]
[ 5531.809737]  thread_generic_wrapper+0x74/0x90 [spl]
[ 5531.809809]  ? thread_generic_wrapper+0x74/0x90 [spl]
[ 5531.809866]  kthread+0x120/0x140
[ 5531.809931]  ? __thread_exit+0x20/0x20 [spl]
[ 5531.809984]  ? kthread_park+0x90/0x90
[ 5531.810043]  ret_from_fork+0x1f/0x40
[ 5531.810097] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables ipmi_devintf ipmi_msghandler iptable_filter bpfilter dummy bonding softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_hda_intel gma500_gfx snd_intel_dspcfg snd_hda_codec drm_kms_helper snd_hda_core intel_powerclamp drm snd_hwdep coretemp i2c_algo_bit snd_pcm fb_sys_fops syscopyarea sysfillrect snd_timer sysimgblt snd soundcore mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap sunrpc ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 ahci i2c_i801 libahci lpc_ich uhci_hcd ehci_pci e1000e ehci_hcd video
[ 5531.810456] CR2: 0000000000000000
[ 5531.810509] ---[ end trace 0e383133d20bdebe ]---
[ 5531.810763] RIP: 0010:abd_verify+0x5/0x70 [zfs]
[ 5531.810830] Code: 00 00 00 66 90 0f 1f 44 00 00 55 8b 07 c1 e8 06 48 89 e5 83 e0 01 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 <8b> 07 a8 01 75 4f 55 48 89 e5 41 55 41 54 49 89 fc 53 a8 40 74 40
[ 5531.810932] RSP: 0018:ffffaf6cda4639d8 EFLAGS: 00010246
[ 5531.810984] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 5531.811040] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[ 5531.811096] RBP: ffffaf6cda4639f0 R08: 0000000000000094 R09: 0000000000000000
[ 5531.811148] R10: 0000000000000f86 R11: 0000000000000000 R12: 0000000000004000
[ 5531.811200] R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc089de90
[ 5531.811255] FS:  0000000000000000(0000) GS:ffff8fd3abb00000(0000) knlGS:0000000000000000
[ 5531.811321] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5531.811371] CR2: 0000000000000000 CR3: 0000000126fe2000 CR4: 00000000000006e0
[ 6820.375942] perf: interrupt took too long (9720 > 9717), lowering kernel.perf_event_max_sample_rate to 20500

Here's the perf top output:

Samples: 16  of event 'cycles:ppp', 1 Hz, Event count (approx.): 428731477
Overhead  Shared O  Symbol
  30.73%  [kernel]  [k] menu_select
  25.94%  [kernel]  [k] do_idle
  22.27%  [kernel]  [k] get_next_timer_interrupt
  21.05%  [kernel]  [k] _perf_event_enable
   0.01%  [kernel]  [k] native_write_msr

And some more:

Samples: 18  of event 'cycles:ppp', 1 Hz, Event count (approx.): 2070828844
Overhead  Shared O  Symbol
  78.46%  [kernel]  [k] find_next_bit
  18.75%  [kernel]  [k] nr_iowait_cpu
   0.86%  [kernel]  [k] menu_select
   0.72%  [kernel]  [k] do_idle
   0.62%  [kernel]  [k] get_next_timer_interrupt
   0.59%  [kernel]  [k] _perf_event_enable
   0.00%  [kernel]  [k] native_write_msr

and more

Samples: 23  of event 'cycles:ppp', 1 Hz, Event count (approx.): 8895722211
Overhead  Shared Object     Symbol
  40.38%  [kernel]          [k] hrtimer_next_event_without
  16.04%  [kernel]          [k] get_page_from_freelist
  16.03%  libc-2.28.so      [.] 0x000000000007fd48
  16.03%  perl              [.] Perl_av_store
   6.84%  [kernel]          [k] release_pages
   3.68%  [kernel]          [k] find_next_bit
   0.88%  [kernel]          [k] nr_iowait_cpu
   0.04%  [kernel]          [k] menu_select
   0.03%  [kernel]          [k] do_idle
   0.03%  [kernel]          [k] get_next_timer_interrupt
   0.03%  [kernel]          [k] _perf_event_enable
   0.00%  [kernel]          [k] native_write_msr
Samples: 24  of event 'cycles:ppp', 1 Hz, Event count (approx.): 7349431964
Overhead  Shared Object     Symbol
  63.61%  [kernel]          [k] timekeeping_max_deferment
  14.69%  [kernel]          [k] hrtimer_next_event_without
   5.84%  [kernel]          [k] get_page_from_freelist
   5.83%  libc-2.28.so      [.] 0x000000000007fd48
   5.83%  perl              [.] Perl_av_store
   2.49%  [kernel]          [k] release_pages
   1.34%  [kernel]          [k] find_next_bit
   0.32%  [kernel]          [k] nr_iowait_cpu
   0.01%  [kernel]          [k] menu_select
   0.01%  [kernel]          [k] do_idle
   0.01%  [kernel]          [k] get_next_timer_interrupt
   0.01%  [kernel]          [k] _perf_event_enable
   0.00%  [kernel]          [k] native_write_msr

And during normal operation, this is what I get from perf:

Samples: 41  of event 'cycles:ppp', 2 Hz, Event count (approx.): 76165144189
Overhead  Shared Object       Symbol
  75.53%  [kernel]            [k] entry_SYSCALL_64_after_hwframe
  10.16%  [kernel]            [k] gcm_generic_mul
   2.01%  [kernel]            [k] lz4_compress_zfs
   1.20%  [kernel]            [k] syscall_return_via_sysret
   1.19%  [kernel]            [k] __percpu_up_read
   1.07%  [kernel]            [k] run_posix_cpu_timers
   1.07%  [kernel]            [k] kmem_cache_alloc
   1.07%  [kernel]            [k] aes_encrypt_amd64
   1.01%  [kernel]            [k] __close_fd
   0.94%  libpthread-2.28.so  [.] __close
   0.94%  [kernel]            [k] __x86_indirect_thunk_rax
   0.94%  liblzo2.so.2.0.0    [.] lzo_adler32
   0.93%  liblzo2.so.2.0.0    [.] lzo1x_decompress_safe
   0.91%  [kernel]            [k] __ptrace_may_access
   0.90%  [kernel]            [k] menu_select
   0.11%  [kernel]            [k] irq_work_run_list
   0.04%  [kernel]            [k] event_function_call
   0.00%  [kernel]            [k] native_write_msr

and encryption (while still OK):

Samples: 282  of event 'cycles:ppp', 2 Hz, Event count (approx.): 82194053347
Overhead  Shared Object       Symbol
  36.58%  [kernel]            [k] gcm_generic_mul
   4.03%  [kernel]            [k] lz4_compress_zfs
   2.82%  [kernel]            [k] aes_encrypt_amd64
   2.50%  [kernel]            [k] vdev_disk_dio_put
   2.48%  [kernel]            [k] entry_SYSCALL_64_after_hwframe
   2.16%  libcrypto.so.1.1    [.] 0x000000000009d128
   2.01%  libcrypto.so.1.1    [.] CRYPTO_ctr128_encrypt
   2.00%  libcrypto.so.1.1    [.] 0x000000000009d136
   1.55%  libcrypto.so.1.1    [.] 0x000000000009d0c8
   1.36%  libcrypto.so.1.1    [.] 0x000000000009d144
   1.15%  liblzo2.so.2.0.0    [.] lzo_adler32
   1.11%  [kernel]            [k] aes_xor_block
   1.04%  [kernel]            [k] copy_user_generic_unrolled
   1.02%  liblzo2.so.2.0.0    [.] lzo1x_decompress_safe
   0.99%  [kernel]            [k] gcm_init
   0.99%  [kernel]            [k] prepare_exit_to_usermode
   0.99%  libcrypto.so.1.1    [.] 0x000000000009d104
   0.99%  [kernel]            [k] psi_task_change
   0.92%  [kernel]            [k] __update_load_avg_cfs_rq
   0.91%  [kernel]            [k] mutex_lock
rincebrain commented 3 years ago

If it's not working well for you (understatement), then until someone offers a workaround or solution, not using encryption is probably a reasonable choice.

For the moment, since I've not been able to run down a reproducer for my own use, I'm building a kernel with more debug bits for the system I hit #12106 on, hoping that since many of the top calls in the stacktrace are the same, ameliorating it will help here.

I doubt anyone will, but if someone could synthesize a pool containing one of the datasets+snapshots that, when sent, sometimes results in this - having verified that just sending from that precise pool and snapshot(s) triggers it - and would be willing to share that pool (in this case, I'm imagining it as sparse file(s); the keys to unlock it would not be necessary, I expect), it would probably be extraordinarily helpful.

Of course, unless this happens to someone on an unimportant-but-encrypted-and-being-backed-up dataset some of the time, I would not be surprised if nobody volunteers even their encrypted data to someone for this purpose. So I'll keep tinkering. :)
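
For anyone who does have a shareable reproducer, a minimal sketch of a file-backed pool (all paths and names here are hypothetical); the backing file can be shared without ever handing over the key:

# Create a small encrypted pool on a sparse file, reproduce the panicking
# send from it, then export and share the backing file itself.
truncate -s 4G /var/tmp/repro-disk.img
zpool create repro /var/tmp/repro-disk.img
zfs create -o encryption=on -o keyformat=passphrase repro/data
# ... populate repro/data, snapshot it, and confirm that a `zfs send -w` of that
# snapshot still triggers the panic ...
zpool export repro
# ship /var/tmp/repro-disk.img (sparse-aware tools like tar --sparse keep it small)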

pquan commented 3 years ago

Well, some data is just encrypted for compliance reasons.

I don't want to come across as a whiner. I actually feel sad for not being able to isolate the issue better. ZFS has been rock solid over the years (well.. excluding some rocks we hit along the way).

I will try to synthesize a test where one server creates an artificially large dataset of files while snapshotting it and listing those snapshots. Then I'll try to send this dataset to another server, which will receive it, list its own snapshots, and delete some of the older snapshots along the way. This is what a sanoid/syncoid backup system does in reality, once every 5 minutes on each server. It's not the mere send/receive that is in play here. A sketch of such a loop follows below.
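
A bare-bones version of that synthetic loop might look like this on the sending side; all host, pool, and dataset names are invented, and the receive-side pruning is sketched as a comment:

# Every 5 minutes: write some data, snapshot, and raw-send the increment.
SRC=tank/synthetic; DST=backup/synthetic; REMOTE=backuphost
prev=""
while true; do
    dd if=/dev/urandom of=/tank/synthetic/blob.$RANDOM bs=1M count=64
    now="auto-$(date +%s)"
    zfs snapshot "$SRC@$now"
    if [ -z "$prev" ]; then
        zfs send -w "$SRC@$now" | ssh "$REMOTE" zfs receive -F "$DST"
    else
        zfs send -w -i "@$prev" "$SRC@$now" | ssh "$REMOTE" zfs receive "$DST"
    fi
    prev="$now"
    # Receiver side (separate job): list its own snapshots and prune old ones, e.g.
    #   zfs list -H -o name -t snapshot -S creation -d 1 backup/synthetic \
    #     | tail -n +11 | xargs -r -n1 zfs destroy
    sleep 300
done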

rincebrain commented 3 years ago

Sure, but I'd expect the same compliance reasons to probably preclude sharing.

I don't think you're whining, and even if I did, "my filesystem is crashing" is a pretty good basis. Avoiding hitting the issue is just a reasonable thing to do.

For you, it may not be. It's possible that in general it's not either, but other people have reported issues with (apparently) just sending, and the call stacks haven't (unless I've overlooked something) suggested anything else.

(I still think you should open a separate bug for the problems you saw that looked quite different earlier in the thread, assuming I didn't miss you doing so already.)
