openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.61k stars 1.75k forks source link

Kernel panic, NULL pointer dereference in zap_lockdir_impl #6639

Closed cytrinox closed 7 years ago

cytrinox commented 7 years ago

System information

Type Version/Name
Distribution Name Debian Stretch
Distribution Version 9.1
Linux Kernel 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06) x86_64 GNU/Linux
Architecture x86_64
ZFS Version git build (d9549cba9640cd3b09d76b8cbd54387728b7be24) - 2017-09-11
SPL Version git build (9df9692637aeee416f509c7f39655beb2d35b549) - 2017-08-11

Describe the problem you're observing

I was notified by a friend that the bleeding edge build from zfs git master freezes his new AMD Ryzen based server. To make sure this problem is related (or not) to Ryzen, I build a current module on my own and set up a physical server (Intel Xeon) with closest match possible to his zfs configuration. Suprising for me, I can reproduce the problem.

Long story short, I've set up a new Debian 9.1 installation with two disks, created partitions sd{a,b}[5-12] for testing purposes, added new pool with raidz2 (5 devs), 2 logs, 2 caches and one encrypted dataset with compression enabled.

Describe how to reproduce the problem

To reproduce, I've run rsync and dbench parallel (!) multiple times (runtime ~2h). During these runs, I have changed the dedup property for dataset and pool. After some time, the rsync/dbench processes stops and a kernel panic was logged.

# zpool create -f -o feature@encryption=enabled data raidz2 sda5 sdb5 sda6 sdb6 sda7 cache sda8 sdb8 log sda9 sdb9
# zfs compression=lz4 data
# zfs create -o keyformat=passphrase -o keylocation=file:///root/.diskkey -o encryption=aes-256-gcm data/test
# rsync -Pavi /usr /data/test/
# dbench -D /data/test/ 50
# zfs set dedup=on data/test
# zfs set dedup=on data

On the AMD Ryzen system, it is reproducible without using dedup. But his system has >10TB and it happens most time after 1-2TiB are transferred via rsync to a dataset. On my setup, I only have 250G for tests, so I have played with dedup to generate more I/O (unsure if dedup will really produce more I/O but for my case, it helped). At the time of writing, I was unable to test if the same kernel panic happens on my system without using dedup.

Include any warning/errors/backtraces from the system logs

~# zpool status
  pool: data
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda5    ONLINE       0     0     0
            sdb5    ONLINE       0     0     0
            sda6    ONLINE       0     0     0
            sdb6    ONLINE       0     0     0
            sda7    ONLINE       0     0     0
        logs
          sda9      ONLINE       0     0     0
          sdb9      ONLINE       0     0     0
        cache
          sda8      ONLINE       0     0     0
          sdb8      ONLINE       0     0     0

errors: No known data errors

Log from my system:

[ 7886.231978] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 7886.232040] IP: [<ffffffffc09b4e6f>] zap_lockdir_impl+0x2cf/0x700 [zfs]
[ 7886.232160] PGD 11c76b067
[ 7886.232168] PUD 10e967067
[ 7886.232195] PMD 0

[ 7886.232223] Oops: 0000 [#1] SMP
[ 7886.232249] Modules linked in: zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) coretemp                                               iTCO_wdt kvm_intel kvm iTCO_vendor_support ppdev serio_raw evdev pcspkr irqbypass sg shpchp parport_pc parport                                               i3200_edac lpc_ich mfd_core edac_core video button acpi_cpufreq ip_tables x_tables autofs4 ext4 crc16 jbd2 fsc                                              rypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache raid10 raid456 async_raid6_recov async                                              _memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear raid1 md_mod h                                              id_generic usbhid hid sr_mod cdrom sd_mod ahci libahci libata scsi_mod i2c_i801 i2c_smbus ehci_pci uhci_hcd ehc                                              i_hcd e1000e ptp pps_core usbcore usb_common floppy
[ 7886.232677] CPU: 3 PID: 4123 Comm: bash Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 7886.232732] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 7886.232783] task: ffff93f1acdb8d80 task.stack: ffffa2134466c000
[ 7886.232814] RIP: 0010:[<ffffffffc09b4e6f>]  [<ffffffffc09b4e6f>] zap_lockdir_impl+0x2cf/0x700 [zfs]
[ 7886.232915] RSP: 0018:ffffa2134466fb38  EFLAGS: 00010246
[ 7886.232945] RAX: 0000000000000000 RBX: ffff93f18ac13618 RCX: 000000000000018f
[ 7886.232979] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000130
[ 7886.233013] RBP: ffffa2134466fcf8 R08: 0000000000000001 R09: 0000000000000001
[ 7886.233047] R10: 2000000000000000 R11: 0000008000000000 R12: 0000000000000002
[ 7886.233081] R13: ffff93f2a8535000 R14: 0000000000000001 R15: 0000000000000000
[ 7886.233116] FS:  00007f195453bb40(0000) GS:ffff93f2afd80000(0000) knlGS:0000000000000000
[ 7886.233168] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7886.233199] CR2: 0000000000000000 CR3: 00000001084e7000 CR4: 00000000000006e0
[ 7886.233233] Stack:
[ 7886.233256]  0000000000000000 000000008ac13698 0000000000019251 0000000000000000
[ 7886.233313]  ffffffffc094eede 0000001200000000 000000000000004f 0000000000019251
[ 7886.233370]  0002000000000400 0000002c00000014 00000000000000a8 0000000001000001
[ 7886.233428] Call Trace:
[ 7886.233490]  [<ffffffffc094eede>] ? dnode_rele_and_unlock+0x4e/0x80 [zfs]
[ 7886.233568]  [<ffffffffc09b58de>] ? zap_lockdir+0x8e/0xb0 [zfs]
[ 7886.233643]  [<ffffffffc09b5aa6>] ? zap_cursor_retrieve+0x1a6/0x2e0 [zfs]
[ 7886.233710]  [<ffffffffc0936eda>] ? dmu_prefetch+0x15a/0x1e0 [zfs]
[ 7886.233786]  [<ffffffffc09d9805>] ? zfs_readdir+0x125/0x400 [zfs]
[ 7886.233821]  [<ffffffffbeb85c86>] ? __alloc_pages_nodemask+0xf6/0x260
[ 7886.233855]  [<ffffffffbebf8668>] ? mem_cgroup_commit_charge+0x78/0x4b0
[ 7886.233931]  [<ffffffffc09fa4cc>] ? zpl_iterate+0x4c/0x70 [zfs]
[ 7886.233963]  [<ffffffffbec1678b>] ? iterate_dir+0x16b/0x190
[ 7886.233994]  [<ffffffffbec16c78>] ? SyS_getdents+0x98/0x120
[ 7886.234026]  [<ffffffffbec169e0>] ? fillonedir+0xe0/0xe0
[ 7886.234058]  [<ffffffffbf00627b>] ? system_call_fast_compare_end+0xc/0x9b
[ 7886.234091] Code: 24 49 8d bf b8 00 00 00 0f 84 43 03 00 00 e8 39 ba 10 fe 8b 04 24 e9 e6 fd ff ff 48 8b 43                                               18 b9 8f 01 00 00 31 f6 bf 30 01 00 00 <48> 8b 10 48 8b 40 08 48 89 54 24 28 48 c7 c2 d8 ff a6 c0 48 89
[ 7886.234308] RIP  [<ffffffffc09b4e6f>] zap_lockdir_impl+0x2cf/0x700 [zfs]
[ 7886.234386]  RSP <ffffa2134466fb38>
[ 7886.234413] CR2: 0000000000000000
[ 7886.234683] ---[ end trace e97ec28f27bbc75b ]---

Log from the AMD Ryzen System:

[ 3377.636533] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 3377.636629] IP: zap_lockdir_impl+0x326/0x750 [zfs]
[ 3377.636639] PGD 0 
[ 3377.636640] P4D 0 

[ 3377.636653] Oops: 0000 [#1] PREEMPT SMP
[ 3377.636660] Modules linked in: zfs(P) ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter zunicode(P) nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi arc4 snd_hda_codec_realtek snd_hda_codec_generic iwlmvm mac80211 btusb btrtl btbcm btintel zcommon(P) snd_hda_intel bluetooth znvpair(P) iwlwifi zavl(P) snd_hda_codec icp(P) snd_hda_core ccp spl cfg80211 snd_hwdep rng_core nouveau drbg ansi_cprng amd64_edac_mod video joydev edac_mce_amd ttm ecdh_generic kvm_amd drm_kms_helper rfkill sg drm nvidiafb snd_pcm snd_timer efi_pstore kvm wmi_bmof mxm_wmi evdev snd soundcore irqbypass pcspkr efivars sp5100_tco shpchp wmi button acpi_cpufreq nct6775 hwmon_vid parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod raid10
[ 3377.636768]  multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 sd_mod hid_lenovo hid_generic usbhid hid raid1 md_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd i2c_piix4 ahci nvme libahci igb e1000e nvme_core dca xhci_pci ptp libata pps_core xhci_hcd scsi_mod usbcore gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core [last unloaded: zfs]
[ 3377.636835] CPU: 4 PID: 6018 Comm: rsync Tainted: P                4.13.0ct-zfs+ #1
[ 3377.636844] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P3.10 08/25/2017
[ 3377.636854] task: ffff9b92b8fae000 task.stack: ffffaf5521cc0000
[ 3377.636929] RIP: 0010:zap_lockdir_impl+0x326/0x750 [zfs]
[ 3377.636937] RSP: 0018:ffffaf5521cc3968 EFLAGS: 00010246
[ 3377.636945] RAX: 0000000000000000 RBX: ffff9b933fb37e80 RCX: 000000000000018f
[ 3377.636953] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000128
[ 3377.636961] RBP: ffffaf5521cc3aa0 R08: 0000000000000001 R09: 0000000000000001
[ 3377.636969] R10: 2000000000000000 R11: 0000008000000000 R12: 0000000000000002
[ 3377.636977] R13: ffff9b9407fea800 R14: 0000000000000001 R15: 0000000000000000
[ 3377.636986] FS:  00007fe50afbae80(0000) GS:ffff9b941e700000(0000) knlGS:0000000000000000
[ 3377.636995] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3377.637003] CR2: 0000000000000000 CR3: 0000000fc66e7000 CR4: 00000000003406e0
[ 3377.637011] Call Trace:
[ 3377.637075]  ? dnode_rele_and_unlock+0x6a/0xc0 [zfs]
[ 3377.637148]  ? zap_lockdir+0xb6/0xe0 [zfs]
[ 3377.637217]  ? zap_lookup_norm+0x7c/0x100 [zfs]
[ 3377.637285]  ? zap_lookup+0x2f/0x40 [zfs]
[ 3377.637354]  ? zfs_dirent_lock+0x619/0x650 [zfs]
[ 3377.637364]  ? preempt_count_add+0x56/0xa0
[ 3377.637373]  ? _raw_spin_lock+0x13/0x40
[ 3377.637380]  ? _raw_spin_unlock+0x16/0x40
[ 3377.637449]  ? zfs_dirlook+0xa1/0x2c0 [zfs]
[ 3377.637519]  ? zfs_lookup+0x37b/0x3e0 [zfs]
[ 3377.637588]  ? zpl_lookup+0xe8/0x210 [zfs]
[ 3377.637596]  ? __d_lookup+0x78/0x120
[ 3377.637605]  ? path_openat+0x10e9/0x1420
[ 3377.637614]  ? do_filp_open+0x94/0x100
[ 3377.637623]  ? ___cache_free+0x45/0x310
[ 3377.637631]  ? __check_object_size+0xc0/0x1c0
[ 3377.637642]  ? _raw_spin_unlock+0x16/0x40
[ 3377.637649]  ? __alloc_fd+0xaf/0x160
[ 3377.637657]  ? do_sys_open+0x142/0x240
[ 3377.637665]  ? entry_SYSCALL_64_fastpath+0x1e/0xa9
[ 3377.637673] Code: 24 49 8d bf b8 00 00 00 0f 84 42 03 00 00 e8 72 3e dd ee 8b 04 24 e9 bd fd ff ff 48 8b 43 18 b9 8f 01 00 00 31 f6 bf 28 01 00 00 <48> 8b 10 48 8b 40 08 48 89 54 24 28 48 c7 c2 d8 e7 39 c1 48 89 
[ 3377.637789] RIP: zap_lockdir_impl+0x326/0x750 [zfs] RSP: ffffaf5521cc3968
[ 3377.637797] CR2: 0000000000000000
[ 3377.643254] ---[ end trace f388486b26180082 ]---
[ 3378.151244] BUG: unable to handle kernel paging request at fffffffef89e7070
[ 3378.151275] IP: virt_efi_get_next_variable+0x9b/0x180
[ 3378.151286] PGD b8260c067 
[ 3378.151287] P4D b8260c067 
[ 3378.151294] PUD 0 

[ 3378.151313] Oops: 0000 [#2] PREEMPT SMP
[ 3378.152012] Modules linked in: zfs(P) ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter zunicode(P) nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi arc4 snd_hda_codec_realtek snd_hda_codec_generic iwlmvm mac80211 btusb btrtl btbcm btintel zcommon(P) snd_hda_intel bluetooth znvpair(P) iwlwifi zavl(P) snd_hda_codec icp(P) snd_hda_core ccp spl cfg80211 snd_hwdep rng_core nouveau drbg ansi_cprng amd64_edac_mod video joydev edac_mce_amd ttm ecdh_generic kvm_amd drm_kms_helper rfkill sg drm nvidiafb snd_pcm snd_timer efi_pstore kvm wmi_bmof mxm_wmi evdev snd soundcore irqbypass pcspkr efivars sp5100_tco shpchp wmi button acpi_cpufreq nct6775 hwmon_vid parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod raid10
[ 3378.155138]  multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 sd_mod hid_lenovo hid_generic usbhid hid raid1 md_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd i2c_piix4 ahci nvme libahci igb e1000e nvme_core dca xhci_pci ptp libata pps_core xhci_hcd scsi_mod usbcore gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core [last unloaded: zfs]
[ 3378.158270] CPU: 4 PID: 830 Comm: kworker/4:2 Tainted: P      D         4.13.0ct-zfs+ #1
[ 3378.159248] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P3.10 08/25/2017
[ 3378.160579] Workqueue: events efivar_update_sysfs_entries [efivars]
[ 3378.161230] task: ffff9b940f310080 task.stack: ffffaf5507bf4000
[ 3378.162202] RIP: 0010:virt_efi_get_next_variable+0x9b/0x180
[ 3378.163172] RSP: 0018:ffffaf5507bf7d80 EFLAGS: 00010246
[ 3378.164147] RAX: fffffffef89e7018 RBX: ffff9b92d4264000 RCX: ffffaf5507bf7de0
[ 3378.165118] RDX: ffff9b92d4264000 RSI: ffffaf5507bf7dd8 RDI: 0000000000000246
[ 3378.166076] RBP: ffffaf5507bf7de0 R08: ffffffffb0c09000 R09: 0000000b8260c000
[ 3378.167021] R10: 00000000dd1920ff R11: 0000000000000000 R12: ffffaf5507bf7dd8
[ 3378.167971] R13: 8000000000000015 R14: 0000000000000246 R15: ffffffffb0cc8a88
[ 3378.168927] FS:  0000000000000000(0000) GS:ffff9b941e700000(0000) knlGS:0000000000000000
[ 3378.169885] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3378.170848] CR2: fffffffef89e7070 CR3: 0000000b82609000 CR4: 00000000003406e0
[ 3378.171789] Call Trace:
[ 3378.172936]  ? efivar_init+0x11e/0x3b0
[ 3378.173803]  ? copy_out_compat+0xc0/0xc0 [efivars]
[ 3378.174646]  ? finish_task_switch+0x80/0x220
[ 3378.175458]  ? __update_idle_core+0x20/0xb0
[ 3378.176236]  ? efivar_update_sysfs_entries+0x24/0x60 [efivars]
[ 3378.177399]  ? process_one_work+0x178/0x3b0
[ 3378.177922]  ? worker_thread+0x4a/0x3b0
[ 3378.178599]  ? kthread+0xfc/0x130
[ 3378.179329]  ? process_one_work+0x3b0/0x3b0
[ 3378.180049]  ? kthread_create_on_node+0x70/0x70
[ 3378.180774]  ? ret_from_fork+0x25/0x30
[ 3378.181487] Code: d5 b5 ff e8 a8 1a af ff 80 3d 99 90 96 00 00 75 71 9c 58 0f 1f 44 00 00 49 89 c6 48 8b 05 8e 13 7e 00 48 89 e9 48 89 da 4c 89 e6 <48> 8b 40 58 48 8b 78 50 e8 18 d8 b2 ff 48 c7 c6 5d 0b a5 b0 4c 
[ 3378.182268] RIP: virt_efi_get_next_variable+0x9b/0x180 RSP: ffffaf5507bf7d80
[ 3378.183036] CR2: fffffffef89e7070
[ 3378.183796] ---[ end trace f388486b26180083 ]---
[ 3378.184552] BUG: scheduling while atomic: kworker/4:2/830/0x00000003
[ 3378.184554] Modules linked in: zfs(P) ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter zunicode(P) nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi arc4 snd_hda_codec_realtek snd_hda_codec_generic iwlmvm mac80211 btusb btrtl btbcm btintel zcommon(P) snd_hda_intel bluetooth znvpair(P) iwlwifi zavl(P) snd_hda_codec icp(P) snd_hda_core ccp spl cfg80211 snd_hwdep rng_core nouveau drbg ansi_cprng amd64_edac_mod video joydev edac_mce_amd ttm ecdh_generic kvm_amd drm_kms_helper rfkill sg drm nvidiafb snd_pcm snd_timer efi_pstore kvm wmi_bmof mxm_wmi evdev snd soundcore irqbypass pcspkr efivars sp5100_tco shpchp wmi button acpi_cpufreq nct6775 hwmon_vid parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod raid10
[ 3378.184660]  multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 sd_mod hid_lenovo hid_generic usbhid hid raid1 md_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd i2c_piix4 ahci nvme libahci igb e1000e nvme_core dca xhci_pci ptp libata pps_core xhci_hcd scsi_mod usbcore gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core [last unloaded: zfs]
[ 3378.184699] Preemption disabled at:
[ 3378.184703] [<ffffffffb052cfb3>] virt_efi_get_next_variable+0x73/0x180
[ 3378.184716] CPU: 4 PID: 830 Comm: kworker/4:2 Tainted: P      D         4.13.0ct-zfs+ #1
[ 3378.184719] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P3.10 08/25/2017
[ 3378.184727] Workqueue: events efivar_update_sysfs_entries [efivars]
[ 3378.184732] Call Trace:
[ 3378.184737]  ? dump_stack+0x5c/0x86
[ 3378.184741]  ? virt_efi_get_next_variable+0x73/0x180
[ 3378.184745]  ? __schedule_bug+0x72/0xc0
[ 3378.184749]  ? __schedule+0x675/0x8b0
[ 3378.184754]  ? schedule+0x3c/0x90
[ 3378.184757]  ? schedule_timeout+0x225/0x3b0
[ 3378.184763]  ? __down_interruptible+0x90/0x120
[ 3378.184768]  ? down_interruptible+0x51/0x60
[ 3378.184772]  ? efivar_entry_set_safe+0x7a/0x1e0
[ 3378.184778]  ? efi_pstore_write+0x11e/0x160 [efi_pstore]
[ 3378.184783]  ? build_tree+0x1e4/0x440
[ 3378.184791]  ? pstore_dump+0x1a6/0x360
[ 3378.184798]  ? kmsg_dump+0xc7/0xf0
[ 3378.184803]  ? oops_end+0x86/0xe0
[ 3378.184807]  ? no_context+0x1dd/0x420
[ 3378.184813]  ? page_fault+0x28/0x30
[ 3378.184818]  ? virt_efi_get_next_variable+0x9b/0x180
[ 3378.184823]  ? efivar_init+0x11e/0x3b0
[ 3378.184827]  ? copy_out_compat+0xc0/0xc0 [efivars]
[ 3378.184832]  ? finish_task_switch+0x80/0x220
[ 3378.184836]  ? __update_idle_core+0x20/0xb0
[ 3378.184841]  ? efivar_update_sysfs_entries+0x24/0x60 [efivars]
[ 3378.184846]  ? process_one_work+0x178/0x3b0
[ 3378.184849]  ? worker_thread+0x4a/0x3b0
[ 3378.184854]  ? kthread+0xfc/0x130
[ 3378.184858]  ? process_one_work+0x3b0/0x3b0
[ 3378.184862]  ? kthread_create_on_node+0x70/0x70
[ 3378.184866]  ? ret_from_fork+0x25/0x30
[ 3378.184876] ------------[ cut here ]------------
[ 3378.184883] WARNING: CPU: 4 PID: 830 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x3f/0x470
[ 3378.184885] Modules linked in: zfs(P) ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter zunicode(P) nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi arc4 snd_hda_codec_realtek snd_hda_codec_generic iwlmvm mac80211 btusb btrtl btbcm btintel zcommon(P) snd_hda_intel bluetooth znvpair(P) iwlwifi zavl(P) snd_hda_codec icp(P) snd_hda_core ccp spl cfg80211 snd_hwdep rng_core nouveau drbg ansi_cprng amd64_edac_mod video joydev edac_mce_amd ttm ecdh_generic kvm_amd drm_kms_helper rfkill sg drm nvidiafb snd_pcm snd_timer efi_pstore kvm wmi_bmof mxm_wmi evdev snd soundcore irqbypass pcspkr efivars sp5100_tco shpchp wmi button acpi_cpufreq nct6775 hwmon_vid parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod raid10
[ 3378.184930]  multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 sd_mod hid_lenovo hid_generic usbhid hid raid1 md_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd i2c_piix4 ahci nvme libahci igb e1000e nvme_core dca xhci_pci ptp libata pps_core xhci_hcd scsi_mod usbcore gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core [last unloaded: zfs]
[ 3378.184955] CPU: 4 PID: 830 Comm: kworker/4:2 Tainted: P      D W       4.13.0ct-zfs+ #1
[ 3378.184957] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P3.10 08/25/2017
[ 3378.184961] Workqueue: events efivar_update_sysfs_entries [efivars]
[ 3378.184964] task: ffff9b940f310080 task.stack: ffffaf5507bf4000
[ 3378.184968] RIP: 0010:rcu_note_context_switch+0x3f/0x470
[ 3378.184970] RSP: 0018:ffffaf5507bf7820 EFLAGS: 00010002
[ 3378.184972] RAX: 0000000000000001 RBX: ffff9b940f310080 RCX: 0000000080000003
[ 3378.184974] RDX: 0000000080000001 RSI: ffffffffb0a506a2 RDI: ffffffffb0a20e55
[ 3378.184976] RBP: 0000000000000000 R08: 0000000000000064 R09: 00000000000004f1
[ 3378.184978] R10: ffffffffb0c06a80 R11: 0000000000000000 R12: ffff9b941e71adc0
[ 3378.184980] R13: ffff9b940f310080 R14: 0000000000000000 R15: ffff9b940e354000
[ 3378.184982] FS:  0000000000000000(0000) GS:ffff9b941e700000(0000) knlGS:0000000000000000
[ 3378.184984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3378.184986] CR2: fffffffef89e7070 CR3: 0000000b82609000 CR4: 00000000003406e0
[ 3378.184988] Call Trace:
[ 3378.184992]  ? __schedule+0x97/0x8b0
[ 3378.184995]  ? schedule+0x3c/0x90
[ 3378.184999]  ? schedule_timeout+0x225/0x3b0
[ 3378.185002]  ? __down_interruptible+0x90/0x120
[ 3378.185005]  ? down_interruptible+0x51/0x60
[ 3378.185008]  ? efivar_entry_set_safe+0x7a/0x1e0
[ 3378.185012]  ? efi_pstore_write+0x11e/0x160 [efi_pstore]
[ 3378.185015]  ? build_tree+0x1e4/0x440
[ 3378.185019]  ? pstore_dump+0x1a6/0x360
[ 3378.185023]  ? kmsg_dump+0xc7/0xf0
[ 3378.185026]  ? oops_end+0x86/0xe0
[ 3378.185029]  ? no_context+0x1dd/0x420
[ 3378.185033]  ? page_fault+0x28/0x30
[ 3378.185036]  ? virt_efi_get_next_variable+0x9b/0x180
[ 3378.185039]  ? efivar_init+0x11e/0x3b0
[ 3378.185042]  ? copy_out_compat+0xc0/0xc0 [efivars]
[ 3378.185045]  ? finish_task_switch+0x80/0x220
[ 3378.185049]  ? __update_idle_core+0x20/0xb0
[ 3378.185052]  ? efivar_update_sysfs_entries+0x24/0x60 [efivars]
[ 3378.185056]  ? process_one_work+0x178/0x3b0
[ 3378.185059]  ? worker_thread+0x4a/0x3b0
[ 3378.185062]  ? kthread+0xfc/0x130
[ 3378.185064]  ? process_one_work+0x3b0/0x3b0
[ 3378.185067]  ? kthread_create_on_node+0x70/0x70
[ 3378.185070]  ? ret_from_fork+0x25/0x30
[ 3378.185073] Code: 10 0f 1f 44 00 00 e8 f1 df ff ff 65 48 8b 1c 25 80 d1 00 00 40 84 ed 8b 83 f8 03 00 00 0f 85 ee 00 00 00 85 c0 0f 8e ee 00 00 00 <0f> ff 80 bb fc 03 00 00 00 0f 84 67 01 00 00 e8 bd c2 ff ff e8 
[ 3378.185111] ---[ end trace f388486b26180084 ]---
[ 3382.080466] ------------[ cut here ]------------
[ 3382.080477] WARNING: CPU: 4 PID: 5651 at arch/x86/kernel/fpu/core.c:46 __kernel_fpu_begin+0xb1/0xd0
[ 3382.080480] Modules linked in: zfs(P) ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter zunicode(P) nls_ascii nls_cp437 vfat fat snd_hda_codec_hdmi arc4 snd_hda_codec_realtek snd_hda_codec_generic iwlmvm mac80211 btusb btrtl btbcm btintel zcommon(P) snd_hda_intel bluetooth znvpair(P) iwlwifi zavl(P) snd_hda_codec icp(P) snd_hda_core ccp spl cfg80211 snd_hwdep rng_core nouveau drbg ansi_cprng amd64_edac_mod video joydev edac_mce_amd ttm ecdh_generic kvm_amd drm_kms_helper rfkill sg drm nvidiafb snd_pcm snd_timer efi_pstore kvm wmi_bmof mxm_wmi evdev snd soundcore irqbypass pcspkr efivars sp5100_tco shpchp wmi button acpi_cpufreq nct6775 hwmon_vid parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto algif_skcipher af_alg dm_crypt dm_mod raid10
[ 3382.080538]  multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 sd_mod hid_lenovo hid_generic usbhid hid raid1 md_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd i2c_piix4 ahci nvme libahci igb e1000e nvme_core dca xhci_pci ptp libata pps_core xhci_hcd scsi_mod usbcore gpio_amdpt gpio_generic i2c_designware_platform i2c_designware_core [last unloaded: zfs]
[ 3382.080575] CPU: 4 PID: 5651 Comm: z_wr_iss Tainted: P      D W       4.13.0ct-zfs+ #1
[ 3382.080578] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P3.10 08/25/2017
[ 3382.080582] task: ffff9b93ff4e6000 task.stack: ffffaf550e4dc000
[ 3382.080586] RIP: 0010:__kernel_fpu_begin+0xb1/0xd0
[ 3382.080589] RSP: 0018:ffffaf550e4df9b0 EFLAGS: 00010202
[ 3382.080593] RAX: 0000000080000001 RBX: ffffaf55322dd000 RCX: ffff9b93ff4e6000
[ 3382.080596] RDX: 0000000000000000 RSI: ffffaf55322dd000 RDI: ffffffffc07510fd
[ 3382.080599] RBP: ffffaf55322e4c00 R08: ffffaf550e4dfaa0 R09: 0000000000000000
[ 3382.080601] R10: ffff9b93ffaae000 R11: ffffaf550e4dfd48 R12: ffffaf550e4dfac0
[ 3382.080604] R13: 0000000000007c00 R14: 0000000000000000 R15: ffffffffc13a1a20
[ 3382.080608] FS:  0000000000000000(0000) GS:ffff9b941e700000(0000) knlGS:0000000000000000
[ 3382.080612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3382.080614] CR2: fffffffef89e7070 CR3: 0000000b82609000 CR4: 00000000003406e0
[ 3382.080616] Call Trace:
[ 3382.080626]  ? fletcher_4_avx2_native+0x22/0xb0 [zcommon]
[ 3382.080634]  ? abd_fletcher_4_iter+0x79/0xc0 [zcommon]
[ 3382.080641]  ? fletcher_4_incremental_native+0x170/0x170 [zcommon]
[ 3382.080686]  ? abd_iterate_func+0x99/0x110 [zfs]
[ 3382.080747]  ? abd_fletcher_4_native+0x75/0xb0 [zfs]
[ 3382.080754]  ? cache_grow_end+0x14e/0x180
[ 3382.080759]  ? kmem_cache_alloc_node_trace+0x129/0x600
[ 3382.080767]  ? spl_kmem_alloc+0xab/0x1a0 [spl]
[ 3382.080826]  ? zio_checksum_compute+0x186/0x4c0 [zfs]
[ 3382.080881]  ? spa_do_crypt_abd+0x301/0x410 [zfs]
[ 3382.080888]  ? spl_kmem_alloc+0xab/0x1a0 [spl]
[ 3382.080946]  ? zio_push_transform+0x48/0xb0 [zfs]
[ 3382.081004]  ? zio_encrypt+0x5a9/0x740 [zfs]
[ 3382.081062]  ? zio_checksum_generate+0x42/0x60 [zfs]
[ 3382.081116]  ? zio_execute+0x81/0xe0 [zfs]
[ 3382.081124]  ? taskq_thread+0x2d0/0x500 [spl]
[ 3382.081129]  ? wake_up_q+0x70/0x70
[ 3382.081184]  ? zio_reexecute+0x3c0/0x3c0 [zfs]
[ 3382.081188]  ? kthread+0xfc/0x130
[ 3382.081196]  ? taskq_thread_spawn+0x50/0x50 [spl]
[ 3382.081199]  ? kthread_create_on_node+0x70/0x70
[ 3382.081202]  ? kthread_create_on_node+0x70/0x70
[ 3382.081208]  ? ret_from_fork+0x25/0x30
[ 3382.081211] Code: 74 39 b8 ff ff ff ff 89 c2 48 0f c7 2f 31 c0 85 c0 75 1c f3 c3 0f ff eb 9e 48 0f ae 81 00 0c 00 00 66 0f 1f 84 00 00 00 00 00 c3 <0f> ff eb 93 0f ff 66 0f 1f 84 00 00 00 00 00 c3 0f ff eb c3 90 
[ 3382.081254] ---[ end trace f388486b26180085 ]---
[ 3399.159036] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 3399.159043]  Tasks blocked on level-0 rcu_node (CPUs 0-15): P830
[ 3399.159047]  (detected by 12, t=5252 jiffies, g=136347, c=136346, q=39241)
[ 3399.159053] kworker/4:2     S    0   830      2 0x00000000
[ 3399.159065] Workqueue: events efivar_update_sysfs_entries [efivars]
[ 3399.159068] Call Trace:
[ 3399.159076]  ? __schedule+0x3ef/0x8b0
[ 3399.159081]  ? schedule+0x3c/0x90
[ 3399.159084]  ? schedule_timeout+0x225/0x3b0
[ 3399.159088]  ? __down_interruptible+0x90/0x120
[ 3399.159093]  ? down_interruptible+0x51/0x60
[ 3399.159098]  ? efivar_entry_set_safe+0x7a/0x1e0
[ 3399.159103]  ? efi_pstore_write+0x11e/0x160 [efi_pstore]
[ 3399.159108]  ? build_tree+0x1e4/0x440
[ 3399.159114]  ? pstore_dump+0x1a6/0x360
[ 3399.159119]  ? kmsg_dump+0xc7/0xf0
[ 3399.159123]  ? oops_end+0x86/0xe0
[ 3399.159127]  ? no_context+0x1dd/0x420
[ 3399.159131]  ? page_fault+0x28/0x30
[ 3399.159135]  ? virt_efi_get_next_variable+0x9b/0x180
[ 3399.159139]  ? efivar_init+0x11e/0x3b0
[ 3399.159142]  ? copy_out_compat+0xc0/0xc0 [efivars]
[ 3399.159146]  ? finish_task_switch+0x80/0x220
[ 3399.159150]  ? __update_idle_core+0x20/0xb0
[ 3399.159154]  ? efivar_update_sysfs_entries+0x24/0x60 [efivars]
[ 3399.159158]  ? process_one_work+0x178/0x3b0
[ 3399.159161]  ? worker_thread+0x4a/0x3b0
[ 3399.159165]  ? kthread+0xfc/0x130
[ 3399.159167]  ? process_one_work+0x3b0/0x3b0
[ 3399.159171]  ? kthread_create_on_node+0x70/0x70
[ 3399.159174]  ? ret_from_fork+0x25/0x30
[ 3399.159177] kworker/4:2     S    0   830      2 0x00000000
[ 3399.159183] Workqueue: events efivar_update_sysfs_entries [efivars]
[ 3399.159185] Call Trace:
[ 3399.159188]  ? __schedule+0x3ef/0x8b0
[ 3399.159192]  ? schedule+0x3c/0x90
[ 3399.159195]  ? schedule_timeout+0x225/0x3b0
[ 3399.159199]  ? __down_interruptible+0x90/0x120
[ 3399.159202]  ? down_interruptible+0x51/0x60
[ 3399.159205]  ? efivar_entry_set_safe+0x7a/0x1e0
[ 3399.159209]  ? efi_pstore_write+0x11e/0x160 [efi_pstore]
[ 3399.159212]  ? build_tree+0x1e4/0x440
[ 3399.159217]  ? pstore_dump+0x1a6/0x360
[ 3399.159221]  ? kmsg_dump+0xc7/0xf0
[ 3399.159224]  ? oops_end+0x86/0xe0
[ 3399.159227]  ? no_context+0x1dd/0x420
[ 3399.159231]  ? page_fault+0x28/0x30
[ 3399.159236]  ? virt_efi_get_next_variable+0x9b/0x180
[ 3399.159239]  ? efivar_init+0x11e/0x3b0
[ 3399.159242]  ? copy_out_compat+0xc0/0xc0 [efivars]
[ 3399.159246]  ? finish_task_switch+0x80/0x220
[ 3399.159249]  ? __update_idle_core+0x20/0xb0
[ 3399.159253]  ? efivar_update_sysfs_entries+0x24/0x60 [efivars]
[ 3399.159256]  ? process_one_work+0x178/0x3b0
[ 3399.159259]  ? worker_thread+0x4a/0x3b0
[ 3399.159263]  ? kthread+0xfc/0x130
[ 3399.159265]  ? process_one_work+0x3b0/0x3b0
[ 3399.159269]  ? kthread_create_on_node+0x70/0x70
[ 3399.159272]  ? ret_from_fork+0x25/0x30
[ 3462.178211] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 3462.178220]  Tasks blocked on level-0 rcu_node (CPUs 0-15): P830
[ 3462.178223]  (detected by 12, t=21007 jiffies, g=136347, c=136346, q=149095)
[ 3462.178230] kworker/4:2     S    0   830      2 0x00000000
[ 3462.178241] Workqueue: events efivar_update_sysfs_entries [efivars]
[ 3462.178244] Call Trace:
[ 3462.178253]  ? __schedule+0x3ef/0x8b0
[ 3462.178257]  ? schedule+0x3c/0x90
[ 3462.178261]  ? schedule_timeout+0x225/0x3b0
[ 3462.178266]  ? __down_interruptible+0x90/0x120
[ 3462.178270]  ? down_interruptible+0x51/0x60
[ 3462.178275]  ? efivar_entry_set_safe+0x7a/0x1e0
[ 3462.178281]  ? efi_pstore_write+0x11e/0x160 [efi_pstore]
[ 3462.178285]  ? build_tree+0x1e4/0x440
[ 3462.178291]  ? pstore_dump+0x1a6/0x360
[ 3462.178296]  ? kmsg_dump+0xc7/0xf0
[ 3462.178300]  ? oops_end+0x86/0xe0
[ 3462.178304]  ? no_context+0x1dd/0x420
[ 3462.178309]  ? page_fault+0x28/0x30
[ 3462.178313]  ? virt_efi_get_next_variable+0x9b/0x180
[ 3462.178317]  ? efivar_init+0x11e/0x3b0
[ 3462.178320]  ? copy_out_compat+0xc0/0xc0 [efivars]
[ 3462.178324]  ? finish_task_switch+0x80/0x220
[ 3462.178328]  ? __update_idle_core+0x20/0xb0
[ 3462.178332]  ? efivar_update_sysfs_entries+0x24/0x60 [efivars]
[ 3462.178336]  ? process_one_work+0x178/0x3b0
[ 3462.178339]  ? worker_thread+0x4a/0x3b0
[ 3462.178342]  ? kthread+0xfc/0x130
[ 3462.178345]  ? process_one_work+0x3b0/0x3b0
[ 3462.178348]  ? kthread_create_on_node+0x70/0x70
[ 3462.178352]  ? ret_from_fork+0x25/0x30
[ 3462.178355] kworker/4:2     S    0   830      2 0x00000000
[ 3462.178360] Workqueue: events efivar_update_sysfs_entries [efivars]
[ 3462.178362] Call Trace:
[ 3462.178366]  ? __schedule+0x3ef/0x8b0
[ 3462.178369]  ? schedule+0x3c/0x90
[ 3462.178372]  ? schedule_timeout+0x225/0x3b0
[ 3462.178376]  ? __down_interruptible+0x90/0x120
[ 3462.178379]  ? down_interruptible+0x51/0x60
[ 3462.178382]  ? efivar_entry_set_safe+0x7a/0x1e0
[ 3462.178386]  ? efi_pstore_write+0x11e/0x160 [efi_pstore]
[ 3462.178389]  ? build_tree+0x1e4/0x440
[ 3462.178394]  ? pstore_dump+0x1a6/0x360
[ 3462.178398]  ? kmsg_dump+0xc7/0xf0
[ 3462.178401]  ? oops_end+0x86/0xe0
[ 3462.178404]  ? no_context+0x1dd/0x420
[ 3462.178409]  ? page_fault+0x28/0x30
[ 3462.178413]  ? virt_efi_get_next_variable+0x9b/0x180
[ 3462.178416]  ? efivar_init+0x11e/0x3b0
[ 3462.178419]  ? copy_out_compat+0xc0/0xc0 [efivars]
[ 3462.178423]  ? finish_task_switch+0x80/0x220
[ 3462.178426]  ? __update_idle_core+0x20/0xb0
[ 3462.178430]  ? efivar_update_sysfs_entries+0x24/0x60 [efivars]
[ 3462.178433]  ? process_one_work+0x178/0x3b0
[ 3462.178436]  ? worker_thread+0x4a/0x3b0
[ 3462.178440]  ? kthread+0xfc/0x130
[ 3462.178443]  ? process_one_work+0x3b0/0x3b0
[ 3462.178446]  ? kthread_create_on_node+0x70/0x70
[ 3462.178449]  ? ret_from_fork+0x25/0x30
cehteh commented 7 years ago

note from the owner of the said 'Ryzen' machine: I experience random freezes just by having the spl and zfs modules loaded (and automount the filesystem, but not accessing them). Nothing logged, just crash or freeze. I am still sorting out if this is due the infamous Ryzen Segfault bug or can be isolated to the ZFS modules. Anyway before starting with ZFS the machine was stable without freezes/kernel crashes even under stress tests, Segfaults can be triggered with the 'kill-ryzen.sh' and the processor is in progress to be RMA'ed.

drescherjm commented 7 years ago

Here is a thread about the Ryzen segfault bug: https://community.amd.com/thread/215773

cytrinox commented 7 years ago

I have found a procedure to always reproduce the problem on my Intel system. With the given encrypted data/test dataset, I run (after reboot):

for i in a b c d e f g; do (rsync -a /usr /data/test/$i &); done
dbench -D /data/test/ 50

After few minutes, zfs crash:

[   10.333113] ZFS: Loaded module v0.7.0-76_gb107b2304, ZFS pool version 5000, ZFS filesystem version 5
[   17.172589] IPv6: ADDRCONF(NETDEV_UP): enp13s0: link is not ready
[   19.692956] e1000e: enp13s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   19.693092] IPv6: ADDRCONF(NETDEV_CHANGE): enp13s0: link becomes ready
[ 2855.951983] perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[ 2914.384128] perf: interrupt took too long (3147 > 3132), lowering kernel.perf_event_max_sample_rate to 63500
[ 3084.585612] perf: interrupt took too long (3935 > 3933), lowering kernel.perf_event_max_sample_rate to 50750
[ 3440.467542] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 3440.467572] IP: [<ffffffffc1065e6f>] zap_lockdir_impl+0x2cf/0x700 [zfs]
[ 3440.467701] PGD 0

[ 3440.467721] Oops: 0000 [#1] SMP
[ 3440.467731] Modules linked in: zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) iTCO_wdt                                               iTCO_vendor_support kvm_intel ppdev evdev kvm parport_pc lpc_ich serio_raw pcspkr irqbypass sg mfd_core parpor                                              t video shpchp i3200_edac button edac_core acpi_cpufreq w83793 w83627hf hwmon_vid coretemp ip_tables x_tables a                                              utofs4 ext4 crc16 jbd2 fscrypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache raid10 raid4                                              56 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multi                                              path linear raid1 md_mod sr_mod cdrom sd_mod hid_generic usbhid hid ahci libahci i2c_i801 i2c_smbus libata e100                                              0e ptp scsi_mod uhci_hcd ehci_pci ehci_hcd pps_core usbcore usb_common floppy
[ 3440.468052] CPU: 1 PID: 3403 Comm: dbench Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 3440.468075] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 3440.468094] task: ffff95840978e440 task.stack: ffffa15397d90000
[ 3440.468108] RIP: 0010:[<ffffffffc1065e6f>]  [<ffffffffc1065e6f>] zap_lockdir_impl+0x2cf/0x700 [zfs]
[ 3440.468195] RSP: 0018:ffffa15397d93ba8  EFLAGS: 00010246
[ 3440.468227] RAX: 0000000000000000 RBX: ffff9583995d2000 RCX: 000000000000018f
[ 3440.468262] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000130
[ 3440.468296] RBP: ffffa15397d93ce0 R08: 0000000000000001 R09: 0000000000000001
[ 3440.468331] R10: 2000000000000000 R11: 0000008000000000 R12: 0000000000000002
[ 3440.468365] R13: ffff958427d99800 R14: 0000000000000001 R15: 0000000000000000
[ 3440.468401] FS:  00007fb84d5ab700(0000) GS:ffff95842fc80000(0000) knlGS:0000000000000000
[ 3440.468454] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3440.468486] CR2: 0000000000000000 CR3: 00000000c927e000 CR4: 00000000000006e0
[ 3440.468520] Stack:
[ 3440.468544]  0000000000000000 00000000995d2080 0000000000000022 ffffffffc1120f20
[ 3440.468603]  ffffffffc0fffede 0000001200000000 000000000000004f 0000000000000022
[ 3440.468661]  0002000000000400 0000002c00000014 00000000000000a8 0000000001000001
[ 3440.468719] Call Trace:
[ 3440.468783]  [<ffffffffc0fffede>] ? dnode_rele_and_unlock+0x4e/0x80 [zfs]
[ 3440.468861]  [<ffffffffc10668de>] ? zap_lockdir+0x8e/0xb0 [zfs]
[ 3440.468937]  [<ffffffffc1066d38>] ? zap_lookup_norm+0x68/0xe0 [zfs]
[ 3440.469013]  [<ffffffffc1066ddf>] ? zap_lookup+0x2f/0x40 [zfs]
[ 3440.469088]  [<ffffffffc106fe45>] ? zfs_dirent_lock+0x645/0x670 [zfs]
[ 3440.469124]  [<ffffffffa9fabfb3>] ? list_lru_del+0xd3/0x120
[ 3440.469199]  [<ffffffffc108a3a6>] ? zfs_rmdir+0x156/0x490 [zfs]
[ 3440.469273]  [<ffffffffc10ababe>] ? zpl_rmdir+0x5e/0xb0 [zfs]
[ 3440.469306]  [<ffffffffaa00e30e>] ? vfs_rmdir+0xae/0x120
[ 3440.469337]  [<ffffffffaa012726>] ? do_rmdir+0x216/0x240
[ 3440.469370]  [<ffffffffaa40627b>] ? system_call_fast_compare_end+0xc/0x9b
[ 3440.469403] Code: 24 49 8d bf b8 00 00 00 0f 84 43 03 00 00 e8 39 aa e5 e8 8b 04 24 e9 e6 fd ff ff 48 8b 43                                               18 b9 8f 01 00 00 31 f6 bf 30 01 00 00 <48> 8b 10 48 8b 40 08 48 89 54 24 28 48 c7 c2 d8 0f 12 c1 48 89
[ 3440.469621] RIP  [<ffffffffc1065e6f>] zap_lockdir_impl+0x2cf/0x700 [zfs]
[ 3440.469699]  RSP <ffffa15397d93ba8>
[ 3440.469725] CR2: 0000000000000000
[ 3440.470016] ---[ end trace e5c42dd108a3cf94 ]---
cytrinox commented 7 years ago

With my test procedure in above comment, I've run the same test on an unencrypted dataset twice, without any kernel panic. Seems this issue is related to encryption.

tcaputi commented 7 years ago

@cytrinox so is this the complete set of commands to reproduce the issue?:

# zpool create -f -o feature@encryption=enabled data raidz2 sda5 sdb5 sda6 sdb6 sda7 cache sda8 sdb8 log sda9 sdb9
# zfs compression=lz4 data
# zfs create -o keyformat=passphrase -o keylocation=file:///root/.diskkey -o encryption=aes-256-gcm data/test
# for i in a b c d e f g; do (rsync -a /usr /data/test/$i &); done
# dbench -D /data/test/ 50
cytrinox commented 7 years ago

@tcaputi Yes, these are the steps I've run on my test machine. I've skipped the dedup=on step in my last few test runs and zfs always crashed. But I think all you need is high load on the system like copy 3-4 TiB via rsync to an encrypted dataset.

tcaputi commented 7 years ago

Were having trouble reproducing this. If you can get it to happen again, can you include the output of cat /proc/spl/kstat/zfs/dbgmsg?

tcaputi commented 7 years ago

can you also try to reproduce this with #6595 applied?

cytrinox commented 7 years ago

@tcaputi dbgmsg is empty..?

tcaputi commented 7 years ago

I'm sorry I should have been more specific: run these 2 commands to enable the dbgmsg

echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
echo 500000000 > /sys/module/zfs/parameters/zfs_dbgmsg_maxsize
cytrinox commented 7 years ago

Attached is a dump of dbgmsg just after the panic happens (not applied #6595) zfs_log.1.zip

cytrinox commented 7 years ago

@tcaputi Same problem with #6595 applied.

It happens every time when dbench tries to clean up.

[    9.550486] ZFS: Loaded module v0.7.0-79_gf81ef3a0c, ZFS pool version 5000, ZFS filesystem version 5
[   15.092587] IPv6: ADDRCONF(NETDEV_UP): enp13s0: link is not ready
[   17.524958] e1000e: enp13s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   17.525101] IPv6: ADDRCONF(NETDEV_CHANGE): enp13s0: link becomes ready
[  474.582832] perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[  529.871724] perf: interrupt took too long (3129 > 3127), lowering kernel.perf_event_max_sample_rate to 63750
[  680.581935] perf: interrupt took too long (3918 > 3911), lowering kernel.perf_event_max_sample_rate to 51000
[ 1054.683936] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1054.683965] IP: [<ffffffffc10044bf>] zap_lockdir_impl+0x2cf/0x700 [zfs]
[ 1054.684100] PGD 0

[ 1054.684117] Oops: 0000 [#1] SMP
[ 1054.684127] Modules linked in: zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) kvm_inte                                              l kvm iTCO_wdt iTCO_vendor_support ppdev evdev irqbypass lpc_ich serio_raw pcspkr parport_pc sg mfd_core parpor                                              t shpchp i3200_edac edac_core video button acpi_cpufreq w83793 w83627hf hwmon_vid coretemp ip_tables x_tables a                                              utofs4 ext4 crc16 jbd2 fscrypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache raid10 raid4                                              56 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multi                                              path linear raid1 md_mod hid_generic usbhid hid sr_mod cdrom sd_mod i2c_i801 i2c_smbus ahci e1000e libahci liba                                              ta scsi_mod ptp uhci_hcd ehci_pci ehci_hcd pps_core usbcore usb_common floppy
[ 1054.684417] CPU: 3 PID: 14479 Comm: dbench Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1054.684440] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1054.684458] task: ffff8c86df71aec0 task.stack: ffffb80e0c448000
[ 1054.684472] RIP: 0010:[<ffffffffc10044bf>]  [<ffffffffc10044bf>] zap_lockdir_impl+0x2cf/0x700 [zfs]
[ 1054.684556] RSP: 0018:ffffb80e0c44bba8  EFLAGS: 00010246
[ 1054.684587] RAX: 0000000000000000 RBX: ffff8c85f65f14e0 RCX: 000000000000018f
[ 1054.684621] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000130
[ 1054.684655] RBP: ffffb80e0c44bce0 R08: 0000000000000001 R09: 0000000000000001
[ 1054.684689] R10: 2000000000000000 R11: 0000008000000000 R12: 0000000000000002
[ 1054.684723] R13: ffff8c86df2b4000 R14: 0000000000000001 R15: 0000000000000000
[ 1054.684757] FS:  00007fa13b4c9700(0000) GS:ffff8c86efd80000(0000) knlGS:0000000000000000
[ 1054.684809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1054.684840] CR2: 0000000000000000 CR3: 0000000119f15000 CR4: 00000000000006e0
[ 1054.684874] Stack:
[ 1054.684897]  0000000000000000 00000000f65f1560 0000000000000022 ffffffffc10bef20
[ 1054.684955]  ffffffffc0f9e00e 0000001200000000 00000000000000a0 0000000000000022
[ 1054.685012]  0002000000000400 0000002c00000014 00000000000000a8 0000000001000001
[ 1054.685069] Call Trace:
[ 1054.685131]  [<ffffffffc0f9e00e>] ? dnode_rele_and_unlock+0x4e/0x80 [zfs]
[ 1054.685209]  [<ffffffffc1004f2e>] ? zap_lockdir+0x8e/0xb0 [zfs]
[ 1054.685284]  [<ffffffffc1005388>] ? zap_lookup_norm+0x68/0xe0 [zfs]
[ 1054.685360]  [<ffffffffc100542f>] ? zap_lookup+0x2f/0x40 [zfs]
[ 1054.685436]  [<ffffffffc100e495>] ? zfs_dirent_lock+0x645/0x670 [zfs]
[ 1054.685471]  [<ffffffffa2dabfb3>] ? list_lru_del+0xd3/0x120
[ 1054.685545]  [<ffffffffc1028a16>] ? zfs_rmdir+0x156/0x490 [zfs]
[ 1054.685620]  [<ffffffffc1049dfe>] ? zpl_rmdir+0x5e/0xb0 [zfs]
[ 1054.685653]  [<ffffffffa2e0e30e>] ? vfs_rmdir+0xae/0x120
[ 1054.685683]  [<ffffffffa2e12726>] ? do_rmdir+0x216/0x240
[ 1054.685716]  [<ffffffffa320627b>] ? system_call_fast_compare_end+0xc/0x9b
[ 1054.685749] Code: 24 49 8d bf b8 00 00 00 0f 84 43 03 00 00 e8 e9 c3 cb e1 8b 04 24 e9 e6 fd ff ff 48 8b 43                                               18 b9 8f 01 00 00 31 f6 bf 30 01 00 00 <48> 8b 10 48 8b 40 08 48 89 54 24 28 48 c7 c2 d8 ef 0b c1 48 89
[ 1054.685966] RIP  [<ffffffffc10044bf>] zap_lockdir_impl+0x2cf/0x700 [zfs]
[ 1054.686044]  RSP <ffffb80e0c44bba8>
[ 1054.686070] CR2: 0000000000000000
[ 1054.687813] ---[ end trace 8667c82b7eac459f ]---
tcaputi commented 7 years ago

I've had a machine running this test all night and haven't had any issues yet....

There is one interesting thing from your debug log which is a failure to decompress some data in the arc. I have a feeling that this might be caused by your L2ARC. If you can reproduce the issue consistently, can you try it again with this patch? At minimum this should confirm where I should be looking. I'll keep trying to reproduce the issue over here in the meantime.

diff --git a/module/zfs/arc.c b/module/zfs/arc.c
index 37b5bde..9f72484 100644
--- a/module/zfs/arc.c
+++ b/module/zfs/arc.c
@@ -2211,6 +2211,9 @@ arc_buf_fill(arc_buf_t *buf, spa_t *spa, uint64_t dsobj, arc_fill_flags_t flags)
                         * be impossible, but log it anyway so we can debug it.
                         */
                        if (error != 0) {
+#ifdef _KERNEL
+                               dump_stack();
+#endif
                                zfs_dbgmsg(
                                    "hdr %p, compress %d, psize %d, lsize %d",
                                    hdr, arc_hdr_get_compress(hdr),
tcaputi commented 7 years ago

Another good thing to know would be if you can reproduce the issue with compression disabled. Thanks again for the help.

cytrinox commented 7 years ago

I can reproduce this on every test cycle.

With:

# zfs get compression
NAME        PROPERTY     VALUE     SOURCE
data        compression  lz4       local
data/test   compression  off       local

I get the same panic at same function. I will apply your patch and test again, but I don't think it is related to compression. Anything else I can do?

tcaputi commented 7 years ago

I will apply your patch and test again, but I don't think it is related to compression.

The arc_buf_fill message is a bug, regardless of whether or not its causing this issue. I am thinking that this new bug is triggering another latent bug in a different part of the code. Either way, its the only thing I have to go on now, until I can get more information. These are some questions I have at the moment:

cytrinox commented 7 years ago

Well, my test system was a quick builded system to analyze if the problems on the AMD Ryzen system are reproducible on a completly different system (and both systems panic at the same function).

I will continue with testing, apply your patch for dump_stach() and check for arc_buf_fill.

My test system is:

~# lspci
00:00.0 Host bridge: Intel Corporation 3200/3210 Chipset DRAM Controller (rev 01)
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
00:1f.6 Signal processing controller: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem (rev 02)
0d:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
0f:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
11:04.0 VGA compatible controller: XGI Technology Inc. (eXtreme Graphics Innovation) Z9s/Z9m (XG21 core)
# cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           X3220  @ 2.40GHz
stepping        : 11
microcode       : 0xb6
cpu MHz         : 1600.000
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf eagerfpu pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow vnmi flexpriority dtherm
bugs            :
bogomips        : 4787.63
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

[... 3 more threads/cores skipped]
~# lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda       8:0    0 232.9G  0 disk  
├─sda1    8:1    0    20G  0 part  
│ └─md0   9:0    0    20G  0 raid1 /
├─sda2    8:2    0     1K  0 part  
├─sda5    8:5    0    25G  0 part  
├─sda6    8:6    0    25G  0 part  
├─sda7    8:7    0    25G  0 part  
├─sda8    8:8    0    25G  0 part  
├─sda9    8:9    0    25G  0 part  
├─sda10   8:10   0    25G  0 part  
├─sda11   8:11   0    25G  0 part  
└─sda12   8:12   0    25G  0 part  
sdb       8:16   0 232.9G  0 disk  
├─sdb1    8:17   0    20G  0 part  
│ └─md0   9:0    0    20G  0 raid1 /
├─sdb2    8:18   0     1K  0 part  
├─sdb5    8:21   0    25G  0 part  
├─sdb6    8:22   0    25G  0 part  
├─sdb7    8:23   0    25G  0 part  
├─sdb8    8:24   0    25G  0 part  
├─sdb9    8:25   0    25G  0 part  
├─sdb10   8:26   0    25G  0 part  
├─sdb11   8:27   0    25G  0 part  
└─sdb12   8:28   0    25G  0 part  
~# smartctl -a /dev/sda | head -n 16
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-3-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES
Device Model:     ST3250620NS
Serial Number:    9QE2V6KN
Firmware Version: 3.AEG
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Fri Sep 15 19:10:39 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

# smartctl -a /dev/sdb | head -n 19
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.9.0-3-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES.2
Device Model:     ST3250310NS
Serial Number:    9SF16QCX
LU WWN Device Id: 5 000c50 0157abe5d
Firmware Version: SN06
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Fri Sep 15 19:11:03 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
# zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 0h14m with 0 errors on Wed Sep 13 20:27:01 2017
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sda5    ONLINE       0     0     0
            sdb5    ONLINE       0     0     0
            sda6    ONLINE       0     0     0
            sdb6    ONLINE       0     0     0
            sda7    ONLINE       0     0     0
        logs
          sda9      ONLINE       0     0     0
          sdb9      ONLINE       0     0     0
        cache
          sda8      ONLINE       0     0     0
          sdb8      ONLINE       0     0     0

errors: No known data errors
~# cat /proc/meminfo 
MemTotal:        4049140 kB
MemFree:         3794380 kB
MemAvailable:    3711796 kB
Buffers:           12280 kB
Cached:            54676 kB
SwapCached:            0 kB
Active:            58468 kB
Inactive:          25796 kB
Active(anon):      17616 kB
Inactive(anon):     5376 kB
Active(file):      40852 kB
Inactive(file):    20420 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         17084 kB
Mapped:            18340 kB
Shmem:              5688 kB
Slab:              40644 kB
SReclaimable:      12768 kB
SUnreclaim:        27876 kB
KernelStack:        2944 kB
PageTables:         2100 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2024568 kB
Committed_AS:      56920 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      113088 kB
DirectMap2M:     4079616 kB
tcaputi commented 7 years ago

@cytrinox At this point, I'm a bit mystified at how you were able to reproduce the issue your friend had so easily, while I have been running tests for hours on both a physical and virtual machine and can't get anything strange to happen. I imagine both of you had different hardware (obviously different CPUs at least) so I don't think this is a contributing factor. I am running both tests on Ubuntu 14.04 so maybe I need to try installing Debian 9.1 as you have done. Either that or there's something I'm missing on my end.

cehteh commented 7 years ago

On 2017-09-15 18:32, Tom Caputi wrote:

@cytrinox At this point, I'm a bit mystified at how you were able to reproduce the issue your friend had so easily, while I have been running tests for hours on both a physical and virtual machine and can't get anything strange to happen. I imagine both of you had different hardware (obviously different CPUs at least) so I don't think this is a contributing factor. I am running both tests on Ubuntu 14.04 so maybe I need to try installing Debian 9.1 as you have done. Either that or there's something I'm missing on my end.

To add even more confusion: My first encounter with this bug was with a vanilla 4.13 kernel where I merged the zfs/spl sources (./copy_builtin method) and fixed some EXPORT_SYMBOL_GPL() things.

I suspected this rather crude hack the culprit and then reverted to the stock debian kernel and installed cytrinoxs dkms. With the same results.

It took quite some time and stress to cause the issue. Needed to run the rsync for hours.

Also, possibly result of the bug, While the 'big' rsync job was running I was rsync'ing /usr/src to another ZFS dataset in the same pool. That /usr/src mainly contained the linux kernel sources, spl and zfs (all git repos). Trying to compile a kernel there barfed, git fsck complained about errors and the then kernel crashed. When I recall correctly rsyncing after reboot did not fix the issues, But from there one my CPU gone really bad and became unreliable, could be related to that and I have to wait for replacement from AMD.

Christian

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/zfsonlinux/zfs/issues/6639#issuecomment-329864830

cytrinox commented 7 years ago

@tcaputi here the dump_stack just before the panic:

[  810.257849]  0000000000000000 ffffffffa3328574 ffff8e26df8ddc80 ffff8e26f88bd370
[  810.257854]  ffffffffc0b4f660 ffffffffc0b4e9c0 ffff8e26df8ddc80 0000000000000400
[  810.257857]  935ee5940903e97f 0000000000000001 ffff8e2767e7a000 0000000000000001
[  810.257861] Call Trace:
[  810.257869]  [<ffffffffa3328574>] ? dump_stack+0x5c/0x78
[  810.257960]  [<ffffffffc0b4f660>] ? arc_buf_fill+0x1c0/0xad0 [zfs]
[  810.258008]  [<ffffffffc0b4e9c0>] ? arc_get_data_impl.isra.27+0x1a0/0x390 [zfs]
[  810.258040]  [<ffffffffc0b5019f>] ? arc_buf_alloc_impl.isra.35+0x20f/0x330 [zfs]
[  810.258071]  [<ffffffffc0b516ea>] ? arc_read_done+0x15a/0x480 [zfs]
[  810.258074]  [<ffffffffa360380e>] ? mutex_lock+0xe/0x30
[  810.258105]  [<ffffffffc0b522f7>] ? l2arc_read_done+0x8e7/0xba0 [zfs]
[  810.258108]  [<ffffffffa30acd19>] ? dequeue_entity+0x259/0xbd0
[  810.258110]  [<ffffffffa360380e>] ? mutex_lock+0xe/0x30
[  810.258155]  [<ffffffffc0bcd141>] ? vdev_stat_update+0x251/0x4f0 [zfs]
[  810.258199]  [<ffffffffc0c207ea>] ? zio_done+0x30a/0xe50 [zfs]
[  810.258200]  [<ffffffffa360380e>] ? mutex_lock+0xe/0x30
[  810.258245]  [<ffffffffc0c19750>] ? zio_wait_for_children+0x80/0xa0 [zfs]
[  810.258289]  [<ffffffffc0c1b0b1>] ? zio_execute+0x81/0xe0 [zfs]
[  810.258299]  [<ffffffffc07552b6>] ? taskq_thread+0x286/0x460 [spl]
[  810.258302]  [<ffffffffa30a1800>] ? wake_up_q+0x70/0x70
[  810.258346]  [<ffffffffc0c1b030>] ? zio_reexecute+0x3d0/0x3d0 [zfs]
[  810.258352]  [<ffffffffc0755030>] ? task_done+0x90/0x90 [spl]
[  810.258354]  [<ffffffffa30965d7>] ? kthread+0xd7/0xf0
[  810.258356]  [<ffffffffa3096500>] ? kthread_park+0x60/0x60
[  810.258358]  [<ffffffffa36064f5>] ? ret_from_fork+0x25/0x30
tcaputi commented 7 years ago

@cytrinox Yeah. Thats in the L2ARC code which is kind of what I thought. were you able to reproduce this with the L2ARC devices removed?

cytrinox commented 7 years ago

@tcaputi No, without L2ARC I can't reproduce the panic. And I can't reproduce it without encryption but L2ARC enabled. So it seems to be the combination of L2ARC + encryption.

I've build my dkms with debug symbols and your patch, so the address have changed a little bit:

[  982.754522] BUG: unable to handle kernel NULL pointer dereference at           (null)
[  982.754547] IP: [<ffffffffc0d905a0>] zap_lockdir_impl+0x430/0xa90 [zfs]
Reading symbols from /lib/modules/4.9.0-3-amd64/updates/dkms/zfs.ko...done.
(gdb) list *(zap_lockdir_impl+0x430)
0xd45d0 is in zap_lockdir_impl (/var/lib/dkms/zfs/0.7.0/build/module/zfs/zap_micro.c:394).
389     {
390             zap_t *winner;
391             zap_t *zap;
392             int i;
393             uint64_t *zap_hdr = (uint64_t *)db->db_data;
394             uint64_t zap_block_type = zap_hdr[0];
395             uint64_t zap_magic = zap_hdr[1];
396
397             ASSERT3U(MZAP_ENT_LEN, ==, sizeof (mzap_ent_phys_t));
398
(gdb) 

db->db_data points to nowhere, hmm?

I've tried a litte patch:

diff --git a/module/zfs/zap_micro.c b/module/zfs/zap_micro.c
index 3ebf995c6..8a5b10d68 100644
--- a/module/zfs/zap_micro.c
+++ b/module/zfs/zap_micro.c
@@ -500,6 +500,10 @@ zap_lockdir_impl(dmu_buf_t *db, void *tag, dmu_tx_t *tx,

        zap = dmu_buf_get_user(db);
        if (zap == NULL) {
+               if(db->db_data == NULL) {
+                       // ugly ugly ugly fix
+                       return (SET_ERROR(EIO));
+               }
                zap = mzap_open(os, obj, db);
                if (zap == NULL) {
                        /*

No more panics, but multiple I/O errors in rsync and zap_lookup_impl() errors in log (more or less expected, I thought if this is a L2ARC problem, only the error count for cache devs in zpool status will increase but will not report any error to userland "rsync"). In fact, all error counters in zpool status are zero.

8 0 0x01 -1 0 10005265910 1111766114981
timestamp    message 
1505556615   zap_micro.c:930:zap_lookup_impl(): error 2
1505556615   zap_micro.c:930:zap_lookup_impl(): error 2
1505556615   zap_micro.c:930:zap_lookup_impl(): error 2
1505556615   zap_micro.c:930:zap_lookup_impl(): error 2
1505556615   dnode.c:576:dnode_allocate(): os=ffff8e5cd379f800 obj=186267 txg=24457 blocksize=512 ibs=17 dn_slots=1
1505556615   zap_micro.c:930:zap_lookup_impl(): error 2
1505556615   zap_micro.c:930:zap_lookup_impl(): error 2
1505556615   dmu_tx.c:617:dmu_tx_dirty_buf(): found txh type 2 beginblk=0 endblk=7fffffffffffff
cytrinox commented 7 years ago

A little bit more debugging:

diff --git a/module/zfs/zio_compress.c b/module/zfs/zio_compress.c
index 7e44d16e4..d0c48d65b 100644
--- a/module/zfs/zio_compress.c
+++ b/module/zfs/zio_compress.c
@@ -152,5 +152,10 @@ zio_decompress_data(enum zio_compress c, abd_t *src, void *dst,
        int ret = zio_decompress_data_buf(c, tmp, dst, s_len, d_len);
        abd_return_buf(src, tmp, s_len);

+
+       if(ret != 0) {
+                zfs_dbgmsg("XXX decomp_data err: src %p dest %p slen %lu dlen %lu", tmp, dst, s_len, d_len);
+       }
+
        return (ret);
 }

results in:

1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e2322749600 dest ffff9e23964b7c00 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22db496800 dest ffff9e22da8a9000 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e2322749600 dest ffff9e22e7d46800 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22db496800 dest ffff9e233dddec00 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e2343253600 dest ffff9e233dddec00 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e2343253600 dest ffff9e23964b7c00 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e2303848800 dest ffff9e22da8a9000 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e2303848800 dest ffff9e22da8a9000 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e2299a76400 dest ffff9e233dddec00 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e2299a76400 dest ffff9e233dddec00 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22cd94dc00 dest ffff9e233dddec00 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22cd94dc00 dest ffff9e22e7d46800 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e2314ac8000 dest ffff9e239f3f7c00 slen 512 dlen 1024
1505566116   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e2314ac8000 dest ffff9e238e3ccc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239b409000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239b409000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239b409000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239b409000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239b409000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239b409000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239b409000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239b409000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239b409000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e239fe65c00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e23a8dcb000 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024
1505566241   zio_compress.c:157:zio_decompress_data(): XXX decomp_data err: src ffff9e22b565d400 dest ffff9e229164dc00 slen 512 dlen 1024

The return code is 1, which means that this is triggered in lz4.c:

        /* invalid compressed buffer size encoded at start */
        if (bufsiz + sizeof (bufsiz) > s_len)
                return (1);

Disclaimer: I've no deep knowlege about zfs internals, for me it seems like at some point (when dbench issues a lot of rmdir() calls) garbage is passed to the decompression routines in src which leads to an unhandled NULL ptr because of missing checks in zap_lockdir_impl().

What can I test/debug next?

tcaputi commented 7 years ago

@cytrinox Thanks a lot for looking into this for me. I've been busy for the weekend, but I should be able to fix this first thing tomorrow morning with the info you've given me. I'll let you know if I need anything else (I might need you to try fixes to ensure everything works).

tcaputi commented 7 years ago

Think I found a fix. Could you try applying this patch? Your issue should be fixed with just the top part. I will add this into #6595

diff --git a/module/zfs/arc.c b/module/zfs/arc.c
index 37b5bde..d4c6be1 100644
--- a/module/zfs/arc.c
+++ b/module/zfs/arc.c
@@ -7911,9 +7911,15 @@ l2arc_untransform(zio_t *zio, l2arc_read_callback_t *cb)
     */
    ASSERT3U(BP_GET_TYPE(bp), !=, DMU_OT_INTENT_LOG);
    ASSERT(MUTEX_HELD(HDR_LOCK(hdr)));
+   ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);

-   /* If the data was encrypted, decrypt it now */
-   if (HDR_ENCRYPTED(hdr)) {
+   /*
+    * If the data was encrypted, decrypt it now. Note that
+    * we must check the bp here and not the hdr, since the
+    * hdr does not have its encryption parameters updated
+    * until arc_read_done().
+    */
+   if (BP_IS_ENCRYPTED(hdr)) {
        abd_t *eabd = arc_get_data_abd(hdr,
            arc_hdr_size(hdr), hdr);

@@ -8039,7 +8045,16 @@ l2arc_read_done(zio_t *zio)
         */
        abd_free(cb->l2rcb_abd);
        zio->io_size = zio->io_orig_size = arc_hdr_size(hdr);
-       zio->io_abd = zio->io_orig_abd = hdr->b_l1hdr.b_pabd;
+
+       if (BP_IS_ENCRYPTED(&cb->l2rcb_bp) &&
+           (cb->l2rcb_flags & ZIO_FLAG_RAW_ENCRYPT)) {
+           ASSERT(HDR_HAS_RABD(hdr));
+           zio->io_abd = zio->io_orig_abd =
+               hdr->b_crypt_hdr.b_rabd;
+       } else {
+           ASSERT3P(hdr->b_l1hdr.b_pabd, !=, NULL);
+           zio->io_abd = zio->io_orig_abd = hdr->b_l1hdr.b_pabd;
+       }
    }

    ASSERT3P(zio->io_abd, !=, NULL);
cytrinox commented 7 years ago

I've modified if (BP_IS_ENCRYPTED(hdr)) { to if (BP_IS_ENCRYPTED(bp)) {, otherwhise it wouldn't compile and I assume this was a typo anyway.

With this patch the issue seems to be fixed, I can't reproduce the panic. Both rsync and dbench went well. I've enabled dedup (as I have it in some of the first test steps which helped me finding a way to reproduce the panic). But even with dedup, everything runs fine - except one little point:

While watching arch_summary.py --page 3, ~2 out of 10 calls returns this:

L2 ARC Writes:
        Writes Sent: (FAULTED)                          1.69k
          Done Ratio:                   99.94%  1.69k
          Error Ratio:                  0.00%   0

The other calls returned with:

L2 ARC Writes:
        Writes Sent:                    100.00% 1.69k

I'm not sure if this is really something we should worry about. Attached is the log which was captured during Writes Sent: (FAULTED). Are these logged errors critical? Btw, I've reverted my (ugly) changes and have applied your patch on top of the post-encryption PR. dbgmsg.l2arc_fault.2.zip

EDIT: complete summary:

L2 ARC Summary: (HEALTHY)
        Low Memory Aborts:                      2
        Free on Write:                          36.18k
        R/W Clashes:                            30
        Bad Checksums:                          0
        IO Errors:                              0

L2 ARC Size: (Adaptive)                         16.64   GiB
        Compressed:                     45.54%  7.58    GiB
        Header Size:                    0.36%   61.80   MiB

L2 ARC Breakdown:                               317.63k
        Hit Ratio:                      25.59%  81.28k
        Miss Ratio:                     74.41%  236.35k
        Feeds:                                  6.03k

L2 ARC Writes:
        Writes Sent: (FAULTED)                          3.31k
          Done Ratio:                   99.97%  3.31k
          Error Ratio:                  0.00%   0

...what are R/W Clashes?

cytrinox commented 7 years ago

Next one (sorry @tcaputi for making stress tests ;-) ):

I've destroyed the pool, rebooted and created a new pool with the same configuration (expect for the two log devs which should be a mirror vdev instead).

Then, same procedure: I've created the encrypted dataset, run rsync + dbench while creating 3 snapshots during the run (no dedup involved at all steps!). I've send the first snapshot during the run to data/copy:

zfs send -v data/test@snp1 | zfs recv data/copy

This completes successfully and dbench and rsync too.

Then I've executed the same command again (which should fail because data/copy already exists). Immediately a kernel panic happens after this command:

[   11.871708] ZFS: Loaded module v0.7.0-80_ga771b1926 (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[  111.492584] IPv6: ADDRCONF(NETDEV_UP): enp13s0: link is not ready
[  113.752967] e1000e: enp13s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[  113.753121] IPv6: ADDRCONF(NETDEV_CHANGE): enp13s0: link becomes ready
[  400.695656] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[  492.359755] perf: interrupt took too long (3140 > 3128), lowering kernel.perf_event_max_sample_rate to 63500
[  660.770683] perf: interrupt took too long (3943 > 3925), lowering kernel.perf_event_max_sample_rate to 50500
[ 1143.298013] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.298036] PANIC at zio.c:501:zio_decrypt()
[ 1143.298048] Showing stack for process 25343
[ 1143.298052] CPU: 0 PID: 25343 Comm: z_rd_int_4 Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1143.298053] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1143.298056]  0000000000000000 ffffffff9d128574 ffffffffc0f6f529 ffffa259c36e7c18
[ 1143.298060]  ffffffffc0448385 ffffffff9d0f7ef5 ffff93a700000028 ffffa259c36e7c28
[ 1143.298064]  ffffa259c36e7bc8 7228594649524556 2032203d21207465 2d6f697a28207c7c
[ 1143.298068] Call Trace:
[ 1143.298075]  [<ffffffff9d128574>] ? dump_stack+0x5c/0x78
[ 1143.298090]  [<ffffffffc0448385>] ? spl_panic+0xc5/0x100 [spl]
[ 1143.298093]  [<ffffffff9d0f7ef5>] ? queue_unplugged+0x25/0xa0
[ 1143.298198]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1143.298276]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1143.298353]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1143.298429]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1143.298506]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1143.298583]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1143.298591]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1143.298596]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1143.298672]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1143.298681]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1143.298684]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1143.298686]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1143.298691]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1143.303386] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.303406] PANIC at zio.c:501:zio_decrypt()
[ 1143.303418] Showing stack for process 32230
[ 1143.303422] CPU: 1 PID: 32230 Comm: z_rd_int_4 Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1143.303423] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1143.303425]  0000000000000000 ffffffff9d128574 ffffffffc0f6f529 ffffa259c6effc18
[ 1143.303429]  ffffffffc0448385 ffffffff9d0f7ef5 ffff93a700000028 ffffa259c6effc28
[ 1143.303433]  ffffa259c6effbc8 7228594649524556 2032203d21207465 2d6f697a28207c7c
[ 1143.303436] Call Trace:
[ 1143.303441]  [<ffffffff9d128574>] ? dump_stack+0x5c/0x78
[ 1143.303451]  [<ffffffffc0448385>] ? spl_panic+0xc5/0x100 [spl]
[ 1143.303454]  [<ffffffff9d0f7ef5>] ? queue_unplugged+0x25/0xa0
[ 1143.303458]  [<ffffffff9ce2e905>] ? sched_clock+0x5/0x10
[ 1143.303460]  [<ffffffff9cea6aa2>] ? sched_clock_local+0x12/0x80
[ 1143.303531]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1143.303608]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1143.303685]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1143.303761]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1143.303837]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1143.303914]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1143.303922]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1143.303925]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1143.304001]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1143.304010]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1143.304012]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1143.304015]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1143.304018]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1143.307232] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.307256] PANIC at zio.c:501:zio_decrypt()
[ 1143.307261] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.307263] PANIC at zio.c:501:zio_decrypt()
[ 1143.307264] Showing stack for process 25339
[ 1143.307267] CPU: 0 PID: 25339 Comm: z_rd_int_0 Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1143.307268] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1143.307274]  0000000000000000 ffffffff9d128574 ffffffffc0f6f529 ffffa259c36c7c18
[ 1143.307277]  ffffffffc0448385 ffff93a7ea54be00 0000000000000028 ffffa259c36c7c28
[ 1143.307279]  ffffa259c36c7bc8 7228594649524556 2032203d21207465 2d6f697a28207c7c
[ 1143.307280] Call Trace:
[ 1143.307285]  [<ffffffff9d128574>] ? dump_stack+0x5c/0x78
[ 1143.307297]  [<ffffffffc0448385>] ? spl_panic+0xc5/0x100 [spl]
[ 1143.307375]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1143.307377] Showing stack for process 32228
[ 1143.307380]  [<ffffffff9d128396>] ? cpumask_next_and+0x26/0x40
[ 1143.307458]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1143.307536]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1143.307612]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1143.307689]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1143.307766]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1143.307774]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1143.307778]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1143.307855]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1143.307863]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1143.307866]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1143.307868]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1143.307872]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1143.307876] CPU: 2 PID: 32228 Comm: z_rd_int_7 Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1143.307878] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1143.307880]  0000000000000000 ffffffff9d128574 ffffffffc0f6f529 ffffa259c6eefc18
[ 1143.307886]  ffffffffc0448385
[ 1143.307886] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.307889] PANIC at zio.c:501:zio_decrypt()
[ 1143.307891] Showing stack for process 25346
[ 1143.307916]  0000000000000000 0000000000000028 ffffa259c6eefc28
[ 1143.307919]  ffffa259c6eefbc8 7228594649524556 2032203d21207465 2d6f697a28207c7c
[ 1143.307923] Call Trace:
[ 1143.307931]  [<ffffffff9d128574>] ? dump_stack+0x5c/0x78
[ 1143.307946]  [<ffffffffc0448385>] ? spl_panic+0xc5/0x100 [spl]
[ 1143.307950]  [<ffffffff9ce2e905>] ? sched_clock+0x5/0x10
[ 1143.307953]  [<ffffffff9cea6aa2>] ? sched_clock_local+0x12/0x80
[ 1143.308060]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1143.308143]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1143.308225]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1143.308306]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1143.308388]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1143.308468]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1143.308482]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1143.308489]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1143.308568]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1143.308581]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1143.308589]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1143.308594]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1143.308602]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1143.308609] CPU: 0 PID: 25346 Comm: z_rd_int_7 Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1143.308615] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1143.308620]  0000000000000000 ffffffff9d128574 ffffffffc0f6f529 ffffa259c36ffc18
[ 1143.308632]  ffffffffc0448385 0000000000000001 ffff93a700000028 ffffa259c36ffc28
[ 1143.308635]  ffffa259c36ffbc8 7228594649524556 2032203d21207465 2d6f697a28207c7c
[ 1143.308639] Call Trace:
[ 1143.308642]  [<ffffffff9d128574>] ? dump_stack+0x5c/0x78
[ 1143.308651]  [<ffffffffc0448385>] ? spl_panic+0xc5/0x100 [spl]
[ 1143.308655]  [<ffffffff9ce2e905>] ? sched_clock+0x5/0x10
[ 1143.308657]  [<ffffffff9cea6aa2>] ? sched_clock_local+0x12/0x80
[ 1143.308728]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1143.308810]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1143.308894]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1143.308978]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1143.309058]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1143.309135]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1143.309144]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1143.309147]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1143.309224]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1143.309233]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1143.309237]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1143.309239]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1143.309242]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1143.319745] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.319774] PANIC at zio.c:501:zio_decrypt()
[ 1143.332266] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.332286] PANIC at zio.c:501:zio_decrypt()
[ 1143.332289] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.332290] PANIC at zio.c:501:zio_decrypt()
[ 1143.332295] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.332296] PANIC at zio.c:501:zio_decrypt()
[ 1143.332426] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.332447] PANIC at zio.c:501:zio_decrypt()
[ 1143.332512] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.332547] PANIC at zio.c:501:zio_decrypt()
[ 1143.400327] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.400364] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.400365] PANIC at zio.c:501:zio_decrypt()
[ 1143.400398] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.400400] PANIC at zio.c:501:zio_decrypt()
[ 1143.400491] PANIC at zio.c:501:zio_decrypt()
[ 1143.400527] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.400537] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.400537] PANIC at zio.c:501:zio_decrypt()
[ 1143.400621] PANIC at zio.c:501:zio_decrypt()
[ 1143.405216] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.405234] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.405236] PANIC at zio.c:501:zio_decrypt()
[ 1143.405319] PANIC at zio.c:501:zio_decrypt()
[ 1143.426920] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.426923] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.426924] PANIC at zio.c:501:zio_decrypt()
[ 1143.427030] PANIC at zio.c:501:zio_decrypt()
[ 1143.451746] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.451775] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.451776] PANIC at zio.c:501:zio_decrypt()
[ 1143.451803] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.451804] PANIC at zio.c:501:zio_decrypt()
[ 1143.451827] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.451828] PANIC at zio.c:501:zio_decrypt()
[ 1143.451847] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.451849] PANIC at zio.c:501:zio_decrypt()
[ 1143.451851] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.451852] PANIC at zio.c:501:zio_decrypt()
[ 1143.451894] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.451896] PANIC at zio.c:501:zio_decrypt()
[ 1143.451943] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.451944] PANIC at zio.c:501:zio_decrypt()
[ 1143.452225] PANIC at zio.c:501:zio_decrypt()
[ 1143.475314] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.475320] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.475321] PANIC at zio.c:501:zio_decrypt()
[ 1143.475356] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.475357] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.475358] PANIC at zio.c:501:zio_decrypt()
[ 1143.475359] PANIC at zio.c:501:zio_decrypt()
[ 1143.475368] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.475369] PANIC at zio.c:501:zio_decrypt()
[ 1143.475601] PANIC at zio.c:501:zio_decrypt()
[ 1143.485537] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.485575] PANIC at zio.c:501:zio_decrypt()
[ 1143.485602] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.485604] PANIC at zio.c:501:zio_decrypt()
[ 1143.520719] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520757] PANIC at zio.c:501:zio_decrypt()
[ 1143.520775] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520776] PANIC at zio.c:501:zio_decrypt()
[ 1143.520780] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520781] PANIC at zio.c:501:zio_decrypt()
[ 1143.520833] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520835] PANIC at zio.c:501:zio_decrypt()
[ 1143.520850] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520851] PANIC at zio.c:501:zio_decrypt()
[ 1143.520860] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520861] PANIC at zio.c:501:zio_decrypt()
[ 1143.520900] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520901] PANIC at zio.c:501:zio_decrypt()
[ 1143.520910] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520912] PANIC at zio.c:501:zio_decrypt()
[ 1143.520949] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520951] PANIC at zio.c:501:zio_decrypt()
[ 1143.520958] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520959] PANIC at zio.c:501:zio_decrypt()
[ 1143.520970] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.520971] PANIC at zio.c:501:zio_decrypt()
[ 1143.521030] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.521031] PANIC at zio.c:501:zio_decrypt()
[ 1143.521078] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.521079] PANIC at zio.c:501:zio_decrypt()
[ 1143.530775] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.530813] PANIC at zio.c:501:zio_decrypt()
[ 1143.532596] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.532638] PANIC at zio.c:501:zio_decrypt()
[ 1143.532668] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.532671] PANIC at zio.c:501:zio_decrypt()
[ 1143.532686] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.532687] PANIC at zio.c:501:zio_decrypt()
[ 1143.532717] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.532718] PANIC at zio.c:501:zio_decrypt()
[ 1143.532725] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.532726] PANIC at zio.c:501:zio_decrypt()
[ 1143.532727] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.532728] PANIC at zio.c:501:zio_decrypt()
[ 1143.532803] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.532804] PANIC at zio.c:501:zio_decrypt()
[ 1143.532809] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.532810] PANIC at zio.c:501:zio_decrypt()
[ 1143.533146] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.533182] PANIC at zio.c:501:zio_decrypt()
[ 1143.533246] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.533253] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.533255] PANIC at zio.c:501:zio_decrypt()
[ 1143.533347] PANIC at zio.c:501:zio_decrypt()
[ 1143.533569] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.533572] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.533574] PANIC at zio.c:501:zio_decrypt()
[ 1143.533669] PANIC at zio.c:501:zio_decrypt()
[ 1143.533710] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.533749] PANIC at zio.c:501:zio_decrypt()
[ 1143.533769] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.533771] PANIC at zio.c:501:zio_decrypt()
[ 1143.533804] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.533805] PANIC at zio.c:501:zio_decrypt()
[ 1143.533841] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.533844] PANIC at zio.c:501:zio_decrypt()
[ 1143.533851] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.533852] PANIC at zio.c:501:zio_decrypt()
[ 1143.534003] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.534003] PANIC at zio.c:501:zio_decrypt()
[ 1143.534162] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.534168] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.534169] PANIC at zio.c:501:zio_decrypt()
[ 1143.534192] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.534193] PANIC at zio.c:501:zio_decrypt()
[ 1143.534322] PANIC at zio.c:501:zio_decrypt()
[ 1143.540133] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.540160] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.540161] PANIC at zio.c:501:zio_decrypt()
[ 1143.540257] PANIC at zio.c:501:zio_decrypt()
[ 1143.556647] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1143.556684] PANIC at zio.c:501:zio_decrypt()
[ 1330.144033] INFO: task z_rd_int_0:25339 blocked for more than 120 seconds.
[ 1330.144072]       Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1330.144124] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.144177] z_rd_int_0      D    0 25339      2 0x00000000
[ 1330.144182]  ffff93a7e665e000 0000000000000000 ffff93a7e5c9e100 ffff93a7efc18240
[ 1330.144186]  ffff93a7c5655240 ffffa259c36c7a80 ffffffff9d4015d3 0000000000000000
[ 1330.144190]  0000000000000097 ffff93a7efc18240 ffffffffffffff02 ffff93a7e5c9e100
[ 1330.144193] Call Trace:
[ 1330.144201]  [<ffffffff9d4015d3>] ? __schedule+0x233/0x6d0
[ 1330.144206]  [<ffffffff9d401aa2>] ? schedule+0x32/0x80
[ 1330.144217]  [<ffffffffc04483ab>] ? spl_panic+0xeb/0x100 [spl]
[ 1330.144313]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1330.144317]  [<ffffffff9d128396>] ? cpumask_next_and+0x26/0x40
[ 1330.144394]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1330.144471]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1330.144546]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1330.144623]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1330.144699]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1330.144707]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1330.144711]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1330.144788]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1330.144796]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1330.144799]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1330.144802]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1330.144805]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1330.144808] INFO: task z_rd_int_1:25340 blocked for more than 120 seconds.
[ 1330.144843]       Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1330.144895] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.144947] z_rd_int_1      D    0 25340      2 0x00000000
[ 1330.144950]  ffff93a7e675a800 0000000000000000 ffff93a7e6789180 ffff93a7efc18240
[ 1330.144953]  ffffffff9da0e500 ffffa259c36cfa80 ffffffff9d4015d3 0000000000000020
[ 1330.144957]  00ffa259c36cfaa0 ffff93a7efc18240 8582c39c9a2eed08 ffff93a7e6789180
[ 1330.144961] Call Trace:
[ 1330.144964]  [<ffffffff9d4015d3>] ? __schedule+0x233/0x6d0
[ 1330.144966]  [<ffffffff9d401aa2>] ? schedule+0x32/0x80
[ 1330.144975]  [<ffffffffc04483ab>] ? spl_panic+0xeb/0x100 [spl]
[ 1330.144978]  [<ffffffff9cea0854>] ? check_preempt_curr+0x74/0x80
[ 1330.145047]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1330.145124]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1330.145200]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1330.145276]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1330.145352]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1330.145429]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1330.145437]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1330.145440]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1330.145516]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1330.145524]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1330.145527]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1330.145530]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1330.145532]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1330.145535] INFO: task z_rd_int_2:25341 blocked for more than 120 seconds.
[ 1330.145569]       Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1330.145621] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.145673] z_rd_int_2      D    0 25341      2 0x00000000
[ 1330.145676]  ffff93a7de9f5400 0000000000000000 ffff93a7d3473180 ffff93a7efd18240
[ 1330.145680]  ffff93a7e63d8080 ffffa259c36d7a80 ffffffff9d4015d3 0000000000000020
[ 1330.145683]  00ffa259c36d7aa0 ffff93a7efd18240 8f84e76a06653479 ffff93a7d3473180
[ 1330.145687] Call Trace:
[ 1330.145690]  [<ffffffff9d4015d3>] ? __schedule+0x233/0x6d0
[ 1330.145692]  [<ffffffff9d401aa2>] ? schedule+0x32/0x80
[ 1330.145701]  [<ffffffffc04483ab>] ? spl_panic+0xeb/0x100 [spl]
[ 1330.145770]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1330.145772]  [<ffffffff9ceb8804>] ? __wake_up+0x34/0x50
[ 1330.145849]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1330.145926]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1330.146002]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1330.146078]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1330.146154]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1330.146162]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1330.146165]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1330.146242]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1330.146250]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1330.146252]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1330.146255]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1330.146258]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1330.146260] INFO: task z_rd_int_3:25342 blocked for more than 120 seconds.
[ 1330.146295]       Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1330.146347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.146399] z_rd_int_3      D    0 25342      2 0x00000000
[ 1330.146402]  ffff93a7e6725000 ffff93a7e6725000 ffff93a7d52be340 ffff93a7efd98240
[ 1330.146406]  ffff93a75244e080 ffffa259c36dfa80 ffffffff9d4015d3 0000000000000020
[ 1330.146409]  00ffa259c36dfaa0 ffff93a7efd98240 f819cfcf2ea0d704 ffff93a7d52be340
[ 1330.146413] Call Trace:
[ 1330.146415]  [<ffffffff9d4015d3>] ? __schedule+0x233/0x6d0
[ 1330.146418]  [<ffffffff9d401aa2>] ? schedule+0x32/0x80
[ 1330.146426]  [<ffffffffc04483ab>] ? spl_panic+0xeb/0x100 [spl]
[ 1330.146495]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1330.146498]  [<ffffffff9ceb8804>] ? __wake_up+0x34/0x50
[ 1330.146574]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1330.146651]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1330.146727]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1330.146803]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1330.146880]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1330.146888]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1330.146891]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1330.146967]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1330.146975]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1330.146978]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1330.146980]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1330.146983]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1330.146985] INFO: task z_rd_int_4:25343 blocked for more than 120 seconds.
[ 1330.147020]       Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1330.147072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.147124] z_rd_int_4      D    0 25343      2 0x00000000
[ 1330.147127]  ffff93a7de9f5400 0000000000000000 ffff93a7e6616200 ffff93a7efc18240
[ 1330.147130]  ffffffff9da0e500 ffffa259c36e7a80 ffffffff9d4015d3 ffffffff9d815a7c
[ 1330.147134]  006f697a28207c7c ffff93a7efc18240 0000000000000286 ffff93a7e6616200
[ 1330.147138] Call Trace:
[ 1330.147141]  [<ffffffff9d4015d3>] ? __schedule+0x233/0x6d0
[ 1330.147143]  [<ffffffff9d401aa2>] ? schedule+0x32/0x80
[ 1330.147152]  [<ffffffffc04483ab>] ? spl_panic+0xeb/0x100 [spl]
[ 1330.147221]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1330.147297]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1330.147374]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1330.147450]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1330.147526]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1330.147602]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1330.147611]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1330.147614]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1330.147690]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1330.147698]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1330.147700]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1330.147703]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1330.147706]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1330.147708] INFO: task z_rd_int_5:25344 blocked for more than 120 seconds.
[ 1330.147743]       Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1330.147795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.147847] z_rd_int_5      D    0 25344      2 0x00000000
[ 1330.147850]  ffff93a7e675a800 0000000000000000 ffff93a7d5d10000 ffff93a7efd18240
[ 1330.147853]  ffff93a74292e040 ffffa259c36efa80 ffffffff9d4015d3 0000000000000020
[ 1330.147857]  00ffa259c36efaa0 ffff93a7efd18240 1e4ef3e699a091a3 ffff93a7d5d10000
[ 1330.147860] Call Trace:
[ 1330.147863]  [<ffffffff9d4015d3>] ? __schedule+0x233/0x6d0
[ 1330.147866]  [<ffffffff9d401aa2>] ? schedule+0x32/0x80
[ 1330.147874]  [<ffffffffc04483ab>] ? spl_panic+0xeb/0x100 [spl]
[ 1330.147943]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1330.147946]  [<ffffffff9ceb8804>] ? __wake_up+0x34/0x50
[ 1330.148010]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1330.148116]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1330.148195]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1330.148274]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1330.148353]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1330.148365]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1330.148371]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1330.148450]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1330.148461]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1330.148468]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1330.148473]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1330.148479]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1330.148484] INFO: task z_rd_int_6:25345 blocked for more than 120 seconds.
[ 1330.148523]       Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1330.148577] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.148629] z_rd_int_6      D    0 25345      2 0x00000000
[ 1330.148632]  ffff93a7e675a800 0000000000000000 ffff93a7c5654500 ffff93a7efd98240
[ 1330.148636]  ffff93a7eab4ad80 ffffa259c36f7a80 ffffffff9d4015d3 0000000000000020
[ 1330.148640]  00ffa259c36f7aa0 ffff93a7efd98240 2b36c8ad6e7a23e1 ffff93a7c5654500
[ 1330.148643] Call Trace:
[ 1330.148646]  [<ffffffff9d4015d3>] ? __schedule+0x233/0x6d0
[ 1330.148649]  [<ffffffff9d401aa2>] ? schedule+0x32/0x80
[ 1330.148657]  [<ffffffffc04483ab>] ? spl_panic+0xeb/0x100 [spl]
[ 1330.148727]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1330.148803]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1330.148882]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1330.148961]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1330.149039]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1330.149118]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1330.149129]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1330.149135]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1330.149214]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1330.149225]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1330.149230]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1330.149235]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1330.149239]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1330.149246] INFO: task z_rd_int_7:25346 blocked for more than 120 seconds.
[ 1330.149283]       Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1330.149335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.149387] z_rd_int_7      D    0 25346      2 0x00000000
[ 1330.149390]  ffff93a7e665e000 0000000000000000 ffff93a7c5655240 ffff93a7efc18240
[ 1330.149394]  ffff93a7e6328340 ffffa259c36ffa80 ffffffff9d4015d3 ffffffff9d815a7c
[ 1330.149397]  006f697a28207c7c ffff93a7efc18240 0000000000000286 ffff93a7c5655240
[ 1330.149401] Call Trace:
[ 1330.149404]  [<ffffffff9d4015d3>] ? __schedule+0x233/0x6d0
[ 1330.149406]  [<ffffffff9d401aa2>] ? schedule+0x32/0x80
[ 1330.149415]  [<ffffffffc04483ab>] ? spl_panic+0xeb/0x100 [spl]
[ 1330.149418]  [<ffffffff9ce2e905>] ? sched_clock+0x5/0x10
[ 1330.149420]  [<ffffffff9cea6aa2>] ? sched_clock_local+0x12/0x80
[ 1330.149489]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1330.149566]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1330.149644]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1330.149722]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1330.149798]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1330.149875]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1330.149884]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1330.149887]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1330.149964]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1330.149972]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1330.149975]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1330.149978]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1330.149981]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1330.149991] INFO: task z_rd_int_7:32228 blocked for more than 120 seconds.
[ 1330.150026]       Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1330.150078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.150131] z_rd_int_7      D    0 32228      2 0x00000000
[ 1330.150133]  ffff93a7de9f5400 0000000000000000 ffff93a7d0fb6100 ffff93a7efd18240
[ 1330.150137]  ffff93a7d5d10000 ffffa259c6eefa80 ffffffff9d4015d3 0000000000000000
[ 1330.150142]  0000000000000097 ffff93a7efd18240 ffffffffffffff10 ffff93a7d0fb6100
[ 1330.150146] Call Trace:
[ 1330.150149]  [<ffffffff9d4015d3>] ? __schedule+0x233/0x6d0
[ 1330.150151]  [<ffffffff9d401aa2>] ? schedule+0x32/0x80
[ 1330.150160]  [<ffffffffc04483ab>] ? spl_panic+0xeb/0x100 [spl]
[ 1330.150163]  [<ffffffff9ce2e905>] ? sched_clock+0x5/0x10
[ 1330.150165]  [<ffffffff9cea6aa2>] ? sched_clock_local+0x12/0x80
[ 1330.150234]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1330.150311]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1330.150389]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1330.150465]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1330.150543]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1330.150619]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1330.150629]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1330.150632]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1330.150708]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1330.150716]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1330.150720]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1330.150723]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1330.150725]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
[ 1330.150728] INFO: task z_rd_int_4:32230 blocked for more than 120 seconds.
[ 1330.150762]       Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1330.150814] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.150866] z_rd_int_4      D    0 32230      2 0x00000000
[ 1330.150869]  ffff93a7de6aac00 0000000000000000 ffff93a7e52d4500 ffff93a7efc98240
[ 1330.150873]  ffff93a7ea418180 ffffa259c6effa80 ffffffff9d4015d3 0000000000000000
[ 1330.150877]  0000000000000097 ffff93a7efc98240 ffffffffffffff10 ffff93a7e52d4500
[ 1330.150880] Call Trace:
[ 1330.150883]  [<ffffffff9d4015d3>] ? __schedule+0x233/0x6d0
[ 1330.150886]  [<ffffffff9d401aa2>] ? schedule+0x32/0x80
[ 1330.150894]  [<ffffffffc04483ab>] ? spl_panic+0xeb/0x100 [spl]
[ 1330.150898]  [<ffffffff9ce2e905>] ? sched_clock+0x5/0x10
[ 1330.150900]  [<ffffffff9cea6aa2>] ? sched_clock_local+0x12/0x80
[ 1330.150969]  [<ffffffffc0dd4ecf>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1330.151046]  [<ffffffffc0e78545>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1330.151124]  [<ffffffffc0e7b2b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1330.151199]  [<ffffffffc0e82e51>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1330.151277]  [<ffffffffc0e77159>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1330.151353]  [<ffffffffc0e78c54>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1330.151363]  [<ffffffffc04462b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1330.151366]  [<ffffffff9cea1800>] ? wake_up_q+0x70/0x70
[ 1330.151442]  [<ffffffffc0e78b70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1330.151450]  [<ffffffffc0446030>] ? task_done+0x90/0x90 [spl]
[ 1330.151454]  [<ffffffff9ce965d7>] ? kthread+0xd7/0xf0
[ 1330.151457]  [<ffffffff9ce96500>] ? kthread_park+0x60/0x60
[ 1330.151460]  [<ffffffff9d4064f5>] ? ret_from_fork+0x25/0x30
tcaputi commented 7 years ago

@cytrinox

While watching arch_summary.py --page 3, ~2 out of 10 calls returns this:

I'm not 100% convinced arc_summary.py is correct here. I'm looking at the code for it, and it doesn't really seem to make a ton of sense. Perhaps @behlendorf knows a bit more about what the script is trying to say here. Does it output the same thing when you test with unencrypted datasets?

...what are R/W Clashes?

That just means that the arc was trying to read a block that was simultaneously being written out to the L2ARC. Its a small race condition that is simply handled by considering the read to be a cache miss.

I'll look into the crash during the sends. Do you have a set of commands to reproduce the problem? Does the problem reproduce on a simple pool layout (a single disk pool for instance)? Is data/copy also an encrypted dataset?

richardelling commented 7 years ago

The (FAULTED) is a bogus alert. It is expected that writes to l2arc take non-zero time, so it is not unusual for the write to be in-progress when the arcstats snapshot is taken. You can blissfully ignore this alert.

tcaputi commented 7 years ago

@richardelling Thanks for confirming that for me. We should probably open an issue for that somewhere.

cytrinox commented 7 years ago

@tcaputi I'm trying to find a good procedure to reproduce it, but it's more complicated than the initial panic.

So far, I'm unable to reproduce the above panic, if I repeat my steps the kernel completely stops now and only prints a stack trace on the console - but the stack trace is different from the above (see photo attached). Because the machine is dead, I'm unable to get the full trace. I will try to figure out what happens here.

Because the trace mention l2arc_read_done(), I will try a pool with a single vdev + one l2arc vdev.

The steps to reproduce are basically:

So far, this happens on every run. I will try it now with only a single disk + L2ARC. photo_trace

Should we continue this in this issue or should i open a new one?

EDIT: data/copy was NOT an encrypted dataset on the first test run. But on the following tests I always use zfs send -pwv so I expect them to be encrypted, too.

cytrinox commented 7 years ago

@tcaputi Okay, I found a good procedure:

sda5 and sda8 are both 25GiB sized MBR partitions.

root@srv-demo-1:~# zpool create -f -o feature@encryption=enabled data sda5 cache sda8
root@srv-demo-1:~# zfs set compression=lz4 data
root@srv-demo-1:~# zpool status
  pool: data
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          sda5      ONLINE       0     0     0
        cache
          sda8      ONLINE       0     0     0

errors: No known data errors
root@srv-demo-1:~# zfs create -o compression=lz4 -o keyformat=passphrase -o keylocation=file:///root/.diskkey -o encryption=aes-256-gcm data/test
root@srv-demo-1:~# for i in a b c d e; do (rsync -a /usr /data/test/$i &); done # no need for dbench
# waited for > 2GiB size of data/test dataset
root@srv-demo-1:~# zfs snapshot data/test@snap1
root@srv-demo-1:~# zfs snapshot data/test@snap2
root@srv-demo-1:~# zfs snapshot data/test@snap3
root@srv-demo-1:~# zfs send -v data/test@snap1 | zfs recv data/copy
.. wait until finished, now start a second time (rsync still running, thats important!)
root@srv-demo-1:~# zfs send -v data/test@snap1 | zfs recv data/copy
.. PANIC.
root@srv-demo-1:~# zfs list -t all
NAME              USED  AVAIL  REFER  MOUNTPOINT
data             5.42G  18.7G    25K  /data
data/copy        1.61G  18.7G  1.61G  /data/copy
data/copy@snap1     0B      -  1.61G  -
data/test        3.81G  18.7G  3.80G  /data/test
data/test@snap1  4.88M      -  1.70G  -
data/test@snap2  4.17M      -  2.12G  -
data/test@snap3  3.08M      -  2.41G  -

I've skipped the -wp send params because if I use them I always get a system freeze and I find no way to get the trace. So we should focus on the above steps first.

I've attached the full dmesg output.

dmesg.zfssend.1.txt

tcaputi commented 7 years ago

@cytrinox

So far, I'm unable to reproduce the above panic, if I repeat my steps the kernel completely stops now and only prints a stack trace on the console - but the stack trace is different from the above (see photo attached).

Is this with #6595 applied as well? This issue looks very similar to one I already fixed there, but it is possible I missed something.

cytrinox commented 7 years ago

@tcaputi Yes, I've applied #6595 and your patch from this issue.

commit a771b1926c3650c408c4a5fb42643fdf4ad8467a
Author: root <root@srv-demo-1.mpsoft.local>
Date:   Tue Sep 19 09:27:25 2017 +0200

    bugfix 1 L2ARC

commit f81ef3a0c9d3c41b17b3ea939171255cb36c39a5
Merge: b107b2304 5ee5cc6be
Author: root <root@srv-demo-1.mpsoft.local>
Date:   Fri Sep 15 09:26:34 2017 +0200

    Merge remote-tracking branch 'datto/post_encryption'

commit 5ee5cc6bebd827bdf8a0809d85744e78954b34db
Author: Tom Caputi <tcaputi@datto.com>
Date:   Fri Sep 15 02:02:40 2017 -0400

    Small fixes and debugging for slog_015_pos

    Signed-off-by: Tom Caputi <tcaputi@datto.com>
commit b107b2304f8d309662cdaaf7478261c5d66a8769
Author: Daniel Vogelbacher <daniel@vogelbacher.name>
Date:   Tue Sep 12 23:44:37 2017 +0200

    Add missing patch files for contrib/debian

commit 8ac1974b49c7a52cb69d372b5eb68fbd01025dcf
Author: Daniel Vogelbacher <daniel@vogelbacher.name>
Date:   Tue Sep 12 23:22:59 2017 +0200

    Disable removal of test scripts during cleanup

commit 04e13adf322f547eda9e8268df918ff09929e122
Author: Daniel Vogelbacher <daniel@vogelbacher.name>
Date:   Tue Sep 12 22:43:40 2017 +0200

    Remove zpios from debian scrips, add icp

commit 5700ed6598bb5b744f997a16259a3e56e39303d5
Author: Tom Caputi <tcaputi@datto.com>
Date:   Tue Sep 12 16:15:11 2017 -0400

    Encryption patch follow-up

    * PBKDF2 implementation changed to OpenSSL implementation.

    * HKDF implementation moved to its own file and tests
      added to ensure correctness.

    * Removed libzfs's now unnecessary dependency on libzpool
      and libicp.

    * Ztest can now create and test encrypted datasets. This is
      currently disabled until issue #6526 is resolved, but
      otherwise functions as advertised.

    * Several small bug fixes discovered after enabling ztest
      to run on encrypted datasets.

    * Fixed coverity defects added by the encryption patch.

    * Updated man pages for encrypted send / receive behavior.

    * Fixed a bug where encrypted datasets could receive
      DRR_WRITE_EMBEDDED records.

    * Minor code cleanups / consolidation.

    Signed-off-by: Tom Caputi <tcaputi@datto.com>

EDIT: Added full log back to Sep. 12

tcaputi commented 7 years ago

@cytrinox I am seeing something a bit different. I can't reproduce the original bug at all. I can get the send to hang by doing the sends with -wp, but the kernel does not panic. Are you using a debug or non-debug build?

EDIT: OK. Now I have seen it panic the kernel once now. Still haven't seen any problems without -wp yet.

cytrinox commented 7 years ago

@tcaputi I'm using a build with debug symbols but I've not configured any special debug options.

tcaputi commented 7 years ago

I have now seen this issue cause several different kinds of failures, including hard kernel panics, hung sends, and errors in the arc and zio code. I'll see what I can figure out.

cehteh commented 7 years ago

I have a replacement Ryzen processor now and can contribute to testing. @tcaputi do you have a patch series or repository on top of 0.7.2 for testing the newest code?

cytrinox commented 7 years ago

You could use https://github.com/cytrinox/zfs/tree/post-crypto which contains the latest fix for this issue and the post-crypto PR. I don't know if it is possible to cherry pick the whole crypto patches into 0.7.2-branch.

tcaputi commented 7 years ago

I am still working on getting everything fixed. At this point, I have one bug left to run down that I am still trying to resolve. I apologize for the delay, but the good news is that this test has exposed and allowed me to fix a good number of issues with the arc and raw send code. I'll post another update when I have more info.

tcaputi commented 7 years ago

As of now I believe I have resolved all of the problems here locally. It turns out that the crashes here were caused by 4 separate problems, most of which were fixed with 1 liners. I will take the rest of tonight and tomorrow to run some tests to try and ensure that everything is truly resolved. I should have an update commit, rebased on the current master posted to #6595 sometime tomorrow, unless I find anything else that is amiss.

Sorry for the long wait here.

cytrinox commented 7 years ago

@tcaputi That are good news! I can run a test on friday/saturday but I'am out of office next week. I will merge your updated PR into my deb-dkms repository for @cehteh.

In two weeks, I'm continuing with testing.

cehteh commented 7 years ago

i can test right away, the computer here is now fixed and as long it is not 'in production' i have the resources to run tests.

tcaputi commented 7 years ago

I'll get it pushed tomorrow morning. I need to take some time to clean up all of my debugging and rebase so I probably won't have it tonight

tcaputi commented 7 years ago

Pushed the fixes. I'll watch buildbot and make sure everything is ok.

cytrinox commented 7 years ago

@tcaputi I've updated my branch with your commit but can still reproduce the error from https://github.com/zfsonlinux/zfs/issues/6639#issuecomment-330581434 with the same steps described. As soon as I send the same snapshot again, zfs crashes:

[    8.129414] spl: loading out-of-tree module taints kernel.
[    8.131202] SPL: Loaded module v0.7.0-15_g275146c
[    8.907961] icp: module license 'CDDL' taints kernel.
[    8.907963] icp: module license 'CDDL' taints kernel.
[    8.907968] icp: module license 'CDDL' taints kernel.
[    8.907970] icp: module license 'CDDL' taints kernel.
[    8.907971] Disabling lock debugging due to kernel taint
[   11.388180] ZFS: Loaded module v0.7.0-97_g5f88d2c8a (DEBUG mode), ZFS pool version 5000, ZFS filesystem version 5
[   12.664578] IPv6: ADDRCONF(NETDEV_UP): enp13s0: link is not ready
[   14.928957] e1000e: enp13s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   14.929094] IPv6: ADDRCONF(NETDEV_CHANGE): enp13s0: link becomes ready
[  914.750202] perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[ 1070.644460] perf: interrupt took too long (3138 > 3133), lowering kernel.perf_event_max_sample_rate to 63500
[ 1263.178234] perf: interrupt took too long (3935 > 3922), lowering kernel.perf_event_max_sample_rate to 50750
[ 1381.676139] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.676161] PANIC at zio.c:501:zio_decrypt()
[ 1381.676173] Showing stack for process 1146
[ 1381.676175] CPU: 0 PID: 1146 Comm: z_rd_int_0 Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1381.676176] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1381.676179]  0000000000000000 ffffffffb9528574 ffffffffc0db5619 ffffa796c3943c18
[ 1381.676182]  ffffffffc07cf385 0000000000000001 ffff93a700000028 ffffa796c3943c28
[ 1381.676185]  ffffa796c3943bc8 7228594649524556 2032203d21207465 2d6f697a28207c7c
[ 1381.676188] Call Trace:
[ 1381.676195]  [<ffffffffb9528574>] ? dump_stack+0x5c/0x78
[ 1381.676209]  [<ffffffffc07cf385>] ? spl_panic+0xc5/0x100 [spl]
[ 1381.676213]  [<ffffffffb92a0854>] ? check_preempt_curr+0x74/0x80
[ 1381.676320]  [<ffffffffc0c1a24f>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1381.676372]  [<ffffffffc0cbd945>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1381.676424]  [<ffffffffc0cc06b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1381.676475]  [<ffffffffc0cc8251>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1381.676526]  [<ffffffffc0cbc559>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1381.676578]  [<ffffffffc0cbe054>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1381.676584]  [<ffffffffc07cd2b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1381.676586]  [<ffffffffb92a1800>] ? wake_up_q+0x70/0x70
[ 1381.676638]  [<ffffffffc0cbdf70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1381.676643]  [<ffffffffc07cd030>] ? task_done+0x90/0x90 [spl]
[ 1381.676646]  [<ffffffffb92965d7>] ? kthread+0xd7/0xf0
[ 1381.676648]  [<ffffffffb9296500>] ? kthread_park+0x60/0x60
[ 1381.676651]  [<ffffffffb98064f5>] ? ret_from_fork+0x25/0x30
[ 1381.741895] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.741919] PANIC at zio.c:501:zio_decrypt()
[ 1381.741931] Showing stack for process 1713
[ 1381.741935] CPU: 1 PID: 1713 Comm: z_rd_int_2 Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1381.741936] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1381.741939]  0000000000000000 ffffffffb9528574 ffffffffc0db5619 ffffa796c722fc18
[ 1381.741944]  ffffffffc07cf385 ffffffffb94f7ef5 ffff93a600000028 ffffa796c722fc28
[ 1381.741947]  ffffa796c722fbc8 7228594649524556 2032203d21207465 2d6f697a28207c7c
[ 1381.741951] Call Trace:
[ 1381.741960]  [<ffffffffb9528574>] ? dump_stack+0x5c/0x78
[ 1381.741976]  [<ffffffffc07cf385>] ? spl_panic+0xc5/0x100 [spl]
[ 1381.741979]  [<ffffffffb94f7ef5>] ? queue_unplugged+0x25/0xa0
[ 1381.741982]  [<ffffffffb9528396>] ? cpumask_next_and+0x26/0x40
[ 1381.742091]  [<ffffffffc0c1a24f>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1381.742142]  [<ffffffffc0cbd945>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1381.742194]  [<ffffffffc0cc06b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1381.742245]  [<ffffffffc0cc8251>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1381.742296]  [<ffffffffc0cbc559>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1381.742347]  [<ffffffffc0cbe054>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1381.742352]  [<ffffffffc07cd2b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1381.742356]  [<ffffffffb92a1800>] ? wake_up_q+0x70/0x70
[ 1381.742407]  [<ffffffffc0cbdf70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1381.742412]  [<ffffffffc07cd030>] ? task_done+0x90/0x90 [spl]
[ 1381.742414]  [<ffffffffb92965d7>] ? kthread+0xd7/0xf0
[ 1381.742416]  [<ffffffffb9296500>] ? kthread_park+0x60/0x60
[ 1381.742418]  [<ffffffffb9296500>] ? kthread_park+0x60/0x60
[ 1381.742421]  [<ffffffffb98064f5>] ? ret_from_fork+0x25/0x30
[ 1381.814730] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.814731] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.814734] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.814735] PANIC at zio.c:501:zio_decrypt()
[ 1381.814740] PANIC at zio.c:501:zio_decrypt()
[ 1381.814742] Showing stack for process 1150
[ 1381.814743] Showing stack for process 1151
[ 1381.814746] CPU: 3 PID: 1150 Comm: z_rd_int_3 Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1381.814748] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1381.814753]  0000000000000000 ffffffffb9528574 ffffffffc0db5619 ffffa796c395bc18
[ 1381.814755]  ffffffffc07cf385 ffff93a72fd182a8 ffffa79600000028 ffffa796c395bc28
[ 1381.814759]  ffffa796c395bbc8 7228594649524556 2032203d21207465 2d6f697a28207c7c
[ 1381.814760] Call Trace:
[ 1381.814769]  [<ffffffffb9528574>] ? dump_stack+0x5c/0x78
[ 1381.814785]  [<ffffffffc07cf385>] ? spl_panic+0xc5/0x100 [spl]
[ 1381.814888]  [<ffffffffc0c1a24f>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1381.814939]  [<ffffffffc0cbd945>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1381.814990]  [<ffffffffc0cc06b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1381.815041]  [<ffffffffc0cc8251>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1381.815042] PANIC at zio.c:501:zio_decrypt()
[ 1381.815093]  [<ffffffffc0cbc559>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1381.815144]  [<ffffffffc0cbe054>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1381.815149]  [<ffffffffc07cd2b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1381.815150] Showing stack for process 1709
[ 1381.815154]  [<ffffffffb92a1800>] ? wake_up_q+0x70/0x70
[ 1381.815205]  [<ffffffffc0cbdf70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1381.815211]  [<ffffffffc07cd030>] ? task_done+0x90/0x90 [spl]
[ 1381.815213]  [<ffffffffb92965d7>] ? kthread+0xd7/0xf0
[ 1381.815215]  [<ffffffffb9296500>] ? kthread_park+0x60/0x60
[ 1381.815219]  [<ffffffffb98064f5>] ? ret_from_fork+0x25/0x30
[ 1381.815222] CPU: 2 PID: 1709 Comm: z_rd_int_0 Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1381.815223] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1381.815224]  0000000000000000 ffffffffb9528574 ffffffffc0db5619 ffffa796cc75bc18
[ 1381.815227]  ffffffffc07cf385 0000000000000021 ffff93a700000028 ffffa796cc75bc28
[ 1381.815229]  ffffa796cc75bbc8 7228594649524556 2032203d21207465 2d6f697a28207c7c
[ 1381.815232] Call Trace:
[ 1381.815235]  [<ffffffffb9528574>] ? dump_stack+0x5c/0x78
[ 1381.815241]  [<ffffffffc07cf385>] ? spl_panic+0xc5/0x100 [spl]
[ 1381.815244]  [<ffffffffb92a082e>] ? check_preempt_curr+0x4e/0x80
[ 1381.815290]  [<ffffffffc0c1a24f>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1381.815342]  [<ffffffffc0cbd945>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1381.815393]  [<ffffffffc0cc06b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1381.815444]  [<ffffffffc0cc8251>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1381.815495]  [<ffffffffc0cbc559>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1381.815546]  [<ffffffffc0cbe054>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1381.815551]  [<ffffffffc07cd2b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1381.815553]  [<ffffffffb92a1800>] ? wake_up_q+0x70/0x70
[ 1381.815604]  [<ffffffffc0cbdf70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1381.815610]  [<ffffffffc07cd030>] ? task_done+0x90/0x90 [spl]
[ 1381.815612]  [<ffffffffb92965d7>] ? kthread+0xd7/0xf0
[ 1381.815614]  [<ffffffffb9296500>] ? kthread_park+0x60/0x60
[ 1381.815616]  [<ffffffffb98064f5>] ? ret_from_fork+0x25/0x30
[ 1381.815619] CPU: 0 PID: 1151 Comm: z_rd_int_4 Tainted: P           O    4.9.0-3-amd64 #1 Debian 4.9.30-2+deb9u3
[ 1381.815620] Hardware name: extracomputer X7SBL/X7SBL, BIOS 1.2a       12/19/2008
[ 1381.815622]  0000000000000000 ffffffffb9528574 ffffffffc0db5619 ffffa796c3963c18
[ 1381.815626]  ffffffffc07cf385 ffff93a72a54ca00 0000000000000028 ffffa796c3963c28
[ 1381.815628]  ffffa796c3963bc8 7228594649524556 2032203d21207465 2d6f697a28207c7c
[ 1381.815633] Call Trace:
[ 1381.815637]  [<ffffffffb9528574>] ? dump_stack+0x5c/0x78
[ 1381.815646]  [<ffffffffc07cf385>] ? spl_panic+0xc5/0x100 [spl]
[ 1381.815709]  [<ffffffffc0c1a24f>] ? spa_do_crypt_abd+0x10f/0x4b0 [zfs]
[ 1381.815711]  [<ffffffffb9528396>] ? cpumask_next_and+0x26/0x40
[ 1381.815762]  [<ffffffffc0cbd945>] ? zio_decrypt+0x425/0x490 [zfs]
[ 1381.815814]  [<ffffffffc0cc06b6>] ? zio_pop_transforms+0x66/0x80 [zfs]
[ 1381.815865]  [<ffffffffc0cc8251>] ? zio_done+0x2a1/0x1bb0 [zfs]
[ 1381.815917]  [<ffffffffc0cbc559>] ? zio_vdev_io_assess+0x49/0x330 [zfs]
[ 1381.815968]  [<ffffffffc0cbe054>] ? zio_execute+0xe4/0x2e0 [zfs]
[ 1381.815974]  [<ffffffffc07cd2b6>] ? taskq_thread+0x286/0x460 [spl]
[ 1381.815976]  [<ffffffffb92a1800>] ? wake_up_q+0x70/0x70
[ 1381.816028]  [<ffffffffc0cbdf70>] ? zio_taskq_member.isra.7.constprop.11+0x70/0x70 [zfs]
[ 1381.816038]  [<ffffffffc07cd030>] ? task_done+0x90/0x90 [spl]
[ 1381.816042]  [<ffffffffb92965d7>] ? kthread+0xd7/0xf0
[ 1381.816048]  [<ffffffffb9296500>] ? kthread_park+0x60/0x60
[ 1381.816053]  [<ffffffffb98064f5>] ? ret_from_fork+0x25/0x30
[ 1381.820322] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.820343] PANIC at zio.c:501:zio_decrypt()
[ 1381.820385] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.820394] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.820396] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.820397] PANIC at zio.c:501:zio_decrypt()
[ 1381.820399] PANIC at zio.c:501:zio_decrypt()
[ 1381.820463] PANIC at zio.c:501:zio_decrypt()
[ 1381.820500] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.820523] PANIC at zio.c:501:zio_decrypt()
[ 1381.820576] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.820587] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.820588] PANIC at zio.c:501:zio_decrypt()
[ 1381.820675] PANIC at zio.c:501:zio_decrypt()
[ 1381.820739] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.820775] PANIC at zio.c:501:zio_decrypt()
[ 1381.820864] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.820886] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.820888] PANIC at zio.c:501:zio_decrypt()
[ 1381.820975] PANIC at zio.c:501:zio_decrypt()
[ 1381.852356] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.852393] PANIC at zio.c:501:zio_decrypt()
[ 1381.909190] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1381.909227] PANIC at zio.c:501:zio_decrypt()
[ 1382.016267] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.016321] PANIC at zio.c:501:zio_decrypt()
[ 1382.039304] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.039341] PANIC at zio.c:501:zio_decrypt()
[ 1382.044398] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.044403] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.044405] PANIC at zio.c:501:zio_decrypt()
[ 1382.044497] PANIC at zio.c:501:zio_decrypt()
[ 1382.060830] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.060869] PANIC at zio.c:501:zio_decrypt()
[ 1382.131893] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.131896] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.131898] PANIC at zio.c:501:zio_decrypt()
[ 1382.131995] PANIC at zio.c:501:zio_decrypt()
[ 1382.166597] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.166633] PANIC at zio.c:501:zio_decrypt()
[ 1382.192439] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.192477] PANIC at zio.c:501:zio_decrypt()
[ 1382.220823] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.220860] PANIC at zio.c:501:zio_decrypt()
[ 1382.261772] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.261809] PANIC at zio.c:501:zio_decrypt()
[ 1382.281250] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.281253] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.281258] PANIC at zio.c:501:zio_decrypt()
[ 1382.281370] PANIC at zio.c:501:zio_decrypt()
[ 1382.289297] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.289334] PANIC at zio.c:501:zio_decrypt()
[ 1382.296358] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.296394] PANIC at zio.c:501:zio_decrypt()
[ 1382.296493] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.296528] PANIC at zio.c:501:zio_decrypt()
[ 1382.299479] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.299516] PANIC at zio.c:501:zio_decrypt()
[ 1382.301945] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.301953] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.301954] PANIC at zio.c:501:zio_decrypt()
[ 1382.302056] PANIC at zio.c:501:zio_decrypt()
[ 1382.305842] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.305867] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.305868] PANIC at zio.c:501:zio_decrypt()
[ 1382.305926] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.305927] PANIC at zio.c:501:zio_decrypt()
[ 1382.305938] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.305939] PANIC at zio.c:501:zio_decrypt()
[ 1382.306072] PANIC at zio.c:501:zio_decrypt()
[ 1382.321955] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.321991] PANIC at zio.c:501:zio_decrypt()
[ 1382.343147] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343173] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343174] PANIC at zio.c:501:zio_decrypt()
[ 1382.343185] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343187] PANIC at zio.c:501:zio_decrypt()
[ 1382.343204] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343205] PANIC at zio.c:501:zio_decrypt()
[ 1382.343268] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343269] PANIC at zio.c:501:zio_decrypt()
[ 1382.343275] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343276] PANIC at zio.c:501:zio_decrypt()
[ 1382.343289] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343290] PANIC at zio.c:501:zio_decrypt()
[ 1382.343326] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343327] PANIC at zio.c:501:zio_decrypt()
[ 1382.343353] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343353] PANIC at zio.c:501:zio_decrypt()
[ 1382.343403] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343404] PANIC at zio.c:501:zio_decrypt()
[ 1382.343467] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.343468] PANIC at zio.c:501:zio_decrypt()
[ 1382.343862] PANIC at zio.c:501:zio_decrypt()
[ 1382.436331] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.436369] PANIC at zio.c:501:zio_decrypt()
[ 1382.442108] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.442147] PANIC at zio.c:501:zio_decrypt()
[ 1382.451401] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.451437] PANIC at zio.c:501:zio_decrypt()
[ 1382.461862] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.461900] PANIC at zio.c:501:zio_decrypt()
[ 1382.465424] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.465433] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.465434] PANIC at zio.c:501:zio_decrypt()
[ 1382.465528] PANIC at zio.c:501:zio_decrypt()
[ 1382.528509] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.528549] PANIC at zio.c:501:zio_decrypt()
[ 1382.533768] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.533807] PANIC at zio.c:501:zio_decrypt()
[ 1382.538989] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.539025] PANIC at zio.c:501:zio_decrypt()
[ 1382.543889] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.543944] PANIC at zio.c:501:zio_decrypt()
[ 1382.587807] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.587846] PANIC at zio.c:501:zio_decrypt()
[ 1382.632943] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.632982] PANIC at zio.c:501:zio_decrypt()
[ 1382.638618] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.638649] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.638651] PANIC at zio.c:501:zio_decrypt()
[ 1382.638731] PANIC at zio.c:501:zio_decrypt()
[ 1382.706341] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.706381] PANIC at zio.c:501:zio_decrypt()
[ 1382.706393] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.706395] PANIC at zio.c:501:zio_decrypt()
[ 1382.711137] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.711174] PANIC at zio.c:501:zio_decrypt()
[ 1382.715623] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.715662] PANIC at zio.c:501:zio_decrypt()
[ 1382.721871] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.721907] PANIC at zio.c:501:zio_decrypt()
[ 1382.768806] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.768842] PANIC at zio.c:501:zio_decrypt()
[ 1382.803723] VERIFY(ret != 2 || (zio->io_flags & ZIO_FLAG_SPECULATIVE)) failed
[ 1382.803760] PANIC at zio.c:501:zio_decrypt()
cytrinox commented 7 years ago

@tcaputi Maybe this can help you: I've added the following patch to provide a little more info about why VERIFY() fails:

/usr/src/zfs# git diff HEAD^..HEAD
diff --git a/module/zfs/zio.c b/module/zfs/zio.c
index 508011a86..11aac6e39 100644
--- a/module/zfs/zio.c
+++ b/module/zfs/zio.c
@@ -435,6 +435,7 @@ zio_decrypt(zio_t *zio, abd_t *data, uint64_t size)
                            zio->io_abd, tmp, zio->io_size, lsize);
                        if (ret != 0) {
                                ret = SET_ERROR(EIO);
+                               zfs_dbgmsg("debug_error #1: %i", ret);
                                goto error;
                        }
                        ret = zio_crypt_do_indirect_mac_checksum(B_FALSE,
@@ -447,7 +448,10 @@ zio_decrypt(zio_t *zio, abd_t *data, uint64_t size)
                abd_copy(data, zio->io_abd, size);

                if (ret != 0)
+               {
+                       zfs_dbgmsg("debug_error #2: %i", ret);
                        goto error;
+               }

                return;
        }
@@ -470,7 +474,10 @@ zio_decrypt(zio_t *zio, abd_t *data, uint64_t size)
                abd_copy(data, zio->io_abd, size);

                if (ret != 0)
+               {
+                       zfs_dbgmsg("debug_error #3: %i", ret);
                        goto error;
+               }

                return;
        }
@@ -492,11 +499,15 @@ zio_decrypt(zio_t *zio, abd_t *data, uint64_t size)
                abd_copy(data, zio->io_abd, size);

        if (ret != 0)
+       {
+               zfs_dbgmsg("debug_error #4: %i", ret);
                goto error;
+       }

        return;

 error:
+       zfs_dbgmsg("debug_error #5 flags: %i", zio->io_flags);
        /* assert that the key was found unless this was speculative */
        ASSERT(ret != ENOENT || (zio->io_flags & ZIO_FLAG_SPECULATIVE));

The call of spa_do_crypt_abd() returns 2. Here is the log this patch produce when zfs crashes:

1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681942   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681942   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 384
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 8388992
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 8388992
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 384
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 8388992
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 384
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
1506681943   zio.c:503:zio_decrypt(): debug_error #4: 2
1506681943   zio.c:510:zio_decrypt(): debug_error #5 flags: 128
tcaputi commented 7 years ago

@cytrinox Hmmm.... I'm not able to reproduce it with the following script. Can you give this a try?

modprobe -r zfs
modprobe zfs
echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
echo 500000000 > /sys/module/zfs/parameters/zfs_dbgmsg_maxsize
echo 'password' > /root/diskkey
zpool create -f -o feature@encryption=enabled data sdb cache sdc
# zfs set compression=lz4 data
zfs create -o keyformat=passphrase -o keylocation=file:///root/diskkey -o encryption=aes-256-gcm data/test
for i in a b c d e; do (rsync -a /usr /data/test/$i &); done
zfs list data/test
ret=0
while [ $ret -lt $((2 * 1024 * 1024 * 1024)) ]; do
    echo "size = $ret"
    ret=$(zfs list data/test -pH | awk '{ print $2 }')
    sleep 1
done
zfs snapshot data/test@snap1
zfs snapshot data/test@snap2
zfs snapshot data/test@snap3
zfs send -v data/test@snap1 | zfs recv data/copy
zfs send -v data/test@snap1 | zfs recv data/copy2

If it doesn't can you try to figure out what's wrong with my script?

cytrinox commented 7 years ago

@tcaputi Your script differs from my routine:

zfs send -v data/test@snap1 | zfs recv data/copy
zfs send -v data/test@snap1 | zfs recv data/copy2

I use:

zfs send -v data/test@snap1 | zfs recv data/copy
zfs send -v data/test@snap1 | zfs recv data/copy

I don't know if the error happens when you write the snapshot to a new destination.

tcaputi commented 7 years ago

I get an error when doing that... Is your script supposed to send a snap2 instead?