openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.43k stars 1.73k forks source link

vdev_raidz_io_done hangs in archinux Kernel 5.14.5-zen1-1-zen #12571

Closed RiichiReki closed 1 year ago

RiichiReki commented 2 years ago

System information

Type Version/Name
Distribution Name archlinux
Distribution Version N/A
Kernel Version 5.14.5-zen1-1-zen
Architecture x86_64
OpenZFS Version 2.1.1

Describe the problem you're observing

Filesystem becomes unresponsive to all writing operations, reads can be done, albeit slower than usual. System can't be rebooted at this stage.

Describe how to reproduce the problem

Nothing, much, just using the system for about an hour. The same behaviour with kernel 5.10.66-1-lts.

Last known good configurations: Kernel 5.10.52-1 with OpenZFS 2.1.0 and Kernel 5.12.15.zen1-1-zen with Open ZFS 2.1.0

Include any warning/errors/backtraces from the system logs

trace found in dmesg:

[ 1777.258197] general protection fault, probably for non-canonical address 0xfffb91cd51d523e4: 0000 [#1] PREEMPT SMP NOPTI
[ 1777.258205] CPU: 3 PID: 1154440 Comm: z_rd_int_1 Tainted: P           OE     5.14.5-zen1-1-zen #1 527e2b4cd0257f4e7b2d51000f465dd82b7e48aa
[ 1777.258208] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4-F, BIOS P2.80 05/10/2021
[ 1777.258210] RIP: 0010:vdev_raidz_io_done+0x24/0xc20 [zfs]
[ 1777.258317] Code: 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 31 db 48 81 ec d0 00 00 00 4c 8b a7 58 02 00 00 48 89 7c 24 18 <41> 8b 6c 24 04 65 48 8b 04 25 28 00 00 00 48 89 84 24 c8 00 00 00
[ 1777.258320] RSP: 0018:ffff9d29566b7c68 EFLAGS: 00010282
[ 1777.258323] RAX: ffffffffc141ebd0 RBX: 0000000000000000 RCX: 0000000000000000
[ 1777.258325] RDX: ffff91cd8fdb0000 RSI: 0000000000000000 RDI: ffff91d626a889c0
[ 1777.258326] RBP: ffff91d626a88d88 R08: 0000000000000019 R09: 0000000000000000
[ 1777.258327] R10: 0000000000000000 R11: ffff91d626a88d01 R12: fffb91cd51d523e0
[ 1777.258329] R13: ffff91d626a88da8 R14: ffff91ccb08bc000 R15: ffff91cc88221800
[ 1777.258330] FS:  0000000000000000(0000) GS:ffff91db7eac0000(0000) knlGS:0000000000000000
[ 1777.258332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1777.258334] CR2: 000025800b004000 CR3: 0000000910072000 CR4: 0000000000350ee0
[ 1777.258336] Call Trace:
[ 1777.258341]  ? kmem_cache_free+0x3d5/0x420
[ 1777.258346]  ? spl_kmem_cache_free+0x109/0x1d0 [spl 9278f6a37ab4857799152c2fede68359e1bcd163]
[ 1777.258356]  zio_vdev_io_done+0xc5/0x2b0 [zfs ef9ca6a7b3dcc991fc1e05c8aa6ce7649efbd33f]
[ 1777.258460]  zio_execute+0x83/0x1b0 [zfs ef9ca6a7b3dcc991fc1e05c8aa6ce7649efbd33f]
[ 1777.258556]  taskq_thread+0x2f7/0x6b0 [spl 9278f6a37ab4857799152c2fede68359e1bcd163]
[ 1777.258564]  ? wake_up_q+0x90/0x90
[ 1777.258569]  ? zio_vdev_io_start+0x3f0/0x3f0 [zfs ef9ca6a7b3dcc991fc1e05c8aa6ce7649efbd33f]
[ 1777.258663]  ? param_set_taskq_kick+0x130/0x130 [spl 9278f6a37ab4857799152c2fede68359e1bcd163]
[ 1777.258672]  kthread+0x1b0/0x1e0
[ 1777.258675]  ? __kthread_init_worker+0x50/0x50
[ 1777.258680]  ret_from_fork+0x22/0x30
[ 1777.258684] Modules linked in: cmac algif_hash algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth ecdh_generic rfkill ecc crc16 mousedev joydev intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd uvcvideo videobuf2_vmalloc snd_hda_codec_hdmi videobuf2_memops kvm videobuf2_v4l2 videobuf2_common snd_hda_intel snd_usb_audio snd_intel_dspcfg irqbypass videodev snd_usbmidi_lib snd_intel_sdw_acpi snd_rawmidi crct10dif_pclmul snd_seq_device crc32_pclmul vfat zfs(POE) wmi_bmof fat usbhid mc snd_hda_codec ghash_clmulni_intel snd_hda_core snd_hwdep aesni_intel r8169 snd_pcm crypto_simd cryptd snd_timer realtek rapl mdio_devres snd sp5100_tco pcspkr ccp i2c_piix4 k10temp soundcore libphy tpm_crb wmi tpm_tis tpm_tis_core tpm gpio_amdpt gpio_generic pinctrl_amd rng_core mac_hid acpi_cpufreq zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) crypto_user fuse bpf_preload ip_tables x_tables xfs libcrc32c crc32c_generic crc32c_intel xhci_pci
[ 1777.258739]  xhci_pci_renesas amdgpu drm_ttm_helper ttm agpgart gpu_sched i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm
[ 1777.258784] ---[ end trace 3a45098c5e760b4c ]---
[ 1777.258787] RIP: 0010:vdev_raidz_io_done+0x24/0xc20 [zfs]
[ 1777.258890] Code: 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 31 db 48 81 ec d0 00 00 00 4c 8b a7 58 02 00 00 48 89 7c 24 18 <41> 8b 6c 24 04 65 48 8b 04 25 28 00 00 00 48 89 84 24 c8 00 00 00
[ 1777.258892] RSP: 0018:ffff9d29566b7c68 EFLAGS: 00010282
[ 1777.258894] RAX: ffffffffc141ebd0 RBX: 0000000000000000 RCX: 0000000000000000
[ 1777.258896] RDX: ffff91cd8fdb0000 RSI: 0000000000000000 RDI: ffff91d626a889c0
[ 1777.258897] RBP: ffff91d626a88d88 R08: 0000000000000019 R09: 0000000000000000
[ 1777.258898] R10: 0000000000000000 R11: ffff91d626a88d01 R12: fffb91cd51d523e0
[ 1777.258899] R13: ffff91d626a88da8 R14: ffff91ccb08bc000 R15: ffff91cc88221800
[ 1777.258901] FS:  0000000000000000(0000) GS:ffff91db7eb00000(0000) knlGS:0000000000000000
[ 1777.258902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1777.258903] CR2: 00002c0509ae4000 CR3: 00000001dd850000 CR4: 0000000000350ee0
mabod commented 2 years ago

You are using kernel "5.14.5-zen1-1-zen". This kernel had issue with the cacule CPU scheduler. In that regard it is an experimental kernel as it is the first and only zen kernel with cacule scheduler. The zen kernel is back to normal CFS scheduler with v5.14.5-zen2. In the meantime version v5.14.6-zen1 is out.

Can you please check if the new linux-zen kernel fixes your problem?

Evernow commented 2 years ago

You are using kernel "5.14.5-zen1-1-zen". This kernel had issue with the cacule CPU scheduler. In that regard it is an experimental kernel as it is the first and only zen kernel with cacule scheduler. The zen kernel is back to normal CFS scheduler with v5.14.5-zen2. In the meantime version v5.14.6-zen1 is out.

Can you please check if the new linux-zen kernel fixes your problem?

Look at the bug report, they used the non-zen LTS kernel and also reproduced the issue.

brian-maloney commented 2 years ago

I just had a somewhat similar issue on LTS when my system came under memory/IO pressure for the first time after upgrading to 2.1.1. Mine was on kmem_cache_alloc so both are related to cache operations.

general protection fault, probably for non-canonical address 0x18377d83bddc99a3: 0000 [#1] SMP PTI
CPU: 1 PID: 664265 Comm: duplicacy Tainted: P           OE     5.10.68-1-lts #1
Hardware name: Dell Inc. PowerEdge 1950/0NK937, BIOS 2.7.0 10/30/2010
RIP: 0010:kmem_cache_alloc+0xdd/0x260
Code: 05 00 b9 b6 69 49 8b 00 49 83 78 10 00 48 89 04 24 0f 84 3c 01 00 00 48 85 c0 0f 84 33 01 00 00 8b 4d 28 48 8b 7d 00 48 01 c1 <48> 8b 19 48 89 ce 48 33 9d b8 00 00 00 48 8d 4a 01 48 0f ce 48 31
RSP: 0018:ffffbdb98977f398 EFLAGS: 00010206
RAX: 18377d83bddc79a3 RBX: ffff9ee363bf1d40 RCX: 18377d83bddc99a3
RDX: 000000000003dc4c RSI: 0000000000042c20 RDI: 00003ed518004690
RBP: ffff9ee340d8f600 R08: ffffddb97fc44690 R09: 50fdd07762fe9f0f
R10: 7488049bcc4dbc05 R11: 3f5412f6ce1fc6a4 R12: 0000000000042c20
R13: ffffffffc0303954 R14: 0000000000000000 R15: 0000000000042c20
FS:  000000c00006ee90(0000) GS:ffff9ee467c40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3adcf008 CR3: 000000012e51e000 CR4: 00000000000006e0
Call Trace:
 spl_kmem_cache_alloc+0x74/0x7f0 [spl]
 ? hkdf_sha512+0x6c/0x90 [zfs]
 zio_do_crypt_data+0x488/0x1110 [zfs]
 spa_do_crypt_abd+0x20d/0x370 [zfs]
 arc_buf_fill+0x4d0/0xcd0 [zfs]
 ? spl_kmem_cache_alloc+0x74/0x7f0 [spl]
 arc_untransform+0x1e/0x80 [zfs]
 dbuf_read_verify_dnode_crypt+0x112/0x180 [zfs]
 dbuf_read_impl.constprop.0+0x561/0x6d0 [zfs]
 dbuf_read+0x106/0x5d0 [zfs]
 dmu_bonus_hold_by_dnode+0x86/0x190 [zfs]
 dmu_bonus_hold+0x56/0x90 [zfs]
 zfs_zget+0xd5/0x270 [zfs]
 zfs_dirent_lock+0x4df/0x6c0 [zfs]
 zfs_dirlook+0xac/0x2e0 [zfs]
 ? zfs_zaccess+0x233/0x450 [zfs]
 zfs_lookup+0x248/0x3f0 [zfs]
 zpl_lookup+0xf2/0x210 [zfs]
 ? step_into+0xd3/0x700
 __lookup_slow+0x85/0x150
 walk_component+0x141/0x1b0
 path_lookupat+0x5f/0x190
 filename_lookup+0xc2/0x1d0
 vfs_statx+0x86/0x140
 __do_sys_newfstatat+0x46/0x80
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4b9dba
Code: e8 ab bf f7 ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
RSP: 002b:000000c00021f920 EFLAGS: 00000206 ORIG_RAX: 0000000000000106
RAX: ffffffffffffffda RBX: 000000c00003a500 RCX: 00000000004b9dba
RDX: 000000c035bb4038 RSI: 000000c01a60b960 RDI: ffffffffffffff9c
RBP: 000000c00021f998 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000206 R12: ffffffffffffffff
R13: 000000000000003b R14: 000000000000003a R15: 0000000000000049
Modules linked in: ip6table_security ip6table_raw ip6table_mangle iptable_security iptable_raw iptable_mangle xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink ip6table_nat ip6table_filter ip6_tables xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc ext4 crc32c_generic crc16 mbcache jbd2 ipmi_ssif radeon i2c_algo_bit iTCO_wdt vfat ttm intel_pmc_bxt raid1 fat dcdbas gpio_ich iTCO_vendor_support drm_kms_helper md_mod cec syscopyarea sysfillrect sysimgblt lpc_ich joydev pcspkr coretemp fb_sys_fops mousedev ipmi_si ipmi_devintf bnx2 i5000_edac mac_hid ipmi_msghandler rng_core i5k_amb drm fuse agpgart bpf_preload ip_tables x_tables zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) uas usb_storage usbhid sr_mod cdrom ata_generic pata_acpi mptsas scsi_transport_sas ata_piix mptscsih mptbase
---[ end trace dcbc4d59744f9d49 ]---
mabod commented 2 years ago

Just for the records: I was not able to reproduce this crash with an artificial test. I was running two fio benchmark jobs on two separate pools (one internal RAID10, one external RAIDZ2 on USB).

I have 64 GB RAM + 12 GB swap. Each fio process was running 500 jobs with 100 MB each. On top of that I put memory stress on the system with

stress-ng --vm 1 --vm-keep --vm-bytes 40G

The system was swapping. With several "Memory pressure relief" messages in the journal. All operations completed without error. This was on Arch Linux with kernel 5.14.8-arch1-1 and zfs-2.1.1-1 and zfs-kmod-2.1.1-1 (from archzfs repo)

RiichiReki commented 2 years ago

I tried again with 5.14.7-zen1-1-zen and the issue seems to be gone. But since then, I have also enabled automatic snapshots & snapshot pruning.

I will try again with lts later today if needed.

brian-maloney commented 2 years ago

My system has made it through 2 backups since the crash and been fine, so unfortunately I can't provide any more help with identifying this issue.

rincebrain commented 2 years ago

I just had a somewhat similar issue on LTS when my system came under memory/IO pressure for the first time after upgrading to 2.1.1. Mine was on kmem_cache_alloc so both are related to cache operations.

general protection fault, probably for non-canonical address 0x18377d83bddc99a3: 0000 [#1] SMP PTI
CPU: 1 PID: 664265 Comm: duplicacy Tainted: P           OE     5.10.68-1-lts #1
Hardware name: Dell Inc. PowerEdge 1950/0NK937, BIOS 2.7.0 10/30/2010
RIP: 0010:kmem_cache_alloc+0xdd/0x260
Code: 05 00 b9 b6 69 49 8b 00 49 83 78 10 00 48 89 04 24 0f 84 3c 01 00 00 48 85 c0 0f 84 33 01 00 00 8b 4d 28 48 8b 7d 00 48 01 c1 <48> 8b 19 48 89 ce 48 33 9d b8 00 00 00 48 8d 4a 01 48 0f ce 48 31
RSP: 0018:ffffbdb98977f398 EFLAGS: 00010206
RAX: 18377d83bddc79a3 RBX: ffff9ee363bf1d40 RCX: 18377d83bddc99a3
RDX: 000000000003dc4c RSI: 0000000000042c20 RDI: 00003ed518004690
RBP: ffff9ee340d8f600 R08: ffffddb97fc44690 R09: 50fdd07762fe9f0f
R10: 7488049bcc4dbc05 R11: 3f5412f6ce1fc6a4 R12: 0000000000042c20
R13: ffffffffc0303954 R14: 0000000000000000 R15: 0000000000042c20
FS:  000000c00006ee90(0000) GS:ffff9ee467c40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3adcf008 CR3: 000000012e51e000 CR4: 00000000000006e0
Call Trace:
 spl_kmem_cache_alloc+0x74/0x7f0 [spl]
 ? hkdf_sha512+0x6c/0x90 [zfs]
 zio_do_crypt_data+0x488/0x1110 [zfs]
 spa_do_crypt_abd+0x20d/0x370 [zfs]
 arc_buf_fill+0x4d0/0xcd0 [zfs]
 ? spl_kmem_cache_alloc+0x74/0x7f0 [spl]
 arc_untransform+0x1e/0x80 [zfs]
 dbuf_read_verify_dnode_crypt+0x112/0x180 [zfs]
 dbuf_read_impl.constprop.0+0x561/0x6d0 [zfs]
 dbuf_read+0x106/0x5d0 [zfs]
 dmu_bonus_hold_by_dnode+0x86/0x190 [zfs]
 dmu_bonus_hold+0x56/0x90 [zfs]
 zfs_zget+0xd5/0x270 [zfs]
 zfs_dirent_lock+0x4df/0x6c0 [zfs]
 zfs_dirlook+0xac/0x2e0 [zfs]
 ? zfs_zaccess+0x233/0x450 [zfs]
 zfs_lookup+0x248/0x3f0 [zfs]
 zpl_lookup+0xf2/0x210 [zfs]
 ? step_into+0xd3/0x700
 __lookup_slow+0x85/0x150
 walk_component+0x141/0x1b0
 path_lookupat+0x5f/0x190
 filename_lookup+0xc2/0x1d0
 vfs_statx+0x86/0x140
 __do_sys_newfstatat+0x46/0x80
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4b9dba
Code: e8 ab bf f7 ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
RSP: 002b:000000c00021f920 EFLAGS: 00000206 ORIG_RAX: 0000000000000106
RAX: ffffffffffffffda RBX: 000000c00003a500 RCX: 00000000004b9dba
RDX: 000000c035bb4038 RSI: 000000c01a60b960 RDI: ffffffffffffff9c
RBP: 000000c00021f998 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000206 R12: ffffffffffffffff
R13: 000000000000003b R14: 000000000000003a R15: 0000000000000049
Modules linked in: ip6table_security ip6table_raw ip6table_mangle iptable_security iptable_raw iptable_mangle xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink ip6table_nat ip6table_filter ip6_tables xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc ext4 crc32c_generic crc16 mbcache jbd2 ipmi_ssif radeon i2c_algo_bit iTCO_wdt vfat ttm intel_pmc_bxt raid1 fat dcdbas gpio_ich iTCO_vendor_support drm_kms_helper md_mod cec syscopyarea sysfillrect sysimgblt lpc_ich joydev pcspkr coretemp fb_sys_fops mousedev ipmi_si ipmi_devintf bnx2 i5000_edac mac_hid ipmi_msghandler rng_core i5k_amb drm fuse agpgart bpf_preload ip_tables x_tables zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) uas usb_storage usbhid sr_mod cdrom ata_generic pata_acpi mptsas scsi_transport_sas ata_piix mptscsih mptbase
---[ end trace dcbc4d59744f9d49 ]---

For completeness, I'd speculate your issue is not the same as the original reporter's, but is "just" #11679 or one of its relatives (there's what seems like a refcounting bug when encryption is involved. sometimes this results in something still being referenced after it's freed and reallocated, and then someone gets a nasty surprise when it gets freed again...)

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.