Closed RiichiReki closed 1 year ago
You are using kernel "5.14.5-zen1-1-zen". This kernel had issue with the cacule CPU scheduler. In that regard it is an experimental kernel as it is the first and only zen kernel with cacule scheduler. The zen kernel is back to normal CFS scheduler with v5.14.5-zen2. In the meantime version v5.14.6-zen1 is out.
Can you please check if the new linux-zen kernel fixes your problem?
You are using kernel "5.14.5-zen1-1-zen". This kernel had issue with the cacule CPU scheduler. In that regard it is an experimental kernel as it is the first and only zen kernel with cacule scheduler. The zen kernel is back to normal CFS scheduler with v5.14.5-zen2. In the meantime version v5.14.6-zen1 is out.
Can you please check if the new linux-zen kernel fixes your problem?
Look at the bug report, they used the non-zen LTS kernel and also reproduced the issue.
I just had a somewhat similar issue on LTS when my system came under memory/IO pressure for the first time after upgrading to 2.1.1. Mine was on kmem_cache_alloc
so both are related to cache operations.
general protection fault, probably for non-canonical address 0x18377d83bddc99a3: 0000 [#1] SMP PTI
CPU: 1 PID: 664265 Comm: duplicacy Tainted: P OE 5.10.68-1-lts #1
Hardware name: Dell Inc. PowerEdge 1950/0NK937, BIOS 2.7.0 10/30/2010
RIP: 0010:kmem_cache_alloc+0xdd/0x260
Code: 05 00 b9 b6 69 49 8b 00 49 83 78 10 00 48 89 04 24 0f 84 3c 01 00 00 48 85 c0 0f 84 33 01 00 00 8b 4d 28 48 8b 7d 00 48 01 c1 <48> 8b 19 48 89 ce 48 33 9d b8 00 00 00 48 8d 4a 01 48 0f ce 48 31
RSP: 0018:ffffbdb98977f398 EFLAGS: 00010206
RAX: 18377d83bddc79a3 RBX: ffff9ee363bf1d40 RCX: 18377d83bddc99a3
RDX: 000000000003dc4c RSI: 0000000000042c20 RDI: 00003ed518004690
RBP: ffff9ee340d8f600 R08: ffffddb97fc44690 R09: 50fdd07762fe9f0f
R10: 7488049bcc4dbc05 R11: 3f5412f6ce1fc6a4 R12: 0000000000042c20
R13: ffffffffc0303954 R14: 0000000000000000 R15: 0000000000042c20
FS: 000000c00006ee90(0000) GS:ffff9ee467c40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee3adcf008 CR3: 000000012e51e000 CR4: 00000000000006e0
Call Trace:
spl_kmem_cache_alloc+0x74/0x7f0 [spl]
? hkdf_sha512+0x6c/0x90 [zfs]
zio_do_crypt_data+0x488/0x1110 [zfs]
spa_do_crypt_abd+0x20d/0x370 [zfs]
arc_buf_fill+0x4d0/0xcd0 [zfs]
? spl_kmem_cache_alloc+0x74/0x7f0 [spl]
arc_untransform+0x1e/0x80 [zfs]
dbuf_read_verify_dnode_crypt+0x112/0x180 [zfs]
dbuf_read_impl.constprop.0+0x561/0x6d0 [zfs]
dbuf_read+0x106/0x5d0 [zfs]
dmu_bonus_hold_by_dnode+0x86/0x190 [zfs]
dmu_bonus_hold+0x56/0x90 [zfs]
zfs_zget+0xd5/0x270 [zfs]
zfs_dirent_lock+0x4df/0x6c0 [zfs]
zfs_dirlook+0xac/0x2e0 [zfs]
? zfs_zaccess+0x233/0x450 [zfs]
zfs_lookup+0x248/0x3f0 [zfs]
zpl_lookup+0xf2/0x210 [zfs]
? step_into+0xd3/0x700
__lookup_slow+0x85/0x150
walk_component+0x141/0x1b0
path_lookupat+0x5f/0x190
filename_lookup+0xc2/0x1d0
vfs_statx+0x86/0x140
__do_sys_newfstatat+0x46/0x80
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4b9dba
Code: e8 ab bf f7 ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48
RSP: 002b:000000c00021f920 EFLAGS: 00000206 ORIG_RAX: 0000000000000106
RAX: ffffffffffffffda RBX: 000000c00003a500 RCX: 00000000004b9dba
RDX: 000000c035bb4038 RSI: 000000c01a60b960 RDI: ffffffffffffff9c
RBP: 000000c00021f998 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000206 R12: ffffffffffffffff
R13: 000000000000003b R14: 000000000000003a R15: 0000000000000049
Modules linked in: ip6table_security ip6table_raw ip6table_mangle iptable_security iptable_raw iptable_mangle xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink ip6table_nat ip6table_filter ip6_tables xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc ext4 crc32c_generic crc16 mbcache jbd2 ipmi_ssif radeon i2c_algo_bit iTCO_wdt vfat ttm intel_pmc_bxt raid1 fat dcdbas gpio_ich iTCO_vendor_support drm_kms_helper md_mod cec syscopyarea sysfillrect sysimgblt lpc_ich joydev pcspkr coretemp fb_sys_fops mousedev ipmi_si ipmi_devintf bnx2 i5000_edac mac_hid ipmi_msghandler rng_core i5k_amb drm fuse agpgart bpf_preload ip_tables x_tables zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) uas usb_storage usbhid sr_mod cdrom ata_generic pata_acpi mptsas scsi_transport_sas ata_piix mptscsih mptbase
---[ end trace dcbc4d59744f9d49 ]---
Just for the records: I was not able to reproduce this crash with an artificial test. I was running two fio benchmark jobs on two separate pools (one internal RAID10, one external RAIDZ2 on USB).
I have 64 GB RAM + 12 GB swap. Each fio process was running 500 jobs with 100 MB each. On top of that I put memory stress on the system with
stress-ng --vm 1 --vm-keep --vm-bytes 40G
The system was swapping. With several "Memory pressure relief" messages in the journal. All operations completed without error. This was on Arch Linux with kernel 5.14.8-arch1-1 and zfs-2.1.1-1 and zfs-kmod-2.1.1-1 (from archzfs repo)
I tried again with 5.14.7-zen1-1-zen and the issue seems to be gone. But since then, I have also enabled automatic snapshots & snapshot pruning.
I will try again with lts later today if needed.
My system has made it through 2 backups since the crash and been fine, so unfortunately I can't provide any more help with identifying this issue.
I just had a somewhat similar issue on LTS when my system came under memory/IO pressure for the first time after upgrading to 2.1.1. Mine was on
kmem_cache_alloc
so both are related to cache operations.general protection fault, probably for non-canonical address 0x18377d83bddc99a3: 0000 [#1] SMP PTI CPU: 1 PID: 664265 Comm: duplicacy Tainted: P OE 5.10.68-1-lts #1 Hardware name: Dell Inc. PowerEdge 1950/0NK937, BIOS 2.7.0 10/30/2010 RIP: 0010:kmem_cache_alloc+0xdd/0x260 Code: 05 00 b9 b6 69 49 8b 00 49 83 78 10 00 48 89 04 24 0f 84 3c 01 00 00 48 85 c0 0f 84 33 01 00 00 8b 4d 28 48 8b 7d 00 48 01 c1 <48> 8b 19 48 89 ce 48 33 9d b8 00 00 00 48 8d 4a 01 48 0f ce 48 31 RSP: 0018:ffffbdb98977f398 EFLAGS: 00010206 RAX: 18377d83bddc79a3 RBX: ffff9ee363bf1d40 RCX: 18377d83bddc99a3 RDX: 000000000003dc4c RSI: 0000000000042c20 RDI: 00003ed518004690 RBP: ffff9ee340d8f600 R08: ffffddb97fc44690 R09: 50fdd07762fe9f0f R10: 7488049bcc4dbc05 R11: 3f5412f6ce1fc6a4 R12: 0000000000042c20 R13: ffffffffc0303954 R14: 0000000000000000 R15: 0000000000042c20 FS: 000000c00006ee90(0000) GS:ffff9ee467c40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fee3adcf008 CR3: 000000012e51e000 CR4: 00000000000006e0 Call Trace: spl_kmem_cache_alloc+0x74/0x7f0 [spl] ? hkdf_sha512+0x6c/0x90 [zfs] zio_do_crypt_data+0x488/0x1110 [zfs] spa_do_crypt_abd+0x20d/0x370 [zfs] arc_buf_fill+0x4d0/0xcd0 [zfs] ? spl_kmem_cache_alloc+0x74/0x7f0 [spl] arc_untransform+0x1e/0x80 [zfs] dbuf_read_verify_dnode_crypt+0x112/0x180 [zfs] dbuf_read_impl.constprop.0+0x561/0x6d0 [zfs] dbuf_read+0x106/0x5d0 [zfs] dmu_bonus_hold_by_dnode+0x86/0x190 [zfs] dmu_bonus_hold+0x56/0x90 [zfs] zfs_zget+0xd5/0x270 [zfs] zfs_dirent_lock+0x4df/0x6c0 [zfs] zfs_dirlook+0xac/0x2e0 [zfs] ? zfs_zaccess+0x233/0x450 [zfs] zfs_lookup+0x248/0x3f0 [zfs] zpl_lookup+0xf2/0x210 [zfs] ? step_into+0xd3/0x700 __lookup_slow+0x85/0x150 walk_component+0x141/0x1b0 path_lookupat+0x5f/0x190 filename_lookup+0xc2/0x1d0 vfs_statx+0x86/0x140 __do_sys_newfstatat+0x46/0x80 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x4b9dba Code: e8 ab bf f7 ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 4c 8b 54 24 28 4c 8b 44 24 30 4c 8b 4c 24 38 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 40 ff ff ff ff 48 c7 44 24 48 RSP: 002b:000000c00021f920 EFLAGS: 00000206 ORIG_RAX: 0000000000000106 RAX: ffffffffffffffda RBX: 000000c00003a500 RCX: 00000000004b9dba RDX: 000000c035bb4038 RSI: 000000c01a60b960 RDI: ffffffffffffff9c RBP: 000000c00021f998 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000100 R11: 0000000000000206 R12: ffffffffffffffff R13: 000000000000003b R14: 000000000000003a R15: 0000000000000049 Modules linked in: ip6table_security ip6table_raw ip6table_mangle iptable_security iptable_raw iptable_mangle xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink ip6table_nat ip6table_filter ip6_tables xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c br_netfilter bridge stp llc ext4 crc32c_generic crc16 mbcache jbd2 ipmi_ssif radeon i2c_algo_bit iTCO_wdt vfat ttm intel_pmc_bxt raid1 fat dcdbas gpio_ich iTCO_vendor_support drm_kms_helper md_mod cec syscopyarea sysfillrect sysimgblt lpc_ich joydev pcspkr coretemp fb_sys_fops mousedev ipmi_si ipmi_devintf bnx2 i5000_edac mac_hid ipmi_msghandler rng_core i5k_amb drm fuse agpgart bpf_preload ip_tables x_tables zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) uas usb_storage usbhid sr_mod cdrom ata_generic pata_acpi mptsas scsi_transport_sas ata_piix mptscsih mptbase ---[ end trace dcbc4d59744f9d49 ]---
For completeness, I'd speculate your issue is not the same as the original reporter's, but is "just" #11679 or one of its relatives (there's what seems like a refcounting bug when encryption is involved. sometimes this results in something still being referenced after it's freed and reallocated, and then someone gets a nasty surprise when it gets freed again...)
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Describe the problem you're observing
Filesystem becomes unresponsive to all writing operations, reads can be done, albeit slower than usual. System can't be rebooted at this stage.
Describe how to reproduce the problem
Nothing, much, just using the system for about an hour. The same behaviour with kernel 5.10.66-1-lts.
Last known good configurations: Kernel 5.10.52-1 with OpenZFS 2.1.0 and Kernel 5.12.15.zen1-1-zen with Open ZFS 2.1.0
Include any warning/errors/backtraces from the system logs
trace found in dmesg: