openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Consistent Kernel Oops when resilvering gets to 99% #14385

Open ndt47 opened 1 year ago

ndt47 commented 1 year ago

System information

Distribution Name | Unraid
Distribution Version | 6.10.3
Kernel Version | Linux version 5.15.46-Unraid
Architecture | x86-64
OpenZFS Version | 2.1.5

Describe the problem you're observing

Unable to successfully resilver a drive due to a kernel Oops when the process is 99% of the way through. After the Oops, all IO stops. On reboot the resilver resumes from approximately 98%, but Oopses again. It is consistently the same fault, at the exact same address, which leads me to believe this is a software bug rather than a hardware issue. I have run 4 complete passes of MemTestPro 10.2 without error.

Describe how to reproduce the problem

N/A

Include any warning/errors/backtraces from the system logs

Jan 12 08:51:55 TaylorPlex kernel: BUG: unable to handle page fault for address: 0000000000003da8
Jan 12 08:51:55 TaylorPlex kernel: #PF: supervisor write access in kernel mode
Jan 12 08:51:55 TaylorPlex kernel: #PF: error_code(0x0002) - not-present page
Jan 12 08:51:55 TaylorPlex kernel: PGD 0 P4D 0 
Jan 12 08:51:55 TaylorPlex kernel: Oops: 0002 [#1] SMP NOPTI
Jan 12 08:51:55 TaylorPlex kernel: CPU: 13 PID: 9885 Comm: dp_sync_taskq Tainted: P           O      5.15.46-Unraid #1
Jan 12 08:51:55 TaylorPlex kernel: Hardware name: Supermicro M12SWA-TF/M12SWA-TF, BIOS 2.0b 06/28/2022
Jan 12 08:51:55 TaylorPlex kernel: RIP: 0010:mutex_lock+0x1e/0x2a
Jan 12 08:51:55 TaylorPlex kernel: Code: 00 00 be 02 00 00 00 e9 37 fd ff ff 0f 1f 44 00 00 51 48 89 3c 24 e8 c3 f4 ff ff 31 c0 48 8b 3c 24 65 48 8b 14 25 c0 bb 01 00 <f0> 48 0f b1 17 74 03 5a eb c9 58 c3 0f 1f 44 00 00 55 50 48 89 3c
Jan 12 08:51:55 TaylorPlex kernel: RSP: 0018:ffffc90030333a60 EFLAGS: 00010246
Jan 12 08:51:55 TaylorPlex kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jan 12 08:51:55 TaylorPlex kernel: RDX: ffff88821afd3d00 RSI: ffffffffa08b121b RDI: 0000000000003da8
Jan 12 08:51:55 TaylorPlex kernel: RBP: 0000000000003d88 R08: ffff8884882057c0 R09: 0000000000000000
Jan 12 08:51:55 TaylorPlex kernel: R10: 047ef354472af54e R11: 01ef9636ea9722d0 R12: 0000000000003da8
Jan 12 08:51:55 TaylorPlex kernel: R13: ffff88821afd3d00 R14: 0000000000012000 R15: 0000000000000001
Jan 12 08:51:55 TaylorPlex kernel: FS:  0000000000000000(0000) GS:ffff88bf3d340000(0000) knlGS:0000000000000000
Jan 12 08:51:55 TaylorPlex kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 08:51:55 TaylorPlex kernel: CR2: 0000000000003da8 CR3: 000000047ee60000 CR4: 0000000000350ee0
Jan 12 08:51:55 TaylorPlex kernel: Call Trace:
Jan 12 08:51:55 TaylorPlex kernel: <TASK>
Jan 12 08:51:55 TaylorPlex kernel: zfs_ratelimit+0x2d/0xd9 [zfs]
Jan 12 08:51:55 TaylorPlex kernel: zfs_is_ratelimiting_event+0x29/0x76 [zfs]
Jan 12 08:51:55 TaylorPlex kernel: zfs_ereport_start_checksum+0xc3/0x22a [zfs]
Jan 12 08:51:55 TaylorPlex kernel: zio_checksum_verify+0x91/0x13b [zfs]
Jan 12 08:51:55 TaylorPlex kernel: zio_execute+0xb2/0xd9 [zfs]
Jan 12 08:51:55 TaylorPlex kernel: zio_vdev_io_start+0x225/0x233 [zfs]
Jan 12 08:51:55 TaylorPlex kernel: zio_nowait+0xe7/0xf9 [zfs]
Jan 12 08:51:55 TaylorPlex kernel: zio_ddt_read_start+0x18e/0x1bb [zfs]
Jan 12 08:51:55 TaylorPlex kernel: ? spl_kmem_alloc_impl+0xc6/0xf3 [spl]
Jan 12 08:51:55 TaylorPlex kernel: ? zio_nop_write+0x29d/0x29d [zfs]
Jan 12 08:51:55 TaylorPlex kernel: ? zio_push_transform+0x34/0x7a [zfs]
Jan 12 08:51:55 TaylorPlex kernel: zio_nowait+0xe7/0xf9 [zfs]
Jan 12 08:51:55 TaylorPlex kernel: arc_read+0xea7/0xef3 [zfs]
Jan 12 08:51:55 TaylorPlex kernel: dsl_scan_prefetch_thread+0x19a/0x248 [zfs]
Jan 12 08:51:55 TaylorPlex kernel: taskq_thread+0x277/0x3a1 [spl]
Jan 12 08:51:55 TaylorPlex kernel: ? wake_up_q+0x3e/0x3e
Jan 12 08:51:55 TaylorPlex kernel: ? taskq_dispatch_delay+0x111/0x111 [spl]
Jan 12 08:51:55 TaylorPlex kernel: kthread+0xde/0xe3
Jan 12 08:51:55 TaylorPlex kernel: ? set_kthread_struct+0x32/0x32
Jan 12 08:51:55 TaylorPlex kernel: ret_from_fork+0x22/0x30
Jan 12 08:51:55 TaylorPlex kernel: </TASK>
Jan 12 08:51:55 TaylorPlex kernel: Modules linked in: nfsd auth_rpcgss oid_registry lockd grace sunrpc md_mod iptable_mangle zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) nvidia(PO) ipmi_devintf efivarfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables mlx4_en mlx4_core ast drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm backlight igb agpgart syscopyarea sysfillrect ahci sysimgblt wmi_bmof amd64_edac edac_mce_amd kvm_amd kvm ipmi_ssif crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl atlantic fb_sys_fops i2c_algo_bit libahci i2c_piix4 mpt3sas input_leds nvme raid_class led_class acpi_ipmi k10temp ccp i2c_core scsi_transport_sas nvme_core wmi ipmi_si tpm_crb tpm_tis tpm_tis_core tpm button acpi_cpufreq [last unloaded: mlx4_core]
Jan 12 08:51:55 TaylorPlex kernel: CR2: 0000000000003da8
Jan 12 08:51:55 TaylorPlex kernel: ---[ end trace 7269605c3f5694ca ]---
Jan 12 08:51:55 TaylorPlex kernel: RIP: 0010:mutex_lock+0x1e/0x2a
Jan 12 08:51:55 TaylorPlex kernel: Code: 00 00 be 02 00 00 00 e9 37 fd ff ff 0f 1f 44 00 00 51 48 89 3c 24 e8 c3 f4 ff ff 31 c0 48 8b 3c 24 65 48 8b 14 25 c0 bb 01 00 <f0> 48 0f b1 17 74 03 5a eb c9 58 c3 0f 1f 44 00 00 55 50 48 89 3c
Jan 12 08:51:55 TaylorPlex kernel: RSP: 0018:ffffc90030333a60 EFLAGS: 00010246
Jan 12 08:51:55 TaylorPlex kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Jan 12 08:51:55 TaylorPlex kernel: RDX: ffff88821afd3d00 RSI: ffffffffa08b121b RDI: 0000000000003da8
Jan 12 08:51:55 TaylorPlex kernel: RBP: 0000000000003d88 R08: ffff8884882057c0 R09: 0000000000000000
Jan 12 08:51:55 TaylorPlex kernel: R10: 047ef354472af54e R11: 01ef9636ea9722d0 R12: 0000000000003da8
Jan 12 08:51:55 TaylorPlex kernel: R13: ffff88821afd3d00 R14: 0000000000012000 R15: 0000000000000001
Jan 12 08:51:55 TaylorPlex kernel: FS:  0000000000000000(0000) GS:ffff88bf3d340000(0000) knlGS:0000000000000000
Jan 12 08:51:55 TaylorPlex kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 12 08:51:55 TaylorPlex kernel: CR2: 0000000000003da8 CR3: 000000047ee60000 CR4: 0000000000350ee0
rincebrain commented 1 year ago

My only guess here, since you say this happens when it's about to complete, and zfs_ratelimit+0x2d on my build at least puts me in static __always_inline struct task_struct *get_current(void), is that somehow the thread is being destroyed while zfs_ratelimit is being called, and then mutex_lock tries to reference a torn-down task and goes bang?

I don't see why this wouldn't go bang for a great many people though, especially if it's reliable for you.
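
For what it's worth, the pointer handed to mutex_lock (RDI = 0x3da8, which is also the faulting address in CR2) looks like a small fixed offset from a NULL or torn-down base pointer, whether that base is the task or something ZFS-side, which would explain why it's the exact same address every time. A rough userspace sketch of that shape, purely illustrative - the struct, field, and offset below are invented to match the 0x3da8 address and are not the real OpenZFS layout:

#include <stddef.h>
#include <stdio.h>

/* Stand-in for whatever structure the ratelimit mutex is embedded in. */
struct fake_holder {
        char pad[0x3da8];   /* fields that sit before the lock              */
        int  lock;          /* stand-in for the mutex mutex_lock() is given */
};

int main(void)
{
        struct fake_holder *base = NULL;    /* NULL / already torn-down base */

        /* mutex_lock(&base->lock) would compute NULL + offsetof(..., lock)
         * = 0x3da8 and then try to write there - the oops address above. */
        printf("lock pointer would be %#zx\n",
            offsetof(struct fake_holder, lock));
        (void)base;
        return 0;
}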

ndt47 commented 1 year ago

I'm not sure. There must be a hardware component; one thing I did notice is that ZFS selected a 12.5 TB hot spare to replace a 12.7 TB drive. (Both are labeled as 14 TB; one is SATA, the other SAS.)

I've tried correcting that, but it didn't seem to help. I've now tried three (recent) versions of ZFS (2.1.5, 2.1.6, 2.1.7) on two base distros (Unraid 6.10.3, 6.11.5; TrueNAS Scale 22.12.0), and I'm now trying a different base OS (TrueNAS Core). My pool is in a bad state: it's still functioning, but degraded. I have multiple drives resilvering across multiple vdevs (2 drives in 2 raidz3 vdevs).

rincebrain commented 1 year ago

Replacing a 12.7T drive with a 12.5T drive could be okay if the effective size the 12.7T was being treated as was <= the 12.5T's size - I'd be moderately surprised if it would try to do that otherwise.
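
To spell out the comparison I mean (hypothetical names, not the actual spa_vdev_attach() code - just the shape of the check):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* A replacement disk is acceptable when its usable size is at least the
 * effective size the outgoing disk was being used at, which can be smaller
 * than its raw capacity (e.g. when the vdev was sized by a smaller member). */
static bool replacement_fits(uint64_t old_effective_size, uint64_t new_size)
{
        return new_size >= old_effective_size;
}

int main(void)
{
        /* 12.5T spare replacing a "12.7T" disk that was only ever being used
         * as 12.5T: accepted.  If the 12.7T disk had really been used at its
         * full size, the same check would reject the spare. */
        uint64_t effective = 12500ULL * 1000 * 1000 * 1000;  /* ~12.5T */
        uint64_t spare     = 12500ULL * 1000 * 1000 * 1000;  /* ~12.5T */

        printf("%s\n", replacement_fits(effective, spare) ?
            "replacement fits" : "too small");
        return 0;
}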

ndt47 commented 1 year ago

Yes, I did discover that there was already a 12.5 TB drive in the vdev, so the effective size was likely 12.5 TB.