Open: ndt47 opened this issue 1 year ago
My only guess here, since you say this happens just as the resilver is about to complete, and since zfs_ratelimit+0x2d on my build at least lands in static __always_inline struct task_struct *get_current(void), is that the thread is somehow being destroyed while zfs_ratelimit is being called, and then mutex_lock tries to reference a torn-down task and goes bang?
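The hypothesis above hinges on zfs_ratelimit taking a mutex before it checks its counters. As a rough illustration of why a lock sits on that path at all, here is a user-space Python sketch of a mutex-guarded fixed-window rate limiter. It is an analogy under stated assumptions, not OpenZFS code: the class RateLimiter, the method allow(), and the parameters burst and interval_s are hypothetical names, and the real kernel function's exact semantics may differ.

```python
import threading
import time
from typing import Callable


class RateLimiter:
    """Hypothetical fixed-window rate limiter guarded by a lock.

    Loosely inspired by the idea behind zfs_ratelimit(): a shared counter and
    window start time, protected by a mutex, decide whether an event (such as
    a log message) may be emitted. This is NOT the OpenZFS implementation.
    """

    def __init__(self, burst: int, interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.burst = burst              # max events allowed per window
        self.interval_s = interval_s    # window length in seconds
        self._clock = clock             # injectable clock for testing
        self._lock = threading.Lock()   # plays the role of the kernel mutex
        self._start = clock()
        self._count = 0

    def allow(self) -> bool:
        """Return True if the event may proceed, False if rate-limited."""
        with self._lock:  # every caller serializes on this lock
            now = self._clock()
            if now - self._start >= self.interval_s:
                # The window has expired: start a new one and reset the count.
                self._start = now
                self._count = 0
            if self._count >= self.burst:
                return False
            self._count += 1
            return True
```

With a fake clock, RateLimiter(burst=2, interval_s=1.0, clock=lambda: t[0]) allows two calls, rejects the third, and allows again once the clock moves past the one-second window. The point of the analogy is only that every call path through such a limiter must first acquire the lock, so if the memory or task state involved were already torn down, the fault would surface at the lock acquisition.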
I don't see why this wouldn't go bang for a great many people, though, especially if it reproduces reliably for you.
I'm not sure. There must be a hardware component. One thing I did notice is that ZFS selected a 12.5 TB hot spare to replace a 12.7 TB drive. (Both are labeled 14 TB; one is SATA, the other SAS.)
I've tried correcting that, but it didn't seem to help. I've now tried three recent versions of ZFS (2.1.5, 2.1.6, 2.1.7) on two base distros (Unraid 6.10.3 and 6.11.5; TrueNAS Scale 22.12.0), and I'm now trying a different base OS (TrueNAS Core). My pool is in a bad state: it's still functioning but degraded, with multiple drives resilvering across multiple vdevs (2 drives in 2 raidz3 vdevs).
Replacing a 12.7 TB drive with a 12.5 TB drive could be fine if the effective size ZFS was treating the 12.7 TB drive as was <= the 12.5 TB drive's size. I'd be moderately surprised if it tried to do that in any other case.
Yes, I discovered that there was already a 12.5 TB drive in the vdev, so the effective size was likely 12.5 TB.
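The size argument in the last two comments can be made concrete with a simplified model: a raidz vdev uses only as much space per member as its smallest member provides, so a replacement needs to be at least that effective size, not as large as the disk it replaces. The sketch below assumes that model and ignores labels, alignment, and reserved space that real ZFS accounts for; effective_member_size and can_replace are hypothetical helpers, and the example sizes are illustrative, not the reporter's actual vdev layout.

```python
from typing import Sequence


def effective_member_size(member_sizes_tb: Sequence[float]) -> float:
    """Per-member capacity a raidz vdev actually uses: its smallest member."""
    return min(member_sizes_tb)


def can_replace(member_sizes_tb: Sequence[float], new_size_tb: float) -> bool:
    """Simplified check: the new device must be >= the effective member size.

    Hypothetical helper. Real ZFS compares allocatable sizes after reserving
    space for labels and rounding, which this model deliberately ignores.
    """
    return new_size_tb >= effective_member_size(member_sizes_tb)


# Illustrative vdev: mostly 12.7 TB members plus one 12.5 TB member.
vdev = [12.7, 12.7, 12.5, 12.7]
print(effective_member_size(vdev))      # 12.5
print(can_replace(vdev, 12.5))          # True: the vdev already uses only 12.5 TB per member
print(can_replace([12.7, 12.7], 12.5))  # False: here the effective size is 12.7 TB
```

Under this model, a 12.5 TB spare replacing a 12.7 TB drive is only plausible because another 12.5 TB member already caps the vdev, which matches the explanation above.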
System information
Type | Version/Name
--- | ---
Distribution Name | Unraid
Distribution Version | 6.10.3
Kernel Version | Linux version 5.15.46-Unraid
Architecture | x86-64
OpenZFS Version | 2.1.5
Describe the problem you're observing
I am unable to resilver a drive because of a kernel Oops about 99% of the way through the process. After the Oops, all I/O stops. After a reboot, the resilver resumes from approximately 98%, but it Oopses again. It is consistently the same fault at the exact same address, which leads me to believe it is a software bug rather than a hardware issue. I have run four complete passes of MemTestPro 10.2 without error.
Describe how to reproduce the problem
N/A
Include any warning/errors/backtraces from the system logs