omgold closed this issue 3 years ago.
I'm wondering: the log shows kernel version "3.10.0-1062.1.1.el7_lustre.x86_64" (Lustre 2.13.0 or 2.12.3 on CentOS 7.7?), but in the System information section you wrote "CentOS 7.8" and "3.10.0-1127.8.2.el7_lustre.x86_64", which probably corresponds to Lustre 2.12.5. (As far as I remember, all of these Lustre versions were released with ZFS 0.7.13.)
@knweiss Ah, sorry, the log is from shortly before the kernel was updated. I build all the modules myself, so the versions do not match the ones in the Lustre repos. The actual versions are 3.10.0-1062.1.1.el7_lustre.x86_64 for the kernel (as in the log) and Lustre 2.12.4. The ZFS version, 0.8.4, is correct, though.
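The version question above hinges on the fact that an EL7 kernel release string encodes the base RHEL/CentOS kernel build number (the `NNNN` in `3.10.0-NNNN.x.y.el7`), and each CentOS 7 minor release ships a distinct build (for example 1062 for 7.7 and 1127 for 7.8). The following is a minimal sketch of that lookup. The function name `el7_minor_release` and the partial build table are illustrative assumptions, not part of any Lustre or ZFS tooling, and self-built kernels may carry a build number that does not reflect the installed userland.

```python
import re
from typing import Optional

# Partial, hypothetical lookup table: base EL7 kernel build number -> CentOS 7 minor release.
# Only a few releases are listed; unknown builds return None.
EL7_KERNEL_BUILDS = {
    957: "7.6",
    1062: "7.7",
    1127: "7.8",
    1160: "7.9",
}


def el7_minor_release(kernel_release: str) -> Optional[str]:
    """Guess the CentOS 7 minor release from a `uname -r` style string."""
    # Capture the build number that follows "3.10.0-".
    match = re.match(r"3\.10\.0-(\d+)\.", kernel_release)
    if match is None:
        return None
    return EL7_KERNEL_BUILDS.get(int(match.group(1)))


print(el7_minor_release("3.10.0-1062.1.1.el7_lustre.x86_64"))  # 7.7
print(el7_minor_release("3.10.0-1127.8.2.el7_lustre.x86_64"))  # 7.8
```

This only identifies the base distribution kernel; it says nothing about which Lustre or ZFS version was built against it.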
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
System information
Type | Version/Name
--- | ---
Distribution Name | CentOS
Distribution Version | 7.8
Linux Kernel | 3.10.0-1127.8.2.el7_lustre.x86_64
Architecture | x86_64
ZFS Version | 0.8.4
SPL Version | 0.8.4
Describe the problem you're observing
Under heavy I/O on a Lustre filesystem that uses ZFS for its OSTs, kernel threads sometimes get stuck in ZFS I/O code. See the kernel backtrace below.
Describe how to reproduce the problem
Probably difficult, as it appears to be a race condition. Running IOR against the Lustre filesystem from many nodes regularly triggers the issue after about an hour.
Include any warning/errors/backtraces from the system logs