Open rincebrain opened 2 years ago
That's not great.
@rincebrain a bit overdue but I've opened PR #14926 to address this issue. Please take a look if you get a chance.
Describe the problem you're observing
`spl_kmem_cache_alloc` normally won't sleep... unless, on Linux, it decides it needs to refill: https://github.com/openzfs/zfs/blob/ed715283de8e65a30d777e9576399ab75014b6fe/module/os/linux/spl/spl-kmem-cache.c#L1274
Then it will happily go call `spl_cache_grow`: https://github.com/openzfs/zfs/blob/ed715283de8e65a30d777e9576399ab75014b6fe/module/os/linux/spl/spl-kmem-cache.c#L1122
And then we hit several things that might crash and burn if we, say, are allocating inside a `kfpu_begin` block: https://github.com/openzfs/zfs/blob/ed715283de8e65a30d777e9576399ab75014b6fe/module/os/linux/spl/spl-kmem-cache.c#L1019
Amusingly, `test_bit`, since it uses atomics, will scream murder too (as will the waiting on something with preemption off, for that matter): https://github.com/openzfs/zfs/blob/ed715283de8e65a30d777e9576399ab75014b6fe/module/os/linux/spl/spl-kmem-cache.c#L1026-L1030
And then of course this fun one:
https://github.com/openzfs/zfs/blob/ed715283de8e65a30d777e9576399ab75014b6fe/module/os/linux/spl/spl-kmem-cache.c#L1078-L1080
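To make the shape of the hazard concrete, here is a rough userspace model, not the SPL code. Every `model_*` name is made up for illustration; only the idea (fast path is fine, refill path may wait, and waiting inside a `kfpu_begin` section is the bug) comes from the description above.

```c
/* Userspace model only: none of these names exist in the SPL or the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_cache {
	size_t obj_size;
	int avail;            /* objects that can be handed out without refilling */
	int sleep_violations; /* times the refill path ran where sleeping is forbidden */
};

/* Greater than zero means "we are inside a section that must not sleep". */
static int model_nosleep_depth;

/* Stand-ins for entering and leaving an FPU section (preemption off). */
static void model_kfpu_begin(void) { model_nosleep_depth++; }

static void model_kfpu_end(void)
{
	assert(model_nosleep_depth > 0);
	model_nosleep_depth--;
}

static bool model_in_nosleep_section(void) { return model_nosleep_depth > 0; }

/* Stand-in for the grow/refill path, which is allowed to wait. */
static void model_cache_refill(struct model_cache *c)
{
	if (model_in_nosleep_section())
		c->sleep_violations++; /* a real kernel would warn or worse here */
	c->avail += 4;
}

/* Fast path hands out an object; an empty cache falls into the refill path. */
static void *model_cache_alloc(struct model_cache *c)
{
	if (c->avail == 0)
		model_cache_refill(c);
	c->avail--;
	return malloc(c->obj_size);
}
```

With one object in stock, the first allocation never touches the refill path, so the problem only shows up on whichever later allocation happens to find the cache empty while inside the section. That is why it can look fine for a while.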
Describe how to reproduce the problem
On a sufficiently new kernel (5.17 is new enough, 5.10 is not), or on a non-x86 architecture:
Include any warning/errors/backtraces from the system logs
~Pay no attention to the functions that aren't in the codebase.~
(Sorry if any of these aren't the best of examples, a lot of them ended up looking like PPPAANANNIICICC)
I added a variant of `zio_data_buf_alloc` with `KM_NOSLEEP` for this, but here we still are. (`zfs_dbgmsg` also uses `KM_SLEEP` allocations, so that was another fun discovery:)
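For reference, the contract one would hope a no-sleep variant gives you looks roughly like the sketch below. This is a userspace illustration with hypothetical `sketch_*` names, not the variant mentioned above and not how the SPL behaves; as noted, the real path still ended up in code that wants to sleep.

```c
/* Userspace sketch only: all names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* 'ready' counts buffers that can be handed out without waiting. */
struct sketch_pool {
	size_t buf_size;
	int ready;
};

/* No-sleep allocation: fail fast instead of refilling and possibly waiting. */
static void *sketch_buf_alloc_nosleep(struct sketch_pool *p)
{
	if (p->ready == 0)
		return NULL;
	p->ready--;
	return malloc(p->buf_size);
}

/* The caller has to cope with NULL instead of assuming success. */
static bool sketch_do_work(struct sketch_pool *p)
{
	void *buf = sketch_buf_alloc_nosleep(p);

	if (buf == NULL)
		return false; /* back off; retry from a context that may sleep */
	/* ... use buf ... */
	free(buf);
	return true;
}
```

The catch is that every helper called on that path has to honor the same contract, which is exactly where things like a logging call that allocates with `KM_SLEEP` bite.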