openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.55k stars 1.74k forks source link

`KM_NOSLEEP` is viewed as a gentle suggestion sometimes by `spl_kmem_cache_alloc` #13317

Open rincebrain opened 2 years ago

rincebrain commented 2 years ago
Type Version/Name
Distribution Name Debian
Distribution Version 11.3
Kernel Version 5.10.0-13-amd64
Architecture x86_64
OpenZFS Version ed715283

Describe the problem you're observing

spl_kmem_cache_alloc normally won't sleep...unless, on Linux, it decides it needs to refill:

https://github.com/openzfs/zfs/blob/ed715283de8e65a30d777e9576399ab75014b6fe/module/os/linux/spl/spl-kmem-cache.c#L1274

Then it will happily go call spl_cache_grow: https://github.com/openzfs/zfs/blob/ed715283de8e65a30d777e9576399ab75014b6fe/module/os/linux/spl/spl-kmem-cache.c#L1122

And then we hit several things that might crash and burn if we, say, are allocating inside a kfpu_begin block:

https://github.com/openzfs/zfs/blob/ed715283de8e65a30d777e9576399ab75014b6fe/module/os/linux/spl/spl-kmem-cache.c#L1019

Amusingly, test_bit, since it uses atomics, will scream murder too (as will the waiting on something with preemption off, for that matter):

https://github.com/openzfs/zfs/blob/ed715283de8e65a30d777e9576399ab75014b6fe/module/os/linux/spl/spl-kmem-cache.c#L1026-L1030

And then of course this fun one:

https://github.com/openzfs/zfs/blob/ed715283de8e65a30d777e9576399ab75014b6fe/module/os/linux/spl/spl-kmem-cache.c#L1078-L1080

Describe how to reproduce the problem

On a new kernel (5.17 is new enough, 5.10 is not), or not x86:

kfpu_begin();
while (1)
  kmem_cache_alloc(some_cache, somesize);
kfpu_end();

Include any warning/errors/backtraces from the system logs

~Pay no attention to the functions that aren't in the codebase.~

(Sorry if any of these aren't the best of examples, a lot of them ended up looking like PPPAANANNIICICC)

I added a variant of zio_data_buf_alloc with KM_NOSLEEP for this, but here we still are.

Apr 10 07:01:09 wrex kernel: [ 3904.758546] ------------[ cut here ]------------
Apr 10 07:01:09 wrex kernel: [ 3904.758583] WARNING: CPU: 0 PID: 171579 at arch/arm64/kernel/fpsimd.c:169 __get_cpu_fpsimd_context+0x34/0x40
Apr 10 07:01:09 wrex kernel: [ 3904.758586] Modules linked in: zfs(POE) icp(POE) zzstd(OE) zlua(OE) zcommon(POE) zunicode(POE) znvpair(POE) zavl(POE) spl(OE) cpufreq_conservative cpufreq_ondemand cpufreq_powersave cpufreq_userspace hci_u
art nls_ascii btqca nls_cp437 btrtl btbcm vfat btintel btsdio fat bluetooth jitterentropy_rng aes_neon_bs bcm2835_v4l2(C) bcm2835_mmal_vchiq(C) aes_neon_blk crypto_simd videobuf2_vmalloc brcmfmac videobuf2_memops videobuf2_v4l2 cryptd vi
deobuf2_common brcmutil drbg vc4 videodev joydev snd_soc_core evdev cfg80211 mc sg snd_bcm2835(C) aes_arm64 cpufreq_dt aes_generic snd_pcm_dmaengine snd_pcm ansi_cprng raspberrypi_cpufreq snd_timer ecdh_generic rfkill snd ecc libaes ipro
c_rng200 soundcore pwm_bcm2835 cec rng_core bcm2835_wdt vchiq(C) bcm2711_thermal drm_kms_helper leds_gpio nf_tables libcrc32c nfnetlink fuse drm configfs ip_tables x_tables autofs4 ext4 hid_generic usbhid hid crc16 mbcache jbd2 sd_mod t1
0_pi crc_t10dif crct10dif_generic crc32c_generic uas usb_storage scsi_mod broadcom
Apr 10 07:01:09 wrex kernel: [ 3904.758834]  bcm_phy_lib xhci_pci dwc2 xhci_hcd genet udc_core roles mdio_bcm_unimac crct10dif_ce reset_raspberrypi crct10dif_common usbcore of_mdio sdhci_iproc sdhci_pltfm fixed_phy i2c_bcm2835 libphy usb
_common sdhci phy_generic fixed gpio_regulator [last unloaded: spl]
Apr 10 07:01:09 wrex kernel: [ 3904.758899] CPU: 0 PID: 171579 Comm: z_wr_iss Tainted: P        WC OE     5.10.0-13-arm64 #1 Debian 5.10.106-1
Apr 10 07:01:09 wrex kernel: [ 3904.758903] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
Apr 10 07:01:09 wrex kernel: [ 3904.758911] pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
Apr 10 07:01:09 wrex kernel: [ 3904.758918] pc : __get_cpu_fpsimd_context+0x34/0x40
Apr 10 07:01:09 wrex kernel: [ 3904.758924] lr : fpsimd_thread_switch+0x2c/0xdc
Apr 10 07:01:09 wrex kernel: [ 3904.758927] sp : ffff8000165eb670
Apr 10 07:01:09 wrex kernel: [ 3904.758931] x29: ffff8000165eb670 x28: ffff1f55e7594e00
Apr 10 07:01:09 wrex kernel: [ 3904.758939] x27: 0000000000000019 x26: ffff1f5581fa1e80
Apr 10 07:01:09 wrex kernel: [ 3904.758946] x25: ffff1f5581fa24b0 x24: 0000000000000000
Apr 10 07:01:09 wrex kernel: [ 3904.758954] x23: ffff1f563b7a5200 x22: ffff1f5586e8c780
Apr 10 07:01:09 wrex kernel: [ 3904.758961] x21: 0000000000000001 x20: ffff1f5581fa1e80
Apr 10 07:01:09 wrex kernel: [ 3904.758968] x19: ffffd02cbd1f3240 x18: 000000000000ac08
Apr 10 07:01:09 wrex kernel: [ 3904.758975] x17: ffff800016855010 x16: ffffd02cbc587090
Apr 10 07:01:09 wrex kernel: [ 3904.758982] x15: 0000000000000000 x14: 0000000000000000
Apr 10 07:01:09 wrex kernel: [ 3904.758989] x13: 0000000000000003 x12: 0000000000000003
Apr 10 07:01:09 wrex kernel: [ 3904.758995] x11: 0000000000000000 x10: 0000000000000001
Apr 10 07:01:09 wrex kernel: [ 3904.759003] x9 : ffffd02cbba17ef0 x8 : 0000000000000000
Apr 10 07:01:09 wrex kernel: [ 3904.759009] x7 : 000000000001d400 x6 : 0000038d25be8a41
Apr 10 07:01:09 wrex kernel: [ 3904.759017] x5 : 00ffffffffffffff x4 : 0000000000000015
Apr 10 07:01:09 wrex kernel: [ 3904.759024] x3 : 0000000000000001 x2 : 0000000000000001
Apr 10 07:01:09 wrex kernel: [ 3904.759030] x1 : ffff4f297e9ac000 x0 : ffffd02cbcde3088
Apr 10 07:01:09 wrex kernel: [ 3904.759039] Call trace:
Apr 10 07:01:09 wrex kernel: [ 3904.759047]  __get_cpu_fpsimd_context+0x34/0x40
Apr 10 07:01:09 wrex kernel: [ 3904.759053]  __switch_to+0x20/0x11c
Apr 10 07:01:09 wrex kernel: [ 3904.759061]  __schedule+0x2b8/0x7c0
Apr 10 07:01:09 wrex kernel: [ 3904.759066]  schedule+0x54/0xdc
Apr 10 07:01:09 wrex kernel: [ 3904.759072]  schedule_timeout+0xa0/0x1a4
Apr 10 07:01:09 wrex kernel: [ 3904.759090]  spl_kmem_cache_alloc+0x738/0x94c [spl]
Apr 10 07:01:09 wrex kernel: [ 3904.759217]  zio_data_buf_alloc_nosleep+0x44/0x84 [zfs]
Apr 10 07:01:09 wrex kernel: [ 3904.759282]  BrotliDefaultAllocFunc+0x38/0x70 [zfs]
Apr 10 07:01:09 wrex kernel: [ 3904.759349]  BrotliAllocate+0x2c/0x80 [zfs]
Apr 10 07:01:09 wrex kernel: [ 3904.759417]  EncodeData+0x1068/0x2180 [zfs]
Apr 10 07:01:09 wrex kernel: [ 3904.759484]  BrotliEncoderCompressStream.part.0+0x184/0xc50 [zfs]
Apr 10 07:01:09 wrex kernel: [ 3904.759550]  BrotliEncoderCompress+0x110/0x250 [zfs]
Apr 10 07:01:09 wrex kernel: [ 3904.759615]  brotli_compress_zfs+0x100/0x1e0 [zfs]
Apr 10 07:01:09 wrex kernel: [ 3904.759682]  zio_compress_data+0xf0/0x160 [zfs]
Apr 10 07:01:09 wrex kernel: [ 3904.759748]  zio_write_compress+0x318/0x7b0 [zfs]
Apr 10 07:01:09 wrex kernel: [ 3904.759814]  zio_execute+0xb8/0x15c [zfs]
Apr 10 07:01:09 wrex kernel: [ 3904.759824]  taskq_thread+0x2bc/0x4c0 [spl]
Apr 10 07:01:09 wrex kernel: [ 3904.759831]  kthread+0x130/0x13c
Apr 10 07:01:09 wrex kernel: [ 3904.759837]  ret_from_fork+0x10/0x38
Apr 10 07:01:09 wrex kernel: [ 3904.759841] ---[ end trace cf08756ba0e3f165 ]---

(zfs_dbgmsg also uses KM_SLEEP allocations, so that was another fun discovery:

Apr 10 06:52:50 wrex kernel: [ 3405.567457] ------------[ cut here ]------------
Apr 10 06:52:50 wrex kernel: [ 3405.578990] WARNING: CPU: 2 PID: 163649 at arch/arm64/kernel/fpsimd.c:169 __get_cpu_fpsimd_context+0x34/0x40
Apr 10 06:52:50 wrex kernel: [ 3405.593874] Modules linked in: zfs(POE) icp(POE) zzstd(OE) zlua(OE) zcommon(POE) zunicode(POE) znvpair(POE) zavl(POE) spl(OE) cpufreq_conservative cpufreq_ondemand cpufreq_powersave cpufreq_userspace hci_u
art nls_ascii btqca nls_cp437 btrtl btbcm vfat btintel btsdio fat bluetooth jitterentropy_rng aes_neon_bs bcm2835_v4l2(C) bcm2835_mmal_vchiq(C) aes_neon_blk crypto_simd videobuf2_vmalloc brcmfmac videobuf2_memops videobuf2_v4l2 cryptd vi
deobuf2_common brcmutil drbg vc4 videodev joydev snd_soc_core evdev cfg80211 mc sg snd_bcm2835(C) aes_arm64 cpufreq_dt aes_generic snd_pcm_dmaengine snd_pcm ansi_cprng raspberrypi_cpufreq snd_timer ecdh_generic rfkill snd ecc libaes iproc_rng200 soundcore pwm_bcm2835 cec rng_core bcm2835_wdt vchiq(C) bcm2711_thermal drm_kms_helper leds_gpio nf_tables libcrc32c nfnetlink fuse drm configfs ip_tables x_tables autofs4 ext4 hid_generic usbhid hid crc16 mbcache jbd2 sd_mod t10_pi crc_t10dif crct10dif_generic crc32c_generic uas usb_storage scsi_mod broadcom
Apr 10 06:52:50 wrex kernel: [ 3405.594034]  bcm_phy_lib xhci_pci dwc2 xhci_hcd genet udc_core roles mdio_bcm_unimac crct10dif_ce reset_raspberrypi crct10dif_common usbcore of_mdio sdhci_iproc sdhci_pltfm fixed_phy i2c_bcm2835 libphy usb_common sdhci phy_generic fixed gpio_regulator [last unloaded: spl]
Apr 10 06:52:50 wrex kernel: [ 3405.737414] CPU: 2 PID: 163649 Comm: z_wr_iss Tainted: P        WC OE     5.10.0-13-arm64 #1 Debian 5.10.106-1
Apr 10 06:52:50 wrex kernel: [ 3405.755961] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
Apr 10 06:52:50 wrex kernel: [ 3405.771722] pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--)
Apr 10 06:52:50 wrex kernel: [ 3405.787284] pc : __get_cpu_fpsimd_context+0x34/0x40
Apr 10 06:52:50 wrex kernel: [ 3405.802740] lr : fpsimd_thread_switch+0x2c/0xdc
Apr 10 06:52:50 wrex kernel: [ 3405.817824] sp : ffff8000128db820
Apr 10 06:52:50 wrex kernel: [ 3405.831941] x29: ffff8000128db820 x28: 000000000001bff4
Apr 10 06:52:50 wrex kernel: [ 3405.847208] x27: ffffd02cbd441000 x26: ffff1f5581facc40
Apr 10 06:52:50 wrex kernel: [ 3405.862518] x25: ffff1f5581fad278 x24: 0000000000000001
Apr 10 06:52:50 wrex kernel: [ 3405.877832] x23: ffff1f563b7e7200 x22: ffff1f5586eeed00
Apr 10 06:52:50 wrex kernel: [ 3405.893401] x21: 0000000000000001 x20: ffff1f5581facc40
Apr 10 06:52:50 wrex kernel: [ 3405.908752] x19: ffff1f5580ca0f40 x18: 00a8f385f7c82846
Apr 10 06:52:50 wrex kernel: [ 3405.924165] x17: 00000a8ea1083a11 x16: ffffd02cbbd1cbc4
Apr 10 06:52:50 wrex kernel: [ 3405.940051] x15: 00a9a0de2cc0c569 x14: 00000001000bd518
Apr 10 06:52:50 wrex kernel: [ 3405.955222] x13: ffffd02c9c1b5a60 x12: ffff1f560d08b0d8
Apr 10 06:52:50 wrex kernel: [ 3405.970745] x11: ffffd02c9c1b5a60 x10: 0000000000000001
Apr 10 06:52:50 wrex kernel: [ 3405.986112] x9 : ffffd02cbba17ef0 x8 : 000000000249f873
Apr 10 06:52:50 wrex kernel: [ 3406.001425] x7 : 02cf41983a79acb4 x6 : 0000000002cf4198
Apr 10 06:52:50 wrex kernel: [ 3406.016941] x5 : 0000000000000004 x4 : ffff1f5580ca1338
Apr 10 06:52:50 wrex kernel: [ 3406.032031] x3 : 0000000000000001 x2 : 0000000000000001
Apr 10 06:52:50 wrex kernel: [ 3406.046933] x1 : ffff4f297e9ee000 x0 : ffffd02cbcde3088
Apr 10 06:52:50 wrex kernel: [ 3406.061855] Call trace:
Apr 10 06:52:50 wrex kernel: [ 3406.074360]  __get_cpu_fpsimd_context+0x34/0x40
Apr 10 06:52:50 wrex kernel: [ 3406.087564]  __switch_to+0x20/0x11c
Apr 10 06:52:50 wrex kernel: [ 3406.099669]  __schedule+0x2b8/0x7c0
Apr 10 06:52:50 wrex kernel: [ 3406.111309]  _cond_resched+0x48/0x64
Apr 10 06:52:50 wrex kernel: [ 3406.122920]  slab_pre_alloc_hook.constprop.0+0x9c/0x100
Apr 10 06:52:50 wrex kernel: [ 3406.135439]  __kmalloc_node+0xa4/0x350
Apr 10 06:52:50 wrex kernel: [ 3406.146373]  spl_kvmalloc+0x8c/0xd0 [spl]
Apr 10 06:52:50 wrex kernel: [ 3406.157613]  spl_kmem_alloc_impl+0x114/0x140 [spl]
Apr 10 06:52:50 wrex kernel: [ 3406.169414]  spl_kmem_alloc+0x20/0x30 [spl]
Apr 10 06:52:50 wrex kernel: [ 3406.180842]  __dprintf+0x88/0x190 [zfs]
Apr 10 06:52:50 wrex kernel: [ 3406.192014]  BrotliEncoderCompress+0x30c/0x370 [zfs]
Apr 10 06:52:50 wrex kernel: [ 3406.204177]  brotli_compress_zfs+0x100/0x1e0 [zfs]
Apr 10 06:52:50 wrex kernel: [ 3406.215750]  zio_compress_data+0xf0/0x160 [zfs]
Apr 10 06:52:50 wrex kernel: [ 3406.227623]  zio_write_compress+0x318/0x7b0 [zfs]
Apr 10 06:52:50 wrex kernel: [ 3406.239363]  zio_execute+0xb8/0x15c [zfs]
Apr 10 06:52:50 wrex kernel: [ 3406.250592]  taskq_thread+0x2bc/0x4c0 [spl]
Apr 10 06:52:50 wrex kernel: [ 3406.261833]  kthread+0x130/0x13c
Apr 10 06:52:50 wrex kernel: [ 3406.272228]  ret_from_fork+0x10/0x38
Apr 10 06:52:50 wrex kernel: [ 3406.283087] ---[ end trace cf08756ba0e3f149 ]---

)

adamdmoss commented 2 years ago

That's not great.

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

behlendorf commented 1 year ago

@rincebrain a bit overdue but I've opened PR #14926 to address this issue. Please take a look if you get a chance.