openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.47k stars 1.74k forks source link

Error: discard_granularity is 0 #16068

Closed multi closed 5 months ago

multi commented 5 months ago

System information

Type Version/Name
Distribution Name Arch
Distribution Version -
Kernel Version 6.7.11-hardened1-1-hardened
Architecture x86_64
OpenZFS Version 2.2.99.r398.g39be46f43f + https://github.com/openzfs/zfs/commit/917ff75e9510d19968ef3cc5c80b1cd0ef48f84d

Describe the problem you're observing

Kernel warning from __blkdev_issue_discard

It was triggered while syncing snapshots with syncoid

syncoid --recursive --no-sync-snap --sendoptions="Lcepw" --recvoptions="u" --delete-target-snapshots pool usbnvme/pool

From dmesg

[30860.067492] sdg1: Error: discard_granularity is 0.

sdg is a 2TB nvme (Seagate ZP2000CV30002) plugged into nvme2usb adapter (RTL9210B chipset, flashed with firmware version 1.32.87.082923), used for the usbnvme pool :) With discard support enabled, by using the following udev rule (why udev, see here):

ACTION=="add|change", ATTRS{idVendor}=="0bda", ATTRS{idProduct}=="9210", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap", ATTR{manage_start_stop}="1"

Attaching more details below.

Not sure, about the root cause of all - is it a buggy RTL9210B firmware or a kernel/zfs issue?

Describe how to reproduce the problem

Not sure :/ I rebooted after sawing the load of the machine and the error in dmesg. Pulled zfs-2.2.99-412_g99741bde59 and rebooted again. Issued a scrub on usbnvme, which finished without errors. After that a trim, which also showed no sign of errors 100% trimmed, nor kernel errors. Then, I've decided to try running again syncoid which started with the following:

Resuming interrupted zfs send/receive from pool to usbnvme/pool (~ 4 KB remaining):

and finished without errors...

Include any warning/errors/backtraces from the system logs

stack trace dmesg ``` [30860.066588] ------------[ cut here ]------------ [30860.066600] WARNING: CPU: 2 PID: 558845 at block/blk-lib.c:50 __blkdev_issue_discard+0x14a/0x190 [30860.066608] Modules linked in: uas ts_bm xt_LOG nf_log_syslog xt_string xt_tcpudp nft_chain_nat xt_MASQUERADE nf_nat nct6775 nft_compat nct6775_core hwmon_vid nf_tables drivetemp intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul i915 polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg vfat snd_intel_sdw_acpi sha512_ssse3 fat sha256_ssse3 sha1_ssse3 at24 iTCO_wdt drm_buddy i2c_algo_bit intel_pmc_bxt ttm iTCO_vendor_support mei_hdcp snd_hda_codec mei_pxp dm_crypt rapl intel_cstate snd_hda_core drm_display_helper snd_hwdep snd_pcm option intel_uncore cec i2c_i801 snd_timer mxm_wmi pcspkr i2c_smbus rc_core mei_me lpc_ich usb_wwan aesni_intel snd intel_gtt soundcore crypto_simd mei cryptd cbc joydev encrypted_keys trusted mousedev asn1_encoder input_leds tee evdev mac_hid nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crc32c_generic crc32c_intel fuse loop dm_mod nfnetlink [30860.066666] dmi_sysfs ip_tables x_tables wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 ip6_udp_tunnel udp_tunnel usb_storage hid_logitech_hidpp raid1 hid_logitech_dj md_mod usbhid nvme nvme_core xhci_pci nvme_auth xhci_pci_renesas video wmi zfs(POE) spl(OE) r8168(OE) tg3 libphy [30860.066686] CPU: 2 PID: 558845 Comm: z_trim_iss Tainted: P OE T 6.7.11-hardened1-1-hardened #1 [30860.066688] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.90P 12/17/2015 [30860.066689] RIP: 0010:__blkdev_issue_discard+0x14a/0x190 [30860.066692] Code: 48 39 d8 75 90 f7 da 89 d3 c1 eb 09 eb 8a 48 8b 04 24 4c 89 20 48 83 c4 08 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b 48 c7 c6 90 90 0c 94 48 c7 c7 e0 77 93 94 e8 21 91 7b 00 85 [30860.066694] RSP: 0018:ffffa50fa5b53cd0 EFLAGS: 00010246 [30860.066696] RAX: ffff99d5dddb9e40 RBX: ffff99d681e264a0 RCX: ffff99d400485100 [30860.066697] RDX: 0000000000000000 RSI: 000000004c042110 RDI: ffff99d70b84e540 [30860.066698] RBP: 0000000000000c40 R08: ffffa50fa5b53d10 R09: 0000000000000000 [30860.066699] R10: ffff99d4bafbc028 R11: ffff99d681e264a0 R12: 0000000000000000 [30860.066700] R13: 0000000000000040 R14: 0000000000000000 R15: ffff99d70b84e540 [30860.066702] FS: 0000000000000000(0000) GS:ffff99d70f300000(0000) knlGS:0000000000000000 [30860.066703] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [30860.066704] CR2: 000000f19a9c4018 CR3: 00000001b0108004 CR4: 00000000000606f0 [30860.066706] Call Trace: [30860.066708] [30860.066708] ? __blkdev_issue_discard+0x14a/0x190 [30860.066711] ? __warn+0x81/0x130 [30860.066715] ? __blkdev_issue_discard+0x14a/0x190 [30860.066717] ? report_bug+0x191/0x1c0 [30860.066721] ? handle_bug+0x3c/0x80 [30860.066723] ? exc_invalid_op+0x17/0x70 [30860.066725] ? asm_exc_invalid_op+0x1a/0x20 [30860.066731] ? __blkdev_issue_discard+0x14a/0x190 [30860.066735] vdev_disk_io_start+0x128/0x530 [zfs] [30860.066927] zio_vdev_io_start+0x12b/0x320 [zfs] [30860.067104] zio_execute+0x8d/0x130 [zfs] [30860.067278] taskq_thread+0x2d1/0x520 [spl] [30860.067294] ? __pfx_default_wake_function+0x10/0x10 [30860.067299] ? __pfx_zio_execute+0x10/0x10 [zfs] [30860.067467] ? __pfx_taskq_thread+0x10/0x10 [spl] [30860.067479] kthread+0xfa/0x130 [30860.067481] ? __pfx_kthread+0x10/0x10 [30860.067483] ret_from_fork+0x34/0x50 [30860.067485] ? __pfx_kthread+0x10/0x10 [30860.067486] ret_from_fork_asm+0x1b/0x30 [30860.067490] [30860.067491] ---[ end trace 0000000000000000 ]--- [30860.067492] sdg1: Error: discard_granularity is 0. [31090.318756] INFO: task txg_sync:558881 blocked for more than 122 seconds. [31090.318766] Tainted: P W OE T 6.7.11-hardened1-1-hardened #1 [31090.318769] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [31090.318770] task:txg_sync state:D stack:0 pid:558881 tgid:558881 ppid:2 flags:0x00004000 [31090.318778] Call Trace: [31090.318781] [31090.318784] __schedule+0x3f0/0x1470 [31090.318795] ? preempt_count_add+0x6e/0xa0 [31090.318801] ? get_nohz_timer_target+0x1c/0x1a0 [31090.318809] schedule+0x35/0xe0 [31090.318815] schedule_timeout+0x98/0x160 [31090.318821] ? __pfx_process_timeout+0x10/0x10 [31090.318828] io_schedule_timeout+0x4a/0x70 [31090.318835] __cv_timedwait_common+0x12f/0x170 [spl] [31090.318861] ? __pfx_autoremove_wake_function+0x10/0x10 [31090.318867] __cv_timedwait_io+0x19/0x20 [spl] [31090.318889] zio_wait+0x14d/0x2d0 [zfs] [31090.319264] spa_sync+0x68/0x1070 [zfs] [31090.319629] ? _raw_spin_lock+0x17/0x40 [31090.319632] ? preempt_count_add+0x6e/0xa0 [31090.319636] ? _raw_spin_lock+0x17/0x40 [31090.319639] ? _raw_spin_unlock+0x19/0x40 [31090.319643] ? spa_txg_history_init_io+0x117/0x120 [zfs] [31090.319997] txg_sync_thread+0x203/0x3a0 [zfs] [31090.320350] ? __pfx_txg_sync_thread+0x10/0x10 [zfs] [31090.320701] ? __pfx_thread_generic_wrapper+0x10/0x10 [spl] [31090.320726] thread_generic_wrapper+0x5e/0x70 [spl] [31090.320751] kthread+0xfa/0x130 [31090.320756] ? __pfx_kthread+0x10/0x10 [31090.320759] ret_from_fork+0x34/0x50 [31090.320764] ? __pfx_kthread+0x10/0x10 [31090.320767] ret_from_fork_asm+0x1b/0x30 [31090.320774] [31090.320778] INFO: task zfs:558972 blocked for more than 122 seconds. [31090.320781] Tainted: P W OE T 6.7.11-hardened1-1-hardened #1 [31090.320783] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [31090.320785] task:zfs state:D stack:0 pid:558972 tgid:558972 ppid:558968 flags:0x00004002 [31090.320791] Call Trace: [31090.320792] [31090.320795] __schedule+0x3f0/0x1470 [31090.320801] ? ttwu_queue_wakelist+0xc6/0x110 [31090.320806] ? _raw_spin_unlock_irqrestore+0x27/0x50 [31090.320812] ? try_to_wake_up+0x256/0x6c0 [31090.320819] schedule+0x35/0xe0 [31090.320825] io_schedule+0x43/0x70 [31090.320831] cv_wait_common+0xaa/0x130 [spl] [31090.320852] ? __pfx_autoremove_wake_function+0x10/0x10 [31090.320857] txg_wait_synced_impl+0xfd/0x150 [zfs] [31090.321210] txg_wait_synced+0x10/0x40 [zfs] [31090.321565] dsl_sync_task_common+0x1cd/0x2b0 [zfs] [31090.321932] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31090.322301] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31090.322654] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31090.323004] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31090.323354] dsl_sync_task+0x1a/0x30 [zfs] [31090.323719] dmu_recv_existing_end+0x6d/0xa0 [zfs] [31090.324074] dmu_recv_end+0x85/0xb0 [zfs] [31090.324423] zfs_ioc_recv_impl.constprop.0+0xe55/0x1220 [zfs] [31090.324774] zfs_ioc_recv_new+0x315/0x3a0 [zfs] [31090.325123] zfsdev_ioctl_common+0x483/0x9d0 [zfs] [31090.325479] zfsdev_ioctl+0x53/0xe0 [zfs] [31090.325643] __x64_sys_ioctl+0x94/0xd0 [31090.325645] do_syscall_64+0x64/0xe0 [31090.325649] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [31090.325653] RIP: 0033:0x6a6501cbf4ff [31090.325655] RSP: 002b:000074d9f06c7200 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [31090.325657] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00006a6501cbf4ff [31090.325659] RDX: 000074d9f06c7280 RSI: 0000000000005a46 RDI: 0000000000000004 [31090.325660] RBP: 000074d9f06ca870 R08: 0000000000000000 R09: 0000000000000001 [31090.325661] R10: 0000000000000004 R11: 0000000000000246 R12: 000074d9f06c7280 [31090.325662] R13: 0000000000005a46 R14: 000074d9f06ce001 R15: 000074d9f06ca8d0 [31090.325664] [31213.196644] INFO: task zfs:558972 blocked for more than 245 seconds. [31213.196652] Tainted: P W OE T 6.7.11-hardened1-1-hardened #1 [31213.196655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [31213.196657] task:zfs state:D stack:0 pid:558972 tgid:558972 ppid:558968 flags:0x00004002 [31213.196665] Call Trace: [31213.196668] [31213.196671] __schedule+0x3f0/0x1470 [31213.196683] ? ttwu_queue_wakelist+0xc6/0x110 [31213.196690] ? _raw_spin_unlock_irqrestore+0x27/0x50 [31213.196696] ? try_to_wake_up+0x256/0x6c0 [31213.196704] schedule+0x35/0xe0 [31213.196710] io_schedule+0x43/0x70 [31213.196716] cv_wait_common+0xaa/0x130 [spl] [31213.196742] ? __pfx_autoremove_wake_function+0x10/0x10 [31213.196749] txg_wait_synced_impl+0xfd/0x150 [zfs] [31213.197131] txg_wait_synced+0x10/0x40 [zfs] [31213.197482] dsl_sync_task_common+0x1cd/0x2b0 [zfs] [31213.197850] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31213.198200] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31213.198551] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31213.198901] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31213.199251] dsl_sync_task+0x1a/0x30 [zfs] [31213.199617] dmu_recv_existing_end+0x6d/0xa0 [zfs] [31213.199984] dmu_recv_end+0x85/0xb0 [zfs] [31213.200335] zfs_ioc_recv_impl.constprop.0+0xe55/0x1220 [zfs] [31213.200686] zfs_ioc_recv_new+0x315/0x3a0 [zfs] [31213.201036] zfsdev_ioctl_common+0x483/0x9d0 [zfs] [31213.201383] zfsdev_ioctl+0x53/0xe0 [zfs] [31213.201710] __x64_sys_ioctl+0x94/0xd0 [31213.201715] do_syscall_64+0x64/0xe0 [31213.201722] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [31213.201730] RIP: 0033:0x6a6501cbf4ff [31213.201735] RSP: 002b:000074d9f06c7200 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [31213.201739] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00006a6501cbf4ff [31213.201742] RDX: 000074d9f06c7280 RSI: 0000000000005a46 RDI: 0000000000000004 [31213.201745] RBP: 000074d9f06ca870 R08: 0000000000000000 R09: 0000000000000001 [31213.201747] R10: 0000000000000004 R11: 0000000000000246 R12: 000074d9f06c7280 [31213.201750] R13: 0000000000005a46 R14: 000074d9f06ce001 R15: 000074d9f06ca8d0 [31213.201755] [31336.077881] INFO: task zfs:558972 blocked for more than 368 seconds. [31336.077890] Tainted: P W OE T 6.7.11-hardened1-1-hardened #1 [31336.077893] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [31336.077895] task:zfs state:D stack:0 pid:558972 tgid:558972 ppid:558968 flags:0x00004002 [31336.077903] Call Trace: [31336.077906] [31336.077910] __schedule+0x3f0/0x1470 [31336.077922] ? ttwu_queue_wakelist+0xc6/0x110 [31336.077929] ? _raw_spin_unlock_irqrestore+0x27/0x50 [31336.077935] ? try_to_wake_up+0x256/0x6c0 [31336.077942] schedule+0x35/0xe0 [31336.077948] io_schedule+0x43/0x70 [31336.077955] cv_wait_common+0xaa/0x130 [spl] [31336.077981] ? __pfx_autoremove_wake_function+0x10/0x10 [31336.077988] txg_wait_synced_impl+0xfd/0x150 [zfs] [31336.078372] txg_wait_synced+0x10/0x40 [zfs] [31336.078724] dsl_sync_task_common+0x1cd/0x2b0 [zfs] [31336.079092] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31336.079444] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31336.079796] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31336.080191] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31336.080546] dsl_sync_task+0x1a/0x30 [zfs] [31336.080917] dmu_recv_existing_end+0x6d/0xa0 [zfs] [31336.081284] dmu_recv_end+0x85/0xb0 [zfs] [31336.081636] zfs_ioc_recv_impl.constprop.0+0xe55/0x1220 [zfs] [31336.081989] zfs_ioc_recv_new+0x315/0x3a0 [zfs] [31336.082340] zfsdev_ioctl_common+0x483/0x9d0 [zfs] [31336.082686] zfsdev_ioctl+0x53/0xe0 [zfs] [31336.083011] __x64_sys_ioctl+0x94/0xd0 [31336.083017] do_syscall_64+0x64/0xe0 [31336.083024] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [31336.083033] RIP: 0033:0x6a6501cbf4ff [31336.083037] RSP: 002b:000074d9f06c7200 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [31336.083042] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00006a6501cbf4ff [31336.083045] RDX: 000074d9f06c7280 RSI: 0000000000005a46 RDI: 0000000000000004 [31336.083048] RBP: 000074d9f06ca870 R08: 0000000000000000 R09: 0000000000000001 [31336.083050] R10: 0000000000000004 R11: 0000000000000246 R12: 000074d9f06c7280 [31336.083052] R13: 0000000000005a46 R14: 000074d9f06ce001 R15: 000074d9f06ca8d0 [31336.083058] [31458.959111] INFO: task zfs:558972 blocked for more than 491 seconds. [31458.959120] Tainted: P W OE T 6.7.11-hardened1-1-hardened #1 [31458.959123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [31458.959125] task:zfs state:D stack:0 pid:558972 tgid:558972 ppid:558968 flags:0x00004002 [31458.959133] Call Trace: [31458.959135] [31458.959140] __schedule+0x3f0/0x1470 [31458.959152] ? ttwu_queue_wakelist+0xc6/0x110 [31458.959159] ? _raw_spin_unlock_irqrestore+0x27/0x50 [31458.959165] ? try_to_wake_up+0x256/0x6c0 [31458.959172] schedule+0x35/0xe0 [31458.959179] io_schedule+0x43/0x70 [31458.959186] cv_wait_common+0xaa/0x130 [spl] [31458.959211] ? __pfx_autoremove_wake_function+0x10/0x10 [31458.959218] txg_wait_synced_impl+0xfd/0x150 [zfs] [31458.959611] txg_wait_synced+0x10/0x40 [zfs] [31458.959964] dsl_sync_task_common+0x1cd/0x2b0 [zfs] [31458.960332] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31458.960683] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31458.961035] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31458.961384] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31458.961734] dsl_sync_task+0x1a/0x30 [zfs] [31458.962098] dmu_recv_existing_end+0x6d/0xa0 [zfs] [31458.962466] dmu_recv_end+0x85/0xb0 [zfs] [31458.962817] zfs_ioc_recv_impl.constprop.0+0xe55/0x1220 [zfs] [31458.963168] zfs_ioc_recv_new+0x315/0x3a0 [zfs] [31458.963519] zfsdev_ioctl_common+0x483/0x9d0 [zfs] [31458.963865] zfsdev_ioctl+0x53/0xe0 [zfs] [31458.964191] __x64_sys_ioctl+0x94/0xd0 [31458.964196] do_syscall_64+0x64/0xe0 [31458.964203] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [31458.964211] RIP: 0033:0x6a6501cbf4ff [31458.964216] RSP: 002b:000074d9f06c7200 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [31458.964221] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00006a6501cbf4ff [31458.964224] RDX: 000074d9f06c7280 RSI: 0000000000005a46 RDI: 0000000000000004 [31458.964226] RBP: 000074d9f06ca870 R08: 0000000000000000 R09: 0000000000000001 [31458.964229] R10: 0000000000000004 R11: 0000000000000246 R12: 000074d9f06c7280 [31458.964231] R13: 0000000000005a46 R14: 000074d9f06ce001 R15: 000074d9f06ca8d0 [31458.964236] [31581.840355] INFO: task zfs:558972 blocked for more than 614 seconds. [31581.840364] Tainted: P W OE T 6.7.11-hardened1-1-hardened #1 [31581.840367] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [31581.840369] task:zfs state:D stack:0 pid:558972 tgid:558972 ppid:558968 flags:0x00004002 [31581.840377] Call Trace: [31581.840379] [31581.840383] __schedule+0x3f0/0x1470 [31581.840394] ? ttwu_queue_wakelist+0xc6/0x110 [31581.840401] ? _raw_spin_unlock_irqrestore+0x27/0x50 [31581.840407] ? try_to_wake_up+0x256/0x6c0 [31581.840415] schedule+0x35/0xe0 [31581.840421] io_schedule+0x43/0x70 [31581.840427] cv_wait_common+0xaa/0x130 [spl] [31581.840453] ? __pfx_autoremove_wake_function+0x10/0x10 [31581.840459] txg_wait_synced_impl+0xfd/0x150 [zfs] [31581.840841] txg_wait_synced+0x10/0x40 [zfs] [31581.841192] dsl_sync_task_common+0x1cd/0x2b0 [zfs] [31581.841560] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31581.841958] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31581.842311] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31581.842660] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31581.843010] dsl_sync_task+0x1a/0x30 [zfs] [31581.843375] dmu_recv_existing_end+0x6d/0xa0 [zfs] [31581.843751] dmu_recv_end+0x85/0xb0 [zfs] [31581.844103] zfs_ioc_recv_impl.constprop.0+0xe55/0x1220 [zfs] [31581.844456] zfs_ioc_recv_new+0x315/0x3a0 [zfs] [31581.844806] zfsdev_ioctl_common+0x483/0x9d0 [zfs] [31581.845152] zfsdev_ioctl+0x53/0xe0 [zfs] [31581.845477] __x64_sys_ioctl+0x94/0xd0 [31581.845483] do_syscall_64+0x64/0xe0 [31581.845490] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [31581.845498] RIP: 0033:0x6a6501cbf4ff [31581.845502] RSP: 002b:000074d9f06c7200 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [31581.845507] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00006a6501cbf4ff [31581.845510] RDX: 000074d9f06c7280 RSI: 0000000000005a46 RDI: 0000000000000004 [31581.845513] RBP: 000074d9f06ca870 R08: 0000000000000000 R09: 0000000000000001 [31581.845515] R10: 0000000000000004 R11: 0000000000000246 R12: 000074d9f06c7280 [31581.845517] R13: 0000000000005a46 R14: 000074d9f06ce001 R15: 000074d9f06ca8d0 [31581.845522] [31704.721535] INFO: task zfs:558972 blocked for more than 737 seconds. [31704.721545] Tainted: P W OE T 6.7.11-hardened1-1-hardened #1 [31704.721548] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [31704.721550] task:zfs state:D stack:0 pid:558972 tgid:558972 ppid:558968 flags:0x00004002 [31704.721557] Call Trace: [31704.721560] [31704.721564] __schedule+0x3f0/0x1470 [31704.721576] ? ttwu_queue_wakelist+0xc6/0x110 [31704.721583] ? _raw_spin_unlock_irqrestore+0x27/0x50 [31704.721589] ? try_to_wake_up+0x256/0x6c0 [31704.721597] schedule+0x35/0xe0 [31704.721603] io_schedule+0x43/0x70 [31704.721610] cv_wait_common+0xaa/0x130 [spl] [31704.721635] ? __pfx_autoremove_wake_function+0x10/0x10 [31704.721643] txg_wait_synced_impl+0xfd/0x150 [zfs] [31704.722034] txg_wait_synced+0x10/0x40 [zfs] [31704.722386] dsl_sync_task_common+0x1cd/0x2b0 [zfs] [31704.722754] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31704.723106] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31704.723458] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31704.723807] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31704.724158] dsl_sync_task+0x1a/0x30 [zfs] [31704.724522] dmu_recv_existing_end+0x6d/0xa0 [zfs] [31704.724909] dmu_recv_end+0x85/0xb0 [zfs] [31704.725261] zfs_ioc_recv_impl.constprop.0+0xe55/0x1220 [zfs] [31704.725613] zfs_ioc_recv_new+0x315/0x3a0 [zfs] [31704.725963] zfsdev_ioctl_common+0x483/0x9d0 [zfs] [31704.726309] zfsdev_ioctl+0x53/0xe0 [zfs] [31704.726635] __x64_sys_ioctl+0x94/0xd0 [31704.726641] do_syscall_64+0x64/0xe0 [31704.726648] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [31704.726656] RIP: 0033:0x6a6501cbf4ff [31704.726660] RSP: 002b:000074d9f06c7200 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [31704.726665] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00006a6501cbf4ff [31704.726668] RDX: 000074d9f06c7280 RSI: 0000000000005a46 RDI: 0000000000000004 [31704.726671] RBP: 000074d9f06ca870 R08: 0000000000000000 R09: 0000000000000001 [31704.726673] R10: 0000000000000004 R11: 0000000000000246 R12: 000074d9f06c7280 [31704.726675] R13: 0000000000005a46 R14: 000074d9f06ce001 R15: 000074d9f06ca8d0 [31704.726681] [31827.602755] INFO: task zfs:558972 blocked for more than 860 seconds. [31827.602765] Tainted: P W OE T 6.7.11-hardened1-1-hardened #1 [31827.602768] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [31827.602769] task:zfs state:D stack:0 pid:558972 tgid:558972 ppid:558968 flags:0x00004002 [31827.602777] Call Trace: [31827.602780] [31827.602784] __schedule+0x3f0/0x1470 [31827.602795] ? ttwu_queue_wakelist+0xc6/0x110 [31827.602802] ? _raw_spin_unlock_irqrestore+0x27/0x50 [31827.602808] ? try_to_wake_up+0x256/0x6c0 [31827.602816] schedule+0x35/0xe0 [31827.602822] io_schedule+0x43/0x70 [31827.602829] cv_wait_common+0xaa/0x130 [spl] [31827.602854] ? __pfx_autoremove_wake_function+0x10/0x10 [31827.602861] txg_wait_synced_impl+0xfd/0x150 [zfs] [31827.603251] txg_wait_synced+0x10/0x40 [zfs] [31827.603604] dsl_sync_task_common+0x1cd/0x2b0 [zfs] [31827.603972] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31827.604324] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31827.604675] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31827.605024] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31827.605374] dsl_sync_task+0x1a/0x30 [zfs] [31827.605738] dmu_recv_existing_end+0x6d/0xa0 [zfs] [31827.606112] dmu_recv_end+0x85/0xb0 [zfs] [31827.606463] zfs_ioc_recv_impl.constprop.0+0xe55/0x1220 [zfs] [31827.606815] zfs_ioc_recv_new+0x315/0x3a0 [zfs] [31827.607166] zfsdev_ioctl_common+0x483/0x9d0 [zfs] [31827.607511] zfsdev_ioctl+0x53/0xe0 [zfs] [31827.607836] __x64_sys_ioctl+0x94/0xd0 [31827.607842] do_syscall_64+0x64/0xe0 [31827.607849] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [31827.607857] RIP: 0033:0x6a6501cbf4ff [31827.607861] RSP: 002b:000074d9f06c7200 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [31827.607866] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00006a6501cbf4ff [31827.607869] RDX: 000074d9f06c7280 RSI: 0000000000005a46 RDI: 0000000000000004 [31827.607871] RBP: 000074d9f06ca870 R08: 0000000000000000 R09: 0000000000000001 [31827.607873] R10: 0000000000000004 R11: 0000000000000246 R12: 000074d9f06c7280 [31827.607875] R13: 0000000000005a46 R14: 000074d9f06ce001 R15: 000074d9f06ca8d0 [31827.607881] [31950.483978] INFO: task zfs:558972 blocked for more than 983 seconds. [31950.483987] Tainted: P W OE T 6.7.11-hardened1-1-hardened #1 [31950.483990] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [31950.483992] task:zfs state:D stack:0 pid:558972 tgid:558972 ppid:558968 flags:0x00004002 [31950.483999] Call Trace: [31950.484002] [31950.484005] __schedule+0x3f0/0x1470 [31950.484017] ? ttwu_queue_wakelist+0xc6/0x110 [31950.484024] ? _raw_spin_unlock_irqrestore+0x27/0x50 [31950.484030] ? try_to_wake_up+0x256/0x6c0 [31950.484038] schedule+0x35/0xe0 [31950.484044] io_schedule+0x43/0x70 [31950.484050] cv_wait_common+0xaa/0x130 [spl] [31950.484075] ? __pfx_autoremove_wake_function+0x10/0x10 [31950.484082] txg_wait_synced_impl+0xfd/0x150 [zfs] [31950.484465] txg_wait_synced+0x10/0x40 [zfs] [31950.484817] dsl_sync_task_common+0x1cd/0x2b0 [zfs] [31950.485190] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31950.485543] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31950.485895] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [31950.486244] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [31950.486594] dsl_sync_task+0x1a/0x30 [zfs] [31950.486959] dmu_recv_existing_end+0x6d/0xa0 [zfs] [31950.487326] dmu_recv_end+0x85/0xb0 [zfs] [31950.487677] zfs_ioc_recv_impl.constprop.0+0xe55/0x1220 [zfs] [31950.488029] zfs_ioc_recv_new+0x315/0x3a0 [zfs] [31950.488379] zfsdev_ioctl_common+0x483/0x9d0 [zfs] [31950.488725] zfsdev_ioctl+0x53/0xe0 [zfs] [31950.489050] __x64_sys_ioctl+0x94/0xd0 [31950.489056] do_syscall_64+0x64/0xe0 [31950.489062] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [31950.489070] RIP: 0033:0x6a6501cbf4ff [31950.489074] RSP: 002b:000074d9f06c7200 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [31950.489079] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00006a6501cbf4ff [31950.489082] RDX: 000074d9f06c7280 RSI: 0000000000005a46 RDI: 0000000000000004 [31950.489085] RBP: 000074d9f06ca870 R08: 0000000000000000 R09: 0000000000000001 [31950.489087] R10: 0000000000000004 R11: 0000000000000246 R12: 000074d9f06c7280 [31950.489089] R13: 0000000000005a46 R14: 000074d9f06ce001 R15: 000074d9f06ca8d0 [31950.489094] [32073.365090] INFO: task zfs:558972 blocked for more than 1105 seconds. [32073.365099] Tainted: P W OE T 6.7.11-hardened1-1-hardened #1 [32073.365102] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [32073.365103] task:zfs state:D stack:0 pid:558972 tgid:558972 ppid:558968 flags:0x00004002 [32073.365111] Call Trace: [32073.365114] [32073.365118] __schedule+0x3f0/0x1470 [32073.365129] ? ttwu_queue_wakelist+0xc6/0x110 [32073.365136] ? _raw_spin_unlock_irqrestore+0x27/0x50 [32073.365142] ? try_to_wake_up+0x256/0x6c0 [32073.365150] schedule+0x35/0xe0 [32073.365155] io_schedule+0x43/0x70 [32073.365162] cv_wait_common+0xaa/0x130 [spl] [32073.365188] ? __pfx_autoremove_wake_function+0x10/0x10 [32073.365195] txg_wait_synced_impl+0xfd/0x150 [zfs] [32073.365585] txg_wait_synced+0x10/0x40 [zfs] [32073.365939] dsl_sync_task_common+0x1cd/0x2b0 [zfs] [32073.366306] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [32073.366657] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [32073.367009] ? __pfx_dmu_recv_end_check+0x10/0x10 [zfs] [32073.367359] ? __pfx_dmu_recv_end_sync+0x10/0x10 [zfs] [32073.367709] dsl_sync_task+0x1a/0x30 [zfs] [32073.368074] dmu_recv_existing_end+0x6d/0xa0 [zfs] [32073.368449] dmu_recv_end+0x85/0xb0 [zfs] [32073.368805] zfs_ioc_recv_impl.constprop.0+0xe55/0x1220 [zfs] [32073.369157] zfs_ioc_recv_new+0x315/0x3a0 [zfs] [32073.369507] zfsdev_ioctl_common+0x483/0x9d0 [zfs] [32073.369853] zfsdev_ioctl+0x53/0xe0 [zfs] [32073.370178] __x64_sys_ioctl+0x94/0xd0 [32073.370184] do_syscall_64+0x64/0xe0 [32073.370191] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [32073.370199] RIP: 0033:0x6a6501cbf4ff [32073.370204] RSP: 002b:000074d9f06c7200 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [32073.370208] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00006a6501cbf4ff [32073.370211] RDX: 000074d9f06c7280 RSI: 0000000000005a46 RDI: 0000000000000004 [32073.370214] RBP: 000074d9f06ca870 R08: 0000000000000000 R09: 0000000000000001 [32073.370216] R10: 0000000000000004 R11: 0000000000000246 R12: 000074d9f06c7280 [32073.370218] R13: 0000000000005a46 R14: 000074d9f06ce001 R15: 000074d9f06ca8d0 [32073.370224] [32073.370226] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings ```
usb device dmesg ``` [ 795.878231] usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd [ 795.911243] usb 4-1: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=f0.01 [ 795.911253] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 795.911256] usb 4-1: Product: RTL9210B-CG [ 795.911259] usb 4-1: Manufacturer: Realtek [ 795.911261] usb 4-1: SerialNumber: 012345678932 [ 795.939524] usb 4-1: Enable of device-initiated U1 failed. [ 795.940026] usb 4-1: Enable of device-initiated U2 failed. [ 795.970070] usb 4-1: Enable of device-initiated U1 failed. [ 795.970710] usb 4-1: Enable of device-initiated U2 failed. [ 795.972684] scsi host13: uas [ 796.419421] scsi 13:0:0:0: Direct-Access Realtek RTL9210B-CG 1.00 PQ: 0 ANSI: 6 [ 796.459184] sd 13:0:0:0: [sdg] 488378646 4096-byte logical blocks: (2.00 TB/1.82 TiB) [ 796.460603] sd 13:0:0:0: [sdg] Write Protect is off [ 796.460608] sd 13:0:0:0: [sdg] Mode Sense: 37 00 00 08 [ 796.463754] sd 13:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 796.464938] sd 13:0:0:0: [sdg] Preferred minimum I/O size 4096 bytes [ 796.464944] sd 13:0:0:0: [sdg] Optimal transfer size 268431360 bytes [ 796.480698] sdg: sdg1 sdg9 [ 796.481050] sd 13:0:0:0: [sdg] Attached SCSI disk ```
lsusb.py -ciu ``` usb4 1d6b:0003 09 1IF [USB 3.00, 5000 Mbps, 0mA] (xhci-hcd 0000:00:14.0) hub 4-1 0bda:9210 00 1IF [USB 3.20, 5000 Mbps, 896mA] (Realtek RTL9210B-CG 012345678932) 4-1:1.0 (IF) 08:06:62 4EPs (Mass Storage) uas host13 (sdg) ```
lsblk -Df ``` NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO FSTYPE FSVER sdg 0 4K 32G 0 ├─sdg1 0 4K 32G 0 zfs_member 5000 └─sdg9 0 4K 32G 0 ```
zpool get all usbnvme | grep local ``` usbnvme ashift 12 local usbnvme autotrim on local usbnvme feature@async_destroy enabled local usbnvme feature@empty_bpobj active local usbnvme feature@lz4_compress active local usbnvme feature@multi_vdev_crash_dump enabled local usbnvme feature@spacemap_histogram active local usbnvme feature@enabled_txg active local usbnvme feature@hole_birth active local usbnvme feature@extensible_dataset active local usbnvme feature@embedded_data active local usbnvme feature@bookmarks enabled local usbnvme feature@filesystem_limits enabled local usbnvme feature@large_blocks enabled local usbnvme feature@large_dnode active local usbnvme feature@sha512 enabled local usbnvme feature@skein enabled local usbnvme feature@edonr enabled local usbnvme feature@userobj_accounting active local usbnvme feature@encryption active local usbnvme feature@project_quota active local usbnvme feature@device_removal enabled local usbnvme feature@obsolete_counts enabled local usbnvme feature@zpool_checkpoint enabled local usbnvme feature@spacemap_v2 active local usbnvme feature@allocation_classes enabled local usbnvme feature@resilver_defer enabled local usbnvme feature@bookmark_v2 enabled local usbnvme feature@redaction_bookmarks enabled local usbnvme feature@redacted_datasets enabled local usbnvme feature@bookmark_written enabled local usbnvme feature@log_spacemap active local usbnvme feature@livelist enabled local usbnvme feature@device_rebuild enabled local usbnvme feature@zstd_compress enabled local usbnvme feature@draid enabled local usbnvme feature@zilsaxattr enabled local usbnvme feature@head_errlog active local usbnvme feature@blake3 enabled local usbnvme feature@block_cloning enabled local usbnvme feature@vdev_zaps_v2 active local usbnvme feature@redaction_list_spill enabled local usbnvme feature@raidz_expansion enabled local ```
robn commented 5 months ago

The kernel code has a nice comment on it:

    /* In case the discard granularity isn't set by buggy device driver */
    if (WARN_ON_ONCE(!bdev_discard_granularity(bdev))) {
        pr_err_ratelimited("%pg: Error: discard_granularity is 0.\n",
                   bdev);
        return -EOPNOTSUPP;
    }

Given the hit-and-miss nature of USB adapters, my money is on firmware or driver.

That said, it looks like OpenZFS could be handling this better, by not enabling/issuing TRIM if it can detect this case, and/or by handling TRIM error responses better. If I get some spare time this week I'll have a look at it.

multi commented 5 months ago

@robn I see... thanks for the hint!

While searching for __blkdev_issue_discard I've found https://www.spinics.net/lists/linux-block/msg111501.html + https://www.spinics.net/lists/linux-block/msg111500.html <- this one removes this check... with the following comment:

We now set a default granularity in the queue limits API, so don't bother with this extra check.

Can't see these changes in master though https://github.com/torvalds/linux/blob/master/block/blk-lib.c maybe they're not merged yet (or will never be, idk)...

robn commented 5 months ago

Good find. Looks like the "set a default" change is torvalds/linux@458aa1a09939, so 6.8+ should "fix" this in your case.

The check itself has existed since 5.10 (torvalds/linux@b35fd7422c2f), and the error it papers over even longer. I have seen it from time to time in my own test environments, but it was never quite the thing I was interested in at the time, so I never looked deeper. So I still think its worthwhile for OpenZFS to try a bit harder to understand this situation, since we'll be supporting those older kernels for quite a while yet!

robn commented 5 months ago

@multi if I'm right, #16070 will fix the hang, and #16082 will prevent OpenZFS from even trying to issue TRIM to devices that don't support it properly. Thanks again for the report!