Open khain0 opened 2 months ago
The same issue occurred today:
[103992.441067] CPU: 79 PID: 2564 Comm: z_rd_int_1 Kdump: loaded Tainted: P OE X ------- --- 5.14.0-427.22.1.el9_4.x86_64 #1
[103992.443048] Hardware name: Dell Inc. PowerEdge XE8640/0TVHHH, BIOS 2.0.3 05/15/2024
[103992.444053] RIP: 0010:kfpu_end+0x34/0xa0 [zcommon]
[103992.445062] Code: 00 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 65 8b 05 4a c2 74 3f 48 98 48 8b 0c c2 0f 1f 44 00 00 b8 ff ff ff ff 89 c2 <0f> c7 19 fb 65 ff 0d 29 c2 74 3f 75 05 0f 1f 44 00 00 48 8b 44 24
[103992.447128] RSP: 0000:ff4809c2396478a0 EFLAGS: 00010046
[103992.448170] RAX: 00000000ffffffff RBX: ff4809c2396479a0 RCX: ff16a4239c857000
[103992.449228] RDX: 00000000ffffffff RSI: ff16a4799fde0000 RDI: ff4809c2396479c0
[103992.450284] RBP: 0000000000020000 R08: ff4809c2396479a0 R09: 0000000000000000
[103992.451163] R10: 0000000000000000 R11: ff16a465cee3f578 R12: ff16a4799fde0000
[103992.451978] R13: 0000000000020000 R14: 0000000000000000 R15: 0000000000000008
[103992.452796] FS: 0000000000000000(0000) GS:ff16a4a17f9c0000(0000) knlGS:0000000000000000
[103992.453627] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[103992.454454] CR2: 00007fb1fbc6a560 CR3: 000000afba548004 CR4: 0000000000771ee0
[103992.455229] PKRU: 55555554
[103992.456234] Call Trace:
[103992.457231] <TASK>
[103992.457985] ? show_trace_log_lvl+0x1c4/0x2df
[103992.458926] ? show_trace_log_lvl+0x1c4/0x2df
[103992.459914] ? abd_fletcher_4_iter+0x64/0xc0 [zcommon]
[103992.460886] ? __die_body.cold+0x8/0xd
[103992.461829] ? die_addr+0x39/0x60
[103992.462749] ? exc_general_protection+0x1aa/0x400
[103992.463614] ? asm_exc_general_protection+0x22/0x30
[103992.464441] ? kfpu_end+0x34/0xa0 [zcommon]
[103992.465247] abd_fletcher_4_iter+0x64/0xc0 [zcommon]
[103992.466032] abd_iterate_func.part.0+0xbd/0x1c0 [zfs]
[103992.466907] ? __pfx_abd_fletcher_4_iter+0x10/0x10 [zcommon]
[103992.467666] abd_fletcher_4_native+0x7c/0xc0 [zfs]
[103992.468521] ? update_sg_lb_stats+0x7e/0x450
[103992.469119] ? blk_mq_start_request+0x34/0x120
[103992.469713] ? nvme_prep_rq.part.0+0xab/0x110 [nvme]
[103992.470298] ? nvme_queue_rqs+0x1e7/0x290 [nvme]
[103992.470959] zio_checksum_error_impl+0xf9/0x640 [zfs]
[103992.471667] ? __pfx_abd_fletcher_4_native+0x10/0x10 [zfs]
[103992.472362] ? __blk_flush_plug+0xf1/0x150
[103992.473015] ? remove_entity_load_avg+0x2e/0x70
[103992.473617] ? migrate_task_rq_fair+0x14c/0x1d0
[103992.474228] ? sched_clock+0xc/0x30
[103992.474743] ? __smp_call_single_queue+0x93/0x120
[103992.475425] ? ttwu_queue_wakelist+0xf2/0x110
[103992.475978] ? try_to_wake_up+0x3e2/0x5d0
[103992.476622] zio_checksum_error+0x64/0xc0 [zfs]
[103992.477363] vdev_raidz_io_done+0x1b6/0x550 [zfs]
[103992.478090] zio_vdev_io_done+0x7c/0x220 [zfs]
[103992.478811] zio_execute+0x80/0x120 [zfs]
[103992.479534] taskq_thread+0x2cc/0x500 [spl]
[103992.480143] ? __pfx_default_wake_function+0x10/0x10
[103992.480731] ? __pfx_zio_execute+0x10/0x10 [zfs]
[103992.481404] ? __pfx_taskq_thread+0x10/0x10 [spl]
[103992.481976] kthread+0xdd/0x100
[103992.482520] ? __pfx_kthread+0x10/0x10
[103992.483028] ret_from_fork+0x29/0x50
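For anyone triaging a similar trace, here is a minimal Python sketch that pulls the faulting instruction bytes out of the `Code:` line (the kernel wraps the byte at RIP in angle brackets) and splits the following ModRM byte into its fields. The helper names `faulting_bytes` and `split_modrm` are made up for illustration, and the excerpt is only the tail of the `Code:` line above. Reading `0f c7` with reg field 3 as an XRSTORS-style state restore is my understanding of the x86 opcode map, not something the log states, so confirm it with a real disassembler.

```python
# Excerpt of the "Code:" line from the trace; <0f> marks the byte at RIP.
CODE_EXCERPT = "Code: b8 ff ff ff ff 89 c2 <0f> c7 19 fb 65 ff 0d 29 c2 74 3f"


def faulting_bytes(code_line, count=3):
    """Return `count` bytes starting at the <xx> marker (hypothetical helper)."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            first = int(tok[1:-1], 16)
            rest = [int(t, 16) for t in tokens[i + 1:i + count]]
            return [first] + rest
    raise ValueError("no <xx> marker found in Code: line")


def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) bit fields."""
    return (byte >> 6) & 0x3, (byte >> 3) & 0x7, byte & 0x7


insn = faulting_bytes(CODE_EXCERPT)
mod, reg, rm = split_modrm(insn[2])
print("bytes at RIP:", " ".join(f"{b:02x}" for b in insn))
print(f"ModRM fields: mod={mod} reg={reg} rm={rm}")
```

With mod=0 and rm=1 the memory operand would be addressed through RCX, which matches the register dump if the opcode reading above is right.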
I think this would be #14989, whose workaround is in 2.2.x but has not been backported into a 2.1.x release so far (it's in 2.1.16-staging, but I don't know if 2.1.16 will ever be released).
You could try cherry-picking commit f288fdb4bd521f263277bcdc76cdec12a169a1e5 if you can't upgrade to 2.2.x, but upgrading to 2.2.x would probably be the simpler solution.
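To check quickly whether a host is still on a release series older than 2.2.x, a small sketch like the following can parse the version strings that `zfs version` prints. The helper names `parse_zfs_version` and `predates_2_2` are hypothetical, and the 2.2.0 threshold is simply taken from the comment above; it is not an official compatibility check, and a patched 2.1.x build would still be flagged.

```python
import re


def parse_zfs_version(text):
    """Extract (major, minor, patch) from a string like 'zfs-2.1.15-3'."""
    match = re.search(r"zfs-(?:kmod-)?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def predates_2_2(text):
    """True if the version is older than 2.2.0 (assumed threshold)."""
    return parse_zfs_version(text) < (2, 2, 0)


for line in ("zfs-2.1.15-3", "zfs-kmod-2.1.15-3"):
    print(line, "-> older than 2.2.x:", predates_2_2(line))
```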
System information
Type | Version/Name
--- | ---
Distribution Name | Red Hat Enterprise Linux
Distribution Version | 9.4
Kernel Version | 5.14.0-427.22.1.el9_4.x86_64
Architecture | x86_64
OpenZFS Version | zfs-dkms 2.1.15-3
Output of `zfs version`: zfs-2.1.15-3, zfs-kmod-2.1.15-3
Describe the problem you're observing
The zcommon module caused a kernel crash.
Describe how to reproduce the problem
Wait for the kernel to crash.
Include any warning/errors/backtraces from the system logs
Kernel crash log