openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

General protection fault with 0.8.6 on CentOS 7 #11555

Closed dani closed 2 years ago

dani commented 3 years ago

Environment :

The server has crashed three times with the following message:

[25112.822209] general protection fault: 0000 [#1] SMP 
[25112.822372] Modules linked in: 8021q garp mrp stp llc iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack iptable_filter vfat fat zfs(POE) zunicode(POE) zlua(POE) iTCO_wdt iTCO_vendor_support mxm_wmi sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel zcommon(POE) kvm znvpair(POE) zavl(POE) icp(POE) spl(OE) irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel ttm lrw gf128mul glue_helper drm_kms_helper ablk_helper cryptd pcspkr syscopyarea sysfillrect sysimgblt fb_sys_fops drm sg lpc_ich i2c_i801 joydev drm_panel_orientation_quirks mei_me mei ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_pad ip_tables xfs libcrc32c raid1 sd_mod crc_t10dif crct10dif_generic
[25112.824930]  ahci libahci ixgbe igb libata crct10dif_pclmul crct10dif_common crc32c_intel i2c_algo_bit mdio dca ptp pps_core
[25112.825297] CPU: 0 PID: 1074 Comm: z_rd_int Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.11.1.el7.x86_64 #1
[25112.825652] Hardware name: Supermicro Super Server/X10SDV-TLN4F, BIOS 2.1.v1 06/05/2020
[25112.825916] task: ffff94d7703dc200 ti: ffff94d74a3e0000 task.ti: ffff94d74a3e0000
[25112.826147] RIP: 0010:[<ffffffffc0750bf0>]  [<ffffffffc0750bf0>] fletcher_4_avx2_native+0x40/0x90 [zcommon]
[25112.826465] RSP: 0018:ffff94d74a3e3880  EFLAGS: 00010287
[25112.826629] RAX: ffff94d74d3f64c0 RBX: ffe710efb7a22000 RCX: ffff94d7703dc200
[25112.826864] RDX: 00000000ffffffff RSI: ffe710efb7a22000 RDI: ffff94d74d3f64c0
[25112.827085] RBP: ffff94d74a3e3898 R08: ffff94d626de80c0 R09: ffffffffc0ec57c0
[25112.827314] R10: 0000000000000007 R11: ffff94d74b398000 R12: ffe710efb7a26000
[25112.827534] R13: ffff94d74a3e3980 R14: 0000000000004000 R15: 0000000000000001
[25112.827771] FS:  0000000000000000(0000) GS:ffff94de5f200000(0000) knlGS:0000000000000000
[25112.834495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[25112.841364] CR2: 00007fd857590000 CR3: 00000006cfc10000 CR4: 00000000003607f0
[25112.848388] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[25112.855456] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[25112.862307] Call Trace:
[25112.868937]  [<ffffffffc074cf87>] abd_fletcher_4_iter+0x77/0xe0 [zcommon]
[25112.875867]  [<ffffffffc074cf10>] ? fletcher_4_incremental_byteswap+0x140/0x140 [zcommon]
[25112.883015]  [<ffffffffc0db72e7>] abd_iterate_func+0x97/0x120 [zfs]
[25112.890254]  [<ffffffffc0ec5836>] abd_fletcher_4_native+0x76/0xb0 [zfs]
[25112.897509]  [<ffffffffab7536d9>] ? __blk_run_queue+0x39/0x50
[25112.904849]  [<ffffffffab75379a>] ? queue_unplugged+0x2a/0xa0
[25112.912167]  [<ffffffffc0ec6319>] zio_checksum_error_impl+0x519/0x740 [zfs]
[25112.919603]  [<ffffffffabb85022>] ? mutex_lock+0x12/0x2f
[25112.927018]  [<ffffffffc0e50214>] ? txg_all_lists_empty+0x84/0xa0 [zfs]
[25112.934318]  [<ffffffffc0e4aa30>] ? spa_has_pending_synctask+0x20/0x50 [zfs]
[25112.941485]  [<ffffffffc0e64c3d>] ? vdev_queue_io_to_issue+0x10d/0xa10 [zfs]
[25112.948515]  [<ffffffffc0ebf315>] ? zio_vdev_io_start+0xf5/0x320 [zfs]
[25112.955387]  [<ffffffffc0ec2262>] ? zio_nowait+0xc2/0x160 [zfs]
[25112.962093]  [<ffffffffc0ec65b9>] zio_checksum_error+0x79/0xf0 [zfs]
[25112.968698]  [<ffffffffc0ebc52a>] zio_checksum_verify+0x3a/0x160 [zfs]
[25112.975035]  [<ffffffffabb85022>] ? mutex_lock+0x12/0x2f
[25112.981254]  [<ffffffffc0ebbd35>] ? zio_wait_for_children+0x85/0xd0 [zfs]
[25112.987369]  [<ffffffffc0ebc675>] ? zio_vdev_io_assess+0x25/0x280 [zfs]
[25112.993326]  [<ffffffffc0ebde1f>] zio_execute+0x9f/0x100 [zfs]
[25112.999135]  [<ffffffffc06e4aac>] taskq_thread+0x2ac/0x4f0 [spl]
[25113.004836]  [<ffffffffab4db190>] ? wake_up_state+0x20/0x20
[25113.010450]  [<ffffffffc0ebdd80>] ? zio_taskq_member.isra.8.constprop.11+0x80/0x80 [zfs]
[25113.016082]  [<ffffffffc06e4800>] ? taskq_thread_spawn+0x60/0x60 [spl]
[25113.021567]  [<ffffffffab4c5e71>] kthread+0xd1/0xe0
[25113.026839]  [<ffffffffab4c5da0>] ? insert_kthread_work+0x40/0x40
[25113.032109]  [<ffffffffabb93df7>] ret_from_fork_nospec_begin+0x21/0x21
[25113.037414]  [<ffffffffab4c5da0>] ? insert_kthread_work+0x40/0x40
[25113.042699] Code: 48 89 f3 e8 63 74 ce ea c4 c1 7e 6f 45 00 c4 c1 7e 6f 4d 20 c4 c1 7e 6f 55 40 c4 c1 7e 6f 5d 60 4c 39 e3 73 24 66 0f 1f 44 00 00 <c4> e2 7d 35 23 c5 fd d4 c4 c5 f5 d4 c8 c5 ed d4 d1 c5 e5 d4 da 
[25113.059541] RIP  [<ffffffffc0750bf0>] fletcher_4_avx2_native+0x40/0x90 [zcommon]
[25113.065218]  RSP <ffff94d74a3e3880>

Each time the problem occurred, the server was sending a raw encrypted dataset to a remote host, so the two might be related. Most of the time, though, zfs send succeeds. A pool scrub finds no issues.
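
One possible reading of the register dump, offered as a hedged sketch rather than a diagnosis: on x86-64 a general protection fault (as opposed to a page fault) is what a load from a non-canonical address produces, and RBX/RSI (ffe710efb7a22000) do not look like the other kernel pointers in the trace (ffff94d7...). The snippet below checks that, assuming 48-bit virtual addresses (4-level paging) and assuming that RBX holds the buffer pointer and R12 the end of the buffer. Both assumptions are inferred from the dump, not confirmed from the ZFS source. The helper name `is_canonical` is made up for illustration.

```python
# Sketch: test which register values from the oops are canonical x86-64 addresses.
# Assumption: 48-bit virtual addresses (4-level paging).

def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits-1) of addr are all zeros or all ones."""
    top = addr >> (va_bits - 1)                 # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1    # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones

# Values copied from the register dump above.
regs = {
    "RAX": 0xffff94d74d3f64c0,
    "RBX": 0xffe710efb7a22000,
    "RSI": 0xffe710efb7a22000,
    "R12": 0xffe710efb7a26000,
    "RSP": 0xffff94d74a3e3880,
}

for name, value in regs.items():
    status = "canonical" if is_canonical(value) else "NON-canonical"
    print(f"{name} = {value:#018x}  {status}")

# R12 - RBX matches R14 (0x4000), which is consistent with (but does not
# prove) RBX/R12 being the start and end of a 16 KiB buffer.
print(hex(regs["R12"] - regs["RBX"]))
```

If that reading holds, the checksum code was handed a bad buffer pointer rather than failing on the data itself, which would fit a scrub finding nothing wrong on disk. It does not say where the pointer was corrupted.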

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.