openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

unable to handle kernel paging request ... in zio_compress_data #8600

Open msd-nn opened 5 years ago

msd-nn commented 5 years ago

System information

Type                   Version/Name
Distribution Name      RHEL
Distribution Version   7.6
Linux Kernel           3.10.0-957.5.1.el7.x86_64
Architecture           x86_64
ZFS Version            0.7.12-1
SPL Version            0.7.12-1

Describe the problem you're observing

After updating from zfs 0.6.5.9 to 0.7.12, our system became unstable and crashed about once per day; see the example kernel logs below. The crashes happened during heavy write IO. We tried updating to the current zfs 0.7.13, which did not solve the problem. After reverting to zfs 0.6.5.11 (and to kernel 3.10.0-693.21.1.el7.x86_64, because of the known incompatibility of zfs 0.6.5.x with recent RHEL kernels), the system became stable again. We can exclude the different kernel version as the cause, because we also tried an older kernel together with zfs 0.7.13 and saw the same crashes.

In all cases, zio_compress_data appears as one of the last function calls in the kernel trace. The "unable to handle kernel paging request" is raised either from

abd_iterate_func(src, 0, s_len, zio_compress_zeroed_cb, NULL) == 0

or

c_len = ci->ci_compress(tmp, dst, s_len, d_len, ci->ci_level);

so we guess that the crashes are caused by dereferencing invalid pointers in connection with scattered ABDs. (Scattered) ABDs did not exist in 0.6.5.x.
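For context on why the fault lands exactly on a page boundary (see the CR2 values in the traces below): the zero-detection pass in zio_compress_data walks the mapped source buffer one 64-bit word at a time, so the first read of an unmapped page faults immediately. A paraphrased sketch of the 0.7.x callback (from memory, not the exact source):

static int
zio_compress_zeroed_cb(void *data, size_t len, void *private)
{
	uint64_t *end = (uint64_t *)((char *)data + len);
	uint64_t *word;

	/*
	 * Scan the source buffer a word at a time; the read faults if the
	 * page is not mapped. Returns nonzero at the first non-zero word,
	 * so all-zero buffers can skip compression entirely.
	 */
	for (word = data; word < end; word++)
		if (*word != 0)
			return (1);

	return (0);
}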

Interestingly, we operate a second server with identical hardware and zpool configuration. On this server, we upgraded to zfs 0.7.12 without problems (uptime is 32 days and more than 300 TiB have been written to that zpool without problems since). The two servers differ in the data stored on them (backups/snapshots of Oracle and/or SAP HANA databases written via NFS in both cases); the problematic server has a slightly higher degree of fragmentation and more ZFS sub-volumes.

Some hardware info:

- Single-socket Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
- 128 GB of ECC RAM
- zpool consists of storage LUNs from a storage appliance, 128 TiB per LUN, attached via multipath FC
- 4 SAS SSDs as ZIL and L2ARC

Some zpool info:

the "crashing" one:

NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
nfsshare  1.36P   989T   408T         -    33%    70%  1.00x  ONLINE  -

the "good" one:

NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
nfsshare  1.36P  1.03P   344T         -    28%    75%  1.00x  ONLINE  -

We use lz4 compression on both pools; the compression ratio is 2.5x on average.

Describe how to reproduce the problem

Heavy write IO on a large zpool using zfs 0.7.x
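The real workload is database backups streamed over NFS. As a rough illustrative stand-in (not a confirmed reproducer), a sustained large sequential write such as the following fio run exercises the same write path; the directory path is just an example:

fio --name=seqwrite --directory=/nfsshare/test --rw=write --bs=1M --size=64G --numjobs=8 --ioengine=psync --end_fsync=1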

Include any warning/errors/backtraces from the system logs

example 1 (crash in zio_compress_zeroed_cb)

[ 7406.356182] BUG: unable to handle kernel paging request at ffffa850c396b000
[ 7406.356247] IP: [<ffffffffc0a79a11>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[ 7406.356355] PGD 17fd96067 PUD 1b600e1067 PMD 1b5afc9067 PTE 0
[ 7406.356408] Oops: 0000 [#1] SMP 
[ 7406.356437] Modules linked in: nfsd nfs_acl lockd grace bonding zfs(POE) zunicode(POE) zavl(POE) icp(POE) iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr lpc_ich mei_me mei dm_service_time zcommon(POE) znvpair(POE) ipmi_ssif spl(OE) sg ipmi_si ipmi_devintf ipmi_msghandler wmi tpm_crb acpi_power_meter dm_multipath auth_rpcgss sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qla2xxx crct10dif_pclmul crct10dif_common drm crc32c_intel bnx2x drm_panel_orientation_quirks ahci nvme_fc libahci nvme_fabrics megaraid_sas nvme_core libata scsi_transport_fc
[ 7406.357138]  mdio ptp pps_core scsi_tgt libcrc32c dm_mirror dm_region_hash dm_log dm_mod
[ 7406.357213] CPU: 5 PID: 79370 Comm: z_wr_iss Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.5.1.el7.x86_64 #1
[ 7406.357296] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.4.3 01/17/2017
[ 7406.357351] task: ffff9043c0665140 ti: ffff9042a5bdc000 task.ti: ffff9042a5bdc000
[ 7406.357405] RIP: 0010:[<ffffffffc0a79a11>]  [<ffffffffc0a79a11>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[ 7406.357519] RSP: 0018:ffff9042a5bdfc18  EFLAGS: 00010287
[ 7406.357559] RAX: ffffa850c396b000 RBX: 0000000000100000 RCX: 0000000000100000
[ 7406.357611] RDX: 0000000000000000 RSI: ffffa850c3a6b000 RDI: ffffa850c396b000
[ 7406.357664] RBP: ffff9042a5bdfc18 R08: 0000000000000000 R09: ffff902553388000
[ 7406.357717] R10: 0000000000000002 R11: 0000001e96b0c000 R12: ffffffffc0a799f0
[ 7406.357769] R13: 0000000000000000 R14: 0000000000100000 R15: ffffa850df694000
[ 7406.357822] FS:  0000000000000000(0000) GS:ffff9043fc740000(0000) knlGS:0000000000000000
[ 7406.357881] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7406.357923] CR2: ffffa850c396b000 CR3: 0000001b7a210000 CR4: 00000000003607e0
[ 7406.357976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7406.358028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7406.358079] Call Trace:
[ 7406.358126]  [<ffffffffc09a6257>] abd_iterate_func+0x97/0x120 [zfs]
[ 7406.358213]  [<ffffffffc0a79aab>] zio_compress_data+0x3b/0xc0 [zfs]
[ 7406.358300]  [<ffffffffc0a72fbf>] zio_write_compress+0x3bf/0x6a0 [zfs]
[ 7406.358388]  [<ffffffffc0a7280f>] zio_execute+0x9f/0x100 [zfs]
[ 7406.358440]  [<ffffffffc0475d7c>] taskq_thread+0x2ac/0x4f0 [spl]
[ 7406.358489]  [<ffffffff8fed67f0>] ? wake_up_state+0x20/0x20
[ 7406.358562]  [<ffffffffc0a72770>] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
[ 7406.358627]  [<ffffffffc0475ad0>] ? taskq_thread_spawn+0x60/0x60 [spl]
[ 7406.358678]  [<ffffffff8fec1c71>] kthread+0xd1/0xe0
[ 7406.358716]  [<ffffffff8fec1ba0>] ? insert_kthread_work+0x40/0x40
[ 7406.358765]  [<ffffffff90574c37>] ret_from_fork_nospec_begin+0x21/0x21
[ 7406.358814]  [<ffffffff8fec1ba0>] ? insert_kthread_work+0x40/0x40
[ 7406.358858] Code: 00 00 00 00 00 0f 1f 44 00 00 0f 1f 44 00 00 55 48 01 fe 48 39 f7 48 89 e5 72 10 eb 1d 0f 1f 44 00 00 48 83 c7 08 48 39 fe 76 0f <48> 83 3f 00 74 f1 b8 01 00 00 00 5d c3 66 90 31 c0 5d c3 66 90 
[ 7406.359147] RIP  [<ffffffffc0a79a11>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[ 7406.359240]  RSP <ffff9042a5bdfc18>
[ 7406.361319] CR2: ffffa850c396b000

example 2 (crash in lz4_compress_zfs):

[54195.432822] BUG: unable to handle kernel paging request at ffffbb8a45a70000
[54195.432862] IP: [<ffffffffc1117887>] lz4_compress_zfs+0xf7/0x840 [zfs]
[54195.432927] PGD 17fd96067 PUD 1772780067 PMD 1768165067 PTE 0
[54195.432955] Oops: 0000 [#1] SMP 
[54195.432971] Modules linked in: nfsd nfs_acl lockd grace bonding zfs(POE) zunicode(POE) zavl(POE) icp(POE) iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) spl(OE) dm_service_time joydev ipmi_ssif sg pcspkr mei_me mei lpc_ich ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter tpm_crb dm_multipath auth_rpcgss sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qla2xxx crct10dif_pclmul crct10dif_common crc32c_intel drm ahci bnx2x drm_panel_orientation_quirks libahci nvme_fc libata nvme_fabrics megaraid_sas nvme_core mdio
[54195.433360]  scsi_transport_fc ptp pps_core libcrc32c scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[54195.433409] CPU: 2 PID: 24937 Comm: z_wr_iss Kdump: loaded Tainted: P           OE  ------------   3.10.0-957.5.1.el7.x86_64 #1
[54195.433453] Hardware name: Dell Inc. PowerEdge R630/02C2CP, BIOS 2.9.1 12/04/2018
[54195.433483] task: ffff9b56ecb56180 ti: ffff9b56b07c8000 task.ti: ffff9b56b07c8000
[54195.433513] RIP: 0010:[<ffffffffc1117887>]  [<ffffffffc1117887>] lz4_compress_zfs+0xf7/0x840 [zfs]
[54195.433587] RSP: 0018:ffff9b56b07cbc40  EFLAGS: 00010212
[54195.433608] RAX: ffffbb8a45acbff4 RBX: 0000000000100000 RCX: ffffbb8a45a6fffc
[54195.433635] RDX: ffffbb8a45a6fffd RSI: 00000000000003bb RDI: 00000000000003bb
[54195.433664] RBP: ffff9b56b07cbc98 R08: 0000000000000001 R09: 000000000000005f
[54195.433697] R10: 0000000093060115 R11: ffffbb8a36dc9a12 R12: ffffbb8a459cc000
[54195.433724] R13: ffff9b56a08ec000 R14: ffffbb8a45a6ffe0 R15: ffffbb8a36e8c000
[54195.433755] FS:  0000000000000000(0000) GS:ffff9b56fc680000(0000) knlGS:0000000000000000
[54195.433789] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54195.433816] CR2: ffffbb8a45a70000 CR3: 0000001fd4fa0000 CR4: 00000000003607e0
[54195.433844] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[54195.433871] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[54195.433907] Call Trace:
[54195.433950]  [<ffffffffc1199af7>] zio_compress_data+0x87/0xc0 [zfs]
[54195.434002]  [<ffffffffc1192fbf>] zio_write_compress+0x3bf/0x6a0 [zfs]
[54195.434053]  [<ffffffffc119280f>] zio_execute+0x9f/0x100 [zfs]
[54195.434083]  [<ffffffffc0645d7c>] taskq_thread+0x2ac/0x4f0 [spl]
[54195.434109]  [<ffffffffa86d67f0>] ? wake_up_state+0x20/0x20
[54195.434156]  [<ffffffffc1192770>] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
[54195.434192]  [<ffffffffc0645ad0>] ? taskq_thread_spawn+0x60/0x60 [spl]
[54195.434219]  [<ffffffffa86c1c71>] kthread+0xd1/0xe0
[54195.434239]  [<ffffffffa86c1ba0>] ? insert_kthread_work+0x40/0x40
[54195.434265]  [<ffffffffa8d74c37>] ret_from_fork_nospec_begin+0x21/0x21
[54195.434292]  [<ffffffffa86c1ba0>] ? insert_kthread_work+0x40/0x40
[54195.434315] Code: 00 00 00 eb 0f 0f 1f 80 00 00 00 00 45 89 c8 48 89 d1 89 fe 45 8d 48 01 41 c1 f8 06 49 63 d0 48 01 ca 48 39 d0 0f 82 df 06 00 00 <44> 8b 02 49 89 ca 4d 29 e2 41 69 f8 b1 79 37 9e 4d 8d 44 b5 00 
[54195.434473] RIP  [<ffffffffc1117887>] lz4_compress_zfs+0xf7/0x840 [zfs]
[54195.435708]  RSP <ffff9b56b07cbc40>
[54195.436930] CR2: ffffbb8a45a70000
richardelling commented 5 years ago

As a workaround, does disabling ABD result in stable operation? https://github.com/zfsonlinux/zfs/wiki/ZFS-on-Linux-Module-Parameters#zfs_abd_scatter_enabled
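For anyone trying this, the parameter can be flipped at runtime via sysfs (it affects newly allocated ABDs) or set persistently through modprobe options; the paths below assume a standard module install:

# runtime
echo 0 > /sys/module/zfs/parameters/zfs_abd_scatter_enabled

# persistent across module reloads
echo "options zfs zfs_abd_scatter_enabled=0" >> /etc/modprobe.d/zfs.conf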

msd-nn commented 5 years ago

@richardelling thank you for your feedback. Setting zfs_abd_scatter_enabled = 0 is worth a try. Unfortunately, I do not have a test system where I can reproduce the problem, and I will not get downtime on the production system (which is currently running 0.6.5.11 again) before next week. I will let you know when I have been able to test this option; maybe it helps track down the problem.

stale[bot] commented 4 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

jsmarkb commented 4 years ago

We have seen the same problem on a few hosts over the past couple of weeks on zfs 0.8.4:

[723473.157220] BUG: unable to handle kernel paging request at ffffb4288aa24000
[723473.157955] IP: [<ffffffffc0c94431>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[723473.158661] PGD 17fe09067 PUD c23e91067 PMD 201e977067 PTE 0
[723473.159294] Oops: 0000 [#1] SMP
[723473.159915] Modules linked in: joydev mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase nfsv3 nfs_acl nfs lockd grace fscache dell_rbu dm_mirror dm_region_hash dm_log iTCO_wdt iTCO_vendor_support dell_smbios dell_wmi_descriptor dcdbas zfs(POE) zunicode(POE) zlua(POE) skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm zcommon(POE) mgag200 znvpair(POE) zavl(POE) irqbypass icp(POE) ttm pcspkr drm_kms_helper spl(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops drm sg drm_panel_orientation_quirks lpc_ich i2c_i801 mei_me mei wmi ipmi_si ipmi_devintf ipmi_msghandler tpm_crb acpi_power_meter acpi_pad auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ahci
[723473.163831]  ixgbe glue_helper lrw gf128mul ablk_helper cryptd libahci igb libata megaraid_sas mdio ptp pps_core i2c_algo_bit dca nfit libnvdimm dm_mod
[723473.165187] CPU: 13 PID: 113950 Comm: z_wr_iss_h Kdump: loaded Tainted: P           OE  ------------   3.10.0-1127.18.2.el7.x86_64 #1
[723473.166529] Hardware name: Dell Inc. PowerEdge R640, BIOS 2.6.4 04/09/2020
[723473.167196] task: ffff959fbb4ab150 ti: ffff959f24ba8000 task.ti: ffff959f24ba8000
[723473.167855] RIP: 0010:[<ffffffffc0c94431>]  [<ffffffffc0c94431>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[723473.168544] RSP: 0018:ffff959f24babc18  EFLAGS: 00010287
[723473.169204] RAX: ffffb4288aa24000 RBX: 0000000000020000 RCX: 0000000000020000
[723473.169873] RDX: 0000000000000000 RSI: ffffb4288aa44000 RDI: ffffb4288aa24000
[723473.170534] RBP: ffff959f24babc18 R08: 0000000000000000 R09: 0000000000000001
[723473.171185] R10: 000000001bc6d001 R11: ffff95aa1bc6e4a0 R12: ffffffffc0c94410
[723473.171829] R13: 0000000000000000 R14: 0000000000020000 R15: ffffb4284b15c000
[723473.172471] FS:  0000000000000000(0000) GS:ffff95afbad80000(0000) knlGS:0000000000000000
[723473.173105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[723473.173730] CR2: ffffb4288aa24000 CR3: 000000200efc8000 CR4: 00000000007607e0
[723473.174353] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[723473.174973] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[723473.175593] PKRU: 00000000
[723473.176209] Call Trace:
[723473.176838]  [<ffffffffc0b852e7>] abd_iterate_func+0x97/0x120 [zfs]
[723473.177474]  [<ffffffffc0c944cb>] zio_compress_data+0x3b/0xc0 [zfs]
[723473.178097]  [<ffffffffc0c8c98d>] zio_write_compress+0x3dd/0x750 [zfs]
[723473.178704]  [<ffffffffc0c8bb5f>] zio_execute+0x9f/0x100 [zfs]
[723473.179283]  [<ffffffffc086da9c>] taskq_thread+0x2ac/0x4f0 [spl]
[723473.179850]  [<ffffffff8c4db990>] ? wake_up_state+0x20/0x20
[723473.180435]  [<ffffffffc0c8bac0>] ? zio_taskq_member.isra.8.constprop.11+0x80/0x80 [zfs]
[723473.181015]  [<ffffffffc086d7f0>] ? taskq_thread_spawn+0x60/0x60 [spl]
[723473.181585]  [<ffffffff8c4c6691>] kthread+0xd1/0xe0
[723473.182138]  [<ffffffff8c4c65c0>] ? insert_kthread_work+0x40/0x40
[723473.182682]  [<ffffffff8cb92d1d>] ret_from_fork_nospec_begin+0x7/0x21
[723473.183216]  [<ffffffff8c4c65c0>] ? insert_kthread_work+0x40/0x40
[723473.183738] Code: 00 00 00 00 00 0f 1f 44 00 00 0f 1f 44 00 00 55 48 01 fe 48 39 f7 48 89 e5 72 10 eb 1d 0f 1f 44 00 00 48 83 c7 08 48 39 fe 76 0f <48> 83 3f 00 74 f1 b8 01 00 00 00 5d c3 66 90 31 c0 5d c3 66 90
[723473.184840] RIP  [<ffffffffc0c94431>] zio_compress_zeroed_cb+0x21/0x40 [zfs]
[723473.185378]  RSP <ffff959f24babc18>
[723473.185881] CR2: ffffb4288aa24000
cperl82 commented 4 years ago

Fwiw, we have crash dumps, and we did some trawling around the stack to find the abd_t that induced the crash, so we can give more details about what things looked like just before the crash, if that is helpful.

Re setting zfs_abd_scatter_enabled = 0, the abd_t is marked with ABD_FLAG_LINEAR, so I'm not sure if you still think that would be helpful.

In the below case, dereferencing 0xffffbf23650d5000 was the thing that caused the crash.

crash> struct abd_iter 0xffff9dc912d63c38
struct abd_iter {
  iter_mapaddr = 0xffffbf23650d5000, 
  iter_mapsize = 131072, 
  iter_abd = 0xffff9dd820ab39b0, 
  iter_pos = 0, 
  iter_offset = 0, 
  iter_sg = 0x0
}
crash> struct abd 0xffff9dd820ab39b0
struct abd {
  abd_flags = ABD_FLAG_LINEAR, 
  abd_size = 131072, 
  abd_parent = 0x0, 
  abd_children = {
    rc_count = 0
  }, 
  abd_u = {
    abd_scatter = {
      abd_offset = 1695371264, 
      abd_nents = 4294950691, 
      abd_sgl = 0xffff9dc64da72080
    }, 
    abd_linear = {
      abd_buf = 0xffffbf23650d5000, 
      abd_sgl = 0xffff9dc64da72080
    }
  }
}
ironMann commented 3 years ago

@cperl82 Do you still have the crash dump? Can you please do crash> vtop ffffbf23650d5000?

cperl82 commented 3 years ago

Yes, I have 4 different crash dumps saved, all seem to be the same issue (and, I expect before too long, I'll have some more).

crash> vtop 0xffffbf23650d5000
VIRTUAL           PHYSICAL        
ffffbf23650d5000  (not mapped)

PGD DIRECTORY: ffffffff88010000
PAGE DIRECTORY: 17fe09067
   PUD: 17fe09468 => ffe9c1067
   PMD: ffe9c1940 => 1012164067
   PTE: 10121646a8 => 0
gleventhal commented 3 years ago

Do you have any recommendations for any settings/tunables that would assist with debugging this issue?

ironMann commented 3 years ago

It would be great if a small reproducer existed... From the abd above and the fact that the address is no longer mapped, I assume there is some kind of race causing a use-after-free (this abd only borrows the buffer). From the crash dumps, one can inspect the zio object and try to get to the owner: (arc_buf_t *)((arc_write_callback_t *)zio->io_private)->awcb_buf, if it is coming from arc_write().
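In crash(8) terms, and with <zio>, <io_private>, and <awcb_buf> standing in for the actual addresses found on the stack, the chain would look roughly like:

crash> struct zio.io_private <zio>
crash> struct arc_write_callback <io_private>
crash> struct arc_buf <awcb_buf>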

gleventhal commented 3 years ago

Something common to the crashes we see here, and in another past situation, is the use of RAIDZ2 on top of a PERC RAID controller with each disk configured as a single-disk RAID 0. Is there any known frailty with RAIDZ2 and the code paths referenced in these stacks?

I am going to take another look at the dumps and see if I can answer your question about the zio_t owner.

cperl82 commented 3 years ago

Sorry I haven't had time to come back to this.

I still have the crash dump I was referencing earlier, so I'll use that for continuity. But, we've also experienced the same crash on zfs-2.0.3 (we were hoping we could just upgrade our way out of this).

I had to do some more trawling around on the stack to find the zio_t, but I think I got it, as evidenced by the fact that its io_abd member points at the same struct abd we were looking at before:

crash> struct zio.io_abd ffff9dd954ada338
  io_abd = 0xffff9dd820ab39b0

So, following your example to get to the arc_buf_t, I see this:

crash> struct zio.io_private ffff9dd954ada338
  io_private = 0xffff9dc91131bd20
crash> struct arc_write_callback 0xffff9dc91131bd20
struct arc_write_callback {
  awcb_private = 0x0, 
  awcb_ready = 0xffffffffc0dfa450 <zfs_get_done>, 
  awcb_children_ready = 0xffff9dd870e21900, 
  awcb_physdone = 0xffff9dd7376b8f00, 
  awcb_done = 0xffff9dc91131bdc0, 
  awcb_buf = 0xdead000000000200
}

Obviously awcb_buf is not a valid pointer. I'm not sure what to make of the other function pointers. awcb_ready seems valid, but the others seem to point at invalid addresses.

Any thoughts about what else to look at?

jebsolutions commented 3 years ago

Unsure if this is helpful or not...

0xdead000000000200 = LIST_POISON2

e.g. include/os/linux/spl/sys/list.h

In Linux, node->prev will be set to LIST_POISON2 instead of NULL when the node is removed from a list.
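For reference, those poison constants come from the kernel's include/linux/poison.h (abridged below); on x86_64, CONFIG_ILLEGAL_POINTER_VALUE is 0xdead000000000000, which is why a deleted list node's pointers show up as 0xdead000000000100 / 0xdead000000000200:

/* include/linux/poison.h (abridged) */
#define POISON_POINTER_DELTA	_AC(CONFIG_ILLEGAL_POINTER_VALUE, UL)
#define LIST_POISON1		((void *) 0x100 + POISON_POINTER_DELTA)
#define LIST_POISON2		((void *) 0x200 + POISON_POINTER_DELTA)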

gleventhal commented 3 years ago

So that (LIST_POISON2) seems to bolster the use-after-free argument.

FWIW, when I did bt -a on the vmcore, I noticed an identical stack to this crashing one on another CPU, and %rdi pointed to the same (unmapped) address that caused the kernel page fault exception, which made me wonder if there might be a race between two threads referencing the same buffer. (%rdi is presumably the abd_iter.iter_mapaddr passed as the first param to zio_compress_zeroed_cb in both cases.)

cperl82 commented 3 years ago

Actually, I think my last comment was slightly wrong.

I don't believe this zio came through arc_write. If you look at the io_ready and io_done function pointers, I believe they indicate it must have come through dmu_sync. So the thing that io_private points at would be a dmu_sync_arg_t:

crash> struct zio.io_ready,io_done ffff9dd954ada338
  io_ready = 0xffffffffc0d212b0 <dmu_sync_late_arrival_ready>
  io_done = 0xffffffffc0d22640 <dmu_sync_late_arrival_done>
crash> struct zio.io_private ffff9dd954ada338
  io_private = 0xffff9dc91131bd20
crash> struct dmu_sync_arg_t 0xffff9dc91131bd20
struct dmu_sync_arg_t {
  dsa_dr = 0x0, 
  dsa_done = 0xffffffffc0dfa450 <zfs_get_done>, 
  dsa_zgd = 0xffff9dd870e21900, 
  dsa_tx = 0xffff9dd7376b8f00
}
gleventhal commented 3 years ago

I just wanted to check in and see if the data provided by cperl82 is sufficient, @ironMann.

If not, please let us know what other data we can provide.

gleventhal commented 3 years ago

@msd-nn I was just curious where you ended up with this issue. Did you find a resolution?

gleventhal commented 3 years ago

@behlendorf Hi Brian, I was hoping I might be able to get your take on this. We are seeing these panics repeatedly, every few weeks, on our production PostgreSQL boxes, which use raidz2. Is there enough data here for you to analyze? If not, I'd love to get you whatever you need or help in any way I can.

gleventhal commented 3 years ago

Looking at the code, without really being familiar with it, I get the feeling it might be related to prematurely releasing an lwb, or some kind of race with the ZIL during dmu_sync, but I am probably talking out of school here.

If I followed the dump properly, it looks like we are in LWB_STATE_ISSUED for whatever that's worth.

gleventhal commented 3 years ago

Is there any hope of this issue being addressed? We had to stop using ZFS in production for a major part of our infrastructure due to the instability, which is really a shame and something I'd like to rectify.

If you need something from me, I am more than happy to assist with fixing the issue.

rotanid commented 3 years ago

> @msd-nn I was just curious where you ended up with this issue. Did you find a resolution?

I can answer instead: fortunately, we haven't had this problem in the last few months. Instead, we had multiple appearances of #9741 (which seems to have been fixed upstream now, but we may not be able to benefit, as we are not going to migrate to 2.x releases anytime soon and 0.8.x is probably effectively EOL).

gleventhal commented 3 years ago

I found a very similar stack on a different core in this crash dump, with a zio at a different address than the zio in the crashed thread, and a zio.io_abd at a different address which shares the same abd->abd_u.abd_linear.abd_buf address that is unexpectedly unmapped in the crashed thread.

crash> set
    PID: 38038
COMMAND: "z_wr_iss_h"
   TASK: ffff8e40248a3180  [THREAD_INFO: ffff8e400a4f4000]
    CPU: 11
  STATE: TASK_RUNNING (ACTIVE)
crash> zio.io_abd ffff8e3246708f30
  io_abd = 0xffff8e2ec9ff6680
crash> abd 0xffff8e2ec9ff6680
struct abd {
  abd_flags = ABD_FLAG_LINEAR, 
  abd_size = 131072, 
  abd_gang_link = {
    next = 0xdead000000000100, 
    prev = 0xdead000000000200
  }, 
  abd_parent = 0x0, 
  abd_children = {
    rc_count = 0
  }, 
  abd_mtx = {
    m_mutex = {
      count = {
        counter = 1
      }, 
      wait_lock = {
        {
          rlock = {
            raw_lock = {
              val = {
                counter = 0
              }
            }
          }
        }
      }, 
      wait_list = {
        next = 0xffff8e2ec9ff66b0, 
        prev = 0xffff8e2ec9ff66b0
      }, 
      owner = 0x0, 
      {
        osq = {
          tail = {
            counter = 0
          }
        }, 
        __UNIQUE_ID_rh_kabi_hide3 = {
          spin_mlock = 0x0
        }, 
        {<No data fields>}
      }
    }, 
    m_lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    }, 
    m_owner = 0x0
  }, 
  abd_u = {
    abd_scatter = {
      abd_offset = 713875456, 
      abd_nents = 4294942341, 
      abd_sgl = 0xffff8e2dea29d6a0
    }, 
    abd_linear = {
      abd_buf = 0xffff9e852a8ce000, 
      abd_sgl = 0xffff8e2dea29d6a0
    }, 
    abd_gang = {
      abd_gang_chain = {
        list_size = 18446636893514555392, 
        list_offset = 18446618926586123936, 
        list_head = {
          next = 0xdead000000000100, 
          prev = 0xdead000000000200
        }
      }
    }
  }
}
crash> set
    PID: 38039
COMMAND: "z_wr_iss_h"
   TASK: ffff8e40248a4200  [THREAD_INFO: ffff8e3fead0c000]
    CPU: 12
  STATE: TASK_RUNNING (PANIC)
crash> zio.io_abd ffff8e2ac172f470
  io_abd = 0xffff8e24887c2a80
crash> abd 0xffff8e24887c2a80
struct abd {
  abd_flags = ABD_FLAG_LINEAR, 
  abd_size = 131072, 
  abd_gang_link = {
    next = 0xdead000000000100, 
    prev = 0xdead000000000200
  }, 
  abd_parent = 0x0, 
  abd_children = {
    rc_count = 0
  }, 
  abd_mtx = {
    m_mutex = {
      count = {
        counter = 1
      }, 
      wait_lock = {
        {
          rlock = {
            raw_lock = {
              val = {
                counter = 0
              }
            }
          }
        }
      }, 
      wait_list = {
        next = 0xffff8e24887c2ab0, 
        prev = 0xffff8e24887c2ab0
      }, 
      owner = 0x0, 
      {
        osq = {
          tail = {
            counter = 0
          }
        }, 
        __UNIQUE_ID_rh_kabi_hide3 = {
          spin_mlock = 0x0
        }, 
        {<No data fields>}
      }
    }, 
    m_lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    }, 
    m_owner = 0x0
  }, 
  abd_u = {
    abd_scatter = {
      abd_offset = 713875456, 
      abd_nents = 4294942341, 
      abd_sgl = 0xffff8e340c7e12e0
    }, 
    abd_linear = {
      abd_buf = 0xffff9e852a8ce000, 
      abd_sgl = 0xffff8e340c7e12e0
    }, 
    abd_gang = {
      abd_gang_chain = {
        list_size = 18446636893514555392, 
        list_offset = 18446618952931873504, 
        list_head = {
          next = 0xdead000000000100, 
          prev = 0xdead000000000200
        }
      }
    }
  }
}

That appears to be the same abd_linear->abd_buf address used by the two separate threads, and it is the unmapped address causing the panic.

crash> vtop 0xffff9e852a8ce000
VIRTUAL           PHYSICAL        
ffff9e852a8ce000  (not mapped)

PGD DIRECTORY: ffffffff92e10000
PAGE DIRECTORY: 17fe09067
   PUD: 17fe090a0 => 148396067
   PMD: 148396aa0 => 1f3c601067
   PTE: 1f3c601670 => 0

Is this expected, due to the borrowing of the buffer that you mentioned? I interpreted it as the zio borrowing the abd buffer, but wasn't sure if it's expected that two separate abds would reference the same linear buffer. I wondered if it's perhaps a locking issue. @ironMann

gleventhal commented 3 years ago

> @msd-nn I was just curious where you ended up with this issue. Did you find a resolution?

> I can answer instead: fortunately, we haven't had this problem in the last few months. Instead, we had multiple appearances of #9741 (which seems to have been fixed upstream now, but we may not be able to benefit, as we are not going to migrate to 2.x releases anytime soon and 0.8.x is probably effectively EOL).

Let me ask you: on the panicking systems, were you using PostgreSQL? If so, what version, please?

rotanid commented 3 years ago

> @msd-nn I was just curious where you ended up with this issue. Did you find a resolution?

> I can answer instead: fortunately, we haven't had this problem in the last few months. Instead, we had multiple appearances of #9741 (which seems to have been fixed upstream now, but we may not be able to benefit, as we are not going to migrate to 2.x releases anytime soon and 0.8.x is probably effectively EOL).

> Let me ask you: on the panicking systems, were you using PostgreSQL? If so, what version, please?

I can't tell 100%, but as far as I know there are only backups; no actual application works directly on these ZFS volumes.

By the way, my statement about not moving to OpenZFS 2.x soon is outdated: now that 2.x has started to appear in the repository, we decided to give it a go.

rincebrain commented 2 years ago

I may have just reproduced this deliberately on 2.1.2.

[83313.605390] BUG: unable to handle page fault for address: ffff998866b57000
[83313.605417] #PF: supervisor read access in kernel mode
[83313.605426] #PF: error_code(0x0000) - not-present page
[83313.605436] PGD 100000067 P4D 100000067 PUD 100196067 PMD 3640a7067 PTE 0
[83313.605451] Oops: 0000 [#1] SMP PTI
[83313.605459] CPU: 2 PID: 1596099 Comm: z_wr_iss_h Kdump: loaded Tainted: P           OE     5.10.0-11-amd64 #1 Debian 5.10.92-1
[83313.605479] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[83313.605525] RIP: 0010:zio_compress_zeroed_cb+0x18/0x30 [zfs]
[83313.605537] Code: 59 ff 48 8b 3b eb b3 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 48 01 fe 48 39 f7 72 0b eb 15 48 83 c7 08 48 39 fe 76 0c <48> 83 3f 00 74 f1 b8 01 00 00 00 c3 31 c0 c3 66 0f 1f 84 00 00 00
[83313.605571] RSP: 0018:ffff9988468b3c80 EFLAGS: 00010287
[83313.605581] RAX: ffffffffc1724530 RBX: 0000000000100000 RCX: ffff8931378dfb60
[83313.605594] RDX: 0000000000000000 RSI: ffff998866c57000 RDI: ffff998866b57000
[83313.605606] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff892f07732828
[83313.605619] R10: ffff89319d048d98 R11: 0000000000000000 R12: ffff8931378dfb60
[83313.605632] R13: ffff9988468b3cb0 R14: 0000000000000000 R15: 0000000000000008
[83313.605645] FS:  0000000000000000(0000) GS:ffff89320fc80000(0000) knlGS:0000000000000000
[83313.605659] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[83313.605670] CR2: ffff998866b57000 CR3: 0000000306e4a004 CR4: 00000000000706e0
[83313.605684] Call Trace:
[83313.605708]  abd_iterate_func.part.0+0x118/0x1b0 [zfs]
[83313.605744]  ? zio_checksum_templates_free+0x80/0x80 [zfs]
[83313.605775]  zio_compress_data+0x39/0x100 [zfs]
[83313.605803]  zio_write_compress+0x4ff/0x9c0 [zfs]
[83313.605816]  ? __wake_up_common_lock+0x8a/0xc0
[83313.605842]  zio_execute+0x81/0x120 [zfs]
[83313.605853]  taskq_thread+0x2df/0x520 [spl]
[83313.605863]  ? wake_up_q+0xa0/0xa0
[83313.605888]  ? zio_gang_tree_free+0x60/0x60 [zfs]
[83313.605900]  ? taskq_thread_spawn+0x50/0x50 [spl]
[83313.605910]  kthread+0x11b/0x140
[83313.605917]  ? __kthread_bind_mask+0x60/0x60
[83313.606220]  ret_from_fork+0x22/0x30
[83313.606505] Modules linked in: zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs nfs_ssc fscache xfs socwatch2_15(OE) sep5(OE) socperf3(OE) intel_rapl_msr intel_rapl_common intel_pmc_core_pltdrv intel_pmc_core ghash_clmulni_intel pax(OE) rfkill aesni_intel libaes crypto_simd cryptd glue_helper rapl joydev serio_raw sg ac vboxguest evdev pcspkr nfsd binfmt_misc auth_rpcgss nfs_acl lockd grace sunrpc msr fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sd_mod sr_mod cdrom t10_pi crc_t10dif crct10dif_generic ata_generic hid_generic usbhid hid ohci_pci vmwgfx ttm drm_kms_helper virtio_net net_failover failover ohci_hcd cec ahci libahci drm ata_piix crct10dif_pclmul ehci_pci crct10dif_common libata ehci_hcd crc32_pclmul psmouse usbcore crc32c_intel
[83313.606541]  scsi_mod i2c_piix4 virtio_pci virtio_ring usb_common virtio video button
[83313.610288] CR2: ffff998866b57000

I was running fsstress from LTP on a pool while a pair of scripts looping in the background changed recordsize between 128k and 1M and compression between lz4 and zstd-3 at random intervals up to 30 seconds apart (I was trying to reproduce a different bug), and then it went bang.
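Roughly this kind of thing, as a sketch (dataset name, property values, and timing are illustrative, not the exact scripts I used):

#!/bin/bash
# Toggle recordsize and compression at random intervals while fsstress
# runs against the same dataset in another shell.
DS=tank/fsstress    # illustrative dataset name
while true; do
    if [ $((RANDOM % 2)) -eq 0 ]; then RS=128k; else RS=1M; fi
    if [ $((RANDOM % 2)) -eq 0 ]; then COMP=lz4; else COMP=zstd-3; fi
    zfs set recordsize=$RS "$DS"
    zfs set compression=$COMP "$DS"
    sleep $((RANDOM % 30))
done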

gleventhal commented 2 years ago

Very interesting!

Any sense of whether it corresponds to the recordsize change or the compression change?

It might be interesting to see if you can repro consistently while using only one of those changes.

rincebrain commented 2 years ago

Absolutely no data at the moment, but when I'm not trying to polish off a couple things, I'll come back to it and see. (Well, that and I'm waiting for my kASAN kernel package to finish building for the VM.)

Simple enough to reproduce, though, at least in my n=1 experiment. Watch I come back to it and it never happens again.

(I also have a nice core dump lying around, though in my recent experience crash has a bad habit of obeying its name when loading dumps.)

CoolCold commented 2 years ago

I hit the same issue a day ago. Not sure what useful info I can provide, but here is what I have so far:

System info:

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal

Kernel:

Linux fit-prod-db4 5.13.0-40-generic #45~20.04.1-Ubuntu SMP Mon Apr 4 09:38:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

ZFS:

root@fit-prod-db4:~# modinfo zfs|head
filename:       /lib/modules/5.13.0-40-generic/kernel/zfs/zfs.ko
version:        2.0.6-1ubuntu2.1
license:        CDDL
author:         OpenZFS
description:    ZFS
alias:          devname:zfs
alias:          char-major-10-249
srcversion:     0F62A5B7879CE835336F55F
depends:        spl,znvpair,icp,zlua,zzstd,zunicode,zcommon,zavl

zpool:

root@fit-prod-db4 ~ # zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
db01  1.73T   416G  1.33T        -         -    70%    23%  1.00x    ONLINE  -

syslog (it resides on a non-ZFS volume):

Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.728816] BUG: unable to handle page fault for address: ffffbdbf9ecdf000
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.728964] #PF: supervisor read access in kernel mode
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.729076] #PF: error_code(0x0000) - not-present page
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.729184] PGD 100000067 P4D 100000067 PUD 1001d6067 PMD 10c9f5067 PTE 0
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.729302] Oops: 0000 [#1] SMP NOPTI
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.729405] CPU: 7 PID: 2680396 Comm: z_wr_iss_h Tainted: P           O      5.13.0-40-generic #45~20.04.1-Ubuntu
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.729560] Hardware name: Hetzner /B450D4U-V1L, BIOS L1.02W 07/09/2020
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.729673] RIP: 0010:zio_compress_zeroed_cb+0x1c/0x30 [zfs]
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.729913] Code: e8 48 85 9a ff 48 8b 3b eb b1 0f 1f 40 00 0f 1f 44 00 00 55 48 01 fe 48 89 e5 48 39 f7 72 0b eb 16 48 83 c7 08 48 39 fe 76 0d <48> 83 3f 00 74 f1 b8 01 00 00 00 5d c3 31 c0 5d c3 0f 1f 00 0f 1f
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.730125] RSP: 0018:ffffbdbf8129fc78 EFLAGS: 00010287
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.730233] RAX: ffffffffc0c4a9f0 RBX: 0000000000010000 RCX: 0000000000010000
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.730376] RDX: 0000000000000000 RSI: ffffbdbf9ecef000 RDI: ffffbdbf9ecdf000
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.730516] RBP: ffffbdbf8129fc78 R08: 0000000000000000 R09: 0000000000000001
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.730658] R10: 0000000000004000 R11: 0000000000000000 R12: ffff9458fc25e870
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.730799] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.730941] FS:  0000000000000000(0000) GS:ffff94770e580000(0000) knlGS:0000000000000000
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.731089] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.731198] CR2: ffffbdbf9ecdf000 CR3: 000000018b586000 CR4: 0000000000350ee0
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.731340] Call Trace:
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.731437]  <TASK>
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.731535]  abd_iterate_func+0x9c/0x100 [zfs]
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.731737]  ? zio_checksum_templates_free+0x80/0x80 [zfs]
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.731963]  zio_compress_data+0x3c/0x110 [zfs]
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.732177]  zio_write_compress+0x42b/0x780 [zfs]
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.732400]  zio_execute+0x92/0xe0 [zfs]
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.732618]  taskq_thread+0x23d/0x430 [spl]
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.732731]  ? wake_up_q+0xa0/0xa0
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.732834]  ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.733058]  ? param_set_taskq_kick+0xf0/0xf0 [spl]
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.733172]  kthread+0x12b/0x150
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.733272]  ? set_kthread_struct+0x40/0x40
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.733375]  ret_from_fork+0x22/0x30
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.733478]  </TASK>
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.733573] Modules linked in: binfmt_misc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul snd_hda_intel ghash_clmulni_intel snd_intel_dspcfg aesni_intel cec snd_intel_sdw_acpi crypto_simd snd_hda_codec cryptd rc_core rapl snd_hda_core snd_hwdep wmi_bmof snd_pcm fb_sys_fops syscopyarea sysfillrect 8021q snd_timer garp k10temp sysimgblt mrp snd soundcore ccp bridge stp llc mac_hid ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack sch_fq_codel nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bpfilter drm ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 nvme igb ahci i2c_algo_bit xhci_pci
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.733634]  gpio_amdpt crc32_pclmul i2c_piix4 nvme_core dca libahci xhci_pci_renesas wmi gpio_generic
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.734522] CR2: ffffbdbf9ecdf000
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.734622] ---[ end trace 186199f8874c0b10 ]---
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.734725] RIP: 0010:zio_compress_zeroed_cb+0x1c/0x30 [zfs]
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.734947] Code: e8 48 85 9a ff 48 8b 3b eb b1 0f 1f 40 00 0f 1f 44 00 00 55 48 01 fe 48 89 e5 48 39 f7 72 0b eb 16 48 83 c7 08 48 39 fe 76 0d <48> 83 3f 00 74 f1 b8 01 00 00 00 5d c3 31 c0 5d c3 0f 1f 00 0f 1f
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.735159] RSP: 0018:ffffbdbf8129fc78 EFLAGS: 00010287
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.735269] RAX: ffffffffc0c4a9f0 RBX: 0000000000010000 RCX: 0000000000010000
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.735410] RDX: 0000000000000000 RSI: ffffbdbf9ecef000 RDI: ffffbdbf9ecdf000
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.735552] RBP: ffffbdbf8129fc78 R08: 0000000000000000 R09: 0000000000000001
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.735691] R10: 0000000000004000 R11: 0000000000000000 R12: ffff9458fc25e870
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.735832] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.735971] FS:  0000000000000000(0000) GS:ffff94770e580000(0000) knlGS:0000000000000000
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.736117] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 13 15:07:21 fit-prod-db4 kernel: [3123766.736225] CR2: ffffbdbf9ecdf000 CR3: 000000018b586000 CR4: 0000000000350ee0
CoolCold commented 2 years ago

It happened for me again, on the same system:

[Fri Jul  1 23:55:04 2022] BUG: unable to handle page fault for address: ffffa5a4a8afc000
[Fri Jul  1 23:55:04 2022] #PF: supervisor read access in kernel mode
[Fri Jul  1 23:55:04 2022] #PF: error_code(0x0000) - not-present page
[Fri Jul  1 23:55:04 2022] PGD 100000067 P4D 100000067 PUD 1001d6067 PMD 133c4a067 PTE 0
[Fri Jul  1 23:55:04 2022] Oops: 0000 [#1] SMP NOPTI
[Fri Jul  1 23:55:04 2022] CPU: 2 PID: 3666056 Comm: z_wr_iss_h Tainted: P           O      5.13.0-40-generic #45~20.04.1-Ubuntu
[Fri Jul  1 23:55:04 2022] Hardware name: Hetzner /B450D4U-V1L, BIOS L1.02W 07/09/2020
[Fri Jul  1 23:55:04 2022] RIP: 0010:zio_compress_zeroed_cb+0x1c/0x30 [zfs]
[Fri Jul  1 23:55:04 2022] Code: e8 48 35 9f ff 48 8b 3b eb b1 0f 1f 40 00 0f 1f 44 00 00 55 48 01 fe 48 89 e5 48 39 f7 72 0b eb 16 48 83 c7 08 48 39 fe 76 0d <48> 83 3f 00 74 f1 b8 01 00 00 00 5d c3 31 c0 5d c3 0f 1f 00 0f 1f
[Fri Jul  1 23:55:04 2022] RSP: 0018:ffffa5a499ae7c78 EFLAGS: 00010287
[Fri Jul  1 23:55:04 2022] RAX: ffffffffc0d569f0 RBX: 0000000000010000 RCX: 0000000000010000
[Fri Jul  1 23:55:04 2022] RDX: 0000000000000000 RSI: ffffa5a4a8b0c000 RDI: ffffa5a4a8afc000
[Fri Jul  1 23:55:04 2022] RBP: ffffa5a499ae7c78 R08: 0000000000000000 R09: 0000000000000001
[Fri Jul  1 23:55:04 2022] R10: 0000000000002000 R11: 0000000000000000 R12: ffff89e485783078
[Fri Jul  1 23:55:04 2022] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Fri Jul  1 23:55:04 2022] FS:  0000000000000000(0000) GS:ffff89fb8e300000(0000) knlGS:0000000000000000
[Fri Jul  1 23:55:04 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Jul  1 23:55:04 2022] CR2: ffffa5a4a8afc000 CR3: 000000013539a000 CR4: 0000000000350ee0
[Fri Jul  1 23:55:04 2022] Call Trace:
[Fri Jul  1 23:55:04 2022]  <TASK>
[Fri Jul  1 23:55:04 2022]  abd_iterate_func+0x9c/0x100 [zfs]
[Fri Jul  1 23:55:04 2022]  ? zio_checksum_templates_free+0x80/0x80 [zfs]
[Fri Jul  1 23:55:04 2022]  zio_compress_data+0x3c/0x110 [zfs]
[Fri Jul  1 23:55:04 2022]  zio_write_compress+0x42b/0x780 [zfs]
[Fri Jul  1 23:55:04 2022]  zio_execute+0x92/0xe0 [zfs]
[Fri Jul  1 23:55:04 2022]  taskq_thread+0x23d/0x430 [spl]
[Fri Jul  1 23:55:04 2022]  ? wake_up_q+0xa0/0xa0
[Fri Jul  1 23:55:04 2022]  ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[Fri Jul  1 23:55:04 2022]  ? param_set_taskq_kick+0xf0/0xf0 [spl]
[Fri Jul  1 23:55:04 2022]  kthread+0x12b/0x150
[Fri Jul  1 23:55:04 2022]  ? set_kthread_struct+0x40/0x40
[Fri Jul  1 23:55:04 2022]  ret_from_fork+0x22/0x30
[Fri Jul  1 23:55:04 2022]  </TASK>
[Fri Jul  1 23:55:04 2022] Modules linked in: binfmt_misc intel_rapl_msr intel_rapl_common zfs(PO) zunicode(PO) zzstd(O) edac_mce_amd zlua(O) kvm_amd zavl(PO) icp(PO) kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel drm_vram_helper drm_ttm_helper crypto_simd ttm snd_hda_intel snd_intel_dspcfg zcommon(PO) snd_intel_sdw_acpi cryptd drm_kms_helper snd_hda_codec znvpair(PO) cec 8021q rc_core garp spl(O) rapl mrp snd_hda_core fb_sys_fops syscopyarea snd_hwdep sysfillrect snd_pcm wmi_bmof snd_timer sysimgblt k10temp snd ccp soundcore bridge stp llc mac_hid ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack sch_fq_codel nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bpfilter drm ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 igb nvme ahci xhci_pci i2c_algo_bit
[Fri Jul  1 23:55:04 2022]  gpio_amdpt crc32_pclmul i2c_piix4 nvme_core dca libahci xhci_pci_renesas wmi gpio_generic
[Fri Jul  1 23:55:04 2022] CR2: ffffa5a4a8afc000
[Fri Jul  1 23:55:04 2022] ---[ end trace ffa29da8b7693aa1 ]---
yshollander-selerity commented 2 years ago

Happened for me too.

[1983729.066697] BUG: unable to handle kernel paging request at ffffad2e3288d000
[1983729.069087] PGD 100146067 P4D 100146067 PUD 100147067 PMD 2c69ef067 PTE 0
[1983729.071511] Oops: 0000 [#1] SMP NOPTI
[1983729.072720] CPU: 14 PID: 69756 Comm: z_wr_iss_h Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-372.13.1.el8_6.x86_64 #1
[1983729.076657] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[1983729.078952] RIP: 0010:zio_compress_zeroed_cb+0x18/0x30 [zfs]
[1983729.088189] Code: 48 8b 3b eb b7 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 48 01 fe 48 39 f7 72 0b eb 15 48 83 c7 08 48 39 fe 76 0c <48> 83 3f 00 74 f1 b8 01 00 00 00 c3 31 c0 c3 66 0f 1f 84 00 00 00
[1983729.092025] RSP: 0018:ffffad2e05fbfc70 EFLAGS: 00010287
[1983729.093109] RAX: ffffffffc0ac9b40 RBX: 0000000000020000 RCX: 0000000000000000
[1983729.094605] RDX: 0000000000000000 RSI: ffffad2e328ad000 RDI: ffffad2e3288d000
[1983729.096097] RBP: ffffad2e05fbfca0 R08: 0000000000000000 R09: 0000000000000001
[1983729.097601] R10: 0000000000000001 R11: 0000000001700000 R12: 0000000000000000
[1983729.099070] R13: ffff9490e3576e88 R14: 0000000000020000 R15: 000000000000000f
[1983729.100393] FS:  0000000000000000(0000) GS:ffff94979fd80000(0000) knlGS:0000000000000000
[1983729.101887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1983729.102953] CR2: ffffad2e3288d000 CR3: 000000013bdb2000 CR4: 00000000003506e0
[1983729.104277] Call Trace:
[1983729.105712]  abd_iterate_func+0xce/0x1a0 [zfs]
[1983729.107337]  ? zio_checksum_templates_free+0x90/0x90 [zfs]
[1983729.108428]  zio_compress_data+0x3b/0x100 [zfs]
[1983729.109410]  ? mutex_lock+0xe/0x30
[1983729.110083]  zio_write_compress+0x426/0x780 [zfs]
[1983729.111168]  zio_execute+0x90/0xf0 [zfs]
[1983729.112529]  taskq_thread+0x2ed/0x530 [spl]
[1983729.113914]  ? wake_up_q+0x70/0x70
[1983729.114557]  ? zio_taskq_member.isra.12.constprop.18+0x60/0x60 [zfs]
[1983729.115889]  ? taskq_thread_spawn+0x50/0x50 [spl]
[1983729.116801]  kthread+0x10a/0x120
[1983729.117621]  ? set_kthread_struct+0x40/0x40
[1983729.118441]  ret_from_fork+0x35/0x40
CoolCold commented 1 year ago

This happens quite often. I don't know the exact case needed to reproduce it, but I have to echo b > /proc/sysrq-trigger on my server somewhere between once a week and once a month.

CoolCold commented 3 months ago

Update for others seeking a solution: after upgrading to Ubuntu 22.04, things are rather stable. I have probably seen an issue of this sort once, but I'm not even sure about that once; the server keeps running fine under the same type of load (MySQL DB replica).

dpkg -l|fgrep zfs
ii  libzfs4linux                           2.1.5-1ubuntu6~22.04.4                  amd64        OpenZFS filesystem library for Linux - general support
ii  zfs-zed                                2.1.5-1ubuntu6~22.04.4                  amd64        OpenZFS Event Daemon
ii  zfsutils-linux                         2.1.5-1ubuntu6~22.04.4                  amd64        command-line tools to manage OpenZFS filesystems

kernel

$ uname -r
5.15.0-94-generic

zfs

$ modinfo zfs|head
filename:       /lib/modules/5.15.0-94-generic/kernel/zfs/zfs.ko
version:        2.1.5-1ubuntu6~22.04.2
license:        CDDL
author:         OpenZFS
description:    ZFS
alias:          devname:zfs
alias:          char-major-10-249
srcversion:     5A94B4662A7A991696CC35F