I have been hit with the same issue on Gentoo: ZFS 0.7.13 and kernel 4.19.27-gentoo-r1. The backtrace looked similar (see below) and the panic occurred during heavy I/O induced by one of the processes. At this point, it's been triggered twice.
Call Trace:
dump_stack+0x46/0x5b
spl_panic+0x14e/0x197 [spl]
? copy_user_handle_tail+0x14/0x40
? ktime_get_raw_ts64+0x39/0xc0
? recalibrate_cpu_khz+0x10/0x10
? ktime_get_raw_ts64+0x39/0xc0
? dmu_zfetch+0x476/0x13d0 [zfs]
? dbuf_rele_and_unlock+0x1a3/0x580 [zfs]
? dbuf_read+0x338/0x9c0 [zfs]
zfs_panic_recover+0x64/0x80 [zfs]
dmu_buf_rele_array+0x386/0x470 [zfs]
dmu_write_uio_dnode+0x47/0x130 [zfs]
dmu_write_uio_dbuf+0x47/0x70 [zfs]
zfs_write+0x876/0xd00 [zfs]
zle_decompress+0x11af/0x1210 [zfs]
? terminate_walk+0x85/0xf0
zpl_putpage+0x592/0xaf0 [zfs]
do_iter_readv_writev+0x123/0x190
do_iter_write+0x7b/0x180
vfs_writev+0x93/0x110
? do_writev+0x59/0xf0
do_writev+0x59/0xf0
do_syscall_64+0x4a/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f55d7698741
Code: 00 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 56 93 01 00 8b 54 24 1c 41 89 c0 48 8b 74 24 10 b8 14 00 00 00 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 2c 44 89 c7 48 89 44 24 08 e8 8a 93 01 00 48
RSP: 002b:00007f5567d23520 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000d28 RCX: 00007f55d7698741
RDX: 0000000000000002 RSI: 00007f5567d23570 RDI: 0000000000000004
RBP: 00007f4ece2302a0 R08: 0000000000000000 R09: 00007eb76c0008d0
R10: 00007eb76c000080 R11: 0000000000000293 R12: 0000000000000004
R13: 00007f5567d23570 R14: 0000000000000300 R15: 0000000000000a28
Type | Version/Name |
---|---|
Distribution Name | CentOS |
Distribution Version | 7.6 |
Linux Kernel | 3.10.0-957.12.2.el7 |
Architecture | x64 |
ZFS Version | 0.7.13-1 |
SPL Version | 0.7.13-1 |
It's now reproducing constantly on one of our servers; the only load is a ~75 MB/s async write via syslog-ng to a 1x raidz2 pool.
[ 6779.874583] PANIC: zfs: accessing past end of object 33/9bf7 (size=464384 access=464295+161)
[ 6779.883356] Showing stack for process 11787
[ 6779.887713] CPU: 0 PID: 11787 Comm: syslog-ng Kdump: loaded Tainted: P OE ------------ 3.10.0-957.12.2.el7.x86_64 #1
[ 6779.906738] Call Trace:
[ 6779.909501] [<ffffffff97b63041>] dump_stack+0x19/0x1b
[ 6779.914892] [<ffffffffc054ff24>] spl_dumpstack+0x44/0x50 [spl]
[ 6779.920990] [<ffffffffc05500ac>] vcmn_err+0x6c/0x110 [spl]
[ 6779.926739] [<ffffffff97502574>] ? getrawmonotonic64+0x34/0xc0
[ 6779.932897] [<ffffffffc0a84043>] ? dbuf_rele_and_unlock+0x283/0x4c0 [zfs]
[ 6779.939990] [<ffffffffc0a83098>] ? dbuf_read+0x748/0x9e0 [zfs]
[ 6779.946097] [<ffffffff97b67092>] ? mutex_lock+0x12/0x2f
[ 6779.951624] [<ffffffffc0a81aab>] ? dbuf_find+0x1cb/0x1e0 [zfs]
[ 6779.957741] [<ffffffff9761f051>] ? __kmalloc_node+0x1d1/0x2b0
[ 6779.963839] [<ffffffffc054b238>] ? spl_kmem_zalloc+0xd8/0x180 [spl]
[ 6779.970443] [<ffffffffc0ae6c89>] zfs_panic_recover+0x69/0x90 [zfs]
[ 6779.976917] [<ffffffffc0a8e3b7>] dmu_buf_hold_array_by_dnode+0x2d7/0x4a0 [zfs]
[ 6779.984595] [<ffffffffc0a8f905>] dmu_write_uio_dnode+0x55/0x150 [zfs]
[ 6779.991330] [<ffffffffc0aa1135>] ? dmu_tx_assign+0x355/0x490 [zfs]
[ 6779.997827] [<ffffffffc0a8fa54>] dmu_write_uio_dbuf+0x54/0x70 [zfs]
[ 6780.004411] [<ffffffffc0b32683>] zfs_write+0xcb3/0xe70 [zfs]
[ 6780.010360] [<ffffffff9767965d>] ? __getblk+0x2d/0x300
[ 6780.015845] [<ffffffff97a89d13>] ? ip_output+0x73/0xe0
[ 6780.021296] [<ffffffffc0b4ab55>] zpl_write_common_iovec.constprop.7+0x95/0x100 [zfs]
[ 6780.029510] [<ffffffffc0b4acbe>] zpl_aio_write+0xfe/0x120 [zfs]
[ 6780.035702] [<ffffffff97640deb>] do_sync_readv_writev+0x7b/0xd0
[ 6780.041899] [<ffffffff97642a2e>] do_readv_writev+0xce/0x260
[ 6780.047786] [<ffffffffc0b4abc0>] ? zpl_write_common_iovec.constprop.7+0x100/0x100 [zfs]
[ 6780.056236] [<ffffffff97640c90>] ? do_sync_read+0xe0/0xe0
[ 6780.061898] [<ffffffff97642c55>] vfs_writev+0x35/0x60
[ 6780.067299] [<ffffffff97642e0f>] SyS_writev+0x7f/0x110
[ 6780.072725] [<ffffffff97b75ddb>] system_call_fastpath+0x22/0x27
Several times (after reboots):
[ 174.677724] PANIC: zfs: accessing past end of object 33/9f5c (size=29696 access=28056+1781)
[ 635.727935] PANIC: zfs: accessing past end of object 33/a044 (size=37376 access=28135+9315)
[163980.432350] PANIC: zfs: accessing past end of object 33/9dd9 (size=18432 access=18157+292)
[164221.276190] PANIC: zfs: accessing past end of object 33/9c7d (size=69120 access=69002+167)
Disks are connected via an LSI SAS3008 HBA: 02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02), Subsystem: LSI Logic / Symbios Logic SAS9300-8i.
For some reason, the zpl_aio_write path down to dmu_buf_hold_array_by_dnode can lead to a bigger write than the dnode is able to hold, if I understand things correctly from all the crashes we're getting with a recent-ish master HEAD and the RHEL6 kernel.
The stack dump always looks about the same; it comes from zpl_aio_write via the writev syscall.
Jul 13 06:59:09 node8 kernel: [1221170.633559] PANIC: zfs: accessing past end of object 6d5/697b5a (size=35328 access=35231+172)
Jul 13 06:59:09 node8 kernel: [1221170.633825] Showing stack for process 895956
Jul 13 06:59:09 node8 kernel: [1221170.633961] Pid: 895956, comm: rspamd veid: 2981 Tainted: P -- ------------ 2.6.32-042stab138.42 #1
Jul 13 06:59:09 node8 kernel: [1221170.634225] Call Trace:
Jul 13 06:59:09 node8 kernel: [1221170.634410] [<ffffffffa02afaab>] ? spl_dumpstack+0x2b/0x30 [spl]
Jul 13 06:59:09 node8 kernel: [1221170.634554] [<ffffffffa02afb3d>] ? vcmn_err+0x8d/0xf0 [spl]
Jul 13 06:59:09 node8 kernel: [1221170.634749] [<ffffffffa03b1cc0>] ? dbuf_rele_and_unlock+0x270/0x550 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.634950] [<ffffffff8156624e>] ? mutex_lock+0x1e/0x50
Jul 13 06:59:09 node8 kernel: [1221170.635113] [<ffffffffa03b0f93>] ? dbuf_find+0x183/0x190 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.635255] [<ffffffff8156624e>] ? mutex_lock+0x1e/0x50
Jul 13 06:59:09 node8 kernel: [1221170.635409] [<ffffffff811b1630>] ? kmem_cache_alloc_node_trace+0x1f0/0x220
Jul 13 06:59:09 node8 kernel: [1221170.635605] [<ffffffffa042e0c2>] ? zfs_panic_recover+0x52/0x60 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.635746] [<ffffffffa02b068e>] ? spl_kmem_zalloc+0x8e/0x150 [spl]
Jul 13 06:59:09 node8 kernel: [1221170.635904] [<ffffffffa03bde07>] ? dmu_buf_hold_array_by_dnode+0x1b7/0x4c0 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.636183] [<ffffffffa03be156>] ? dmu_write_uio_dnode+0x46/0x140 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.636352] [<ffffffffa03be325>] ? dmu_write_uio_dbuf+0x55/0x70 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.636533] [<ffffffffa0495f6b>] ? zfs_write+0xe0b/0xf70 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.636702] [<ffffffffa03bd6c0>] ? dmu_buf_rele_array+0x70/0x90 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.636879] [<ffffffffa04b3f69>] ? zpl_readpage+0x79/0xa0 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.637051] [<ffffffffa04b4223>] ? zpl_write_common_iovec.clone.0+0xa3/0xe0 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.637333] [<ffffffffa04b43aa>] ? zpl_aio_write+0x10a/0x120 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.637511] [<ffffffffa04b42a0>] ? zpl_aio_write+0x0/0x120 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.637643] [<ffffffff811c80ad>] ? do_sync_readv_writev+0xfd/0x140
Jul 13 06:59:09 node8 kernel: [1221170.637769] [<ffffffff81183e5f>] ? handle_pte_fault+0x9f/0x1250
Jul 13 06:59:09 node8 kernel: [1221170.637894] [<ffffffff810b5420>] ? autoremove_wake_function+0x0/0x40
Jul 13 06:59:09 node8 kernel: [1221170.638020] [<ffffffff81570f4e>] ? common_interrupt+0xe/0x13
Jul 13 06:59:09 node8 kernel: [1221170.638146] [<ffffffff8118528b>] ? handle_mm_fault+0x27b/0x360
Jul 13 06:59:09 node8 kernel: [1221170.638271] [<ffffffff811c7ed3>] ? rw_copy_check_uvector+0xa3/0x130
Jul 13 06:59:09 node8 kernel: [1221170.638406] [<ffffffff81188b56>] ? find_vma+0x46/0x80
Jul 13 06:59:09 node8 kernel: [1221170.638540] [<ffffffff811c9136>] ? do_readv_writev+0xd6/0x1f0
Jul 13 06:59:09 node8 kernel: [1221170.638668] [<ffffffff81484f7e>] ? sys_recvfrom+0xee/0x180
Jul 13 06:59:09 node8 kernel: [1221170.638797] [<ffffffff81013aa9>] ? read_tsc+0x9/0x20
Jul 13 06:59:09 node8 kernel: [1221170.638926] [<ffffffff811c9296>] ? vfs_writev+0x46/0x60
Jul 13 06:59:09 node8 kernel: [1221170.639057] [<ffffffff811c93c1>] ? sys_writev+0x51/0xd0
Jul 13 06:59:09 node8 kernel: [1221170.639193] [<ffffffff81570427>] ? system_call_fastpath+0x35/0x3a
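As an aside, the numbers in these panic lines are straightforward to decode: size is the object's current size and access is offset+length of the attempted access, so the overshoot is (offset+length) - size. A trivial standalone helper (my own sketch, not part of ZFS) applied to two of the reports above:

```c
/* Decode "zfs: accessing past end of object ... (size=S access=O+L)" reports.
 * Hypothetical helper for reading the logs in this thread, not ZFS code. */
#include <stdint.h>
#include <stdio.h>

static void
decode_panic(uint64_t size, uint64_t offset, uint64_t length)
{
	uint64_t end = offset + length;

	printf("size=%llu access=%llu+%llu -> end=%llu, %llu byte(s) past EOF\n",
	    (unsigned long long)size, (unsigned long long)offset,
	    (unsigned long long)length, (unsigned long long)end,
	    (unsigned long long)(end - size));
}

int
main(void)
{
	decode_panic(464384, 464295, 161);	/* object 33/9bf7 above: 72 bytes past */
	decode_panic(35328, 35231, 172);	/* object 6d5/697b5a above: 75 bytes past */
	return (0);
}
```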
For now, I'm trying to serialize writes to the same file at the top level of the ZFS stack with i_mutex; some comments in the code say it's not needed for ZFS, but I've looked at btrfs in our RHEL6/OpenVZ kernel and it does the same thing.
commit: https://github.com/vpsfreecz/zfs/commit/30891525e0b666732ea913620528bb8ee83ebee0
Edit: this didn't help... maybe I need to wrap the mutex around more actions...
Seeing this issue as well. 0.8.1 on CentOS 7.6. This bug seems specific to ZFSoL, as we have not seen it on our similarly used FreeBSD ZFS fileservers.
Sep 14 20:13:28 envy kernel: PANIC: zfs: accessing past end of object 18c/4f794d (size=81920 access=71336+16384)
Sep 14 20:13:28 envy kernel: Showing stack for process 18295
Sep 14 20:13:28 envy kernel: CPU: 5 PID: 18295 Comm: dafileserver Kdump: loaded Tainted: P OE ------------ 3.10.0-957.21.3.el7.x86_64 #1
Sep 14 20:13:28 envy kernel: Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013
Sep 14 20:13:28 envy kernel: Call Trace:
Sep 14 20:13:28 envy kernel: [<ffffffff82763107>] dump_stack+0x19/0x1b
Sep 14 20:13:28 envy kernel: [<ffffffffc058b9db>] spl_dumpstack+0x2b/0x30 [spl]
Sep 14 20:13:28 envy kernel: [<ffffffffc058bb5c>] vcmn_err+0x6c/0x110 [spl]
Sep 14 20:13:28 envy kernel: [<ffffffff82767152>] ? mutex_lock+0x12/0x2f
Sep 14 20:13:28 envy kernel: [<ffffffffc10b256c>] ? dmu_zfetch+0x49c/0x590 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc108f863>] ? dbuf_rele_and_unlock+0x283/0x5c0 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc058c52f>] ? spl_kmem_alloc+0xdf/0x140 [spl]
Sep 14 20:13:28 envy kernel: [<ffffffff82767152>] ? mutex_lock+0x12/0x2f
Sep 14 20:13:28 envy kernel: [<ffffffffc108c5f3>] ? dbuf_find+0x1e3/0x200 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffff82767152>] ? mutex_lock+0x12/0x2f
Sep 14 20:13:28 envy kernel: [<ffffffff8221f0d1>] ? __kmalloc_node+0x1d1/0x2b0
Sep 14 20:13:28 envy kernel: [<ffffffffc110c3a9>] zfs_panic_recover+0x69/0x90 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc109b2a7>] dmu_buf_hold_array_by_dnode+0x2d7/0x4a0 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc10c960a>] ? dsl_dir_tempreserve_space+0x1fa/0x4a0 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc109cc45>] dmu_write_uio_dnode+0x55/0x150 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc10b19ad>] ? dmu_tx_assign+0x20d/0x490 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc109cd94>] dmu_write_uio_dbuf+0x54/0x70 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc116fe6c>] zfs_write+0xd3c/0xed0 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffff826b3a77>] ? udp_sendmsg+0x417/0xa10
Sep 14 20:13:28 envy kernel: [<ffffffff820dccbe>] ? account_entity_dequeue+0xae/0xd0
Sep 14 20:13:28 envy kernel: [<ffffffff8202a621>] ? __switch_to+0x151/0x580
Sep 14 20:13:28 envy kernel: [<ffffffffc118f74e>] zpl_write_common_iovec.constprop.8+0x9e/0x100 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc118f8b4>] zpl_aio_write+0x104/0x120 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffff82240e6b>] do_sync_readv_writev+0x7b/0xd0
Sep 14 20:13:28 envy kernel: [<ffffffff82242aae>] do_readv_writev+0xce/0x260
Sep 14 20:13:28 envy kernel: [<ffffffffc118f7b0>] ? zpl_write_common_iovec.constprop.8+0x100/0x100 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffff82240d10>] ? do_sync_read+0xe0/0xe0
Sep 14 20:13:28 envy kernel: [<ffffffff8210fbfa>] ? do_futex+0x12a/0x5a0
Sep 14 20:13:28 envy kernel: [<ffffffff82242cd5>] vfs_writev+0x35/0x60
Sep 14 20:13:28 envy kernel: [<ffffffff822430d2>] SyS_pwritev+0xc2/0xf0
Sep 14 20:13:28 envy kernel: [<ffffffff82775ddb>] system_call_fastpath+0x22/0x27
We're experiencing this one way too often. Nothing I've done has helped me get to the root cause of this.
The only thing I know for sure is that it happens when reclaim kicks in, which, because of the OpenVZ patchset, can happen even on an OOM event inside a vz container: it invokes all the registered shrinkers anyway, with little regard for what belongs to whom...
[1371878.962126] [<ffffffffa02e1b1b>] ? spl_dumpstack+0x2b/0x30 [spl]
[1371878.962139] [<ffffffffa02e1bad>] ? vcmn_err+0x8d/0xf0 [spl]
[1371878.962149] [<ffffffff81055f04>] ? __do_page_fault+0x1f4/0x4d0
[1371878.962156] [<ffffffff8156636e>] ? mutex_lock+0x1e/0x50
[1371878.962217] [<ffffffffa03e2e93>] ? dbuf_find+0x193/0x1a0 [zfs]
[1371878.962224] [<ffffffff811b1770>] ? kmem_cache_alloc_node_trace+0x1f0/0x220
[1371878.962299] [<ffffffffa046cbf2>] ? zfs_panic_recover+0x52/0x60 [zfs]
[1371878.962363] [<ffffffffa03f0ad7>] ? dmu_buf_hold_array_by_dnode+0x1b7/0x4d0 [zfs]
[1371878.962412] [<ffffffffa03f0e36>] ? dmu_write_uio_dnode+0x46/0x140 [zfs]
[1371878.962457] [<ffffffffa03f1005>] ? dmu_write_uio_dbuf+0x55/0x70 [zfs]
[1371878.962526] [<ffffffffa04d643b>] ? zfs_write+0xe4b/0xfb0 [zfs]
[1371878.962590] [<ffffffffa044de7c>] ? rrm_exit+0x4c/0xa0 [zfs]
[1371878.962600] [<ffffffff81166882>] ? __do_page_cache_readahead+0x1a2/0x230
[1371878.962666] [<ffffffffa04f45c3>] ? zpl_write_common_iovec.clone.0+0xa3/0xe0 [zfs]
[1371878.962725] [<ffffffffa04f474a>] ? zpl_aio_write+0x10a/0x120 [zfs]
[1371878.962785] [<ffffffffa04f4640>] ? zpl_aio_write+0x0/0x120 [zfs]
[1371878.962794] [<ffffffff811c81ed>] ? do_sync_readv_writev+0xfd/0x140
[1371878.962801] [<ffffffff81183eff>] ? handle_pte_fault+0x9f/0x1250
[1371878.962809] [<ffffffff810b5460>] ? autoremove_wake_function+0x0/0x40
[1371878.962816] [<ffffffff8118532b>] ? handle_mm_fault+0x27b/0x360
[1371878.962822] [<ffffffff811c8013>] ? rw_copy_check_uvector+0xa3/0x130
[1371878.962830] [<ffffffff811c9276>] ? do_readv_writev+0xd6/0x1f0
[1371878.962836] [<ffffffff811c93d6>] ? vfs_writev+0x46/0x60
[1371878.962842] [<ffffffff811c9501>] ? sys_writev+0x51/0xd0
[1371878.962849] [<ffffffff81570437>] ? system_call_fastpath+0x35/0x3a
@behlendorf can someone more senior please help me debug this?
We've been losing members pretty fast because of this one; tens of people gone in a few weeks...
[500397.655112] PANIC: zfs: accessing past end of object d99/9987b (size=6656 access=6623+131)
[500397.655519] Showing stack for process 204542
[500397.655525] Pid: 204542, comm: rspamd veid: 8680 Tainted: P -- ------------ 2.6.32-042stab139.44 #1
[500397.655534] Call Trace:
[500397.655581] [<ffffffffa02e9abb>] ? spl_dumpstack+0x2b/0x30 [spl]
[500397.655596] [<ffffffffa02e9b4d>] ? vcmn_err+0x8d/0xf0 [spl]
[500397.655608] [<ffffffff81055f04>] ? __do_page_fault+0x1f4/0x4d0
[500397.655618] [<ffffffff8156636e>] ? mutex_lock+0x1e/0x50
[500397.655678] [<ffffffffa03eaf83>] ? dbuf_find+0x183/0x190 [zfs]
[500397.655686] [<ffffffff811b1770>] ? kmem_cache_alloc_node_trace+0x1f0/0x220
[500397.655751] [<ffffffffa0468162>] ? zfs_panic_recover+0x52/0x60 [zfs]
[500397.655796] [<ffffffffa03f7e17>] ? dmu_buf_hold_array_by_dnode+0x1b7/0x4c0 [zfs]
[500397.655873] [<ffffffffa03f8166>] ? dmu_write_uio_dnode+0x46/0x140 [zfs]
[500397.655927] [<ffffffffa03f8335>] ? dmu_write_uio_dbuf+0x55/0x70 [zfs]
[500397.655998] [<ffffffffa04d013b>] ? zfs_write+0xe4b/0xfb0 [zfs]
[500397.656054] [<ffffffffa03f76d0>] ? dmu_buf_rele_array+0x70/0x90 [zfs]
[500397.656076] [<ffffffff8114e5d7>] ? unlock_page+0x27/0x30
[500397.656146] [<ffffffffa04ee1c9>] ? zpl_readpage+0x79/0xa0 [zfs]
[500397.656220] [<ffffffffa04ee483>] ? zpl_write_common_iovec.clone.0+0xa3/0xe0 [zfs]
[500397.656289] [<ffffffffa04ee60a>] ? zpl_aio_write+0x10a/0x120 [zfs]
[500397.656361] [<ffffffffa04ee500>] ? zpl_aio_write+0x0/0x120 [zfs]
[500397.656381] [<ffffffff811c81ed>] ? do_sync_readv_writev+0xfd/0x140
[500397.656403] [<ffffffff81183eff>] ? handle_pte_fault+0x9f/0x1250
[500397.656423] [<ffffffff810b5460>] ? autoremove_wake_function+0x0/0x40
[500397.656440] [<ffffffff8118532b>] ? handle_mm_fault+0x27b/0x360
[500397.656461] [<ffffffff811c8013>] ? rw_copy_check_uvector+0xa3/0x130
[500397.656479] [<ffffffff811c9276>] ? do_readv_writev+0xd6/0x1f0
[500397.656495] [<ffffffff811c93d6>] ? vfs_writev+0x46/0x60
[500397.656507] [<ffffffff811c9501>] ? sys_writev+0x51/0xd0
[500397.656520] [<ffffffff81570437>] ? system_call_fastpath+0x35/0x3a
Heck, after all, @martellini, "trying to use it in production" would be a good way to describe the 5-year-long journey I've had with ZoL so far ;)
That is to say, I don't think there's any better alternative than ZFS for the use-case we have for it. I'd like to chase this one little ****er down rather than argue about who said what about what...
Anyway, I'm now trying to roll back to 0.7.9. I had to back-port project quotas: although we've never used them, just enabling the feature changed the on-disk format, unlike all the other features.
https://github.com/vpsfreecz/zfs/commits/vz-0.7.9
[root@node1.pgnd.vpsfree.cz]
~ # zpool get all
NAME PROPERTY VALUE SOURCE
vz size 5.44T -
vz capacity 35% -
vz altroot - default
vz health ONLINE -
vz guid 6706140627667879515 -
vz version - default
vz bootfs - default
vz delegation on default
vz autoreplace off default
vz cachefile - default
vz failmode wait default
vz listsnapshots off default
vz autoexpand off default
vz dedupditto 0 default
vz dedupratio 1.00x -
vz free 3.48T -
vz allocated 1.96T -
vz readonly off -
vz ashift 0 default
vz comment - default
vz expandsize - -
vz freeing 247G -
vz fragmentation 24% -
vz leaked 0 -
vz multihost off default
vz feature@async_destroy enabled local
vz feature@empty_bpobj active local
vz feature@lz4_compress active local
vz feature@multi_vdev_crash_dump enabled local
vz feature@spacemap_histogram active local
vz feature@enabled_txg active local
vz feature@hole_birth active local
vz feature@extensible_dataset active local
vz feature@embedded_data active local
vz feature@bookmarks enabled local
vz feature@filesystem_limits enabled local
vz feature@large_blocks enabled local
vz feature@large_dnode enabled local
vz feature@sha512 enabled local
vz feature@skein enabled local
vz feature@edonr enabled local
vz feature@userobj_accounting active local
vz feature@project_quota active local
vz unsupported@com.datto:encryption inactive local
vz unsupported@com.datto:bookmark_v2 inactive local
vz unsupported@com.delphix:device_removal inactive local
vz unsupported@com.datto:resilver_defer inactive local
vz unsupported@com.delphix:obsolete_counts inactive local
[root@node1.pgnd.vpsfree.cz]
~ # modinfo zfs | head
filename: /lib/modules/2.6.32-042stab139.44/extra/zfs/zfs/zfs.ko
version: 0.7.9-44
license: CDDL
author: OpenZFS on Linux
description: ZFS
retpoline: Y
srcversion: 533BB7E5866E52F63B9ACCB
depends: spl,znvpair,zcommon,zunicode,zavl,icp
vermagic: 2.6.32-042stab139.44 SMP mod_unload modversions
parm: zvol_inhibit_dev:Do not create zvol device nodes (uint)
This is triggered somewhat consistently on my RT kernel during heavy I/O while a non-I/O process is being reniced to -20.
Hit by the same issue, running ZFS 0.7.13 on CentOS 7.6, kernel 3.10.0-957.27.2.el7. Kernel panic during heavy I/O:
kernel:PANIC: zfs: accessing past end of object 1303/4ecce (size=27136 access=20000+26958)
Any workaround for this? Would rolling back to an older version help?
@atjegan 0.7.9 seems to be free of this bug; it's been running without a crash so far
@snajpa thanks for the update. Will try version 0.7.9.
I've encountered the same error on several different servers, but only with version 0.7.13 and later (not with 0.7.12).
Type | Version/Name |
---|---|
Distribution Name | CentOS |
Distribution Version | 7.7.1908 |
Linux Kernel | 3.10.0-1062.1.2.el7.x86_64 |
ZFS version | 0.8.2-1 |
Oct 21 17:01:25 a975tbd kernel: PANIC: zfs: accessing past end of object 11b/a8a00 (size=664064 access=663564+8192)
Oct 23 11:49:57 a975tbd kernel: PANIC: zfs: accessing past end of object 11b/ad709 (size=270336 access=270328+8192)
Oct 23 14:37:56 a975tbd kernel: PANIC: zfs: accessing past end of object 11b/aabd3 (size=237568 access=237560+8192)
Oct 23 15:21:40 a975tbd kernel: PANIC: zfs: accessing past end of object 11b/a8649 (size=188928 access=188420+8192)
Oct 24 15:12:56 a975tbd kernel: PANIC: zfs: accessing past end of object 10e/aadd5 (size=117760 access=117223+8614)
Oct 25 08:55:00 a975tbd kernel: PANIC: zfs: accessing past end of object 10e/a9e6e (size=533504 access=533089+8308)
Oct 27 09:31:06 a975tbd kernel: PANIC: zfs: accessing past end of object 10e/a96bf (size=655360 access=654575+8274)
Nov 3 09:21:57 a975tbd kernel: PANIC: zfs: accessing past end of object 10e/af16c (size=258048 access=257426+8386)
Application: MegaHit, a genome assembler
Data: 2 FASTQ-format files, which can be downloaded from here, for example
Command: like this
$ while :; do megahit -1 DRR000001_1.fastq -2 DRR000001_2.fastq -o outdir_${DATE} ; done
Frequency: repeating the command, the error will occur within a few hours (min) to a few days.
We've experienced/tested this on various hardware, OS, and ZFS versions, and the tendency is as follows.
ZFS Version
Hardware RAID or not
We've tested both on hardware RAID and on standalone SATA/NVMe drives (haven't tested with RAID-Z).
ZFS Setting
@behlendorf Considering your statements on #9034, this one is definitely critical and seems to be (semi-)repeatable.
@Ornias1993 agreed. Based on the very helpful https://github.com/zfsonlinux/zfs/issues/8673#issuecomment-550326225 above I reviewed all of the changes from 0.7.12 to 0.7.13. The following commit, which addresses a rare deadlock condition, is the most likely cause.
98bb45e27ae deadlock between mm_sem and tx assign in zfs_write() and page fault
@kazunoriohki @snajpa if it wouldn't be too much trouble can you try reverting this commit from 0.7.13 and try to reproduce the issue. I'll see if I can reproduce the issue as well using the test case described above.
@behlendorf Thanks, I'll try with https://github.com/zfsonlinux/zfs/commit/98bb45e27ae80145a6ce028df90fccdb23f8901d. Maybe it will take a week or two, then I'll report here.
Same here :)
Oh, it's this patch. @behlendorf, I can't be sure because I didn't take notes at the time (and I should have), but I think I've ruled that patch out, because it only appeared in the 0.7.13 release, while the original issue mentions 0.7.12.
@behlendorf reverting 98bb45e resolved the issue in our environment!
With 0.7.13 (unpatched), the error reproduced after 13, 92, and 144 iterations (one job takes about 5 min, and I repeated the run 3 times to make sure), while with the 98bb45e commit reverted the error never happened over 750 iterations.
We are experiencing this bug in production as well. It appears in 0.8.2, 0.7.13, 0.7.12, and 0.7.11. I'm pretty sure it didn't appear in 0.7.9, but now I'm having problems downgrading to 0.7.9; I probably need to downgrade more dependencies. I'll try to migrate the workload to a server where we still use 0.7.9.
The workload is tile rendering with Mapnik from a PostgreSQL database with PostGIS and some shapefiles.
I was toying with my gentoo-musl chroot lately and managed to reproduce this 100% of the time, 4 times in a row: within the chroot, compile a gentoo-sources 5.4.12 kernel with -j16. Compiling from the glibc host system doesn't seem to trigger the issue (the compiler is gcc on both the musl and glibc systems).
The first time it happened, I was emerging some packages with -j16 in MAKEOPTS and -j3 for emerge (machine with 8 cores/16 threads). The other 3 times, I got something similar to this (sorry for the picture; I couldn't write to the filesystem anymore and "sync" would hang): https://pasteboard.co/IQqr7sH.jpg
The first occurrence (while compiling the kernel) happened after ~6h of uptime; the 2nd and 3rd were from a cold boot (pressed the reset button, remounted the chroot, restarted compiling).
The system was usable, except for writing to disk. The only out-of-tree module is ZFS.
I will try to revert the patch mentioned above and post results.
I did some additional digging (sorry, I almost forgot about this and then toyed with the musl chroot again last evening...). I am not sure why the musl chroot can trigger this 100% of the time while the host glibc system consistently does not (currently testing a GCC compile that is 1h in and still working).
After analyzing the code (git tag zfs-0.8.3) and the patch, I noticed that when dmu_write_uio_dbuf() is called from zfs_write (L#827), the operation can hit EFAULT in zfs_uio.c/uiomove_iov (L#92,98), which returns early without touching uio->uio_loffset and uio->uio_resid, even though those fields are what the zfs_write while() loop uses to issue the next dmu_write_uio_dbuf() call for the pending bytes. My assumption is that the first dmu_write_uio_dbuf() call in the while() loop returns EFAULT, uio_prefaultpages is called and returns 0, the "continue" statement at L#835 is triggered, and the loop restarts with an unsynchronized uio struct. Even though uio_prefaultpages is passed the uio struct, it doesn't touch uio_loffset/uio_resid either, as far as I can see. I am basing this on the fact that uio_loffset/uio_resid are used to calculate the offset and size of the next request, but they don't seem to be modified on the path from zfs_write/dmu_write_uio_dbuf down to uiomove_iov().
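To make that suspicion concrete, here is a tiny userspace toy model (my own sketch; the names uio_model and copy_to_object are invented and this is not OpenZFS code) of a uiomove-style copy that transfers part of the requested range, hits a simulated page fault, and returns EFAULT without advancing uio_loffset/uio_resid, leaving the caller's cursor out of sync with what actually landed in the object:

```c
/*
 * Toy model of the suspected behaviour, NOT OpenZFS code: the copy routine
 * transfers part of the requested range, "faults", and returns EFAULT
 * without advancing the caller-visible cursor (uio_loffset/uio_resid).
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define	SIM_EFAULT	14

struct uio_model {
	uint64_t	uio_loffset;	/* next file offset, as the caller sees it */
	uint64_t	uio_resid;	/* bytes the caller thinks are still pending */
};

static char	object[512];		/* stand-in for the file's single block */
static uint64_t	object_bytes;		/* bytes actually copied into the object */

/* Copy `size` bytes at uio_loffset, but fault after `fault_after` bytes. */
static int
copy_to_object(struct uio_model *uio, uint64_t size, uint64_t fault_after)
{
	uint64_t done = (fault_after < size) ? fault_after : size;

	memset(object + uio->uio_loffset, 'x', done);
	object_bytes = uio->uio_loffset + done;

	if (done < size)
		return (SIM_EFAULT);	/* early return: uio NOT advanced */

	uio->uio_loffset += size;
	uio->uio_resid -= size;
	return (0);
}

int
main(void)
{
	struct uio_model uio = { .uio_loffset = 0, .uio_resid = 161 };

	/* First attempt: 80 of 161 bytes land in the object, then EFAULT. */
	int err = copy_to_object(&uio, uio.uio_resid, 80);

	printf("error=%d, object now holds %llu bytes, "
	    "but uio says loffset=%llu resid=%llu\n", err,
	    (unsigned long long)object_bytes,
	    (unsigned long long)uio.uio_loffset,
	    (unsigned long long)uio.uio_resid);
	/*
	 * The caller cannot tell that 80 bytes were already transferred; the
	 * hypothesis above is that zfs_write's prefault-and-continue retry
	 * path keeps looping with this unsynchronized state, which can end
	 * in a DMU request whose offset+length lands past the object's end.
	 */
	return (0);
}
```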
I am guessing two fixes should be possible, and both could be implemented:
Tentative (not very elegant) patch: https://pastebin.com/i5xnxii5. The last if (bytes_left > 0) block is copy-pasted from the end of the while loop and adapted.
Seems to be working properly:
[edit]: spoke too soon, it took much longer for it to happen, but it happened again:
[ 4504.376866] PANIC: zfs: accessing past end of object 613/a4f9c (size=17920 access=17837+1025)
[ 4504.376876] Showing stack for process 9206
[ 4504.376884] CPU: 3 PID: 9206 Comm: cc1plus Tainted: P O 5.5.10-gentoo-x86_64 #1
[ 4504.376889] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F12e 03/06/2020
[ 4504.376895] Call Trace:
[ 4504.376907] dump_stack+0x71/0x98
[ 4504.376919] spl_panic+0x176/0x21e [spl]
[ 4504.376957] zfs_panic_recover+0x79/0xa0 [zfs]
[ 4504.376981] dmu_buf_rele_array+0x45b/0x4d0 [zfs]
[ 4504.376999] ? dsl_dir_tempreserve_space+0x222/0x4e0 [zfs]
[ 4504.377017] dmu_write_uio_dnode+0x66/0x150 [zfs]
[ 4504.377036] dmu_write_uio_dbuf+0x4a/0x70 [zfs]
[ 4504.377055] zfs_write+0x9b6/0xe70 [zfs]
[ 4504.377101] zle_decompress+0x1365/0x13c0 [zfs]
[ 4504.377124] zpl_putpage+0x1a3/0x1130 [zfs]
[ 4504.377139] do_iter_readv_writev+0x19e/0x250
[ 4504.377153] do_iter_write+0x80/0x190
[ 4504.377161] vfs_writev+0xf7/0x140
[ 4504.377178] ? __fget_light+0x6c/0xa0
[ 4504.377187] do_writev+0x78/0x120
[ 4504.377198] do_syscall_64+0xd2/0x480
[ 4504.377207] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 4504.377215] RIP: 0033:0x7fa2f5b9fceb
[ 4504.377222] Code: 41 ff cc 48 29 d0 49 83 c6 10 49 8b 56 08 49 63 ec 49 01 06 48 29 c2 49 89 56 08 4c 89 e8 4c 89 f6 48 89 ea 48 63 7b 78 0f 05 <48> 89 c7 e8 5d be fb ff 49 39 c7 75 b8 48 8b 43 58 48 8b 4b 60 48
[ 4504.377228] RSP: 002b:00007fff9aaaa810 EFLAGS: 00000212 ORIG_RAX: 0000000000000014
[ 4504.377236] RAX: ffffffffffffffda RBX: 0000000005ab89c0 RCX: 00007fa2f5b9fceb
[ 4504.377241] RDX: 0000000000000002 RSI: 00007fff9aaaa820 RDI: 0000000000000001
[ 4504.377246] RBP: 0000000000000002 R08: 00000000056d6cc0 R09: 0000000000000000
[ 4504.377251] R10: 0000000000000070 R11: 0000000000000212 R12: 0000000000000002
[ 4504.377256] R13: 0000000000000014 R14: 00007fff9aaaa820 R15: 0000000000000401
Did some more analysis and test patching, 2nd final patch (to be used with the first one above): https://pastebin.com/L1VStMDx
There were two errors fixed by the 2nd patch in the zfs_write while(n > 0) loop.
I left a GCC compile running in a loop overnight: 11 successful merges (and counting) with no sign of breakage.
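For completeness, here is how the toy model sketched above could be repaired. This is only an illustration of the general idea (account for partial progress before returning EFAULT so the retry resumes from a consistent cursor); it is neither the pastebin patches above nor necessarily what the final upstream fix does:

```c
/*
 * Sketch of one plausible remedy, continuing the earlier toy model (NOT the
 * pastebin patches and NOT OpenZFS code): advance the cursor by the bytes
 * that were actually copied even when a fault is reported, so the caller's
 * retry loop resumes from a consistent uio.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define	SIM_EFAULT	14

struct uio_model {
	uint64_t	uio_loffset;
	uint64_t	uio_resid;
};

static char	object[512];

static int
copy_to_object_fixed(struct uio_model *uio, uint64_t size, uint64_t fault_after)
{
	uint64_t done = (fault_after < size) ? fault_after : size;

	memset(object + uio->uio_loffset, 'x', done);

	/* Account for what really got copied, fault or not. */
	uio->uio_loffset += done;
	uio->uio_resid -= done;

	return (done < size ? SIM_EFAULT : 0);
}

int
main(void)
{
	struct uio_model uio = { .uio_loffset = 0, .uio_resid = 161 };

	/* Retry loop in the style of zfs_write's prefault-and-continue. */
	while (uio.uio_resid > 0) {
		if (copy_to_object_fixed(&uio, uio.uio_resid, 80) == SIM_EFAULT)
			continue;	/* "prefault" and retry with a synced uio */
	}
	printf("done: loffset=%llu resid=%llu\n",
	    (unsigned long long)uio.uio_loffset,
	    (unsigned long long)uio.uio_resid);
	return (0);
}
```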
System information
Describe the problem you're observing
Splunk was installed on a dataset. After some time it crashed the whole hypervisor with a kernel panic; the hypervisor had to be rebooted manually.
Describe how to reproduce the problem
Seems to be related to Splunk's heavy read/write load. Unfortunately, we can't reproduce it reliably. Maybe the logs will help pinpoint the exact issue so we can reproduce it.
Include any warning/errors/backtraces from the system logs