I have been hit with the same issue on Gentoo: ZFS 0.7.13 and kernel 4.19.27-gentoo-r1. The backtrace looked similar (see below) and the panic occurred during heavy I/O induced by one of the processes. At this point, it's been triggered twice.
Call Trace:
dump_stack+0x46/0x5b
spl_panic+0x14e/0x197 [spl]
? copy_user_handle_tail+0x14/0x40
? ktime_get_raw_ts64+0x39/0xc0
? recalibrate_cpu_khz+0x10/0x10
? ktime_get_raw_ts64+0x39/0xc0
? dmu_zfetch+0x476/0x13d0 [zfs]
? dbuf_rele_and_unlock+0x1a3/0x580 [zfs]
? dbuf_read+0x338/0x9c0 [zfs]
zfs_panic_recover+0x64/0x80 [zfs]
dmu_buf_rele_array+0x386/0x470 [zfs]
dmu_write_uio_dnode+0x47/0x130 [zfs]
dmu_write_uio_dbuf+0x47/0x70 [zfs]
zfs_write+0x876/0xd00 [zfs]
zle_decompress+0x11af/0x1210 [zfs]
? terminate_walk+0x85/0xf0
zpl_putpage+0x592/0xaf0 [zfs]
do_iter_readv_writev+0x123/0x190
do_iter_write+0x7b/0x180
vfs_writev+0x93/0x110
? do_writev+0x59/0xf0
do_writev+0x59/0xf0
do_syscall_64+0x4a/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f55d7698741
Code: 00 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 56 93 01 00 8b 54 24 1c 41 89 c0 48 8b 74 24 10 b8 14 00 00 00 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 2c 44 89 c7 48 89 44 24 08 e8 8a 93 01 00 48
RSP: 002b:00007f5567d23520 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000d28 RCX: 00007f55d7698741
RDX: 0000000000000002 RSI: 00007f5567d23570 RDI: 0000000000000004
RBP: 00007f4ece2302a0 R08: 0000000000000000 R09: 00007eb76c0008d0
R10: 00007eb76c000080 R11: 0000000000000293 R12: 0000000000000004
R13: 00007f5567d23570 R14: 0000000000000300 R15: 0000000000000a28
Type | Version/Name |
---|---|
Distribution Name | CentOS |
Distribution Version | 7.6 |
Linux Kernel | 3.10.0-957.12.2.el7 |
Architecture | x64 |
ZFS Version | 0.7.13-1 |
SPL Version | 0.7.13-1 |
It's now reproducing constantly on one of our servers; the only load is a ~75 MB/s async write via syslog-ng to a 1x raidz2 pool.
[ 6779.874583] PANIC: zfs: accessing past end of object 33/9bf7 (size=464384 access=464295+161)
[ 6779.883356] Showing stack for process 11787
[ 6779.887713] CPU: 0 PID: 11787 Comm: syslog-ng Kdump: loaded Tainted: P OE ------------ 3.10.0-957.12.2.el7.x86_64 #1
[ 6779.906738] Call Trace:
[ 6779.909501] [<ffffffff97b63041>] dump_stack+0x19/0x1b
[ 6779.914892] [<ffffffffc054ff24>] spl_dumpstack+0x44/0x50 [spl]
[ 6779.920990] [<ffffffffc05500ac>] vcmn_err+0x6c/0x110 [spl]
[ 6779.926739] [<ffffffff97502574>] ? getrawmonotonic64+0x34/0xc0
[ 6779.932897] [<ffffffffc0a84043>] ? dbuf_rele_and_unlock+0x283/0x4c0 [zfs]
[ 6779.939990] [<ffffffffc0a83098>] ? dbuf_read+0x748/0x9e0 [zfs]
[ 6779.946097] [<ffffffff97b67092>] ? mutex_lock+0x12/0x2f
[ 6779.951624] [<ffffffffc0a81aab>] ? dbuf_find+0x1cb/0x1e0 [zfs]
[ 6779.957741] [<ffffffff9761f051>] ? __kmalloc_node+0x1d1/0x2b0
[ 6779.963839] [<ffffffffc054b238>] ? spl_kmem_zalloc+0xd8/0x180 [spl]
[ 6779.970443] [<ffffffffc0ae6c89>] zfs_panic_recover+0x69/0x90 [zfs]
[ 6779.976917] [<ffffffffc0a8e3b7>] dmu_buf_hold_array_by_dnode+0x2d7/0x4a0 [zfs]
[ 6779.984595] [<ffffffffc0a8f905>] dmu_write_uio_dnode+0x55/0x150 [zfs]
[ 6779.991330] [<ffffffffc0aa1135>] ? dmu_tx_assign+0x355/0x490 [zfs]
[ 6779.997827] [<ffffffffc0a8fa54>] dmu_write_uio_dbuf+0x54/0x70 [zfs]
[ 6780.004411] [<ffffffffc0b32683>] zfs_write+0xcb3/0xe70 [zfs]
[ 6780.010360] [<ffffffff9767965d>] ? __getblk+0x2d/0x300
[ 6780.015845] [<ffffffff97a89d13>] ? ip_output+0x73/0xe0
[ 6780.021296] [<ffffffffc0b4ab55>] zpl_write_common_iovec.constprop.7+0x95/0x100 [zfs]
[ 6780.029510] [<ffffffffc0b4acbe>] zpl_aio_write+0xfe/0x120 [zfs]
[ 6780.035702] [<ffffffff97640deb>] do_sync_readv_writev+0x7b/0xd0
[ 6780.041899] [<ffffffff97642a2e>] do_readv_writev+0xce/0x260
[ 6780.047786] [<ffffffffc0b4abc0>] ? zpl_write_common_iovec.constprop.7+0x100/0x100 [zfs]
[ 6780.056236] [<ffffffff97640c90>] ? do_sync_read+0xe0/0xe0
[ 6780.061898] [<ffffffff97642c55>] vfs_writev+0x35/0x60
[ 6780.067299] [<ffffffff97642e0f>] SyS_writev+0x7f/0x110
[ 6780.072725] [<ffffffff97b75ddb>] system_call_fastpath+0x22/0x27
Several times (after reboots):
[ 174.677724] PANIC: zfs: accessing past end of object 33/9f5c (size=29696 access=28056+1781)
[ 635.727935] PANIC: zfs: accessing past end of object 33/a044 (size=37376 access=28135+9315)
[163980.432350] PANIC: zfs: accessing past end of object 33/9dd9 (size=18432 access=18157+292)
[164221.276190] PANIC: zfs: accessing past end of object 33/9c7d (size=69120 access=69002+167)
Disks are connected via an LSI SAS3008 HBA: 02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02), Subsystem: LSI Logic / Symbios Logic SAS9300-8i.
For some reason, the zpl_aio_write path down to dmu_buf_hold_array_by_dnode can lead to a bigger write than the dnode is able to hold, if I understand things correctly from all the crashes we're getting with a recent-ish master HEAD and the RHEL6 kernel.
The stack dump always looks about the same; it comes from zpl_aio_write via the writev syscall.
Jul 13 06:59:09 node8 kernel: [1221170.633559] PANIC: zfs: accessing past end of object 6d5/697b5a (size=35328 access=35231+172)
Jul 13 06:59:09 node8 kernel: [1221170.633825] Showing stack for process 895956
Jul 13 06:59:09 node8 kernel: [1221170.633961] Pid: 895956, comm: rspamd veid: 2981 Tainted: P -- ------------ 2.6.32-042stab138.42 #1
Jul 13 06:59:09 node8 kernel: [1221170.634225] Call Trace:
Jul 13 06:59:09 node8 kernel: [1221170.634410] [<ffffffffa02afaab>] ? spl_dumpstack+0x2b/0x30 [spl]
Jul 13 06:59:09 node8 kernel: [1221170.634554] [<ffffffffa02afb3d>] ? vcmn_err+0x8d/0xf0 [spl]
Jul 13 06:59:09 node8 kernel: [1221170.634749] [<ffffffffa03b1cc0>] ? dbuf_rele_and_unlock+0x270/0x550 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.634950] [<ffffffff8156624e>] ? mutex_lock+0x1e/0x50
Jul 13 06:59:09 node8 kernel: [1221170.635113] [<ffffffffa03b0f93>] ? dbuf_find+0x183/0x190 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.635255] [<ffffffff8156624e>] ? mutex_lock+0x1e/0x50
Jul 13 06:59:09 node8 kernel: [1221170.635409] [<ffffffff811b1630>] ? kmem_cache_alloc_node_trace+0x1f0/0x220
Jul 13 06:59:09 node8 kernel: [1221170.635605] [<ffffffffa042e0c2>] ? zfs_panic_recover+0x52/0x60 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.635746] [<ffffffffa02b068e>] ? spl_kmem_zalloc+0x8e/0x150 [spl]
Jul 13 06:59:09 node8 kernel: [1221170.635904] [<ffffffffa03bde07>] ? dmu_buf_hold_array_by_dnode+0x1b7/0x4c0 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.636183] [<ffffffffa03be156>] ? dmu_write_uio_dnode+0x46/0x140 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.636352] [<ffffffffa03be325>] ? dmu_write_uio_dbuf+0x55/0x70 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.636533] [<ffffffffa0495f6b>] ? zfs_write+0xe0b/0xf70 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.636702] [<ffffffffa03bd6c0>] ? dmu_buf_rele_array+0x70/0x90 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.636879] [<ffffffffa04b3f69>] ? zpl_readpage+0x79/0xa0 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.637051] [<ffffffffa04b4223>] ? zpl_write_common_iovec.clone.0+0xa3/0xe0 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.637333] [<ffffffffa04b43aa>] ? zpl_aio_write+0x10a/0x120 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.637511] [<ffffffffa04b42a0>] ? zpl_aio_write+0x0/0x120 [zfs]
Jul 13 06:59:09 node8 kernel: [1221170.637643] [<ffffffff811c80ad>] ? do_sync_readv_writev+0xfd/0x140
Jul 13 06:59:09 node8 kernel: [1221170.637769] [<ffffffff81183e5f>] ? handle_pte_fault+0x9f/0x1250
Jul 13 06:59:09 node8 kernel: [1221170.637894] [<ffffffff810b5420>] ? autoremove_wake_function+0x0/0x40
Jul 13 06:59:09 node8 kernel: [1221170.638020] [<ffffffff81570f4e>] ? common_interrupt+0xe/0x13
Jul 13 06:59:09 node8 kernel: [1221170.638146] [<ffffffff8118528b>] ? handle_mm_fault+0x27b/0x360
Jul 13 06:59:09 node8 kernel: [1221170.638271] [<ffffffff811c7ed3>] ? rw_copy_check_uvector+0xa3/0x130
Jul 13 06:59:09 node8 kernel: [1221170.638406] [<ffffffff81188b56>] ? find_vma+0x46/0x80
Jul 13 06:59:09 node8 kernel: [1221170.638540] [<ffffffff811c9136>] ? do_readv_writev+0xd6/0x1f0
Jul 13 06:59:09 node8 kernel: [1221170.638668] [<ffffffff81484f7e>] ? sys_recvfrom+0xee/0x180
Jul 13 06:59:09 node8 kernel: [1221170.638797] [<ffffffff81013aa9>] ? read_tsc+0x9/0x20
Jul 13 06:59:09 node8 kernel: [1221170.638926] [<ffffffff811c9296>] ? vfs_writev+0x46/0x60
Jul 13 06:59:09 node8 kernel: [1221170.639057] [<ffffffff811c93c1>] ? sys_writev+0x51/0xd0
Jul 13 06:59:09 node8 kernel: [1221170.639193] [<ffffffff81570427>] ? system_call_fastpath+0x35/0x3a
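As an aside, the numbers in these panic lines are straightforward to decode: size is the object's current size and access is offset+length of the attempted access, so the overshoot is (offset+length) - size. A trivial standalone helper (my own sketch, not part of ZFS) applied to two of the reports above:

```c
/* Decode "zfs: accessing past end of object ... (size=S access=O+L)" reports.
 * Hypothetical helper for reading the logs in this thread, not ZFS code. */
#include <stdint.h>
#include <stdio.h>

static void
decode_panic(uint64_t size, uint64_t offset, uint64_t length)
{
	uint64_t end = offset + length;

	printf("size=%llu access=%llu+%llu -> end=%llu, %llu byte(s) past EOF\n",
	    (unsigned long long)size, (unsigned long long)offset,
	    (unsigned long long)length, (unsigned long long)end,
	    (unsigned long long)(end - size));
}

int
main(void)
{
	decode_panic(464384, 464295, 161);	/* object 33/9bf7 above: 72 bytes past */
	decode_panic(35328, 35231, 172);	/* object 6d5/697b5a above: 75 bytes past */
	return (0);
}
```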
For now, I'm trying to serialize writes to the same file at the top level of the ZFS stack with i_mutex; some comments in the code say it's not needed for ZFS, but I've looked at btrfs in our RHEL6/OpenVZ kernel and it does the same thing.
commit: https://github.com/vpsfreecz/zfs/commit/30891525e0b666732ea913620528bb8ee83ebee0
Edit: this didn't help... maybe I need to wrap the mutex around more actions...
Seeing this issue as well. 0.8.1 on CentOS 7.6. This bug seems specific to ZFSoL, as we have not seen it on our similarly used FreeBSD ZFS fileservers.
Sep 14 20:13:28 envy kernel: PANIC: zfs: accessing past end of object 18c/4f794d (size=81920 access=71336+16384)
Sep 14 20:13:28 envy kernel: Showing stack for process 18295
Sep 14 20:13:28 envy kernel: CPU: 5 PID: 18295 Comm: dafileserver Kdump: loaded Tainted: P OE ------------ 3.10.0-957.21.3.el7.x86_64 #1
Sep 14 20:13:28 envy kernel: Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013
Sep 14 20:13:28 envy kernel: Call Trace:
Sep 14 20:13:28 envy kernel: [<ffffffff82763107>] dump_stack+0x19/0x1b
Sep 14 20:13:28 envy kernel: [<ffffffffc058b9db>] spl_dumpstack+0x2b/0x30 [spl]
Sep 14 20:13:28 envy kernel: [<ffffffffc058bb5c>] vcmn_err+0x6c/0x110 [spl]
Sep 14 20:13:28 envy kernel: [<ffffffff82767152>] ? mutex_lock+0x12/0x2f
Sep 14 20:13:28 envy kernel: [<ffffffffc10b256c>] ? dmu_zfetch+0x49c/0x590 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc108f863>] ? dbuf_rele_and_unlock+0x283/0x5c0 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc058c52f>] ? spl_kmem_alloc+0xdf/0x140 [spl]
Sep 14 20:13:28 envy kernel: [<ffffffff82767152>] ? mutex_lock+0x12/0x2f
Sep 14 20:13:28 envy kernel: [<ffffffffc108c5f3>] ? dbuf_find+0x1e3/0x200 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffff82767152>] ? mutex_lock+0x12/0x2f
Sep 14 20:13:28 envy kernel: [<ffffffff8221f0d1>] ? __kmalloc_node+0x1d1/0x2b0
Sep 14 20:13:28 envy kernel: [<ffffffffc110c3a9>] zfs_panic_recover+0x69/0x90 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc109b2a7>] dmu_buf_hold_array_by_dnode+0x2d7/0x4a0 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc10c960a>] ? dsl_dir_tempreserve_space+0x1fa/0x4a0 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc109cc45>] dmu_write_uio_dnode+0x55/0x150 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc10b19ad>] ? dmu_tx_assign+0x20d/0x490 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc109cd94>] dmu_write_uio_dbuf+0x54/0x70 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc116fe6c>] zfs_write+0xd3c/0xed0 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffff826b3a77>] ? udp_sendmsg+0x417/0xa10
Sep 14 20:13:28 envy kernel: [<ffffffff820dccbe>] ? account_entity_dequeue+0xae/0xd0
Sep 14 20:13:28 envy kernel: [<ffffffff8202a621>] ? __switch_to+0x151/0x580
Sep 14 20:13:28 envy kernel: [<ffffffffc118f74e>] zpl_write_common_iovec.constprop.8+0x9e/0x100 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffffc118f8b4>] zpl_aio_write+0x104/0x120 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffff82240e6b>] do_sync_readv_writev+0x7b/0xd0
Sep 14 20:13:28 envy kernel: [<ffffffff82242aae>] do_readv_writev+0xce/0x260
Sep 14 20:13:28 envy kernel: [<ffffffffc118f7b0>] ? zpl_write_common_iovec.constprop.8+0x100/0x100 [zfs]
Sep 14 20:13:28 envy kernel: [<ffffffff82240d10>] ? do_sync_read+0xe0/0xe0
Sep 14 20:13:28 envy kernel: [<ffffffff8210fbfa>] ? do_futex+0x12a/0x5a0
Sep 14 20:13:28 envy kernel: [<ffffffff82242cd5>] vfs_writev+0x35/0x60
Sep 14 20:13:28 envy kernel: [<ffffffff822430d2>] SyS_pwritev+0xc2/0xf0
Sep 14 20:13:28 envy kernel: [<ffffffff82775ddb>] system_call_fastpath+0x22/0x27
We're experiencing this one way too often. Nothing I've done has helped me get to the root cause of this.
The only thing I know for sure is that it happens when reclaim kicks in, which, because of the OpenVZ patchset, can happen even on an OOM event inside a vz container: it invokes all the registered shrinkers anyway, with little regard for what belongs to whom...
[1371878.962126] [<ffffffffa02e1b1b>] ? spl_dumpstack+0x2b/0x30 [spl]
[1371878.962139] [<ffffffffa02e1bad>] ? vcmn_err+0x8d/0xf0 [spl]
[1371878.962149] [<ffffffff81055f04>] ? __do_page_fault+0x1f4/0x4d0
[1371878.962156] [<ffffffff8156636e>] ? mutex_lock+0x1e/0x50
[1371878.962217] [<ffffffffa03e2e93>] ? dbuf_find+0x193/0x1a0 [zfs]
[1371878.962224] [<ffffffff811b1770>] ? kmem_cache_alloc_node_trace+0x1f0/0x220
[1371878.962299] [<ffffffffa046cbf2>] ? zfs_panic_recover+0x52/0x60 [zfs]
[1371878.962363] [<ffffffffa03f0ad7>] ? dmu_buf_hold_array_by_dnode+0x1b7/0x4d0 [zfs]
[1371878.962412] [<ffffffffa03f0e36>] ? dmu_write_uio_dnode+0x46/0x140 [zfs]
[1371878.962457] [<ffffffffa03f1005>] ? dmu_write_uio_dbuf+0x55/0x70 [zfs]
[1371878.962526] [<ffffffffa04d643b>] ? zfs_write+0xe4b/0xfb0 [zfs]
[1371878.962590] [<ffffffffa044de7c>] ? rrm_exit+0x4c/0xa0 [zfs]
[1371878.962600] [<ffffffff81166882>] ? __do_page_cache_readahead+0x1a2/0x230
[1371878.962666] [<ffffffffa04f45c3>] ? zpl_write_common_iovec.clone.0+0xa3/0xe0 [zfs]
[1371878.962725] [<ffffffffa04f474a>] ? zpl_aio_write+0x10a/0x120 [zfs]
[1371878.962785] [<ffffffffa04f4640>] ? zpl_aio_write+0x0/0x120 [zfs]
[1371878.962794] [<ffffffff811c81ed>] ? do_sync_readv_writev+0xfd/0x140
[1371878.962801] [<ffffffff81183eff>] ? handle_pte_fault+0x9f/0x1250
[1371878.962809] [<ffffffff810b5460>] ? autoremove_wake_function+0x0/0x40
[1371878.962816] [<ffffffff8118532b>] ? handle_mm_fault+0x27b/0x360
[1371878.962822] [<ffffffff811c8013>] ? rw_copy_check_uvector+0xa3/0x130
[1371878.962830] [<ffffffff811c9276>] ? do_readv_writev+0xd6/0x1f0
[1371878.962836] [<ffffffff811c93d6>] ? vfs_writev+0x46/0x60
[1371878.962842] [<ffffffff811c9501>] ? sys_writev+0x51/0xd0
[1371878.962849] [<ffffffff81570437>] ? system_call_fastpath+0x35/0x3a
@behlendorf can someone more senior please help me debug this?
We've been losing members pretty fast because of this one; tens of people gone in a few weeks...
[500397.655112] PANIC: zfs: accessing past end of object d99/9987b (size=6656 access=6623+131)
[500397.655519] Showing stack for process 204542
[500397.655525] Pid: 204542, comm: rspamd veid: 8680 Tainted: P -- ------------ 2.6.32-042stab139.44 #1
[500397.655534] Call Trace:
[500397.655581] [<ffffffffa02e9abb>] ? spl_dumpstack+0x2b/0x30 [spl]
[500397.655596] [<ffffffffa02e9b4d>] ? vcmn_err+0x8d/0xf0 [spl]
[500397.655608] [<ffffffff81055f04>] ? __do_page_fault+0x1f4/0x4d0
[500397.655618] [<ffffffff8156636e>] ? mutex_lock+0x1e/0x50
[500397.655678] [<ffffffffa03eaf83>] ? dbuf_find+0x183/0x190 [zfs]
[500397.655686] [<ffffffff811b1770>] ? kmem_cache_alloc_node_trace+0x1f0/0x220
[500397.655751] [<ffffffffa0468162>] ? zfs_panic_recover+0x52/0x60 [zfs]
[500397.655796] [<ffffffffa03f7e17>] ? dmu_buf_hold_array_by_dnode+0x1b7/0x4c0 [zfs]
[500397.655873] [<ffffffffa03f8166>] ? dmu_write_uio_dnode+0x46/0x140 [zfs]
[500397.655927] [<ffffffffa03f8335>] ? dmu_write_uio_dbuf+0x55/0x70 [zfs]
[500397.655998] [<ffffffffa04d013b>] ? zfs_write+0xe4b/0xfb0 [zfs]
[500397.656054] [<ffffffffa03f76d0>] ? dmu_buf_rele_array+0x70/0x90 [zfs]
[500397.656076] [<ffffffff8114e5d7>] ? unlock_page+0x27/0x30
[500397.656146] [<ffffffffa04ee1c9>] ? zpl_readpage+0x79/0xa0 [zfs]
[500397.656220] [<ffffffffa04ee483>] ? zpl_write_common_iovec.clone.0+0xa3/0xe0 [zfs]
[500397.656289] [<ffffffffa04ee60a>] ? zpl_aio_write+0x10a/0x120 [zfs]
[500397.656361] [<ffffffffa04ee500>] ? zpl_aio_write+0x0/0x120 [zfs]
[500397.656381] [<ffffffff811c81ed>] ? do_sync_readv_writev+0xfd/0x140
[500397.656403] [<ffffffff81183eff>] ? handle_pte_fault+0x9f/0x1250
[500397.656423] [<ffffffff810b5460>] ? autoremove_wake_function+0x0/0x40
[500397.656440] [<ffffffff8118532b>] ? handle_mm_fault+0x27b/0x360
[500397.656461] [<ffffffff811c8013>] ? rw_copy_check_uvector+0xa3/0x130
[500397.656479] [<ffffffff811c9276>] ? do_readv_writev+0xd6/0x1f0
[500397.656495] [<ffffffff811c93d6>] ? vfs_writev+0x46/0x60
[500397.656507] [<ffffffff811c9501>] ? sys_writev+0x51/0xd0
[500397.656520] [<ffffffff81570437>] ? system_call_fastpath+0x35/0x3a
Heck, after all, @martellini, "trying to use it in production" would be a good way to describe the 5-year-long journey I've had with ZoL so far ;)
That is to say, I don't think there's any better alternative than ZFS for the use-case we have for it. I'd like to chase this one little ****er down rather than argue about who said what about what...
Anyway, I'm now trying to roll back to 0.7.9. I had to back-port project quotas: although we've never used them, just enabling the feature changed the on-disk format, unlike all the other features.
https://github.com/vpsfreecz/zfs/commits/vz-0.7.9
[root@node1.pgnd.vpsfree.cz]
~ # zpool get all
NAME PROPERTY VALUE SOURCE
vz size 5.44T -
vz capacity 35% -
vz altroot - default
vz health ONLINE -
vz guid 6706140627667879515 -
vz version - default
vz bootfs - default
vz delegation on default
vz autoreplace off default
vz cachefile - default
vz failmode wait default
vz listsnapshots off default
vz autoexpand off default
vz dedupditto 0 default
vz dedupratio 1.00x -
vz free 3.48T -
vz allocated 1.96T -
vz readonly off -
vz ashift 0 default
vz comment - default
vz expandsize - -
vz freeing 247G -
vz fragmentation 24% -
vz leaked 0 -
vz multihost off default
vz feature@async_destroy enabled local
vz feature@empty_bpobj active local
vz feature@lz4_compress active local
vz feature@multi_vdev_crash_dump enabled local
vz feature@spacemap_histogram active local
vz feature@enabled_txg active local
vz feature@hole_birth active local
vz feature@extensible_dataset active local
vz feature@embedded_data active local
vz feature@bookmarks enabled local
vz feature@filesystem_limits enabled local
vz feature@large_blocks enabled local
vz feature@large_dnode enabled local
vz feature@sha512 enabled local
vz feature@skein enabled local
vz feature@edonr enabled local
vz feature@userobj_accounting active local
vz feature@project_quota active local
vz unsupported@com.datto:encryption inactive local
vz unsupported@com.datto:bookmark_v2 inactive local
vz unsupported@com.delphix:device_removal inactive local
vz unsupported@com.datto:resilver_defer inactive local
vz unsupported@com.delphix:obsolete_counts inactive local
[root@node1.pgnd.vpsfree.cz]
~ # modinfo zfs | head
filename: /lib/modules/2.6.32-042stab139.44/extra/zfs/zfs/zfs.ko
version: 0.7.9-44
license: CDDL
author: OpenZFS on Linux
description: ZFS
retpoline: Y
srcversion: 533BB7E5866E52F63B9ACCB
depends: spl,znvpair,zcommon,zunicode,zavl,icp
vermagic: 2.6.32-042stab139.44 SMP mod_unload modversions
parm: zvol_inhibit_dev:Do not create zvol device nodes (uint)
This is triggered somewhat consistently on my RT kernel during heavy I/O while a non-I/O process is being reniced to -20.
Hit by the same issue, running ZFS 0.7.13 on CentOS 7.6, kernel 3.10.0-957.27.2.el7. Kernel panic during heavy I/O:
kernel:PANIC: zfs: accessing past end of object 1303/4ecce (size=27136 access=20000+26958)
Any workaround for this? Would rolling back to an older version help?
@atjegan 0.7.9 seems to be free of this bug; it's been running without a crash so far
@snajpa thanks for the update. Will try version 0.7.9.
I've encountered the same error on several different servers, but only with version 0.7.13 and later (not with 0.7.12).
Type | Version/Name |
---|---|
Distribution Name | CentOS |
Distribution Version | 7.7.1908 |
Linux Kernel | 3.10.0-1062.1.2.el7.x86_64 |
ZFS version | 0.8.2-1 |
Oct 21 17:01:25 a975tbd kernel: PANIC: zfs: accessing past end of object 11b/a8a00 (size=664064 access=663564+8192)
Oct 23 11:49:57 a975tbd kernel: PANIC: zfs: accessing past end of object 11b/ad709 (size=270336 access=270328+8192)
Oct 23 14:37:56 a975tbd kernel: PANIC: zfs: accessing past end of object 11b/aabd3 (size=237568 access=237560+8192)
Oct 23 15:21:40 a975tbd kernel: PANIC: zfs: accessing past end of object 11b/a8649 (size=188928 access=188420+8192)
Oct 24 15:12:56 a975tbd kernel: PANIC: zfs: accessing past end of object 10e/aadd5 (size=117760 access=117223+8614)
Oct 25 08:55:00 a975tbd kernel: PANIC: zfs: accessing past end of object 10e/a9e6e (size=533504 access=533089+8308)
Oct 27 09:31:06 a975tbd kernel: PANIC: zfs: accessing past end of object 10e/a96bf (size=655360 access=654575+8274)
Nov 3 09:21:57 a975tbd kernel: PANIC: zfs: accessing past end of object 10e/af16c (size=258048 access=257426+8386)
Application: MegaHit, a genome assembler
Data: 2 FASTQ-format files, which can be downloaded from here, for example
Command: like this
$ while :; do megahit -1 DRR000001_1.fastq -2 DRR000001_2.fastq -o outdir_${DATE} ; done
Frequency: repeating the command, the error will occur within a few hours (min) to a few days.
We've experienced/tested this on various hardware, OS, and ZFS versions, and the tendency is as follows.
ZFS Version
Hardware RAID or not
We've tested both on hardware RAID and on standalone SATA/NVMe drives (haven't tested with RAID-Z).
ZFS Setting
@behlendorf Considering your statements on #9034, this one is definitely critical and seems to be (semi-)repeatable.
@Ornias1993 agreed. Based on the very helpful https://github.com/zfsonlinux/zfs/issues/8673#issuecomment-550326225 above I reviewed all of the changes from 0.7.12 to 0.7.13. The following commit, which addresses a rare deadlock condition, is the most likely cause.
98bb45e27ae deadlock between mm_sem and tx assign in zfs_write() and page fault
@kazunoriohki @snajpa if it wouldn't be too much trouble can you try reverting this commit from 0.7.13 and try to reproduce the issue. I'll see if I can reproduce the issue as well using the test case described above.
@behlendorf Thanks, I'll try with https://github.com/zfsonlinux/zfs/commit/98bb45e27ae80145a6ce028df90fccdb23f8901d. Maybe it will take a week or two, then I'll report here.
Same here :)
Oh, it's this patch. @behlendorf, I can't be sure because I didn't take notes at the time (and I should have), but I think I've ruled that patch out, because it only appeared in the 0.7.13 release, while the original issue mentions 0.7.12.
@behlendorf reverting 98bb45e resolved the issue in our environment!
With 0.7.13 (unpatched), the error reproduced after 13, 92, and 144 iterations (one job takes about 5 min, and I repeated the run 3 times to make sure), while with the 98bb45e commit reverted the error never happened over 750 iterations.
We are experiencing this bug in production as well. It appears in 0.8.2, 0.7.13, 0.7.12, and 0.7.11. I'm pretty sure it didn't appear in 0.7.9, but now I'm having problems downgrading to 0.7.9; I probably need to downgrade more dependencies. I'll try to migrate the workload to a server where we still use 0.7.9.
The workload is tile rendering with Mapnik from a PostgreSQL database with PostGIS and some shapefiles.
I was toying with my gentoo-musl chroot lately and managed to reproduce this 100% of the time, 4 times in a row: within the chroot, compile a gentoo-sources 5.4.12 kernel with -j16. Compiling from the glibc host system doesn't seem to trigger the issue (the compiler is gcc on both the musl and glibc systems).
The first time it happened, I was emerging some packages with -j16 in MAKEOPTS and -j3 for emerge (machine with 8 cores/16 threads). The other 3 times, I got something similar to this (sorry for the picture; I couldn't write to the filesystem anymore and "sync" would hang): https://pasteboard.co/IQqr7sH.jpg
The first occurrence (while compiling the kernel) happened after ~6h of uptime; the 2nd and 3rd were from a cold boot (pressed the reset button, remounted the chroot, restarted compiling).
The system was usable, except for writing to disk. The only out-of-tree module is ZFS.
I will try to revert the patch mentioned above and post results.
I did some additional digging (sorry, I almost forgot about this and then toyed with the musl chroot again last evening...). I am not sure why the musl chroot can trigger this 100% of the time while the host glibc system consistently does not (currently testing a GCC compile that is 1h in and still working).
After analyzing the code (git tag zfs-0.8.3) and the patch, I noticed that when dmu_write_uio_dbuf() is called from zfs_write (L#827), the operation can hit EFAULT in zfs_uio.c/uiomove_iov (L#92,98), which returns early without touching uio->uio_loffset and uio->uio_resid, even though those fields are what the zfs_write while() loop uses to issue the next dmu_write_uio_dbuf() call for the pending bytes. My assumption is that the first dmu_write_uio_dbuf() call in the while() loop returns EFAULT, uio_prefaultpages is called and returns 0, the "continue" statement at L#835 is triggered, and the loop restarts with an unsynchronized uio struct. Even though uio_prefaultpages is passed the uio struct, it doesn't touch uio_loffset/uio_resid either, as far as I can see. I am basing this on the fact that uio_loffset/uio_resid are used to calculate the offset and size of the next request, but they don't seem to be modified on the path from zfs_write/dmu_write_uio_dbuf down to uiomove_iov().
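To make that suspicion concrete, here is a tiny userspace toy model (my own sketch; the names uio_model and copy_to_object are invented and this is not OpenZFS code) of a uiomove-style copy that transfers part of the requested range, hits a simulated page fault, and returns EFAULT without advancing uio_loffset/uio_resid, leaving the caller's cursor out of sync with what actually landed in the object:

```c
/*
 * Toy model of the suspected behaviour, NOT OpenZFS code: the copy routine
 * transfers part of the requested range, "faults", and returns EFAULT
 * without advancing the caller-visible cursor (uio_loffset/uio_resid).
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define	SIM_EFAULT	14

struct uio_model {
	uint64_t	uio_loffset;	/* next file offset, as the caller sees it */
	uint64_t	uio_resid;	/* bytes the caller thinks are still pending */
};

static char	object[512];		/* stand-in for the file's single block */
static uint64_t	object_bytes;		/* bytes actually copied into the object */

/* Copy `size` bytes at uio_loffset, but fault after `fault_after` bytes. */
static int
copy_to_object(struct uio_model *uio, uint64_t size, uint64_t fault_after)
{
	uint64_t done = (fault_after < size) ? fault_after : size;

	memset(object + uio->uio_loffset, 'x', done);
	object_bytes = uio->uio_loffset + done;

	if (done < size)
		return (SIM_EFAULT);	/* early return: uio NOT advanced */

	uio->uio_loffset += size;
	uio->uio_resid -= size;
	return (0);
}

int
main(void)
{
	struct uio_model uio = { .uio_loffset = 0, .uio_resid = 161 };

	/* First attempt: 80 of 161 bytes land in the object, then EFAULT. */
	int err = copy_to_object(&uio, uio.uio_resid, 80);

	printf("error=%d, object now holds %llu bytes, "
	    "but uio says loffset=%llu resid=%llu\n", err,
	    (unsigned long long)object_bytes,
	    (unsigned long long)uio.uio_loffset,
	    (unsigned long long)uio.uio_resid);
	/*
	 * The caller cannot tell that 80 bytes were already transferred; the
	 * hypothesis above is that zfs_write's prefault-and-continue retry
	 * path keeps looping with this unsynchronized state, which can end
	 * in a DMU request whose offset+length lands past the object's end.
	 */
	return (0);
}
```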
I am guessing two fixes should be possible, and both could be implemented:
Tentative (not very elegant) patch: https://pastebin.com/i5xnxii5. The last if (bytes_left > 0) block is copy-pasted from the end of the while loop and adapted.
Seems to be working properly:
[edit]: spoke too soon, it took much longer for it to happen, but it happened again:
[ 4504.376866] PANIC: zfs: accessing past end of object 613/a4f9c (size=17920 access=17837+1025)
[ 4504.376876] Showing stack for process 9206
[ 4504.376884] CPU: 3 PID: 9206 Comm: cc1plus Tainted: P O 5.5.10-gentoo-x86_64 #1
[ 4504.376889] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 AORUS PRO, BIOS F12e 03/06/2020
[ 4504.376895] Call Trace:
[ 4504.376907] dump_stack+0x71/0x98
[ 4504.376919] spl_panic+0x176/0x21e [spl]
[ 4504.376957] zfs_panic_recover+0x79/0xa0 [zfs]
[ 4504.376981] dmu_buf_rele_array+0x45b/0x4d0 [zfs]
[ 4504.376999] ? dsl_dir_tempreserve_space+0x222/0x4e0 [zfs]
[ 4504.377017] dmu_write_uio_dnode+0x66/0x150 [zfs]
[ 4504.377036] dmu_write_uio_dbuf+0x4a/0x70 [zfs]
[ 4504.377055] zfs_write+0x9b6/0xe70 [zfs]
[ 4504.377101] zle_decompress+0x1365/0x13c0 [zfs]
[ 4504.377124] zpl_putpage+0x1a3/0x1130 [zfs]
[ 4504.377139] do_iter_readv_writev+0x19e/0x250
[ 4504.377153] do_iter_write+0x80/0x190
[ 4504.377161] vfs_writev+0xf7/0x140
[ 4504.377178] ? __fget_light+0x6c/0xa0
[ 4504.377187] do_writev+0x78/0x120
[ 4504.377198] do_syscall_64+0xd2/0x480
[ 4504.377207] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 4504.377215] RIP: 0033:0x7fa2f5b9fceb
[ 4504.377222] Code: 41 ff cc 48 29 d0 49 83 c6 10 49 8b 56 08 49 63 ec 49 01 06 48 29 c2 49 89 56 08 4c 89 e8 4c 89 f6 48 89 ea 48 63 7b 78 0f 05 <48> 89 c7 e8 5d be fb ff 49 39 c7 75 b8 48 8b 43 58 48 8b 4b 60 48
[ 4504.377228] RSP: 002b:00007fff9aaaa810 EFLAGS: 00000212 ORIG_RAX: 0000000000000014
[ 4504.377236] RAX: ffffffffffffffda RBX: 0000000005ab89c0 RCX: 00007fa2f5b9fceb
[ 4504.377241] RDX: 0000000000000002 RSI: 00007fff9aaaa820 RDI: 0000000000000001
[ 4504.377246] RBP: 0000000000000002 R08: 00000000056d6cc0 R09: 0000000000000000
[ 4504.377251] R10: 0000000000000070 R11: 0000000000000212 R12: 0000000000000002
[ 4504.377256] R13: 0000000000000014 R14: 00007fff9aaaa820 R15: 0000000000000401
Did some more analysis and test patching, 2nd final patch (to be used with the first one above): https://pastebin.com/L1VStMDx
There were two errors fixed by the 2nd patch in the zfs_write while(n > 0) loop.
I left a GCC compile running in a loop overnight: 11 successful merges (and counting) with no sign of breakage.
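For completeness, here is how the toy model sketched above could be repaired. This is only an illustration of the general idea (account for partial progress before returning EFAULT so the retry resumes from a consistent cursor); it is neither the pastebin patches above nor necessarily what the final upstream fix does:

```c
/*
 * Sketch of one plausible remedy, continuing the earlier toy model (NOT the
 * pastebin patches and NOT OpenZFS code): advance the cursor by the bytes
 * that were actually copied even when a fault is reported, so the caller's
 * retry loop resumes from a consistent uio.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define	SIM_EFAULT	14

struct uio_model {
	uint64_t	uio_loffset;
	uint64_t	uio_resid;
};

static char	object[512];

static int
copy_to_object_fixed(struct uio_model *uio, uint64_t size, uint64_t fault_after)
{
	uint64_t done = (fault_after < size) ? fault_after : size;

	memset(object + uio->uio_loffset, 'x', done);

	/* Account for what really got copied, fault or not. */
	uio->uio_loffset += done;
	uio->uio_resid -= done;

	return (done < size ? SIM_EFAULT : 0);
}

int
main(void)
{
	struct uio_model uio = { .uio_loffset = 0, .uio_resid = 161 };

	/* Retry loop in the style of zfs_write's prefault-and-continue. */
	while (uio.uio_resid > 0) {
		if (copy_to_object_fixed(&uio, uio.uio_resid, 80) == SIM_EFAULT)
			continue;	/* "prefault" and retry with a synced uio */
	}
	printf("done: loffset=%llu resid=%llu\n",
	    (unsigned long long)uio.uio_loffset,
	    (unsigned long long)uio.uio_resid);
	return (0);
}
```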
System information
Describe the problem you're observing
Splunk was installed on a dataset. After some time it crashed the whole hypervisor with a kernel panic; the hypervisor had to be rebooted manually.
Describe how to reproduce the problem
Seems to be related to Splunk's heavy read/write load. Unfortunately, we can't reproduce it reliably. Maybe the logs will help pinpoint the exact issue so we can reproduce it.
Include any warning/errors/backtraces from the system logs