snajpa opened 3 years ago
One of my backup machines is now in a panic-reboot cycle with this error.
joyent_20200813T030805Z
Supermicro Super Server H11SSL-C
AMD EPYC 7401P 24-Core Processor CPU
LSI SAS3008 PCI-Express Fusion-MPT SAS-3
Chelsio Communications Inc T580-SO-CR Unified Wire Ethernet Controller
Sanima/Newisys/Viking Storage 106-disk JBOD
In single-user mode ZFS checks out, passes scrubs, etc. In multi-user mode it reports I/O errors, pool corruption, bad disks, and other nasties.
Three other systems with the same config do not exhibit these errors yet. One other system is connected to the same JBOD but uses different zpools.
--- panic message ---
backup004 ttya login: panic[cpu5]/thread=fffffe8579a29400: zfs: accessing past end of object 305/59ca (size=382976 access=382955+29)
fffffe00bd33e900 genunix:vcmn_err+42 ()
fffffe00bd33e970 zfs:zfs_panic_recover+65 ()
fffffe00bd33ea20 zfs:dmu_buf_hold_array_by_dnode+2b2 ()
fffffe00bd33eac0 zfs:dmu_write_uio_dnode+52 ()
fffffe00bd33eb10 zfs:dmu_write_uio_dbuf+5d ()
fffffe00bd33ed10 zfs:zfs_write+9ba ()
fffffe00bd33edb0 genunix:fop_write+111 ()
fffffe00bd33ee80 genunix:write+2c6 ()
fffffe00bd33eeb0 genunix:write32+15 ()
fffffe00bd33ef10 unix:brand_sys_sysenter+1dc ()
--- cut here ---
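(The check that fires is simple arithmetic: the access starts at offset 382955 and spans 29 bytes, ending at 382984, which is past the 382976-byte end of the object. Note the dataset/object pair 305/59ca is printed in hex, while the size and access values are decimal.)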
--- core status ---
[root@00-07-43-52-c1-90 /var/crash/volatile]# mdb -k 28
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci ufs ip hook neti sockfs arp usba uhci smbios mm sd fctl stmf_sbd stmf zfs lofs idm mpt_sas sata crypto random cpc fcp logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]
> ::status
debugging crash dump vmcore.28 (64-bit) from backup004
operating system: 5.11 joyent_20200813T030805Z (i86pc)
git branch: release-20200813
git rev: 09f210323354e07890b312da9b4f5e33ae6b6df5
image uuid: (not set)
panic message: zfs: accessing past end of object 684/59ca (size=518656 access=518656+5)
dump content: kernel pages only
> ::system
set ibft_noprobe=0x1 [0t1]
set noexec_user_stack=0x1 [0t1]
set noexec_user_stack_log=0x1 [0t1]
set rlim_fd_cur=0x10000 [0t65536]
set idle_cpu_no_deep_c=0x1 [0t1]
set ip:ip_squeue_fanout=0x1 [0t1]
set pcplusmp:apic_panic_on_nmi=0x1 [0t1]
set apix:apic_panic_on_nmi=0x1 [0t1]
set dump_plat_mincpu=0x0 [0t0]
set dump_bzip2_level=0x1 [0t1]
set pcplusmp:apic_timer_preferred_mode=0x0 [0t0]
set apix:apic_timer_preferred_mode=0x0 [0t0]
set dump_metrics_on=0x1 [0t1]
set sata:sata_auto_online=0x1 [0t1]
set sd:sd_io_time=0xa [0t10]
set hires_tick=0x1 [0t1]
set zfs:zfs_multihost_interval=0xffff [0t65535]
set zfs:zfs_resilver_delay=0x0 [0t0]
set zfs:zfs_resilver_min_time_ms=0x0 [0t0]
set zfs:zfs_scrub_delay=0x0 [0t0]
set zfs:zfs_arc_max=0xc80000000 [0t53687091200]
set ddi_msix_alloc_limit=0x8 [0t8]
set pcplusmp:apic_intr_policy=0x1 [0t1]
set ip_squeue_soft_ring=0x1 [0t1]
--- cut here ---
I'm seeing the same issue on Manjaro with 5.14 kernel and ZFS 2.1.1:
[Wed Nov 10 09:09:34 2021] PANIC: zfs: accessing past end of object 304/142547 (size=70656 access=70284+4096)
[Wed Nov 10 09:09:34 2021] Showing stack for process 112516
[Wed Nov 10 09:09:34 2021] CPU: 6 PID: 112516 Comm: ThreadPoolForeg Tainted: P OE 5.14.10-1-MANJARO #1
[Wed Nov 10 09:09:34 2021] Hardware name: Dell Inc. Precision 3530/04MPXC, BIOS 1.10.1 09/04/2019
[Wed Nov 10 09:09:34 2021] Call Trace:
[Wed Nov 10 09:09:34 2021] dump_stack_lvl+0x46/0x5a
[Wed Nov 10 09:09:34 2021] vcmn_err.cold+0x50/0x68 [spl]
[Wed Nov 10 09:09:34 2021] zfs_panic_recover+0x75/0x90 [zfs]
[Wed Nov 10 09:09:34 2021] dmu_buf_hold_array_by_dnode+0x219/0x5f0 [zfs]
[Wed Nov 10 09:09:34 2021] dmu_write_uio_dnode+0x5e/0x1a0 [zfs]
[Wed Nov 10 09:09:34 2021] ? dnode_set_blksz+0x141/0x310 [zfs]
[Wed Nov 10 09:09:34 2021] ? dnode_rele_and_unlock+0x59/0xf0 [zfs]
[Wed Nov 10 09:09:34 2021] dmu_write_uio_dbuf+0x4a/0x70 [zfs]
[Wed Nov 10 09:09:34 2021] zfs_write+0x4e8/0xcc0 [zfs]
[Wed Nov 10 09:09:34 2021] zpl_iter_write+0x10c/0x190 [zfs]
[Wed Nov 10 09:09:34 2021] new_sync_write+0x159/0x1f0
[Wed Nov 10 09:09:34 2021] vfs_write+0x1db/0x270
[Wed Nov 10 09:09:34 2021] __x64_sys_pwrite64+0x8c/0xc0
[Wed Nov 10 09:09:34 2021] do_syscall_64+0x3b/0x90
[Wed Nov 10 09:09:34 2021] entry_SYSCALL_64_after_hwframe+0x44/0xae
[Wed Nov 10 09:09:34 2021] RIP: 0033:0x7f20d82060ef
[Wed Nov 10 09:09:34 2021] Code: 08 89 3c 24 48 89 4c 24 18 e8 7d f3 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 cd f3 ff ff 48 8b
[Wed Nov 10 09:09:34 2021] RSP: 002b:00007f20cf81f660 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[Wed Nov 10 09:09:34 2021] RAX: ffffffffffffffda RBX: 000026fa04437c00 RCX: 00007f20d82060ef
[Wed Nov 10 09:09:34 2021] RDX: 0000000000001000 RSI: 000026fa04437c00 RDI: 00000000000000c4
[Wed Nov 10 09:09:34 2021] RBP: 00007f20cf81f6d0 R08: 0000000000000000 R09: 0000563d7f13f590
[Wed Nov 10 09:09:34 2021] R10: 000000000001128c R11: 0000000000000293 R12: 0000000000001000
[Wed Nov 10 09:09:34 2021] R13: 000026fa04437c00 R14: 00000000000000c4 R15: 000000000001128c
Kernel Linux 5.14.10-1-MANJARO #1 SMP PREEMPT Thu Oct 7 06:43:34 UTC 2021 x86_64 GNU/Linux
Kernel and ZFS packages:
local/linux514-zfs 2.1.1-5 (linux514-extramodules)
Kernel modules for the Zettabyte File System.
local/zfs-utils 2.1.1-1 (archzfs-linux)
Kernel module support files for the Zettabyte File System.
The same thing happens with Linux 5.13 (ZFS 2.1.1-1).
Dataset properties:
NAME PROPERTY VALUE SOURCE
zmain/home type filesystem -
zmain/home creation Wed Apr 21 10:50 2021 -
zmain/home used 97.4G -
zmain/home available 235G -
zmain/home referenced 52.4G -
zmain/home compressratio 1.43x -
zmain/home mounted yes -
zmain/home quota none default
zmain/home reservation none default
zmain/home recordsize 128K inherited from zmain
zmain/home mountpoint /home local
zmain/home sharenfs off default
zmain/home checksum on default
zmain/home compression zstd inherited from zmain
zmain/home atime off inherited from zmain
zmain/home devices on default
zmain/home exec on default
zmain/home setuid on default
zmain/home readonly off default
zmain/home zoned off default
zmain/home snapdir hidden default
zmain/home aclmode discard default
zmain/home aclinherit restricted default
zmain/home createtxg 240 -
zmain/home canmount on default
zmain/home xattr sa inherited from zmain
zmain/home copies 1 default
zmain/home version 5 -
zmain/home utf8only off -
zmain/home normalization none -
zmain/home casesensitivity sensitive -
zmain/home vscan off default
zmain/home nbmand off default
zmain/home sharesmb off default
zmain/home refquota none default
zmain/home refreservation none default
zmain/home guid 4366126344549602389 -
zmain/home primarycache all default
zmain/home secondarycache all default
zmain/home usedbysnapshots 45.0G -
zmain/home usedbydataset 52.4G -
zmain/home usedbychildren 0B -
zmain/home usedbyrefreservation 0B -
zmain/home logbias latency default
zmain/home objsetid 772 -
zmain/home dedup off default
zmain/home mlslabel none default
zmain/home sync standard default
zmain/home dnodesize legacy default
zmain/home refcompressratio 1.32x -
zmain/home written 13.8M -
zmain/home logicalused 122G -
zmain/home logicalreferenced 60.9G -
zmain/home volmode default default
zmain/home filesystem_limit none default
zmain/home snapshot_limit none default
zmain/home filesystem_count none default
zmain/home snapshot_count none default
zmain/home snapdev hidden default
zmain/home acltype posix inherited from zmain
zmain/home context none default
zmain/home fscontext none default
zmain/home defcontext none default
zmain/home rootcontext none default
zmain/home relatime off default
zmain/home redundant_metadata all default
zmain/home overlay on default
zmain/home encryption aes-256-gcm -
zmain/home keylocation none default
zmain/home keyformat passphrase -
zmain/home pbkdf2iters 350000 -
zmain/home encryptionroot zmain -
zmain/home keystatus available -
zmain/home special_small_blocks 0 default
Pool properties:
NAME PROPERTY VALUE SOURCE
zmain size 398G -
zmain capacity 37% -
zmain altroot - default
zmain health ONLINE -
zmain guid 3347456179796344331 -
zmain version - default
zmain bootfs - default
zmain delegation on default
zmain autoreplace off default
zmain cachefile - default
zmain failmode wait default
zmain listsnapshots off default
zmain autoexpand off default
zmain dedupratio 1.00x -
zmain free 247G -
zmain allocated 151G -
zmain readonly off -
zmain ashift 13 local
zmain comment - default
zmain expandsize - -
zmain freeing 0 -
zmain fragmentation 22% -
zmain leaked 0 -
zmain multihost off default
zmain checkpoint - -
zmain load_guid 800961323049942777 -
zmain autotrim off default
zmain compatibility off default
zmain feature@async_destroy enabled local
zmain feature@empty_bpobj active local
zmain feature@lz4_compress active local
zmain feature@multi_vdev_crash_dump enabled local
zmain feature@spacemap_histogram active local
zmain feature@enabled_txg active local
zmain feature@hole_birth active local
zmain feature@extensible_dataset active local
zmain feature@embedded_data active local
zmain feature@bookmarks enabled local
zmain feature@filesystem_limits enabled local
zmain feature@large_blocks active local
zmain feature@large_dnode enabled local
zmain feature@sha512 enabled local
zmain feature@skein enabled local
zmain feature@edonr enabled local
zmain feature@userobj_accounting active local
zmain feature@encryption active local
zmain feature@project_quota active local
zmain feature@device_removal enabled local
zmain feature@obsolete_counts enabled local
zmain feature@zpool_checkpoint enabled local
zmain feature@spacemap_v2 active local
zmain feature@allocation_classes enabled local
zmain feature@resilver_defer enabled local
zmain feature@bookmark_v2 enabled local
zmain feature@redaction_bookmarks enabled local
zmain feature@redacted_datasets enabled local
zmain feature@bookmark_written enabled local
zmain feature@log_spacemap active local
zmain feature@livelist active local
zmain feature@device_rebuild enabled local
zmain feature@zstd_compress active local
zmain feature@draid enabled local
This started occurring about two weeks ago with no kernel or ZFS package updates. The only change I made recently was to switch compression from lz4 to zstd, but that was some weeks prior to the issue starting, so it's a very weak signal.
@jm650 I see the same stack trace on SmartOS and wrote about it in the newsgroup: https://smartos.topicbox.com/groups/smartos-discuss/Td34fab84c5051dbd/panic-zfs-accessing-past-end-of-object
It seems to be related to some write operation at shell logon.
I've also just seen a similar trace on 5.15 with 2.1.2:
kernel: PANIC: zfs: accessing past end of object 23a/2fbf91 (size=291328 access=291223+16384)
kernel: Showing stack for process 162504
kernel: CPU: 12 PID: 162504 Comm: thunderbird Tainted: P W OE 5.15.12-lqx1-1-lqx #1
kernel: Hardware name: LENOVO 82JY/LNVNB161216, BIOS GKCN49WW 11/08/2021
kernel: Call Trace:
kernel: <TASK>
kernel: dump_stack_lvl+0x46/0x62
kernel: vcmn_err.cold+0x50/0x68 [spl]
kernel: zfs_panic_recover+0x75/0xa0 [zfs]
kernel: dmu_buf_hold_array_by_dnode+0x227/0x610 [zfs]
kernel: dmu_write_uio_dnode+0x5e/0x1b0 [zfs]
kernel: ? dnode_set_blksz+0x141/0x320 [zfs]
kernel: ? dnode_rele_and_unlock+0x59/0xf0 [zfs]
kernel: dmu_write_uio_dbuf+0x4a/0x70 [zfs]
kernel: zfs_write+0x4fc/0xcd0 [zfs]
kernel: zpl_iter_write+0x10c/0x190 [zfs]
kernel: vfs_write+0x338/0x470
kernel: ksys_write+0x67/0xf0
kernel: do_syscall_64+0x3b/0x90
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f8d7190393f
kernel: Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 09 56 f9 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c 56 f9 ff 48
kernel: RSP: 002b:00007fffcb9d18f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f8d7190393f
kernel: RDX: 0000000000004000 RSI: 00007f8d343ee000 RDI: 00000000000000b3
kernel: RBP: 00007f8d343ee000 R08: 0000000000000000 R09: 00007fffcb9d1ab4
kernel: R10: 00007fffcb9d1cba R11: 0000000000000293 R12: 0000000000004000
kernel: R13: 00007f8d3acf15c0 R14: 0000000000004000 R15: 00007f8d719d5700
kernel: </TASK>
I'll see if I can replicate it, but I don't think it's going to be easy...
I invested some time looking into this again today.
I only have remote access to the server via IPMI, but I updated one of the USB sticks (which this server still boots from) to the latest platform image, to make sure this isn't an already-resolved issue (mounting a new image as an ISO via IPMI and copying it from the ISO over to the USB stick). That didn't change anything.
I then booted into root/root mode and looked at the affected dataset:
zdb config -d zones | grep 'ID 1085'
(Note to people with similar errors: grep for 1085 because the 0x43d from the panic message is 1085 in decimal; see the conversion sketch below.)
This showed the object to be part of a filesystem/VM which I didn't need (looking it up in zones/config). Assuming it was only accessed during startup because it was listed in /etc/zones/index (zones/config), I removed it from the index and rebooted.
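For anyone doing the same lookup: the object IDs in the panic message are hex, while zdb prints decimal "ID" values, so convert before grepping. A minimal sketch (0x43d is from my panic; 0x59ca is from the report at the top of this issue):

printf '%d\n' 0x43d     # -> 1085
printf '%d\n' 0x59ca    # -> 22986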
The system now boots again. (Note to others on SmartOS: when changing mountpoints in root/root mode, don't forget to reset them before rebooting.)
Two outstanding things now:
1) I would still like to know how this error happened to begin with, but even more importantly whether there is something I can do to help debug this issue. I can upload the core dumps if someone is interested.
2) Why would it crash so predictably on login? This is more likely a SmartOS question and not directly related to ZFS. I would have assumed the vmadm service, or rather the underlying zones infrastructure, boots up irrespective of any login. This would perhaps have happened anyway given enough time and no SSH login attempts, but I'm still curious where the connection is.
I obviously have a large number of dumps in /cores I could provide (core.metadata, core.vmadm, and core.vmadmd) to anybody interested, as well as the theoretical ability (with some planning on my end) to put the machine back into the index and re-trigger the issue.
I never followed up on my previous report. After switching recordsize back to 128k and compression back to lz4, I took a snapshot and then did a send | receive into a new dataset on the same pool; fortunately I had enough space to do so. That was nearly four months ago, and the system has been perfectly stable since. This is on my daily-driver development laptop, which was hitting the panic multiple times per day.
As far as I can tell, no data was ever harmed or corrupted. It seems the data at rest was fine, and the issue only surfaced when trying to write. A sketch of the rebuild procedure is below.
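Roughly what the rebuild looked like (a sketch, not my exact commands; tank/home is an illustrative dataset name):

# Revert the properties so newly written blocks use the old settings.
zfs set recordsize=128k compression=lz4 tank/home
# Copy everything into a fresh dataset on the same pool.
zfs snapshot tank/home@rewrite
zfs send tank/home@rewrite | zfs receive tank/home.new
# After verifying the copy, swap the datasets.
zfs rename tank/home tank/home.old
zfs rename tank/home.new tank/home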
I have been struggling with what seems to be the same issue. Much like @scotte, I changed my recordsize to something a bit larger and switched compression from lz4 to zstd a few days before the panic appeared.
Since this all started, it seems that triggering numerous reads (possibly with some writes mixed in, as the panic usually appears when launching applications) will cause it to panic and lock everything up.
Oddly enough, I can still use the machine if I switch TTYs and log in as root. This includes being able to access data on the pool. As root, the only thing that will not execute successfully seems to be a clean shutdown/restart.
This occurs on every kernel I've tried, and it makes no difference whether I use the dkms modules. I am running zfs 2.1.2.
Here is my stacktrace:
Feb 23 10:52:35.776839 <hostname> kernel: PANIC: zfs: accessing past end of object 36/ba611 (size=284160 access=283692+4096)
Feb 23 10:52:35.776993 <hostname> kernel: Showing stack for process 3868362
Feb 23 10:52:35.779653 <hostname> kernel: CPU: 7 PID: 3868362 Comm: ThreadPoolForeg Tainted: P OE 5.16.10-arch1-1 #1 481a3e145f0d7fd554ab88c4fa876a7ddc0795df
Feb 23 10:52:35.779691 <hostname> kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B17/MPG Z390 GAMING EDGE AC (MS-7B17), BIOS A.B0 10/12/2020
Feb 23 10:52:35.779708 <hostname> kernel: Call Trace:
Feb 23 10:52:35.779723 <hostname> kernel: <TASK>
Feb 23 10:52:35.779736 <hostname> kernel: dump_stack_lvl+0x48/0x5e
Feb 23 10:52:35.779750 <hostname> kernel: vcmn_err.cold+0x50/0x68 [spl d207c65daab89e4aba1563f3c9772829c82d0963]
Feb 23 10:52:35.779766 <hostname> kernel: zfs_panic_recover+0x75/0x90 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779780 <hostname> kernel: dmu_buf_hold_array_by_dnode+0x219/0x5f0 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779793 <hostname> kernel: dmu_write_uio_dnode+0x5e/0x1a0 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779805 <hostname> kernel: ? dnode_set_blksz+0x141/0x310 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779817 <hostname> kernel: ? dnode_rele_and_unlock+0x59/0xf0 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779828 <hostname> kernel: dmu_write_uio_dbuf+0x4a/0x70 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779852 <hostname> kernel: zfs_write+0x4e8/0xcc0 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779870 <hostname> kernel: zpl_iter_write+0x114/0x190 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779886 <hostname> kernel: new_sync_write+0x159/0x1f0
Feb 23 10:52:35.779899 <hostname> kernel: vfs_write+0x1eb/0x280
Feb 23 10:52:35.779911 <hostname> kernel: __x64_sys_pwrite64+0x8c/0xc0
Feb 23 10:52:35.779923 <hostname> kernel: do_syscall_64+0x59/0x80
Feb 23 10:52:35.779935 <hostname> kernel: ? vfs_write+0x174/0x280
Feb 23 10:52:35.779951 <hostname> kernel: ? vfs_write+0x174/0x280
Feb 23 10:52:35.779964 <hostname> kernel: ? syscall_exit_to_user_mode+0x23/0x40
Feb 23 10:52:35.779977 <hostname> kernel: ? do_syscall_64+0x69/0x80
Feb 23 10:52:35.779989 <hostname> kernel: ? exc_page_fault+0x72/0x170
Feb 23 10:52:35.780003 <hostname> kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Feb 23 10:52:35.780017 <hostname> kernel: RIP: 0033:0x7f757047c23f
Feb 23 10:52:35.780031 <hostname> kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 bd 98 f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 fd 98 f8 ff 48 8b
Feb 23 10:52:35.780044 <hostname> kernel: RSP: 002b:00007f756d0b7b20 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Feb 23 10:52:35.780058 <hostname> kernel: RAX: ffffffffffffffda RBX: 0000107c0315a000 RCX: 00007f757047c23f
Feb 23 10:52:35.780264 <hostname> kernel: RDX: 0000000000001000 RSI: 0000107c0315a000 RDI: 00000000000000a6
Feb 23 10:52:35.780282 <hostname> kernel: RBP: 00007f756d0b7b90 R08: 0000000000000000 R09: 000055b82f179b40
Feb 23 10:52:35.780314 <hostname> kernel: R10: 000000000004542c R11: 0000000000000293 R12: 0000000000001000
Feb 23 10:52:35.780329 <hostname> kernel: R13: 0000107c0315a000 R14: 00000000000000a6 R15: 000000000004542c
Feb 23 10:52:35.780344 <hostname> kernel: </TASK>
Similar (maybe the same) issue for me, too: NixOS unstable, zfs v2.1.3 => v2.1.4 (kernel 5.15 => 5.17), and I also changed the blocksize of some of the datasets to 1M.
A minute or two into user login, the system hung partially (I could switch VTs and some terminals worked, but I couldn't log in or start new processes), and after another two-ish minutes I got the kernel panic zfs: accessing past end of object.
I can't identify the exact moment this started happening, but it was neither immediately after the upgrade nor immediately after the blocksize change. The pool was created with zstd compression on a single NVMe device (partition), ashift=12, encrypted (single encryptionroot).
send | recv to the same pool helped, although it was slightly complicated because of encryption: I couldn't use -R because it changed the encryptionroot, so I had to send | recv each dataset one by one and set the properties manually (a sketch below). Funnily enough, using -Rw was much slower than sending unencrypted datasets: 1-10 MB/s for raw versus 100-500 MB/s for unencrypted.
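Roughly the per-dataset shape of it (a sketch under my setup's assumptions; rpool/data and the property values are illustrative):

# -R would have carried the old encryptionroot, so send each dataset individually.
zfs snapshot rpool/data@move
# A non-raw send decrypts; the receive inherits encryption from the target parent.
zfs send rpool/data@move | zfs receive -o mountpoint=none rpool/data.new
# Re-apply the properties that -R would otherwise have preserved.
zfs set recordsize=128k compression=zstd rpool/data.new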
Just hit the same issue using zfs git from a few days ago and recent Arch (happens on the 6.1 LTS and 6.2 kernels). It only started after I changed the recordsize and compression properties on my /home dataset. More details here: https://github.com/openzfs/zfs/issues/14117#issuecomment-1493411461
Also hitting this issue after changing recordsize from 16k to 1M. We've set it back to 16k, but still see the issue randomly on some servers.
Is there any known fix for this which doesn't involve migrating the dataset?
zfs-2.1.6-0york1~22.04
zfs-kmod-2.1.6-0york1~22.04
Kernel 5.15.0-56-generic
[10779805.585236] PANIC: zfs: accessing past end of object e84/1b6a3 (size=1038848 access=1038828+4096)
[10779805.587631] Showing stack for process 4139333
[10779805.587634] CPU: 11 PID: 4139333 Comm: **** Kdump: loaded Tainted: P OE 5.15.0-56-generic #62-Ubuntu
[10779805.587639] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.13.0 05/14/2021
[10779805.587641] Call Trace:
[10779805.587644] <TASK>
[10779805.587651] show_stack+0x52/0x5c
[10779805.587660] dump_stack_lvl+0x4a/0x63
[10779805.587670] dump_stack+0x10/0x16
[10779805.587676] spl_dumpstack+0x29/0x2f [spl]
[10779805.587692] vcmn_err.cold+0x60/0x78 [spl]
[10779805.587711] ? arc_change_state.constprop.0+0x412/0x580 [zfs]
[10779805.587822] ? aggsum_add+0x1a3/0x1c0 [zfs]
[10779805.587901] ? arc_access.isra.0+0xc4/0x320 [zfs]
[10779805.587975] ? dbuf_rele_and_unlock+0x13f/0x800 [zfs]
[10779805.588058] ? arc_buf_access+0x2a7/0x2c0 [zfs]
[10779805.588140] zfs_panic_recover+0x6d/0x90 [zfs]
[10779805.588266] ? dbuf_read+0x37c/0x5c0 [zfs]
[10779805.588352] dmu_buf_hold_array_by_dnode+0x221/0x600 [zfs]
[10779805.588441] ? dnode_hold_impl+0x514/0xf00 [zfs]
[10779805.588536] dmu_write_uio_dnode+0x4c/0x190 [zfs]
[10779805.588625] ? dnode_rele+0x3d/0x50 [zfs]
[10779805.588721] dmu_write_uio_dbuf+0x4f/0x70 [zfs]
[10779805.588809] zfs_write+0x4b0/0xdd0 [zfs]
[10779805.588980] zpl_iter_write+0xf5/0x170 [zfs]
[10779805.589117] new_sync_write+0x114/0x1a0
[10779805.589132] vfs_write+0x1d5/0x270
[10779805.589138] ksys_write+0x67/0xf0
[10779805.589145] __x64_sys_write+0x19/0x20
[10779805.589149] do_syscall_64+0x5c/0xc0
[10779805.589155] ? do_syscall_64+0x69/0xc0
[10779805.589159] ? do_user_addr_fault+0x1e7/0x670
[10779805.589167] ? exit_to_user_mode_prepare+0x37/0xb0
[10779805.589174] ? irqentry_exit_to_user_mode+0x9/0x20
[10779805.589179] ? irqentry_exit+0x1d/0x30
[10779805.589184] ? exc_page_fault+0x89/0x170
[10779805.589191] entry_SYSCALL_64_after_hwframe+0x61/0xcb
[10779805.589197] RIP: 0033:0x7f71c726332f
[10779805.589202] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48
[10779805.589205] RSP: 002b:00007ffd468e4720 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[10779805.589209] RAX: ffffffffffffffda RBX: 000000000000011c RCX: 00007f71c726332f
[10779805.589211] RDX: 0000000000001000 RSI: 00005617938b1b08 RDI: 000000000000011c
[10779805.589213] RBP: 0000000000001000 R08: 0000000000000000 R09: 0000561794c17a20
[10779805.589215] R10: 0000561793618414 R11: 0000000000000293 R12: 00005617938b1b08
[10779805.589217] R13: 00000000000fd9ec R14: 00005617938b1b08 R15: 0000561793618660
[10779805.589232] </TASK>
> Also hitting this issue after changing recordsize from 16k to 1M. We've set it back to 16k, but still see the issue randomly on some servers.
The recordsize property affects only newly written files. You may have a few files remaining which use a different recordsize or compression setting; you can check a file's actual block size as sketched below.
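One way to check (a sketch; the pool/dataset and path are examples): on ZFS the inode number reported by ls is the object ID, and zdb shows the object's data block size in its dblk column.

ls -i /tank/data/somefile      # inode number == ZFS object ID, e.g. 123456
zdb -ddddd tank/data 123456    # the "dblk" column is the file's actual block size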
Looks like I've been hitting this issue as well.
System is Ubuntu 22.04:
$ zfs version
zfs-2.1.5-1ubuntu6~22.04.1
zfs-kmod-2.1.9-2ubuntu1.1
On our box we have two users. One user systematically produces the panic (it happens when Thunderbird downloads emails and starts writing to one of the inbox files). In my case recordsize=1M and compression=off (somewhat surprisingly... I was sure I had set it to lz4, which is pretty much the default). The other user ostensibly has about as much (if not more) churn writing files but has never (as far as I can tell) triggered such a panic.
I can't recall when each of those record sizes was set. It looks like this is "common enough"; I wonder if it's easy to trigger this from scratch, or whether the files triggering the panic were created by a specific older version of OpenZFS and carry some bad metadata somewhere.
I can confirm I had this issue, strangely enough also particularly with Thunderbird, on datasets with recordsize 1M. I reverted to 128K and things are running fine.
I have been using zstd for a very long time and don't believe it is related. Could encryption be related?
PROPERTY    VALUE
encryption  aes-256-gcm
Kernel 6.5.8-arch1-1-surface #1 SMP PREEMPT_DYNAMIC Sun, 22 Oct 2023 13:57:26 +0000 x86_64 GNU/Linux on Arch Linux (using zfs-dkms).
zfs-2.2.0-1 zfs-kmod-2.2.0-1
So I can trigger this at will, with minor changes, if indeed it is the same problem.
Unrelated to everything else, we are looking at setting zp->z_blksz to ashift (4096) instead of 512. (Ask me why; it's Windows-specific, but we are making the same change on macOS.) So we make an equivalent change to zfs_znode.c:
@@ -564,7 +563,13 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
zp->z_atime_dirty = 0;
zp->z_mapcnt = 0;
zp->z_id = db->db_object;
- zp->z_blksz = blksz;
+ zp->z_blksz = 4096;
Don't forget there is code to scale zp->z_blksz up here:
https://github.com/openzfs/zfs/blob/master/module/zfs/zfs_vnops.c#L540-L558
(roughly sketched below). So I am just skipping the "might be higher" and starting from a position of higher.
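For context, the growth logic at that link looks roughly like this (a paraphrased sketch of zfs_write(), not the verbatim source):

/* Sketch: with the whole file range-locked, zfs_write() may grow the block size. */
if (lr->lr_length == UINT64_MAX) {
	uint64_t new_blksz;

	if (zp->z_blksz > max_blksz) {
		/* Block size already exceeds recordsize; only grow to the next power of 2. */
		new_blksz = MIN(end_size, 1 << highbit64(zp->z_blksz));
	} else {
		new_blksz = MIN(end_size, max_blksz);
	}
	zfs_grow_blocksize(zp, new_blksz, tx);
}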
Then I run the tests/functional/removal tests from common.run. After completing
Test: tests/zfs-tests/tests/functional/removal/removal_check_space (run as root) [00:02] [PASS]
it starts removal_condense_export, and we get fireworks.
2023-11-10 15:44:36.798 Df kernel[0:317f] (zfs) <zfs`__dprintf (zfs_debug.c:249)>
vdev_indirect.c:533:spa_condense_indirect_complete_sync(): finished condense of vdev 0 in txg 196: new mapping object 295 has 29 entries (was 154 entries)
2023-11-10 15:44:41.785 Df kernel[0:2af1] (zfs) <zfs`vcmn_err (spl-err.c:54)>
PANIC: zfs: accessing past end of object 282/106 (size=2560 access=0+3574)
If I restore zp->z_blksz to 512, it does not panic. This is running on an M2 Mac with 4096-byte-sector SSDs.
Since I can trigger it at will, let me know if there is anything you want to know. Sadly, I cannot live-debug the kernel until Apple releases a KDK for my specific OS version. But I sure can printf.
Also, in the WR_INDIRECT case, if zp->z_blksz is 0, then blocksize is 0 and len gets picked as 0. We end up in while (resid) forever, taking all CPUs. This is a straightforward bug which we can fix in a separate PR; a standalone demonstration of the arithmetic is below.
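A user-space demonstration of that len computation (my sketch; P2PHASE and MIN mirror the ZFS macros):

#include <stdio.h>
#include <stdint.h>

#define P2PHASE(x, align)  ((x) & ((align) - 1))
#define MIN(a, b)          ((a) < (b) ? (a) : (b))

int main(void) {
	uint64_t off = 0, resid = 8192;
	uint64_t blocksize = 0;  /* the zp->z_blksz == 0 case */

	/* One pass of the zfs_log_write()-style chunking loop: */
	uint64_t len = MIN(blocksize - P2PHASE(off, blocksize), resid);
	printf("len = %llu\n", (unsigned long long)len);  /* prints 0, so off/resid never advance */
	return 0;
}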
Describe how to reproduce the problem
If only I knew... the process is run by a user in a container, every night at approximately 3:00 am :( Still investigating. Ideas for bpftrace commands etc. that could catch more information are welcome :)
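One possible bpftrace starting point (an untested sketch; it assumes zfs_panic_recover is kprobe-able on your build, i.e. visible in /proc/kallsyms):

bpftrace -e '
kprobe:zfs_panic_recover {
	printf("comm=%s pid=%d\n", comm, pid);
	print(kstack);
}'

This should fire right before the PANIC message and record which process and kernel path triggered it.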