openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

PANIC: zfs: accessing past end of object #12078

Open snajpa opened 3 years ago

snajpa commented 3 years ago
Type                  Version/Name
Distribution Name     vpsAdminOS/NixOS
Distribution Version  master
Linux Kernel          5.10.25+ (running .37 currently)
Architecture          x86_64
ZFS Version           master HEAD as of 18.05.2021
SPL Version

Describe the problem you're observing

[ 8108.223833] PANIC: zfs: accessing past end of object 1c5a0/13707b (size=149504 access=149399+8191)
[ 8108.224008] Showing stack for process 429599
[ 8108.224023] CPU: 23 PID: 14578 Comm: statistics_coll Tainted: G           OE     5.10.37 #1-vpsAdminOS
[ 8108.224030] Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.8.0 06/26/2019
[ 8108.224037] In memory cgroup /osctl/pool.tank/group.default/user.1177/ct.9795/user-owned/lxc.payload.9795
[ 8108.224058] Call Trace:
[ 8108.224084]  dump_stack+0x6d/0x88
[ 8108.224104]  vcmn_err.cold+0x58/0x80 [spl]
[ 8108.224122]  ? _cond_resched+0x15/0x30
[ 8108.224132]  ? _cond_resched+0x15/0x30
[ 8108.224143]  ? mutex_lock+0xe/0x30
[ 8108.224153]  ? _cond_resched+0x15/0x30
[ 8108.224165]  ? mutex_lock+0xe/0x30
[ 8108.224245]  ? aggsum_add+0x171/0x190 [zfs]
[ 8108.224302]  ? _cond_resched+0x15/0x30
[ 8108.224313]  ? mutex_lock+0xe/0x30
[ 8108.224323]  ? _cond_resched+0x15/0x30
[ 8108.224332]  ? mutex_lock+0xe/0x30
[ 8108.224394]  ? dbuf_find+0x1af/0x1c0 [zfs]
[ 8108.224516]  ? dbuf_rele_and_unlock+0x134/0x660 [zfs]
[ 8108.224633]  ? arc_buf_access+0x104/0x250 [zfs]
[ 8108.224771]  zfs_panic_recover+0x6f/0x90 [zfs]
[ 8108.224909]  dmu_buf_hold_array_by_dnode+0x219/0x520 [zfs]
[ 8108.225006]  ? dnode_hold_impl+0x348/0xc20 [zfs]
[ 8108.225103]  dmu_write_uio_dnode+0x4c/0x140 [zfs]
[ 8108.225198]  dmu_write_uio_dbuf+0x4a/0x70 [zfs]
[ 8108.225298]  zfs_write+0x48c/0xc70 [zfs]
[ 8108.225367]  ? aa_put_buffer.part.0+0x15/0x50
[ 8108.225414]  zpl_iter_write+0x105/0x190 [zfs]
[ 8108.225471]  do_iter_readv_writev+0x157/0x1b0
[ 8108.225479]  do_iter_write+0x7d/0x1b0
[ 8108.225486]  vfs_writev+0x83/0x140
[ 8108.225497]  do_writev+0x6b/0x110
[ 8108.225508]  do_syscall_64+0x33/0x40
[ 8108.225517]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
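For readers decoding the panic line: `dmu_buf_hold_array_by_dnode()` rejects a requested range that extends past the object's size, so the message means `access offset + length > size` (here 149399 + 8191 = 157590 > 149504). A minimal sketch of that invariant in Python (illustrative names only, not the ZFS source):

```python
def access_in_bounds(size: int, offset: int, length: int) -> bool:
    """Mirror of the bounds check behind 'accessing past end of object':
    the requested byte range [offset, offset + length) must lie inside
    the object. Illustrative sketch only, not the ZFS source."""
    return offset + length <= size

# The range from the panic message above fails the check:
assert not access_in_bounds(size=149504, offset=149399, length=8191)
```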

Describe how to reproduce the problem

If I only knew... The process is run by a user in a container, every night at approx. 3:00 AM :( Still investigating. Ideas for bpftrace commands etc. that could catch more information are welcome :)

jm650 commented 3 years ago

One of my backup machines is now in a panic-reboot cycle with this error.

joyent_20200813T030805Z
Supermicro Super Server H11SSL-C
AMD EPYC 7401P 24-Core Processor CPU
LSI SAS3008 PCI-Express Fusion-MPT SAS-3
Chelsio Communications Inc T580-SO-CR Unified Wire Ethernet Controller
Sanima/Newisys/Viking Storage 106-disk JBOD

In single user mode ZFS checks out, passes scrubs etc. In multi-user mode it reports I/O errors, pool corruption, bad disks, and other nasties.

Three other systems with the same config do not exhibit these errors yet. One other system is connected to the same JBOD but uses different zpools.

--- panic message ---

backup004 ttya login: panic[cpu5]/thread=fffffe8579a29400: zfs: accessing past end of object 305/59ca (size=382976 access=382955+29)

fffffe00bd33e900 genunix:vcmn_err+42 ()
fffffe00bd33e970 zfs:zfs_panic_recover+65 ()
fffffe00bd33ea20 zfs:dmu_buf_hold_array_by_dnode+2b2 ()
fffffe00bd33eac0 zfs:dmu_write_uio_dnode+52 ()
fffffe00bd33eb10 zfs:dmu_write_uio_dbuf+5d ()
fffffe00bd33ed10 zfs:zfs_write+9ba ()
fffffe00bd33edb0 genunix:fop_write+111 ()
fffffe00bd33ee80 genunix:write+2c6 ()
fffffe00bd33eeb0 genunix:write32+15 ()
fffffe00bd33ef10 unix:brand_sys_sysenter+1dc ()

--- cut here ---

--- core status ---

[root@00-07-43-52-c1-90 /var/crash/volatile]# mdb -k 28
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci ufs ip hook neti sockfs arp usba uhci smbios mm sd fctl stmf_sbd stmf zfs lofs idm mpt_sas sata crypto random cpc fcp logindmux ptm kvm sppp nsmb smbsrv nfs ipc ]

::status
debugging crash dump vmcore.28 (64-bit) from backup004
operating system: 5.11 joyent_20200813T030805Z (i86pc)
git branch: release-20200813
git rev: 09f210323354e07890b312da9b4f5e33ae6b6df5
image uuid: (not set)
panic message: zfs: accessing past end of object 684/59ca (size=518656 access=518656+5)
dump content: kernel pages only

::system
set ibft_noprobe=0x1 [0t1]
set noexec_user_stack=0x1 [0t1]
set noexec_user_stack_log=0x1 [0t1]
set rlim_fd_cur=0x10000 [0t65536]
set idle_cpu_no_deep_c=0x1 [0t1]
set ip:ip_squeue_fanout=0x1 [0t1]
set pcplusmp:apic_panic_on_nmi=0x1 [0t1]
set apix:apic_panic_on_nmi=0x1 [0t1]
set dump_plat_mincpu=0x0 [0t0]
set dump_bzip2_level=0x1 [0t1]
set pcplusmp:apic_timer_preferred_mode=0x0 [0t0]
set apix:apic_timer_preferred_mode=0x0 [0t0]
set dump_metrics_on=0x1 [0t1]
set sata:sata_auto_online=0x1 [0t1]
set sd:sd_io_time=0xa [0t10]
set hires_tick=0x1 [0t1]
set zfs:zfs_multihost_interval=0xffff [0t65535]
set zfs:zfs_resilver_delay=0x0 [0t0]
set zfs:zfs_resilver_min_time_ms=0x0 [0t0]
set zfs:zfs_scrub_delay=0x0 [0t0]
set zfs:zfs_arc_max=0xc80000000 [0t53687091200]
set ddi_msix_alloc_limit=0x8 [0t8]
set pcplusmp:apic_intr_policy=0x1 [0t1]
set ip_squeue_soft_ring=0x1 [0t1]

--- cut here ---

scotte commented 2 years ago

I'm seeing the same issue on Manjaro with 5.14 kernel and ZFS 2.1.1:

[Wed Nov 10 09:09:34 2021] PANIC: zfs: accessing past end of object 304/142547 (size=70656 access=70284+4096)
[Wed Nov 10 09:09:34 2021] Showing stack for process 112516
[Wed Nov 10 09:09:34 2021] CPU: 6 PID: 112516 Comm: ThreadPoolForeg Tainted: P           OE     5.14.10-1-MANJARO #1
[Wed Nov 10 09:09:34 2021] Hardware name: Dell Inc. Precision 3530/04MPXC, BIOS 1.10.1 09/04/2019
[Wed Nov 10 09:09:34 2021] Call Trace:
[Wed Nov 10 09:09:34 2021]  dump_stack_lvl+0x46/0x5a
[Wed Nov 10 09:09:34 2021]  vcmn_err.cold+0x50/0x68 [spl]
[Wed Nov 10 09:09:34 2021]  zfs_panic_recover+0x75/0x90 [zfs]
[Wed Nov 10 09:09:34 2021]  dmu_buf_hold_array_by_dnode+0x219/0x5f0 [zfs]
[Wed Nov 10 09:09:34 2021]  dmu_write_uio_dnode+0x5e/0x1a0 [zfs]
[Wed Nov 10 09:09:34 2021]  ? dnode_set_blksz+0x141/0x310 [zfs]
[Wed Nov 10 09:09:34 2021]  ? dnode_rele_and_unlock+0x59/0xf0 [zfs]
[Wed Nov 10 09:09:34 2021]  dmu_write_uio_dbuf+0x4a/0x70 [zfs]
[Wed Nov 10 09:09:34 2021]  zfs_write+0x4e8/0xcc0 [zfs]
[Wed Nov 10 09:09:34 2021]  zpl_iter_write+0x10c/0x190 [zfs]
[Wed Nov 10 09:09:34 2021]  new_sync_write+0x159/0x1f0
[Wed Nov 10 09:09:34 2021]  vfs_write+0x1db/0x270
[Wed Nov 10 09:09:34 2021]  __x64_sys_pwrite64+0x8c/0xc0
[Wed Nov 10 09:09:34 2021]  do_syscall_64+0x3b/0x90
[Wed Nov 10 09:09:34 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[Wed Nov 10 09:09:34 2021] RIP: 0033:0x7f20d82060ef
[Wed Nov 10 09:09:34 2021] Code: 08 89 3c 24 48 89 4c 24 18 e8 7d f3 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 cd f3 ff ff 48 8b
[Wed Nov 10 09:09:34 2021] RSP: 002b:00007f20cf81f660 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[Wed Nov 10 09:09:34 2021] RAX: ffffffffffffffda RBX: 000026fa04437c00 RCX: 00007f20d82060ef
[Wed Nov 10 09:09:34 2021] RDX: 0000000000001000 RSI: 000026fa04437c00 RDI: 00000000000000c4
[Wed Nov 10 09:09:34 2021] RBP: 00007f20cf81f6d0 R08: 0000000000000000 R09: 0000563d7f13f590
[Wed Nov 10 09:09:34 2021] R10: 000000000001128c R11: 0000000000000293 R12: 0000000000001000
[Wed Nov 10 09:09:34 2021] R13: 000026fa04437c00 R14: 00000000000000c4 R15: 000000000001128c

Kernel Linux 5.14.10-1-MANJARO #1 SMP PREEMPT Thu Oct 7 06:43:34 UTC 2021 x86_64 GNU/Linux

Kernel and ZFS packages:

local/linux514-zfs 2.1.1-5 (linux514-extramodules)
    Kernel modules for the Zettabyte File System.
local/zfs-utils 2.1.1-1 (archzfs-linux)
    Kernel module support files for the Zettabyte File System.

The same thing happens with linux 5.13 (ZFS 2.1.1-1).

Dataset properties:

NAME        PROPERTY              VALUE                  SOURCE
zmain/home  type                  filesystem             -
zmain/home  creation              Wed Apr 21 10:50 2021  -
zmain/home  used                  97.4G                  -
zmain/home  available             235G                   -
zmain/home  referenced            52.4G                  -
zmain/home  compressratio         1.43x                  -
zmain/home  mounted               yes                    -
zmain/home  quota                 none                   default
zmain/home  reservation           none                   default
zmain/home  recordsize            128K                   inherited from zmain
zmain/home  mountpoint            /home                  local
zmain/home  sharenfs              off                    default
zmain/home  checksum              on                     default
zmain/home  compression           zstd                   inherited from zmain
zmain/home  atime                 off                    inherited from zmain
zmain/home  devices               on                     default
zmain/home  exec                  on                     default
zmain/home  setuid                on                     default
zmain/home  readonly              off                    default
zmain/home  zoned                 off                    default
zmain/home  snapdir               hidden                 default
zmain/home  aclmode               discard                default
zmain/home  aclinherit            restricted             default
zmain/home  createtxg             240                    -
zmain/home  canmount              on                     default
zmain/home  xattr                 sa                     inherited from zmain
zmain/home  copies                1                      default
zmain/home  version               5                      -
zmain/home  utf8only              off                    -
zmain/home  normalization         none                   -
zmain/home  casesensitivity       sensitive              -
zmain/home  vscan                 off                    default
zmain/home  nbmand                off                    default
zmain/home  sharesmb              off                    default
zmain/home  refquota              none                   default
zmain/home  refreservation        none                   default
zmain/home  guid                  4366126344549602389    -
zmain/home  primarycache          all                    default
zmain/home  secondarycache        all                    default
zmain/home  usedbysnapshots       45.0G                  -
zmain/home  usedbydataset         52.4G                  -
zmain/home  usedbychildren        0B                     -
zmain/home  usedbyrefreservation  0B                     -
zmain/home  logbias               latency                default
zmain/home  objsetid              772                    -
zmain/home  dedup                 off                    default
zmain/home  mlslabel              none                   default
zmain/home  sync                  standard               default
zmain/home  dnodesize             legacy                 default
zmain/home  refcompressratio      1.32x                  -
zmain/home  written               13.8M                  -
zmain/home  logicalused           122G                   -
zmain/home  logicalreferenced     60.9G                  -
zmain/home  volmode               default                default
zmain/home  filesystem_limit      none                   default
zmain/home  snapshot_limit        none                   default
zmain/home  filesystem_count      none                   default
zmain/home  snapshot_count        none                   default
zmain/home  snapdev               hidden                 default
zmain/home  acltype               posix                  inherited from zmain
zmain/home  context               none                   default
zmain/home  fscontext             none                   default
zmain/home  defcontext            none                   default
zmain/home  rootcontext           none                   default
zmain/home  relatime              off                    default
zmain/home  redundant_metadata    all                    default
zmain/home  overlay               on                     default
zmain/home  encryption            aes-256-gcm            -
zmain/home  keylocation           none                   default
zmain/home  keyformat             passphrase             -
zmain/home  pbkdf2iters           350000                 -
zmain/home  encryptionroot        zmain                  -
zmain/home  keystatus             available              -
zmain/home  special_small_blocks  0                      default

Pool properties:

NAME   PROPERTY                       VALUE                          SOURCE
zmain  size                           398G                           -
zmain  capacity                       37%                            -
zmain  altroot                        -                              default
zmain  health                         ONLINE                         -
zmain  guid                           3347456179796344331            -
zmain  version                        -                              default
zmain  bootfs                         -                              default
zmain  delegation                     on                             default
zmain  autoreplace                    off                            default
zmain  cachefile                      -                              default
zmain  failmode                       wait                           default
zmain  listsnapshots                  off                            default
zmain  autoexpand                     off                            default
zmain  dedupratio                     1.00x                          -
zmain  free                           247G                           -
zmain  allocated                      151G                           -
zmain  readonly                       off                            -
zmain  ashift                         13                             local
zmain  comment                        -                              default
zmain  expandsize                     -                              -
zmain  freeing                        0                              -
zmain  fragmentation                  22%                            -
zmain  leaked                         0                              -
zmain  multihost                      off                            default
zmain  checkpoint                     -                              -
zmain  load_guid                      800961323049942777             -
zmain  autotrim                       off                            default
zmain  compatibility                  off                            default
zmain  feature@async_destroy          enabled                        local
zmain  feature@empty_bpobj            active                         local
zmain  feature@lz4_compress           active                         local
zmain  feature@multi_vdev_crash_dump  enabled                        local
zmain  feature@spacemap_histogram     active                         local
zmain  feature@enabled_txg            active                         local
zmain  feature@hole_birth             active                         local
zmain  feature@extensible_dataset     active                         local
zmain  feature@embedded_data          active                         local
zmain  feature@bookmarks              enabled                        local
zmain  feature@filesystem_limits      enabled                        local
zmain  feature@large_blocks           active                         local
zmain  feature@large_dnode            enabled                        local
zmain  feature@sha512                 enabled                        local
zmain  feature@skein                  enabled                        local
zmain  feature@edonr                  enabled                        local
zmain  feature@userobj_accounting     active                         local
zmain  feature@encryption             active                         local
zmain  feature@project_quota          active                         local
zmain  feature@device_removal         enabled                        local
zmain  feature@obsolete_counts        enabled                        local
zmain  feature@zpool_checkpoint       enabled                        local
zmain  feature@spacemap_v2            active                         local
zmain  feature@allocation_classes     enabled                        local
zmain  feature@resilver_defer         enabled                        local
zmain  feature@bookmark_v2            enabled                        local
zmain  feature@redaction_bookmarks    enabled                        local
zmain  feature@redacted_datasets      enabled                        local
zmain  feature@bookmark_written       enabled                        local
zmain  feature@log_spacemap           active                         local
zmain  feature@livelist               active                         local
zmain  feature@device_rebuild         enabled                        local
zmain  feature@zstd_compress          active                         local
zmain  feature@draid                  enabled                        local

This started occurring about two weeks ago with no kernel or ZFS package updates. The only change I made recently was switching compression from lz4 to zstd, but that was some weeks before the issue started, so it's a very weak signal.

matthiasg commented 2 years ago

@jm650 I see the same stack trace on SmartOS and wrote about that in the newsgroup https://smartos.topicbox.com/groups/smartos-discuss/Td34fab84c5051dbd/panic-zfs-accessing-past-end-of-object


It seems to be related to a write operation during shell logon.

jonathonf commented 2 years ago

I've also just seen a similar trace on 5.15 with 2.1.2:

kernel: PANIC: zfs: accessing past end of object 23a/2fbf91 (size=291328 access=291223+16384)
kernel: Showing stack for process 162504
kernel: CPU: 12 PID: 162504 Comm: thunderbird Tainted: P        W  OE     5.15.12-lqx1-1-lqx #1
kernel: Hardware name: LENOVO 82JY/LNVNB161216, BIOS GKCN49WW 11/08/2021
kernel: Call Trace:
kernel:  <TASK>
kernel:  dump_stack_lvl+0x46/0x62
kernel:  vcmn_err.cold+0x50/0x68 [spl]
kernel:  zfs_panic_recover+0x75/0xa0 [zfs]
kernel:  dmu_buf_hold_array_by_dnode+0x227/0x610 [zfs]
kernel:  dmu_write_uio_dnode+0x5e/0x1b0 [zfs]
kernel:  ? dnode_set_blksz+0x141/0x320 [zfs]
kernel:  ? dnode_rele_and_unlock+0x59/0xf0 [zfs]
kernel:  dmu_write_uio_dbuf+0x4a/0x70 [zfs]
kernel:  zfs_write+0x4fc/0xcd0 [zfs]
kernel:  zpl_iter_write+0x10c/0x190 [zfs]
kernel:  vfs_write+0x338/0x470
kernel:  ksys_write+0x67/0xf0
kernel:  do_syscall_64+0x3b/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f8d7190393f
kernel: Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 09 56 f9 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c 56 f9 ff 48
kernel: RSP: 002b:00007fffcb9d18f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
kernel: RAX: ffffffffffffffda RBX: 0000000000004000 RCX: 00007f8d7190393f
kernel: RDX: 0000000000004000 RSI: 00007f8d343ee000 RDI: 00000000000000b3
kernel: RBP: 00007f8d343ee000 R08: 0000000000000000 R09: 00007fffcb9d1ab4
kernel: R10: 00007fffcb9d1cba R11: 0000000000000293 R12: 0000000000004000
kernel: R13: 00007f8d3acf15c0 R14: 0000000000004000 R15: 00007f8d719d5700
kernel:  </TASK>

I'll see if I can replicate it, but I don't think it's going to be easy...

matthiasg commented 2 years ago

Invested some time to look into this again today.

I have only remote access to the server via IPMI, but I updated one of the USB sticks (which this server still uses) to the latest platform image to rule out an already-fixed issue (mounting a new image via the IPMI virtual ISO and copying it from the ISO to the USB stick). That didn't change anything.


I then booted into the root/root mode and looked at the dataset affected:

zdb config -d zones | grep 'ID 1085' (Note to people with similar errors: 1085 because 0x43d is 1085 in decimal)
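Since the object IDs in the panic message are hexadecimal while zdb's "ID" column is decimal, the conversion is worth double-checking; for example, in Python:

```python
# Object IDs in the panic message are hex; zdb prints them in decimal.
assert int("43d", 16) == 1085    # the object grepped for above
assert int("59ca", 16) == 22986  # the object from the illumos panic earlier
```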

which showed it to be part of a filesystem/VM which I didn't need (looking it up in zones/config). Assuming it was only accessed during startup because it is listed in /etc/zones/index (zones/config), I removed it from the index and rebooted.

The system boots again now. (Note to others on SmartOS: when changing mountpoints in root/root mode, don't forget to reset them before rebooting.)

Two outstanding things now:

1) I would still like to know how this error happened to begin with, but even more importantly whether there is something I can do to help debug this issue. I can upload the core dumps if someone is interested.

2) Why would it crash on login so predictably? This is more likely a SmartOS question, though, and not directly related to ZFS. I would have assumed the vmadm service, or rather the underlying zones infrastructure, boots up irrespective of the login. Potentially this would have happened anyway when waiting long enough and ensuring no SSH login attempts, but I'm still curious where the connection is.

I obviously have a large number of dumps in /cores I could provide (core.metadata, core.vmadm, and core.vmadmd) to anybody interested, as well as the theoretical ability (with some planning on my end) to put the machine back into the index and re-trigger the issue.

scotte commented 2 years ago

I never followed up on my previous report. After switching recordsize back to 128k and compression back to lz4, I took a snapshot then did a send|receive into a new dataset on the same pool - fortunately I had enough space to do so. That was nearly 4 months ago, and the system has been perfectly stable since. This is on my daily-driver development laptop which was hitting the panic multiple times per day.

As far as I can tell, no data was ever harmed or corrupted. It seems that the data at-rest was fine, and the issue only surfaced when trying to write.

32nmud commented 2 years ago

I have been struggling with what seems to be the same issue. Much like @scotte, I changed my recordsize to something a bit larger and switched the compression from 'lz4' to 'zstd' a few days before the panic appeared.

Since this all started, it seems that just triggering numerous reads (possibly with some writes mixed in, as the panic usually appears when launching applications) will cause it to panic and lock everything up.

Oddly enough, I can still use the machine if I switch TTY and log in as root. This includes being able to access data on the pool. As root, it appears the only thing that will not execute successfully is a clean shutdown/restart.

This is occurring on every kernel I've tried, and it does not make a difference if I use the dkms modules. I am running zfs 2.1.2.

Here is my stacktrace:

Feb 23 10:52:35.776839 <hostname> kernel: PANIC: zfs: accessing past end of object 36/ba611 (size=284160 access=283692+4096)
Feb 23 10:52:35.776993 <hostname> kernel: Showing stack for process 3868362
Feb 23 10:52:35.779653 <hostname> kernel: CPU: 7 PID: 3868362 Comm: ThreadPoolForeg Tainted: P           OE     5.16.10-arch1-1 #1 481a3e145f0d7fd554ab88c4fa876a7ddc0795df
Feb 23 10:52:35.779691 <hostname> kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B17/MPG Z390 GAMING EDGE AC (MS-7B17), BIOS A.B0 10/12/2020
Feb 23 10:52:35.779708 <hostname> kernel: Call Trace:
Feb 23 10:52:35.779723 <hostname> kernel:  <TASK>
Feb 23 10:52:35.779736 <hostname> kernel:  dump_stack_lvl+0x48/0x5e
Feb 23 10:52:35.779750 <hostname> kernel:  vcmn_err.cold+0x50/0x68 [spl d207c65daab89e4aba1563f3c9772829c82d0963]
Feb 23 10:52:35.779766 <hostname> kernel:  zfs_panic_recover+0x75/0x90 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779780 <hostname> kernel:  dmu_buf_hold_array_by_dnode+0x219/0x5f0 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779793 <hostname> kernel:  dmu_write_uio_dnode+0x5e/0x1a0 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779805 <hostname> kernel:  ? dnode_set_blksz+0x141/0x310 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779817 <hostname> kernel:  ? dnode_rele_and_unlock+0x59/0xf0 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779828 <hostname> kernel:  dmu_write_uio_dbuf+0x4a/0x70 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779852 <hostname> kernel:  zfs_write+0x4e8/0xcc0 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779870 <hostname> kernel:  zpl_iter_write+0x114/0x190 [zfs 570eebcdcdea05165128adc51f012c43819b32f8]
Feb 23 10:52:35.779886 <hostname> kernel:  new_sync_write+0x159/0x1f0
Feb 23 10:52:35.779899 <hostname> kernel:  vfs_write+0x1eb/0x280
Feb 23 10:52:35.779911 <hostname> kernel:  __x64_sys_pwrite64+0x8c/0xc0
Feb 23 10:52:35.779923 <hostname> kernel:  do_syscall_64+0x59/0x80
Feb 23 10:52:35.779935 <hostname> kernel:  ? vfs_write+0x174/0x280
Feb 23 10:52:35.779951 <hostname> kernel:  ? vfs_write+0x174/0x280
Feb 23 10:52:35.779964 <hostname> kernel:  ? syscall_exit_to_user_mode+0x23/0x40
Feb 23 10:52:35.779977 <hostname> kernel:  ? do_syscall_64+0x69/0x80
Feb 23 10:52:35.779989 <hostname> kernel:  ? exc_page_fault+0x72/0x170
Feb 23 10:52:35.780003 <hostname> kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Feb 23 10:52:35.780017 <hostname> kernel: RIP: 0033:0x7f757047c23f
Feb 23 10:52:35.780031 <hostname> kernel: Code: 08 89 3c 24 48 89 4c 24 18 e8 bd 98 f8 ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 04 24 e8 fd 98 f8 ff 48 8b
Feb 23 10:52:35.780044 <hostname> kernel: RSP: 002b:00007f756d0b7b20 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Feb 23 10:52:35.780058 <hostname> kernel: RAX: ffffffffffffffda RBX: 0000107c0315a000 RCX: 00007f757047c23f
Feb 23 10:52:35.780264 <hostname> kernel: RDX: 0000000000001000 RSI: 0000107c0315a000 RDI: 00000000000000a6
Feb 23 10:52:35.780282 <hostname> kernel: RBP: 00007f756d0b7b90 R08: 0000000000000000 R09: 000055b82f179b40
Feb 23 10:52:35.780314 <hostname> kernel: R10: 000000000004542c R11: 0000000000000293 R12: 0000000000001000
Feb 23 10:52:35.780329 <hostname> kernel: R13: 0000107c0315a000 R14: 00000000000000a6 R15: 000000000004542c
Feb 23 10:52:35.780344 <hostname> kernel:  </TASK>

jficz commented 2 years ago

Similar (maybe the same) issue for me, too: NixOS unstable, zfs v2.1.3 => v2.1.4 (kernel 5.15 => 5.17) and I also changed blocksize of some of the datasets to 1M.

A minute or two into user login the system hung partially (I could switch VTs, and some terminals worked, but I couldn't log in or start new processes), and after another two-ish minutes I got the kernel panic zfs: accessing past end of object.

I can't identify the exact moment this started to happen, but it was neither immediately after the upgrade nor immediately after the block size change. The pool was created with zstd compression, a single NVMe device (partition), ashift=12, encrypted (single encryptionroot).

send | recv to the same pool helped, although it was slightly complicated because of encryption: I couldn't use -R because it changed the encryptionroot, so I had to send | recv each dataset one by one and set the properties manually. Funnily enough, using -Rw was much slower than sending unencrypted datasets, as in 1-10 MB/s for raw versus 100-500 MB/s for unencrypted.
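For anyone attempting the same rewrite, a rough sketch of the per-dataset workaround described above (hypothetical dataset names; verify snapshots, free space, and locally set properties before relying on anything like this, and note a non-raw send re-encrypts under the receiving side's encryptionroot):

```shell
# Hypothetical names: rewrite one dataset so all blocks are re-written
# under the current recordsize/compression settings.
zfs snapshot zmain/home@rewrite
# Non-raw send into the same pool; -u avoids mounting the copy yet.
zfs send zmain/home@rewrite | zfs receive -u zmain/home-new
# Re-apply any locally set properties, then swap the datasets.
zfs set mountpoint=/home zmain/home-new
zfs rename zmain/home zmain/home-old
zfs rename zmain/home-new zmain/home
```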

mtippmann commented 1 year ago

> Similar (maybe the same) issue for me, too: NixOS unstable, zfs v2.1.3 => v2.1.4 (kernel 5.15 => 5.17) and I also changed blocksize of some of the datasets to 1M.
>
> After a minute or two into user login the system hanged partially (could switch VTs, some terminals worked but couldn't login or start new processes) and after another two-ish minutes I got kernel panic zfs: accessing past end of object.

Just hit the same issue using zfs git from a few days ago and recent Arch (happens on the 6.1 LTS and 6.2 kernels). It only started after I changed the recordsize and compression properties on my /home dataset; more details here: https://github.com/openzfs/zfs/issues/14117#issuecomment-1493411461

Rid commented 1 year ago

Also hitting this issue after changing recordsize from 16k to 1M. We've set it back to 16k, but still see the issue randomly on some servers.

Is there any known fix for this which doesn't involve migrating the dataset?

zfs-2.1.6-0york1~22.04
zfs-kmod-2.1.6-0york1~22.04
Kernel 5.15.0-56-generic
[10779805.585236] PANIC: zfs: accessing past end of object e84/1b6a3 (size=1038848 access=1038828+4096)
[10779805.587631] Showing stack for process 4139333
[10779805.587634] CPU: 11 PID: 4139333 Comm: **** Kdump: loaded Tainted: P           OE     5.15.0-56-generic #62-Ubuntu
[10779805.587639] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.13.0 05/14/2021
[10779805.587641] Call Trace:
[10779805.587644]  <TASK>
[10779805.587651]  show_stack+0x52/0x5c
[10779805.587660]  dump_stack_lvl+0x4a/0x63
[10779805.587670]  dump_stack+0x10/0x16
[10779805.587676]  spl_dumpstack+0x29/0x2f [spl]
[10779805.587692]  vcmn_err.cold+0x60/0x78 [spl]
[10779805.587711]  ? arc_change_state.constprop.0+0x412/0x580 [zfs]
[10779805.587822]  ? aggsum_add+0x1a3/0x1c0 [zfs]
[10779805.587901]  ? arc_access.isra.0+0xc4/0x320 [zfs]
[10779805.587975]  ? dbuf_rele_and_unlock+0x13f/0x800 [zfs]
[10779805.588058]  ? arc_buf_access+0x2a7/0x2c0 [zfs]
[10779805.588140]  zfs_panic_recover+0x6d/0x90 [zfs]
[10779805.588266]  ? dbuf_read+0x37c/0x5c0 [zfs]
[10779805.588352]  dmu_buf_hold_array_by_dnode+0x221/0x600 [zfs]
[10779805.588441]  ? dnode_hold_impl+0x514/0xf00 [zfs]
[10779805.588536]  dmu_write_uio_dnode+0x4c/0x190 [zfs]
[10779805.588625]  ? dnode_rele+0x3d/0x50 [zfs]
[10779805.588721]  dmu_write_uio_dbuf+0x4f/0x70 [zfs]
[10779805.588809]  zfs_write+0x4b0/0xdd0 [zfs]
[10779805.588980]  zpl_iter_write+0xf5/0x170 [zfs]
[10779805.589117]  new_sync_write+0x114/0x1a0
[10779805.589132]  vfs_write+0x1d5/0x270
[10779805.589138]  ksys_write+0x67/0xf0
[10779805.589145]  __x64_sys_write+0x19/0x20
[10779805.589149]  do_syscall_64+0x5c/0xc0
[10779805.589155]  ? do_syscall_64+0x69/0xc0
[10779805.589159]  ? do_user_addr_fault+0x1e7/0x670
[10779805.589167]  ? exit_to_user_mode_prepare+0x37/0xb0
[10779805.589174]  ? irqentry_exit_to_user_mode+0x9/0x20
[10779805.589179]  ? irqentry_exit+0x1d/0x30
[10779805.589184]  ? exc_page_fault+0x89/0x170
[10779805.589191]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[10779805.589197] RIP: 0033:0x7f71c726332f
[10779805.589202] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2d 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48
[10779805.589205] RSP: 002b:00007ffd468e4720 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[10779805.589209] RAX: ffffffffffffffda RBX: 000000000000011c RCX: 00007f71c726332f
[10779805.589211] RDX: 0000000000001000 RSI: 00005617938b1b08 RDI: 000000000000011c
[10779805.589213] RBP: 0000000000001000 R08: 0000000000000000 R09: 0000561794c17a20
[10779805.589215] R10: 0000561793618414 R11: 0000000000000293 R12: 00005617938b1b08
[10779805.589217] R13: 00000000000fd9ec R14: 00005617938b1b08 R15: 0000561793618660
[10779805.589232]  </TASK>
IvanVolosyuk commented 1 year ago

> Also hitting this issue after changing recordsize from 16k to 1M. We've set it back to 16k, but still see the issue randomly on some servers.

The recordsize property affects only newly written files. You may still have a few files that were written with a different recordsize or compression setting.
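Since recordsize only applies to blocks written after the change, old files keep their original block size until rewritten. A hedged sketch of a per-file rewrite in Python (a generic copy-and-replace, not a ZFS tool; it does not preserve xattrs or ACLs and must not be run on files that are open):

```python
import os
import shutil
import tempfile

def rewrite_file(path: str) -> None:
    """Rewrite a file so its blocks are reallocated under the dataset's
    current recordsize/compression. Generic sketch: copies data and basic
    metadata, then atomically swaps the copy into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    shutil.copy2(path, tmp)   # copy contents and basic metadata
    os.replace(tmp, path)     # atomic rename within the same filesystem
```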

ngilles commented 1 year ago

Looks like I've been hitting this issue as well.

System is Ubuntu 22.04:

$ zfs version
zfs-2.1.5-1ubuntu6~22.04.1
zfs-kmod-2.1.9-2ubuntu1.1

On our box we have two users. One user systematically triggers the panic (it happens when Thunderbird downloads emails and starts writing to one of the inbox files). In my case recordsize=1M and compression=off (somewhat surprisingly... I was sure I had set it to lz4, which is pretty much the default). The other user ostensibly has about as much (if not more) churn writing files but has never (as far as I can tell) triggered such a panic.

I can't recall when each of those record sizes was set. It looks like this is common enough that I wonder whether it's easy to trigger from scratch, or whether the files triggering the panic were created by a specific older version of OpenZFS and carry some bad metadata somewhere.

samvde commented 11 months ago

I can confirm I had this issue, strangely enough also particularly with Thunderbird, on datasets with recordsize 1M. I reverted to 128K and things are running fine.

I have been using zstd for a very long time and don't believe it is related. Could encryption be related?

PROPERTY    VALUE
encryption  aes-256-gcm

Kernel 6.5.8-arch1-1-surface #1 SMP PREEMPT_DYNAMIC Sun, 22 Oct 2023 13:57:26 +0000 x86_64 GNU/Linux on Arch Linux (using zfs-dkms).

zfs-2.2.0-1
zfs-kmod-2.2.0-1

lundman commented 11 months ago

So I can trigger this at will, with minor changes, if indeed it is the same problem.

Unrelated to everything else: we are looking at setting zp->z_blksz to the ashift size (4096) instead of 512. (Ask me why; it's Windows-specific, but we are making the same change on macOS.)

So we make an equivalent change to zfs_znode.c:

@@ -564,7 +563,13 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
        zp->z_atime_dirty = 0;
        zp->z_mapcnt = 0;
        zp->z_id = db->db_object;
-       zp->z_blksz = blksz;
+       zp->z_blksz = 4096;

Don't forget there is code that scales zp->z_blksz up here: https://github.com/openzfs/zfs/blob/master/module/zfs/zfs_vnops.c#L540-L558 So I am just skipping the "might be higher" step and starting from the higher value.
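The scaling logic referenced above can be sketched roughly as follows (Python, illustrative only; the real logic lives in `zfs_write()` in `module/zfs/zfs_vnops.c` and also consults the range lock):

```python
def grown_blksz(cur_blksz: int, end_size: int, max_blksz: int = 128 * 1024) -> int:
    """Rough sketch of how zfs_write grows a file's block size while the
    file still fits in a single block. Names and shapes are illustrative,
    not the ZFS source."""
    if cur_blksz > max_blksz:
        # An oversized, non-power-of-2 block: round up to the next power
        # of two instead of growing toward end_size.
        return min(end_size, 1 << cur_blksz.bit_length())
    return min(end_size, max_blksz)
```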

Then I run the tests/functional/removal tests from common.run. After completing

Test: tests/zfs-tests/tests/functional/removal/removal_check_space (run as root) [00:02] [PASS]

and it starts removal_condense_export, we get fireworks.

2023-11-10 15:44:36.798 Df kernel[0:317f] (zfs) <zfs`__dprintf (zfs_debug.c:249)> 
  vdev_indirect.c:533:spa_condense_indirect_complete_sync(): finished condense of vdev 0 in txg 196: new mapping object 295 has 29 entries (was 154 entries)

2023-11-10 15:44:41.785 Df kernel[0:2af1] (zfs) <zfs`vcmn_err (spl-err.c:54)>
   PANIC: zfs: accessing past end of object 282/106 (size=2560 access=0+3574)

If I restore zp->z_blksz to 512, it does not panic. This is running on an M2 Mac with 4096-sector SSDs.

Since I can trigger it at will, let me know if there is anything you want to know. Sadly, I cannot live-debug the kernel until Apple releases a KDK for my specific OS version. But I sure can printf.

Also, in the WR_INDIRECT case, if zp->z_blksz is 0, then blocksize is 0 and len gets picked as 0. We end up in while (resid) forever, taking up all CPUs. This is a straightforward bug which we can fix in a separate PR.
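The zero-blocksize hang in the last paragraph is a plain no-progress loop: with blocksize == 0 each chunk length comes out as 0 and resid never shrinks. A hedged Python sketch of the loop shape with the missing guard (illustrative, not the ZFS source):

```python
def split_into_chunks(resid: int, blocksize: int):
    """Chunk a write of `resid` bytes by block, as the WR_INDIRECT path
    does. With blocksize == 0 the unguarded loop would spin forever,
    since each iteration makes zero progress."""
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")  # the missing guard
    chunks, off = [], 0
    while resid:
        length = min(blocksize, resid)  # would be 0 forever if blocksize were 0
        chunks.append((off, length))
        off += length
        resid -= length
    return chunks
```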