openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

PANIC at zfs_znode.c with zfs 2.1 and 2.2 #16607

Open · Lorac opened this issue 1 month ago

Lorac commented 1 month ago

System information

Type                   Version/Name
Distribution Name      Debian
Distribution Version   12 Bookworm
Kernel Version         6.10.6+bpo-amd64
Architecture           x86_64
OpenZFS Version        zfs-2.2.6-1.bpo12+2
                       zfs-kmod-2.2.6-1.bpo12+2

Describe the problem you're observing

I am experiencing a kernel panic when navigating into a ZFS dataset or copying data to it. The issue persists even after upgrading: it first occurred with ZFS 2.1 on kernel 6.1, and after upgrading to the latest ZFS and kernel versions available in the bookworm backports, the problem remains.

Describe how to reproduce the problem

The kernel panic started when I used the "z" plugin in oh-my-zsh to navigate directories within a ZFS dataset. The panic also occurs when trying to rsync a directory from an ext4 filesystem to a ZFS dataset.
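
Any path lookup on the bad entry seems to be enough to trigger it; the backtrace below came from ncdu sitting in newfstatat(). As a minimal sketch (the path is only a placeholder for an affected entry, not a real path from my pool), a plain stat() of the file is all it takes:

/* Minimal reproducer sketch: stat() the affected entry (placeholder path). */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int
main(void)
{
	struct stat st;

	/* Same syscall ncdu ends up in (newfstatat via fstatat). */
	if (fstatat(AT_FDCWD, "/a/some-dataset/affected-file", &st, 0) != 0) {
		fprintf(stderr, "fstatat: %s\n", strerror(errno));
		return (1);
	}
	printf("mode %o, size %lld\n", (unsigned int)st.st_mode,
	    (long long)st.st_size);
	return (0);
}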

Include any warning/errors/backtraces from the system logs

[  515.074670] VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed
[  515.074675] PANIC at zfs_znode.c:345:zfs_znode_sa_init()
[  515.074676] Showing stack for process 4648
[  515.074678] CPU: 14 PID: 4648 Comm: ncdu Tainted: P           OE      6.10.6+bpo-amd64 #1  Debian 6.10.6-1~bpo12+1
[  515.074680] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B550M Phantom Gaming 4, BIOS P1.70 06/23/2021
[  515.074682] Call Trace:
[  515.074684]  <TASK>
[  515.074687]  dump_stack_lvl+0x64/0x80
[  515.074696]  spl_panic+0x100/0x120 [spl]
[  515.074705]  ? dmu_buf_set_user_ie+0x61/0x80 [zfs]
[  515.074837]  zfs_znode_sa_init+0xcc/0xe0 [zfs]
[  515.074942]  zfs_znode_alloc+0x16e/0x7c0 [zfs]
[  515.075042]  ? spl_kmem_alloc+0xf0/0x120 [spl]
[  515.075051]  zfs_zget+0x24d/0x290 [zfs]
[  515.075149]  zfs_dirent_lock+0x4f5/0x6d0 [zfs]
[  515.075250]  zfs_dirlook+0xb6/0x2f0 [zfs]
[  515.075348]  ? zfs_zaccess+0x278/0x460 [zfs]
[  515.075448]  zfs_lookup+0x264/0x410 [zfs]
[  515.075545]  zpl_lookup+0xed/0x270 [zfs]
[  515.075643]  ? generic_permission+0x39/0x240
[  515.075646]  __lookup_slow+0x83/0x130
[  515.075648]  walk_component+0xdb/0x150
[  515.075650]  path_lookupat+0x67/0x190
[  515.075653]  filename_lookup+0xde/0x1d0
[  515.075656]  vfs_statx+0x92/0x1c0
[  515.075658]  vfs_fstatat+0x85/0xb0
[  515.075660]  __do_sys_newfstatat+0x3f/0x80
[  515.075662]  do_syscall_64+0x82/0x190
[  515.075666]  ? syscall_exit_to_user_mode+0x77/0x210
[  515.075668]  ? do_syscall_64+0x8e/0x190
[  515.075669]  ? exc_page_fault+0x76/0x170
[  515.075671]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  515.075674] RIP: 0033:0x7f1e77ad675a
[  515.075676] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 0b 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 ff ff 77 07 31 c0 c3 0f 1f 40 00 48 8b 15 71 a6 0d 00 f7
[  515.075678] RSP: 002b:00007ffc35ee74b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
[  515.075680] RAX: ffffffffffffffda RBX: 000055a6dfec0650 RCX: 00007f1e77ad675a
[  515.075681] RDX: 000055a6aefa1240 RSI: 000055a6f73a73cd RDI: 00000000ffffff9c
[  515.075682] RBP: 000055a6aefa1240 R08: 000055a6e2fbbf30 R09: 006f666e49626544
[  515.075682] R10: 0000000000000100 R11: 0000000000000246 R12: 000055a6e2fbbf30
[  515.075683] R13: 000055a6f73a73cd R14: 000055a6aef9b227 R15: 00007f1e77c86020
[  515.075685]  </TASK>
Lorac commented 1 month ago

@snajpa it looks like you deleted your comment, but I will post the properties of my pool anyway. Maybe it will help; it looks like project_quota is active.

NAME  PROPERTY                       VALUE                          SOURCE
a     size                           884G                           -
a     capacity                       85%                            -
a     altroot                        -                              default
a     health                         ONLINE                         -
a     guid                           8296607161116845554            -
a     version                        -                              default
a     bootfs                         -                              default
a     delegation                     on                             default
a     autoreplace                    off                            default
a     cachefile                      -                              default
a     failmode                       wait                           default
a     listsnapshots                  on                             local
a     autoexpand                     off                            default
a     dedupratio                     1.00x                          -
a     free                           130G                           -
a     allocated                      754G                           -
a     readonly                       off                            -
a     ashift                         12                             local
a     comment                        -                              default
a     expandsize                     -                              -
a     freeing                        0                              -
a     fragmentation                  64%                            -
a     leaked                         0                              -
a     multihost                      off                            default
a     checkpoint                     -                              -
a     load_guid                      14949411101328869406           -
a     autotrim                       off                            default
a     compatibility                  off                            default
a     bcloneused                     0                              -
a     bclonesaved                    0                              -
a     bcloneratio                    1.00x                          -
a     feature@async_destroy          enabled                        local
a     feature@empty_bpobj            active                         local
a     feature@lz4_compress           active                         local
a     feature@multi_vdev_crash_dump  enabled                        local
a     feature@spacemap_histogram     active                         local
a     feature@enabled_txg            active                         local
a     feature@hole_birth             active                         local
a     feature@extensible_dataset     active                         local
a     feature@embedded_data          active                         local
a     feature@bookmarks              enabled                        local
a     feature@filesystem_limits      enabled                        local
a     feature@large_blocks           enabled                        local
a     feature@large_dnode            enabled                        local
a     feature@sha512                 enabled                        local
a     feature@skein                  enabled                        local
a     feature@edonr                  enabled                        local
a     feature@userobj_accounting     active                         local
a     feature@encryption             enabled                        local
a     feature@project_quota          active                         local
a     feature@device_removal         enabled                        local
a     feature@obsolete_counts        enabled                        local
a     feature@zpool_checkpoint       enabled                        local
a     feature@spacemap_v2            active                         local
a     feature@allocation_classes     enabled                        local
a     feature@resilver_defer         enabled                        local
a     feature@bookmark_v2            enabled                        local
a     feature@redaction_bookmarks    enabled                        local
a     feature@redacted_datasets      enabled                        local
a     feature@bookmark_written       enabled                        local
a     feature@log_spacemap           active                         local
a     feature@livelist               enabled                        local
a     feature@device_rebuild         enabled                        local
a     feature@zstd_compress          active                         local
a     feature@draid                  disabled                       local
a     feature@zilsaxattr             disabled                       local
a     feature@head_errlog            disabled                       local
a     feature@blake3                 disabled                       local
a     feature@block_cloning          disabled                       local
a     feature@vdev_zaps_v2           disabled                       local
snajpa commented 1 month ago

yeah, sorry, I was reading through the code and my fingers were faster than my brain; I thought I saw something related to project quota, but I ruled it out after a bit :-D

snajpa commented 1 month ago

I think this could fit what I'm trying to fix in #16625, but that fix won't help in this case, where there's a znode without its SA initialized that is somehow still linked into a directory.

I tried figuring out a way to call zpl_xattr_security_init for the znode, but I'd really need the parent directory inode, and I can't get to that in zfs_znode_sa_init. Perhaps a way could be found by using thread-specific data to mark what we're looking for in zpl_lookup; if I'm right, and if the file in question turns out to be really important to you, we can work on that later.

For now, please take a snapshot of the affected dataset and then try the patch below, to see whether it lets you move forward without a panic. It should evict the inode and return -EIO on access to that file.

If I'm right, the file was created with O_TMPFILE and then power was lost, or the kernel crashed at the wrong moment; otherwise I can't explain how we end up with a znode without any SA at all.
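
For reference, this is the O_TMPFILE pattern I have in mind; the paths below are only an example, not anything from your pool. The file exists with no name at first and only gets linked into the directory at the very end via linkat(), and a crash or power loss around that point is the window I'm suspecting:

/* Example of the O_TMPFILE pattern described above (illustrative paths only).
 * The file is anonymous until linkat() gives it a name in the directory. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	char proc_path[64];

	/* Create an unnamed file inside the target directory. */
	int fd = open("/a/some-dataset", O_TMPFILE | O_WRONLY, 0644);
	if (fd < 0) {
		perror("open(O_TMPFILE)");
		return (1);
	}
	if (write(fd, "data\n", 5) != 5)
		perror("write");

	/* Only now does the file become visible under a name; a crash
	 * around this point is the window in question. */
	snprintf(proc_path, sizeof (proc_path), "/proc/self/fd/%d", fd);
	if (linkat(AT_FDCWD, proc_path, AT_FDCWD,
	    "/a/some-dataset/now-visible", AT_SYMLINK_FOLLOW) != 0)
		perror("linkat");

	close(fd);
	return (0);
}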

The patch (should be applied to current master - 75dda92dc):

diff --git a/module/os/linux/zfs/zfs_znode_os.c b/module/os/linux/zfs/zfs_znode_os.c
index f13edf95b..025f2482a 100644
--- a/module/os/linux/zfs/zfs_znode_os.c
+++ b/module/os/linux/zfs/zfs_znode_os.c
@@ -323,7 +323,7 @@ zfs_cmpldev(uint64_t dev)
        return (dev);
 }

-static void
+static int
 zfs_znode_sa_init(zfsvfs_t *zfsvfs, znode_t *zp,
     dmu_buf_t *db, dmu_object_type_t obj_type, sa_handle_t *sa_hdl)
 {
@@ -334,8 +334,11 @@ zfs_znode_sa_init(zfsvfs_t *zfsvfs, znode_t *zp,
        ASSERT(zp->z_sa_hdl == NULL);
        ASSERT(zp->z_acl_cached == NULL);
        if (sa_hdl == NULL) {
-               VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp,
-                   SA_HDL_SHARED, &zp->z_sa_hdl));
+               if (0 != sa_handle_get_from_db(zfsvfs->z_os, db, zp,
+                   SA_HDL_SHARED, &zp->z_sa_hdl)) {
+                       zfs_dbgmsg("sa_handle_get_from_db failed");
+                       return (1);
+               }
        } else {
                zp->z_sa_hdl = sa_hdl;
                sa_set_userp(sa_hdl, zp);
@@ -344,6 +347,7 @@ zfs_znode_sa_init(zfsvfs_t *zfsvfs, znode_t *zp,
        zp->z_is_sa = (obj_type == DMU_OT_SA) ? B_TRUE : B_FALSE;

        mutex_exit(&zp->z_lock);
+       return (0);
 }

 void
@@ -538,7 +542,11 @@ zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
        zp->z_sync_writes_cnt = 0;
        zp->z_async_writes_cnt = 0;

-       zfs_znode_sa_init(zfsvfs, zp, db, obj_type, hdl);
+       int fail = zfs_znode_sa_init(zfsvfs, zp, db, obj_type, hdl);
+       if (fail) {
+               iput(ip);
+               return (SET_ERROR(EIO));
+       }

        SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL, &mode, 8);
        SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GEN(zfsvfs), NULL, &tmp_gen, 8);
Lorac commented 1 month ago

I initially tried deleting a corrupted file and directory; when that failed, I destroyed and recreated two ZFS datasets, which resolved my issue. The datasets weren’t critical (mostly build directories), but I needed them to continue working. The first time I encountered the error was from a directory which can't be a

At the time, I didn’t realize I should have taken a snapshot, assuming a corrupted file or directory wouldn’t be included in a snapshot. I am relatively new to ZFS.

The problem first appeared when deleting the directory: a specific file caused ZFS to panic. I opened a second SSH connection to my workstation and retried, but I encountered an EIO error instead of the panic. This resulted in a CPU hang on rm and 15GB of logs.

At this point, I’m not sure the patch will make a difference, as I’ve already recreated the datasets. I don't know whether the files I was trying to delete were created with O_TMPFILE, since they had been there for a while; they are part of a build system that typically doesn’t create temporary files of the type I was trying to delete. The second dataset could have used O_TMPFILE, as it came from a C++ build with -pipe, so maybe.

The drive they’re on has 68,000 power-on hours (SSD). I need help analyzing the smartctl output; you likely have more expertise interpreting that data than I do.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       1
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       68383
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       198
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   018   018   000    Old_age   Always       -       1642
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       100
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       5589
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   076   048   000    Old_age   Always       -       24 (Min/Max 17/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       1
202 Percent_Lifetime_Remain 0x0030   018   018   001    Old_age   Offline      -       82
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       278295250114
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       8696884170
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       46386914208
Lorac commented 1 month ago

I actually got the panic again on another more critical dataset. It's already in my snapshots, so I'll try the patch.

This happened during a scan with ncdu.

snajpa commented 1 month ago

Curious, is the xattr property on the affected dataset =sa or =on?

Lorac commented 1 month ago

xattr is set to on for the affected dataset.

Lorac commented 1 week ago

@snajpa so I had a bad memory stick; I swapped in new sticks and now there are no more problems. So it's not a ZFS bug per se, I guess?

snajpa commented 4 days ago

if you can't reproduce it after a HW change (and no SW upgrade), then that must have been it :)