openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

zpool import fails: kernel NULL pointer dereference, address: 0000000000000020 #16494

Open robbat2 opened 2 weeks ago

robbat2 commented 2 weeks ago

System information

Type                  Version/Name
Distribution Name     Gentoo
Distribution Version  N/A
Kernel Version        6.8.11-gentoo-dist
Architecture          AMD64
OpenZFS Version       Initially found on v2.2.5-r0-gentoo;
                      also reproduced on zfs-2.2.99-684_g73866cf34
                      and zfs-kmod-2.2.99-683_g6be8bf555

Describe the problem you're observing

Importing the pool (zpool import) crashes the kernel with a NULL pointer dereference in spa_log_summary_dirty_flushed_metaslab(); see the backtrace below.

Describe how to reproduce the problem

I don't have a reproduction case on another system; the crash happens when importing this specific pool.

Include any warning/errors/backtraces from the system logs

# dmesg >dmesg.1
(ignore the extra rmmod / modprobe)
# (set -x ; rmmod zfs ; lsmod |grep -sq zfs && echo zfs still loaded && break ;   modprobe zfs spa_load_verify_data=0  spa_load_verify_metadata=0 && time zpool import -o readonly=off -T 20892830 tank 2>&1 | tee crash.txt )
# dmesg >dmesg.2
+ rmmod zfs
+ lsmod
+ grep --color=auto -sq zfs
+ modprobe zfs spa_load_verify_data=0 spa_load_verify_metadata=0
+ zpool import -o readonly=off -T 20892830 tank
+ tee crash.txt

real    0m8.867s
user    0m0.021s
sys 0m0.290s

--- dmesg.1 2024-08-29 02:13:58.029339091 +0000
+++ dmesg.2 2024-08-29 02:15:36.829176386 +0000
@@ -1608,3 +1608,101 @@
 [  327.637673] overlayfs: workdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.
 [  327.676912] overlayfs: upperdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.
 [  327.676919] overlayfs: workdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.
+[  377.082693] ZFS: Unloaded module v2.2.99-683_g6be8bf555
+[  422.549560] ZFS: Loaded module v2.2.99-683_g6be8bf555, ZFS pool version 5000, ZFS filesystem version 5
+[  434.254939] ZFS: Unloaded module v2.2.99-683_g6be8bf555
+[  435.519409] ZFS: Loaded module v2.2.99-683_g6be8bf555, ZFS pool version 5000, ZFS filesystem version 5
+[  444.383633] BUG: kernel NULL pointer dereference, address: 0000000000000020
+[  444.383644] #PF: supervisor write access in kernel mode
+[  444.383648] #PF: error_code(0x0002) - not-present page
+[  444.383652] PGD 0 P4D 0 
+[  444.383658] Oops: 0002 [#1] PREEMPT SMP NOPTI
+[  444.383664] CPU: 13 PID: 13932 Comm: zpool Tainted: P           OE      6.8.11-gentoo-dist #1
+[  444.383670] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VIII HERO (WI-FI), BIOS 1105 11/04/2019
+[  444.383674] RIP: 0010:spa_log_summary_dirty_flushed_metaslab+0x58/0x190 [zfs]
+[  444.383823] Code: f7 d9 4c 29 c2 74 23 31 ff 48 3b 32 72 19 4a 8b 04 02 49 39 c1 74 13 48 89 d7 48 01 c8 74 0b 48 89 c2 48 3b 32 73 e7 48 89 fa <48> 83 42 20 01 e9 9e d9 4d da 31 d2 eb f2 66 2e 0f 1f 84 00 00 00
+[  444.383827] RSP: 0018:ffff9b7e0147b7e0 EFLAGS: 00010297
+[  444.383833] RAX: ffff8e207e7781f0 RBX: ffff9b7e0147b870 RCX: ffffffffffffffd0
+[  444.383837] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
+[  444.383841] RBP: ffff8e1f8f7ba000 R08: 0000000000000030 R09: ffff8e2053d98c88
+[  444.383844] R10: ffffbb7dfe149990 R11: 000000000000004d R12: 0000000020000000
+[  444.383848] R13: ffff8e2053d98000 R14: 0000000000054000 R15: 00000000013ecbd3
+[  444.383852] FS:  00007f46aa42f080(0000) GS:ffff8e3e1ed40000(0000) knlGS:0000000000000000
+[  444.383857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+[  444.383860] CR2: 0000000000000020 CR3: 000000014ea84000 CR4: 0000000000350ef0
+[  444.383865] Call Trace:
+[  444.383870]  <TASK>
+[  444.383875]  ? __die+0x23/0x70
+[  444.383884]  ? page_fault_oops+0x171/0x4e0
+[  444.383895]  ? exc_page_fault+0x7f/0x180
+[  444.383904]  ? asm_exc_page_fault+0x26/0x30
+[  444.383918]  ? spa_log_summary_dirty_flushed_metaslab+0x58/0x190 [zfs]
+[  444.384053]  spa_log_summary_dirty_flushed_metaslab+0x149/0x190 [zfs]
+[  444.384188]  space_map_iterate+0x196/0x400 [zfs]
+[  444.384317]  ? spa_log_summary_dirty_flushed_metaslab+0x80/0x190 [zfs]
+[  444.384457]  spa_ld_log_spacemaps+0x5ea/0x7a0 [zfs]
+[  444.384595]  spa_reset+0xd36/0x2240 [zfs]
+[  444.384734]  ? __dprintf+0x13d/0x1c0 [zfs]
+[  444.384836]  spa_reset+0x1b34/0x2240 [zfs]
+[  444.384973]  spa_import+0x22e/0x6d0 [zfs]
+[  444.385113]  zfs_impl_get_ops+0x26ec/0x71b0 [zfs]
+[  444.385229]  zfsdev_ioctl_common+0x813/0x950 [zfs]
+[  444.385341]  ? kvmalloc_node+0x43/0xd0
+[  444.385348]  ? srso_return_thunk+0x5/0x5f
+[  444.385358]  zfs_file_put+0x123/0x2d0 [zfs]
+[  444.385456]  __x64_sys_ioctl+0x97/0xd0
+[  444.385464]  do_syscall_64+0x84/0x1a0
+[  444.385474]  ? do_sys_openat2+0x97/0xe0
+[  444.385481]  ? srso_return_thunk+0x5/0x5f
+[  444.385487]  ? __alloc_pages+0x1a3/0x350
+[  444.385497]  ? alloc_pages_mpol+0x95/0x1f0
+[  444.385503]  ? srso_return_thunk+0x5/0x5f
+[  444.385508]  ? __pte_offset_map+0x1b/0x180
+[  444.385515]  ? srso_return_thunk+0x5/0x5f
+[  444.385521]  ? __mod_memcg_lruvec_state+0x97/0x110
+[  444.385527]  ? srso_return_thunk+0x5/0x5f
+[  444.385533]  ? srso_return_thunk+0x5/0x5f
+[  444.385538]  ? set_ptes.isra.0+0x28/0x90
+[  444.385545]  ? srso_return_thunk+0x5/0x5f
+[  444.385550]  ? do_anonymous_page+0x357/0x6e0
+[  444.385555]  ? pmdp_invalidate+0xd0/0x100
+[  444.385563]  ? srso_return_thunk+0x5/0x5f
+[  444.385568]  ? __handle_mm_fault+0xb4a/0xe40
+[  444.385574]  ? srso_return_thunk+0x5/0x5f
+[  444.385584]  ? srso_return_thunk+0x5/0x5f
+[  444.385589]  ? __count_memcg_events+0x4d/0xc0
+[  444.385595]  ? srso_return_thunk+0x5/0x5f
+[  444.385600]  ? count_memcg_events.constprop.0+0x1a/0x30
+[  444.385606]  ? srso_return_thunk+0x5/0x5f
+[  444.385612]  ? handle_mm_fault+0xa2/0x360
+[  444.385618]  ? srso_return_thunk+0x5/0x5f
+[  444.385623]  ? do_user_addr_fault+0x304/0x670
+[  444.385631]  ? srso_return_thunk+0x5/0x5f
+[  444.385636]  ? srso_return_thunk+0x5/0x5f
+[  444.385643]  entry_SYSCALL_64_after_hwframe+0x78/0x80
+[  444.385650] RIP: 0033:0x7f46aa593b2b
+[  444.385655] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
+[  444.385659] RSP: 002b:00007ffef0124920 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
+[  444.385665] RAX: ffffffffffffffda RBX: 000055816f76a9e0 RCX: 00007f46aa593b2b
+[  444.385669] RDX: 00007ffef01249e0 RSI: 0000000000005a02 RDI: 0000000000000003
+[  444.385672] RBP: 00007ffef01290d0 R08: 0000000000052f21 R09: 0000000000000000
+[  444.385676] R10: 0000000000000004 R11: 0000000000000246 R12: 000055816f74b4e0
+[  444.385679] R13: 00007ffef01249e0 R14: 000055816f75be70 R15: 000055816f792530
+[  444.385690]  </TASK>
+[  444.385692] Modules linked in: zfs(POE) veth xt_nat xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge 8021q garp mrp stp llc bonding tls bnep dummy vfat fat binfmt_misc radeon tun dm_multipath overlay hangcheck_timer nfsd auth_rpcgss nfs_acl lockd grace iTCO_wdt intel_rapl_msr intel_pmc_bxt iTCO_vendor_support snd_hda_codec_realtek amdgpu intel_rapl_common snd_hda_codec_generic spl(OE) nct6775 nct6775_core asus_ec_sensors hwmon_vid snd_hda_codec_hdmi drivetemp edac_mce_amd amdxcp i2c_dev snd_hda_intel iwlmvm snd_intel_dspcfg kvm_amd snd_intel_sdw_acpi gpu_sched mac80211 drm_suballoc_helper snd_hda_codec drm_ttm_helper libarc4 kvm ttm snd_hda_core drm_exec snd_hwdep drm_display_helper eeepc_wmi irqbypass btusb iwlwifi snd_pcm asus_wmi btrtl ledtrig_audio btintel cec snd_timer sparse_keymap btbcm drm_buddy platform_profile btmtk joydev rapl wmi_bmof pcspkr video acpi_cpufreq mxm_wmi snd cfg80211
+[  444.385855]  k10temp bluetooth i2c_piix4 apple_mfi_fastcharge soundcore rfkill ryzen_smu(OE) dm_thin_pool dm_persistent_data dm_bio_prison raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx uas hid_apple usb_storage raid1 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel igb sha512_ssse3 nvme sha256_ssse3 sha1_ssse3 ccp sp5100_tco r8169 nvme_core i2c_algo_bit realtek dca sata_sil24 nvme_auth wmi sunrpc
+[  444.385927] Unloaded tainted modules: zfs(POE):2 [last unloaded: zfs(POE)]
+[  444.385938] CR2: 0000000000000020
+[  444.385943] ---[ end trace 0000000000000000 ]---
+[  444.385947] RIP: 0010:spa_log_summary_dirty_flushed_metaslab+0x58/0x190 [zfs]
+[  444.386090] Code: f7 d9 4c 29 c2 74 23 31 ff 48 3b 32 72 19 4a 8b 04 02 49 39 c1 74 13 48 89 d7 48 01 c8 74 0b 48 89 c2 48 3b 32 73 e7 48 89 fa <48> 83 42 20 01 e9 9e d9 4d da 31 d2 eb f2 66 2e 0f 1f 84 00 00 00
+[  444.386094] RSP: 0018:ffff9b7e0147b7e0 EFLAGS: 00010297
+[  444.386100] RAX: ffff8e207e7781f0 RBX: ffff9b7e0147b870 RCX: ffffffffffffffd0
+[  444.386105] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
+[  444.386108] RBP: ffff8e1f8f7ba000 R08: 0000000000000030 R09: ffff8e2053d98c88
+[  444.386112] R10: ffffbb7dfe149990 R11: 000000000000004d R12: 0000000020000000
+[  444.386116] R13: ffff8e2053d98000 R14: 0000000000054000 R15: 00000000013ecbd3
+[  444.386120] FS:  00007f46aa42f080(0000) GS:ffff8e3e1ed40000(0000) knlGS:0000000000000000
+[  444.386124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+[  444.386129] CR2: 0000000000000020 CR3: 000000014ea84000 CR4: 0000000000350ef0
+[  444.386133] note: zpool[13932] exited with irqs disabled

Other traces

With the zpool imported read-only:

# zdb  -i  -e tank 

Dataset mos [META], ID 0, cr_txg 4, 135G, 8787 objects
Dataset tank@zfs-auto-snap_daily-2024-08-02-0321 [ZPL], ID 113372, cr_txg 20488548, 1.23T, 9197498 objects
Dataset tank@zfs-auto-snap_weekly-2024-07-15-0355 [ZPL], ID 30124, cr_txg 20182414, 1.22T, 9150957 objects
...
Dataset tank@zfs-auto-snap_monthly-2024-08-09-0329 [ZPL], ID 1402, cr_txg 20608113, 1.23T, 9227918 objects
Dataset tank@zfs-auto-snap_hourly-2024-08-25-2201 [ZPL], ID 69647, cr_txg 20891678, 1.26T, 9322271 objects
Dataset tank [ZPL], ID 54, cr_txg 1, 1.26T, 9322447 objects

    ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0
error: zfs: adding existent segment to range tree (offset=1 size=1)
Call trace:
/lib64/libzpool.so.5(libspl_backtrace+0x33)[0x7f13a150f223]
zdb(+0x15867)[0x55ad4c109867]
/lib64/libc.so.6(+0x3b590)[0x7f13a0a61590]
/lib64/libc.so.6(+0x8a1dc)[0x7f13a0ab01dc]
/lib64/libc.so.6(gsignal+0x12)[0x7f13a0a614f2]
/lib64/libc.so.6(abort+0xd3)[0x7f13a0a4a4ed]
/lib64/libzpool.so.5(+0x5a0d8)[0x7f13a125a0d8]
/lib64/libzpool.so.5(+0x5a21a)[0x7f13a125a21a]
/lib64/libzpool.so.5(zfs_panic_recover+0xbd)[0x7f13a138965d]
/lib64/libzpool.so.5(+0x15aa14)[0x7f13a135aa14]
zdb(+0x23e37)[0x55ad4c117e37]
zdb(+0xcd1a)[0x55ad4c100d1a]
/lib64/libc.so.6(+0x25c8a)[0x7f13a0a4bc8a]
/lib64/libc.so.6(__libc_start_main+0x85)[0x7f13a0a4bd45]
zdb(+0xdb61)[0x55ad4c101b61]
Aborted

#  zdb -c -b -e $(ls -d /dev/vg/*zfs* |xargs printf ' --path=%s') tank 2>&1 |tee zdb.cb.txt

loading concrete vdev 3, metaslab 126 of 127 ...error: zfs: removing nonexistent segment from range tree (offset=202a6000 size=1a000)
Call trace:
/lib64/libzpool.so.5(libspl_backtrace+0x33)[0x7f194090f223]
zdb(+0x15867)[0x563bc8180867]
/lib64/libc.so.6(+0x3b590)[0x7f193fe61590]
/lib64/libc.so.6(+0x8a1dc)[0x7f193feb01dc]
/lib64/libc.so.6(gsignal+0x12)[0x7f193fe614f2]
/lib64/libc.so.6(abort+0xd3)[0x7f193fe4a4ed]
/lib64/libzpool.so.5(+0x5a0d8)[0x7f194065a0d8]
/lib64/libzpool.so.5(+0x5a21a)[0x7f194065a21a]
/lib64/libzpool.so.5(zfs_panic_recover+0xbd)[0x7f194078965d]
/lib64/libzpool.so.5(+0x15d9f8)[0x7f194075d9f8]
zdb(+0xf9ce)[0x563bc817a9ce]
/lib64/libzpool.so.5(space_map_iterate+0x373)[0x7f1940791103]
zdb(+0x12ab5)[0x563bc817dab5]
zdb(+0x12e35)[0x563bc817de35]
zdb(+0x1a6f5)[0x563bc81856f5]
zdb(+0x2344f)[0x563bc818e44f]
zdb(+0xcd1a)[0x563bc8177d1a]
/lib64/libc.so.6(+0x25c8a)[0x7f193fe4bc8a]
/lib64/libc.so.6(__libc_start_main+0x85)[0x7f193fe4bd45]
zdb(+0xdb61)[0x563bc8178b61]
robbat2 commented 2 weeks ago

Additional debug steps taken, other notes

amotin commented 1 week ago

The panic seems to come from spa_log_summary_dirty_flushed_metaslab() being unable to find the log summary entry for the transaction group the metaslab belongs to. The messages about missing/duplicate segments make me think there may be some confusion about which spacemap logs get replayed and which do not, but it has been a while since I last looked at that code, so I have no idea what could cause it other than the mentioned non-ECC RAM, etc.

If rolling back by a few transactions does not help, then you may have to import the pool read-only to evacuate the data, since there is no other way to recover the spacemaps.
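A minimal sketch of such a read-only evacuation import, reusing the txg from the rewind attempt above (readonly=on in place of the readonly=off used earlier, so nothing is written back to the pool):

# zpool import -o readonly=on -T 20892830 tank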

robbat2 commented 1 week ago

Thanks. I did try the rollback, unsuccessfully, to the entries in the uberblock.

1. Is there some other data you'd like for bug analysis?

2. Assuming I can scrounge up another system with enough RAM: will a zfs send | zfs recv combination also be affected by the spacemap problem? I'd like to keep the snapshots.

This pool normally runs on my desktop with 128 GB of RAM and makes heavy use of dedup; I tried to recv on another handy system with 8 GB of RAM, and it just ground to a halt and OOM'd.

amotin commented 1 week ago

Assuming I can scrounge up another system with enough RAM: will a zfs send | zfs recv combination also be affected by the spacemap problem? I'd like to keep the snapshots.

You won't be able to create new snapshots on a read-only pool, but the existing ones should replicate just fine.
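For example, replicating one of the existing auto-snapshots from a read-only import to a hypothetical destination pool named backup (a sketch, not a tested recipe; -R also sends the descendant datasets and the older snapshots up to the one named):

# zfs send -R tank@zfs-auto-snap_hourly-2024-08-25-2201 | zfs recv -Fu backup/tank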

MagicRB commented 1 week ago

@robbat2 if your dedup tables can fit into 60 GB of RAM, run two VMs and send/recv between them
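One way to gauge the dedup table size before committing to that (a sketch; the first form works against the exported pool, the second once it has been imported read-only):

# zdb -DD -e tank
# zpool status -D tank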

robbat2 commented 1 week ago

The data was recovered: I did a zfs send to (two) external devices, and then did a zfs recv from one of them.

Last call: if the devs want any reproduction data from the buggy pool, let me know before I delete it outright (I'll do the deletion in a few days).

wiesl commented 1 week ago

Isn't further analysis possible, so that the pool could be brought back into a consistent state and imported correctly, and any underlying bugs could be fixed?

robbat2 commented 1 day ago

The buggy pool has now been deleted, because I needed that 3+ TB back.