openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Kernel OOps when removing a device from the pool #16021

Open lordrasmus opened 6 months ago

lordrasmus commented 6 months ago

System information

Type                  Version/Name
Distribution Name     Proxmox (Debian)
Distribution Version  8.1.5 (12)
Kernel Version        6.5.13-3-pve
Architecture          x86_64
OpenZFS Version       2.2.3

Describe the problem you're observing

Removing a device from the pool leads to a kernel Oops:

BUG: kernel NULL pointer dereference, address: 0000000000000088

Describe how to reproduce the problem

zpool remove zfs-pool wwn-0x50014ee6052e6cf1

Include any warning/errors/backtraces from the system logs

I added a printk to find out which pointer is NULL:

module/zfs/vdev_removal.c -> vdev_passivate()

                for (uint64_t id = 0; id < rvd->vdev_children; id++) {
                        vdev_t *cvd = rvd->vdev_child[id];
+#ifdef __KERNEL__
+                       printk(KERN_EMERG"rvd->vdev_child[%llu] %px\n",id,rvd->vdev_child[id]);
+                       printk(KERN_EMERG"   cvd->vdev_mg           %px\n", cvd->vdev_mg);
+                       printk(KERN_EMERG"   cvd->vdev_mg->mg_class %px\n", cvd->vdev_mg->mg_class);
+#endif

And here is the output:

[ 44.868315] rvd->vdev_child[0] ffff98197f054000
[ 44.868325]    cvd->vdev_mg           0000000000000000

So this is the line that crashes:

                        metaslab_class_t *mc = cvd->vdev_mg->mg_class;

I guess it's a bug that occurs when indirect-0 exists in the pool?

zpool status

  pool: zfs-pool
 state: ONLINE
  scan: scrub canceled on Sat Mar 23 00:42:29 2024
remove: Removal of vdev 0 copied 2.60T in 4h49m, completed on Tue Nov 15 22:06:15 2022
        16.3M memory used for removed device mappings
config:

        NAME                                              STATE     READ WRITE CKSUM
        zfs-pool                                          ONLINE       0     0     0
          wwn-0x50014ee6052e6cf1                          ONLINE       0     0     0
          wwn-0x50014ee6afd86888                          ONLINE       0     0     0
          nvme-Samsung_SSD_970_EVO_1TB_S5H9NS0NC03793P_1  ONLINE       0     0     0
          nvme-eui.e8238fa6bf530001001b448b49f8611a       ONLINE       0     0     0
          wwn-0x5002538e908057d6                          ONLINE       0     0     0

zpool list -v

NAME                                               SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zfs-pool                                          7.70T  2.89T  4.81T        -         -    17%    37%  1.00x    ONLINE  -
  indirect-0                                          -      -      -        -         -      -      -      -    ONLINE
  wwn-0x50014ee6052e6cf1                          2.73T  1.41T  1.31T        -         -    25%  52.0%      -    ONLINE
  wwn-0x50014ee6afd86888                          2.73T  1.48T  1.24T        -         -    24%  54.3%      -    ONLINE
  nvme-Samsung_SSD_970_EVO_1TB_S5H9NS0NC03793P_1   932G   148M   928G        -         -     0%  0.01%      -    ONLINE
  nvme-eui.e8238fa6bf530001001b448b49f8611a        466G   113M   464G        -         -     0%  0.02%      -    ONLINE
  wwn-0x5002538e908057d6                           932G  93.0M   928G        -         -     0%  0.00%      -    ONLINE
[  303.646291] BUG: kernel NULL pointer dereference, address: 0000000000000088
[  303.646302] #PF: supervisor read access in kernel mode
[  303.646311] #PF: error_code(0x0000) - not-present page
[  303.646319] PGD 0 P4D 0
[  303.646326] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  303.646334] CPU: 2 PID: 8076 Comm: zpool Tainted: P           OE  6.5.13-3-pve #1
[  303.646346] Hardware name: Gigabyte Technology Co., Ltd. X470 AORUS ULTRA GAMING/X470 AORUS ULTRA GAMING-CF, BIOS F64a 02/09/2023
[  303.646360] RIP: 0010:vdev_passivate+0x11f/0x1b0 [zfs]
[  303.646472] Code: 00 00 00 31 d2 eb 09 48 83 c2 01 48 39 d7 74 3a 49 8b 0c d0 49 39 ce 74 ee 48 81 79 60 c0 f7 05 c1 74 e4 48 8b b1 98 2b 00>
[  303.646490] RSP: 0018:ffffa74f1f4afc90 EFLAGS: 00010202
[  303.646498] RAX: ffff951bc076f000 RBX: ffff951b8f7f8000 RCX: ffff951b52208000
[  303.646508] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000007
[  303.646518] RBP: ffffa74f1f4afcb8 R08: ffff951bbc9f5100 R09: 0000000000000000
[  303.646528] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa74f1f4afd00
[  303.646539] R13: ffff951bc0e75c00 R14: ffff951bbf1bc000 R15: ffff951bbf710000
[  303.646549] FS:  000070b92ec83800(0000) GS:ffff952a3ea80000(0000) knlGS:0000000000000000
[  303.646560] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  303.646569] CR2: 0000000000000088 CR3: 000000038fc0a000 CR4: 0000000000750ee0
[  303.646579] PKRU: 55555554
[  303.646585] Call Trace:
[  303.646591]  <TASK>
[  303.646597]  ? show_regs+0x6d/0x80
[  303.646607]  ? __die+0x24/0x80
[  303.646615]  ? page_fault_oops+0x176/0x500
[  303.646626]  ? do_user_addr_fault+0x31d/0x6a0
[  303.646635]  ? srso_alias_return_thunk+0x5/0x7f
[  303.646645]  ? exc_page_fault+0x83/0x1b0
[  303.646654]  ? asm_exc_page_fault+0x27/0x30
[  303.646665]  ? vdev_passivate+0x11f/0x1b0 [zfs]
[  303.646762]  spa_vdev_remove+0x7f9/0x9b0 [zfs]
[  303.646855]  ? spa_open_common+0x27f/0x450 [zfs]
[  303.646957]  zfs_ioc_vdev_remove+0x5e/0xb0 [zfs]
[  303.647052]  zfsdev_ioctl_common+0x8e1/0xa20 [zfs]
[  303.647144]  ? srso_alias_return_thunk+0x5/0x7f
[  303.647152]  ? srso_alias_return_thunk+0x5/0x7f
[  303.647160]  ? __check_object_size+0x9d/0x300
[  303.647171]  zfsdev_ioctl+0x57/0xf0 [zfs]
[  303.647553]  __x64_sys_ioctl+0xa3/0xf0
[  303.647828]  do_syscall_64+0x5b/0x90
[  303.648096]  ? handle_mm_fault+0xad/0x360
[  303.648362]  ? srso_alias_return_thunk+0x5/0x7f
[  303.648622]  ? exit_to_user_mode_prepare+0x39/0x190
[  303.648883]  ? srso_alias_return_thunk+0x5/0x7f
[  303.649143]  ? irqentry_exit_to_user_mode+0x17/0x20
[  303.649403]  ? srso_alias_return_thunk+0x5/0x7f
[  303.649664]  ? irqentry_exit+0x43/0x50
[  303.649927]  ? srso_alias_return_thunk+0x5/0x7f
[  303.650184]  ? exc_page_fault+0x94/0x1b0
[  303.650439]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  303.650694] RIP: 0033:0x70b92f3eec5b
[  303.650964] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f>
[  303.651243] RSP: 002b:00007fff3b88fc20 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  303.651532] RAX: ffffffffffffffda RBX: 0000647f39611830 RCX: 000070b92f3eec5b
[  303.651822] RDX: 00007fff3b88fc90 RSI: 0000000000005a0c RDI: 0000000000000003
[  303.652113] RBP: 00007fff3b893680 R08: 0000000000041000 R09: 0000000000000000
[  303.652406] R10: 0000000000000000 R11: 0000000000000246 R12: 0000647f3960b2c0
[  303.652700] R13: 00007fff3b88fc90 R14: 00007fff3b893240 R15: 0000647f396064e0
[  303.652998]  </TASK>
[  303.653294] Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sc>
[  303.653363]  blake2b_generic xor hid_cherry hid_generic usbkbd usbhid hid raid6_pq simplefb dm_cache_smq dm_cache dm_persistent_data dm_bio_>
[  303.655829] CR2: 0000000000000088
[  303.656207] ---[ end trace 0000000000000000 ]---
[  303.656582] RIP: 0010:vdev_passivate+0x11f/0x1b0 [zfs]
[  303.657049] Code: 00 00 00 31 d2 eb 09 48 83 c2 01 48 39 d7 74 3a 49 8b 0c d0 49 39 ce 74 ee 48 81 79 60 c0 f7 05 c1 74 e4 48 8b b1 98 2b 00>
[  303.657445] RSP: 0018:ffffa74f1f4afc90 EFLAGS: 00010202
[  303.657843] RAX: ffff951bc076f000 RBX: ffff951b8f7f8000 RCX: ffff951b52208000
[  303.658244] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000007
[  303.658651] RBP: ffffa74f1f4afcb8 R08: ffff951bbc9f5100 R09: 0000000000000000
[  303.659048] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa74f1f4afd00
[  303.659444] R13: ffff951bc0e75c00 R14: ffff951bbf1bc000 R15: ffff951bbf710000
[  303.659826] FS:  000070b92ec83800(0000) GS:ffff952a3ea80000(0000) knlGS:0000000000000000
[  303.660200] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  303.660570] CR2: 0000000000000088 CR3: 000000038fc0a000 CR4: 0000000000750ee0
[  303.660942] PKRU: 55555554
flisk commented 5 months ago

I think I'm seeing the same problem on one of my systems. The conditions are roughly the same: I'm trying to remove a vdev from a pool that contains an indirect vdev (two of them, technically), and running zpool remove produces this Oops:

[ 1518.859606] BUG: kernel NULL pointer dereference, address: 0000000000000088
[ 1518.944079] #PF: supervisor read access in kernel mode
[ 1519.006652] #PF: error_code(0x0000) - not-present page
[ 1519.069215] PGD 0 P4D 0
[ 1519.100558] Oops: 0000 [#1] PREEMPT SMP PTI
[ 1519.151659] CPU: 1 PID: 9607 Comm: zpool Tainted: P           O       6.5.13-5-pve #1
[ 1519.246438] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 2.18.2 10/18/2023
[ 1519.338099] RIP: 0010:vdev_passivate+0x113/0x1a0 [zfs]
[ 1519.401100] Code: 00 00 00 31 d2 eb 09 48 83 c2 01 48 39 d7 74 3a 49 8b 0c d0 49 39 ce 74 ee 48 81 79 60 c0 17 b0 c0 74 e4 48 8b b1 98 2b 00 00 <48> 3b 86 88 00 00 00 75 d4 48 83 b9 d0 2c 00 00 00 0f 84 15 ff ff
[ 1519.628019] RSP: 0018:ffffc328cb37fc78 EFLAGS: 00010202
[ 1519.691612] RAX: ffff9ead402e7c00 RBX: ffff9ead558f4000 RCX: ffff9ead62f8c000
[ 1519.778085] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000005
[ 1519.864555] RBP: ffffc328cb37fca0 R08: ffff9eb156f24f00 R09: 0000000000000000
[ 1519.951014] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc328cb37fce8
[ 1520.037464] R13: ffff9ead63b84800 R14: ffff9ead62f7c000 R15: ffff9ead62f74000
[ 1520.123923] FS:  0000770e9a2c9800(0000) GS:ffff9ecc3fc40000(0000) knlGS:0000000000000000
[ 1520.221813] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1520.291603] CR2: 0000000000000088 CR3: 0000000bc9d20006 CR4: 00000000001706e0
[ 1520.378031] Call Trace:
[ 1520.408277]  <TASK>
[ 1520.434348]  ? show_regs+0x6d/0x80
[ 1520.476010]  ? __die+0x24/0x80
[ 1520.513495]  ? page_fault_oops+0x176/0x500
[ 1520.563455]  ? vdev_passivate+0x113/0x1a0 [zfs]
[ 1520.619061]  ? kernelmode_fixup_or_oops+0xb2/0x140
[ 1520.677315]  ? __bad_area_nosemaphore+0x1a5/0x280
[ 1520.734521]  ? bad_area_nosemaphore+0x16/0x30
[ 1520.787547]  ? do_user_addr_fault+0x2c4/0x6a0
[ 1520.840567]  ? exc_page_fault+0x83/0x1b0
[ 1520.888367]  ? asm_exc_page_fault+0x27/0x30
[ 1520.939283]  ? vdev_passivate+0x113/0x1a0 [zfs]
[ 1520.994773]  ? vdev_passivate+0x32/0x1a0 [zfs]
[ 1521.049202]  spa_vdev_remove+0x7f9/0x9b0 [zfs]
[ 1521.103609]  ? spa_open_common+0x27f/0x450 [zfs]
[ 1521.160116]  zfs_ioc_vdev_remove+0x5e/0xb0 [zfs]
[ 1521.216574]  zfsdev_ioctl_common+0x8e1/0xa20 [zfs]
[ 1521.275153]  ? __check_object_size+0x9d/0x300
[ 1521.328100]  zfsdev_ioctl+0x57/0xf0 [zfs]
[ 1521.377275]  __x64_sys_ioctl+0xa3/0xf0
[ 1521.422909]  do_syscall_64+0x5b/0x90
[ 1521.466447]  ? exit_to_user_mode_prepare+0x39/0x190
[ 1521.525576]  ? syscall_exit_to_user_mode+0x37/0x60
[ 1521.583660]  ? do_syscall_64+0x67/0x90
[ 1521.629240]  ? exc_page_fault+0x94/0x1b0
[ 1521.676887]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1521.738044] RIP: 0033:0x770e9aa33c5b
[ 1521.781533] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 1522.007770] RSP: 002b:00007ffe41dfd270 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1522.099122] RAX: ffffffffffffffda RBX: 00005b3088957830 RCX: 0000770e9aa33c5b
[ 1522.185273] RDX: 00007ffe41dfd2e0 RSI: 0000000000005a0c RDI: 0000000000000003
[ 1522.271427] RBP: 00007ffe41e00cd0 R08: 0000000000000000 R09: 0000000000000000
[ 1522.357574] R10: 0000000000000000 R11: 0000000000000246 R12: 00005b3088951298
[ 1522.443720] R13: 00007ffe41dfd2e0 R14: 00007ffe41e00890 R15: 00005b308894c4e0
[ 1522.529864]  </TASK>
[ 1522.556688] Modules linked in: ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel 8021q garp mrp softdog msr sunrpc binfmt_misc nfnetlink_log bonding tls dm_crypt intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel dell_wmi sha256_ssse3 dell_smbios sha1_ssse3 aesni_intel ipmi_ssif dell_wmi_descriptor crypto_simd cryptd ledtrig_audio sparse_keymap mgag200 video drm_shmem_helper ipmi_si rapl drm_kms_helper dcdbas input_leds joydev mei_me ipmi_devintf pcspkr i2c_algo_bit intel_cstate mei mxm_wmi mac_hid ipmi_msghandler acpi_power_meter nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_syslog nft_log nft_ct nft_redir vhost_net vhost vhost_iotlb tap nft_chain_nat nf_nat nf_conntrack nbd nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink efi_pstore drm dmi_sysfs ip_tables x_tables autofs4 zfs(PO)
[ 1522.556842]  spl(O) raid10 raid456 hid_generic async_raid6_recov async_memcpy async_pq usbkbd usbmouse async_xor async_tx usbhid xor hid raid6_pq libcrc32c raid0 multipath linear simplefb raid1 xhci_pci xhci_pci_renesas ehci_pci tg3 lpc_ich crc32_pclmul xhci_hcd ehci_hcd ahci libahci megaraid_sas wmi
[ 1523.960768] CR2: 0000000000000088
[ 1524.001324] ---[ end trace 0000000000000000 ]---
[ 1524.133180] RIP: 0010:vdev_passivate+0x113/0x1a0 [zfs]
[ 1524.196050] Code: 00 00 00 31 d2 eb 09 48 83 c2 01 48 39 d7 74 3a 49 8b 0c d0 49 39 ce 74 ee 48 81 79 60 c0 17 b0 c0 74 e4 48 8b b1 98 2b 00 00 <48> 3b 86 88 00 00 00 75 d4 48 83 b9 d0 2c 00 00 00 0f 84 15 ff ff
[ 1524.422737] RSP: 0018:ffffc328cb37fc78 EFLAGS: 00010202
[ 1524.486243] RAX: ffff9ead402e7c00 RBX: ffff9ead558f4000 RCX: ffff9ead62f8c000
[ 1524.572667] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000005
[ 1524.659104] RBP: ffffc328cb37fca0 R08: ffff9eb156f24f00 R09: 0000000000000000
[ 1524.745499] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc328cb37fce8
[ 1524.831830] R13: ffff9ead63b84800 R14: ffff9ead62f7c000 R15: ffff9ead62f74000
[ 1524.918282] FS:  0000770e9a2c9800(0000) GS:ffff9ecc3fc40000(0000) knlGS:0000000000000000
[ 1525.016178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1525.085972] CR2: 0000000000000088 CR3: 0000000bc9d20006 CR4: 00000000001706e0

My system is similar to the OP's: Proxmox 8.1.10, kernel 6.5.13-5-pve, OpenZFS 2.2.3.

I'm guessing that, besides waiting for a fix, my only option is to recreate the pool entirely to get rid of the indirect vdevs?