openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

NULL pointer dereference while trying to zpool remove #16786

Open pshirshov opened 1 week ago

pshirshov commented 1 week ago

System information

Type                  Version/Name
Distribution Name     NixOS
Distribution Version  24.11.20241011.a28d979 (Vicuna)
Kernel Version        6.6.56
Architecture          x86_64
OpenZFS Version       zfs-2.2.6-1, zfs-kmod-2.2.6-1

Describe the problem you're observing

I have the following pool:

❯ zpool status -v
  pool: storage-main
 state: ONLINE
  scan: scrub repaired 0B in 13:09:39 with 0 errors on Fri Nov  1 14:59:55 2024
remove: Removal of vdev 1 copied 8.33T in 12h16m, completed on Wed Nov 20 07:31:18 2024
    22.5M memory used for removed device mappings
config:

    NAME                                   STATE     READ WRITE CKSUM
    storage-main                           ONLINE       0     0     0
     mirror-0                             ONLINE       0     0     0
       virtio-ST16T-ZL267DH30000C0-part1  ONLINE       0     0     0
       virtio-WD16T-3WHSW4HP-part1        ONLINE       0     0     0
     mirror-5                             ONLINE       0     0     0
       virtio-ST24-ZYD1GEJA               ONLINE       0     0     0
       virtio-WDC24-65JWH7WB              ONLINE       0     0     0
     mirror-6                             ONLINE       0     0     0
       virtio-ST24-ZYD0EMCT               ONLINE       0     0     0
       virtio-WDC24-65JWJ9JB              ONLINE       0     0     0
    logs    
     mirror-4                             ONLINE       0     0     0
       virtio-WD-2T-21210P800016-part1    ONLINE       0     0     0
       virtio-WD-2T-21210P800027-part1    ONLINE       0     0     0
    cache
     virtio-SAMSUNG-256G-S42VNF0-part1    ONLINE       0     0     0

errors: No known data errors

I'm trying to run:

 zpool remove storage-main mirror-0

The kernel immediately logs the following stack trace:

[  317.434903] BUG: kernel NULL pointer dereference, address: 0000000000000088
[  317.435083] #PF: supervisor read access in kernel mode
[  317.435174] #PF: error_code(0x0000) - not-present page
[  317.435219] PGD 80000001d797e067 P4D 80000001d797e067 PUD 1d7900067 PMD 0
[  317.435268] Oops: 0000 [#1] PREEMPT SMP PTI
[  317.435313] CPU: 1 PID: 5543 Comm: zpool Tainted: P           O       6.6.56 #1-NixOS
[  317.435372] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  317.435435] RIP: 0010:vdev_passivate+0x10c/0x190 [zfs]
[  317.436545] Code: 00 00 00 31 d2 eb 09 48 83 c2 01 48 39 fa 74 3a 49 8b 0c d0 48 39 cb 74 ee 48 81 79 60 60 88 a0 c0 74 e4 48 8b b1 98 2b 00 00 <48> 3b 86 88 00 00 00 75 d4 48 83 b9 d0 2c 00 00 00 0f 84 17 ff ff
[  317.436710] RSP: 0018:ffffb5850ae2fd70 EFLAGS: 00010202
[  317.436821] RAX: ffff99955ebdf400 RBX: ffff9995564ac000 RCX: ffff9995563c8000
[  317.436903] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000007
[  317.436952] RBP: ffff99955afc4000 R08: ffff9995456d0500 R09: 0000000000000000
[  317.437005] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb5850ae2fdd0
[  317.437052] R13: ffff999558a9e800 R14: ffff9995567ac000 R15: ffff9995d0391880
[  317.437091] FS:  00007fe4a6cc17c0(0000) GS:ffff999c9fa80000(0000) knlGS:0000000000000000
[  317.437142] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  317.437173] CR2: 0000000000000088 CR3: 00000001ce6b8001 CR4: 0000000000770ee0
[  317.437223] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  317.437256] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  317.437291] PKRU: 55555554
[  317.437310] Call Trace:
[  317.437335]  <TASK>
[  317.437366]  ? __die+0x23/0x80
[  317.437429]  ? page_fault_oops+0x171/0x500
[  317.437471]  ? __schedule+0x404/0x1430
[  317.437525]  ? exc_page_fault+0x71/0x160
[  317.437556]  ? asm_exc_page_fault+0x26/0x30
[  317.437613]  ? vdev_passivate+0x10c/0x190 [zfs]
[  317.438651]  spa_vdev_remove+0x7f6/0x9c0 [zfs]
[  317.439236]  ? spa_open_common+0x27f/0x440 [zfs]
[  317.439837]  zfs_ioc_vdev_remove+0x5b/0xa0 [zfs]
[  317.440408]  zfsdev_ioctl_common+0x878/0x9b0 [zfs]
[  317.440975]  ? kvmalloc_node+0x43/0xe0
[  317.441039]  zfsdev_ioctl+0x53/0xe0 [zfs]
[  317.441565]  __x64_sys_ioctl+0x9c/0xe0
[  317.441625]  do_syscall_64+0x39/0x90
[  317.441684]  entry_SYSCALL_64_after_hwframe+0x78/0xe2
[  317.441733] RIP: 0033:0x7fe4a739b79f
[  317.441889] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 28 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  317.442965] RSP: 002b:00007fff7de93ca0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  317.443856] RAX: ffffffffffffffda RBX: 0000000012c96670 RCX: 00007fe4a739b79f
[  317.444743] RDX: 00007fff7de94110 RSI: 0000000000005a0c RDI: 0000000000000003
[  317.445623] RBP: 00007fff7de976f0 R08: 0000000000000000 R09: 0000000000000000
[  317.446500] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000012c991e0
[  317.447377] R13: 00007fff7de94110 R14: 00007fff7de93d10 R15: 0000000012c8f4e0
[  317.448079]  </TASK>
[  317.448601] Modules linked in: xt_MASQUERADE xt_mark nft_chain_nat nf_nat veth af_packet skx_edac_common nfit edac_core libnvdimm cfg80211 cbc encrypted_keys trusted asn1_encoder tee tpm rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_common kvm_intel kvm snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi irqbypass crc32_pclmul iTCO_wdt snd_hda_codec polyval_clmulni polyval_generic intel_pmc_bxt gf128mul ghash_clmulni_intel watchdog sha512_ssse3 snd_hda_core sha256_ssse3 sha1_ssse3 aesni_intel snd_hwdep crypto_simd snd_pcm cryptd xt_comment snd_timer rapl i2c_i801 joydev snd i2c_smbus psmouse soundcore lpc_ich mousedev tiny_power_button evdev intel_agp qxl intel_gtt xt_conntrack drm_ttm_helper ttm button input_leds led_class mac_hid nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 serio_raw ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat nf_tables sch_fq_codel loop cpufreq_powersave tun tap macvlan bridge stp llc fuse efi_pstore
[  317.449043]  configfs nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 hid_generic usbhid hid sr_mod cdrom virtio_net virtio_scsi net_failover failover virtio_blk ahci libahci xhci_pci xhci_pci_renesas libata atkbd libps2 xhci_hcd vivaldi_fmap scsi_mod crct10dif_pclmul crct10dif_common scsi_common i8042 virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev rtc_cmos serio dm_mod dax btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq zfs(PO) spl(O) virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring
[  317.457518] CR2: 0000000000000088
[  317.458076] ---[ end trace 0000000000000000 ]---

I don't have a reproducer for this; it only happens on this live pool. A previous removal of another mirror completed successfully.

pshirshov commented 1 week ago

Potentially relevant: https://github.com/openzfs/zfs/issues/13552

It seems that:

1) the problem is the presence of the indirect-1 vdev, which holds the mappings for the previously evacuated mirror;
2) this is a bug: OpenZFS tries to treat the indirect vdev as a regular one in https://github.com/openzfs/zfs/blob/534688948c395619af328c60ba3b863bfcf2ef20/module/zfs/vdev_removal.c#L192
3) once a mirror has been removed from a pool, it is no longer possible to remove another one, and the pool has to be recreated.