openzfs / zfs

zpool cannot remove vdev #14312

gchmurka123 opened this issue 1 year ago

zfs-2.1.4-1 zfs-kmod-2.1.5-1

When I try to remove the mirror-1 vdev, I get this error: cannot remove mirror-1: invalid config; all top-level vdevs must have the same sector size and not be raidz.

How to reproduce:

  1. Create a pool with one mirror (two hard-drive partitions), default ashift:
# zpool create vservers mirror  wwn-0x5000cca097c66efd-part3 wwn-0x5000cca097c254b8-part3

# zpool status
  pool: vservers
 state: ONLINE
config:

        NAME                              STATE     READ WRITE CKSUM
        vservers                          ONLINE       0     0     0
          mirror-0                        ONLINE       0     0     0
            wwn-0x5000cca097c66efd-part3  ONLINE       0     0     0
            wwn-0x5000cca097c254b8-part3  ONLINE       0     0     0

errors: No known data errors

# zpool list 
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
vservers  3.61T  3.25G  3.61T        -         -     0%     0%  1.00x    ONLINE  -
  2. Add a mirror-1 consisting of two files (no ashift set, i.e. default):
# dd if=/dev/zero of=/1 bs=1M count=1000
# dd if=/dev/zero of=/2 bs=1M count=1000

# zpool add vservers mirror /1 /2

# zpool status
  pool: vservers
 state: ONLINE
  scan: resilvered 16.0M in 00:00:01 with 0 errors on Thu Dec 22 10:39:07 2022
config:

        NAME                              STATE     READ WRITE CKSUM
        vservers                          ONLINE       0     0     0
          mirror-0                        ONLINE       0     0     0
            wwn-0x5000cca097c66efd-part3  ONLINE       0     0     0
            wwn-0x5000cca097c254b8-part3  ONLINE       0     0     0
          mirror-1                        ONLINE       0     0     0
            /1                            ONLINE       0     0     0
            /2                            ONLINE       0     0     0

errors: No known data errors
  3. Try to remove mirror-1:
# zpool remove vservers mirror-1
cannot remove mirror-1: invalid config; all top-level vdevs must have the same sector size and not be raidz.
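
(For comparison, and not part of the original report: the removal succeeds when the added vdev's ashift matches the rest of the pool. A minimal sketch with throwaway file vdevs; the pool name and file paths are illustrative only:)

# truncate -s 1G /tmp/d1 /tmp/d2 /tmp/d3 /tmp/d4
# zpool create -o ashift=12 testpool mirror /tmp/d1 /tmp/d2
# zpool add -o ashift=12 testpool mirror /tmp/d3 /tmp/d4
# zpool remove testpool mirror-1   # succeeds: both top-level vdevs are ashift=12
# zpool destroy testpool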

My disk has:

        Logical  Sector size:                   512 bytes
        Physical Sector size:                  4096 bytes

My pool allocation:

# zpool list 
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
vservers  3.61T  3.25G  3.61T        -         -     0%     0%  1.00x    ONLINE  -

My pool ashift:

# zpool get ashift vservers 
NAME      PROPERTY  VALUE   SOURCE
vservers  ashift    0       default
gmelikov commented 1 year ago

Please show zdb -C output; it will show the actual ashifts.

gchmurka123 commented 1 year ago
# zdb -C

    vdev_children: 2
    vdev_tree:
        type: 'root'
        id: 0
        children[0]:
            type: 'mirror'
            ashift: 12
            children[0]:
                type: 'disk'
                path: '/dev/disk/by-id/wwn-0x5000cca097c66efd-part3'
            children[1]:
                type: 'disk'
                path: '/dev/disk/by-id/wwn-0x5000cca097c254b8-part3'
        children[1]:
            type: 'mirror'
            id: 1
            ashift: 9
            children[0]:
                type: 'file'
                path: '/1'
            children[1]:
                type: 'file'
                path: '/2'

The mirror-1 vdev was created on files stored on an ext4 filesystem (default block size 4k); I don't understand why zpool used ashift=9.

The expected behavior would be that, when adding a vdev to a pool, zpool automatically matches the pool's best ashift. Only when the user explicitly forces an ashift (zpool add -o ashift=x poolname devices) should zpool use a different one.
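
(Until the behavior changes, a practical workaround is to force a matching ashift at add time; a sketch assuming the pool ashift of 12 shown in the zdb output above:)

# zpool add -o ashift=12 vservers mirror /1 /2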

gmelikov commented 1 year ago

zpool add will use the best ashift per disk now, not per pool. So that's the reason; unfortunately it works as expected now (zpool remove itself does, at least).
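
(A quick way to list just the vdev types and ashifts, assuming the pool name from this report:)

# zdb -C vservers | grep -E 'type|ashift'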

ghost commented 1 year ago

The current approach blocks removal of a vdev if the ashift differs, which means the user will have to rebuild the whole pool, I think. "zpool add" should at least warn before adding a vdev that would cause a mixed-ashift scenario.

yshui commented 1 year ago

I am having this problem right now. It seems illogical to allow mixed ashift when adding a disk, then complain about it when trying to remove.

Since a mixed-ashift pool apparently works, is there just a check that needs to be removed from zpool-remove? Or is it actually not possible to remove a vdev from a mixed pool?

EDIT: I see... the removal operation needs to keep a block-level mapping table for the removed vdev. I assume that's why it requires all vdevs to have the same ashift. And it doesn't support removal of raidz vdevs either.

Seems like a big trap for an inexperienced user (like me) to fall into.
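
(For what it's worth, zpool remove has a no-op flag that prints the estimated memory that mapping table would need, without actually removing anything; a sketch with a hypothetical pool named tank where removal is allowed:)

# zpool remove -n tank mirror-1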

samvv commented 10 months ago

Well, this was awkward... I added the wrong device to my zpool, and suddenly I have to rebuild my entire pool from scratch because of this bug.

I really appreciate the time and effort that is put into this project, but I really hope this bug can be fixed.

AllKind commented 10 months ago

It's in the works: https://github.com/openzfs/zfs/pull/15509

XANi commented 5 months ago

We've had this happen without the user manually setting ashift, just by adding a hard disk.

So it is unfixable and requires re-creating everything?

"zpool add will use the best ashift per disk now, not per pool."

Why does it work like that? It's a massive footgun. We just had a junior admin follow the usual instructions with a replacement drive that got ashift 12 instead of the rest of the pool's 9, and now it is unfixable.
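
(One way to guard against this today: check the drive's sector sizes and the pool's actual ashift before adding, then pin the ashift explicitly. A sketch with placeholder pool and device names:)

# lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sdX
# zdb -C tank | grep ashift
# zpool add -o ashift=9 tank /dev/sdX

Note that forcing ashift=9 on a drive with 4K physical sectors costs write performance, and it is impossible on a drive whose logical sector size is 4K.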

tkittich commented 1 month ago

Hi,

Not sure if I have the same problem: my pool seems to have the same ashift everywhere, but I also couldn't remove a mirror special vdev, and I got the same error message.

root@h00:~# zpool --version
zfs-2.2.4-pve1
zfs-kmod-2.2.4-pve1
root@h00:~# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 15:46:08 with 0 errors on Sun Jul 14 16:10:09 2024
config:

        NAME                                                    STATE     READ WRITE CKSUM
        rpool                                                   ONLINE       0     0     0
          raidz2-0                                              ONLINE       0     0     0
            ata-WDC_WD80EFBX-68AZZN0_VRGP4JMK-part3             ONLINE       0     0     0
            ata-WDC_WD80EFBX-68AZZN0_VRGTD3EK-part3             ONLINE       0     0     0
            ata-WDC_WD80EFBX-68AZZN0_VRGSUW4K-part3             ONLINE       0     0     0
            ata-WDC_WD80EFBX-68AZZN0_VRGTG4SK-part3             ONLINE       0     0     0
            ata-WDC_WD80EFAX-68KNBN0_VDGZ7E4D-part3             ONLINE       0     0     0
            ata-WDC_WD80EFAX-68KNBN0_VAGV3BYL-part3             ONLINE       0     0     0
        special
          mirror-8                                              ONLINE       0     0     0
            ata-SPCC_Solid_State_Disk_AA230719S325605518-part2  ONLINE       0     0     0
            ata-ZADAK_TWSS3_256GB_2021020109006274-part2        ONLINE       0     0     0
        logs
          mirror-7                                              ONLINE       0     0     0
            ata-SPCC_Solid_State_Disk_AA230719S325605518-part1  ONLINE       0     0     0
            ata-ZADAK_TWSS3_256GB_2021020109006274-part1        ONLINE       0     0     0
        cache
          ata-SPCC_Solid_State_Disk_AA230719S325605518-part4    ONLINE       0     0     0
          ata-ZADAK_TWSS3_256GB_2021020109006274-part4          ONLINE       0     0     0

errors: No known data errors

Got an error removing a mirror special vdev:

root@h00:~# zpool remove rpool mirror-8
cannot remove mirror-8: invalid config; all top-level vdevs must have the same sector size and not be raidz.

zdb -C output:

root@h00:~# zdb -C
rpool:
    version: 5000
    name: 'rpool'
    state: 0
    txg: 9472559
    pool_guid: 14857977243201417359
    errata: 0
    hostid: 3568171833
    hostname: 'h00.xyz'
    com.delphix:has_per_vdev_zaps
    hole_array[0]: 1
    hole_array[1]: 2
    hole_array[2]: 3
    hole_array[3]: 4
    hole_array[4]: 5
    hole_array[5]: 6
    vdev_children: 9
    vdev_tree:
        type: 'root'
        id: 0
        guid: 14857977243201417359
        create_txg: 4
        com.klarasystems:vdev_zap_root: 138
        children[0]:
            type: 'raidz'
            id: 0
            guid: 10924420287307297835
            nparity: 2
            metaslab_array: 136
            metaslab_shift: 34
            ashift: 12
            asize: 47999446155264
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 129
            children[0]:
                type: 'disk'
                id: 0
                guid: 13796952775983555730
                path: '/dev/disk/by-id/ata-WDC_WD80EFBX-68AZZN0_VRGP4JMK-part3'
                whole_disk: 0
                DTL: 8372
                create_txg: 4
                com.delphix:vdev_zap_leaf: 130
            children[1]:
                type: 'disk'
                id: 1
                guid: 11821346028286358251
                path: '/dev/disk/by-id/ata-WDC_WD80EFBX-68AZZN0_VRGTD3EK-part3'
                whole_disk: 0
                DTL: 8371
                create_txg: 4
                com.delphix:vdev_zap_leaf: 131
            children[2]:
                type: 'disk'
                id: 2
                guid: 12605929507524345983
                path: '/dev/disk/by-id/ata-WDC_WD80EFBX-68AZZN0_VRGSUW4K-part3'
                whole_disk: 0
                DTL: 8370
                create_txg: 4
                com.delphix:vdev_zap_leaf: 132
            children[3]:
                type: 'disk'
                id: 3
                guid: 3289633560704325600
                path: '/dev/disk/by-id/ata-WDC_WD80EFBX-68AZZN0_VRGTG4SK-part3'
                whole_disk: 0
                DTL: 8369
                create_txg: 4
                com.delphix:vdev_zap_leaf: 133
            children[4]:
                type: 'disk'
                id: 4
                guid: 2939265759442187858
                path: '/dev/disk/by-id/ata-WDC_WD80EFAX-68KNBN0_VDGZ7E4D-part3'
                whole_disk: 0
                DTL: 8368
                create_txg: 4
                com.delphix:vdev_zap_leaf: 134
            children[5]:
                type: 'disk'
                id: 5
                guid: 13901960047013718557
                path: '/dev/disk/by-id/ata-WDC_WD80EFAX-68KNBN0_VAGV3BYL-part3'
                whole_disk: 0
                DTL: 8367
                create_txg: 4
                com.delphix:vdev_zap_leaf: 135
        children[1]:
            type: 'hole'
            id: 1
            guid: 0
            whole_disk: 0
            metaslab_array: 0
            metaslab_shift: 0
            ashift: 0
            asize: 0
            is_log: 0
            is_hole: 1
        children[2]:
            type: 'hole'
            id: 2
            guid: 0
            whole_disk: 0
            metaslab_array: 0
            metaslab_shift: 0
            ashift: 0
            asize: 0
            is_log: 0
            is_hole: 1
        children[3]:
            type: 'hole'
            id: 3
            guid: 0
            whole_disk: 0
            metaslab_array: 0
            metaslab_shift: 0
            ashift: 0
            asize: 0
            is_log: 0
            is_hole: 1
        children[4]:
            type: 'hole'
            id: 4
            guid: 0
            whole_disk: 0
            metaslab_array: 0
            metaslab_shift: 0
            ashift: 0
            asize: 0
            is_log: 0
            is_hole: 1
        children[5]:
            type: 'hole'
            id: 5
            guid: 0
            whole_disk: 0
            metaslab_array: 0
            metaslab_shift: 0
            ashift: 0
            asize: 0
            is_log: 0
            is_hole: 1
        children[6]:
            type: 'hole'
            id: 6
            guid: 0
            whole_disk: 0
            metaslab_array: 0
            metaslab_shift: 0
            ashift: 0
            asize: 0
            is_log: 0
            is_hole: 1
        children[7]:
            type: 'mirror'
            id: 7
            guid: 870147496990295746
            metaslab_array: 1409
            metaslab_shift: 29
            ashift: 12
            asize: 8585216000
            is_log: 1
            create_txg: 9472535
            com.delphix:vdev_zap_top: 777
            children[0]:
                type: 'disk'
                id: 0
                guid: 11348972443591950374
                path: '/dev/disk/by-id/ata-SPCC_Solid_State_Disk_AA230719S325605518-part1'
                whole_disk: 0
                create_txg: 9472535
                com.delphix:vdev_zap_leaf: 778
            children[1]:
                type: 'disk'
                id: 1
                guid: 5765583175076336363
                path: '/dev/disk/by-id/ata-ZADAK_TWSS3_256GB_2021020109006274-part1'
                whole_disk: 0
                create_txg: 9472535
                com.delphix:vdev_zap_leaf: 779
        children[8]:
            type: 'mirror'
            id: 8
            guid: 3352077960722362674
            metaslab_array: 394
            metaslab_shift: 29
            ashift: 12
            asize: 25765085184
            is_log: 0
            create_txg: 9472546
            com.delphix:vdev_zap_top: 899
            children[0]:
                type: 'disk'
                id: 0
                guid: 15807543372832174255
                path: '/dev/disk/by-id/ata-SPCC_Solid_State_Disk_AA230719S325605518-part2'
                whole_disk: 0
                create_txg: 9472546
                com.delphix:vdev_zap_leaf: 900
            children[1]:
                type: 'disk'
                id: 1
                guid: 9446007039320661796
                path: '/dev/disk/by-id/ata-ZADAK_TWSS3_256GB_2021020109006274-part2'
                whole_disk: 0
                create_txg: 9472546
                com.delphix:vdev_zap_leaf: 901
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
amotin commented 1 month ago

@tkittich You were told "all top-level vdevs ... and not be raidz". Your pool has a raidz2 vdev; device removal is not supported while the pool's primary storage includes a top-level raidz vdev.