openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Some extra intelligence for replace_with_spare() #12376

Open soyfrien opened 3 years ago

soyfrien commented 3 years ago

Describe the feature you would like to see added to OpenZFS

The function replace_with_spare() behaves as if all vdevs are uniform. Because OpenZFS makes pools easy to modify, it is increasingly likely that the mirrored vdevs in a pool will differ as the pool changes over time. When a pool with one mirror vdev and one spare runs out of space, it is entirely plausible that a user would add another mirrored vdev to the pool, with a much higher capacity than the original vdev, along with a matching larger spare that the diligent pool owner also supplied and registered with ZFS. Let's imagine this happens a third time, with the final vdev and spare being larger than the previous two.

replace_with_spare() seems to attempt a replacement based on decisions that are not clear to this casual reader. I assume that it will correctly refuse to put a small spare in a vdev requiring a larger one. But I also believe it may use the largest spare as a replacement in mirror_0 or mirror_1, both of which have smaller physical capacities than it, when the user expected this largest spare to be used only in the largest vdev, just as they expect the middle-sized spare to be used only in the middle-sized vdev, even though it could also satisfy the smallest vdev.

I propose we optimize the choice of spare so that the maximum number of spares remains usable after replace_with_spare() runs. If 3 spares are usable before the function runs, 2 should be usable after it, then 1 after the next replacement, and 0 only when there are no available spares left.

The exception is when such optimization is not possible. In that case it should still choose to make a replacement, for the sake of system integrity.
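To make the proposal concrete, here is a minimal sketch of a "best fit" choice in Python. It is not the OpenZFS implementation (that code is C and I have not reproduced it here); `pick_spare` and the size lists are hypothetical names used only for illustration. The idea is to pick the smallest available spare that is still at least as large as the failed disk, so larger spares stay free for larger vdevs.

```python
from typing import List, Optional


def pick_spare(spare_sizes: List[int], failed_size: int) -> Optional[int]:
    """Return the index of the smallest spare that fits, or None.

    Hypothetical helper: sizes are plain integers (e.g. bytes or TB).
    A spare "fits" when it is at least as large as the failed disk.
    """
    best = None
    for i, size in enumerate(spare_sizes):
        if size < failed_size:
            continue  # too small, can never replace this disk
        if best is None or size < spare_sizes[best]:
            best = i  # smaller fitting spare found, prefer it
    return best


# Spares of 4, 2 and 3 TB; a 2 TB disk in the smallest mirror fails.
spares_tb = [4, 2, 3]
chosen = pick_spare(spares_tb, 2)
print(chosen, spares_tb[chosen])
```

With this rule a fitting spare is always used when one exists, so the fallback "make a replacement anyway" case is covered: if only the 4 TB spare is left, it is still chosen for a failed 2 TB disk.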

How will this feature improve OpenZFS?

ZFS has a simple command line interface and is user friendly. Users are able to build simple-looking filesystems that are deceptively complex. The codebase should keep these users happy, silently making the best choices for them.

In this case, we can eliminate the chance that the wrong spare will be used in the wrong vdev.

Additional context

Example pool borrowed from Server Fault, licensed under cc by-sa. rev 2021.7.15.39754. Please use it when considering the example scenario described above.

Quote Max ZFS codebase for pool: zfs-linux (0.7.5-1ubuntu16.11)

Imagine a pool that grew unexpectedly, by adding larger mirrors in terms of physical disk capacity. Spares went in tow. New mirror, new spare. SAS Enterprise Grade on HBA. Mirror 0 is smaller than 1 and 1 is smaller than 2. Each mirror has an appropriately sized spare.


      pool: glue
     state: ONLINE
      scan: scrub repaired 0B in 27h55m with 0 errors on Mon Jul 12 04:19:14 2021
    config:

        NAME                        STATE     READ WRITE CKSUM
        glue                        ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x5000cca2a501f240  ONLINE       0     0     0
            wwn-0x5000cca2975af090  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            wwn-0x5000cca271340e4c  ONLINE       0     0     0
            wwn-0x5000cca27134c71c  ONLINE       0     0     0
          mirror-2                  ONLINE       0     0     0
            wwn-0x5000cca2972cce94  ONLINE       0     0     0
            wwn-0x5000cca298192df4  ONLINE       0     0     0
        spares
          wwn-0x5000cca2558480fc    AVAIL   
          wwn-0x5000cca2972be67c    AVAIL   
          wwn-0x5000c50083bbae43    AVAIL   

    errors: No known data errors

That's what it might look like. If automatic spares use autoreplace (sorry, I don't know the correct terminology, please edit this) and a small spare tries to mirror a disk that is larger than it, will the pool break, or is there an error we can scan for?

Or will autoreplace do checks to make sure spares join mirrors of the same or smaller size? In that case, is it possible for the largest spare to join the smallest mirror?

I'd be happy to take a look at the code if you can point me. Even more, I would love to give you an upvote and a check.

You can see the current implementation of the function here. [On Ubuntu 18.04]

Thank you for your hard work and consideration, Louis

burk80 commented 3 years ago

each vdev has a set size?? so if vdev 1 has 2 TB disks, vdev 2 has 3 TB disks and vdev 3 has 4 TB disks .. a 4 TB disk should be able to be in all 3 vdevs .. a 2 TB only in 1, and so on ..

if size >= broken disk ..
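Roughly, that rule could be sketched like this in Python. This is only an illustration of the `size >= broken disk` check, not ZFS code; `eligible_vdevs` and the vdev names are made up for the example.

```python
from typing import Dict, List


def eligible_vdevs(spare_size: int, vdev_disk_sizes: Dict[str, int]) -> List[str]:
    """List the vdevs whose disks this spare is large enough to replace."""
    return [name for name, size in vdev_disk_sizes.items() if spare_size >= size]


# Disk size per vdev in TB, as in the example above (hypothetical names).
vdevs = {"vdev1": 2, "vdev2": 3, "vdev3": 4}

for spare in (2, 3, 4):
    print(spare, "TB spare fits:", eligible_vdevs(spare, vdevs))
```

So the check alone says where a spare can go, but not which of several fitting spares should be preferred.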

soyfrien commented 3 years ago

Right, but in that scenario it would be better for a 3 TB spare to replace it than a 4 TB spare, if the spares were 2, 3 and 4 TB.