openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

vdev open fails because the label is missing or invalid #5102

Open liaoyuxiangqin opened 8 years ago

liaoyuxiangqin commented 8 years ago

Dear All, while testing spare-device replacement together with pool export and import, I ran into a problem where a vdev fails to open because its label is missing or invalid. The detailed test case is described below:

Conditions: OS: Linux A22770782_00 2.6.33.20. Simulation files: file4 (1G), file5 (1G), file6 (1G), file7 (100M).

Test: The first step: create a raidz pool named raid5 from 3 simulation files (file4 (1G), file5 (1G), file7 (100M)), then add file6 to the pool as a spare device, use spare device file6 to replace pool member device file7, and set the pool's autoexpand property to on. The pool status shows the spare device currently in use, and the SIZE property value is 272M; more detail follows (a command sketch appears after the output):

[root@A22770782_00 ~]# zpool status raid5
      pool: raid5
     state: ONLINE
      scan: resilvered 39.5K in 0h0m with 0 errors on Fri Sep  2 11:25:05 2016
    config:
            NAME                      STATE     READ WRITE CKSUM
            raid5                     ONLINE       0     0     0
              raidz1-0                ONLINE       0     0     0
                /home/wugang/file4    ONLINE       0     0     0
                /home/wugang/file5    ONLINE       0     0     0
                spare-2               ONLINE       0     0     0
                  /home/wugang/file7  ONLINE       0     0     0
                  /home/wugang/file6  ONLINE       0     0     0
            spares
              /home/wugang/file6      INUSE     currently in use
    errors: No known data errors

    [root@A22770782_00 ~]# zpool list
    NAME              SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
    raid5             272M   690K   271M         -     0%     0%  1.00x  ONLINE  -
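For reference, here is a minimal command sketch that should reproduce this first step; the original report does not show the exact invocations, so the file-creation commands and pool commands below are assumptions based on the paths and sizes listed above:

    # Create sparse backing files with the sizes from the report (assumed method)
    [root@A22770782_00 ~]# truncate -s 1G /home/wugang/file4 /home/wugang/file5 /home/wugang/file6
    [root@A22770782_00 ~]# truncate -s 100M /home/wugang/file7

    # Build the raidz pool, add the spare, start the spare replacement,
    # and turn on automatic expansion
    [root@A22770782_00 ~]# zpool create raid5 raidz /home/wugang/file4 /home/wugang/file5 /home/wugang/file7
    [root@A22770782_00 ~]# zpool add raid5 spare /home/wugang/file6
    [root@A22770782_00 ~]# zpool replace raid5 /home/wugang/file7 /home/wugang/file6
    [root@A22770782_00 ~]# zpool set autoexpand=on raid5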

The second step: move simulation file7 to the /home/ directory, then export and import the pool again. The pool state becomes degraded because member device file7 can't be opened, and the EXPANDSZ property value is 2.64G; more information follows (a command sketch appears after the output):

[root@A22770782_00 ~]# zpool status raid5
      pool: raid5
     state: DEGRADED
    status: One or more devices could not be opened.  Sufficient replicas exist for
            the pool to continue functioning in a degraded state.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://zfsonlinux.org/msg/ZFS-8000-2Q
      scan: resilvered 39.5K in 0h0m with 0 errors on Fri Sep  2 11:25:05 2016
    config:
            NAME                        STATE     READ WRITE CKSUM
            raid5                       DEGRADED     0     0     0
              raidz1-0                  DEGRADED     0     0     0
                /home/wugang/file4      ONLINE       0     0     0
                /home/wugang/file5      ONLINE       0     0     0
                spare-2                 DEGRADED     0     0     0
                  11832754904235861952  UNAVAIL      0     0     0  was /home/wugang/file7
                  /home/wugang/file6    ONLINE       0     0     0
            spares
              /home/wugang/file6        INUSE     currently in use
    errors: No known data errors

    [root@A22770782_00 ~]# zpool list
    NAME              SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
    raid5             272M   528K   271M     2.64G     2%     0%  1.00x  DEGRADED  -
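A hedged sketch of this second step; the report does not show the exact commands, so the move destination and the import search directory below are assumptions:

    # Move the backing file out of the directory the pool expects it in
    [root@A22770782_00 ~]# mv /home/wugang/file7 /home/
    # Export, then re-import by scanning the directory holding the remaining files
    [root@A22770782_00 ~]# zpool export raid5
    [root@A22770782_00 ~]# zpool import -d /home/wugang raid5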

The third step: restore simulation file7 from the /home/ directory, then run zpool clear, which reopens all vdevs and clears errors. The result is that pool member device file7 can't be opened because the label is missing or invalid, and the SIZE property value changes to 2.91G; more information follows (a command sketch appears after the output):

[root@A22770782_00 ~]# zpool clear raid5

    [root@A22770782_00 ~]# zpool status raid5
      pool: raid5
     state: DEGRADED
    status: One or more devices could not be used because the label is missing or
            invalid.  Sufficient replicas exist for the pool to continue
            functioning in a degraded state.
    action: Replace the device using 'zpool replace'.
       see: http://zfsonlinux.org/msg/ZFS-8000-4J
      scan: resilvered 11.5K in 0h0m with 0 errors on Fri Sep  2 11:27:32 2016
    config:
            NAME                    STATE     READ WRITE CKSUM
            raid5                   DEGRADED     0     0     0
              raidz1-0              DEGRADED     0     0     0
                /home/wugang/file4  ONLINE       0     0     0
                /home/wugang/file5  ONLINE       0     0     0
                /home/wugang/file7  UNAVAIL      0     0     0  corrupted data
            spares
              /home/wugang/file6    AVAIL   
    errors: No known data errors

    [root@A22770782_00 ~]# zpool list
    NAME              SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
    raid5            2.91G   381K  2.91G         -     0%     0%  1.00x  DEGRADED  -
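For completeness, a sketch of the restore in this step; only zpool clear appears verbatim in the report, and the source path of the move is an assumption mirroring the earlier sketch:

    # Put the original 100M backing file back where the pool expects it
    [root@A22770782_00 ~]# mv /home/file7 /home/wugang/file7
    # Reopen all vdevs and clear the error counters (as shown above)
    [root@A22770782_00 ~]# zpool clear raid5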

Solution: In the above test case the vdev fails to open because its allocatable size has shrunk. The shrinkage happens because, during the pool import in the second step, the larger spare device file6 being in use grows the top-level vdev's vdev_asize value, which in turn increases each leaf vdev's vdev_min_asize value. So when all vdevs are reopened in the third step, the leaf vdev for member device file7 has a vdev_asize (about 100M) smaller than its vdev_min_asize (about 1G), and the vdev open fails. The failing code in the vdev_open function is:

    /*
     * Make sure the allocatable size hasn't shrunk.
     */
    if (asize < vd->vdev_min_asize) {
        vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
            VDEV_AUX_BAD_LABEL);
        return (SET_ERROR(EINVAL));
    }
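As a side note beyond the original analysis, one way to observe the grown top-level asize after the spare kicks in is to dump a vdev label or the cached pool configuration with zdb; the exact fields printed vary by version:

    # The label embedded in any remaining member contains the top-level
    # raidz vdev_tree, including its asize
    [root@A22770782_00 ~]# zdb -l /home/wugang/file4
    # Alternatively, dump the cached pool configuration
    [root@A22770782_00 ~]# zdb -C raid5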

Thanks!

behlendorf commented 8 years ago

@liaoyuxiangqin thanks for the clear description of the failure, test case, and analysis. You've found an interesting corner case.

It sounds like the issue is that the device was automatically expanded when it was replaced by a spare. Once this happens the vdev can never be replaced with a smaller one, not even the original vdev, because sectors past the end of the original device may now be needed.

The most straightforward fix for this would be to never allow a vdev to be auto-expanded when it's replaced with a vdev from the spare pool. @don-brady may have some thoughts on this too.

richardelling commented 8 years ago

There are dueling policies here. The best practice for many years has been to keep the autoexpand property off until you know you're going to intentionally expand. Since expansion is normally a manual process, this works fine. VxVM users will no doubt weigh in with auto-sparing horror stories.

Autoexpand and sparing are features that enable policies. The policies themselves can be site-specific, with no general solution. As more FMA features are implemented, we will find more cases where conflicting policies are exposed through different features.

liaoyuxiangqin commented 7 years ago

Exposing the pool's auto-expand behavior to the user as a configuration option is a reasonable solution for this issue, since the user can then control the pool's behavior and recognize the effect of a spare replacement while autoexpand is on. Keeping the autoexpand property off by default can reduce the occurrence of this problem to some extent. Thanks!
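For reference, a small sketch of that recommendation; the pool name below is the one from this report:

    # Check the current setting and keep automatic expansion disabled
    [root@A22770782_00 ~]# zpool get autoexpand raid5
    [root@A22770782_00 ~]# zpool set autoexpand=off raid5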