Open liaoyuxiangqin opened 8 years ago
@liaoyuxiangqin thanks for the clear description of the failure, test case, and analysis. You've found an interesting corner case.
It sounds like the issue is that the top-level vdev was automatically expanded when the device was replaced by a larger spare. Once this happens the vdev can never be replaced with a smaller device, not even the original one, because sectors past the end of the original device may now be in use.
The most straightforward fix for this would be to never auto-expand a top-level vdev when one of its children is replaced with a device from the spare pool. @don-brady may have some thoughts on this too.
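To illustrate the shape of that guard, here is a minimal, self-contained C model of the proposed policy. This is not actual ZFS code: the struct and function names are invented for illustration, though the fields loosely mirror the real `vdev_t` members `vdev_asize` and `vdev_isspare`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a replacement device; not the real vdev_t. */
typedef struct toy_vdev {
	uint64_t asize;    /* allocatable size of this device */
	bool     isspare;  /* device came from the spare pool */
} toy_vdev_t;

/*
 * Decide the top-level vdev size after a replacement.
 * Proposed policy: even with autoexpand=on, never grow the
 * top-level vdev when the replacement came from the spare
 * pool, so the original (smaller) device can still return.
 */
static uint64_t
replace_asize(uint64_t top_asize, const toy_vdev_t *newdev,
    bool autoexpand)
{
	if (autoexpand && !newdev->isspare && newdev->asize > top_asize)
		return (newdev->asize);	/* genuine expansion */
	return (top_asize);		/* keep the old size */
}
```

Under this policy, the 1G spare file6 standing in for the 100M file7 would leave the top-level vdev at its old size, and file7 could later be reattached.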
There are dueling policies here. The best practice for many years has been to keep the autoexpand property off until you know you're going to intentionally expand. Since expansion is normally a manual process, this works fine. VxVM users will no doubt weigh in with auto-sparing horror stories.
The use of autoexpand and sparing represent features to enable policies. The policies themselves can be site-specific, with no general solution. As we get more FMA features implemented, we will find more cases where policies are exposed in conflicting features.
Exposing the pool's auto-expand behavior to the user as configuration is a reasonable solution for this issue: the user can control the pool's behavior and recognize the effect of a spare replacement while autoexpand is on. Keeping the autoexpand property off by default should reduce how often this problem appears. Thanks!
Dear All, when testing spare device replacement followed by pool export and import, I ran into a problem where a vdev fails to open because its label is missing or invalid. The detailed test case is as follows:

Conditions:
OS: Linux A22770782_00 2.6.33.20
Simulation files: file4 (1G), file5 (1G), file6 (1G), file7 (100M)

Test:
The first step: Create a raidz pool named raid5 from three simulation files (file4 (1G), file5 (1G), file7 (100M)), add file6 to the pool as a spare device, use the spare device file6 to replace pool member device file7, and set the pool's `autoexpand` property to `on`. The pool status shows the spare device currently in use, and the `SIZE` property value is `272M`; more detailed information follows:

The second step: Move simulation file file7 to the /home/ directory, then export and import the pool again. The pool state is now degraded, member device file7 can't be opened, and the `EXPANDSZ` property value is `2.64G`; more information follows:
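The reported numbers are self-consistent. Roughly, a raidz top-level vdev's usable SIZE is about the smallest child's allocatable size times the number of children (the real calculation also subtracts label space and rounds to metaslab alignment, ignored here), and EXPANDSZ is the size the pool could grow to minus its current size. A rough sanity check in C, using only the figures reported in this issue:

```c
#include <assert.h>

/*
 * Rough model: a raidz top-level vdev's SIZE is about the
 * smallest child's allocatable size times the child count.
 * Label-space deduction and metaslab rounding are ignored.
 */
static double
raidz_size_mib(double min_child_mib, int nchildren)
{
	return (min_child_mib * nchildren);
}

/* EXPANDSZ: potential SIZE after expansion minus current SIZE. */
static double
expandsz_mib(double size_now_mib, double size_potential_mib)
{
	return (size_potential_mib - size_now_mib);
}
```

With the 1G spare in use the pool could grow to about 2.91G (the SIZE later seen in the third step), and 2.91G minus the current 272M is about 2.64G, matching the reported `EXPANDSZ`.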
, more information as following:The third step: Restore simulation file7 from /home/ directory, then perform zpool clear which will reopen all vdev and clear errors. the result is pool member device file7 can't open beacuse the label is missing or invalid, and
SIZE property value change to 2.91G
Solution: In the above test case the vdev fails to open because its allocatable size has shrunk. It shrank because, during the pool import in the second step, the larger spare device file6 being in use grew the top-level vdev's `vdev_asize`, which in turn increased the leaf vdevs' `vdev_min_asize`. So when all vdevs are reopened in the third step, the leaf vdev corresponding to member device file7 has a `vdev_asize` (about 100M) smaller than its `vdev_min_asize` (about 1G), and the open fails. The failing check is in the `vdev_open` function, as follows:

Thanks!
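For readers without the source at hand, the relevant check in `vdev_open` is essentially a guard that refuses to open a leaf vdev whose allocatable size has shrunk below the recorded minimum. A minimal self-contained model of that check (the field names follow the real `vdev_t`, but everything else is invented for illustration; the real code sets the vdev state to CANT_OPEN with a BAD_LABEL aux code rather than just returning an errno):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model of a leaf vdev; only the two size fields matter. */
typedef struct toy_vdev {
	uint64_t vdev_asize;      /* size the device offers now */
	uint64_t vdev_min_asize;  /* minimum size the pool requires */
} toy_vdev_t;

/*
 * Returns 0 on success, EINVAL when the device has shrunk below
 * the minimum -- the case reported to the user as "the label is
 * missing or invalid".
 */
static int
toy_vdev_open(const toy_vdev_t *vd)
{
	if (vd->vdev_asize < vd->vdev_min_asize)
		return (EINVAL);
	return (0);
}
```

With the sizes from this issue, file7's leaf vdev offers about 100M while `vdev_min_asize` grew to about 1G during the import in the second step, so the reopen in the third step fails.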