Open duskmoss opened 1 year ago
Yeah, that's a feature.
A feature poorly implemented, but a feature.
See #14505
Disks getting written completely twice does not seem like an intentional result of having deferred re-silvering.
Especially since that's extra endurance burned on SSDs.
A drive that wasn't yet being resilvered getting resilvered afterwards is a feature. Drives being resilvered in one run and then again in the deferred run seems like a bug.
Oops, didn't mean to hit close.
Ah, I see, either I read too fast or my brain is just more roasted than I ever gave it credit for. Why not both.
I might speculate that 056a658dee00cab7cd42e6146f3fa0690f07c93e (which is not in 2.1) might make your life nicer, but I'm not certain.
FYI: I just checked the 2.1 branch and your referenced commit seems to be in 2.1: https://github.com/openzfs/zfs/commit/4ac37f8b2e6bdbfb3a0fd2ca56aedf05114719e8
System information
FreeBSD | 13.2-RELEASE-P1
Architecture | x86-64
OpenZFS Version | zfs-2.1.9-FreeBSD_g92e0d9d18
Describe the problem you're observing
If you attach multiple mirrors in quick succession, two re-silvers happen and some mirrors get re-silvered twice.
I have a pool with 3 vdevs that are each a 2-disk mirror. I had detached one side of 2 of my mirrors temporarily. I reattached them, then detached one side of the mirror that was not re-silvering, and reattached it.
During the resilver I noticed that only 2 mirrors had active read and write operations (via zpool iostat). zpool status showed re-silvering next to all three mirrors. The re-silver came to 100% completion with a total amount re-silvered matching 2 vdevs. Immediately after it completed, a new re-silver operation began. Now only 2 vdevs showed as re-silvering in zpool status, and the same two showed active read and write operations in zpool iostat. Again it reached 100% completion, with the amount re-silvered matching 2 vdevs.
However, as I only had 3 vdevs, this means one was re-silvered twice. Mirror-1 was re-silvered in both the initial operation and the second operation.
When I went to reproduce it, a slightly different version of the bug occurred. I detached all three mirrors, then reattached them. This time all three mirrors were actively being re-silvered according to iostat, and the count of data re-silvered in zpool status counted up to a total matching all three vdevs. However, when re-silvering completed it immediately started another re-silver with 2 vdevs. The one I attached first was complete.
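One way to compare the two passes without reading the output by eye is to pull the per-device markers out of saved `zpool status` text. The following is only a minimal sketch: it assumes the device lines end in `(resilvering)` or `(awaiting resilver)`, and the function name `resilver_markers` is made up for illustration.

```python
import re

# Assumption: zpool status tags a device line with one of these suffixes.
MARKER = re.compile(r"^\s*(\S+)\s+.*\((resilvering|awaiting resilver)\)\s*$")


def resilver_markers(status_text):
    """Map device name -> marker for every device line carrying a marker."""
    found = {}
    for line in status_text.splitlines():
        match = MARKER.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found
```

Running it on the status captured during the first pass and again during the second pass, then intersecting the device names, would show which disks were written in both.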
Describe how to reproduce the problem
To reproduce the second bug, take a pool with three mirrors and detach one side of each mirror. Attach all three mirrors. Notice they're all marked as re-silvering in zpool status and are actively being copied in zpool iostat. Wait for the re-silver to complete. Notice a new re-silver operation has started with the two mirrors you attached second.
To reproduce the original case, take a pool with 3 mirrors. Detach two mirrors. Reattach those two mirrors. When the re-silver is underway, detach the mirror you did not originally detach. Then reattach it. Check zpool status to see all three mirrors marked as re-silvering, then check zpool iostat to see that only 2 mirrors are actively being copied (mirror-1 and mirror-2). Wait for these mirrors to finish re-silvering. Observe that now mirror-0 and mirror-1 are marked as re-silvering in zpool status and have visible activity in zpool iostat.
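The two sequences above can be written out as command lists. This is a rough Python sketch that only builds the `zpool detach` / `zpool attach` invocations and does not run them. The pool name, disk names and helper names are all hypothetical, and each mirror is given as a (kept disk, detached disk) pair.

```python
def detach(pool, disk):
    return ["zpool", "detach", pool, disk]


def attach(pool, kept, disk):
    # zpool attach <pool> <existing device> <new device>
    return ["zpool", "attach", pool, kept, disk]


def second_case(pool, mirrors):
    """Detach one side of every mirror, then reattach them all."""
    cmds = [detach(pool, spare) for _, spare in mirrors]
    cmds += [attach(pool, kept, spare) for kept, spare in mirrors]
    return cmds


def original_case(pool, first_two, third):
    """Same for two mirrors, then detach/reattach the third one
    while the first resilver is still underway."""
    cmds = second_case(pool, first_two)
    kept, spare = third
    cmds += [detach(pool, spare), attach(pool, kept, spare)]
    cmds += [["zpool", "status", pool], ["zpool", "iostat", "-v", pool]]
    return cmds
```

For example, `original_case("tank", [("da2", "da3"), ("da4", "da5")], ("da0", "da1"))` yields eight commands. The timing of the third detach relative to the running resilver is not captured here and has to be done by hand.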
NB: In both cases I attached mirrors starting at the bottom of the vdev listing. I do not know if order matters, and don't have time to test again tonight with a different order.
I have not tried similar states with a pool with more mirror vdevs. I may try this soon as I am adding some vdevs to this pool. I haven't tried with just two mirrors resilvering yet either.
I have not tried with RaidZ vdevs and do not have a reasonable
Include any warning/errors/backtraces from the system logs
originally noticed it here:
repro attempt 1, 3 mirrors re-silvered first pass, 2 re-silvered second
repro attempt 2, 2 mirrors re-silvered first pass, 2 re-silvered second
(missing second "scan done" because I need to go to bed instead of waiting 22+ minutes)