openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Re-silvering multiple mirrors needlessly resilvers some mirrors twice #15249

Open duskmoss opened 1 year ago

duskmoss commented 1 year ago

System information

FreeBSD | 13.2-RELEASE-P1
Architecture | x86-64
OpenZFS Version | zfs-2.1.9-FreeBSD_g92e0d9d18

Describe the problem you're observing

If you attach multiple mirrors in quick succession, two re-silvers happen and some mirrors get re-silvered twice.

I have a pool with 3 vdevs, each a 2-disk mirror. I had temporarily detached one side of 2 of the mirrors. I reattached them, then detached one side of the mirror that was not re-silvering and reattached it.

During the re-silver I noticed that only 2 mirrors had active read and write operations (via zpool iostat), although zpool status showed re-silvering next to all three mirrors. The re-silver came to 100% completion with a total amount re-silvered matching 2 vdevs. Immediately after it completed, a new re-silver operation began. Now only 2 vdevs showed as re-silvering in zpool status, and the same two showed active read and write operations in zpool iostat. Again it reached 100% completion with the amount re-silvered matching 2 vdevs.

However, as I only had 3 vdevs, this means one was re-silvered twice: mirror-1 was re-silvered in both the initial operation and the second one.

When I went to reproduce, a slightly different version of the bug occurred. I detached one side of all three mirrors, then reattached them. This time all three mirrors were actively being re-silvered according to iostat, and the count of data re-silvered in zpool status counted up to a total matching all three vdevs. However, when re-silvering completed, it immediately started another re-silver with 2 vdevs. The one I attached first was complete.

Describe how to reproduce the problem

To reproduce the second bug, take a pool with three mirrors and detach one side of each mirror. Attach all three mirrors. Notice they're all marked as re-silvering in zpool status and are actively being copied in zpool iostat. Wait for the re-silver to complete. Notice a new re-silver operation has started with the two mirrors you attached second and third.

zpool detach hottub da6p2
zpool detach hottub da3p2
zpool detach hottub da0p2
zpool attach hottub da7p2 da6p2
zpool attach hottub da4p2 da3p2
zpool attach hottub da2p2 da0p2

To reproduce the original case, take a pool with 3 mirrors. Detach one side of two mirrors. Reattach those two. While the re-silver is underway, detach one side of the mirror you did not originally detach, then reattach it. Check zpool status to see all three mirrors marked as re-silvering, then check zpool iostat to see that only 2 mirrors are actively being copied (mirror-1 and mirror-2). Wait for these mirrors to finish re-silvering. Observe that now mirror-0 and mirror-1 are marked as re-silvering in zpool status and have visible activity in zpool iostat.

zpool detach hottub da7p2
zpool detach hottub da4p2
zpool attach hottub da6p2 da7p2
zpool attach hottub da3p2 da4p2
zpool detach hottub da2p2
zpool attach hottub da0p2 da2p2

NB: In both cases I attached mirrors starting at the bottom of the vdev listing. I do not know if order matters, and don't have time to test again tonight with a different order.

I have not tried similar states with a pool with more mirror vdevs. I may try this soon as I am adding some vdevs to this pool. I haven't tried with just two mirrors resilvering yet either.

I have not tried with RaidZ vdevs and do not have a reasonable

Include any warning/errors/backtraces from the system logs

originally noticed it here:

2023-09-08.01:46:06 [txg:813237] scan setup func=2 mintxg=3 maxtxg=813237
2023-09-08.01:46:08 [txg:813239] vdev attach attach vdev=/dev/da7p2 to vdev=/dev/da6p2
2023-09-08.01:46:08 zpool attach hottub da6p2 da7p2
2023-09-08.01:46:30 [txg:813245] vdev attach attach vdev=/dev/da4p2 to vdev=/dev/da3p2
2023-09-08.01:46:30 zpool attach hottub da3p2 da4p2
2023-09-08.01:47:39 [txg:813259] detach vdev=/dev/da0p2
2023-09-08.01:47:39 zpool detach hottub da0p2
2023-09-08.01:48:01 [txg:813265] vdev attach attach vdev=/dev/da0p2 to vdev=/dev/da2p2
2023-09-08.01:48:01 zpool attach hottub da2p2 da0p2
2023-09-08.02:11:55 [txg:813550] scan done errors=0
2023-09-08.02:11:55 [txg:813550] starting deferred resilver errors=0
2023-09-08.02:12:00 [txg:813551] scan setup func=2 mintxg=3 maxtxg=813263
2023-09-08.02:34:33 [txg:813822] scan done errors=0

repro attempt 1, 3 mirrors re-silvered first pass, 2 re-silvered second

2023-09-08.02:39:35 [txg:813887] scan setup func=2 mintxg=3 maxtxg=813887
2023-09-08.02:39:38 [txg:813889] vdev attach attach vdev=/dev/da6p2 to vdev=/dev/da7p2
2023-09-08.02:39:38 zpool attach hottub da7p2 da6p2
2023-09-08.02:40:00 [txg:813895] vdev attach attach vdev=/dev/da3p2 to vdev=/dev/da4p2
2023-09-08.02:40:00 zpool attach hottub da4p2 da3p2
2023-09-08.02:40:27 [txg:813902] vdev attach attach vdev=/dev/da0p2 to vdev=/dev/da2p2
2023-09-08.02:40:27 zpool attach hottub da2p2 da0p2
2023-09-08.03:04:15 [txg:814189] scan done errors=0
2023-09-08.03:04:15 [txg:814189] starting deferred resilver errors=0
2023-09-08.03:04:20 [txg:814190] scan setup func=2 mintxg=3 maxtxg=813900
2023-09-08.03:26:30 [txg:814458] scan done errors=0

repro attempt 2, 2 mirrors re-silvered first pass, 2 re-silvered second

2023-09-08.03:27:45 [txg:814477] scan setup func=2 mintxg=3 maxtxg=814477
2023-09-08.03:27:47 [txg:814479] vdev attach attach vdev=/dev/da7p2 to vdev=/dev/da6p2
2023-09-08.03:27:47 zpool attach hottub da6p2 da7p2
2023-09-08.03:28:09 [txg:814485] vdev attach attach vdev=/dev/da4p2 to vdev=/dev/da3p2
2023-09-08.03:28:09 zpool attach hottub da3p2 da4p2
2023-09-08.03:28:37 [txg:814491] detach vdev=/dev/da2p2
2023-09-08.03:28:37 zpool detach hottub da2p2
2023-09-08.03:29:15 [txg:814500] vdev attach attach vdev=/dev/da2p2 to vdev=/dev/da0p2
2023-09-08.03:29:15 zpool attach hottub da0p2 da2p2
2023-09-08.03:49:57 [txg:814755] scan done errors=0
2023-09-08.03:49:57 [txg:814755] starting deferred resilver errors=0
2023-09-08.03:50:02 [txg:814756] scan setup func=2 mintxg=3 maxtxg=814498

(missing second "scan done" because I need to go to bed instead of waiting 22+ minutes)
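The excerpts above can be condensed to just the resilver-related events, which makes the first pass, the "starting deferred resilver" marker, and the second pass easier to line up against the attach txgs. This is a minimal sketch, assuming the lines come from `zpool history -i` (the command used is not stated in the report) and keep the `<timestamp> [txg:N] <event>` layout shown above; `summarize_resilver_history` is a hypothetical helper name, not part of ZFS.

```shell
#!/bin/sh
# Hypothetical helper: reads history lines on stdin and prints only the
# scan / attach / detach / deferred-resilver events with their txg.
summarize_resilver_history() {
    awk '
        # Internal events carry a [txg:N] tag in the second field;
        # plain command lines (e.g. "zpool attach ...") are skipped.
        $2 ~ /^\[txg:[0-9]+\]$/ {
            txg = $2
            gsub(/[^0-9]/, "", txg)          # "[txg:813237]" becomes "813237"
            if ($3 == "scan" && $4 == "setup") {
                pass++                        # each scan setup starts a new pass
                printf "pass %d start %s txg=%s %s\n", pass, $1, txg, $NF
            } else if ($3 == "scan" && $4 == "done") {
                printf "pass %d done %s txg=%s\n", pass, $1, txg
            } else if ($3 == "vdev" && $4 == "attach") {
                printf "attach %s txg=%s %s to %s\n", $1, txg, $6, $8
            } else if ($3 == "detach") {
                printf "detach %s txg=%s %s\n", $1, txg, $4
            } else if ($3 == "starting" && $4 == "deferred") {
                printf "deferred resilver started %s txg=%s\n", $1, txg
            }
        }
    '
}
```

Possible usage, assuming the pool name from this report: `zpool history -i hottub | summarize_resilver_history`. The `maxtxg=` value printed for each pass is copied from the log as-is; the sketch does not interpret it.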

rincebrain commented 1 year ago

Yeah, that's a feature.

A feature poorly implemented, but a feature.

See #14505

duskmoss commented 1 year ago

Disks getting written completely twice does not seem like an intentional result of having deferred re-silvering.

Especially since that's extra endurance burned on SSDs.

The drive that wasn't yet being resilvered getting resilvered in the deferred pass is the feature. Drives being resilvered in one run and then again in the deferred run seems like a bug.

duskmoss commented 1 year ago

Oops didn't mean to hit close

rincebrain commented 1 year ago

Ah, I see, either I read too fast or my brain is just more roasted than I ever gave it credit for. Why not both.

I might speculate that 056a658dee00cab7cd42e6146f3fa0690f07c93e (which is not in 2.1) might make your life nicer, but I'm not certain.

jumbi77 commented 1 year ago

> Ah, I see, either I read too fast or my brain is just more roasted than I ever gave it credit for. Why not both.
>
> I might speculate that 056a658dee00cab7cd42e6146f3fa0690f07c93e (which is not in 2.1) might make your life nicer, but I'm not certain.

FYI: I just checked the 2.1 branch and your referenced commit seems to be in 2.1: https://github.com/openzfs/zfs/commit/4ac37f8b2e6bdbfb3a0fd2ca56aedf05114719e8