Open sempervictus opened 9 years ago
You got me concerned about my 3 pools on 2 servers, so I went to check them. Maybe I am out to lunch, but I can't get this to work at all:
[root@centos7-ha3 ~]# zpool status
  pool: tank-copy
 state: ONLINE
  scan: scrub repaired 0 in 1h44m with 0 errors on Sun Feb 15 01:44:07 2015
config:
NAME STATE READ WRITE CKSUM
tank-copy ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
wwn-5000c5002e38e0eb ONLINE 0 0 0
wwn-5000c5002e3aa680 ONLINE 0 0 0
logs
wwn-55cd2e404b4cd14f ONLINE 0 0 0
errors: No known data errors
but:
[root@centos7-ha3 ~]# zdb -mc tank-copy
zdb: can't open 'tank-copy': No such file or directory
this fails for all 3 pools. centos 7 using a code drop from back in early December. I checked and double-checked the command syntax against the man page and other folks' emails. wth?
this is strange. so i went to my dev testbed and created a scratch pool, a dataset in it and some files in that dataset and ran 'zdb -c foo':
[root@zolbuild hmailserver]# zdb -c foo
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
loading space map for vdev 0 of 1, metaslab 48 of 127 ...
No leaks (block sum matches space maps exactly)
bp count: 1835
bp logical: 62722048 avg: 34180
bp physical: 60032512 avg: 32715 compression: 1.04
bp allocated: 60284416 avg: 32852 compression: 1.04
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 60284416 used: 0.35%
additional, non-pointer bps of type 0: 506
i.e. all seems ok. this seems to be the same december 5th code drop as on the other two servers. so why doesn't 'zdb -c' work there? are my pools borked in some manner?
It means you don't have a /etc/zfs/zpool.cache file for your tank-copy pool. zdb depends on it for imported pools. If the pool is exported you can use zdb -e instead.
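The cachefile dependency described above can be checked mechanically before running zdb. A minimal POSIX sh sketch (the function name, pool name, and default path are just illustrative):

```shell
# suggest_zdb_invocation POOL [CACHEFILE]: print the zdb invocation most
# likely to work, based on whether a cachefile is present. zdb needs the
# cachefile for imported pools; "zdb -e" scans devices as if the pool
# were exported.
suggest_zdb_invocation() {
    pool=$1
    cache=${2:-/etc/zfs/zpool.cache}
    if [ -f "$cache" ]; then
        echo "zdb -mc $pool"       # cachefile present: normal invocation
    else
        echo "zdb -e -m $pool"     # no cachefile: import-less device scan
    fi
}

suggest_zdb_invocation tank-copy
```

This only checks for the file's existence; a pool can still be missing from an existing cachefile (as happens later in this thread), in which case zdb -e is still the fallback.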
ah, thanks. i understand 'tank' being that way - i am using it on a 2-head jbod via pacemaker/corosync so it isn't in the cachefile. not sure why the others weren't. i will check. thanks again!
something else going on: 'windows' on primary server and 'tank-copy' on backup server ARE in /etc/zfs/zpool.cache, but are not found by zdb...
dunno why those two pools aren't findable normally, but i was able to with:
zdb -m -e -p /dev/disk/by-vdev tank-copy
synopsis:
space map refcount mismatch: expected 313 != actual 303
this is a live pool - i hope this doesn't indicate a problem?
and on the production server:
zdb -m -e -p /dev/disk/by-vdev windows
(snip)
space map refcount mismatch: expected 56 != actual 53
and:
zdb -m -e -p /dev/disk/by-vdev tank
(snip)
[no discrepancies]
Spoke with @ryao in #OpenZFS briefly and he confirms that it may be possible to recompute the maps, but there may be other unforeseen consequences to this path.
Would be nice to find out roughly what percentage of users have space map problems without causing pandemonium in the user community... it would help gauge the real-world urgency of a change to the scrub procedures.
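For that kind of survey, affected users could summarize their own zdb runs with something like the following sketch. It only parses the diagnostic line format shown earlier in this thread ("space map refcount mismatch: expected N != actual M"); the function name is made up, and lines containing other digits would confuse it:

```shell
# check_refcount: read zdb output on stdin and report the space map
# refcount mismatch, if any, with the delta between the two counts.
check_refcount() {
    awk '/space map refcount mismatch/ {
        gsub(/[^0-9]/, " ")            # keep only the two counts
        n = split($0, f)
        print "mismatch: expected " f[n-1] " actual " f[n] \
              " (delta " (f[n-1] - f[n]) ")"
        found = 1
    }
    END { if (!found) print "no refcount mismatch" }'
}

# e.g.: zdb -e -m tank | check_refcount
```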
@sempervictus I absolutely agree we need a tool which can rebuild the space maps.
The good news is that space maps can absolutely be rebuilt; this is effectively what zdb does today to check for leaks. It walks the entire block tree, constructing the space maps in memory, and then compares them to the ones stored on disk. If they differ it reports the leak. The bad news is that it's much harder, but not impossible, to do this operation on a live pool where blocks are constantly being allocated and freed.
The easiest way is to extend zdb, or even better zhack, so that it can write out new correct space maps after it has calculated them. This would be an offline operation, which isn't ideal, but at least there would be a utility available. In my opinion extending zhack is also preferable because zdb is by design a read-only utility, which means it's always safe to run. zhack on the other hand already has the ability to modify existing pools.
A more complicated, and preferable, solution would be to build this functionality into scrub, which also walks the block tree. We could have scrub build up new space maps from scratch as it traverses all the blocks. All new allocs and frees while the scrub was running would need to update the usual in-memory space maps and additionally the ones being constructed by scrub. Once the scrub completes it should then be able to authoritatively compare both sets of space maps in a txg sync. If they don't match they can be replaced with those generated by scrub.
The devil's going to be in the details here, so by default I doubt we want it to repair any damage. It's probably safer just to take any damaged spacemaps offline, which is already supported internally, and then use zdb / zhack offline to verify the diagnosis. Once we're absolutely sure that's working correctly we could enable an online repair.
I'd like to add a little followup to @behlendorf's commentary: it kept popping into my head while reading the recent spate of spacemap issues that some people are under the impression that zfs scrub is somehow analogous to fsck when, of course, they're completely different beasts.
A tool which reconstructs spacemaps would be a step in the direction of implementing the mythical fsck.zfs. There are, of course, a myriad of other semantic and structural issues which could be fixed or worked around by such a tool. For example, it wouldn't be terribly difficult to repair many classes of dnode damage as we've seen caused by the various SA issues.
Given how critical the spacemaps are (frequently fully traversed and, ultimately, the database used for block allocation), they're clearly a hot spot and seem to be one of the more common places for corruption to occur. This would clearly be a good step toward a more comprehensive fsck.zfs.
I am having the same error message: space map refcount mismatch: expected 369 != actual 273
zdb -mc reports that nothing leaked (and the same refcount mismatch at the end)
I found the discussion of the same issue here: https://forums.freenas.org/index.php?threads/problems-with-freenas-9-2-1-5.20879/
Citation: "dlavigne, Jun 10, 2014
I asked our in-house ZFS guru who said:
We have never seen this on FreeBSD, it's possibly a ZFS on Linux bug.
It seems to be caused by bad accounting for the spacemap_histogram feature. I don't think it's a big deal though; the feature is active and stays active for the lifetime of the pool, and therefore the refcount no longer matters."
Indeed, "expected 369" in my case is shown as com.delphix:spacemap_histogram = 369 in zhack feature stat output.
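That comparison (zdb's "expected" number against the feature refcount) can be scripted. A sketch that pulls the refcount out of zhack feature stat output, assuming the "name = value" line format quoted above; the function name is illustrative:

```shell
# histogram_refcount: read `zhack feature stat <pool>` output on stdin
# and print the com.delphix:spacemap_histogram refcount, which should
# correspond to the "expected" count in zdb's mismatch message.
histogram_refcount() {
    awk -F' = ' '$1 ~ /spacemap_histogram/ { print $2 }'
}

# e.g.: zhack feature stat tank | histogram_refcount
```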
So the questions are:
Thanks, Paul.
referencing:
adding :exclamation: marks since this patch is dangerous:
:exclamation: http://lists.open-zfs.org/pipermail/developer/2014-July/000732.html
:exclamation: https://github.com/wesolows/illumos-joyent/commit/dc4d7e06c8e0af213619f0aa517d819172911005
Maybe something inspired by the patch above could be created?
Just thought I'd try to provide what info I have on this issue to help the developers. I've only been using ZFS for less than a week, which means I've just created a brand new RAIDZ pool. I'm using Ubuntu 15.10 (amd64) and my ZFS is from the Ubuntu repos, meaning I'm currently on, and always have been on, ZFS version 0.6.4.2 (Ubuntu package version 0.6.4.2-0ubuntu1.2). This might be interesting for ruling out the possibility that only previous ZFS on Linux versions cause this issue.
While I've only written around 200GB of data to my pool so far, I have already encountered this space map mismatch issue when running the zdb -b command. Before scrubbing I received the following:
sudo zdb -b tank
Traversing all blocks to verify nothing leaked ...
loading space map for vdev 0 of 1, metaslab 50 of 130 ...
225G completed (4433MB/s) estimated time remaining: 0hr 00min 01sec
No leaks (block sum matches space maps exactly)
bp count: 1522626
bp logical: 196977910784 avg: 129367
bp physical: 163064331264 avg: 107094 compression: 1.21
bp allocated: 247859871744 avg: 162784 compression: 0.79
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 247859871744 used: 2.77%
additional, non-pointer bps of type 0: 964
space map refcount mismatch: expected 15 != actual 9
I then scrubbed (with apparently 0 errors being detected or repaired) and subsequently ran the zdb -mc command as suggested:
sudo zdb -mc tank
Metaslabs:
vdev 0
metaslabs 130 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 38 free 3.54G
metaslab 1 offset 1000000000 spacemap 61 free 6.21G
metaslab 2 offset 2000000000 spacemap 68 free 26.6G
metaslab 3 offset 3000000000 spacemap 73 free 27.8G
metaslab 4 offset 4000000000 spacemap 74 free 50.4G
metaslab 5 offset 5000000000 spacemap 69 free 42.4G
metaslab 6 offset 6000000000 spacemap 75 free 60.3G
metaslab 7 offset 7000000000 spacemap 0 free 64G
metaslab 8 offset 8000000000 spacemap 0 free 64G
metaslab 9 offset 9000000000 spacemap 0 free 64G
metaslab 10 offset a000000000 spacemap 0 free 64G
metaslab 11 offset b000000000 spacemap 0 free 64G
metaslab 12 offset c000000000 spacemap 0 free 64G
metaslab 13 offset d000000000 spacemap 0 free 64G
metaslab 14 offset e000000000 spacemap 0 free 64G
metaslab 15 offset f000000000 spacemap 0 free 64G
metaslab 16 offset 10000000000 spacemap 0 free 64G
metaslab 17 offset 11000000000 spacemap 0 free 64G
metaslab 18 offset 12000000000 spacemap 0 free 64G
metaslab 19 offset 13000000000 spacemap 0 free 64G
metaslab 20 offset 14000000000 spacemap 0 free 64G
metaslab 21 offset 15000000000 spacemap 0 free 64G
metaslab 22 offset 16000000000 spacemap 0 free 64G
metaslab 23 offset 17000000000 spacemap 0 free 64G
metaslab 24 offset 18000000000 spacemap 0 free 64G
metaslab 25 offset 19000000000 spacemap 37 free 63.8G
metaslab 26 offset 1a000000000 spacemap 0 free 64G
metaslab 27 offset 1b000000000 spacemap 0 free 64G
metaslab 28 offset 1c000000000 spacemap 0 free 64G
metaslab 29 offset 1d000000000 spacemap 0 free 64G
metaslab 30 offset 1e000000000 spacemap 0 free 64G
metaslab 31 offset 1f000000000 spacemap 0 free 64G
metaslab 32 offset 20000000000 spacemap 0 free 64G
metaslab 33 offset 21000000000 spacemap 0 free 64G
metaslab 34 offset 22000000000 spacemap 0 free 64G
metaslab 35 offset 23000000000 spacemap 0 free 64G
metaslab 36 offset 24000000000 spacemap 0 free 64G
metaslab 37 offset 25000000000 spacemap 0 free 64G
metaslab 38 offset 26000000000 spacemap 0 free 64G
metaslab 39 offset 27000000000 spacemap 0 free 64G
metaslab 40 offset 28000000000 spacemap 0 free 64G
metaslab 41 offset 29000000000 spacemap 0 free 64G
metaslab 42 offset 2a000000000 spacemap 0 free 64G
metaslab 43 offset 2b000000000 spacemap 0 free 64G
metaslab 44 offset 2c000000000 spacemap 0 free 64G
metaslab 45 offset 2d000000000 spacemap 0 free 64G
metaslab 46 offset 2e000000000 spacemap 0 free 64G
metaslab 47 offset 2f000000000 spacemap 0 free 64G
metaslab 48 offset 30000000000 spacemap 0 free 64G
metaslab 49 offset 31000000000 spacemap 0 free 64G
metaslab 50 offset 32000000000 spacemap 36 free 64.0G
metaslab 51 offset 33000000000 spacemap 0 free 64G
metaslab 52 offset 34000000000 spacemap 0 free 64G
metaslab 53 offset 35000000000 spacemap 0 free 64G
metaslab 54 offset 36000000000 spacemap 0 free 64G
metaslab 55 offset 37000000000 spacemap 0 free 64G
metaslab 56 offset 38000000000 spacemap 0 free 64G
metaslab 57 offset 39000000000 spacemap 0 free 64G
metaslab 58 offset 3a000000000 spacemap 0 free 64G
metaslab 59 offset 3b000000000 spacemap 0 free 64G
metaslab 60 offset 3c000000000 spacemap 0 free 64G
metaslab 61 offset 3d000000000 spacemap 0 free 64G
metaslab 62 offset 3e000000000 spacemap 0 free 64G
metaslab 63 offset 3f000000000 spacemap 0 free 64G
metaslab 64 offset 40000000000 spacemap 0 free 64G
metaslab 65 offset 41000000000 spacemap 0 free 64G
metaslab 66 offset 42000000000 spacemap 0 free 64G
metaslab 67 offset 43000000000 spacemap 0 free 64G
metaslab 68 offset 44000000000 spacemap 0 free 64G
metaslab 69 offset 45000000000 spacemap 0 free 64G
metaslab 70 offset 46000000000 spacemap 0 free 64G
metaslab 71 offset 47000000000 spacemap 0 free 64G
metaslab 72 offset 48000000000 spacemap 0 free 64G
metaslab 73 offset 49000000000 spacemap 0 free 64G
metaslab 74 offset 4a000000000 spacemap 0 free 64G
metaslab 75 offset 4b000000000 spacemap 0 free 64G
metaslab 76 offset 4c000000000 spacemap 0 free 64G
metaslab 77 offset 4d000000000 spacemap 0 free 64G
metaslab 78 offset 4e000000000 spacemap 0 free 64G
metaslab 79 offset 4f000000000 spacemap 0 free 64G
metaslab 80 offset 50000000000 spacemap 0 free 64G
metaslab 81 offset 51000000000 spacemap 0 free 64G
metaslab 82 offset 52000000000 spacemap 0 free 64G
metaslab 83 offset 53000000000 spacemap 0 free 64G
metaslab 84 offset 54000000000 spacemap 0 free 64G
metaslab 85 offset 55000000000 spacemap 0 free 64G
metaslab 86 offset 56000000000 spacemap 0 free 64G
metaslab 87 offset 57000000000 spacemap 0 free 64G
metaslab 88 offset 58000000000 spacemap 0 free 64G
metaslab 89 offset 59000000000 spacemap 0 free 64G
metaslab 90 offset 5a000000000 spacemap 0 free 64G
metaslab 91 offset 5b000000000 spacemap 0 free 64G
metaslab 92 offset 5c000000000 spacemap 0 free 64G
metaslab 93 offset 5d000000000 spacemap 0 free 64G
metaslab 94 offset 5e000000000 spacemap 0 free 64G
metaslab 95 offset 5f000000000 spacemap 0 free 64G
metaslab 96 offset 60000000000 spacemap 0 free 64G
metaslab 97 offset 61000000000 spacemap 0 free 64G
metaslab 98 offset 62000000000 spacemap 0 free 64G
metaslab 99 offset 63000000000 spacemap 0 free 64G
metaslab 100 offset 64000000000 spacemap 0 free 64G
metaslab 101 offset 65000000000 spacemap 0 free 64G
metaslab 102 offset 66000000000 spacemap 0 free 64G
metaslab 103 offset 67000000000 spacemap 0 free 64G
metaslab 104 offset 68000000000 spacemap 0 free 64G
metaslab 105 offset 69000000000 spacemap 0 free 64G
metaslab 106 offset 6a000000000 spacemap 0 free 64G
metaslab 107 offset 6b000000000 spacemap 0 free 64G
metaslab 108 offset 6c000000000 spacemap 0 free 64G
metaslab 109 offset 6d000000000 spacemap 0 free 64G
metaslab 110 offset 6e000000000 spacemap 0 free 64G
metaslab 111 offset 6f000000000 spacemap 0 free 64G
metaslab 112 offset 70000000000 spacemap 0 free 64G
metaslab 113 offset 71000000000 spacemap 0 free 64G
metaslab 114 offset 72000000000 spacemap 0 free 64G
metaslab 115 offset 73000000000 spacemap 0 free 64G
metaslab 116 offset 74000000000 spacemap 0 free 64G
metaslab 117 offset 75000000000 spacemap 0 free 64G
metaslab 118 offset 76000000000 spacemap 0 free 64G
metaslab 119 offset 77000000000 spacemap 0 free 64G
metaslab 120 offset 78000000000 spacemap 0 free 64G
metaslab 121 offset 79000000000 spacemap 0 free 64G
metaslab 122 offset 7a000000000 spacemap 0 free 64G
metaslab 123 offset 7b000000000 spacemap 0 free 64G
metaslab 124 offset 7c000000000 spacemap 0 free 64G
metaslab 125 offset 7d000000000 spacemap 0 free 64G
metaslab 126 offset 7e000000000 spacemap 0 free 64G
metaslab 127 offset 7f000000000 spacemap 0 free 64G
metaslab 128 offset 80000000000 spacemap 0 free 64G
metaslab 129 offset 81000000000 spacemap 0 free 64G
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
loading space map for vdev 0 of 1, metaslab 50 of 130 ...
209G completed (1978MB/s) estimated time remaining: 0hr 00min 11sec
No leaks (block sum matches space maps exactly)
bp count: 1522627
bp logical: 196978038272 avg: 129367
bp physical: 163064331264 avg: 107094 compression: 1.21
bp allocated: 247859896320 avg: 162784 compression: 0.79
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 247859896320 used: 2.77%
additional, non-pointer bps of type 0: 964
space map refcount mismatch: expected 18 != actual 12
As you can see I still have the mismatch error, but what's interesting is that the numbers reported are now different after the scrub, yet the difference between the two counts is still 6.
I'm afraid I'm not too knowledgeable on the internals of file systems, but I'm quite happy to run further tests and report back if the developers wish, so please just let me know. :)
@jay-to-the-dee
Hi,
please also provide information on your harddrives, mainboard, RAM (ECC ?), cpu,
and layers in between (cryptsetup/luks ? lvm ? ecryptfs or others ?)
Sure, no problem @kernelOfTruth :) In brief I'm running 3 x 3TB drives in RAIDZ on a workstation setup.
2 x 8GB non-ECC RAM
i5-4670K CPU
Gigabyte GA-Z87X-D3H motherboard
3 x brand new WD30EZRZ WD Blue 3TB hard disks for my ZFS pool in a RAIDZ configuration (the OS itself runs from an SSD)
No LUKS, LVM, or eCryptfs is being used. LZ4 compression is enabled, dedup is disabled. All the drives are attached directly to the motherboard.
uname -a output:
Linux haswell 4.2.0-30-generic #36-Ubuntu SMP Fri Feb 26 00:58:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
I apologize for the noise if this comment isn't related to the problem, but my invocation of zdb -mc is failing with an assert. There might be corruption here.
-[~:#]- cat /sys/module/zfs/version
0.6.5-317_g669cf0a
-[~:#]- cat /sys/module/spl/version
0.6.5-63_g5ad98ad
-[~:#]- zpool status
pool: rpool
state: ONLINE
scan: scrub repaired 0 in 358h2m with 0 errors on Fri Jun 17 23:28:08 2016
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK11A8B9J859GF-part3 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK11A8B9J801SF-part3 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK11A8B9J80ZSF-part3 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK11A8B9J71V4F-part3 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK11A8B9J85DJF-part3 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK11A8B9J7ZSWF-part3 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK11A8B9J7YV5F-part3 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK11A8B9J81BYF-part3 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK11A8B9J7Y1ZF-part3 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK11A8B9J816ZF-part3 ONLINE 0 0 0
ata-Hitachi_HDS722020ALA330_JK1171YAGYKL7S-part3 ONLINE 0 0 0
logs
mirror-1 ONLINE 0 0 0
ata-OCZ-AGILITY3_OCZ-M935L9UP3HLO32NL-part1 ONLINE 0 0 0
ata-M4-CT256M4SSD2_000000001221090B4FE5-part1 ONLINE 0 0 0
errors: No known data errors
-[~:#]- zdb -mc rpool
Metaslabs:
vdev 0
metaslabs 158 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 2792989 free 12.9G
metaslab 1 offset 2000000000 spacemap 3257787 free 19.7G
metaslab 2 offset 4000000000 spacemap 2702229 free 538M
metaslab 3 offset 6000000000 spacemap 2213359 free 28.2G
metaslab 4 offset 8000000000 spacemap 2262135 free 34.2G
metaslab 5 offset a000000000 spacemap 4292047 free 12.4G
metaslab 6 offset c000000000 spacemap 3171610 free 4.23G
metaslab 7 offset e000000000 spacemap 112980 free 32.0G
metaslab 8 offset 10000000000 spacemap 2180939 free 29.7G
metaslab 9 offset 12000000000 spacemap 2678626 free 33.8G
metaslab 10 offset 14000000000 spacemap 2705532 free 22.4G
metaslab 11 offset 16000000000 spacemap 2755856 free 11.1G
metaslab 12 offset 18000000000 spacemap 2240895 free 14.1G
metaslab 13 offset 1a000000000 spacemap 92380 free 27.4G
metaslab 14 offset 1c000000000 spacemap 2138606 free 29.0G
metaslab 15 offset 1e000000000 spacemap 376935 free 1.31G
metaslab 16 offset 20000000000 spacemap 376936 free 16.0G
metaslab 17 offset 22000000000 spacemap 4207089 free 23.0G
metaslab 18 offset 24000000000 spacemap 4722403 free 30.8G
metaslab 19 offset 26000000000 spacemap 2751652 free 42.2G
metaslab 20 offset 28000000000 spacemap 4286587 free 34.6G
metaslab 21 offset 2a000000000 spacemap 376937 free 3.60G
metaslab 22 offset 2c000000000 spacemap 376938 free 13.7G
metaslab 23 offset 2e000000000 spacemap 376939 free 27.8G
metaslab 24 offset 30000000000 spacemap 4208777 free 25.6G
metaslab 25 offset 32000000000 spacemap 2731937 free 17.1G
metaslab 26 offset 34000000000 spacemap 2724927 free 219M
metaslab 27 offset 36000000000 spacemap 4292046 free 25.2G
metaslab 28 offset 38000000000 spacemap 112515 free 39.6G
metaslab 29 offset 3a000000000 spacemap 2116887 free 47.9G
metaslab 30 offset 3c000000000 spacemap 2213061 free 29.4G
metaslab 31 offset 3e000000000 spacemap 2705531 free 23.8G
metaslab 32 offset 40000000000 spacemap 173557 free 44.3G
metaslab 33 offset 42000000000 spacemap 4749475 free 48.4G
metaslab 34 offset 44000000000 spacemap 2698341 free 22.8G
metaslab 35 offset 46000000000 spacemap 4753356 free 28.2G
metaslab 36 offset 48000000000 spacemap 2804927 free 16.2G
metaslab 37 offset 4a000000000 spacemap 3148668 free 40.2G
metaslab 38 offset 4c000000000 spacemap 3290174 free 28.9G
metaslab 39 offset 4e000000000 spacemap 4765705 free 30.9G
metaslab 40 offset 50000000000 spacemap 376940 free 24.5G
metaslab 41 offset 52000000000 spacemap 376941 free 2.24G
metaslab 42 offset 54000000000 spacemap 376942 free 14.8G
metaslab 43 offset 56000000000 spacemap 376943 free 378M
metaslab 44 offset 58000000000 spacemap 3155472 free 29.1G
metaslab 45 offset 5a000000000 spacemap 3226000 free 32.3G
metaslab 46 offset 5c000000000 spacemap 2669520 free 32.6G
metaslab 47 offset 5e000000000 spacemap 92144 free 278M
metaslab 48 offset 60000000000 spacemap 2678627 free 31.5G
metaslab 49 offset 62000000000 spacemap 3237314 free 17.1G
metaslab 50 offset 64000000000 spacemap 2224359 free 21.2G
metaslab 51 offset 66000000000 spacemap 2700636 free 21.1G
metaslab 52 offset 68000000000 spacemap 3211357 free 31.0G
metaslab 53 offset 6a000000000 spacemap 4208546 free 25.1G
metaslab 54 offset 6c000000000 spacemap 3148667 free 33.7G
metaslab 55 offset 6e000000000 spacemap 3200913 free 39.5G
metaslab 56 offset 70000000000 spacemap 3696013 free 414M
metaslab 57 offset 72000000000 spacemap 3713172 free 504M
metaslab 58 offset 74000000000 spacemap 2691520 free 39.3G
metaslab 59 offset 76000000000 spacemap 4749026 free 13.0G
metaslab 60 offset 78000000000 spacemap 2702230 free 5.13G
metaslab 61 offset 7a000000000 spacemap 2118476 free 34.4G
metaslab 62 offset 7c000000000 spacemap 1825661 free 35.9G
metaslab 63 offset 7e000000000 spacemap 3169421 free 37.5G
metaslab 64 offset 80000000000 spacemap 3158397 free 26.0G
metaslab 65 offset 82000000000 spacemap 4763567 free 16.7G
metaslab 66 offset 84000000000 spacemap 2798415 free 30.1G
metaslab 67 offset 86000000000 spacemap 2254521 free 38.7G
metaslab 68 offset 88000000000 spacemap 376944 free 2.32G
metaslab 69 offset 8a000000000 spacemap 376945 free 1.81G
metaslab 70 offset 8c000000000 spacemap 376946 free 53.1G
metaslab 71 offset 8e000000000 spacemap 2179835 free 39.4G
metaslab 72 offset 90000000000 spacemap 2224360 free 39.6G
metaslab 73 offset 92000000000 spacemap 376947 free 37.3G
metaslab 74 offset 94000000000 spacemap 109978 free 570M
metaslab 75 offset 96000000000 spacemap 2213978 free 108M
metaslab 76 offset 98000000000 spacemap 376948 free 40.8G
metaslab 77 offset 9a000000000 spacemap 4785341 free 20.7G
metaslab 78 offset 9c000000000 spacemap 376949 free 7.00G
metaslab 79 offset 9e000000000 spacemap 376950 free 37.7G
metaslab 80 offset a0000000000 spacemap 376951 free 12.8G
metaslab 81 offset a2000000000 spacemap 376952 free 14.2G
metaslab 82 offset a4000000000 spacemap 376973 free 36.3G
metaslab 83 offset a6000000000 spacemap 376974 free 45.2G
metaslab 84 offset a8000000000 spacemap 376975 free 8.80G
metaslab 85 offset aa000000000 spacemap 376953 free 27.7G
metaslab 86 offset ac000000000 spacemap 376954 free 37.4G
metaslab 87 offset ae000000000 spacemap 376955 free 357M
metaslab 88 offset b0000000000 spacemap 376956 free 41.7G
metaslab 89 offset b2000000000 spacemap 376957 free 49.4G
metaslab 90 offset b4000000000 spacemap 376958 free 30.9G
metaslab 91 offset b6000000000 spacemap 376959 free 48.3G
metaslab 92 offset b8000000000 spacemap 376960 free 1.92G
metaslab 93 offset ba000000000 spacemap 376976 free 49.2G
metaslab 94 offset bc000000000 spacemap 376961 free 18.8G
metaslab 95 offset be000000000 spacemap 376977 free 4.25G
metaslab 96 offset c0000000000 spacemap 376962 free 3.84G
metaslab 97 offset c2000000000 spacemap 376978 free 19.9G
metaslab 98 offset c4000000000 spacemap 376979 free 11.5G
metaslab 99 offset c6000000000 spacemap 376980 free 41.9G
metaslab 100 offset c8000000000 spacemap 376981 free 34.7G
metaslab 101 offset ca000000000 spacemap 376963 free 42.0G
metaslab 102 offset cc000000000 spacemap 376964 free 45.2G
metaslab 103 offset ce000000000 spacemap 4758199 free 31.3G
metaslab 104 offset d0000000000 spacemap 2241127 free 36.6G
metaslab 105 offset d2000000000 spacemap 2103132 free 47.1G
metaslab 106 offset d4000000000 spacemap 2731942 free 30.1G
metaslab 107 offset d6000000000 spacemap 376965 free 5.19G
metaslab 108 offset d8000000000 spacemap 4765186 free 15.7G
metaslab 109 offset da000000000 spacemap 4818001 free 4.80G
metaslab 110 offset dc000000000 spacemap 376966 free 4.27G
metaslab 111 offset de000000000 spacemap 376982 free 7.42G
metaslab 112 offset e0000000000 spacemap 3696012 free 25.0G
metaslab 113 offset e2000000000 spacemap 376983 free 5.52G
metaslab 114 offset e4000000000 spacemap 376967 free 10.1G
metaslab 115 offset e6000000000 spacemap 4762304 free 7.84G
metaslab 116 offset e8000000000 spacemap 3200912 free 35.9G
metaslab 117 offset ea000000000 spacemap 376984 free 2.26G
metaslab 118 offset ec000000000 spacemap 376968 free 35.7G
metaslab 119 offset ee000000000 spacemap 376985 free 5.70G
metaslab 120 offset f0000000000 spacemap 376986 free 3.61G
metaslab 121 offset f2000000000 spacemap 376987 free 686M
metaslab 122 offset f4000000000 spacemap 376988 free 1.47G
metaslab 123 offset f6000000000 spacemap 376989 free 1.05G
metaslab 124 offset f8000000000 spacemap 376990 free 961M
metaslab 125 offset fa000000000 spacemap 376991 free 3.39G
metaslab 126 offset fc000000000 spacemap 376992 free 1.05G
metaslab 127 offset fe000000000 spacemap 376993 free 32.2G
metaslab 128 offset 100000000000 spacemap 376994 free 30.4G
metaslab 129 offset 102000000000 spacemap 110854 free 53.2G
metaslab 130 offset 104000000000 spacemap 376969 free 25.8G
metaslab 131 offset 106000000000 spacemap 376995 free 13.7G
metaslab 132 offset 108000000000 spacemap 376996 free 664M
metaslab 133 offset 10a000000000 spacemap 4772027 free 51.6G
metaslab 134 offset 10c000000000 spacemap 376970 free 44.4G
metaslab 135 offset 10e000000000 spacemap 376997 free 58.3G
metaslab 136 offset 110000000000 spacemap 376998 free 15.7G
metaslab 137 offset 112000000000 spacemap 4765185 free 10.9G
metaslab 138 offset 114000000000 spacemap 4722425 free 33.6G
metaslab 139 offset 116000000000 spacemap 2258865 free 44.1G
metaslab 140 offset 118000000000 spacemap 3290173 free 35.4G
metaslab 141 offset 11a000000000 spacemap 2705590 free 40.0G
metaslab 142 offset 11c000000000 spacemap 376971 free 2.95G
metaslab 143 offset 11e000000000 spacemap 3226001 free 29.8G
metaslab 144 offset 120000000000 spacemap 2247699 free 49.9G
metaslab 145 offset 122000000000 spacemap 109977 free 51.7G
metaslab 146 offset 124000000000 spacemap 2191863 free 32.3G
metaslab 147 offset 126000000000 spacemap 2628237 free 24.5G
metaslab 148 offset 128000000000 spacemap 376972 free 7.72G
metaslab 149 offset 12a000000000 spacemap 113602 free 51.5G
metaslab 150 offset 12c000000000 spacemap 2179836 free 34.1G
metaslab 151 offset 12e000000000 spacemap 2116843 free 58.9G
metaslab 152 offset 130000000000 spacemap 1717792 free 51.6G
metaslab 153 offset 132000000000 spacemap 2716506 free 47.4G
metaslab 154 offset 134000000000 spacemap 3176173 free 28.9G
metaslab 155 offset 136000000000 spacemap 1824803 free 33.0G
metaslab 156 offset 138000000000 spacemap 4268866 free 36.2G
metaslab 157 offset 13a000000000 spacemap 3220982 free 51.4G
vdev 1
metaslabs 127 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 92118 free 63.9M
metaslab 1 offset 4000000 spacemap 114010 free 64M
metaslab 2 offset 8000000 spacemap 92117 free 64M
metaslab 3 offset c000000 spacemap 2115079 free 64M
metaslab 4 offset 10000000 spacemap 2115078 free 64M
metaslab 5 offset 14000000 spacemap 155616 free 64M
metaslab 6 offset 18000000 spacemap 155626 free 64M
metaslab 7 offset 1c000000 spacemap 155625 free 64.0M
metaslab 8 offset 20000000 spacemap 155624 free 64M
metaslab 9 offset 24000000 spacemap 155623 free 64M
metaslab 10 offset 28000000 spacemap 155622 free 64M
metaslab 11 offset 2c000000 spacemap 155621 free 64M
metaslab 12 offset 30000000 spacemap 155620 free 64M
metaslab 13 offset 34000000 spacemap 155648 free 64M
metaslab 14 offset 38000000 spacemap 155647 free 64M
metaslab 15 offset 3c000000 spacemap 155646 free 64M
metaslab 16 offset 40000000 spacemap 155645 free 64M
metaslab 17 offset 44000000 spacemap 155644 free 64M
metaslab 18 offset 48000000 spacemap 155643 free 64M
metaslab 19 offset 4c000000 spacemap 155642 free 64M
metaslab 20 offset 50000000 spacemap 155641 free 64M
metaslab 21 offset 54000000 spacemap 155640 free 64M
metaslab 22 offset 58000000 spacemap 155639 free 64M
metaslab 23 offset 5c000000 spacemap 155638 free 64M
metaslab 24 offset 60000000 spacemap 155637 free 64M
metaslab 25 offset 64000000 spacemap 155636 free 64M
metaslab 26 offset 68000000 spacemap 155635 free 64M
metaslab 27 offset 6c000000 spacemap 155634 free 64M
metaslab 28 offset 70000000 spacemap 155633 free 64M
metaslab 29 offset 74000000 spacemap 155632 free 64M
metaslab 30 offset 78000000 spacemap 155631 free 64M
metaslab 31 offset 7c000000 spacemap 155630 free 64M
metaslab 32 offset 80000000 spacemap 155629 free 64M
metaslab 33 offset 84000000 spacemap 155628 free 64M
metaslab 34 offset 88000000 spacemap 155627 free 64M
metaslab 35 offset 8c000000 spacemap 155739 free 64M
metaslab 36 offset 90000000 spacemap 155738 free 64M
metaslab 37 offset 94000000 spacemap 155737 free 64M
metaslab 38 offset 98000000 spacemap 155736 free 64M
metaslab 39 offset 9c000000 spacemap 155735 free 64M
metaslab 40 offset a0000000 spacemap 155734 free 64M
metaslab 41 offset a4000000 spacemap 155733 free 64M
metaslab 42 offset a8000000 spacemap 155732 free 64M
metaslab 43 offset ac000000 spacemap 155731 free 64M
metaslab 44 offset b0000000 spacemap 155730 free 64M
metaslab 45 offset b4000000 spacemap 155729 free 64M
metaslab 46 offset b8000000 spacemap 155728 free 64M
metaslab 47 offset bc000000 spacemap 155727 free 64M
metaslab 48 offset c0000000 spacemap 155726 free 64M
metaslab 49 offset c4000000 spacemap 155725 free 64M
metaslab 50 offset c8000000 spacemap 155724 free 64.0M
metaslab 51 offset cc000000 spacemap 155720 free 64M
metaslab 52 offset d0000000 spacemap 155719 free 64M
metaslab 53 offset d4000000 spacemap 155718 free 64M
metaslab 54 offset d8000000 spacemap 155717 free 63.9M
metaslab 55 offset dc000000 spacemap 155715 free 64M
metaslab 56 offset e0000000 spacemap 155714 free 64M
metaslab 57 offset e4000000 spacemap 155713 free 64M
metaslab 58 offset e8000000 spacemap 155712 free 64M
metaslab 59 offset ec000000 spacemap 155711 free 64M
metaslab 60 offset f0000000 spacemap 155710 free 64M
metaslab 61 offset f4000000 spacemap 155709 free 64.0M
metaslab 62 offset f8000000 spacemap 155708 free 64M
metaslab 63 offset fc000000 spacemap 155707 free 64M
metaslab 64 offset 100000000 spacemap 155706 free 64M
metaslab 65 offset 104000000 spacemap 155705 free 64M
metaslab 66 offset 108000000 spacemap 155704 free 64M
metaslab 67 offset 10c000000 spacemap 155703 free 64M
metaslab 68 offset 110000000 spacemap 155701 free 64M
metaslab 69 offset 114000000 spacemap 155700 free 64M
metaslab 70 offset 118000000 spacemap 155694 free 64M
metaslab 71 offset 11c000000 spacemap 155691 free 64M
metaslab 72 offset 120000000 spacemap 155690 free 64M
metaslab 73 offset 124000000 spacemap 155689 free 64M
metaslab 74 offset 128000000 spacemap 155688 free 64M
metaslab 75 offset 12c000000 spacemap 155687 free 64M
metaslab 76 offset 130000000 spacemap 155686 free 64M
metaslab 77 offset 134000000 spacemap 155685 free 64.0M
metaslab 78 offset 138000000 spacemap 155684 free 64M
metaslab 79 offset 13c000000 spacemap 155683 free 64.0M
metaslab 80 offset 140000000 spacemap 155682 free 63.1M
metaslab 81 offset 144000000 spacemap 155681 free 64M
metaslab 82 offset 148000000 spacemap 155680 free 64M
metaslab 83 offset 14c000000 spacemap 4348072 free 64M
metaslab 84 offset 150000000 spacemap 4348080 free 64M
metaslab 85 offset 154000000 spacemap 4348082 free 64M
metaslab 86 offset 158000000 spacemap 4348096 free 64M
metaslab 87 offset 15c000000 spacemap 4348099 free 64M
metaslab 88 offset 160000000 spacemap 4348107 free 64M
metaslab 89 offset 164000000 spacemap 4348112 free 64M
metaslab 90 offset 168000000 spacemap 4348117 free 64M
metaslab 91 offset 16c000000 spacemap 4348122 free 64M
metaslab 92 offset 170000000 spacemap 4348131 free 64M
metaslab 93 offset 174000000 spacemap 4348144 free 64M
metaslab 94 offset 178000000 spacemap 4348146 free 64M
metaslab 95 offset 17c000000 spacemap 4348148 free 64M
metaslab 96 offset 180000000 spacemap 4348149 free 64M
metaslab 97 offset 184000000 spacemap 4348151 free 64M
metaslab 98 offset 188000000 spacemap 4348152 free 64M
metaslab 99 offset 18c000000 spacemap 4348216 free 64M
metaslab 100 offset 190000000 spacemap 4348227 free 64M
metaslab 101 offset 194000000 spacemap 4348235 free 64M
metaslab 102 offset 198000000 spacemap 4348239 free 64M
metaslab 103 offset 19c000000 spacemap 4348256 free 64M
metaslab 104 offset 1a0000000 spacemap 4348290 free 64M
metaslab 105 offset 1a4000000 spacemap 4348302 free 64M
metaslab 106 offset 1a8000000 spacemap 4348374 free 64M
metaslab 107 offset 1ac000000 spacemap 4348376 free 64M
metaslab 108 offset 1b0000000 spacemap 4348378 free 64M
metaslab 109 offset 1b4000000 spacemap 4348379 free 64M
metaslab 110 offset 1b8000000 spacemap 4348380 free 64M
metaslab 111 offset 1bc000000 spacemap 4348381 free 64M
metaslab 112 offset 1c0000000 spacemap 4348415 free 64M
metaslab 113 offset 1c4000000 spacemap 4348448 free 64M
metaslab 114 offset 1c8000000 spacemap 4348467 free 64M
metaslab 115 offset 1cc000000 spacemap 4348474 free 64M
metaslab 116 offset 1d0000000 spacemap 4348486 free 64M
metaslab 117 offset 1d4000000 spacemap 4348488 free 64M
metaslab 118 offset 1d8000000 spacemap 4348494 free 64M
metaslab 119 offset 1dc000000 spacemap 4348497 free 64M
metaslab 120 offset 1e0000000 spacemap 4348509 free 64M
metaslab 121 offset 1e4000000 spacemap 4348512 free 64M
metaslab 122 offset 1e8000000 spacemap 4348513 free 64M
metaslab 123 offset 1ec000000 spacemap 4348514 free 64M
metaslab 124 offset 1f0000000 spacemap 4348516 free 64M
metaslab 125 offset 1f4000000 spacemap 4348517 free 64M
metaslab 126 offset 1f8000000 spacemap 4348519 free 64M
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
loading space map for vdev 0 of 2, metaslab 23 of 158 ...space_map_load(msp->ms_sm, msp->ms_tree, SM_ALLOC) == 0 (0x34 == 0x0)
ASSERT at zdb.c:2668:zdb_leak_init()Aborted
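The `0x34 == 0x0` in that assertion is worth decoding: `space_map_load()` returned error 52. My assumption, based on the ZoL compatibility headers, is that ZFS's internal ECKSUM error is mapped to EBADE (errno 52) on Linux, meaning the space map object itself failed its checksum; this also matches the later `Got error 52 reading` lines from zdb:

```python
import errno

# zdb printed "0x34 == 0x0": space_map_load() returned error 0x34 == 52.
# On Linux builds of ZFS, the internal ECKSUM error is (as far as I can
# tell from the ZoL headers -- treat this as an assumption) mapped to
# EBADE, errno 52. So the space map object itself failed its checksum.
assert 0x34 == 52
assert getattr(errno, "EBADE", 52) == 52  # EBADE is Linux-specific
```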
I have a broken space map, too. Is there some way to get the data off the filesystem?
root@ubuntu:~# zdb -b -e rpool
Traversing all blocks to verify nothing leaked ...
loading space map for vdev 0 of 1, metaslab 0 of 159 ...zdb: ../../module/zfs/range_tree.c:261: Assertion `rs->rs_start <= start (0xda000 <= 0<0)' failed.
Aborted
root@ubuntu:~#
Importing the pool always fails:
root@ubuntu:~# zpool import -N -f -R /mnt rpool
[ 1277.024109] VERIFY(rs == NULL) failed
[ 1277.024273] PANIC at range_tree.c:186:range_tree_add()
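That `VERIFY(rs == NULL)` panic is the range tree refusing a segment it already covers. A toy model of the overlap check, with segments kept as sorted, non-overlapping `(start, end)` pairs (the real code uses an AVL tree; the helper name and layout here are illustrative only):

```python
# Toy model of range_tree_add()'s overlap check. The in-kernel
# VERIFY(rs == NULL) at range_tree.c:186 fires when replaying the space
# map hands the tree a segment it already covers -- i.e. the on-disk
# space map is self-inconsistent, so the import panics.

def range_tree_add(tree, start, size):
    end = start + size
    for s, e in tree:
        if s < end and start < e:          # new segment overlaps [s, e)
            raise AssertionError("VERIFY(rs == NULL) failed")
    tree.append((start, end))
    tree.sort()

tree = []
range_tree_add(tree, 0x0, 0x200)           # first record: fine
try:
    range_tree_add(tree, 0x100, 0x200)     # overlapping record: "panic"
except AssertionError as e:
    print(e)                               # VERIFY(rs == NULL) failed
```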
Not that I know of. IIUC, the issue is that ZFS can only traverse block pointers when scrubbing, going from one block to the next. Since free space isn't made of data blocks and has no pointers to it, it can't be verified as free the same way data is verified as accurate. We originally found the bug a few years back, and I believe it was the maker himself who told me this was a very non-trivial issue. Offline scrubs or fsck are probably the way to go here, since the free space map (FSM) can't change during an offline operation outside the locks held by the scrub.
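The two-sided check being described is roughly what `zdb -c` performs in `zdb_leak_init`/`zdb_leak_fini`: mark everything the space maps claim is allocated, then walk every block pointer and cross items off. A minimal sketch, with hypothetical helper names and simple `(offset, size)` ranges standing in for the real range trees:

```python
# Minimal sketch of zdb's leak check (names and data layout hypothetical).
# The space maps say which ranges are allocated; walking every block
# pointer says which ranges are actually referenced. Ranges allocated but
# never referenced are leaks; ranges referenced but marked free are the
# dangerous case described above -- a new write could land on live data.

def leak_check(spacemap_allocs, referenced):
    unaccounted = set(spacemap_allocs)     # allocs not yet seen in traversal
    marked_free_but_used = set()
    for rng in referenced:
        if rng in unaccounted:
            unaccounted.discard(rng)       # alloc is accounted for
        else:
            marked_free_but_used.add(rng)  # referenced, yet "free" on disk
    return unaccounted, marked_free_but_used

leaked, bad = leak_check({(0, 512), (512, 512)},
                         [(512, 512), (1024, 512)])
# leaked == {(0, 512)}; bad == {(1024, 512)}
```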
@behlendorf wrote in Feb. 2015:
I absolutely agree we need a tool which can rebuild the space maps.
3 years later I have a couple of questions about this:
1. This refcount mismatch bug is still open. It looks like the devs all agree that fixing it is not a priority. Does that mean it is not critical for data integrity? Can somebody please share a risk assessment.
2. Fixing refcount mismatches with an extra tool is one thing, and I would highly appreciate a tool like this. But why do these refcount mismatches exist in the first place? What is going wrong?
On 24 Feb 2018, 10:28, "mabod" notifications@github.com wrote:
@behlendorf https://github.com/behlendorf wrote in Feb. 2015:
I absolutely agree we need a tool which can rebuild the space maps.
+1
I have around 7 pools, of different sizes and ages, on 7 different systems.
Five of them have feature@spacemap_histogram disabled, while 2 of them have this feature enabled.
Both systems with feature@spacemap_histogram enabled, one Gentoo stable running ZFS-v0.7.8-r0-gentoo and one Scientific Linux 6.8 running ZFS-v0.6.5.8-1, show a "space map refcount mismatch".
The two systems are completely different: one is old, about six years, the other is more or less new, less than one year; the first doesn't have ECC RAM, the second does; yet both share the same mismatch. The pool on the Gentoo system was created years ago, with a completely different version of ZFS, and has been upgraded along the way.
I can't see any other error on my systems besides this mismatch: scrub is clean, there are no hardware errors, and "zdb -mc" doesn't report any leak.
Of course I've "zfs send" backups. Should I worry?
@sempervictus If I understood correctly, you asked for scrub to fix spacemaps online, but I gathered from this thread that they could be fixed offline. Is that right? If so, how?
Some other people here reported assertions in their zdb runs. I have a pool with this right now: not only an assertion (which I could bypass with -AAA), but also a segmentation fault, always after the same amount of space:
# pwd
/root/zfs/zfs/cmd/zdb/
# ./zdb -cccvvAAAs -I 400 tank
Traversing all blocks to verify checksums and verify nothing leaked ...
loading concrete vdev 1, metaslab 14 of 15 .....
22.1G completed ( 2MB/s) estimated time remaining: 257hr 51min 50sec Segmentation fault (core dumped)
Again:
# time ./zdb -cccvAAAs -I 100 tank
Traversing all blocks to verify checksums and verify nothing leaked ...
loading concrete vdev 1, metaslab 14 of 15 .....
22.1G completed ( 2MB/s) estimated time remaining: 294hr 17min 18sec Segmentation fault (core dumped)
real 262m49.257s
user 47m40.745s
sys 11m27.985s
Again:
# time ./zdb -cccvAAAs -I 50 tank
Traversing all blocks to verify checksums and verify nothing leaked ...
loading concrete vdev 1, metaslab 14 of 15 .....
82.4M completed ( 26MB/s) estimated time remaining: 28hr 18min 00sec zdb_blkptr_cb: Got error 52 reading <0, 0, 0, 0> -- skipping
158M completed ( 2MB/s) estimated time remaining: 319hr 26min 24sec zdb_blkptr_cb: Got error 52 reading <0, 12, 1, 0> -- skipping
186M completed ( 1MB/s) estimated time remaining: 381hr 23min 27sec zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 4> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 5> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 7> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 8> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 6> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 9> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, a> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, b> -- skipping
188M completed ( 1MB/s) estimated time remaining: 387hr 44min 15sec zdb_blkptr_cb: Got error 52 reading <0, 62, 0, c> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, d> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, e> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, f> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 10> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 11> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 12> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 13> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 14> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 15> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 16> -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 17> -- skipping
22.2G completed ( 1MB/s) estimated time remaining: 733hr 09min 24sec Segmentation fault (core dumped)
real 554m37.720s
user 35m25.028s
sys 12m24.917s
Again, but this time I think I triggered something very strange:
time ./zdb -cccvvvvvvvvvvvvvAAAs -I 800 tank
Traversing all blocks to verify checksums and verify nothing leaked ...
loading concrete vdev 0, metaslab 116 of 145 ...space_map_load(msp->ms_sm, msp->ms_allocatable, maptype) == 0 (0x34 == 0x0)
ASSERT at zdb.c:3715:load_concrete_ms_allocatable_trees()Aborted (core dumped)
real 6m1.323s
user 0m45.443s
sys 0m21.463s
And this is not an old pool. I had just created it, send/recv'd filesystems into it from an old pool, and found files with wrong checksums.
With all the changes and portage going on in the codebase, especially by people like me who insist on tanning their hide and sharpening teeth on the bleeding edge, there's a decent chance that spacemaps have suffered on more pools than people may realize. While digging through #3094 i'm seeing that i am not the only one who has spacemap refcount mismatch errors which are showing up in zdb -m, but not registering as errors in a scrub. As @behlendorf pointed out, this is far from good since
I highly encourage anyone reading this to check their pools with zdb -m or better yet zdb -mc presuming you have the resources to checksum your metadata (another -c will checksum data as well).
Since scrubs don't flag this, and a quick grep through the git logs shows a bunch of space-map-related changes since ZoL went 0.6.2, there's a decent chance that people other than the "adventurous" lot who used #2909 may be looking at similar issues.
To resolve this, i suggest we teach scrub, either implicitly, or through a CLI flag, to keep track of data and metadata sizing as it traverses the metaslabs (used space accounting). It should occasionally compare this with the space map, and upon deviation recompute the space map for every metaslab affected. The new space maps should only commit once the blocks they describe have been verified. Scrub should also probably ring some alarm bells when it detects this condition, as i imagine that a write to a block which the SM presented as available will actually look valid despite having overwritten existing data since there's a valid pointer to it in the tree.
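The recomputation step proposed above could look roughly like this: once scrub has verified every block inside a metaslab, the metaslab's free ranges fall out of the verified allocations alone. `rebuild_spacemap` is a hypothetical helper, not real ZFS code, and real metaslabs use range trees rather than sorted lists:

```python
# Sketch of the proposed repair step (hypothetical helper, not ZFS code):
# given the verified (offset, size) allocations inside one metaslab,
# derive its free ranges. Only after every allocation in the metaslab has
# been verified would the regenerated space map be safe to commit.

def rebuild_spacemap(ms_start, ms_size, verified_allocs):
    """Return the free (offset, size) ranges of the metaslab covering
    [ms_start, ms_start + ms_size), given its verified allocations."""
    free, cursor = [], ms_start
    for off, size in sorted(verified_allocs):
        if off > cursor:
            free.append((cursor, off - cursor))  # gap before this alloc
        cursor = max(cursor, off + size)
    if cursor < ms_start + ms_size:
        free.append((cursor, ms_start + ms_size - cursor))  # tail gap
    return free

# A 4K metaslab with allocations at 512 and 2048:
print(rebuild_spacemap(0, 4096, [(512, 512), (2048, 1024)]))
# [(0, 512), (1024, 1024), (3072, 1024)]
```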
With the on-disk format changing all the time, and the data structures becoming more complex and interdependent, it may be worth revisiting what scrub should be doing, as opposed to what it does today.