openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Feature request: allow scrub to recalculate space maps #3111

Open sempervictus opened 9 years ago

sempervictus commented 9 years ago

With all the changes and portage going on in the codebase, especially by people like me who insist on tanning their hide and sharpening teeth on the bleeding edge, there's a decent chance that spacemaps have suffered on more pools than people may realize. While digging through #3094 I'm seeing that I am not the only one with spacemap refcount mismatch errors that show up in zdb -m but do not register as errors in a scrub. As @behlendorf pointed out, this is far from good since

if a spacemap were somehow wrong it could result in permanent damage to a file. For example, if the spacemap indicated a block was free when it wasn't then a new write could overwrite existing data in a snapshot.

I highly encourage anyone reading this to check their pools with zdb -m or better yet zdb -mc presuming you have the resources to checksum your metadata (another -c will checksum data as well).

Since scrubs don't flag this, and a quick grep through the git logs shows a bunch of space map related changes since ZoL went 0.6.2, there's a decent chance that people other than the "adventurous" lot who used #2909 may be looking at similar issues.

To resolve this, i suggest we teach scrub, either implicitly, or through a CLI flag, to keep track of data and metadata sizing as it traverses the metaslabs (used space accounting). It should occasionally compare this with the space map, and upon deviation recompute the space map for every metaslab affected. The new space maps should only commit once the blocks they describe have been verified. Scrub should also probably ring some alarm bells when it detects this condition, as i imagine that a write to a block which the SM presented as available will actually look valid despite having overwritten existing data since there's a valid pointer to it in the tree.
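The used-space accounting idea above can be sketched roughly as follows. This is a toy model in Python, not ZFS code: the function names, the `(offset, size)` block tuples, and the per-metaslab totals are all assumptions made for illustration, and it assumes no block straddles a metaslab boundary.

```python
from collections import defaultdict

def used_per_metaslab(blocks, metaslab_size):
    """Sum the bytes seen during traversal, grouped by metaslab index."""
    used = defaultdict(int)
    for offset, size in blocks:
        used[offset // metaslab_size] += size
    return dict(used)

def deviating_metaslabs(traversed_used, space_map_used):
    """Return the metaslab indexes whose traversal total disagrees with
    the total recorded in the (hypothetical) space map summary."""
    ids = set(traversed_used) | set(space_map_used)
    return sorted(
        i for i in ids
        if traversed_used.get(i, 0) != space_map_used.get(i, 0)
    )

MS = 1 << 36  # assumed metaslab size of 64 GiB, for illustration only
blocks = [(0, 4096), (8192, 131072), (MS + 4096, 8192)]
traversed = used_per_metaslab(blocks, MS)
space_map = {0: 135168, 1: 4096}  # metaslab 1 under-reports its usage
print(deviating_metaslabs(traversed, space_map))
```

Only the metaslabs returned by `deviating_metaslabs` would need their space maps recomputed, which is the point of comparing per metaslab rather than per pool.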

With the on-disk format changing all the time, and the data structures becoming more complex and interdependent, it may be worth revisiting what scrub should be doing, as opposed to what it does today.

dswartz commented 9 years ago

You got me concerned about my 3 pools on 2 servers, so I went to check them. Maybe I am out to lunch, but I can't get this to work at all:

    [root@centos7-ha3 ~]# zpool status
      pool: tank-copy
     state: ONLINE
      scan: scrub repaired 0 in 1h44m with 0 errors on Sun Feb 15 01:44:07 2015
    config:

    NAME                      STATE     READ WRITE CKSUM
    tank-copy                 ONLINE       0     0     0
      mirror-0                ONLINE       0     0     0
        wwn-5000c5002e38e0eb  ONLINE       0     0     0
        wwn-5000c5002e3aa680  ONLINE       0     0     0
    logs
      wwn-55cd2e404b4cd14f    ONLINE       0     0     0

errors: No known data errors

but:

[root@centos7-ha3 ~]# zdb -mc tank-copy
zdb: can't open 'tank-copy': No such file or directory

this fails for all 3 pools. centos 7 using a code drop from back in early December. I checked and double-checked the command syntax from the man page and other folks' emails. wth?

dswartz commented 9 years ago

this is strange. so i went to my dev testbed and created a scratch pool, a dataset in it and some files in that dataset and ran 'zdb -c foo':

[root@zolbuild hmailserver]# zdb -c foo

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 48 of 127 ...

    No leaks (block sum matches space maps exactly)

    bp count:            1835
    bp logical:      62722048      avg:  34180
    bp physical:     60032512      avg:  32715     compression:   1.04
    bp allocated:    60284416      avg:  32852     compression:   1.04
    bp deduped:             0    ref>1:      0   deduplication:   1.00
    SPA allocated:   60284416     used:  0.35%

    additional, non-pointer bps of type 0:        506

e.g. all seems ok. this seems to be the same code drop from december 5th as the other two servers. so why doesn't 'zdb -c' work there? are my pools borked in some manner?

DeHackEd commented 9 years ago

It means you don't have a /etc/zfs/zpool.cache file for your tank-copy pool. zdb depends on it for imported pools. If the pool is exported you can use zdb -e instead.

dswartz commented 9 years ago

ah, thanks. i understand 'tank' being that way - i am using it on a 2-head jbod via pacemaker/corosync so it isn't in the cachefile. not sure why the others weren't. i will check. thanks again!

dswartz commented 9 years ago

something else going on: 'windows' on primary server and 'tank-copy' on backup server ARE in /etc/zfs/zpool.cache, but are not found by zdb...

dswartz commented 9 years ago

dunno why those two pools aren't findable normally, but i was able to with:

zdb -m -e -p /dev/disk/by-vdev tank-copy

synopsis:

space map refcount mismatch: expected 313 != actual 303

this is a live pool - i hope this doesn't indicate a problem?

dswartz commented 9 years ago

and on the production server:

zdb -m -e -p /dev/disk/by-vdev windows

(snip)

space map refcount mismatch: expected 56 != actual 53

and:

zdb -m -e -p /dev/disk/by-vdev tank

(snip)

[no discrepancies]

sempervictus commented 9 years ago

Spoke with @ryao in #OpenZFS briefly and he confirms that it may be possible to recompute the maps, but there may be other unforeseen consequences to this path.

Would be nice to find out roughly what percentage of users have space map problems without causing pandemonium in the user community... would help gauge the real-world urgency of a change to the scrub procedures.

behlendorf commented 9 years ago

@sempervictus I absolutely agree we need a tool which can rebuild the space maps.

The good news is that space maps can absolutely be rebuilt; this is effectively what zdb does today to check for leaks. It walks the entire block tree, constructing the space maps in memory, and then compares them to the ones stored on disk. If they differ it reports the leak. The bad news is that it's much harder, but not impossible, to do this operation on a live pool where blocks are constantly being allocated and freed.
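As a rough illustration of the check described above (a toy model, not zdb's actual code): real space maps are on-disk logs of alloc/free records per metaslab, but here a "space map" is simply a set of allocated `(offset, size)` extents, and every name is hypothetical.

```python
def rebuild_allocated(block_pointers):
    """Walk every reachable block pointer and collect its extent."""
    allocated = set()
    for offset, size in block_pointers:
        allocated.add((offset, size))
    return allocated

def compare_space_maps(rebuilt, on_disk):
    """Return (leaked, missing).

    leaked:  extents the on-disk map marks allocated but no block references.
    missing: extents referenced by a block but not marked allocated on disk.
             This is the dangerous case, since a new write could land on
             live data.
    """
    leaked = on_disk - rebuilt
    missing = rebuilt - on_disk
    return leaked, missing

reachable = [(0, 4096), (8192, 4096), (16384, 8192)]
on_disk = {(0, 4096), (8192, 4096), (32768, 4096)}
leaked, missing = compare_space_maps(rebuild_allocated(reachable), on_disk)
print("leaked:", sorted(leaked))
print("missing:", sorted(missing))
```

On a quiescent pool the two sets are stable, which is why the comparison is straightforward offline; on a live pool both sides keep changing while the walk is in progress.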

The easiest way is to extend zdb, or even better zhack, so that it can write out new correct space maps after it has calculated them. This would be an offline operation which isn't ideal but at least there would be a utility available. In my opinion extending zhack is also preferable because zdb is by design a read-only utility which means it's always safe to run. zhack on the other hand already has the ability to modify existing pools.

A more complicated, and preferable, solution would be to build this functionality into scrub, which also walks the block tree. We could do something like have scrub build up new space maps from scratch as it traverses all the blocks. All new allocs and frees while the scrub was running would need to update the usual in-memory space maps and additionally the ones being constructed by scrub. Once the scrub completes it should then be able to authoritatively compare both sets of space maps in a txg sync. If they don't match they can be replaced with those generated by scrub.
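The online scheme described above might look something like this in miniature. It is a sketch under heavy assumptions: extents are plain integers, `ToyPool` and its methods are invented for illustration, and nothing here reflects how scrub or txg sync are actually implemented.

```python
class ToyPool:
    def __init__(self, blocks, space_map):
        self.blocks = set(blocks)        # ground truth: extents in the block tree
        self.space_map = set(space_map)  # the usual in-memory map (may be wrong)
        self.scrub_map = None            # map being built by a running scrub
        self.to_visit = []

    def start_scrub(self):
        self.scrub_map = set()
        self.to_visit = sorted(self.blocks)

    def scrub_one(self):
        """Visit one block and record it in the scrub-built map."""
        if self.to_visit:
            self.scrub_map.add(self.to_visit.pop(0))

    def alloc(self, extent):
        # Live allocations update both maps while a scrub is running.
        self.blocks.add(extent)
        self.space_map.add(extent)
        if self.scrub_map is not None:
            self.scrub_map.add(extent)

    def free(self, extent):
        # Live frees update both maps and drop the block from the walk.
        self.blocks.discard(extent)
        self.space_map.discard(extent)
        if self.scrub_map is not None:
            self.scrub_map.discard(extent)
            if extent in self.to_visit:
                self.to_visit.remove(extent)

    def finish_scrub(self, repair=False):
        """Finish the walk, compare both maps, optionally replace the old one."""
        while self.to_visit:
            self.scrub_one()
        mismatch = self.scrub_map ^ self.space_map
        if mismatch and repair:
            self.space_map = set(self.scrub_map)
        self.scrub_map = None
        return mismatch

pool = ToyPool(blocks={0, 1, 2, 3}, space_map={0, 1, 2})  # extent 3 wrongly free
pool.start_scrub()
pool.scrub_one()
pool.alloc(10)
pool.free(1)
mismatch = pool.finish_scrub()
print(sorted(mismatch))
```

The `repair` flag defaults to off here so that a mismatch is only reported, with replacement of the old map left as an explicit opt-in.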

The devil's going to be in the details here, so by default I doubt we want it to repair any damage. It's probably safer just to take any damaged spacemaps offline, which is already supported internally, and then use zdb / zhack offline to verify the diagnosis. Once we're absolutely sure that's working correctly we could enable an online repair.

dweeezil commented 9 years ago

I'd like to add a little followup to @behlendorf's commentary: It kept popping into my head while reading the recent spate of spacemap issues that some people are under the impression that zfs scrub is somehow analogous to fsck when, of course, they're completely different beasts.

A tool which reconstructs spacemaps would be a step in the direction of implementing the mythical fsck.zfs. There are, of course, a myriad of other semantic and structural issues which could be fixed or worked around by such a tool. For example, it wouldn't be terribly difficult to repair many classes of dnode damage as we've seen caused by the various SA issues.

Given how critical the spacemaps are (frequently fully traversed and, ultimately, the database used for block allocation), they're clearly a hot spot and seem to be one of the more common places for corruption to occur. A tool to rebuild them would clearly be a good step toward a more comprehensive fsck.zfs.

gofman commented 8 years ago

I am having the same error message: space map refcount mismatch: expected 369 != actual 273

zdb -mc reports that nothing leaked (and the same refcount mismatch at the end)

I found the discussion of the same issue here: https://forums.freenas.org/index.php?threads/problems-with-freenas-9-2-1-5.20879/

Citation: "dlavigne, Jun 10, 2014

I asked our in-house ZFS guru who said:

We have never seen this on FreeBSD, it's possibly a ZFS on Linux bug.

It seems to be caused by bad accounting for spacemap_histogram feature. I don't think it's big deal though, the feature is active and stays active for the lifetime of pool and therefore the refcount no longer matters."

Indeed, "expected 369" in my case is shown as com.delphix:spacemap_histogram = 369 in zhack feature stat output.

So the questions are:

  1. Is it really not harmful in ZoL as well (is that spacemap refcount used for anything)?
  2. Is it safe to fix this refcount using zhack to get rid of the annoying error message?

Thanks, Paul.

kernelOfTruth commented 8 years ago

referencing:

adding :exclamation: marks since this patch is dangerous

:exclamation: http://lists.open-zfs.org/pipermail/developer/2014-July/000732.html
https://github.com/wesolows/illumos-joyent/commit/dc4d7e06c8e0af213619f0aa517d819172911005 :exclamation:

maybe something inspired by the patch above could be created?

jay-to-the-dee commented 8 years ago

Just thought I'd try and provide what info I have on this issue to help the developers. I've been using ZFS for less than a week, which means I've just created a brand new RAIDZ pool. I'm using Ubuntu 15.10 (amd64) and my ZFS is from the Ubuntu repos, meaning I'm currently on, and always have been on, ZFS version 0.6.4.2 (Ubuntu package version 0.6.4.2-0ubuntu1.2). I thought this might help rule out the idea that only previous ZFSonLinux versions cause this issue.

While I've only written around 200GB of data to my pool so far, I have already encountered this space map mismatch issue when running the zdb -b command. Before scrubbing I received the following:

sudo zdb -b tank

Traversing all blocks to verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 50 of 130 ...
 225G completed (4433MB/s) estimated time remaining: 0hr 00min 01sec        
    No leaks (block sum matches space maps exactly)

    bp count:         1522626
    bp logical:    196977910784      avg: 129367
    bp physical:   163064331264      avg: 107094     compression:   1.21
    bp allocated:  247859871744      avg: 162784     compression:   0.79
    bp deduped:             0    ref>1:      0   deduplication:   1.00
    SPA allocated: 247859871744     used:  2.77%

    additional, non-pointer bps of type 0:        964

space map refcount mismatch: expected 15 != actual 9

I then scrubbed (with apparently 0 errors being detected or repaired) and subsequently ran the zdb -mc command as suggested:

sudo zdb -mc tank

Metaslabs:
    vdev          0
    metaslabs   130   offset                spacemap          free      
    ---------------   -------------------   ---------------   -------------
    metaslab      0   offset            0   spacemap     38   free    3.54G
    metaslab      1   offset   1000000000   spacemap     61   free    6.21G
    metaslab      2   offset   2000000000   spacemap     68   free    26.6G
    metaslab      3   offset   3000000000   spacemap     73   free    27.8G
    metaslab      4   offset   4000000000   spacemap     74   free    50.4G
    metaslab      5   offset   5000000000   spacemap     69   free    42.4G
    metaslab      6   offset   6000000000   spacemap     75   free    60.3G
    metaslab      7   offset   7000000000   spacemap      0   free      64G
    metaslab      8   offset   8000000000   spacemap      0   free      64G
    metaslab      9   offset   9000000000   spacemap      0   free      64G
    metaslab     10   offset   a000000000   spacemap      0   free      64G
    metaslab     11   offset   b000000000   spacemap      0   free      64G
    metaslab     12   offset   c000000000   spacemap      0   free      64G
    metaslab     13   offset   d000000000   spacemap      0   free      64G
    metaslab     14   offset   e000000000   spacemap      0   free      64G
    metaslab     15   offset   f000000000   spacemap      0   free      64G
    metaslab     16   offset  10000000000   spacemap      0   free      64G
    metaslab     17   offset  11000000000   spacemap      0   free      64G
    metaslab     18   offset  12000000000   spacemap      0   free      64G
    metaslab     19   offset  13000000000   spacemap      0   free      64G
    metaslab     20   offset  14000000000   spacemap      0   free      64G
    metaslab     21   offset  15000000000   spacemap      0   free      64G
    metaslab     22   offset  16000000000   spacemap      0   free      64G
    metaslab     23   offset  17000000000   spacemap      0   free      64G
    metaslab     24   offset  18000000000   spacemap      0   free      64G
    metaslab     25   offset  19000000000   spacemap     37   free    63.8G
    metaslab     26   offset  1a000000000   spacemap      0   free      64G
    metaslab     27   offset  1b000000000   spacemap      0   free      64G
    metaslab     28   offset  1c000000000   spacemap      0   free      64G
    metaslab     29   offset  1d000000000   spacemap      0   free      64G
    metaslab     30   offset  1e000000000   spacemap      0   free      64G
    metaslab     31   offset  1f000000000   spacemap      0   free      64G
    metaslab     32   offset  20000000000   spacemap      0   free      64G
    metaslab     33   offset  21000000000   spacemap      0   free      64G
    metaslab     34   offset  22000000000   spacemap      0   free      64G
    metaslab     35   offset  23000000000   spacemap      0   free      64G
    metaslab     36   offset  24000000000   spacemap      0   free      64G
    metaslab     37   offset  25000000000   spacemap      0   free      64G
    metaslab     38   offset  26000000000   spacemap      0   free      64G
    metaslab     39   offset  27000000000   spacemap      0   free      64G
    metaslab     40   offset  28000000000   spacemap      0   free      64G
    metaslab     41   offset  29000000000   spacemap      0   free      64G
    metaslab     42   offset  2a000000000   spacemap      0   free      64G
    metaslab     43   offset  2b000000000   spacemap      0   free      64G
    metaslab     44   offset  2c000000000   spacemap      0   free      64G
    metaslab     45   offset  2d000000000   spacemap      0   free      64G
    metaslab     46   offset  2e000000000   spacemap      0   free      64G
    metaslab     47   offset  2f000000000   spacemap      0   free      64G
    metaslab     48   offset  30000000000   spacemap      0   free      64G
    metaslab     49   offset  31000000000   spacemap      0   free      64G
    metaslab     50   offset  32000000000   spacemap     36   free    64.0G
    metaslab     51   offset  33000000000   spacemap      0   free      64G
    metaslab     52   offset  34000000000   spacemap      0   free      64G
    metaslab     53   offset  35000000000   spacemap      0   free      64G
    metaslab     54   offset  36000000000   spacemap      0   free      64G
    metaslab     55   offset  37000000000   spacemap      0   free      64G
    metaslab     56   offset  38000000000   spacemap      0   free      64G
    metaslab     57   offset  39000000000   spacemap      0   free      64G
    metaslab     58   offset  3a000000000   spacemap      0   free      64G
    metaslab     59   offset  3b000000000   spacemap      0   free      64G
    metaslab     60   offset  3c000000000   spacemap      0   free      64G
    metaslab     61   offset  3d000000000   spacemap      0   free      64G
    metaslab     62   offset  3e000000000   spacemap      0   free      64G
    metaslab     63   offset  3f000000000   spacemap      0   free      64G
    metaslab     64   offset  40000000000   spacemap      0   free      64G
    metaslab     65   offset  41000000000   spacemap      0   free      64G
    metaslab     66   offset  42000000000   spacemap      0   free      64G
    metaslab     67   offset  43000000000   spacemap      0   free      64G
    metaslab     68   offset  44000000000   spacemap      0   free      64G
    metaslab     69   offset  45000000000   spacemap      0   free      64G
    metaslab     70   offset  46000000000   spacemap      0   free      64G
    metaslab     71   offset  47000000000   spacemap      0   free      64G
    metaslab     72   offset  48000000000   spacemap      0   free      64G
    metaslab     73   offset  49000000000   spacemap      0   free      64G
    metaslab     74   offset  4a000000000   spacemap      0   free      64G
    metaslab     75   offset  4b000000000   spacemap      0   free      64G
    metaslab     76   offset  4c000000000   spacemap      0   free      64G
    metaslab     77   offset  4d000000000   spacemap      0   free      64G
    metaslab     78   offset  4e000000000   spacemap      0   free      64G
    metaslab     79   offset  4f000000000   spacemap      0   free      64G
    metaslab     80   offset  50000000000   spacemap      0   free      64G
    metaslab     81   offset  51000000000   spacemap      0   free      64G
    metaslab     82   offset  52000000000   spacemap      0   free      64G
    metaslab     83   offset  53000000000   spacemap      0   free      64G
    metaslab     84   offset  54000000000   spacemap      0   free      64G
    metaslab     85   offset  55000000000   spacemap      0   free      64G
    metaslab     86   offset  56000000000   spacemap      0   free      64G
    metaslab     87   offset  57000000000   spacemap      0   free      64G
    metaslab     88   offset  58000000000   spacemap      0   free      64G
    metaslab     89   offset  59000000000   spacemap      0   free      64G
    metaslab     90   offset  5a000000000   spacemap      0   free      64G
    metaslab     91   offset  5b000000000   spacemap      0   free      64G
    metaslab     92   offset  5c000000000   spacemap      0   free      64G
    metaslab     93   offset  5d000000000   spacemap      0   free      64G
    metaslab     94   offset  5e000000000   spacemap      0   free      64G
    metaslab     95   offset  5f000000000   spacemap      0   free      64G
    metaslab     96   offset  60000000000   spacemap      0   free      64G
    metaslab     97   offset  61000000000   spacemap      0   free      64G
    metaslab     98   offset  62000000000   spacemap      0   free      64G
    metaslab     99   offset  63000000000   spacemap      0   free      64G
    metaslab    100   offset  64000000000   spacemap      0   free      64G
    metaslab    101   offset  65000000000   spacemap      0   free      64G
    metaslab    102   offset  66000000000   spacemap      0   free      64G
    metaslab    103   offset  67000000000   spacemap      0   free      64G
    metaslab    104   offset  68000000000   spacemap      0   free      64G
    metaslab    105   offset  69000000000   spacemap      0   free      64G
    metaslab    106   offset  6a000000000   spacemap      0   free      64G
    metaslab    107   offset  6b000000000   spacemap      0   free      64G
    metaslab    108   offset  6c000000000   spacemap      0   free      64G
    metaslab    109   offset  6d000000000   spacemap      0   free      64G
    metaslab    110   offset  6e000000000   spacemap      0   free      64G
    metaslab    111   offset  6f000000000   spacemap      0   free      64G
    metaslab    112   offset  70000000000   spacemap      0   free      64G
    metaslab    113   offset  71000000000   spacemap      0   free      64G
    metaslab    114   offset  72000000000   spacemap      0   free      64G
    metaslab    115   offset  73000000000   spacemap      0   free      64G
    metaslab    116   offset  74000000000   spacemap      0   free      64G
    metaslab    117   offset  75000000000   spacemap      0   free      64G
    metaslab    118   offset  76000000000   spacemap      0   free      64G
    metaslab    119   offset  77000000000   spacemap      0   free      64G
    metaslab    120   offset  78000000000   spacemap      0   free      64G
    metaslab    121   offset  79000000000   spacemap      0   free      64G
    metaslab    122   offset  7a000000000   spacemap      0   free      64G
    metaslab    123   offset  7b000000000   spacemap      0   free      64G
    metaslab    124   offset  7c000000000   spacemap      0   free      64G
    metaslab    125   offset  7d000000000   spacemap      0   free      64G
    metaslab    126   offset  7e000000000   spacemap      0   free      64G
    metaslab    127   offset  7f000000000   spacemap      0   free      64G
    metaslab    128   offset  80000000000   spacemap      0   free      64G
    metaslab    129   offset  81000000000   spacemap      0   free      64G

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 50 of 130 ...
 209G completed (1978MB/s) estimated time remaining: 0hr 00min 11sec         
    No leaks (block sum matches space maps exactly)

    bp count:         1522627
    bp logical:    196978038272      avg: 129367
    bp physical:   163064331264      avg: 107094     compression:   1.21
    bp allocated:  247859896320      avg: 162784     compression:   0.79
    bp deduped:             0    ref>1:      0   deduplication:   1.00
    SPA allocated: 247859896320     used:  2.77%

    additional, non-pointer bps of type 0:        964

space map refcount mismatch: expected 18 != actual 12

As you can see I still have the mismatch error. What's interesting is that the numbers reported are now different after the scrub, yet the difference between expected and actual is still 6 in both runs.
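For anyone wanting to compare runs the same way, a small Python sketch like the one below can pull the two numbers out of the mismatch line and report the gap. The regular expression is based only on the message text quoted in this thread, and the function name is made up for illustration.

```python
import re

MISMATCH_RE = re.compile(
    r"space map refcount mismatch: expected (\d+) != actual (\d+)"
)

def refcount_gap(zdb_output):
    """Return expected minus actual, or None if no mismatch line is present."""
    m = MISMATCH_RE.search(zdb_output)
    if m is None:
        return None
    expected, actual = int(m.group(1)), int(m.group(2))
    return expected - actual

before = "space map refcount mismatch: expected 15 != actual 9"
after = "space map refcount mismatch: expected 18 != actual 12"
print(refcount_gap(before), refcount_gap(after))  # prints "6 6"
```

The same function can be fed the full captured output of a zdb run, since it searches for the line rather than requiring an exact match.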

I'm afraid I'm not too knowledgeable on the internals of file systems, but I'm quite happy to run further tests and report back if the developers wish, so please just let me know. :)

kernelOfTruth commented 8 years ago

@jay-to-the-dee

Hi,

please also provide information on your harddrives, mainboard, RAM (ECC ?), cpu,

and layers in between (cryptsetup/luks ? lvm ? ecryptfs or others ?)

jay-to-the-dee commented 8 years ago

Sure, no problem @kernelOfTruth :) In brief I'm running 3 x 3TB drives in RAIDZ on a workstation setup.

2 x 8GB NON-ECC RAM
i5-4670K CPU
Gigabyte GA-Z87X-D3H Motherboard
3 x brand new WD30EZRZ WD Blue 3TB hard disks for my ZFS pool in a RAIDZ configuration (the OS itself is run from an SSD)

No LUKS, LVM, eCryptFS are being used. LZ4 compression is enabled, dedup is disabled. All the drives are attached to the motherboard directly.

uname -a output: Linux haswell 4.2.0-30-generic #36-Ubuntu SMP Fri Feb 26 00:58:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

mooinglemur commented 8 years ago

I apologize for the noise if this comment isn't related to the problem, but my invocation of zdb -mc is failing with an assert. There might be corruption here.

-[~:#]- cat /sys/module/zfs/version 
0.6.5-317_g669cf0a

-[~:#]- cat /sys/module/spl/version 
0.6.5-63_g5ad98ad
-[~:#]- zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0 in 358h2m with 0 errors on Fri Jun 17 23:28:08 2016
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 ONLINE       0     0     0
          raidz2-0                                            ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK11A8B9J859GF-part3  ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK11A8B9J801SF-part3  ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK11A8B9J80ZSF-part3  ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK11A8B9J71V4F-part3  ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK11A8B9J85DJF-part3  ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK11A8B9J7ZSWF-part3  ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK11A8B9J7YV5F-part3  ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK11A8B9J81BYF-part3  ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK11A8B9J7Y1ZF-part3  ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK11A8B9J816ZF-part3  ONLINE       0     0     0
            ata-Hitachi_HDS722020ALA330_JK1171YAGYKL7S-part3  ONLINE       0     0     0
        logs
          mirror-1                                            ONLINE       0     0     0
            ata-OCZ-AGILITY3_OCZ-M935L9UP3HLO32NL-part1       ONLINE       0     0     0
            ata-M4-CT256M4SSD2_000000001221090B4FE5-part1     ONLINE       0     0     0

errors: No known data errors
-[~:#]- zdb -mc rpool          

Metaslabs:
        vdev          0
        metaslabs   158   offset                spacemap          free      
        ---------------   -------------------   ---------------   -------------
        metaslab      0   offset            0   spacemap 2792989   free    12.9G
        metaslab      1   offset   2000000000   spacemap 3257787   free    19.7G
        metaslab      2   offset   4000000000   spacemap 2702229   free     538M
        metaslab      3   offset   6000000000   spacemap 2213359   free    28.2G
        metaslab      4   offset   8000000000   spacemap 2262135   free    34.2G
        metaslab      5   offset   a000000000   spacemap 4292047   free    12.4G
        metaslab      6   offset   c000000000   spacemap 3171610   free    4.23G
        metaslab      7   offset   e000000000   spacemap 112980   free    32.0G
        metaslab      8   offset  10000000000   spacemap 2180939   free    29.7G
        metaslab      9   offset  12000000000   spacemap 2678626   free    33.8G
        metaslab     10   offset  14000000000   spacemap 2705532   free    22.4G
        metaslab     11   offset  16000000000   spacemap 2755856   free    11.1G
        metaslab     12   offset  18000000000   spacemap 2240895   free    14.1G
        metaslab     13   offset  1a000000000   spacemap  92380   free    27.4G
        metaslab     14   offset  1c000000000   spacemap 2138606   free    29.0G
        metaslab     15   offset  1e000000000   spacemap 376935   free    1.31G
        metaslab     16   offset  20000000000   spacemap 376936   free    16.0G
        metaslab     17   offset  22000000000   spacemap 4207089   free    23.0G
        metaslab     18   offset  24000000000   spacemap 4722403   free    30.8G
        metaslab     19   offset  26000000000   spacemap 2751652   free    42.2G
        metaslab     20   offset  28000000000   spacemap 4286587   free    34.6G
        metaslab     21   offset  2a000000000   spacemap 376937   free    3.60G
        metaslab     22   offset  2c000000000   spacemap 376938   free    13.7G
        metaslab     23   offset  2e000000000   spacemap 376939   free    27.8G
        metaslab     24   offset  30000000000   spacemap 4208777   free    25.6G
        metaslab     25   offset  32000000000   spacemap 2731937   free    17.1G
        metaslab     26   offset  34000000000   spacemap 2724927   free     219M
        metaslab     27   offset  36000000000   spacemap 4292046   free    25.2G
        metaslab     28   offset  38000000000   spacemap 112515   free    39.6G
        metaslab     29   offset  3a000000000   spacemap 2116887   free    47.9G
        metaslab     30   offset  3c000000000   spacemap 2213061   free    29.4G
        metaslab     31   offset  3e000000000   spacemap 2705531   free    23.8G
        metaslab     32   offset  40000000000   spacemap 173557   free    44.3G
        metaslab     33   offset  42000000000   spacemap 4749475   free    48.4G
        metaslab     34   offset  44000000000   spacemap 2698341   free    22.8G
        metaslab     35   offset  46000000000   spacemap 4753356   free    28.2G
        metaslab     36   offset  48000000000   spacemap 2804927   free    16.2G
        metaslab     37   offset  4a000000000   spacemap 3148668   free    40.2G
        metaslab     38   offset  4c000000000   spacemap 3290174   free    28.9G
        metaslab     39   offset  4e000000000   spacemap 4765705   free    30.9G
        metaslab     40   offset  50000000000   spacemap 376940   free    24.5G
        metaslab     41   offset  52000000000   spacemap 376941   free    2.24G
        metaslab     42   offset  54000000000   spacemap 376942   free    14.8G
        metaslab     43   offset  56000000000   spacemap 376943   free     378M
        metaslab     44   offset  58000000000   spacemap 3155472   free    29.1G
        metaslab     45   offset  5a000000000   spacemap 3226000   free    32.3G
        metaslab     46   offset  5c000000000   spacemap 2669520   free    32.6G
        metaslab     47   offset  5e000000000   spacemap  92144   free     278M
        metaslab     48   offset  60000000000   spacemap 2678627   free    31.5G
        metaslab     49   offset  62000000000   spacemap 3237314   free    17.1G
        metaslab     50   offset  64000000000   spacemap 2224359   free    21.2G
        metaslab     51   offset  66000000000   spacemap 2700636   free    21.1G
        metaslab     52   offset  68000000000   spacemap 3211357   free    31.0G
        metaslab     53   offset  6a000000000   spacemap 4208546   free    25.1G
        metaslab     54   offset  6c000000000   spacemap 3148667   free    33.7G
        metaslab     55   offset  6e000000000   spacemap 3200913   free    39.5G
        metaslab     56   offset  70000000000   spacemap 3696013   free     414M
        metaslab     57   offset  72000000000   spacemap 3713172   free     504M
        metaslab     58   offset  74000000000   spacemap 2691520   free    39.3G
        metaslab     59   offset  76000000000   spacemap 4749026   free    13.0G
        metaslab     60   offset  78000000000   spacemap 2702230   free    5.13G
        metaslab     61   offset  7a000000000   spacemap 2118476   free    34.4G
        metaslab     62   offset  7c000000000   spacemap 1825661   free    35.9G
        metaslab     63   offset  7e000000000   spacemap 3169421   free    37.5G
        metaslab     64   offset  80000000000   spacemap 3158397   free    26.0G
        metaslab     65   offset  82000000000   spacemap 4763567   free    16.7G
        metaslab     66   offset  84000000000   spacemap 2798415   free    30.1G
        metaslab     67   offset  86000000000   spacemap 2254521   free    38.7G
        metaslab     68   offset  88000000000   spacemap 376944   free    2.32G
        metaslab     69   offset  8a000000000   spacemap 376945   free    1.81G
        metaslab     70   offset  8c000000000   spacemap 376946   free    53.1G
        metaslab     71   offset  8e000000000   spacemap 2179835   free    39.4G
        metaslab     72   offset  90000000000   spacemap 2224360   free    39.6G
        metaslab     73   offset  92000000000   spacemap 376947   free    37.3G
        metaslab     74   offset  94000000000   spacemap 109978   free     570M
        metaslab     75   offset  96000000000   spacemap 2213978   free     108M
        metaslab     76   offset  98000000000   spacemap 376948   free    40.8G
        metaslab     77   offset  9a000000000   spacemap 4785341   free    20.7G
        metaslab     78   offset  9c000000000   spacemap 376949   free    7.00G
        metaslab     79   offset  9e000000000   spacemap 376950   free    37.7G
        metaslab     80   offset  a0000000000   spacemap 376951   free    12.8G
        metaslab     81   offset  a2000000000   spacemap 376952   free    14.2G
        metaslab     82   offset  a4000000000   spacemap 376973   free    36.3G
        metaslab     83   offset  a6000000000   spacemap 376974   free    45.2G
        metaslab     84   offset  a8000000000   spacemap 376975   free    8.80G
        metaslab     85   offset  aa000000000   spacemap 376953   free    27.7G
        metaslab     86   offset  ac000000000   spacemap 376954   free    37.4G
        metaslab     87   offset  ae000000000   spacemap 376955   free     357M
        metaslab     88   offset  b0000000000   spacemap 376956   free    41.7G
        metaslab     89   offset  b2000000000   spacemap 376957   free    49.4G
        metaslab     90   offset  b4000000000   spacemap 376958   free    30.9G
        metaslab     91   offset  b6000000000   spacemap 376959   free    48.3G
        metaslab     92   offset  b8000000000   spacemap 376960   free    1.92G
        metaslab     93   offset  ba000000000   spacemap 376976   free    49.2G
        metaslab     94   offset  bc000000000   spacemap 376961   free    18.8G
        metaslab     95   offset  be000000000   spacemap 376977   free    4.25G
        metaslab     96   offset  c0000000000   spacemap 376962   free    3.84G
        metaslab     97   offset  c2000000000   spacemap 376978   free    19.9G
        metaslab     98   offset  c4000000000   spacemap 376979   free    11.5G
        metaslab     99   offset  c6000000000   spacemap 376980   free    41.9G
        metaslab    100   offset  c8000000000   spacemap 376981   free    34.7G
        metaslab    101   offset  ca000000000   spacemap 376963   free    42.0G
        metaslab    102   offset  cc000000000   spacemap 376964   free    45.2G
        metaslab    103   offset  ce000000000   spacemap 4758199   free    31.3G
        metaslab    104   offset  d0000000000   spacemap 2241127   free    36.6G
        metaslab    105   offset  d2000000000   spacemap 2103132   free    47.1G
        metaslab    106   offset  d4000000000   spacemap 2731942   free    30.1G
        metaslab    107   offset  d6000000000   spacemap 376965   free    5.19G
        metaslab    108   offset  d8000000000   spacemap 4765186   free    15.7G
        metaslab    109   offset  da000000000   spacemap 4818001   free    4.80G
        metaslab    110   offset  dc000000000   spacemap 376966   free    4.27G
        metaslab    111   offset  de000000000   spacemap 376982   free    7.42G
        metaslab    112   offset  e0000000000   spacemap 3696012   free    25.0G
        metaslab    113   offset  e2000000000   spacemap 376983   free    5.52G
        metaslab    114   offset  e4000000000   spacemap 376967   free    10.1G
        metaslab    115   offset  e6000000000   spacemap 4762304   free    7.84G
        metaslab    116   offset  e8000000000   spacemap 3200912   free    35.9G
        metaslab    117   offset  ea000000000   spacemap 376984   free    2.26G
        metaslab    118   offset  ec000000000   spacemap 376968   free    35.7G
        metaslab    119   offset  ee000000000   spacemap 376985   free    5.70G
        metaslab    120   offset  f0000000000   spacemap 376986   free    3.61G
        metaslab    121   offset  f2000000000   spacemap 376987   free     686M
        metaslab    122   offset  f4000000000   spacemap 376988   free    1.47G
        metaslab    123   offset  f6000000000   spacemap 376989   free    1.05G
        metaslab    124   offset  f8000000000   spacemap 376990   free     961M
        metaslab    125   offset  fa000000000   spacemap 376991   free    3.39G
        metaslab    126   offset  fc000000000   spacemap 376992   free    1.05G
        metaslab    127   offset  fe000000000   spacemap 376993   free    32.2G
        metaslab    128   offset 100000000000   spacemap 376994   free    30.4G
        metaslab    129   offset 102000000000   spacemap 110854   free    53.2G
        metaslab    130   offset 104000000000   spacemap 376969   free    25.8G
        metaslab    131   offset 106000000000   spacemap 376995   free    13.7G
        metaslab    132   offset 108000000000   spacemap 376996   free     664M
        metaslab    133   offset 10a000000000   spacemap 4772027   free    51.6G
        metaslab    134   offset 10c000000000   spacemap 376970   free    44.4G
        metaslab    135   offset 10e000000000   spacemap 376997   free    58.3G
        metaslab    136   offset 110000000000   spacemap 376998   free    15.7G
        metaslab    137   offset 112000000000   spacemap 4765185   free    10.9G
        metaslab    138   offset 114000000000   spacemap 4722425   free    33.6G
        metaslab    139   offset 116000000000   spacemap 2258865   free    44.1G
        metaslab    140   offset 118000000000   spacemap 3290173   free    35.4G
        metaslab    141   offset 11a000000000   spacemap 2705590   free    40.0G
        metaslab    142   offset 11c000000000   spacemap 376971   free    2.95G
        metaslab    143   offset 11e000000000   spacemap 3226001   free    29.8G
        metaslab    144   offset 120000000000   spacemap 2247699   free    49.9G
        metaslab    145   offset 122000000000   spacemap 109977   free    51.7G
        metaslab    146   offset 124000000000   spacemap 2191863   free    32.3G
        metaslab    147   offset 126000000000   spacemap 2628237   free    24.5G
        metaslab    148   offset 128000000000   spacemap 376972   free    7.72G
        metaslab    149   offset 12a000000000   spacemap 113602   free    51.5G
        metaslab    150   offset 12c000000000   spacemap 2179836   free    34.1G
        metaslab    151   offset 12e000000000   spacemap 2116843   free    58.9G
        metaslab    152   offset 130000000000   spacemap 1717792   free    51.6G
        metaslab    153   offset 132000000000   spacemap 2716506   free    47.4G
        metaslab    154   offset 134000000000   spacemap 3176173   free    28.9G
        metaslab    155   offset 136000000000   spacemap 1824803   free    33.0G
        metaslab    156   offset 138000000000   spacemap 4268866   free    36.2G
        metaslab    157   offset 13a000000000   spacemap 3220982   free    51.4G

        vdev          1
        metaslabs   127   offset                spacemap          free      
        ---------------   -------------------   ---------------   -------------
        metaslab      0   offset            0   spacemap  92118   free    63.9M
        metaslab      1   offset      4000000   spacemap 114010   free      64M
        metaslab      2   offset      8000000   spacemap  92117   free      64M
        metaslab      3   offset      c000000   spacemap 2115079   free      64M
        metaslab      4   offset     10000000   spacemap 2115078   free      64M
        metaslab      5   offset     14000000   spacemap 155616   free      64M
        metaslab      6   offset     18000000   spacemap 155626   free      64M
        metaslab      7   offset     1c000000   spacemap 155625   free    64.0M
        metaslab      8   offset     20000000   spacemap 155624   free      64M
        metaslab      9   offset     24000000   spacemap 155623   free      64M
        metaslab     10   offset     28000000   spacemap 155622   free      64M
        metaslab     11   offset     2c000000   spacemap 155621   free      64M
        metaslab     12   offset     30000000   spacemap 155620   free      64M
        metaslab     13   offset     34000000   spacemap 155648   free      64M
        metaslab     14   offset     38000000   spacemap 155647   free      64M
        metaslab     15   offset     3c000000   spacemap 155646   free      64M
        metaslab     16   offset     40000000   spacemap 155645   free      64M
        metaslab     17   offset     44000000   spacemap 155644   free      64M
        metaslab     18   offset     48000000   spacemap 155643   free      64M
        metaslab     19   offset     4c000000   spacemap 155642   free      64M
        metaslab     20   offset     50000000   spacemap 155641   free      64M
        metaslab     21   offset     54000000   spacemap 155640   free      64M
        metaslab     22   offset     58000000   spacemap 155639   free      64M
        metaslab     23   offset     5c000000   spacemap 155638   free      64M
        metaslab     24   offset     60000000   spacemap 155637   free      64M
        metaslab     25   offset     64000000   spacemap 155636   free      64M
        metaslab     26   offset     68000000   spacemap 155635   free      64M
        metaslab     27   offset     6c000000   spacemap 155634   free      64M
        metaslab     28   offset     70000000   spacemap 155633   free      64M
        metaslab     29   offset     74000000   spacemap 155632   free      64M
        metaslab     30   offset     78000000   spacemap 155631   free      64M
        metaslab     31   offset     7c000000   spacemap 155630   free      64M
        metaslab     32   offset     80000000   spacemap 155629   free      64M
        metaslab     33   offset     84000000   spacemap 155628   free      64M
        metaslab     34   offset     88000000   spacemap 155627   free      64M
        metaslab     35   offset     8c000000   spacemap 155739   free      64M
        metaslab     36   offset     90000000   spacemap 155738   free      64M
        metaslab     37   offset     94000000   spacemap 155737   free      64M
        metaslab     38   offset     98000000   spacemap 155736   free      64M
        metaslab     39   offset     9c000000   spacemap 155735   free      64M
        metaslab     40   offset     a0000000   spacemap 155734   free      64M
        metaslab     41   offset     a4000000   spacemap 155733   free      64M
        metaslab     42   offset     a8000000   spacemap 155732   free      64M
        metaslab     43   offset     ac000000   spacemap 155731   free      64M
        metaslab     44   offset     b0000000   spacemap 155730   free      64M
        metaslab     45   offset     b4000000   spacemap 155729   free      64M
        metaslab     46   offset     b8000000   spacemap 155728   free      64M
        metaslab     47   offset     bc000000   spacemap 155727   free      64M
        metaslab     48   offset     c0000000   spacemap 155726   free      64M
        metaslab     49   offset     c4000000   spacemap 155725   free      64M
        metaslab     50   offset     c8000000   spacemap 155724   free    64.0M
        metaslab     51   offset     cc000000   spacemap 155720   free      64M
        metaslab     52   offset     d0000000   spacemap 155719   free      64M
        metaslab     53   offset     d4000000   spacemap 155718   free      64M
        metaslab     54   offset     d8000000   spacemap 155717   free    63.9M
        metaslab     55   offset     dc000000   spacemap 155715   free      64M
        metaslab     56   offset     e0000000   spacemap 155714   free      64M
        metaslab     57   offset     e4000000   spacemap 155713   free      64M
        metaslab     58   offset     e8000000   spacemap 155712   free      64M
        metaslab     59   offset     ec000000   spacemap 155711   free      64M
        metaslab     60   offset     f0000000   spacemap 155710   free      64M
        metaslab     61   offset     f4000000   spacemap 155709   free    64.0M
        metaslab     62   offset     f8000000   spacemap 155708   free      64M
        metaslab     63   offset     fc000000   spacemap 155707   free      64M
        metaslab     64   offset    100000000   spacemap 155706   free      64M
        metaslab     65   offset    104000000   spacemap 155705   free      64M
        metaslab     66   offset    108000000   spacemap 155704   free      64M
        metaslab     67   offset    10c000000   spacemap 155703   free      64M
        metaslab     68   offset    110000000   spacemap 155701   free      64M
        metaslab     69   offset    114000000   spacemap 155700   free      64M
        metaslab     70   offset    118000000   spacemap 155694   free      64M
        metaslab     71   offset    11c000000   spacemap 155691   free      64M
        metaslab     72   offset    120000000   spacemap 155690   free      64M
        metaslab     73   offset    124000000   spacemap 155689   free      64M
        metaslab     74   offset    128000000   spacemap 155688   free      64M
        metaslab     75   offset    12c000000   spacemap 155687   free      64M
        metaslab     76   offset    130000000   spacemap 155686   free      64M
        metaslab     77   offset    134000000   spacemap 155685   free    64.0M
        metaslab     78   offset    138000000   spacemap 155684   free      64M
        metaslab     79   offset    13c000000   spacemap 155683   free    64.0M
        metaslab     80   offset    140000000   spacemap 155682   free    63.1M
        metaslab     81   offset    144000000   spacemap 155681   free      64M
        metaslab     82   offset    148000000   spacemap 155680   free      64M
        metaslab     83   offset    14c000000   spacemap 4348072   free      64M
        metaslab     84   offset    150000000   spacemap 4348080   free      64M
        metaslab     85   offset    154000000   spacemap 4348082   free      64M
        metaslab     86   offset    158000000   spacemap 4348096   free      64M
        metaslab     87   offset    15c000000   spacemap 4348099   free      64M
        metaslab     88   offset    160000000   spacemap 4348107   free      64M
        metaslab     89   offset    164000000   spacemap 4348112   free      64M
        metaslab     90   offset    168000000   spacemap 4348117   free      64M
        metaslab     91   offset    16c000000   spacemap 4348122   free      64M
        metaslab     92   offset    170000000   spacemap 4348131   free      64M
        metaslab     93   offset    174000000   spacemap 4348144   free      64M
        metaslab     94   offset    178000000   spacemap 4348146   free      64M
        metaslab     95   offset    17c000000   spacemap 4348148   free      64M
        metaslab     96   offset    180000000   spacemap 4348149   free      64M
        metaslab     97   offset    184000000   spacemap 4348151   free      64M
        metaslab     98   offset    188000000   spacemap 4348152   free      64M
        metaslab     99   offset    18c000000   spacemap 4348216   free      64M
        metaslab    100   offset    190000000   spacemap 4348227   free      64M
        metaslab    101   offset    194000000   spacemap 4348235   free      64M
        metaslab    102   offset    198000000   spacemap 4348239   free      64M
        metaslab    103   offset    19c000000   spacemap 4348256   free      64M
        metaslab    104   offset    1a0000000   spacemap 4348290   free      64M
        metaslab    105   offset    1a4000000   spacemap 4348302   free      64M
        metaslab    106   offset    1a8000000   spacemap 4348374   free      64M
        metaslab    107   offset    1ac000000   spacemap 4348376   free      64M
        metaslab    108   offset    1b0000000   spacemap 4348378   free      64M
        metaslab    109   offset    1b4000000   spacemap 4348379   free      64M
        metaslab    110   offset    1b8000000   spacemap 4348380   free      64M
        metaslab    111   offset    1bc000000   spacemap 4348381   free      64M
        metaslab    112   offset    1c0000000   spacemap 4348415   free      64M
        metaslab    113   offset    1c4000000   spacemap 4348448   free      64M
        metaslab    114   offset    1c8000000   spacemap 4348467   free      64M
        metaslab    115   offset    1cc000000   spacemap 4348474   free      64M
        metaslab    116   offset    1d0000000   spacemap 4348486   free      64M
        metaslab    117   offset    1d4000000   spacemap 4348488   free      64M
        metaslab    118   offset    1d8000000   spacemap 4348494   free      64M
        metaslab    119   offset    1dc000000   spacemap 4348497   free      64M
        metaslab    120   offset    1e0000000   spacemap 4348509   free      64M
        metaslab    121   offset    1e4000000   spacemap 4348512   free      64M
        metaslab    122   offset    1e8000000   spacemap 4348513   free      64M
        metaslab    123   offset    1ec000000   spacemap 4348514   free      64M
        metaslab    124   offset    1f0000000   spacemap 4348516   free      64M
        metaslab    125   offset    1f4000000   spacemap 4348517   free      64M
        metaslab    126   offset    1f8000000   spacemap 4348519   free      64M

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 0 of 2, metaslab 23 of 158 ...space_map_load(msp->ms_sm, msp->ms_tree, SM_ALLOC) == 0 (0x34 == 0x0)
ASSERT at zdb.c:2668:zdb_leak_init()Aborted

kleini commented 6 years ago

I have a broken space map, too. Is there some way to recover the data from the filesystem?

root@ubuntu:~# zdb -b -e rpool

Traversing all blocks to verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 0 of 159 ...zdb: ../../module/zfs/range_tree.c:261: Assertion `rs->rs_start <= start (0xda000 <= 0<0)' failed.
Aborted
root@ubuntu:~#

Importing the pool always fails:

root@ubuntu:~# zpool import -N -f -R /mnt rpool
[ 1277.024109] VERIFY(rs == NULL) failed
[ 1277.024273] PANIC at range_tree.c:186:range_tree_add()

sempervictus commented 6 years ago

Not that I know of. IIUC, the issue is that ZFS can only traverse block pointers when scrubbing, going from one block to another. Since free space isn't made up of data blocks and nothing points to it, it can't be verified as free the same way data is verified as accurate. We originally found the bug a few years back, and I believe it was the maker himself who told me this was a very non-trivial issue. Offline scrubs or an fsck are probably the way to go here, since the FSM can't change during an offline op outside the locks held by the scrub.
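
To make the point about block pointers concrete, here is a toy sketch in Python, not ZFS code. It assumes a made-up `Block` tree and a single metaslab described by a start offset and a size; every name in it is hypothetical. It shows that free space can only be derived as the complement of everything reachable from the root, and then compared with what the space map recorded.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for a block pointer: where it lives and what it points to."""
    offset: int
    size: int
    children: list = field(default_factory=list)


def walk_allocated(root):
    """Yield (offset, size) for every block reachable from the root."""
    stack = [root]
    while stack:
        blk = stack.pop()
        yield (blk.offset, blk.size)
        stack.extend(blk.children)


def free_ranges(ms_start, ms_size, allocated):
    """Return the gaps left in one metaslab once all allocated ranges are removed."""
    free, cursor = [], ms_start
    for off, size in sorted(allocated):
        if off > cursor:
            free.append((cursor, off - cursor))
        cursor = max(cursor, off + size)
    end = ms_start + ms_size
    if cursor < end:
        free.append((cursor, end - cursor))
    return free


def check_space_map(ms_start, ms_size, root, recorded_free):
    """Compare the recorded free list with one rebuilt from a full traversal."""
    rebuilt = free_ranges(ms_start, ms_size, walk_allocated(root))
    return rebuilt == sorted(recorded_free), rebuilt


# Tiny example: three blocks inside a 32-unit metaslab.
root = Block(0, 4, [Block(8, 4), Block(16, 8)])
ok, rebuilt = check_space_map(0, 32, root, [(4, 4), (12, 4), (24, 8)])
print(ok, rebuilt)
```

The rebuilt list is only trustworthy if nothing is allocated or freed while the walk runs, which is the reason an offline pass looks like the easier place to do this.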

mabod commented 6 years ago

@behlendorf wrote in Feb. 2015:

I absolutely agree we need a tool which can rebuild the space maps.

3 years later I have a couple of questions about this:

  1. This refcount mismatch bug is still open. It looks like the devs all agree that fixing this bug is not a priority. Does that mean that it is not critical for data integrity? Can somebody please share a risk assessment?

  2. Fixing refcount mismatches with an extra tool is one thing. And I would highly appreciate a tool like this. But why do these refcount mismatches exist in the first place? What is going wrong?

bud4 commented 6 years ago

On 24 Feb 2018 10:28, "mabod" notifications@github.com wrote:

@behlendorf wrote in Feb. 2015:

I absolutely agree we need a tool which can rebuild the space maps.

+1

Orfheo commented 6 years ago

I have around 7 pools of different sizes and ages, on 7 different systems.

Five of them have feature@spacemap_histogram disabled, while two have it enabled.

Both systems with feature@spacemap_histogram enabled, one Gentoo stable (ZFS-v0.7.8-r0-gentoo) and one Scientific-Linux 6.8 (ZFS-v0.6.5.8-1), show a "space map refcount mismatch".

The two systems are completely different: one is old, about six years, and the other is fairly new, less than one year. The first doesn't have ECC RAM, the second does. Both show the same mismatch. The pool on the Gentoo system was created years ago with a completely different version of ZFS and has been upgraded along the way.

I can't see any other error on my systems besides this mismatch: scrub is clean, there are no hardware errors, and "zdb -mc" doesn't report any leak.

Of course I have "zfs send" backups. Should I worry?

dioni21 commented 6 years ago

@sempervictus If I understood correctly, you asked for scrub to fix spacemaps online. Somehow I understood that they could be fixed offline. Did I understand correctly? If so, how?

Some other people here reported assertions in their ZDB runs. I have a pool with this right now. I get not only an assertion (which I could bypass with -AAA), but also a segmentation fault, always after about the same amount of data:

# pwd
/root/zfs/zfs/cmd/zdb/
# ./zdb -cccvvAAAs -I 400 tank

Traversing all blocks to verify checksums and verify nothing leaked ...

loading concrete vdev 1, metaslab 14 of 15 .....
22.1G completed (   2MB/s) estimated time remaining: 257hr 51min 50sec        Segmentation fault (core dumped)

Again:

# time ./zdb -cccvAAAs -I 100 tank

Traversing all blocks to verify checksums and verify nothing leaked ...

loading concrete vdev 1, metaslab 14 of 15 .....
22.1G completed (   2MB/s) estimated time remaining: 294hr 17min 18sec        Segmentation fault (core dumped)

real    262m49.257s
user    47m40.745s
sys     11m27.985s

Again:

# time ./zdb -cccvAAAs -I 50 tank

Traversing all blocks to verify checksums and verify nothing leaked ...

loading concrete vdev 1, metaslab 14 of 15 .....
82.4M completed (  26MB/s) estimated time remaining: 28hr 18min 00sec        zdb_blkptr_cb: Got error 52 reading <0, 0, 0, 0>  -- skipping
 158M completed (   2MB/s) estimated time remaining: 319hr 26min 24sec        zdb_blkptr_cb: Got error 52 reading <0, 12, 1, 0>  -- skipping
 186M completed (   1MB/s) estimated time remaining: 381hr 23min 27sec        zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 4>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 5>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 7>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 8>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 6>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 9>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, a>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, b>  -- skipping
 188M completed (   1MB/s) estimated time remaining: 387hr 44min 15sec        zdb_blkptr_cb: Got error 52 reading <0, 62, 0, c>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, d>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, e>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, f>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 10>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 11>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 12>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 13>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 14>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 15>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 16>  -- skipping
zdb_blkptr_cb: Got error 52 reading <0, 62, 0, 17>  -- skipping
22.2G completed (   1MB/s) estimated time remaining: 733hr 09min 24sec        Segmentation fault (core dumped)

real    554m37.720s
user    35m25.028s
sys     12m24.917s

Again, but this time I think I triggered something very strange:

# time ./zdb -cccvvvvvvvvvvvvvAAAs -I 800 tank

Traversing all blocks to verify checksums and verify nothing leaked ...

loading concrete vdev 0, metaslab 116 of 145 ...space_map_load(msp->ms_sm, msp->ms_allocatable, maptype) == 0 (0x34 == 0x0)
ASSERT at zdb.c:3715:load_concrete_ms_allocatable_trees()Aborted (core dumped)

real    6m1.323s
user    0m45.443s
sys     0m21.463s

And this is not an old pool. I have just created it, moved filesystems over from an old pool with send/recv, and found a file with a wrong checksum.
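
A small aside on the numbers in these logs: the `0x34` in the `space_map_load(...) == 0 (0x34 == 0x0)` assertion and the `error 52` in the `zdb_blkptr_cb` lines are the same value, one printed in hex and one in decimal. The sketch below only does that conversion and asks the local C library for a name. As far as I know, errno 52 on Linux is EBADE and ZFS on Linux reuses it as ECKSUM (a checksum error), but treat that mapping as an assumption to check against your own headers; the helper names are made up for illustration.

```python
import errno


def decode_assert_value(text):
    """Turn a hex value from a zdb assertion (e.g. '0x34') into an integer."""
    return int(text, 16)


def errno_name(code):
    """Look up the symbolic errno name on this platform, if it has one."""
    return errno.errorcode.get(code, "unknown")


code = decode_assert_value("0x34")
# The printed name depends on the platform's errno table.
print(code, errno_name(code))
```

If the name printed on your system is EBADE, then both the aborted space map load and the skipped blocks would point at checksum failures while reading, rather than two unrelated problems.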