Is this pool in trouble?
It depends. This may be harmless if you're running zdb on an active, imported pool. To determine for certain whether there's an issue, make sure you export the pool then run zdb on it and see if it detects the same problem.
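For example, roughly (assuming the pool is named zstore, as in this issue):
zpool export zstore
zdb -e zstore        # -e examines a pool that is not currently imported
zpool import zstore
If zdb -e complains that the pool "exists", temporarily moving /etc/zfs/zpool.cache aside (as done further down in this thread) works around that.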
How am I supposed to deal with this? Any cure?
If you still observe the issue, it means that the space maps describing the free space in your pool have somehow been damaged. Currently the only guaranteed cure for this is to use zfs send/recv
to copy the contents of the existing pool to a newly created pool on different devices.
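A rough sketch of such a migration, assuming a recursive snapshot and a destination pool called newpool (names are just examples):
zfs snapshot -r zstore@migrate
zfs send -R zstore@migrate | zfs recv -F newpool
# -R replicates all descendant datasets, snapshots and properties;
# -F lets the receive overwrite the freshly created destination pool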
export the pool then run zdb on it
I don't know what you mean. When I export the pool and run zdb -u zstore I get:
zdb: can't open 'zstore': No such file or directory
and zdb -ue zstore says:
zdb: can't open 'zstore': File exists
If I move /etc/zfs/zpool.cache and try zdb -ue zstore again, it runs but still ends with the same error:
Uberblock:
magic = 0000000000bab10c
version = 5000
txg = 162321
guid_sum = 7197309111221853808
timestamp = 1519278560 UTC = Thu Feb 22 06:49:20 2018
mmp_magic = 00000000a11cea11
mmp_delay = 0
space map refcount mismatch: expected 240 != actual 232
I somehow succeeded in getting zdb output from the exported pool. I moved /etc/zfs/zpool.cache and ran zdb -eC zstore. This gives the same space map refcount mismatch:
# zdb -eC zstore
MOS Configuration:
version: 5000
name: 'zstore'
state: 1
txg: 162390
pool_guid: 14393956364711311496
errata: 0
hostid: 4294967295
hostname: 'rakete'
com.delphix:has_per_vdev_zaps
vdev_children: 2
vdev_tree:
type: 'root'
id: 0
guid: 14393956364711311496
children[0]:
type: 'mirror'
id: 0
guid: 6083005762724255950
metaslab_array: 259
metaslab_shift: 35
ashift: 12
asize: 4000771997696
is_log: 0
create_txg: 4
com.delphix:vdev_zap_top: 129
children[0]:
type: 'disk'
id: 0
guid: 2998250447430390224
path: '/dev/disk/by-vdev/WD-WCC4E5HF3P4S-part1'
devid: 'ata-WDC_WD40EZRZ-00WN9B0_WD-WCC4E5HF3P4S-part1'
phys_path: 'pci-0000:00:17.0-ata-2'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 130
children[1]:
type: 'disk'
id: 1
guid: 8762276478183427200
path: '/dev/disk/by-vdev/WD-WCC4E1SSP28F-part1'
devid: 'ata-WDC_WD40EZRZ-00WN9B0_WD-WCC4E1SSP28F-part1'
phys_path: 'pci-0000:00:17.0-ata-3'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 131
children[1]:
type: 'mirror'
id: 1
guid: 3605884340580067442
metaslab_array: 256
metaslab_shift: 35
ashift: 12
asize: 4000771997696
is_log: 0
create_txg: 4
com.delphix:vdev_zap_top: 132
children[0]:
type: 'disk'
id: 0
guid: 15809553155304475366
path: '/dev/disk/by-vdev/WD-WCC4E1SSP6NC-part1'
devid: 'ata-WDC_WD40EZRZ-00WN9B0_WD-WCC4E1SSP6NC-part1'
phys_path: 'pci-0000:00:17.0-ata-4'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 133
children[1]:
type: 'disk'
id: 1
guid: 10884614783416580978
path: '/dev/disk/by-vdev/WD-WCC7K7EK9VC4-part1'
devid: 'ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K7EK9VC4-part1'
phys_path: 'pci-0000:00:17.0-ata-1'
whole_disk: 1
create_txg: 4
com.delphix:vdev_zap_leaf: 134
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
space map refcount mismatch: expected 240 != actual 232
So you say this pool is in trouble? What next?
This whole thing is making me nervous. And obviously I am not alone. There are more issues raised about the same thing. E.g. issue #3094 has been open since Feb. 2015, but nowhere is there an explanation of why this is happening and whether it is dangerous. Possible solutions have also been discussed since 2015; see issue #3111, which is still open.
I am reading all this and cannot get my arms around it. If this space map refcount mismatch is really a problem, and @behlendorf and others suggest it can be a problem, then why have issues been open for it since 2015 without any final decision on what to do about it? On the other hand, if space map refcount mismatch is not a problem, why aren't the developers saying so, closing all these open issues, and stopping zdb from reporting it in the first place? I am confused about how this issue is handled.
The proposed solution
copy the contents of the existing pool to a newly created pool on different devices.
sounds odd to me. This is not the zero-maintenance experience I was looking for when moving to ZFS.
I want to understand the root cause first before I destroy the pool. I am afraid that the issue will happen again shortly after the new pool is in place. In fact, my pool here is just a few weeks old, built with new disks. If it takes ZFS just a few weeks to mess up a pool, it is not worth creating a new one. And issue #6985 shows evidence that it happens even with newly created pools.
I can only think of 2 root causes:
Either this is a hardware issue, and then I need to get new disks, or
this is a ZFS issue and ZFS messed it up. If that is the case, then this is a bug in ZFS.
So what is it?
This isn't a hardware issue but rather an issue with ZFS. What follows is just my understanding of the matter, which may or may not be correct. There isn't a clear reproducer that can be used to understand why this issue happens, and that's why there has not been much progress in fixing it. There are no tools to perform offline scanning and correction of issues like this that could rebuild the space maps, so the only "fix" is to create a new pool and move all the data to it through zfs send.
Ok. So it does not look like this will get fixed soon. More than 3 years old and no hope. What a negative surprise! The super-duper, zero-maintenance ZFS filesystem has a nasty bug which can corrupt your data and nobody is able to fix it. To me, this is a nightmare which even exceeds the btrfs stories I have heard. I can hardly believe that. Sorry if that sounds rude or polemic, but I am currently going through a deep depression ;-)
Ok. Enough whining. Back to the facts and one of my initial questions:
How dangerous is this bug for the data? What has to happen for it to cause data corruption? Can somebody please do a risk assessment for this bug?
Hit the same problem after playing with ZFS on Gentoo (4.14.21/x86-64 kernel, 0.7.6 zfs/spl)
# zpool status
pool: z
state: ONLINE
scan: resilvered 2.93M in 0h0m with 0 errors on Fri Feb 23 21:00:19 2018
config:
NAME STATE READ WRITE CKSUM
z ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WD80EFZX-68UW8N0_xxxxxxxx ONLINE 0 0 0
ata-WDC_WD80EFZX-68UW8N0_xxxxxxxx ONLINE 0 0 0
ata-WDC_WD80EFZX-68UW8N0_xxxxxxxx ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
ata-WDC_WD80EFZX-68UW8N0_xxxxxxxx ONLINE 0 0 0
ata-WDC_WD80EFZX-68UW8N0_xxxxxxxx ONLINE 0 0 0
ata-WDC_WD80EFZX-68UW8N0_xxxxxxxx ONLINE 0 0 0
Trivial to reproduce: with mirror-1 in degraded mode, copy some more data; zdb -m z then shows the refcount mismatch:
....
space map refcount mismatch: expected 60 != actual 54
Interesting, @ebsebs! I just realized that my pool, a RAID10, was also imported once with one drive missing. Maybe that introduced the refcount mismatch for me too.
I'm really curious what the devs think about this: either I'm doing something weird (which is possible, I have only spent a few days trying ZFS out) or I don't understand how this can be called production ready...
I don't understand how this can be called production ready.
This is exactly my point. And what I find really odd is that this bug has been known for several years and I cannot find any risk assessment from any ZFS developer. They have a skeleton in the closet and nobody is talking about it.
Good chance it stores more data than any other disk-level FS, including at well-known OS/device/cloud vendors, for the everyday nonsense people keep on their devices and unwittingly clone to these systems on a regular basis. It's more production ready than anything else meeting the capacity and functional requirements it has... Your options for alternatives can be described as limited or laughable, and I personally suggest you look at the commit logs for EXT4 (not coming close to the capabilities in ZFS) in the Linux kernel, and get a spoon for your humble pie before badmouthing this body of work.
That said, this is annoying, and there have been quite a few public discussions on this (skeletons in closets are hidden, for contrast) - see my own comments in issues and IRC - but it's not a fatal flaw. The space map referenced in that error you're seeing is the free space map: blocks not used, or marked as free during operations (a block which was deleted and is not referenced by any snapshots). This isn't a data error, it's a "how much space do I have available?" error. There are some other implications to this which you can find in the issues you referenced, but in practice this doesn't really kill production-grade systems for a few reasons. First off, production engineers tend to have release/deployment tiers which detect operational faults even when functional testing in CI does not. Secondly, a few blocks of free space matter less to the production consumer than you'd think - cost is calculated at the $/GB ratio, so 6 blocks of mismatch are not relevant (unless something actually bad happens). Lastly, they manage their pools at the data layer, not the block layer - fragmentation, snaps, disk failure, all of that eventually makes pools "less than optimal", and their contents can be streamed to other pools, which implicitly defrags and optimizes the data layout. Sort of like pouring a mixture from one glass to another to better distribute the contents.
ZFS can be used for personal systems, home servers, etc. I do, and have for years. It's designed for the datacenter however (which is where we use it to back OpenStack clouds with all their fun and varied needs), which includes datacenter engineers. The community is making it more user accessible by the day, and adding more interfaces friendly to the educated end user, but at its core, ZFS is not (nor is any system with critical responsibilities) a fire-and-forget solution which solves all the problems of a domain for you. There are suggestions around for how this can be addressed; if you've got the bandwidth to tackle them, it would be awesome to close out all the issues listed.
@sempervictus, I know what production means, thank you very much; I work for a company in Mountain View. I've read all the previous discussions I could find, and your take on this problem directly contradicts what @behlendorf said elsewhere, that this bug can actually cause data loss. I personally couldn't care less if this refcount inconsistency only caused some free space loss or a "less than optimal" data layout.
It could, in theory, if you manage to mark a block as having been freed when it isn't. If this were a serious problem experienced in the field, in prod, it'd be addressed by the companies who sell ZFS commercially. Since that's not the case, they're still making money, and they have not been sued into the dirt for losing data, one may conclude that it's a nuisance, not a critical bug. We have run petabytes of data through ZoL and BSD, and would be complaining a bunch more if it was being eaten. If you have the time, a free space verification tool along the lines of the offline scrub PR would be useful.
@sempervictus: Thank you for this clarification. My concerns regarding this bug stem from the fact that only very few experts have actually assessed the risk associated with this bug. One of those few was @behlendorf, who wrote in #3094:
if a spacemap were somehow wrong it could result in permanent damage to a file. For example, if the spacemap indicated a block was free when it wasn't then a new write could overwrite existing data in a snapshot.
He wrote this in a direct reply to you. For me this contradicts what you are stating here:
this is annoying ... but ... it's not a fatal flaw ... This isn't a data error
I hope you can understand that this is confusing for amateurs like me. Therefore I would very much appreciate it if experts like you, behlendorf and others could come to a final, mutually agreed risk assessment of this issue. That was my intention when I opened this issue.
Sorry for the confusion; this is a problem which can, mathematically, result in such a condition. My commentary is intended to point out that practical collisions between a block marked as free when it's not and that block being allocated to other data are not common. Situations where you have allocation pressure coupled with these issues could get bad, I imagine. I could see a nearly full pool doing file erasures and writing other data while having mirror members removed this way actually cause an issue, if the removal can result in a mismatch claiming "free when used" as opposed to just not marking a block free when it's no longer used. May merit some testing to see if we can actually merc data. Betcha that'll light a fire to fix the root cause.
@mabod @ebsebs believe me, I share your concern about this issue. I'd like nothing more than to get to the root cause of the problem and resolve it. Unfortunately, reproducing this specific issue so it can be understood has proven to be challenging. Despite having good test coverage for this in the test suite (including ztest, which verifies the space maps frequently), the issue has not been reproducible. In practice, we also don't see it manifesting itself on the vast majority of systems. It's exceptionally rare given the number of systems deployed. For example, I've never seen it on any of our 1000+ systems, which have been running for many years now.
As for implementing workarounds like rebuilding the space maps after they've been damaged, there's no doubt that would be a convenient feature. However, we'd much rather get to the underlying root cause and resolve whatever is causing this so that it's never needed.
@ebsebs thank you for trying to put together a reproducer for this. You got my hopes up there. Unfortunately I wasn't able to reproduce the issue using your method with 0.7.6 or zfs-0.7.0-340-g4af6873af6 on CentOS 7.4. If someone is able to reproduce this issue in a VM I'd love to dig in to it and get it resolved. But until it can somehow be reproduced in a controlled test environment there's not much which can be done.
I'm unlocking this issue and closing it as a duplicate of #3094. Again, I'm happy to investigate if someone can figure out how to reproduce it reliably.
@behlendorf, I didn't mention it, but I had the spacemap_histogram feature enabled by default on the zpool when I was able to reproduce the problem. If I don't have this feature enabled, I cannot reproduce it anymore. What kind of setup would you prefer for a repro? I can try in a Gentoo VM (VMware Fusion).
@ebsebs pretty much any distribution I can easily install in a VM locally would be fine. I'm not particularly familiar with Gentoo, so I'd prefer one of the other major distributions with a stock kernel if that's possible. But the key thing would be the specific steps to reproduce it, since then we'll hopefully be able to distill it down into a test case which can be added to the full test suite. Incidentally, I did have the spacemap_histogram feature enabled and couldn't reproduce it.
OK, I'll try to come up with something in a VM with loop devices. BTW, I do use the unmodified upstream 4.14.21/x86-64 kernel in Gentoo, if it matters...
@ebsebs thanks. For what it's worth, there's nothing too special about using sparse files for vdevs; that might make it a little easier to work up a reproducer.
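For instance, a throwaway pool backed by sparse files could look roughly like this (paths and sizes are arbitrary):
truncate -s 2G /var/tmp/zt1 /var/tmp/zt2 /var/tmp/zt3 /var/tmp/zt4
zpool create testz mirror /var/tmp/zt1 /var/tmp/zt2 mirror /var/tmp/zt3 /var/tmp/zt4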
So far my attempts to reproduce this using sparse files directly or through loop devices were unsuccessful. Same goes for virtual SCSI disks under VMware Fusion... I'll keep trying
@behlendorf , @ebsebs
I was not able to reproduce it with loop devices, but I was able to reproduce it 3 times with an Arch Linux installation in VirtualBox. I am using VirtualBox 5.2.6 r120293 on Manjaro. Installed in VirtualBox is the Arch Linux Xfce desktop. The steps to reproduce:
0.) Install Arch Linux + Xfce + ZFS in VirtualBox and bring it up to date. Arch Linux is set up to import pools during boot.
1.) In VirtualBox, create 4 additional disks for the Arch Linux image: zt1 zt2 zt3 zt4. Each 2 GB in size.
2.) Boot Arch Linux and create the pool: zpool create testz mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
3.) Check with zdb -m testz that everything is ok.
4.) Copy some files to the pool: cp /sbin/* /testz
5.) poweroff
6.) In the VirtualBox GUI, remove disk zt4 and boot.
7.) zpool status shows a degraded pool. zdb -m testz is ok.
8.) Copy more files: cp /bin/* /testz
9.) poweroff
10.) Add disk zt4 again and boot.
11.) zpool status shows that the pool resilvered automatically.
12.) Delete some files: rm /testz/l*
13.) reboot
14.) zdb -m testz shows space map refcount mismatch: expected 105 != actual 102
I created and destroyed the pool 3 times and could reproduce the behaviour 3 times. It looks like the issue happens when I delete files from a freshly resilvered pool. The refcount mismatch only appears after a reboot.
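Condensed, the sequence inside the VM looks roughly like this (the disk removal and re-attach in steps 6 and 10 happen in the VirtualBox GUI, not in the shell):
zpool create testz mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
zdb -m testz       # baseline: no mismatch
cp /sbin/* /testz
poweroff           # remove disk zt4 in the VirtualBox GUI, then boot
zpool status       # pool is DEGRADED
zdb -m testz       # still no mismatch
cp /bin/* /testz
poweroff           # re-add disk zt4 in the VirtualBox GUI, then boot
zpool status       # pool has resilvered automatically
rm /testz/l*
reboot
zdb -m testz       # space map refcount mismatch: expected 105 != actual 102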
Arch Linux is set up to import pools during boot.
@mabod "scan" import or "cachefile" import? Can you please post the full output of zpool history $poolname?
It's cachefile:
History for 'testz':
2018-02-27.22:15:07 zpool create testz mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
2018-02-27.22:16:35 zpool import -c /etc/zfs/zpool.cache -aN
2018-02-27.22:18:26 zpool import -c /etc/zfs/zpool.cache -aN
2018-02-27.22:21:59 zpool import -c /etc/zfs/zpool.cache -aN
2018-02-27.22:25:08 zpool import -c /etc/zfs/zpool.cache -aN
Arch is set up with the following ZFS-related services:
zfs-import-cache.service loaded active exited Import ZFS pools by cache file
zfs-mount.service loaded active exited Mount ZFS filesystems
zfs-zed.service loaded active running ZFS Event Daemon (zed)
zfs-import.target loaded active active ZFS pool import target
zfs.target loaded active active ZFS startup target
It is even easier to reproduce. Here is the command history of my last tests:
223 zpool create testz mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
224 cp /bin/* /testz/
225 zpool status # all ok
226 zdb -m testz # all ok
227 poweroff # remove drive zt4 and boot
228 zpool status # see degraded pool
229 poweroff # add missing zt4 and boot
230 zpool status # see resilvered pool
231 zdb -m testz # all ok
232 reboot
233 zpool status # see resilvered pool
234 zdb -m testz # refcount mismatch showing up
So it has to do with resilver and reboot. There are no file operations involved while the pool is degraded or after it has been resilvered.
By default, VirtualBox virtual disks ignore cache flush commands. Since many filesystems, including ZFS, rely on cache flushes to work correctly, you'll want to enable them and repeat your test. https://www.virtualbox.org/manual/ch12.html#idp47569020608528
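For example, assuming the VM is named "archzfs" and its disks hang off the first AHCI controller, the setting could be applied to each LUN like this:
for lun in 0 1 2 3 4 5; do
    VBoxManage setextradata "archzfs" \
        "VBoxInternal/Devices/ahci/0/LUN#${lun}/Config/IgnoreFlush" 0
done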
@richardelling I mean, whether the disks ignore cache flush or not, they should flush completely before VM teardown, so that shouldn't really matter here?
@richardelling I followed your advice with the VirtualBox cache flush, setting
VBoxManage setextradata "VM name" "VBoxInternal/Devices/ahci/0/LUN#[0-5]/Config/IgnoreFlush" 0
for all disks. But that somehow breaks ZFS functionality. With this setting ZFS does not resilver anymore. It says that the drive is FAULTED and I have to do a zpool replace instead, which does not give refcount mismatches.
With the original settings I was able to reproduce the issue with raidz1 as well.
And I also followed @kpande's advice and exported the pools and removed the cache file before doing a zdb -me testz.
@rincebrain I think if cache flushes are ignored, then all bets are off because there is no way to verify the cache gets flushed.
@mabod this too is troubling. Since others have trouble recreating, perhaps testing with a solution other than VBox can eliminate one variable?
FWIW, I cannot reproduce with VMware Fusion.
I sacrificed one of my JBODs to do the test on my real PC: a JBOD with 4 disks as RAID10, pool name testz like before.
I executed the same steps as mentioned before and was able to reproduce the refcount mismatch, although with different ZFS behaviour.
zfs-import-cache.service is giving the following error:
Mär 01 07:44:29 rakete zpool[313]: cannot import 'testz': one or more devices is currently unavailable
Mär 01 07:44:31 rakete systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
Mär 01 07:44:31 rakete systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
Mär 01 07:44:31 rakete systemd[1]: Failed to start Import ZFS pools by cache file.
I imported the pool manually but did not do any operations with it.
The pool is not resilvered automatically; I had to import it manually. It says that a device is FAULTED:
pool: testz
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: none requested
config:
NAME STATE READ WRITE CKSUM
testz DEGRADED 0 0 0
mirror-0 ONLINE 0 0 0
sdf ONLINE 0 0 0
sdg ONLINE 0 0 0
mirror-1 DEGRADED 0 0 7
sdh ONLINE 0 0 7
3899446523646515953 FAULTED 0 0 0 was /dev/sdh1
errors: No known data errors
At this stage the pool is showing refcount mismatches:
zpool export testz
zdb -me testz
...
metaslab 114 offset 7200000000 spacemap 0 free 4G
metaslab 115 offset 7300000000 spacemap 0 free 4G
space map refcount mismatch: expected 12 != actual 11
More info:
The pool does not resilver when I reattach the device. So I detached the device from the pool and attached it again. This works. The pool resilvers and is ready. A zpool clear testz was needed.
But the refcount mismatch persists with higher numbers:
space map refcount mismatch: expected 19 != actual 18
By the way, I did this test multiple times, also with device names by-id instead of sdh, sdi, etc. It makes no difference.
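For reference, the detach/attach/clear sequence described above looks roughly like this (the replacement device name is a placeholder; the FAULTED member is referenced by the GUID from zpool status):
zpool detach testz 3899446523646515953   # drop the FAULTED mirror member by GUID
zpool attach testz sdh /dev/sdi          # attach the disk back to the surviving member (placeholder name)
zpool clear testz                        # clear the error counters once the resilver finishes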
@mabod The problem reported by "space map refcount mismatch" is with the count of space maps that are using the spacemap_histogram feature flag. An incorrect count here has essentially zero impact on your pool. You can safely continue using the pool without consequence.
We'll investigate the problem. I suspect it has to do with DTL's, which are implemented with spacemaps, and exist on unhealthy pools. The first step will be to determine whether the problem is with the verification code in zdb, or with the actual kernel code. Could you send us the output of zdb -dddd <pool>?
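For reference, one way to capture that from the exported test pool (the file name is arbitrary):
zpool export testz
zdb -edddd testz > testz-zdb-dddd.txt
gzip testz-zdb-dddd.txt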
@kpande I've only skimmed this thread -- can you clarify how the (perhaps spurious) warning message from ZDB is related to a kernel panic (which is obviously much more concerning)?
@kpande no, the space map refcount mismatch is not at all what freeing free segment or spacemap walk did not equal the allocated space means. They are totally orthogonal. The former is an issue with the feature flag refcount (it happens to be the feature flag for spacemap_histogram), while the latter are issues with the contents of the space map (which is much more concerning).
@ahrens: I have exported the pool and uploaded the output of zdb -edddd testz to Dropbox:
https://www.dropbox.com/s/pqyc07cqy1d472c/testz-zdb-dddd.txt.gz?dl=0
@mabod Thanks for that. It looks like what's happened is that a space map object has been leaked. So the spacemap_histogram feature flag refcount is actually correct (there are 22 spacemaps using this feature), but ZDB only located 21 of them. The last one (object 6) doesn't seem to be referenced anywhere. I suspect that it's a DTL object that has been leaked. The leaked object is empty. There is no impact on the pool and you can safely continue using it (or other pools with similar errors). I'll continue investigating to see how this object could be leaked.
I'm assuming this pool was generated with the steps described in your previous comment: https://github.com/zfsonlinux/zfs/issues/7204#issuecomment-369134056
@ahrens : Thanks for the feedback. The output was generated from the JBOD pool I mentioned a few comments later. https://github.com/zfsonlinux/zfs/issues/7204#issuecomment-369494935
Please share how you determined the issue with object 6, since I want to debug my real production pool the same way.
Object lvl iblk dblk dsize dnsize lsize %full type
6 1 128K 4K 0 512 4K 100.00 SPA space map
320 bonus SPA space map header
dnode flags: USED_BYTES
dnode maxblkid: 0
Is this the object you are referring to? And why is object 6 the "last one"? There are a lot more. And what do you mean when you say "leaked"? What does that term mean in this context?
@mabod Yes, that's the object. I found it by matching up all the other SPA space map objects with the ones referenced by metaslabs or DTL's; see examples below.
"Leaked" means that this (space map) object in the MOS is not referenced by any other data structures in the pool. So there isn't any way for it to be freed. Assuming that it should have been freed, the only impact is a tiny bit of space (hundreds of bytes) being unnecessarily used.
Below, the spacemap XXX entry is the object ID of the space map for that metaslab:
Metaslabs:
vdev 0
metaslabs 116 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 139 free 16.0G
metaslab 1 offset 400000000 spacemap 2 free 15.1G
metaslab 2 offset 800000000 spacemap 3 free 16.0G
metaslab 3 offset c00000000 spacemap 7 free 16.0G
metaslab 4 offset 1000000000 spacemap 10 free 16.0G
And below, the DTL: XXX entry is the object ID of the DTL for that vdev:
Object lvl iblk dblk dsize dnsize lsize %full type
58 1 128K 16K 4.50K 512 16K 100.00 packed nvlist
8 bonus packed nvlist size
...
children[0]:
type: 'disk'
id: 0
guid: 15069111281487308802
path: '/dev/disk/by-id/usb-WDC_WD20_EARX-00PASB0_152D00539000-0:0-part1'
devid: 'usb-WDC_WD20_EARX-00PASB0_152D00539000-0:0-part1'
phys_path: 'pci-0000:00:14.0-usb-0:8:1.0-scsi-0:0:0:0'
whole_disk: 1
DTL: 514
If you find the leaked objects on your production pool, could you send me the ZDB output for those objects too? They might provide a clue as to when/how they were leaked.
FYI, I've augmented ZDB to find these kinds of MOS object leaks automatically, and we'll be upstreaming that soon.
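Until then, here is a rough shell sketch of the same matching done by hand against a saved dump (assuming the dump was written to testz-zdb-dddd.txt as above):
# object IDs of all "SPA space map" objects in the MOS
grep -E 'SPA space map[[:space:]]*$' testz-zdb-dddd.txt | awk '{print $1}' | sort -u > all-spacemaps.txt
# space map object IDs actually referenced by metaslabs and DTLs
{ grep -oE 'spacemap +[1-9][0-9]*' testz-zdb-dddd.txt | awk '{print $2}'
  grep -oE 'DTL: [0-9]+' testz-zdb-dddd.txt | awk '{print $2}'; } | sort -u > referenced.txt
# anything listed as a space map but never referenced is a leak candidate (like object 6)
grep -vxFf referenced.txt all-spacemaps.txt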
I am afraid I cannot handle my own debugging. The zdb -edddd dump for my main pool is 3 GB. Is there any way to reduce the amount of debug info but still find what I am looking for?
@mabod you can definitely remove everything after the MOS part of that zdb output (unfortunately I don't think there's a good way to tell it to dump the MOS only - that would be a nice, straightforward enhancement). Specifically, if you search for Dataset, you'll see the MOS and then the dataset with the same name as the pool. You can cut everything after that. For example:
...
Dataset mos [META], ID 0, cr_txg 4, 105K, 64 objects, rootbp DVA[0]=<1:400010400:200> DVA[1]=<0:1000010400:200> DVA[2]=<1:1a00010000:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique triple size=800L/200P birth=153L/153P fill=64 cksum=845a87cf3:3615bb07b1d:b4283f1d7d3f:197c12dafded98
...
Dataset testz [ZPL], ID 51, cr_txg 1, 1.86G, 5809 objects, rootbp DVA[0]=<1:40000a400:200> DVA[1]=<0:100000a400:200> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=148L/148P fill=5809 cksum=1059db9b91:573cfbd47e7:f65ef305870c:1e7a7968d3a443
====== remove everything after the above line ===========
...
@ahrens: I created a Python script to strip the zdb output. It takes everything between Dataset mos [META] and Dataset poolname [ZPL]. That still creates a file of 3 GB; not much is left out.
Here is the Python script I use. The logfile comes in via STDIN and the pool name is a command-line argument:
#!/usr/bin/python
import sys
import re

if len(sys.argv) <= 1:
    print("pool name missing")
    sys.exit()

# match the start of the MOS dump and the start of the pool's own dataset
start = r"Dataset mos \[META\]"
end = r"Dataset " + sys.argv[1] + r" \[ZPL\]"

for line in sys.stdin:
    if re.search(start, line):
        print("Found start: " + start)
        sys.stdout.write(line)
        # copy everything up to and including the pool dataset line, then stop
        for line in sys.stdin:
            sys.stdout.write(line)
            if re.search(end, line):
                sys.stdout.write("Found end: " + end)
                sys.exit()
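It can then be used like this, for example (the script name is arbitrary):
zpool export zstore
zdb -edddd zstore | ./strip-mos.py zstore | gzip > zstore-mos.txt.gz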
@ahrens @mabod any news on this matter? I have two pools which show the very same problem.
The first shows:
space map refcount mismatch: expected 370 != actual 328
and the second one:
space map refcount mismatch: expected 138 != actual 114
What can/should I report to aid in debugging? From what I read above, the full debug output is going to be massive (on the order of a few GBs). Is it possible to estimate how much free space (in bytes or blocks) has leaked?
@shodanshok This message doesn't indicate that space is leaked. There's no practical impact of this bug.
@ahrens Thanks for the reply. I was assuming some free space was leaked/lost based on your previous reply:
Assuming that it should have been freed, the only impact is a tiny bit of space (hundreds of bytes) being unnecessarily used
and based on that reddit reply by txgsync
So, did I misunderstand something? While the message causes no harm to the pool, I would really like to understand what it means and why it shows up (side note: I also tried with an exported pool, with identical results).
I got the same "space map refcount mismatch:" error, but can't find any solution yet.
So after one year of discussion, is there any way to get rid of this error now?
Creating a new pool should not be considered a solution for a production server.
Aha, this is really a good solution.... I have some scripts that get dedup information via zdb, but with this error the return code is always non-zero, which confuses the scripts.
Bump. I still have this bug as of 2020-06-30.
Bump. I have this problem as well on all of my production machines, including disk images that are single-device pools with no other issues.
System information
I find a handful of issues here related to "refcount mismatch", but reading them has not helped me understand the urgency of a message like:
space map refcount mismatch: expected 240 != actual 232
Describe the problem you're observing
My pool zstore is not having any issues according to zpool status:
But zdb -u is reporting a refcount mismatch.
I executed zdb -mc zstore in the hope that this would fix the issue, but it doesn't help. The output at the end looks like this:
So my questions are:
Thanks, Matthias