Closed lundman closed 9 years ago
Looks like a bug specific to ZoL; not present in illumos, where i and last_nonzero are declared as signed numbers. I would guess that when porting to Linux, they changed this code (because they won't use C99) and broke it.
Yes, I can see your signed int in zdb.c and can certainly fix our code here. (We can-do and want, C99, as we only use clang, but there is no upstream repo to switch to yet :) Getting tired of changing all zsb back to zfsvfs.. :) )
But I still need to work out why zdb trips over the assert about large_blocks refcount, since the pool does not have large_blocks enabled, it should skip that feature. I will compare what I have against the original patch and see if anything stands out.
@ahrens Using just IllumOS kernels, I can make this happen:
Create pool with zpool without large_block support:
ftp01.dw:~$ uname -a
SunOS ftp01 5.11 oi_151a6 i86pc i386 i86pc Solaris
root@ftp01:~# mkfile 100M /var/tmp/filepool.bin
root@ftp01:~# zpool create BOOM /var/tmp/filepool.bin
root@ftp01:~# mkdir /BOOM/helloworld
root@ftp01:~# zpool export BOOM
root@ftp01:~# scp /var/tmp/filepool.bin ssh01.newer:/var/tmp/
Import pool on system with large_block support, but do not upgrade
root@ssh01:/export/home/lundman# uname -a
SunOS ssh01 5.11 omnios-170cea2 i86pc i386 i86pc
root@ssh01:/export/home/lundman# zpool import -d /var/tmp/ BOOM
root@ssh01:/export/home/lundman# zdb -ddddddd BOOM
[snip]
Indirect blocks:
0 L0 0:ba00:200 0:120ba00:200 200L/200P F=1 B=7/7
segment [0000000000000000, 0000000000000200) size 512
large_blocks feature refcount mismatch: expected 0 != actual -2748781167608
@behlendorf @ahrens So changing just
for (i = 0; i <= last_nonzero; i++) {
to
for (i = 0; (int64_t)i <= last_nonzero; i++) {
Make it work, so it is interesting that my compiler silently opts to do an unsigned comparison, and IllumOS presumably stays as signed. Either way, since it is so subtle, perhaps we should avoid code that relies on the compiler preference?
But that aside, that above is not what the issue is about. That part is easy to fix, its the problem that comes after fixing it
large_blocks feature refcount mismatch: expected 0 != actual 140489770666256
The procedure to make zdb fail on just IllumOS is listed above. It applies to all versions, and we were hoping for some assistance in fixing it :)
Initially, the stack on OSX is too large with largeblock support, so we have to change to using kmem_alloc()
d516622b0bc0e26301a2f7a34033e4691e44264e
But after that, it can coredump, for example:
This is caused in:
because last_nonzero can be -1 so the for loop will never terminate. Changing it to
for (i = 0; (last_nonzero >= 0) && (i <= last_nonzero); i++) {
will at least avoid this core dump, but we still fail, with the assert being triggered:
So it would seem zdb, and/or, the userland compile of the largeblock support is missing. This seems to be the case on ZOL as well.
Passing the buck upstream in case they can solve it faster than me, but I will take a deep look tomorrow.