Closed terinjokes closed 10 months ago
I'm also hit by this bug; it happened after upgrading to ZFS 2.2.0. Mounting tmpfs on the portage TMPDIR (the build directory for go) solves the issue. Currently on kernel 6.5.9; it originally happened on 6.5.7.
No idea if it's worth noting, but the remaining data in the files seems to be a repetition of the same base64 encoded data. Decoding it gave me nothing comprehensible.
After downgrading coreutils from 9.3 to 8.32, I am no longer able to reproduce this corruption. My understanding is 8.32 predates the switch to automatically using reflink?
No idea if it's worth noting, but the remaining data in the files seems to be a repetition of the same base64 encoded data. Decoding it gave me nothing comprehensible.
This is the Go BuildID. Compare the output of a non-corrupted build:
$ file /usr/lib/go/pkg/tool/linux_amd64/compile
/usr/lib/go/pkg/tool/linux_amd64/compile: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=AZG4j339IZ-OAfBcmzF6/X0YMZagYV4o9m0AWIUeg/mhcjufvYjNl4p0WQNIA_/Zm3BcmNOJl0wBrJMwAMl, not stripped
With the data remaining in the file:
$ hexdump -C /var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/compile
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000fa0 00 00 00 00 00 00 00 00 00 00 00 00 41 5a 47 34 |............AZG4|
00000fb0 6a 33 33 39 49 5a 2d 4f 41 66 42 63 6d 7a 46 36 |j339IZ-OAfBcmzF6|
00000fc0 2f 58 30 59 4d 5a 61 67 59 56 34 6f 39 6d 30 41 |/X0YMZagYV4o9m0A|
00000fd0 57 49 55 65 67 2f 6d 68 63 6a 75 66 76 59 6a 4e |WIUeg/mhcjufvYjN|
00000fe0 6c 34 70 30 57 51 4e 49 41 5f 2f 5a 6d 33 42 63 |l4p0WQNIA_/Zm3Bc|
00000ff0 6d 4e 4f 4a 6c 30 77 42 72 4a 4d 77 41 4d 6c 00 |mNOJl0wBrJMwAMl.|
00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00ac9280 41 5a 47 34 6a 33 33 39 49 5a 2d 4f 41 66 42 63 |AZG4j339IZ-OAfBc|
00ac9290 6d 7a 46 36 2f 58 30 59 4d 5a 61 67 59 56 34 6f |mzF6/X0YMZagYV4o|
00ac92a0 39 6d 30 41 57 49 55 65 67 2f 6d 68 63 6a 75 66 |9m0AWIUeg/mhcjuf|
00ac92b0 76 59 6a 4e 6c 34 70 30 57 51 4e 49 41 5f 2f 5a |vYjNl4p0WQNIA_/Z|
00ac92c0 6d 33 42 63 6d 4e 4f 4a 6c 30 77 42 72 4a 4d 77 |m3BcmNOJl0wBrJMw|
00ac92d0 41 4d 6c 00 00 00 00 00 00 00 00 00 00 00 00 00 |AMl.............|
00ac92e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
01139380 00 00 00 00 00 00 00 00 00 |.........|
01139389
Had the same issue here... Upgraded from 2.1.13 to 2.2.0 last month right after it came out. I'm on Ubuntu and built the ZFS package from the 2.2.0 tag.
I am discovering that random files written after the 2.2.0 upgrade have suffered silent corruption: they either contain repetitious data like @thulle reported or have large blocks of contiguous zeroes in place of the data they should contain. I don't use or build go like the other posters.
zpool scrub reports no errors, and zpool status reports no errors. Let me know if there is any other information needed.
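Since scrub and zpool status come back clean here (the zeroes were written "correctly" as far as ZFS checksums are concerned), the only way to find suspect files is a userland scan. A rough, hypothetical helper (mine, not from this thread) that flags regular files beginning with a full page of zeroes, which matches the corruption pattern reported above:

```python
import os
import sys

PAGE = 4096

def starts_with_zero_page(path):
    """Return True if the file begins with a full page of zero bytes.

    This is only a heuristic: sparse files and some binary formats
    legitimately begin with zeroes, so hits need manual inspection.
    """
    try:
        with open(path, "rb") as f:
            head = f.read(PAGE)
    except OSError:
        return False
    return len(head) == PAGE and head == b"\x00" * PAGE

def scan(root):
    """Print every regular file under root that starts with a zero page."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                if starts_with_zero_page(path):
                    print(path)

if __name__ == "__main__" and len(sys.argv) > 1:
    scan(sys.argv[1])
```

Any hit still needs to be compared against a known-good copy or backup; a zero page at the start of an ELF file (which should begin with `\x7fELF`) is a strong signal, though.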
After downgrading coreutils from 9.3 to 8.32, I am no longer able to reproduce this corruption.
This seems to solve the corruption issue on my end too.
https://github.com/openzfs/zfs/issues/11900#issuecomment-927568640 onwards is highly relevant. See also #14753 and #11900 as a whole.
It would be interesting to know if people here have all upgraded their pool for the new block cloning feature or if there's a mix here.
feature@block_cloning active here
My understanding is that block cloning is enabled by default when upgrading to 2.2.0, but may not be getting used unless an application like coreutils is actually leveraging it.
Can people post the result of this:
zpool get all tank | grep bclone
(where tank is the name of their pool with the corruption)
kc3000 bcloneused 442M -
kc3000 bclonesaved 1.42G -
kc3000 bcloneratio 4.30x -
My understanding is this: if the result is 0 for both bcloneused and bclonesaved, then it's safe to say that you don't have silent corruption.
With any non-zero number for either field, you may or may not have silent corruption. That is likely but not really safe to say yet; we don't have enough data.
but may not be getting used unless an application like coreutils is actually leveraging it.
Is it accurate to say it's being used when the FICLONE ioctl is used, and possibly when copy_file_range is?
I wouldn't even feel comfortable saying that yet, I agree it's likely, but I'd really like more certainty before declaring "you're safe if not this".
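For anyone who wants to poke at the two code paths being discussed, here's a rough sketch of the coreutils-9.3-style sequence: try a whole-file FICLONE, fall back to copy_file_range, and finally to a plain read/write loop. The helper name and structure are mine, not coreutils' actual implementation, and it assumes Linux:

```python
import fcntl
import os

FICLONE = 0x40049409  # from linux/fs.h: _IOW(0x94, 9, int)

def clone_or_copy(src, dst):
    """Illustrative sketch of the coreutils-9.3-style copy sequence.

    Returns which path was taken: "reflink", "copy_file_range",
    or "read/write". Not coreutils' actual code.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # On ZFS 2.2 this ioctl is what engages block cloning (BRT).
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            pass  # EOPNOTSUPP/EXDEV/ENOTTY: filesystem can't clone here
        try:
            cfr = getattr(os, "copy_file_range", None)  # Linux, Python >= 3.8
            if cfr is None:
                raise OSError("copy_file_range not available")
            remaining = os.fstat(fsrc.fileno()).st_size
            while remaining > 0:
                n = cfr(fsrc.fileno(), fdst.fileno(), remaining)
                if n == 0:
                    break
                remaining -= n
            return "copy_file_range"
        except OSError:
            # Last resort: ordinary buffered copy.
            fsrc.seek(0)
            fdst.seek(0)
            fdst.truncate()
            while True:
                chunk = fsrc.read(1 << 20)
                if not chunk:
                    break
                fdst.write(chunk)
            return "read/write"
```

Running this in a loop on a ZFS dataset with block_cloning active (and checking which branch was taken) could help narrow down whether only the FICLONE path is implicated.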
Agreed, we don't know what we don't know. We need to reproduce this with something other than coreutils and building go; I don't know if there's a more robust way of figuring this out outside of that. And of course I'm handwaving 2, which could be a heavy lift.
I'd assume, not having looked, that it's something like: copy_file_range isn't dirtying things, so the check that forces a txg sync on SEEK_DATA/SEEK_HOLE when the file is dirty isn't firing.
But as I'm somewhat occupied recovering from being badly ill at home, my time to test and debug this is rather finite, so I wouldn't assume I'll be doing that in a timely fashion.
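The SEEK_DATA/SEEK_HOLE mechanism in that theory can be probed from userland. A sparse-aware copier walks the file roughly like this sketch does and skips anything reported as a hole, which is exactly how stale hole reporting becomes runs of zeroes in the destination (illustrative helper, Linux lseek semantics assumed):

```python
import errno
import os

def map_extents(path):
    """Walk a file with SEEK_DATA/SEEK_HOLE and return a list of
    ("data"|"hole", start, end) tuples.

    A sparse-aware copier skips the "hole" regions entirely, so if the
    filesystem reports a dirty-but-unsynced region as a hole, the copy
    ends up with zeroes where the data should be.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        extents = []
        offset = 0
        while offset < size:
            try:
                data = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # nothing but hole until EOF
                    extents.append(("hole", offset, size))
                    break
                raise
            if data > offset:
                extents.append(("hole", offset, data))
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            extents.append(("data", data, hole))
            offset = hole
        return extents
    finally:
        os.close(fd)
```

Comparing the extent map of a just-copied file against a byte-level scan for zero runs would show whether the holes were really holes.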
I agree with @rincebrain on the rough theory. I did look, and landed on code I've looked at before while trying to get my head around corner cases in block cloning. I have been suspicious of it for a while now, and remain so.
When we start to clone a block, we indicate intent to modify the target dbuf by calling dmu_buf_will_clone, which is a special case that unwinds any changes on the dbuf and then sets it to DB_NOFILL.
In dmu.c:
void
dmu_buf_will_clone(dmu_buf_t *db_fake, dmu_tx_t *tx)
{
	dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake;

	/*
	 * Block cloning: We are going to clone into this block, so undirty
	 * modifications done to this block so far in this txg. This includes
	 * writes and clones into this block.
	 */
	mutex_enter(&db->db_mtx);
	VERIFY(!dbuf_undirty(db, tx));
	ASSERT0P(dbuf_find_dirty_eq(db, tx->tx_txg));
	if (db->db_buf != NULL) {
		arc_buf_destroy(db->db_buf, db);
		db->db_buf = NULL;
	}
	mutex_exit(&db->db_mtx);

	dmu_buf_will_not_fill(db_fake, tx);
}

void
dmu_buf_will_not_fill(dmu_buf_t *db_fake, dmu_tx_t *tx)
{
	dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake;

	mutex_enter(&db->db_mtx);
	db->db_state = DB_NOFILL;
	DTRACE_SET_STATE(db, "allocating NOFILL buffer");
	mutex_exit(&db->db_mtx);

	dbuf_noread(db);
	(void) dbuf_dirty(db, tx);
}
It seems there's a window there where the lock is down and the dbuf is not dirty. That may be a place where a second thread can add a change to that block, which then gets trampled.
Unfortunately dbuf_dirty and dbuf_undirty are extremely complex, the gap is tiny, and without a tight test case it's going to take me a lot of time to track this down, which I don't have at the moment[^1]. So this is an unhelpful comment, but maybe it gives someone else an idea of where to look.
[^1]: but you can buy some of my time :wink:
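To make the suspected window concrete, here is one possible interleaving (pseudocode; this illustrates the theory above, not an observed trace):

```
thread A: clone                          thread B: concurrent write
---------------                          --------------------------
dmu_buf_will_clone(db, tx)
  mutex_enter(&db->db_mtx)
  dbuf_undirty(db, tx)      # db now clean
  mutex_exit(&db->db_mtx)
                                         dirties db with new data
                                         (window: lock dropped, db not dirty)
dmu_buf_will_not_fill(db, tx)
  mutex_enter(&db->db_mtx)
  db->db_state = DB_NOFILL  # B's change trampled
  mutex_exit(&db->db_mtx)
  dbuf_dirty(db, tx)
```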
https://github.com/openzfs/zfs/issues/15526#issuecomment-1810800004
My understanding is that block cloning is enabled by default when upgrading to 2.2.0, …
With FreeBSD 14.0-RELEASE, there's complementary use of the vfs.zfs.bclone_enabled kernel state, which is 0 (zero):
root@mowa219-gjp4-freebsd-14-zfs-vm:~ # sysctl -d vfs.zfs.bclone_enabled
vfs.zfs.bclone_enabled: Enable block cloning
root@mowa219-gjp4-freebsd-14-zfs-vm:~ # sysctl vfs.zfs.bclone_enabled
vfs.zfs.bclone_enabled: 0
root@mowa219-gjp4-freebsd-14-zfs-vm:~ # freebsd-version -kru ; uname -aKU
14.0-RELEASE
14.0-RELEASE
14.0-RELEASE
FreeBSD mowa219-gjp4-freebsd-14-zfs-vm 14.0-RELEASE FreeBSD 14.0-RELEASE releng/14.0-n265380-f9716eee8ab GENERIC amd64 1400097 1400097
root@mowa219-gjp4-freebsd-14-zfs-vm:~ # pkg upgrade -r FreeBSD-base
Updating FreeBSD-base repository catalogue...
FreeBSD-base repository is up to date.
All repositories are up to date.
Checking for upgrades (0 candidates): 100%
Processing candidates (0 candidates): 100%
Checking integrity... done (0 conflicting)
Your packages are up to date.
root@mowa219-gjp4-freebsd-14-zfs-vm:~ # exit
logout
grahamperrin@mowa219-gjp4-freebsd-14-zfs-vm:~ % pkg -vv | grep -e url -e enabled
url : "pkg+https://pkg.freebsd.org/FreeBSD:14:amd64/latest",
enabled : yes,
url : "pkg+https://pkg.freebsd.org/FreeBSD:14:amd64/base_release_0",
enabled : yes,
grahamperrin@mowa219-gjp4-freebsd-14-zfs-vm:~ %
Whilst 14.0 is not yet announced, I can't imagine a change to the value at this time.
Time to upstream that and add Linux support, with how broken this is.
I am curious if it is broken in the same way on FreeBSD. Has anyone managed to reproduce this with intentional reflink copies? From what I recall the implementer's main platform was FreeBSD.
Also some of these stack traces are blown assertions in the ZIL. Are we certain all of these are in regard to the BRT feature? I know a lot of the more recent optimizations with regard to ZIL messed with lock granularity, do we know that these things haven't happened with BRT not being leveraged?
Here is probably not the right place to debate whether the other bugs around the ZIL are or are not from BRT. At least in #15529, I only tagged the ones where it was explicitly the case that block cloning was happening.
I'm also not sure what you mean by "intentional reflink support" here: in both FreeBSD and Linux's case, they're calling a copy_file_range analogue that is going to reflink if it can and do a boring copy otherwise. Nobody here is explicitly trying FICLONE, which, last I checked, has no analogue on FreeBSD at all.
The cp from coreutils-9.3 is using FICLONE, and falling back to copy_file_range if it fails.
By intentional I mean doing cp with a reflink argument, assuming the BSD variant of cp has it. I'd expect that by default it doesn't try to use it automatically like coreutils' --reflink=auto behavior. When something like make is copying quickly generated build artifacts around, I expect you're dealing with code that is far more likely to hit race conditions than a file that's been stable on disk for a few seconds and is being copied once with the BRT-backed clone feature.
My understanding is that cp on FreeBSD just invokes copy_file_range in every case, and as I said, has no FICLONE analogue.
Ah gotcha, so the benefits of the reduced syscall are there without requiring repeated userspace / kernelspace round trips, but the BRT based shortcut and space savings are prohibited behind that sysctl. Makes sense. I am curious if FICLONE is the issue here and copy_file_range worked just fine?
Since the code, if I'm reading it right, for FICLONE just invokes the same backend function, it shouldn't be FICLONE specifically. I would guess differing semantics about how easy races are to trigger between the two platforms' memory management, but I really don't know.
Pinging @pjd on block cloning.
I'm seeing if I can make a non-gentoo reproducer.
Since I have 1.42GB of possibly affected files, I thought I'd check for stretches of zeroes in the other files to see if anything other than compiling go might have triggered this. The block cloning commit only seems to change zdb code regarding the ZIL, so I just want to verify that we have no way of dumping the BRT to check which files contain cloned blocks, right?
Currently dumping a list of all file blocks to check for multiple references to the same block.
I don't think zdb knows how to dump it as it is now, no. I don't think it'd be hard to teach it, though.
It's not hard to dump, but it also doesn't really show what you want. Each entry is just the offset within a vdev (half a DVA), and the number of references to it. It doesn't know what a file is, nor even really what a block pointer is.
Probably the dumbest version is to extend the in-memory mini-BRT used in zdb -b to note the object in which cloned blocks were seen, and then dump those out at the end. It does mean a full scan though (about the same heft as a scrub).
Hmm, now I think about it, it might not help anyway. If the problem is that there was a race, and we applied a change in the wrong order, such that zeroes got written, then the clone itself was never performed - we did a real write, just with zero content. So there's no clone to look for.
@robn The path can be shown in the output with a list of all the DVAs though (-ddddd), and then it's just a matter of working backwards to find all files for them. Having a dump of the BRT would've helped in the process. The idea was that whatever produced those files is using a syscall that is susceptible to this bug.
I've checked most files with cloned blocks now though, and found nothing incorrect. The only ones remaining to verify are all copies made by Steam, but since they're Windows .dlls and the like, chunks of zeroes are quite common, so I'll have to verify them some other way.
The next idea is to check all files modified since enabling block cloning and compare them with earlier versions in snapshots to see if chunks of zeroes have appeared in files that used to have none.
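That snapshot-comparison step can be mechanised. A hypothetical helper (the names are mine) that reports zero runs present in the new copy of a file but not in the snapshot version:

```python
def zero_runs(data, min_len=512):
    """Return (start, end) offsets of runs of NUL bytes >= min_len long."""
    runs = []
    start = None
    for i, b in enumerate(data):
        if b == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data)))
    return runs

def new_zero_runs(old, new, min_len=512):
    """Zero runs in `new` whose byte range was not all-zero in `old`.

    These are candidates for the "chunks of zeroes that used to have
    data" check described above; `old` would come from the snapshot
    (e.g. the file under .zfs/snapshot/...), `new` from the live dataset.
    """
    suspicious = []
    for start, end in zero_runs(new, min_len):
        if old[start:end].strip(b"\x00"):
            suspicious.append((start, end))
    return suspicious
```

Files that shrank or grew between snapshot and live copy need the ranges clamped first; this sketch assumes equal lengths for brevity.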
Unfortunately, I've been unable to reproduce this by rebuilding go on Fedora 39 (kernel 6.5.11-300) with coreutils-9.3 and OpenZFS 2.2. Observing the builds, BRT is definitely being exercised, but I haven't been able to trigger this in over 50 builds. @terinjokes @thulle could you share some more details about the exact procedure you're using to trigger this?
I don't think I'm doing anything unusual. I'm building go with Gentoo Portage and can reliably reproduce with:
1. emerge -C go (to remove the current version of Go; not strictly necessary if the current package works).
2. emerge -1av go, which fetches go-bootstrap-1.18 and uses it to build Go 1.21.4.
I was also able to verify that navigating to the ebuild and executing ebuild go-1.21.4.ebuild clean install in place of step 2 above results in corrupted files in the image DESTDIR directory under /var/tmp/portage, so the files are being corrupted as part of the build, and not as part of the final copy into the system directories.
I'm using the pre-built "gentoo-kernel-bin" kernel on both of the machines I can reproduce this with.
I wonder if portage's sandboxing for the build process does anything special / weird with hardlinks that could cause this.
It's important to note that it's sometimes very hard to convince ZFS to enable the BRT feature on the pool, and it'll just fail to reflink things then. So I'd check that the feature is actually marked active before trying to reproduce this.
(Specifically I mean that a lot of the time, if the pool is even slightly busy, it'll return EAGAIN trying to go from enabled to active initially. Or sometimes, even if it's already active, it'll think the file is busy, return EAGAIN, and fall back to a normal copy if you used copy_file_range. I tried making a reproducer out of the existing SEEK_HOLE test cases but gave up after about an hour because it was so inconsistent at enabling the feature and kept mysteriously erroring out trying to make clones.)
I wonder if portage's sandboxing for the build process does anything special / weird with hardlinks that could cause this.
I don't think so (although I wouldn't say it's impossible); someone can try FEATURES="-sandbox -usersandbox" to confirm. Further, at least that wasn't the case in the other bug. But that other bug was extremely timing sensitive for me and I could only ever hit it on one machine, and not on many other Gentoo ones.
I discussed with @thesamesam on IRC, rebuilt go with FEATURES="-usersandbox -sandbox" to disable sandboxing, and I still have the same corruption.
I'm trying to think of anything else portage does that might be different from a usual compilation outside of portage, but not a whole lot comes to mind. It could just be that you are reliably hitting a data race with reliably bad timing.
Portage:
@thesamesam I'm sure this is leaving a lot out, these are things just off the top of my head that it does.
I'm not doing anything in particular either; the only difference from terinjokes is that instead of removing go and rebuilding, I've just emerged an older binary package to recover.
I'm not using the prebuilt gentoo-kernel, but my own build of gentoo-sources-6.5.9 currently.
Setting FEATURES="-sandbox -usersandbox" and xattr=off doesn't improve things.
Setting sync=always seems to change something; using file usr/lib/go/pkg/tool/linux_amd64/* | cut -c 1-72 to verify, I've now successfully built twice.
edit: 4 successful builds, 2 while having enabled sandbox and usersandbox again. But as soon as I set xattr=sa again I got a failure. Doing more builds to check if this keeps true.
That makes sense.
IIRC Go builds do a lot of cp-ing things around, which is why this was so good at triggering issues, from looking into why this came up in #11900.
I made a new fs with sync=always and mounted it as the Portage TMPDIR, and while it improved the situation (likely by changing the timing), I was still able to reproduce this issue.
/var/tmp/portage/dev-lang/go-1.21.4/image # file usr/lib/go/pkg/tool/linux_amd64/vet
usr/lib/go/pkg/tool/linux_amd64/vet: data
/var/tmp/portage/dev-lang/go-1.21.4/image # hexdump -C usr/lib/go/pkg/tool/linux_amd64/vet
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000fa0 00 00 00 00 00 00 00 00 00 00 00 00 69 74 6a 6e |............itjn|
00000fb0 6f 69 72 46 39 78 39 66 6b 4c 73 59 4d 77 47 50 |oirF9x9fkLsYMwGP|
00000fc0 2f 59 69 50 4c 58 72 47 65 55 53 57 56 65 4c 76 |/YiPLXrGeUSWVeLv|
00000fd0 4d 4c 6f 74 36 2f 44 53 30 46 37 55 4d 66 5f 6c |MLot6/DS0F7UMf_l|
00000fe0 45 66 72 32 68 6e 6b 62 71 32 2f 47 57 4e 70 78 |Efr2hnkbq2/GWNpx|
00000ff0 5f 68 2d 43 78 38 69 6d 63 34 76 6b 37 74 62 00 |_h-Cx8imc4vk7tb.|
00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
003954a0 69 74 6a 6e 6f 69 72 46 39 78 39 66 6b 4c 73 59 |itjnoirF9x9fkLsY|
003954b0 4d 77 47 50 2f 59 69 50 4c 58 72 47 65 55 53 57 |MwGP/YiPLXrGeUSW|
003954c0 56 65 4c 76 4d 4c 6f 74 36 2f 44 53 30 46 37 55 |VeLvMLot6/DS0F7U|
003954d0 4d 66 5f 6c 45 66 72 32 68 6e 6b 62 71 32 2f 47 |Mf_lEfr2hnkbq2/G|
003954e0 57 4e 70 78 5f 68 2d 43 78 38 69 6d 63 34 76 6b |WNpx_h-Cx8imc4vk|
003954f0 37 74 62 00 00 00 00 00 00 00 00 00 00 00 00 00 |7tb.............|
00395500 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00595460 00 00 00 00 00 00 00 |.......|
00595467
Edit to add:
/var/tmp/portage/dev-lang/go-1.21.4/image # findmnt /var/tmp/portage/
TARGET SOURCE FSTYPE OPTIONS
/var/tmp/portage zroot/local/portage zfs rw,relatime,xattr,posixacl,casesensitive
/var/tmp/portage/dev-lang/go-1.21.4/image # zfs get sync zroot/local/portage
NAME PROPERTY VALUE SOURCE
zroot/local/portage sync always local
A delay between writing the files to disk and copying them removes the bug.
The following bash for loops mimic the Portage behaviour with and without a delay between phases. Log files are written with a listing of the broken files. From them we can deduce that:
1) It happens only when files are copied from the build directory to the image directory.
2) ~~A mere 10 seconds delay is enough to remove the bug entirely~~ false: even 30 seconds does not make a difference.
2.1) When using emerge rather than this mimic, more files are broken, so this is very time sensitive.
Storage is NVMe.
> pippo
for i in clean compile install merge ; do
ebuild /var/db/repos/gentoo/dev-lang/go/go-1.21.4.ebuild $i
file /var/tmp/portage/dev-lang/go-1.21.4/{work,image/usr/lib}/go/pkg/tool/linux_amd64/* \
| grep -e '\bdata\b' >> pippo
done
cat pippo
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/cover: data
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/vet: data
> pluto
for i in clean compile install merge ; do
ebuild /var/db/repos/gentoo/dev-lang/go/go-1.21.4.ebuild $i
file /var/tmp/portage/dev-lang/go-1.21.4/{work,image/usr/lib}/go/pkg/tool/linux_amd64/* \
| grep -e '\bdata\b' >> pluto
sleep 10
done
wc pluto
0 0 0 pluto
txg sync timeout default is 5 seconds, so 10 seconds making it not reproduce tracks.
That's helpful, thanks. Since you're able to reproduce it, could you try setting zfs_dmu_offset_next_sync=0 and see if it reproduces? This will effectively disable hole reporting, which will hopefully help us narrow this down.
Did a loop of 5 builds each:
with sync=always & xattr=off, all 5 are ok (on top of the 4 successful builds in the edit above).
with sync=always & xattr=sa, 3 builds failed (on top of the 1 failed build in the edit above).
txg sync timeout default is 5 seconds, so 10 seconds making it not reproduce tracks.
mmh, I've lowered zfs_txg_timeout to 1s and tried again, also lowering the sleep to 2s. Confusing results at best.
cat /sys/module/zfs/parameters/zfs_txg_timeout
1
> pippo
for i in clean compile install ; do
ebuild /var/db/repos/gentoo/dev-lang/go/go-1.21.4.ebuild $i
file /var/tmp/portage/dev-lang/go-1.21.4/{work,image/usr/lib}/go/pkg/tool/linux_amd64/* \
| grep -e '\bdata\b' >> pippo
done
cat pippo
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/compile: data
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/cover: data
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/vet: data
> pluto
for i in clean compile install ; do
ebuild /var/db/repos/gentoo/dev-lang/go/go-1.21.4.ebuild $i
file /var/tmp/portage/dev-lang/go-1.21.4/{work,image/usr/lib}/go/pkg/tool/linux_amd64/* \
| grep -e '\bdata\b' >> pluto
sleep 2
done
cat pluto
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/asm: data
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/cgo: data
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/compile: data
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/cover: data
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/link: data
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/vet: data
That's helpful, thanks. Since you're able to reproduce it, could you try setting zfs_dmu_offset_next_sync=0 and see if it reproduces? This will effectively disable hole reporting, which will hopefully help us narrow this down.
sorry; same loops as before, pippo has 0 delay pluto has 10 seconds delay
cat /sys/module/zfs/parameters/zfs_txg_timeout
5
cat /sys/module/zfs/parameters/zfs_dmu_offset_next_sync
0
cat pippo
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/cover: data
cat pluto
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/cover: data
/var/tmp/portage/dev-lang/go-1.21.4/image/usr/lib/go/pkg/tool/linux_amd64/vet: data
I've got a Gentoo VM built that reproduces this readily building Go, so,
And Brian, that tunable won't save us here, I think, because the bug is, I believe, that it's marking the buffer as no longer dirty, so in this block:
rw_enter(&dn->dn_struct_rwlock, RW_READER);
if (dnode_is_dirty(dn)) {
	/*
	 * If the zfs_dmu_offset_next_sync module option is enabled
	 * then hole reporting has been requested. Dirty dnodes
	 * must be synced to disk to accurately report holes.
	 *
	 * Provided a RL_READER rangelock spanning 0-UINT64_MAX is
	 * held by the caller only a single restart will be required.
	 * We tolerate callers which do not hold the rangelock by
	 * returning EBUSY and not reporting holes after one restart.
	 */
	if (zfs_dmu_offset_next_sync) {
		rw_exit(&dn->dn_struct_rwlock);
		dnode_rele(dn, FTAG);
		if (restarted)
			return (SET_ERROR(EBUSY));
		txg_wait_synced(dmu_objset_pool(os), 0);
		restarted = 1;
		goto restart;
	}
	err = SET_ERROR(EBUSY);
} else {
	err = dnode_next_offset(dn, DNODE_FIND_HAVELOCK |
	    (hole ? DNODE_FIND_HOLE : 0), off, 1, 1, 0);
}
rw_exit(&dn->dn_struct_rwlock);
I believe the first check returns 0, and the tunable is academic.
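Spelling the suspected failure chain out end to end (speculative, pieced together from the comments above):

```
cp calls copy_file_range / FICLONE  -> ZFS clones the block (BRT)
clone path undirties the dnode      -> dnode_is_dirty() returns false
cp calls lseek(SEEK_DATA)           -> dnode_next_offset() walks on-disk
                                       metadata, which predates the clone
ZFS reports the region as a hole    -> cp skips copying it entirely
result: destination has zeroes where the cloned data should be,
        until the next txg sync makes the metadata catch up
```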
I think that adding a bit for "was cloned" to the object and checking that here would do the right thing, but I don't know if that's the only problem, and I'll test it.
I can't reproduce it any more easily with https://github.com/rincebrain/zfs/commit/3f9688eb36023f6f69b98ffbc30267ba24d33ad8 so have fun testing that, if you like, I make absolutely no promises it won't burn your house down and salt the earth underneath.
Nice, that helps to validate the theory. I'll try to tighten it up a little and throw it at the whole test suite over the weekend.
@thulle Having a dump of the BRT would've helped in the process.
FWIW, I just posted #15541, adding an option to zdb to dump the BRT.
System information
Describe the problem you're observing
When installing the Go compiler with Portage, many of the internal compiler commands have been corrupted by having most of the files replaced by zeros.
I'm able to reproduce on two separate machines running 6.5.11 and ZFS 2.2.0.
ZFS does not see any errors with the pool.
Describe how to reproduce the problem
1. emerge -1 dev-lang/go, where Portage's TMPDIR is on ZFS.
2. Files such as /usr/lib/go/pkg/tool/linux_amd64/compile are corrupted.

I was able to reproduce with and without Portage's "native-extensions" feature. I was unable to reproduce after changing Portage's TMPDIR to another filesystem (such as tmpfs).
Include any warning/errors/backtraces from the system logs