You're wrong.
See #16025 and #15615 for some more context.
@rincebrain Hmm - do you have insights on how those PRs are related to BRT? To my knowledge, both of those are trying to improve `llseek` rather than fix real bclone bugs?
Unless I've misplaced the thread entirely, they still fall under the heading of "yes, it's not a problem with BRT, but triggering the problem in question with BRT is substantially easier", no?
Yes, the problem is not technically BRT's code, but I still wouldn't turn it on until that's moot.
(I could have also pointed to #15139, those were just what came to mind first.)
For #15526, as I recall, coreutils-9.2 was the main factor, give or take? That changed copying behavior so that races between `llseek(SEEK_HOLE/SEEK_DATA)` and sync context triggered more often. bclone/BRT was a red herring, I believe. To that end, both #16025 and #15615 should be unrelated to BRT and out of scope for enabling bclone again. (But as an aside: `llseek` still has possible data races that need fixing, even on x86.)
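For readers unfamiliar with the pattern in question: coreutils 9.2 switched `cp` to a `SEEK_DATA`/`SEEK_HOLE` loop to copy sparse files efficiently. A minimal C sketch of that loop shape (illustrative only; the function name and structure are mine, not coreutils' actual code):

```c
/* Sketch of the SEEK_DATA/SEEK_HOLE sparse-copy pattern.  The race
 * described above: if dirty, not-yet-synced data is misreported as a
 * hole at the moment SEEK_DATA runs, the copier silently skips it. */
#define _GNU_SOURCE
#include <unistd.h>

static void
copy_sparse(int src_fd, int dst_fd, off_t src_size)
{
	(void) dst_fd;	/* the actual copy of each extent is elided */
	off_t off = 0;
	while (off < src_size) {
		off_t data = lseek(src_fd, off, SEEK_DATA); /* next data run */
		if (data < 0)
			break;			/* ENXIO: only holes remain */
		off_t hole = lseek(src_fd, data, SEEK_HOLE); /* end of run */
		/* ...read bytes [data, hole) from src_fd and write them to
		 * dst_fd at the same offset, so the hole stays a hole... */
		off = hole;
	}
}
```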
As for collecting bclone blockers proper, I'll nominate both #16012 and #16014. There should probably be a GitHub Project created to collect and track the blocking open FRs/PRs/Issues.
As to the stability criteria, that's a question to pose to the leadership group. I'd suggest the project needs to set quantifiable success metrics in conversation with the authors and contributors, such as:
IIRC, that one was primarily a problem with `zn_has_cached_data` being racy, which bites you in both the `llseek` and non-`llseek` cases, since `zpl_clone_range` depends on the same check as `zfs_holey_common` (the backing function for the `SEEK_HOLE`/`SEEK_DATA` checks).
Hey @robn @thesamesam I imagine you both have opinions on this.
Thanks both for the info; agreed on the suggestion of a GitHub Project to track the ongoing blockers... I did not realize there were so many (potentially) blocking issues, and I will stop turning it back on on my pools 😓
That kind of stabilization criteria sounds good too. For point 3, I'd hope this can be timeboxed appropriately and have explicit buy-in from companies/whoever, so that it doesn't stretch out needlessly due to flaky testing, deprioritized work, etc.
I may have an idea for how I could do some stress testing: a few months ago I implemented a simple Rust tool for block-based deduplication using FIDEDUPERANGE. It is still more or less experimental and needs a lot of polishing, but it works for me and it is pretty fast. I mostly use it to deduplicate VM images on a zvol formatted with xfs, so there are many blocks being deduplicated. My hope was to continue working on it once ZFS block cloning is stable and get rid of the extra xfs layer.

I'd also like to add cross-dataset deduplication and (if supported by ZFS) deduplication between datasets and zvols. Or maybe even deduplicating between already-existing snapshots? For that I cannot use FIDEDUPERANGE, so what else do I use? Maybe someone can give me a hint where to start. Would I need to modify and compile ZFS myself, or is there some kind of interface that ZFS provides that I could use with my "external" tool? I'd prefer an interface similar to FIDEDUPERANGE that double-checks that the data are really identical, so that I cannot do anything stupid or get it wrong because of a race condition. Should I maybe open a separate issue for this? Thanks!
@MichaelDietzel #11065 is the existing FR for FIDEDUPERANGE, and #15393 has a prototype implementation. I suspect that once finished, it would be a small lift to wire up an alternate non-kernel entry point to support different kinds of targets. We need the functionality first though, so those tickets are probably where to focus your efforts initially.
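For reference, on filesystems that already support it (xfs, btrfs), FIDEDUPERANGE is an ordinary `ioctl(2)` on the source fd, and the kernel compares the byte ranges before sharing any blocks, which is exactly the "double-check" property asked for above. A minimal C sketch of a single-range dedupe (the file arguments and the 1 MiB length are arbitrary illustration):

```c
#include <fcntl.h>
#include <linux/fs.h>	/* FIDEDUPERANGE, struct file_dedupe_range */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return (1);
	}
	int src = open(argv[1], O_RDONLY);
	int dst = open(argv[2], O_RDWR);	/* dest must be writable */
	if (src < 0 || dst < 0) {
		perror("open");
		return (1);
	}

	/* One source range, one destination range. */
	struct file_dedupe_range *req =
	    calloc(1, sizeof (*req) + sizeof (struct file_dedupe_range_info));
	if (req == NULL)
		return (1);
	req->src_offset = 0;
	req->src_length = 1 << 20;	/* first 1 MiB; most filesystems want
					 * block-aligned lengths except at EOF */
	req->dest_count = 1;
	req->info[0].dest_fd = dst;
	req->info[0].dest_offset = 0;

	/* The kernel verifies the ranges hold identical bytes before it
	 * shares blocks; a mismatch yields FILE_DEDUPE_RANGE_DIFFERS. */
	if (ioctl(src, FIDEDUPERANGE, req) < 0) {
		perror("FIDEDUPERANGE");
		return (1);
	}
	if (req->info[0].status == FILE_DEDUPE_RANGE_SAME)
		printf("deduped %llu bytes\n",
		    (unsigned long long)req->info[0].bytes_deduped);
	else if (req->info[0].status == FILE_DEDUPE_RANGE_DIFFERS)
		printf("ranges differ; nothing shared\n");
	else
		printf("status %d\n", req->info[0].status);
	return (0);
}
```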
Since I was asked directly: I personally consider block cloning good/safe enough to be enabled since 2.2.4. A lot of work has been done to stabilise it since 2.2.2 especially; I am not aware of any substantial outstanding issues directly involving block cloning; I have it enabled on the few machines I maintain; I have suggested to certain sysadmins that know and trust me that it's safe-enough to enable; and, were I still in charge of a fleet of 100+ machines (previous life), I would be enabling it there.
However, I recognise that I'm in a position that many others are not, in that I run my systems with a ton of redundancy and I have the skills to understand and fix problems when they come up, so I can afford to be somewhat blasé about it. That said, I've not had to roll up my sleeves in this way on anything related to block cloning.
So I stop short of a full-throated recommendation, but if asked I do say "it's probably fine". If there was a rough show-of-hands consensus vote on whether or not to enable it for 2.2.5 and I was at the table, I would say yes.
(I have no idea if such a vote is useful or wanted, or if I would be invited; I'm just trying to get across where my gut feel is).
I agree that more specific process for deciding on stability generally would be great, and I'll support anyone who wants to push on getting buy-in and implementing it, but I won't be taking point on that kind of project this year at least - too many other plates in the air at the moment. In the absence of such a thing, I suspect something close to "rough show-of-hands consensus vote" is about the best we can hope for at the moment.
While not a bugfix, we may want to wait for https://github.com/openzfs/zfs/pull/16208 before re-enabling block cloning.
> There should probably be a GitHub Project created to collect and track the blocking open FRs/PRs/Issues.
Apparently GitHub doesn't want to create new projects today so I've created a BRT label instead and applied it.
For reference, to turn on block cloning on Linux for all pools with feature@block_cloning enabled, the "tunable" is a zfs kernel module parameter: /sys/module/zfs/parameters/zfs_bclone_enabled
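For example (assuming root; the modprobe.d location is the usual convention, adjust for your distro):

```sh
# Enable at runtime (does not persist across module reload/reboot):
echo 1 > /sys/module/zfs/parameters/zfs_bclone_enabled

# Check the current value:
cat /sys/module/zfs/parameters/zfs_bclone_enabled

# To persist it, set a module option instead, e.g. in /etc/modprobe.d/zfs.conf:
#   options zfs zfs_bclone_enabled=1
```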
To clarify, are all of the fixes that make 91bd12dfeb9abe432530615eaf2295d76da24b54 true already present in an existing release?
(Apologies if this is covered somewhere else, I didn't see any issues or discussions related to it)
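One way to check, from a clone of the openzfs/zfs repository, is to ask git which release tags contain a given commit:

```sh
# List tags whose history contains a commit, here the one referenced
# above (repeat for each fix commit of interest):
git tag --contains 91bd12dfeb9abe432530615eaf2295d76da24b54
```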
**Describe the feature you would like to see added to OpenZFS**
Turn on or remove `zfs_bclone_enabled`, given that most of the large issues have seemingly been resolved.

**How will this feature improve OpenZFS?**
It will enable block cloning for everyone, of course! :)
**Additional context**
So, my understanding is that `zfs_bclone_enabled` was added as a stop-gap measure to guard against the corruption related to block cloning. What's the criteria by which ZFS can either enable this by default or remove it? Following the many recent PRs related to block cloning, it seems that all(?) of the outstanding corruption issues have been resolved (please correct me if this is wrong).
Given the severity of the issue at the time, I can understand the hesitance to re-enable it by default, so I'm mostly wondering what the threshold is. Does it need "bake time"? Does it need higher-profile customers using it? Or is it actually ready to be re-enabled now?