The key insight from out of #16631 is that blocks data must not cross physical page boundaries. So, this updates the alignment check function, then changes the fallback allocation method to ensure page alignment.
I'm not very happy with abd_alloc_scatter(). It feels like a big call to just create it and assert that its always page-aligned. But it's probaby right, and the ABD interface as a whole is kinda of a mess these days, and there really isn't an easier allocator to reach for that will do page alignment (even using alloc_pages() needs a bunch of ceremony that abd_alloc_chunks() already does. So it doesn't seem too bad.
Update: @amotin educated me on why the existing abd_alloc_for_io() was actually what I wanted. So this has been changed for that, and abd_alloc_scatter() no longer included.
I did notice through this that we don't honour the "minimum IO" device limit at all. Most things don't care, but as usual, dm-crypt does, because the block size is part of the crypto. So creating an ashift=9 pool over a 4K dm-crypt device just throws IO errors. I think it always did though (I can't see how it couldn't have), so I'm not going to worry too much for now. It does show that we need to work to minimum limits, which will mean eg not doing <4K accesses, but that seems like it's gonna be work all over the stack (labels, gang headers, etc)... some other time!
Update: Turns out this was a long-standing bug with ZFS, in that it should never be possible to use an ashift smaller than LBS. So now we do check for this case, and #16690 will make it impossible to get into that situation.
How Has This Been Tested?
The test case in #16631 doesn't trip the bug anymore. If I set zfs_vdev_disk_classic=1 it blows up pretty quick. My test cases from #15588 (forcing ganging) continue to work correctly.
Full ZTS run in progress, but it's all looking pretty good so far.
Update: ZTS ran to completion.
Update: I've now run the #16631 test case with various combinations of --sector-size= to cryptsetup luksCreate and -o ashift= to zpool create (always with ashift larger than sector size). Nothing tripped.
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Performance enhancement (non-breaking change which improves efficiency)
[ ] Code cleanup (non-breaking change which makes code smaller or more readable)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
[ ] Documentation (a change to man pages or other documentation)
Last push removed the TODO comment, added a check to ensure that the data len within the page is a multiple of LBS, and adjusted the page_alignment test to match.
[Sponsors: Klara, Inc., Wasabi Technology, Inc.]
Motivation and Context
Ongoing IO alignment issues on Linux with dm-crypt.
Closes: #16631, #15646, #15533, #14533. Maybe also: #10094
Description
The key insight from out of #16631 is that blocks data must not cross physical page boundaries. So, this updates the alignment check function, then changes the fallback allocation method to ensure page alignment.
Update: @amotin educated me on why the existing
abd_alloc_for_io()
was actually what I wanted. So this has been changed for that, andabd_alloc_scatter()
no longer included.Update: Turns out this was a long-standing bug with ZFS, in that it should never be possible to use an ashift smaller than LBS. So now we do check for this case, and #16690 will make it impossible to get into that situation.
How Has This Been Tested?
The test case in #16631 doesn't trip the bug anymore. If I set
zfs_vdev_disk_classic=1
it blows up pretty quick. My test cases from #15588 (forcing ganging) continue to work correctly.Update: ZTS ran to completion.
Update: I've now run the #16631 test case with various combinations of
--sector-size=
tocryptsetup luksCreate
and-o ashift=
tozpool create
(always with ashift larger than sector size). Nothing tripped.Types of changes
Checklist:
Signed-off-by
.