Closed SimonSapin closed 4 years ago
What are the use cases for changing the alignment? One I can think of is if you have an allocation for, say an array of u32s, that you got from somewhere, and you want to ensure they are aligned to e.g. 16 bytes to do SIMD stuff to them. But I think most algorithms dealing with things like this just deal with the unaligned parts separately, instead of reallocating.
IIRC this lack of known use case might have contributed to the choice of not supporting this in GlobalAlloc
.
So we stick to the current interface for now?
Alloc
:
unsafe fn realloc(
&mut self,
ptr: NonNull<u8>,
layout: Layout,
new_size: usize
) -> Result<NonNull<u8>, AllocErr>
GlobalAlloc
:
unsafe fn realloc(
&self,
ptr: *mut u8,
layout: Layout,
new_size: usize
) -> *mut u8
@glandium If you have an allocation for storing T
s, and you are done with it, but want to reuse it, e.g., for storing U
s, then you might want to reallocate to fit your U
s, which might require an allocation that's bigger or smaller, and has a higher or a lower alignment.
So a general purpose reallocation function might be useful.
Some allocators do support this, for example, jemalloc rallocx
takes a pointer, the old size, the new size, and a flags parameter that allows specifying the alignment required for the resulting allocation: https://github.com/jemalloc/jemalloc/blob/dev/test/integration/rallocx.c#L150
Some allocators do not support this, e.g., those using a standard C API, but I don't think that using two Layouts complicates the implementation for these significantly. Right now, they can't just always call C's realloc(ptr, size)
because that does not necessarily preserve the alignment (e.g. if ptr
was allocated with posix_memalign
to a higher alignment), so they already need to have quite a bit of logic to check the old layout alignment, and figure out what to do. Sure, adding more logic for comparing that with the new layouts alignment adds another branch in the hot path, but they are already branchy:
So its basically just another check to decide whether to go to the fallback or not.
If we decide to lift the alignment restrictions, wouldn't it make sense to also lift the requirements for grow_in_place
and shrink_in_place
?
adding more logic for comparing that with the new layouts alignment adds another branch in the hot path
Branches in the hot path a never good IMO. It'd be possible to add methods for it though so we end up with these signatures (all fn
s are unsafe
and return a reasonable Result<_, _>):
fn realloc(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn realloc_aligned(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);
fn grow_in_place(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn grow_in_place_aligned(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);
fn shrink_in_place(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn shrink_in_place_aligned(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);
or
fn realloc_unaligned(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn realloc(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);
fn grow_in_place_unaligned(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn grow_in_place(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);
fn shrink_in_place_unaligned(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn shrink_in_place(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);
The former has the better names, the latter is analogue to GlobalAlloc
.
An example implementation for the former case:
#[inline]
fn is_aligned(ptr: NonNull<u8>, align: usize) -> bool {
ptr.as_ptr() as usize % align == 0
}
#[inline]
unsafe fn naive_realloc<A: ?Sized + Alloc>(
alloc: &mut A,
ptr: NonNull<u8>,
old_layout: Layout,
new_layout: Layout,
) -> Result<NonNull<u8>, AllocErr> {
let new_ptr = alloc.alloc(new_layout)?;
ptr::copy_nonoverlapping(
ptr.as_ptr(),
new_ptr.as_ptr(),
cmp::min(old_layout.size(), new_layout.size()),
);
alloc.dealloc(ptr, old_layout);
Ok(new_ptr)
}
#[inline]
unsafe fn realloc_in_place_aligned<A: ?Sized + Alloc>(
alloc: &mut A,
ptr: NonNull<u8>,
layout: Layout,
new_size: usize,
) -> Result<(), CannotReallocInPlace> {
let old_size = layout.size();
if new_size == old_size {
Ok(())
} else if new_size > old_size {
alloc.grow_in_place_aligned(ptr, layout, new_size)
} else {
alloc.shrink_in_place_aligned(ptr, layout, new_size)
}
}
pub unsafe trait Alloc {
// ...
unsafe fn realloc(
&mut self,
ptr: NonNull<u8>,
old_layout: Layout,
new_layout: Layout,
) -> Result<NonNull<u8>, AllocErr> {
if is_aligned(ptr, new_layout.align())
&& realloc_in_place_aligned(self, ptr, old_layout, new_layout.size()).is_ok()
{
return Ok(ptr);
}
naive_realloc(self, ptr, old_layout, new_layout)
}
unsafe fn realloc_aligned(
&mut self,
ptr: NonNull<u8>,
layout: Layout,
new_size: usize,
) -> Result<NonNull<u8>, AllocErr> {
if realloc_in_place_aligned(self, ptr, layout, new_size).is_ok() {
return Ok(ptr);
}
naive_realloc(
self,
ptr,
layout,
Layout::from_size_align_unchecked(new_size, layout.align()),
)
}
#[inline]
unsafe fn grow_in_place(
&mut self,
ptr: NonNull<u8>,
old_layout: Layout,
new_layout: Layout,
) -> Result<(), CannotReallocInPlace> {
debug_assert!(new_layout.size() >= old_layout.size());
if is_aligned(ptr, new_layout.align()) {
self.grow_in_place_aligned(ptr, old_layout, new_layout.size())
} else {
Err(CannotReallocInPlace)
}
}
unsafe fn grow_in_place_aligned(
&mut self,
ptr: NonNull<u8>,
layout: Layout,
new_size: usize,
) -> Result<(), CannotReallocInPlace> {
// same as old `grow_in_place`
}
#[inline]
unsafe fn shrink_in_place(
&mut self,
ptr: NonNull<u8>,
old_layout: Layout,
new_layout: Layout,
) -> Result<(), CannotReallocInPlace> {
debug_assert!(new_layout.size() <= old_layout.size());
if is_aligned(ptr, new_layout.align()) {
self.shrink_in_place_aligned(ptr, old_layout, new_layout.size())
} else {
Err(CannotReallocInPlace)
}
}
unsafe fn shrink_in_place_aligned(
&mut self,
ptr: NonNull<u8>,
layout: Layout,
new_size: usize,
) -> Result<(), CannotReallocInPlace> {
// same as old `shrink_in_place`
}
}
Branches in the hot path a never good IMO.
Looking at the code, I think it would still be one branch, but the branch condition would be slightly more complicated, e.g., it would contain an extra && old_layout.align() == new_layout.align()
.
I don't know what the perf cost of that would be, and without benchmarking that it might be hard to tell (both layouts could be inlined), but I can't imagine that the perf cost is worth the API cost of adding 3 trait methods to work around that, particularly without having more concrete use cases of code that actually needs to do this, and that we could benchmark to make an informed decision.
This proposal is around for quite a while now. Do you think the extra branch (or the more complicated check) is worth?
Realloc is ultimately a shorthand and you can live without it.
If it doesn't support an alignment change that's fine, and a particular allocator can choose to have a non-trait method to do that if it has such an ability.
I have to come back to this again. While implementing an AffixAlloc
, which reserves storage before and after the allocation, I must implement grow
and shrink
, otherwise the suffix is not copied to the new memory. Changing the alignment here is complicated as user, as you cannot simply alloc-copy-dealloc, but have to copy the suffix, store it in a temporary variable, grow/shrink the buffer, and if the grow/shrink succeeded, copy the content back to the new suffix.
With this in mind (and an affix-allocator is rather simple) we probably should allow changing the alignment.
I don't understand. If you already have a suffix, doesn't that mean that the original block was already suitably aligned? Why would you need to change the alignment while growing/shrinking?
Yes, the block is properly aligned after alloc
. Actually, the prefix and the suffix are aligned to the same alignment as well. For growing and shrinking for most allocators you can just use the classic realloc fallback (alloc-copy-dealloc), but in this case, this is not possible because copying will also copy the suffix, but the suffix has to be copied to the end of the new block. This issue was closed because of the assumption, that grow/shrink is just a shorthand, but that's not true for all allocators (if there is one, there will be many). I reopened this, so we can at least take this point into account before eventually closing this again.
Since this issue was opened the AllocRef
trait was changed quite a lot. I propose to support reallocating to a different alignment like described in the OP (adjusted to the current AllocRef
trait).
The issue was closed once because of the assumption, that realloc
is ultimatively a shorthand, which was proved to not be true in all cases (an affix-allocator requires to copy the suffix). Additionally, there are allocators around (jemalloc), which have a function which is able to change the alignment (rallocx
), which might be more performant than alloc-copy-dealloc. Please also see https://github.com/rust-lang/wg-allocators/issues/5#issuecomment-489330695.
There are two usecases listed in this issue:
https://github.com/rust-lang/wg-allocators/issues/5#issuecomment-489008968:
One I can think of is if you have an allocation for, say an array of u32s, that you got from somewhere, and you want to ensure they are aligned to e.g. 16 bytes to do SIMD stuff to them. But I think most algorithms dealing with things like this just deal with the unaligned parts separately, instead of reallocating.
https://github.com/rust-lang/wg-allocators/issues/5#issuecomment-489330695:
If you have an allocation for storing
T
s, and you are done with it, but want to reuse it, e.g., for storingU
s, then you might want to reallocate to fit yourU
s, which might require an allocation that's bigger or smaller, and has a higher or a lower alignment.
Especially the latter use case seems reasonable enough to support reallocating to a different alignment, as reallocation may be faster than alloc-copy-dealloc.
With this proposal and #69 combined, we wouldn't have an issue, that two layouts have to be passed to grow
or shrink
, which might would be confusing.
Please react to this post with 👍 or 👎 . Please note, that 👎 without any comment isn't very useful at all 😉
cc @Amanieu @glandium @gnzlbg@Lokathor @scottjmaddox @SimonSapin @Wodann
In most cases, I expect the implementation to be something like:
unsafe fn grow(
&mut self,
memory: NonNull<[u8]>,
old_align: usize,
new_layout: Layout
) -> Result<NonNull<[u8]>, AllocErr> {
if align == layout.align() {
MyAlloc::realloc(memory, new_layout);
} else {
MyAlloc::realloc_align(memory, old_align, new_layout);
}
}
As new_layout
and old_align
are probably going to be compile-time constants most of the time, I don't expect any performance impact.
If that assumption is correct, I agree that this is a more generic API that can cover the range of use cases better.
As
new_layout
andold_align
are probably going to be compile-time constants most of the time, I don't expect any performance impact.
There will be a very small compile-time regression, but runtime-wise it should be exactly the same (it get's optimized out, similar to https://github.com/rust-lang/rust/issues/75264, the static case show the optimization very well)
unsafe fn grow(&mut self, memory: NonNull<[u8]>, old_align: usize, new_layout: Layout)
This is excactly the signature I like to come up with 👍
Edit: Regarding the compile-time regression: We already had similar compile-time regressions with #44, PR was https://github.com/rust-lang/rust/pull/70362
The
GlobalAlloc
trait is stable with this method:(And similarly the
std::alloc::realloc
function.)Note
new_size: usize
instead ofnew_layout: Layout
. This is because this method does not support changing the alignment.Do we want to lift this restriction in the
Alloc
trait?The two parameters of
realloc
would beold_size: usize, new_layout: Layout
. For allocators that do not support this, this would shift the responsibility to trait impls to have a fallback based onalloc
+copy_nonoverlapping
+dealloc
.