Support reallocating to a different alignment?

SimonSapin commented 5 years ago

The GlobalAlloc trait is stable with this method:

unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8

(And similarly the std::alloc::realloc function.)

Note new_size: usize instead of new_layout: Layout. This is because this method does not support changing the alignment.

Do we want to lift this restriction in the Alloc trait?

The two parameters of realloc would be old_size: usize, new_layout: Layout. For allocators that do not support this, this would shift the responsibility to trait impls to have a fallback based on alloc + copy_nonoverlapping + dealloc.

glandium commented 5 years ago

What are the use cases for changing the alignment? One I can think of is if you have an allocation for, say an array of u32s, that you got from somewhere, and you want to ensure they are aligned to e.g. 16 bytes to do SIMD stuff to them. But I think most algorithms dealing with things like this just deal with the unaligned parts separately, instead of reallocating.

SimonSapin commented 5 years ago

IIRC this lack of known use case might have contributed to the choice of not supporting this in GlobalAlloc.

TimDiekmann commented 5 years ago

So we stick to the current interface for now?

Alloc:

unsafe fn realloc(
    &mut self,
    ptr: NonNull<u8>,
    layout: Layout,
    new_size: usize
) -> Result<NonNull<u8>, AllocErr>

GlobalAlloc:

unsafe fn realloc(
    &self,
    ptr: *mut u8,
    layout: Layout,
    new_size: usize
) -> *mut u8

gnzlbg commented 5 years ago

@glandium If you have an allocation for storing Ts, and you are done with it, but want to reuse it, e.g., for storing Us, then you might want to reallocate to fit your Us, which might require an allocation that's bigger or smaller, and has a higher or a lower alignment.

So a general purpose reallocation function might be useful.

Some allocators do support this, for example, jemalloc rallocx takes a pointer, the old size, the new size, and a flags parameter that allows specifying the alignment required for the resulting allocation: https://github.com/jemalloc/jemalloc/blob/dev/test/integration/rallocx.c#L150

Some allocators do not support this, e.g., those using a standard C API, but I don't think that using two Layouts complicates the implementation for these significantly. Right now, they can't just always call C's realloc(ptr, size) because that does not necessarily preserve the alignment (e.g. if ptr was allocated with posix_memalign to a higher alignment), so they already need to have quite a bit of logic to check the old layout alignment, and figure out what to do. Sure, adding more logic for comparing that with the new layouts alignment adds another branch in the hot path, but they are already branchy:

https://github.com/rust-lang/rust/blob/9ebf47851a357faa4cd97f4b1dc7835f6376e639/src/libstd/sys/unix/alloc.rs#L41

https://github.com/rust-lang/rust/blob/9ebf47851a357faa4cd97f4b1dc7835f6376e639/src/libstd/sys_common/alloc.rs#L24

So its basically just another check to decide whether to go to the fallback or not.

TimDiekmann commented 5 years ago

If we decide to lift the alignment restrictions, wouldn't it make sense to also lift the requirements for grow_in_place and shrink_in_place?

adding more logic for comparing that with the new layouts alignment adds another branch in the hot path

Branches in the hot path a never good IMO. It'd be possible to add methods for it though so we end up with these signatures (all fns are unsafe and return a reasonable Result<_, _>):

fn realloc(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn realloc_aligned(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);
fn grow_in_place(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn grow_in_place_aligned(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);
fn shrink_in_place(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn shrink_in_place_aligned(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);

or

fn realloc_unaligned(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn realloc(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);
fn grow_in_place_unaligned(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn grow_in_place(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);
fn shrink_in_place_unaligned(&mut self, ptr: NonNull<u8>, old_layout: Layout, new_layout: Layout);
fn shrink_in_place(&mut self, ptr: NonNull<u8>, layout: Layout, new_size: usize);

The former has the better names, the latter is analogue to GlobalAlloc.

An example implementation for the former case:


#[inline]
fn is_aligned(ptr: NonNull<u8>, align: usize) -> bool {
    ptr.as_ptr() as usize % align == 0
}

#[inline]
unsafe fn naive_realloc<A: ?Sized + Alloc>(
    alloc: &mut A,
    ptr: NonNull<u8>,
    old_layout: Layout,
    new_layout: Layout,
) -> Result<NonNull<u8>, AllocErr> {
    let new_ptr = alloc.alloc(new_layout)?;
    ptr::copy_nonoverlapping(
        ptr.as_ptr(),
        new_ptr.as_ptr(),
        cmp::min(old_layout.size(), new_layout.size()),
    );
    alloc.dealloc(ptr, old_layout);
    Ok(new_ptr)
}

#[inline]
unsafe fn realloc_in_place_aligned<A: ?Sized + Alloc>(
    alloc: &mut A,
    ptr: NonNull<u8>,
    layout: Layout,
    new_size: usize,
) -> Result<(), CannotReallocInPlace> {
    let old_size = layout.size();

    if new_size == old_size {
        Ok(())
    } else if new_size > old_size {
        alloc.grow_in_place_aligned(ptr, layout, new_size)
    } else {
        alloc.shrink_in_place_aligned(ptr, layout, new_size)
    }
}

pub unsafe trait Alloc {
    // ...
    unsafe fn realloc(
        &mut self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_layout: Layout,
    ) -> Result<NonNull<u8>, AllocErr> {
        if is_aligned(ptr, new_layout.align())
            && realloc_in_place_aligned(self, ptr, old_layout, new_layout.size()).is_ok()
        {
            return Ok(ptr);
        }
        naive_realloc(self, ptr, old_layout, new_layout)
    }

    unsafe fn realloc_aligned(
        &mut self,
        ptr: NonNull<u8>,
        layout: Layout,
        new_size: usize,
    ) -> Result<NonNull<u8>, AllocErr> {
        if realloc_in_place_aligned(self, ptr, layout, new_size).is_ok() {
            return Ok(ptr);
        }

        naive_realloc(
            self,
            ptr,
            layout,
            Layout::from_size_align_unchecked(new_size, layout.align()),
        )
    }

    #[inline]
    unsafe fn grow_in_place(
        &mut self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_layout: Layout,
    ) -> Result<(), CannotReallocInPlace> {
        debug_assert!(new_layout.size() >= old_layout.size());
        if is_aligned(ptr, new_layout.align()) {
            self.grow_in_place_aligned(ptr, old_layout, new_layout.size())
        } else {
            Err(CannotReallocInPlace)
        }
    }

    unsafe fn grow_in_place_aligned(
        &mut self,
        ptr: NonNull<u8>,
        layout: Layout,
        new_size: usize,
    ) -> Result<(), CannotReallocInPlace> {
        // same as old `grow_in_place` 
    }

    #[inline]
    unsafe fn shrink_in_place(
        &mut self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_layout: Layout,
    ) -> Result<(), CannotReallocInPlace> {
        debug_assert!(new_layout.size() <= old_layout.size());
        if is_aligned(ptr, new_layout.align()) {
            self.shrink_in_place_aligned(ptr, old_layout, new_layout.size())
        } else {
            Err(CannotReallocInPlace)
        }
    }

    unsafe fn shrink_in_place_aligned(
        &mut self,
        ptr: NonNull<u8>,
        layout: Layout,
        new_size: usize,
    ) -> Result<(), CannotReallocInPlace> {
        // same as old `shrink_in_place` 
    }
}

gnzlbg commented 5 years ago

Branches in the hot path a never good IMO.

Looking at the code, I think it would still be one branch, but the branch condition would be slightly more complicated, e.g., it would contain an extra && old_layout.align() == new_layout.align().

I don't know what the perf cost of that would be, and without benchmarking that it might be hard to tell (both layouts could be inlined), but I can't imagine that the perf cost is worth the API cost of adding 3 trait methods to work around that, particularly without having more concrete use cases of code that actually needs to do this, and that we could benchmark to make an informed decision.

TimDiekmann commented 4 years ago

This proposal is around for quite a while now. Do you think the extra branch (or the more complicated check) is worth?

Lokathor commented 4 years ago

Realloc is ultimately a shorthand and you can live without it.

If it doesn't support an alignment change that's fine, and a particular allocator can choose to have a non-trait method to do that if it has such an ability.

TimDiekmann commented 4 years ago

I have to come back to this again. While implementing an AffixAlloc, which reserves storage before and after the allocation, I must implement grow and shrink, otherwise the suffix is not copied to the new memory. Changing the alignment here is complicated as user, as you cannot simply alloc-copy-dealloc, but have to copy the suffix, store it in a temporary variable, grow/shrink the buffer, and if the grow/shrink succeeded, copy the content back to the new suffix.

With this in mind (and an affix-allocator is rather simple) we probably should allow changing the alignment.

Amanieu commented 4 years ago

I don't understand. If you already have a suffix, doesn't that mean that the original block was already suitably aligned? Why would you need to change the alignment while growing/shrinking?

TimDiekmann commented 4 years ago

Yes, the block is properly aligned after alloc. Actually, the prefix and the suffix are aligned to the same alignment as well. For growing and shrinking for most allocators you can just use the classic realloc fallback (alloc-copy-dealloc), but in this case, this is not possible because copying will also copy the suffix, but the suffix has to be copied to the end of the new block. This issue was closed because of the assumption, that grow/shrink is just a shorthand, but that's not true for all allocators (if there is one, there will be many). I reopened this, so we can at least take this point into account before eventually closing this again.

TimDiekmann commented 4 years ago

Since this issue was opened the AllocRef trait was changed quite a lot. I propose to support reallocating to a different alignment like described in the OP (adjusted to the current AllocRef trait).

The issue was closed once because of the assumption, that realloc is ultimatively a shorthand, which was proved to not be true in all cases (an affix-allocator requires to copy the suffix). Additionally, there are allocators around (jemalloc), which have a function which is able to change the alignment (rallocx), which might be more performant than alloc-copy-dealloc. Please also see https://github.com/rust-lang/wg-allocators/issues/5#issuecomment-489330695.

There are two usecases listed in this issue:

https://github.com/rust-lang/wg-allocators/issues/5#issuecomment-489008968:

One I can think of is if you have an allocation for, say an array of u32s, that you got from somewhere, and you want to ensure they are aligned to e.g. 16 bytes to do SIMD stuff to them. But I think most algorithms dealing with things like this just deal with the unaligned parts separately, instead of reallocating.
https://github.com/rust-lang/wg-allocators/issues/5#issuecomment-489330695:

If you have an allocation for storing Ts, and you are done with it, but want to reuse it, e.g., for storing Us, then you might want to reallocate to fit your Us, which might require an allocation that's bigger or smaller, and has a higher or a lower alignment.

Especially the latter use case seems reasonable enough to support reallocating to a different alignment, as reallocation may be faster than alloc-copy-dealloc.

With this proposal and #69 combined, we wouldn't have an issue, that two layouts have to be passed to grow or shrink, which might would be confusing.

Please react to this post with 👍 or 👎 . Please note, that 👎 without any comment isn't very useful at all 😉

cc @Amanieu @glandium @gnzlbg@Lokathor @scottjmaddox @SimonSapin @Wodann

Wodann commented 4 years ago

In most cases, I expect the implementation to be something like:

unsafe fn grow(
  &mut self,
  memory: NonNull<[u8]>,
  old_align: usize,
  new_layout: Layout
) -> Result<NonNull<[u8]>, AllocErr> {
  if align == layout.align() {
    MyAlloc::realloc(memory, new_layout);
  } else {
    MyAlloc::realloc_align(memory, old_align, new_layout);
  }
}

As new_layout and old_align are probably going to be compile-time constants most of the time, I don't expect any performance impact.

If that assumption is correct, I agree that this is a more generic API that can cover the range of use cases better.

TimDiekmann commented 4 years ago

As new_layout and old_align are probably going to be compile-time constants most of the time, I don't expect any performance impact.

There will be a very small compile-time regression, but runtime-wise it should be exactly the same (it get's optimized out, similar to https://github.com/rust-lang/rust/issues/75264, the static case show the optimization very well)

unsafe fn grow(&mut self, memory: NonNull<[u8]>, old_align: usize, new_layout: Layout)

This is excactly the signature I like to come up with 👍

Edit: Regarding the compile-time regression: We already had similar compile-time regressions with #44, PR was https://github.com/rust-lang/rust/pull/70362

rust-lang / wg-allocators

Support reallocating to a different alignment? #5