Ban zero-sized allocations

TimDiekmann commented 5 years ago

From https://github.com/rust-lang/rust/issues/32838#issuecomment-471610560:

For me, an important constraint is that if we ban zero-sized allocations, the Alloc methods should be able to assume that the Layout passed to them is not zero-sized.

There are multiple ways to achieve this. One would be to add another Safety clause to all Alloc methods stating that if the Layout is zero-sized the behavior is undefined.

Alternatively, we could ban zero-sized Layouts, then Alloc doesn't need to say anything about zero-sized allocations since these cannot safely happen, but doing that would have some downsides.

For example, , some types like HashMap build up the Layout from multiple Layouts, and while the final Layout might not be zero-sized, the intermediate ones might be (e.g. in HashSet). So these types would need to use "something else" (e.g. a LayoutBuilder type) to build up their final Layouts, and pay for a "non-zero-sized" check (or use an _unchecked) method when converting to Layout.

Making Layout only accept non-zero size is not possible anymore as it's stable. But banning zero-size allocation would simplify the safety clause on Alloc.

This would also unlock to move helper methods to an extension trait.

This change could easily be reverted in the future if actual use cases appear. Currently I don't see any, because we cannot rely on an implementation to allow zero-sized allocations, thus we have to check size_of::<T>() == 0 in any way.

gnzlbg commented 5 years ago

If we do this, it might make sense to change the types that Alloc takes from Layout to something else, e.g., NonZeroLayout or similar, and provide checked and unchecked conversions from/to Layout for that type.

That would basically turn Layout into the "LayoutBuilder" that I was talking about in the original post.

TimDiekmann commented 4 years ago

cc https://github.com/rust-lang/rust/issues/63291

Lokathor commented 4 years ago

I don't understand why zero sized allocations are banned if the output is a result anyway.

Can't an allocator that disallows zero sized allocation just return an err if given a zero sized layout?

petertodd commented 4 years ago

How about adding a SizedLayout, and changing the API to have alloc() and alloc_sized()? The former can then have a default implementation that returns a ptr set to the alignment if size == 0.

While most allocators have no reason to deal with ZST's, a minority will want to intercept such allocations. Equally, for most code it's must easier if the allocator API can handle ZST's and treat them like any other type. So I think a default implementation that works for 99% of use-cases will save everyone some work.

petertodd commented 4 years ago

(feel free to bikeshed the name; SizedLayout isn't quite the right term here)

Amanieu commented 4 years ago

My personal preference is to always require allocators to support zero-sized allocations, which keeps things simple. Some allocators such as bumpalo don't need any special work to support this, while others can simply cast the alignment to a pointer and return that if the size is zero.

TimDiekmann commented 4 years ago

Shall we introduce a NonZeroLayout and use this in the alloc trait(s)?

Currently, zeroed allocation is pretty error prone and even experienced users may miss this: https://github.com/rust-lang/rust/issues/63291#issuecomment-538692745. On the other hand, we would restrict implementations which are fine to get zeroed layout like a bump allocator.

	ban zero-sized allocation	close
@Amanieu		:-1:
@Ericson2314
@glandium
@gnzlbg
@Lokathor		:-1:
@scottjmaddox		:confused:
@TimDiekmann		:-1:
@Wodann

Please vote with

:+1: ban zero-sized allocation
:-1: close

Wodann commented 4 years ago

Disclaimer: I am just spitballing to get some feedback.

In general, Rust's philosophy is to provide safe, zero-cost abstractions; while also allowing unsafe behavior at the cost of some additional effort from the programmer. According to that philosophy, the basic behavior of an allocator should detect and prevent mistakes that are easily overlooked by the programmer.

Zero-sized allocations seem like an instance of the aforementioned, so it might be a good idea to create a special syntax for it.

TimDiekmann commented 4 years ago

I think this would be a good new issue if this proposal gets rejected. For methods like alloc_one and alloc_array it's possible at compile time to determine the size of the layout as long as the type is Sized. One way to get the best of both worlds, it may be possible to provide a small trait, which accepts a NonZeroLayout instead, so an allocator, which accepts a zeroed layout could implement both traits. This wouldn't necessarily to be implemented in liballoc.

Another way to solve this dilemma is to pass NonZeroLayout to AllocRef and lift the conditions on another trait.

I didn't test those approaches so far, these are only random thoughts, so take it with a grain of salt.

scottjmaddox commented 4 years ago

It's nonsensical to allocate a zero sized type. The AllocRef trait should not allow something nonsensical. It should be the lowest level cross-platform API to any allocator, including those that don't account for zero sized allocations.

It might make sense to add an extension trait with some sort of logic for handling zero sized allocations, but what should it do? Return a unique pointer for each allocation? Return a unique pointer for each distinct zero-sized layout? It's application specific.

Amanieu commented 4 years ago

Allocating a zero sized type is perfectly well defined. Each distinct memory address can contain an infinite number of ZSTs, so the allocator can just return any arbitrary address if the allocation size is zero. The only constraint is that ZSTs may have an alignment requirement (e.g. zero-length array), which is why the simplest way to handle them is to cast the alignment part of the Layout to a pointer and return that. This is a two-line change which is absolutely trivial to add.

scottjmaddox commented 4 years ago

That assumes you have nothing at those memory locations that you care about, which is a bad assumption on embedded devices. And, in my opinion, part of the goal of this working group should be to support allocation on embedded devices.

Amanieu commented 4 years ago

No, it literally doesn't matter what is at those memory locations because that memory is never accessed. Dereferencing a ZST doesn't actually access any memory.

Amanieu commented 4 years ago

Also keep in mind that it's perfectly fine for multiple ZSTs to share the same memory address.

scottjmaddox commented 4 years ago

I guess I just don't understand the semantics of ZSTs in Rust. What's the best place for me to read up on them?

Amanieu commented 4 years ago

https://doc.rust-lang.org/nomicon/vec-zsts.html seems to have a pretty good coverage of the topic. There's also https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-sized-types-zsts as a more general introduction to ZSTs.

Lokathor commented 4 years ago

There is not now, nor has there ever been, a reason to make zero-sized allocations UB. GlobalAlloc::alloc is simply a very poorly designed method, that we should not emulate the design of going forward.

scottjmaddox commented 4 years ago

Okay, thank you for the links. After reading them, I agree that the standard alloc method should accept and appropriately handle zero sized types, since there's a simple, sound approach to handling such a request.

That being said, I don't think we should require implementors to handle that case. So I propose providing a default impl for alloc that calls into a new "abstract" method alloc_nonzero_unchecked (modulo the name), which assumes a non-zero size. This would provide the same optimal performance, while exposing a clear and ZST-aware API.

Lokathor commented 4 years ago

If we did a split like that would the main alloc method then be marked safe? I don't want complexity for no reason, but if it lets us give the "normal" allocation path a safe route that'd be worth it.

Otherwise, I can't imagine it's any sort of performance penalty. If the size requested is zero you cast the alignment to a pointer and off you go. I don't think it's worth complicating the API for a single if statement. This isn't like slice indexing, you're not doing it multiple times per loop on the hot path.

TimDiekmann commented 4 years ago

@scottjmaddox Do I understand correctly you want to introduce at least two methods: alloc_nonzero and alloc_nonzero_unchecked? I'd omit the unchecked variant and make alloc_nonzero accept NonZeroLayout instead. Then it's up to the user to construct the NonZeroLayout.

@Lokathor The alloc(_zeroed) methods are only unsafe because it's undefined behavior, if a zeroed layout is passed to the methods. If we either explicitly allow to pass zeroed layouts or have a methods only accepting non-zeroed layouts, the unsafe can be removed (like I did in alloc-wg). In either case the realloc and dealloc methods remain unsafe, as the pointer must point to a location allocated by this allocator.

I think we should close this proposal and make a new one for the splitting things. With that proposal we may explicitly allow to pass zero to alloc and introduce another alloc_nonzero methods. One thing I don't like about alloc_nonzero is that we should introduce something like alloc_zeroed_nonzero, so if someone would come up with a better name... :confused:

Lokathor commented 4 years ago

alloc_nonempty alloc_zeroed_nonempty

qwerty19106 commented 4 years ago

If the size requested is zero you cast the alignment to a pointer and off you go.

In bare metal not all pointers is allowed. In particular on stm32 the 0x00000000 - 0x08000000 is reserved and can cause HardFault.

The easiest way is to return a pointer to the left heap border taking into account the alignment.

Amanieu commented 4 years ago

In bare metal not all pointers is allowed. In particular on stm32 the 0x00000000 - 0x08000000 is reserved and can cause HardFault.

That is irrelevant: ZSTs have a size of 0 bytes and therefore never access memory when dereferenced. It doesn't matter what the address is, as long as it is nonzero and properly aligned.

That being said, I don't think we should require implementors to handle that case.

I disagree. This is very easy to handle, and I don't want to make the already complex AllocRef trait even more by adding more methods.

scottjmaddox commented 4 years ago

@scottjmaddox Do I understand correctly you want to introduce at least two methods: alloc_nonzero and alloc_nonzero_unchecked? I'd omit the unchecked variant and make alloc_nonzero accept NonZeroLayout instead. Then it's up to the user to construct the NonZeroLayout.

No, I think NonZeroLayout is an unnecessary complexity. I think allocators should correctly handle zero-sized types. But, since there's one proper way to do so, we should provide a default implementation of that, and only require AllocRef implementers to implement a alloc_nonzero_unchecked (modulo the name) method, which can safely assume non-zero types. This also has the benefit of providing a method that you can directly call if you're sure the type is non-zero-sized, allowing you to avoid the extra branch condition.

scottjmaddox commented 4 years ago

If we did a split like that would the main alloc method then be marked safe? I don't want complexity for no reason, but if it lets us give the "normal" allocation path a safe route that'd be worth it.

If the only reason for it being marked unsafe is that there's UB if you call it with a zero-sized-layout (which is the only reason described in the documentation), then yes it would/should be marked safe.

Otherwise, I can't imagine it's any sort of performance penalty. If the size requested is zero you cast the alignment to a pointer and off you go. I don't think it's worth complicating the API for a single if statement. This isn't like slice indexing, you're not doing it multiple times per loop on the hot path.

Checking for zero size is an extra branch condition. For some workloads, the branch predictor might render the cost negligible, but that won't always be the case. Modern processors are... complicated...

To reiterate what I said in another thread, the Alloc/AllocRef trait should be the low-level, cross-platform interface to heap allocation. Requiring unnecessary overhead is unacceptable. If there's unnecessary overhead with no mechanism for bypassing it, then people who care about performance will have to shun the Rust core/stdlib, like they do the C++ stdlib. We should strive to provide multiple levels of API granularity in order to avoid that outcome.

Edit: And to be clear: allocation is often on the hot path, even when performance-conscious developers strive to avoid it being so. So an extra branch condition can easily cause a significant performance penalty.

scottjmaddox commented 4 years ago

That being said, I don't think we should require implementors to handle that case.

I disagree. This is very easy to handle, and I don't want to make the already complex AllocRef trait even more by adding more methods.

It's easy to handle if you understand Rust's ZST semantics. We can easily make that unnecessary (and self documenting), by only requiring implementers to implement alloc_nonzero_unchecked (modulo the name).

Adding alloc_nonzero_unchecked and providing a default impl for alloc makes AllocRef less complicated to implement, and more flexible to use. In my opinion, the complexity added by introducing another method is completely overwhelmed by those advantages.

TimDiekmann commented 4 years ago

Adding alloc_nonzero_unchecked and providing a default impl for alloc makes AllocRef less complicated to implement, and more flexible to use.

IMO this is a pretty important point. Implementors shouldn't have to implement more than two or three methods at most (e.g. alloc, dealloc, and maybe realloc, but this should provide a default implementation as well).

TimDiekmann commented 4 years ago

As of the review above https://github.com/rust-lang/wg-allocators/issues/16#issuecomment-580337610 I close this proposal. Banning zero sized allocation is too restrictive for the rust std library (core/liballoc/std). I'll leave the issue open anyway, until we have a new proposal for adding methods handling non-zero allocations. There we can decide, if and how we want to add more methods to AllocRef.

Wodann commented 4 years ago

Even though the topic has already been closed, I just wanted to say that I fully agree with @scottjmaddox. Users of the API should have the flexibility to use it for any use case, be it performance (i.e. no branching) or ease of use (i.e. single entry-point). Having a second trait function with a default implementation allows for both.

Also, maybe alloc_sized might be a good name?

TimDiekmann commented 4 years ago

I like alloc_sized more than alloc_zeroed or alloc_nonempty. Would you use NonZeroLayout for this? @Wodann

TimDiekmann commented 4 years ago

How would a default implementation behave, if the layout has a zeroed size?

// Safety: alloc requires non-zero layout
unsafe fn alloc(Layout) -> Result<NonNull<u8>, _> {
    if layout.size() == 0 {
        Err(_)
    } else {
        Self::alloc_sized(NonZeroLayout::from_layout_unchecked(layout))
    }
}

fn alloc(Layout) -> Result<NonNull<u8>, _> {
    if layout.size() == 0 {
        Ok(NonNull::dangling())
    } else {
        Self::alloc_sized(NonZeroLayout::from_layout_unchecked(layout))
    }
}

Lokathor commented 4 years ago

an allocator should have sentinel values for ZST inputs, and usually those sentinel values should be the alignment requested.

Of course if you potentially allocate out of the first 16 bytes or whatever you'd have to pick some other sentinels.

Edit: or just give up and Err, but usually you should try not to Err if you don't have to, obviously.

petertodd commented 4 years ago

Note how a default alloc impl for potentially zero sizes allocations needs a corresponding default impl in dealloc that does nothing.

On February 1, 2020 6:27:44 PM EST, Lokathor notifications@github.com wrote:

an allocator should have sentinel values for ZST inputs, and usually those sentinel values should be the alignment requested.

Of course if you potentially allocate out of the first 16 bytes or whatever you'd have to pick some other sentinels.

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/rust-lang/wg-allocators/issues/16#issuecomment-581080429

TimDiekmann commented 4 years ago

Note how a default alloc impl for potentially zero sizes allocations needs a corresponding default impl in dealloc that does nothing.

The trait is unsafe anyway so an implementor have to follow the safety guidelines on the traits in either case.

TimDiekmann commented 4 years ago

an allocator should have sentinel values for ZST inputs, and usually those sentinel values should be the alignment requested.

I think a sentinal value may be used as const value, as different allocators may have different sentinals. This sentinal value have to be checked in dealloc, if alloc isn't implemented. IMO this is fine.

scottjmaddox commented 4 years ago

Hmm... So supporting zero sized allocation requires an extra branch condition in alloc and an arbitrary number of extra branches (or maybe a jump table?) in dealloc to check for sentinel values... Sounds kind-of expensive. Maybe supporting zero-sized allocation isn't such a great idea...

petertodd commented 4 years ago

You don't need a sentinel. Just mark dealloc unsafe and do nothing if the size is zero. If the outer alloc is marked inline that'll be optimized out in most cases.

On February 1, 2020 6:39:35 PM EST, Scott J Maddox notifications@github.com wrote:

Hmm... So supporting zero sized allocation requires an extra branch condition in alloc and an arbitrary number of extra branches (or maybe a jump table?) in dealloc to check for sentinel values... Sounds kind-of expensive. Maybe supporting zero-sized allocation isn't such a great idea...

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/rust-lang/wg-allocators/issues/16#issuecomment-581081111

petertodd commented 4 years ago

(and actually, or course it's unsafe already)

On February 1, 2020 6:39:35 PM EST, Scott J Maddox notifications@github.com wrote:

Hmm... So supporting zero sized allocation requires an extra branch condition in alloc and an arbitrary number of extra branches (or maybe a jump table?) in dealloc to check for sentinel values... Sounds kind-of expensive. Maybe supporting zero-sized allocation isn't such a great idea...

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/rust-lang/wg-allocators/issues/16#issuecomment-581081111

scottjmaddox commented 4 years ago

Oh, right, I forgot the dealloc takes a Layout, too. Okay, so just an extra branch in alloc and in dealloc. Still not free, but as long as there's a way to bypass the branches, it should be fine.

Wodann commented 4 years ago

How would a default implementation behave, if the layout has a zeroed size?

The former. As dangling is generic over type T it becomes (source from bumpalo):

    #[inline(always)]
    unsafe fn alloc_zero_sized_type(&self, align: usize) -> NonNull<u8> {
        // We want to use NonNull::dangling here, but that function uses mem::align_of::<T>
        // internally. For our use-case we cannot call dangling::<T>, since we are not generic
        // over T; we only have access to the Layout of T. Instead we re-implement the
        // functionality here.
        //
        // See https://github.com/rust-lang/rust/blob/9966af3/src/libcore/ptr/non_null.rs#L70
        // for the reference implementation.
        let ptr = align as *mut u8;
        NonNull::new_unchecked(ptr)
    }

Note how a default alloc impl for potentially zero sizes allocations needs a corresponding default impl in dealloc that does nothing.

I forgot about dealloc needing to handle this case too (and by extension realloc probably too)

At this point you basically have a set of safe Sized trait functions and a set of non-Sized trait functions.

Lokathor commented 4 years ago

How often could you possibly be allocating that you need to worry about a single if.

Surely the rest of the allocator isn't branchless anyway.

Wodann commented 4 years ago

The way I see it alloc_zero_sized_type and alloc_sized are orthogonal in behaviour. They are completely separate cases of allocation that have no overlap. The choice to provide a shared alloc function is merely to provide a "usability" wrapper that allows people to specify a single entry-point for all of their allocation needs - that comes at the expense of performance.

As such I'd argue for something along the lines of:

trait AllocRef {
    fn alloc_sized(self, layout: NonZeroLayout) -> NonNull<u8>;

    /// Has default implementation
    #[inline(always)]
    unsafe fn alloc_zero_sized_type(self, align: usize) -> NonNull<u8> {
        // We want to use NonNull::dangling here, but that function uses mem::align_of::<T>
        // internally. For our use-case we cannot call dangling::<T>, since we are not generic
        // over T; we only have access to the Layout of T. Instead we re-implement the
        // functionality here.
        //
        // See https://github.com/rust-lang/rust/blob/9966af3/src/libcore/ptr/non_null.rs#L70
        // for the reference implementation.
        let ptr = align as *mut u8;
        NonNull::new_unchecked(ptr)
    }

    unsafe fn alloc(self, layout: Layout) -> NonNull<u8> {
        if layout.size() == 0 {
            self.alloc_zero_sized_type(layout.align())
        } else {
            self.alloc_sized(NonZeroLayout::from_layout_unchecked(layout))
        }
    }

Devs can choose the performant route; implementing and calling alloc_sized andalloc_zero_sized_type directly, or using alloc if they don't care about bare metal performance.

Wodann commented 4 years ago

Surely the rest of the allocator isn't branchless anyway.

Optimization in a hot code path (such as an allocator) means everything. Branching is a big part of that. E.g. https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html

scottjmaddox commented 4 years ago

How often could you possibly be allocating that you need to worry about a single if.

Surely the rest of the allocator isn't branchless anyway.

Bump allocators only need a few (very fast) integer ops and a branch to check if there's enough memory available. An extra branch would be significant.

TimDiekmann commented 4 years ago

@Wodann I think your approach is matured enough to be summed up as a proposal. Do you want to make one and file a new issue? While I don't like to have too many functions in AllocRef, those absolutely makes sense to me.

JelteF commented 4 years ago

Surely the rest of the allocator isn't branchless anyway.

Optimization in a hot code path (such as an allocator) means everything. Branching is a big part of that. E.g. https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html

Not denying this, but I would assume this branch is optimized away by the compiler most of the time, since often constant sizes are allocated. So constant folding should remove the check, assuming the check is inlined.

Especially in @Wodann case. If you put #[inline] on that alloc method, then performance should be the same most of the time.

Ixrec commented 4 years ago

Bump allocators only need a few (very fast) integer ops and a branch to check if there's enough memory available. An extra branch would be significant.

Would a bump allocator need to explicitly check for zero to support zero-sized allocations? IIUC its alloc() would be little more than ptr += size; and if (ptr >= (start + capacity)) { ... } which seems like it'd do the right thing even if size is 0.

(I couldn't personally come up with an allocator usage where the performance of a single branch in alloc() matters and supporting ZSTs would add an additional branch, only one or the other, which is why I buy the arguments others in this thread have been making for requiring ZST support to keep the Alloc API simple)

Wodann commented 4 years ago

Would a bump allocator need to explicitly check for zero to support zero-sized allocations?

The returned pointer needs to have the proper alignment, which could require the ptr value to be offset. As that is the case, you still need to perform the branch statement.

edit: My response didn't make it clear that "branch statement" refers to the bounds check. Indeed for the bump allocator a if layout.size() == 0 would not be necessary, because the normal logic applies for zero-sized types too.

[..] I would assume this branch is optimized away by the compiler most of the time, since often constant sizes are allocated.

For any compile-time constants, that's true but not for runtime-specified layouts. Whether that is a common, I can't tell.

Do you want to make one and file a new issue

Yes, I can do that. What would be favorable, us adding the additional functions guaranteeing a zero-cost API for both compile-time constants and runtime variables? Or making the API simpler?

Lokathor commented 4 years ago

But checking for alignment and needing to offset by some amount from the current bump location is done regardless of the allocation size, so i don't think that counts.

EDIT: also most Zero-sized types are alignment 1.

JelteF commented 4 years ago

For any compile-time constants, that's true but not for runtime-specified layouts. Whether that is a common, I can't tell.

I can see it happening mostly for datastructures that want to grow their buffer. I'm not sure but in those cases the compiler might still be able to assert that it's non-zero in most cases.

Do you want to make one and file a new issue

Yes, I can do that. What would be favorable, us adding the additional functions guaranteeing a zero-cost API for both compile-time constants and runtime variables? Or making the API simpler?

I think @TimDiekmann meant your 3 method approach, where only alloc_sized doesn't have a default implementation. My suggestions/questions for those methods:

Add #[inline(always)] to alloc
What's the reason you marked alloc and alloc_zero_sized_type as unsafe? Only because alloc_zero_sized_type doesn't work when passing 0? If so you can use NonZeroUsize

rust-lang / wg-allocators

Ban zero-sized allocations #16