rust-lang / wg-allocators

Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!
http://bit.ly/hello-wg-allocators

Add `Vec::with_capacity_zeroed` #32

Closed TimDiekmann closed 4 years ago

TimDiekmann commented 4 years ago

RawVec also provides a _zeroed variant. This may also be added to Vec ~(as unsafe fn)~.

Link to Zulip discussion

gnzlbg commented 4 years ago

I don't see how this would be a good idea: if this matters, you probably want a different collection type anyway, or you need to be very careful with your code and use Vec::{push_zeroed, resize_zeroed, extend_zeroed, insert_zeroed, ...}, because any operation that might end up reallocating the vector needs to know that new memory must be zeroed.

I kind of expect that this is something that a custom allocator can solve:

struct AlwaysZeroed<A: AllocRef>(A);
impl<A: AllocRef> AllocRef for AlwaysZeroed<A> {
    // forward to the wrapped allocator's zeroing methods
    fn alloc(self, ...) -> ... { self.0.alloc_zeroed(...) }
    fn realloc(self, ...) -> ... { self.0.realloc_zeroed(...) } // or manually zero mem here
    ...
}

// and then you just do:
type ZeroedVec<T, A = alloc::System> = Vec<T, AlwaysZeroed<A>>;

Does that make sense?
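
The wrapper idea above can be tried on stable Rust today. The following is a minimal sketch, assuming the stable `GlobalAlloc` trait as a stand-in for `AllocRef` (which never stabilized under that name); `zeroed_roundtrip` is a hypothetical helper that exists only to exercise the wrapper. Plugging such a wrapper into `Vec<T, A>` would still need the unstable allocator API, so this only shows the forwarding itself.

```rust
use std::alloc::{handle_alloc_error, GlobalAlloc, Layout, System};

/// Hypothetical wrapper: every allocation made through it is zero-filled.
pub struct AlwaysZeroed<A: GlobalAlloc>(pub A);

unsafe impl<A: GlobalAlloc> GlobalAlloc for AlwaysZeroed<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Forward every plain allocation to the zeroing entry point.
        unsafe { self.0.alloc_zeroed(layout) }
    }

    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        unsafe { self.0.alloc_zeroed(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.0.dealloc(ptr, layout) }
    }

    // `realloc` keeps its default implementation, which allocates the new
    // block through `alloc` above, so the grown tail is zeroed as well.
}

/// Hypothetical demo: allocate `len` bytes, grow to `2 * len`, and report
/// whether every byte observed was zero.
pub fn zeroed_roundtrip(len: usize) -> bool {
    assert!(len > 0, "zero-sized layouts must not reach the allocator");
    let a = AlwaysZeroed(System);
    let layout = Layout::array::<u8>(len).unwrap();
    let grown_layout = Layout::array::<u8>(len * 2).unwrap();
    unsafe {
        let ptr = a.alloc(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        let first_ok = std::slice::from_raw_parts(ptr, len).iter().all(|&b| b == 0);

        let grown = a.realloc(ptr, layout, len * 2);
        if grown.is_null() {
            handle_alloc_error(grown_layout);
        }
        let grown_ok = std::slice::from_raw_parts(grown, len * 2)
            .iter()
            .all(|&b| b == 0);

        a.dealloc(grown, grown_layout);
        first_ok && grown_ok
    }
}
```

The cost discussed later in the thread is visible here: every reallocation pays for zeroing, whether or not the caller needs it.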

Lokathor commented 4 years ago

What would be more broadly helpful is docs guidance (similar to Box) on how a user can correctly call an allocator (zeroed or not) and then make that allocation into a Vec that will continue to work properly with that allocator.

TimDiekmann commented 4 years ago

@gnzlbg While this would solve the initialization, ZeroedVec would always use the zero-allocator, even when pushing elements to it, unless we introduce an API, which converts the underlying allocator to another one (please no).

Allocating a zeroed buffer does not affect anything in the Vec API, since it's not possible to access the underlying buffer with non-unsafe methods. You can still use push, resize, extend as before. But you can safely use set_len and get a well-defined vector.
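
As a rough illustration of what is being proposed, here is a sketch of the behaviour as a free function on stable Rust. `with_capacity_zeroed` and `zeroed_u32s` are hypothetical names for this example, not existing `Vec` methods, and the `set_len` call is only sound for types where all-zero bytes are a valid value.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

/// Hypothetical free-function stand-in for `Vec::with_capacity_zeroed`:
/// returns an empty Vec whose spare capacity is zero-filled.
pub fn with_capacity_zeroed<T>(capacity: usize) -> Vec<T> {
    let layout = Layout::array::<T>(capacity).expect("capacity overflow");
    if layout.size() == 0 {
        // Zero-sized requests must not reach the allocator.
        return Vec::new();
    }
    unsafe {
        let ptr = alloc_zeroed(layout) as *mut T;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Length 0: the zeroed bytes are only spare capacity for now.
        Vec::from_raw_parts(ptr, 0, capacity)
    }
}

/// Hypothetical usage: claim the zeroed capacity as `n` initialized `u32`s.
pub fn zeroed_u32s(n: usize) -> Vec<u32> {
    let mut v = with_capacity_zeroed::<u32>(n);
    // SAFETY: `n` does not exceed the capacity, and all-zero bytes are a
    // valid `u32`.
    unsafe { v.set_len(n) };
    v
}
```

After this, `push`, `resize`, and `extend` behave as on any other `Vec`; only the first allocation was zeroed.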

gnzlbg commented 4 years ago

Maybe this should start with explaining which problem it wants to solve.

The only problem I can imagine that this solves is having the memory owned by the Vec always be zero-initialized, as opposed to being potentially uninitialized, so that you can pass the "tail" of the vector as a buffer to some API (e.g. bytes), and then just use Vec::set_len to resize or similar.

Apparently, the problem that you are interested in solving is just zeroing the memory once, when the vector is initialized, and if a Vec::push resizes the vector, then it is fine to re-allocate without zero-initializing the memory. Why is this abstraction useful? Which applications require the initial allocation to be zero-initialized, but are fine with future allocations being uninitialized?

gnzlbg commented 4 years ago

@Lokathor

What would be more broadly helpful is docs guidance (similar to Box) on how a user can correctly call an allocator (zeroed or not) and then make that allocation into a Vec that will continue to work properly with that allocator.

You can just use Box<[T], A> to allocate a boxed slice (or A::alloc, etc. directly), and then use Vec::from_raw_parts to go from that boxed slice into a Vec<T, A> (with custom allocators, Vec::from_raw_parts needs to accept an allocator as an additional input).
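
For the default global allocator, the boxed-slice route can be sketched as follows. The function names are hypothetical, the element type is fixed to `u64` for brevity, and the custom-allocator variant would need the unstable allocator API, so it is not shown.

```rust
/// Hypothetical helper: a zeroed, fixed-size buffer as a boxed slice.
pub fn zeroed_boxed(n: usize) -> Box<[u64]> {
    vec![0u64; n].into_boxed_slice()
}

/// Safe conversion: the standard library already provides it.
pub fn boxed_into_vec(b: Box<[u64]>) -> Vec<u64> {
    b.into_vec()
}

/// The same conversion spelled out with `Vec::from_raw_parts`.
pub fn boxed_into_vec_raw(b: Box<[u64]>) -> Vec<u64> {
    let len = b.len();
    // Give up the Box's ownership and keep only the thin element pointer.
    let ptr = Box::into_raw(b) as *mut u64;
    // SAFETY: the buffer came from the global allocator with exactly `len`
    // initialized elements, so length and capacity are both `len`.
    unsafe { Vec::from_raw_parts(ptr, len, len) }
}
```

Either way the resulting `Vec` frees and grows its buffer through the same allocator that produced the boxed slice.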

Lokathor commented 4 years ago

Often you just want a zeroed buffer on the heap and after that first size selection (which might be runtime dynamic) you don't plan to resize it, and even if you do you'd resize it with the normal API for pushing and popping.

I've wanted this Vec method like a hundred times.

TimDiekmann commented 4 years ago

An API (like Vulkan) that has well-defined states for a zeroed struct may ask for an array of those structs. As @Lokathor pointed out, you could use alloc_zeroed and from_raw_parts, but with_capacity_zeroed would feel much more natural.

You can just use Box<[T], A> to allocate a boxed-slice

How is this related to zeroed memory?

I've wanted this Vec method like a hundred times.

Me too

gnzlbg commented 4 years ago

Often you just want a zeroed buffer on the heap and after that first size selection (which might be runtime dynamic) you don't plan to resize it,

How is this related to zeroed memory?

We already have a type in the standard library that represents a dynamically-sized fixed-size array, and that's a boxed slice (Box<[T]>). For that type, since it cannot be resized, it might make sense for the initial allocation to have to be zeroed for "reasons", and there are multiple ways to achieve that. This appears to me to be what you want, but since you have not described the problems you are solving, it's hard to tell.

Apparently, you are claiming that your problem requires: (1) a dynamically-resizable array (being able to grow the allocation), whose first allocation must be zeroed, while subsequent allocations must not be zeroed but left uninitialized instead.

This would mean that Box<[T]> doesn't solve your problems. Can any of you actually describe the problems that you have encountered with these constraints? I am quite good at stretching my imagination and can't truly think of any. Sounds like the perfect opportunity for me to learn something new.

Lokathor commented 4 years ago

Sure, the logic goes like this:

Ixrec commented 4 years ago

But why? What is the use case that leads to those requirements?

gnzlbg commented 4 years ago

I want the whole thing to start zeroed but after that I don't care because after the initial zeroing I will use the normal and safe Vec API.

If after the initial zeroing you are only using the safe Vec API, why do you care how the memory is allocated? Or are you doing a Vec::set_len after the allocation to just change the initial length using the assumption that memory is zeroed?

It's really as simple as "after creation I don't care what the other spare backing memory is", it's not that later allocations must be zeroed or must be uninit, there's no "must" at all.

So what are the constraints after that? If you don't care about whether the memory is zeroed afterwards or not, why do you care in the initial allocation?


Right now, you can do:

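let size: usize;
let layout = Layout::array::<T>(size).unwrap();
let vec = unsafe {
    // from_raw_parts is unsafe; alloc_zeroed returns *mut u8, so cast to *mut T
    Vec::<T>::from_raw_parts(
        alloc::System.alloc_zeroed(layout) as *mut T,
        size,
        size,
    )
};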

With allocators it would be:

let size: usize;
let a: A;
let vec = Vec::<T, A>::from_raw_parts_alloc(
    A::alloc_zeroed(Layout::array::<T>(size).unwrap()),
    size,
    size,
    a
);

It isn't nice, but it isn't horrible either. It's hard to tell whether adding a method to make this easier would be worth it without knowing which problem it really solves.

Lokathor commented 4 years ago

You care about the initial allocation because the allocator can give you zeroed memory if you ask for it, except the Vec API doesn't let you ask for it.

Because most people don't use the allocator API, they just use the Vec type.

So we put something on the Vec type itself to help them out.

gnzlbg commented 4 years ago

If you don't care whether the subsequent memory is zeroed or not, and it is ok for it to be zeroed, the initial solution I showed here solves your problem.

Yet given the subsequent comments, you appear to care about subsequent memory not being zeroed for some reason.

Lokathor commented 4 years ago

Why would I care if subsequent memory allocations are also zeroed? That's the part that makes no sense to me.

The only time there would be a future allocation is during a reallocation, and in that case you're already pushing elements, so it doesn't matter if the reallocated memory was zero or any other value, because you're about to write over it with your new elements you're pushing in.

gnzlbg commented 4 years ago

So then the initial solution solves your problem, no need to add any extra method to Vec.

The only time there would be a future allocation is during a reallocation, and in that case you're already pushing elements, so it doesn't matter if the reallocated memory was zero or any other value, because you're about to write over it with your new elements you're pushing in.

Reallocation can happen due to a call to Vec::reserve, and with zeroed memory, you can just Vec::set_len afterwards.
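
The `reserve` case can be made concrete. With the ordinary allocator, the spare capacity obtained from `Vec::reserve` is uninitialized, so it has to be written before `set_len` may expose it. The sketch below uses a hypothetical helper, `grow_zeroed`, that does this by hand for `u8`; with an always-zeroing allocator the write loop would be redundant.

```rust
/// Hypothetical helper: extend `v` by `additional` zero bytes using
/// `reserve` + `set_len`, zeroing the spare capacity manually.
pub fn grow_zeroed(v: &mut Vec<u8>, additional: usize) {
    v.reserve(additional);
    let old_len = v.len();
    // `spare_capacity_mut` exposes the uninitialized tail as MaybeUninit<u8>;
    // after `reserve` it holds at least `additional` slots.
    for slot in &mut v.spare_capacity_mut()[..additional] {
        slot.write(0);
    }
    // SAFETY: the `additional` elements past `old_len` were just initialized
    // and fit within the reserved capacity.
    unsafe { v.set_len(old_len + additional) };
}
```

In safe code the same effect is available through `v.resize(v.len() + additional, 0)`.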

Why would I care if subsequent memory allocations are also zeroed?

How should we know? We have asked for you or @TimDiekmann to give a concrete example of a problem you would solve with this API, since that would save everybody a lot of time. Without that, the best one can do is guess.

TimDiekmann commented 4 years ago

No, as the initial solution introduces unnecessary overhead as soon as you reallocate.

But why? What is the use case that leads to those requirements?

Because it's an optimized way of allocating memory: you don't have to zero the buffer after allocating it, and you want to rely on this.

We have a type in the standard library already that represent a dynamically-sized fixed-size array

This isn't a problem about fixed sized arrays, it's a problem about initialization.

you are just talking about hypothetical non-concrete situations

I already gave an example: C APIs that require an array of initialized structs, where zeroed means well defined. Another example would be an array of zero-initialized memory (e.g. in neural networks).

EDIT: It's very hard to track your comments, if you are editing 5 or 6 times.

gnzlbg commented 4 years ago

No, as the initial solution introduces unnecessary overhead as soon as you reallocate.

Ok, so you do have the constraint that the memory must not be zeroed on reallocation. Notice that @Lokathor does not have this constraint, so you might be talking about two different problems.

This isn't a problem about fixed sized arrays, it's a problem about initialization.

I already gave an example: C-APIs,

Can you give an example in code? Are you implementing a C API? Calling one? How do you deal with dynamic arrays in the C API? (Most of them use fixed-size arrays.)

Another example would be an array of zero-initialized memory (e.g. in neural networks).

Like an array of floats that are zero-initialized? vec![0_f32; dynamic_size] uses Alloc::alloc_zeroed for that already.
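
For reference, the `vec!` form mentioned here looks like this in practice. `zeroed_weights` is a hypothetical wrapper name; whether the zeroing allocation path is taken is an internal optimization of the standard library, not something the example can observe.

```rust
/// Hypothetical helper: a runtime-sized buffer of zero-initialized floats.
pub fn zeroed_weights(dynamic_size: usize) -> Vec<f32> {
    // Length and capacity are both set; every element is 0.0.
    vec![0_f32; dynamic_size]
}
```

Unlike the `with_capacity_zeroed` proposal, this produces a vector whose length already equals `dynamic_size`, so no `set_len` is needed.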