Closed · gnzlbg closed this 4 years ago
When not using the returned `usable_size`, will the compiler be able to optimize away the code?

Do you think we should use `Excess` instead of a tuple? Or generally remove `Excess`?

With these changes it would be easier to write a wrapper implementation, because fewer methods would have to be overridden.
> When not using the returned `usable_size`, will the compiler be able to optimize away the code?
jemalloc has a function that does this (`smallocx` returns a pointer and the usable size) at no extra cost, since the allocation function has to compute the size anyway. So if you are implementing `Alloc` for a "nice" allocator, there is nothing to optimize away.
Some allocators can query the usable size by doing a separate FFI call. Just because that can be done does not mean that one should do it; e.g., some of this functionality in some allocators is for "debugging only" and not very fast. Still, you could implement `Alloc` twice, for two types: one doing the call, and one not doing the call.
Even when doing the FFI call, as mentioned in the OP, if the FFI function is read-only, e.g. because it has the `#[c_ffi_const]` or `#[c_ffi_pure]` attributes or due to cross-lang LTO, then yes, if the result is not used, LLVM can, and does, eliminate the function call.
When implementing this functionality in Rust, e.g. by just returning `layout.size()` (the requested size), LLVM can also remove that, because `Layout::size()` is `readnone`. If LLVM stops doing that, then the optimizer would be so broken that none of this would matter anyway.
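As a sketch (using the stable `Layout` type, not the unstable `Alloc` trait itself), such a Rust-side fallback is just a pair of copies of the requested size, which the optimizer can drop when unused:

```rust
use std::alloc::Layout;

// Hypothetical helper, not part of any real trait: a conservative
// usable-size computation that echoes the requested size back as both
// the lower and upper bound. Since it is side-effect free, LLVM can
// eliminate the call entirely when the result goes unused.
#[inline]
fn usable_size_conservative(layout: Layout) -> (usize, usize) {
    (layout.size(), layout.size())
}
```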
> Do you think we should use `Excess` instead of a tuple? Or generally remove `Excess`?
I find `Excess` to be a very confusing name. I would prefer something like `#[repr(C)] struct Allocation(NonNull<u8>, usize);` or something like that, to allow interfacing with jemalloc's `smallocx_return_t` (https://github.com/jemalloc/jemalloc/blob/dev/src/jemalloc.c#L2946) directly (e.g. via just a `transmute`), without having to convert it into another type. The problem with using a tuple is that its layout is unspecified (`repr(Rust)`), so it would need to be converted internally into something else.
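A minimal sketch of such a type (the name `Allocation` and the named fields are just illustrations of the proposal; the shape mirrors a pointer-then-size C struct):

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical repr(C) return type: with `repr(C)` the field order and
// layout are defined (pointer first, size second), unlike a Rust
// tuple, whose layout is unspecified.
#[repr(C)]
pub struct Allocation {
    pub ptr: NonNull<u8>,
    pub usable_size: usize,
}
```

Because the layout is defined, it could in principle be matched against a C struct of the same shape without a conversion step.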
If this were combined with #16, then the `alloc` signature could also be changed to this:

```rust
#[repr(C)] struct Allocation(NonNull<u8>, Option<NonZeroUsize>);

unsafe fn alloc(layout: Layout) -> Result<Allocation, AllocErr>;
```

If the usable size is `None`, the allocator does not handle the excess, or no excess is possible. Otherwise it returns `Some(usable_size)`.
This could be too verbose, however, and I don't know if this is an improvement, or how well the compiler can optimize the branch on `if let Some(usable_size)` compared to just using the `usize` in `Allocation(NonNull<u8>, usize)`.
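The fallback branch under discussion can be sketched as a plain helper (names are illustrative, not a real API):

```rust
use std::num::NonZeroUsize;

// If the allocator reports no usable size (`None`), fall back to the
// size the caller requested; otherwise use the reported value.
fn effective_size(reported: Option<NonZeroUsize>, requested: usize) -> usize {
    reported.map_or(requested, NonZeroUsize::get)
}
```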
EDIT: We probably shouldn't do my proposal :smile:
I experimented a bit with removing the `usable_size` and `_excess` variants; here are my experiences:
Removing `usable_size` limits the interface, as it may be useful to ask the allocator what it can do with a specified layout. Additionally, `usable_size` is used to provide `shrink_in_place` and `grow_in_place`, which would otherwise have to be implemented by the implementer themselves. IMO `usable_size` should stay.
Removing the `_excess` variants works fine, as long as you don't need `alloc_zeroed` (and `realloc_zeroed` and `grow_in_place_zeroed`, if we take #14 into account). If I call `alloc_zeroed` with a size of 10 bytes, and the allocator tells me to use 64 bytes, I expect the trailing 54 bytes to be zeroed. However, I will only use my 10 bytes, as I don't need a growable collection. The allocator is forced to `memset` 54 bytes to 0 (assuming the allocated memory is uninitialized), which are not needed at all.
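The zeroing guarantee being discussed can be demonstrated with the stable global-allocator API (this uses `std::alloc`, not the `Alloc` trait under discussion; the helper name is invented):

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// `alloc_zeroed` must hand back zeroed memory for the requested size,
// even if the caller will only ever touch a prefix of it.
fn alloc_zeroed_is_zeroed(size: usize) -> bool {
    let layout = Layout::from_size_align(size, 1).unwrap();
    unsafe {
        let p = alloc_zeroed(layout);
        assert!(!p.is_null());
        let all_zero = (0..size).all(|i| *p.add(i) == 0);
        dealloc(p, layout);
        all_zero
    }
}
```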
I don't like the name `Excess` either. As you mentioned, `Allocation` could be used. Another option would be `Block`, as it describes a block of memory with a starting address and size. Assuming `_excess` won't get removed (which I would advocate), and `Excess` will be renamed, what should the `_excess` variants be called? I think they shouldn't be renamed.
Summary (what I'd prefer):

- Keep `usable_size`
- Keep the `_excess` variants
- Rename `Excess` to `Allocation` or `Block`

If this is plausible, we may close this in favor of a new issue "Rename `Excess`".
A matter of taste, but I don't like tuple structs and would prefer `struct Block { ptr: NonNull<u8>, usable_size: usize }` for clarity.
> Removing `usable_size` limits the interface, as it may be useful to ask the allocator what it can do with a specified layout.
The main use case of `usable_size` right now is obtaining the real size of an allocation to be able to use all of it, and this is covered by the `_excess` methods.

AFAIK the only other situation in which it is useful is for debugging / introspection purposes, but this use case is in general better covered by allocator-specific APIs, which allow querying much more information than what `usable_size` is able to provide. Allocators can expose these APIs already, and we do so for jemalloc in the jemallocator and jemalloc-ctl crates.
The question is more whether it is worth it to have this API in the `Alloc` trait, and which problems it is aimed to solve.
> as long as you don't need `alloc_zeroed` (and `realloc_zeroed` and `grow_in_place_zeroed` if we take #14 into account). If I call `alloc_zeroed` with a size of 10 bytes, and the allocator tells me to use 64 bytes, I expect the trailing 54 bytes to be zeroed. However, I will only use my 10 bytes, as I don't need a growable collection.
How is this any different from calling `alloc_zeroed`, and then calling `usable_size` on the allocation? If you do so, you should be able to use the full allocation, which requires the allocator to zero it anyway. Worse, `usable_size` does not know whether the allocation came from `alloc_zeroed` or some other method, so it would always need to zero the whole bucket when requested via `alloc_zeroed`. If instead `alloc_zeroed` returns the actual allocation size, it could return 10 in your example, while for the same requested layout, `alloc` could return 64.
In practice, most allocators do not really care about the allocation size internally. The first thing they do is compute the size class, and then they just use that. The worst case is when you have a memory page where you only use 1 byte, which requires zeroing the whole page, but that only happens for "large" allocations. The main optimization that `alloc_zeroed` enables is that the allocator can request zeroed pages from the OS and avoid having to touch the memory to zero it: not because zeroing is expensive (it is not), but because accessing memory is expensive. On a system with overcommit, the OS zeroes the memory on first access, which is not necessary if it is never accessed.
> `#[repr(C)] struct Allocation(NonNull<u8>, Option<NonZeroUsize>);`
I wonder whether that's better or worse than just `Allocation(NonNull<u8>, usize)`. I like the idea of letting the allocator tell you whether it can support returning the real allocation size, but then code that would look like this:

```rust
let a = alloc(layout)?;
Self { ptr: a.0, size: a.1 }
```

would become

```rust
let a = alloc(layout)?;
Self { ptr: a.0, size: if let Some(sz) = a.1 { sz } else { layout.size() } }
```

where we have an extra branch. Is this worth it? I don't know; it seems to add little value, since the allocator could have just set `a.1 = layout.size()` anyway.
I think the point of the `Option` is to add a generic API for querying whether an allocator supports returning a larger allocation size. If that's the use case, maybe it can be addressed some other way, e.g. via a method or an associated `bool` const that communicates that. I don't know whether this problem is worth solving, but we can always add that later if we find use cases for it.
> A matter of taste, but I don't like tuple structs and would prefer `struct Block { ptr: NonNull<u8>, usable_size: usize }` for clarity.

Something like that would LGTM.
> The main use case of `usable_size` right now is obtaining the real size of an allocation to be able to use it all, and this is covered by the `_excess` methods.
`usable_size` does not return the real size of an allocation, but a guarantee about how the `Layout` can be modified when passing it into `realloc` or `dealloc`. From the current `Alloc` doc:
> Returns bounds on the guaranteed usable size of a successful allocation created with the specified `layout`. In particular, if one has a memory block allocated via a given allocator `a` and layout `k` where `a.usable_size(k)` returns `(l, u)`, then one can pass that block to `a.dealloc()` with a layout in the size range `[l, u]`.
This is also the guarantee that `grow_in_place` and `shrink_in_place` rely on. So the real use case here is not getting the current allocation size, but the possible layouts to be passed to `dealloc` and `realloc`. Every allocation with this specific layout has at least the bound returned by `usable_size`. It's basically a hint, which should be cheap to calculate. To get the real allocation size, you may use `alloc_excess`. It is provided with a fallback to the usable size, but an allocator can override it. It does not necessarily match a call to `usable_size()`.
> The question is more whether it is worth it to have this API in the `Alloc` trait, and which problems it is aimed to solve.
Yes, as only the allocator knows the actual implementation and size hints.

Additionally, `usable_size` is used to provide `grow_in_place` and `shrink_in_place`. Otherwise those methods would have to be removed as well, or be implemented every time (which is error-prone).
> How is this any different from calling `alloc_zeroed`, and then calling `usable_size` on the allocation?
Basically the same answer as above: `alloc_zeroed_excess` may return a higher upper bound on the allocation size, while `usable_size` is not tied to the specific allocation.
> If [...] `alloc_zeroed` returns the actual allocation size, it could return 10 in your example, while for the same requested layout, `alloc` could return 64.
As I have requested 10 bytes, this is fine. `usable_size` does not return the actual allocation size, but a lower and an upper bound for a specific layout. To continue with my example: if I pass a layout with a size of 10 bytes to `usable_size`, it will always return at most 10 for the lower bound and at least 64 for the upper bound. Otherwise, the safety clause on the trait is not fulfilled.
> In practice, most allocators do not really care about the allocation size internally. The first thing they do is compute the size class, and then just use that. The worst case is when you have a memory page where you only use 1 byte, and that requires zeroing the whole page, but that only happens for "large" allocations, and the main optimization that `alloc_zeroed` enables is that the allocator can request zeroed pages from the OS and avoid having to touch the memory to zero it, not because zeroing is expensive, which it is not, but because accessing memory is expensive, and on a system with overcommit the OS would be in charge of zeroing the memory on first access, which is not necessary if it is not accessed.
As we are designing a trait for the general case, I would not rely on those statements. Rust is also used in embedded contexts, where zeroing memory may be more expensive, depending on the allocator. "Most allocators" is not enough for a trait in a systems language, IMO.
> where we have an extra branch. Is this worth it? I don't know, seems to add little value, since the allocator could have just set `a.1 = layout.size()` anyways.
This is the reason I did an edit to the post and marked it as outdated :slightly_smiling_face:
> `usable_size` does not return the real size of an allocation, but a guarantee, how `Layout` can be modified when passing it into `realloc` or `dealloc`.
Sure, but I do not see why we would need a method that provides this guarantee. We can just say that all methods that need the `Layout` of an already existing allocation may only be passed layouts with the same alignment and a size in the range `[requested, actual]`, and that's it (all allocating and resizing methods would always return the actual size, and the requested one is always passed in).
In fact, `usable_size` cannot do better for any allocator I know. For the actual size, many allocators provide a simple and fast way to compute it (e.g. jemalloc's `nallocx`), but for the lower bound, `usable_size` just returns `Layout::size()` because doing better is hard.
Consider an allocator with two size classes, 8 bytes and 16 bytes, plus large allocations. When requesting 10 bytes, the allocator will quickly compute that the request fits in the 16-byte size class. However, for `usable_size` it would also need to compute that the lower bound is `9`, so that `usable_size` could correctly return `[9, 16]`. For that, the allocator also needs to query what the size class below 16 is, and return the smallest size that does not fit in it, which is something that allocators never actually need to do for themselves.

This information can be interesting, and the introspection methods of allocators allow fetching the whole size-class hierarchy and computing it, but that's often very slow (e.g. glibc's `malloc_info` returns XML, which would need to be parsed).
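The extra work can be sketched with a toy two-class allocator (the size classes and helper names are invented for illustration):

```rust
// Toy size classes: requests up to 8 bytes land in the 8-byte class,
// up to 16 bytes in the 16-byte class, anything larger is "large".
fn size_class(request: usize) -> usize {
    match request {
        0..=8 => 8,
        9..=16 => 16,
        n => n,
    }
}

// `usable_size`-style bounds need the upper bound (the class itself)
// *and* the lower bound (the smallest request mapping to the same
// class), which real allocators never compute for their own purposes.
fn usable_size_bounds(request: usize) -> (usize, usize) {
    let upper = size_class(request);
    let lower = match upper {
        8 => 1,
        16 => 9, // smallest request that no longer fits the 8-byte class
        n => n,
    };
    (lower, upper)
}
```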
If instead we just say that all methods modifying an allocation and taking its `Layout` work for layouts with sizes in the range `[requested, actual]`, then we are golden. Methods that resize an allocation get a requested size and return an actual size, so if you want to shrink 10 bytes to 9 bytes in place in our example above, you could try that, and if `shrink_in_place` succeeds, it would return an actual size of 16; but since you requested 9, you can now pass `[9, 16]`. This basically says that passing `9` before doing the `shrink_in_place` is UB, so one could only have passed `[10, 16]`, while in practice passing `9` would have worked. But when would doing so be useful? E.g. one requests 10 and gets 16; one can track either 10 or 16, but when would one actually say "I'm going to gamble that I can also deallocate this by passing 9"? If one wants to do that, requiring a call to `shrink_in_place` to succeed seems more principled.
> Additionally, `usable_size` is used to provide `grow_in_place` and `shrink_in_place`. Otherwise those methods have to be removed as well or have to be implemented every time (which is error prone).
The default implementation of `usable_size` is used to provide the default implementations of `grow/shrink_in_place`, that's indeed true, but those implementations don't do anything and are kind of useless, so for `usable_size` to be worth it, one would need to at least show that doing this is useful.
For all allocators I know for which one could override `usable_size`, e.g. TCMalloc and jemalloc, one would still want to override `grow/shrink_in_place`, because the default ones would still be useless.
So... we can just provide default implementations of `grow/shrink_in_place` that do nothing, and if some allocator can do better, it can override them. Whether the allocator calls some "usable_size"-like API internally or not is up to it; for jemalloc we wouldn't do that, since `xallocx` already returns the actual allocation size, and therefore doing so is not necessary.
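A sketch of what such do-nothing defaults could look like (trait and method names here are illustrative, not the real `Alloc` trait):

```rust
// Error type mirroring the existing `CannotReallocInPlace`.
#[derive(Debug, PartialEq)]
pub struct CannotReallocInPlace;

pub trait AllocSketch {
    // Conservative defaults: always refuse, forcing callers to fall
    // back to a full realloc. An allocator that can do better simply
    // overrides these.
    unsafe fn grow_in_place(
        &mut self,
        _ptr: *mut u8,
        _new_size: usize,
    ) -> Result<usize, CannotReallocInPlace> {
        Err(CannotReallocInPlace)
    }

    unsafe fn shrink_in_place(
        &mut self,
        _ptr: *mut u8,
        _new_size: usize,
    ) -> Result<usize, CannotReallocInPlace> {
        Err(CannotReallocInPlace)
    }
}

// An allocator that opts into the defaults and nothing else.
pub struct DummyAlloc;
impl AllocSketch for DummyAlloc {}
```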
One other aspect to consider: if `usable_size` exists, and we define the valid layouts for describing an allocation as its `[lower, upper]` bounds, then we are also requiring that allocation sizes be independent of memory addresses (because `usable_size(Layout)` doesn't take a memory address, as opposed to `malloc_usable_size(ptr)`).
> Consider an allocator with 2 size classes, 8 bytes, 16 bytes, and large allocations. Then when requesting 10 bytes, the allocator will compute quickly that it fits in the 16-byte size class. However, for `usable_size` it would also need to compute that the lower bound is `9`, so that `usable_size` could correctly return `[9, 16]`. For that the allocator needs to also query what's the size class below 16, and return the smallest size that does not fit, which is something that allocators never actually need to do. This information can be interesting, and the introspection methods of the allocators allow fetching the whole size class hierarchy and computing it, but that's often very slow (e.g. glibc's `malloc_info` returns XML, which would need to be parsed). If instead we just say that all methods modifying an allocation and taking its Layout work for Layouts in range `[requested, actual]`, then we are golden. Methods that resize an allocation get a requested size, and return an actual size, so if you want to shrink 10 bytes to 9 bytes in place in our example above, you could try that, and if `shrink_in_place` succeeds, it would return an actual size of 16, but since you requested 9, now you can pass `[9, 16]`. This basically says that passing `9` before doing the `shrink_in_place` is UB, so that one could have only passed `[10, 16]`, while in practice, passing `9` would have worked, but when would doing so be useful? E.g. one requests 10, and gets 16, one can either track 10 or 16, but when would one actually say I'm going to gamble that I can also deallocate this by passing 9? If one wants to do that, requiring a call to `shrink_in_place` to succeed seems more principled.
Thanks for pointing this out, this is indeed weird behavior! Another strange behavior arises when combining this with `alloc_zeroed`: when allocating 10 zeroed bytes in your allocator, the memory is guaranteed to be zeroed for `[0, 10)`. `usable_size`, however, returns `[0, 16)`, and thus I would expect `[10, 16)` to be zeroed as well. Sure, this guarantee is not documented, and in fact it is wrong to assume it, but it is not intuitive.
However, I don't like to assume that every allocator can trivially return an excess, so I don't want to drop `_excess`. Let's put together an example, where a user (me, for simplicity) wants to implement an allocator without any assumptions about the allocator logic, to stay conservative:

1. I implement `alloc` and `dealloc`, and everything works fine.
2. I implement `alloc_zeroed` and `realloc` as well.
3. I want to use this allocator in `Vec<T, A>` and in `Box<T, A>`, thus I want to query the excess of an allocation in 50% of my use cases.
4. I want the excess for `Vec<T, A>` but not for `Box<T, A>`, and for some reason I don't want to implement the logic twice.

I see three ways to solve this problem:
1. Implement the allocator twice. Downside: code duplication (minor, implementing `Alloc` is a rare case, and this is what testing is for)
2. Always return the excess. Downside: the allocator is unoptimized for `Box<T, A>` (major, performance loss)
3. Split the excess API into a separate trait `ExcessAlloc`. Downside: one more trait in `lib_alloc` (minor)

I think the latter is the best of both worlds. I'll play around with dropping `usable_size` and with `ExcessAlloc`, and will report whether this makes sense as soon as I have finished.
Generally, a potential performance loss is not an option. If we find one use case where performance drops, there will be a bunch of others.

By the way, even if it turns out we drop the excess API, #21 gets even more unlikely, but this is another topic.
> However, I don't like to assume that every allocator can trivially return an excess,
The problem is that the assumption is true: every allocator can trivially compute a correct excess with zero overhead and in O(1) space and time, by just returning `Layout::size()`.
An example would be an "accurate" implementation of `usable_size` for glibc's malloc. We could call `malloc_info` inside `usable_size`, parse the returned XML, and return an accurate result, but then `grow/shrink_in_place`, which use it, could become much slower than just always reallocating the memory.
> As I want to use this allocator in `Vec<T, A>` and in `Box<T, A>`, thus I want to query the excess of an allocation in 50% of my use cases

> Always return the excess. Downside: the allocator is unoptimized for `Box<T, A>` (major, performance loss)
This is incorrect. `Box` does not use the returned excess, so the computation is optimized out. This is something we already discussed above.
> `Box` does not use the excess returned, so the computation is optimized out.
https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487 claims:

> One way to reduce the problem in half is to just make the alloc family return an Excess, but it turns out that a) the compiler does a bad job at optimizing it out when it's not used
> > However, I don't like to assume that every allocator can trivially return an excess,
>
> The problem is that the assumption is true: every allocator can trivially compute the correct excess with zero overhead and in O(1) space and time by just returning `Layout::size()`. An example would be an "accurate" implementation of `usable_size` for glibc's malloc. We could call `malloc_info` inside `usable_size`, parse the returned XML, and return an accurate result, but then `grow/shrink_in_place`, which use it, could become much slower than just always reallocating the memory.
Sorry, I don't get your point. Sure, every allocator can return `Layout::size()`, and this makes sense for `Box<T, A>`. But what if I want the allocator to parse the XML to use the actual excess in `Vec<T, A>`?
> This is incorrect. `Box` does not use the excess returned, so the computation is optimized out. This is something that we already did discuss above.
The compiler cannot optimize out the branch for `Box` if the lookup may have side effects.
> https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487 claims:

That's fixed by https://github.com/rust-lang/rust/pull/58327, which is in FCP with disposition merge.
Yes, but you cannot assume that every allocator which uses FFI uses `ffi_const` or `ffi_pure`.
If your allocator is written in Rust, you don't need that; that already works today. Those attributes only help in the one case in which this does not work, which is when your excess computation is an unknown function.
Why do you assume every allocator is written in Rust or uses `ffi_const` or `ffi_pure`? The API should be designed for any case.
The only thing I assume is that computing usable size is a side-effect free operation, which is the case for all allocators that we currently support.
> The only thing I assume is that computing usable size is a side-effect free operation
I don't mean to offend you, but this is a false assumption IMO.
> all allocators that we currently support.
Could you elaborate on this? Are there allocators we won't support?
Basically, you are arguing that `usable_size` is worth it because it would allow you to use glibc's XML output to compute an accurate lower bound for `usable_size` in `Vec<T, A>`, while not making `Box<T, A>` slow.
I disagree that this is useful:

- `Vec<T, A>` does not use the lower bound of `usable_size` for anything, so computing it in an accurate way does not solve any problem AFAICT (note that the upper bound would be returned by the `_excess` methods).

> Could you elaborate on this? Are there allocators we won't support?
You are claiming that there exist allocators for which:

- the `usable_size` method is useful enough for it to need to be part of the generic `Alloc` trait right now, in the first RFC for an MVP of the `Alloc` trait, and
- `usable_size` is not `readonly`, that is, querying the `usable_size` writes to global data structures.

AFAICT this is not true for any of the widely-used `Alloc` trait implementations that exist today. If you know of an impl for which this does not hold, please show it.
Sure, I can imagine that a `usable_size` implementation that writes to global memory, or that is terribly slow, is implementable. The thing I'm failing to see is why anyone would want to use such an allocator in practice. In other words, I am viewing this discussion from the point of view that people are going to implement `Alloc` for allocators that they actually want to use, not for every type for which the trait can be implemented.
> That's fixed with rust-lang/rust#58327

I might be missing something. jemalloc's `smallocx` has the side effect of allocating, so it's not `const` or `pure`, is it?
@SimonSapin jemalloc's `smallocx` returns `(ptr, allocation_size)` and performs just like `mallocx`, and it can be used directly by the `_excess` methods. So by using that directly there is no issue.
jemalloc (also) and tcmalloc have a `nallocx(size) -> alloc_size`, which could be combined with `mallocx` in the implementation of the `_excess` methods. `nallocx` is `#[c_ffi_pure]`, and when that's used, calling it and discarding the result is currently optimized away by LLVM. This approach is not very useful for jemalloc now that it has `smallocx`.
Of the other widely-used allocators, the other similar API is `malloc_usable_size(ptr)`, which could also be used inside an `_excess` method implementation, since we have the pointer there, but not in an `Alloc::usable_size(Layout)` impl, since we don't have a pointer there. AFAICT, we can use `pure` for these as well (from the Rust POV, we don't want these to be called if the result is not used). But `malloc_usable_size` is more intended for debugging, so the implementor should decide whether using a debugging API inside the `Alloc` trait impl is a good idea or not. Even if it does not impact the performance of `Box<T, A>`, it could negatively impact the performance of `Vec<T, A>`, `String`, etc., which do use the excess size. This shows the difference between "a `usable_size` API" and "a useful `usable_size` API". If the API is not fast, then we risk people avoiding the excess (or the `_excess` methods) to work around that, so being fast is a requirement for it to be useful.
Just to get this right: do you argue only against `usable_size`, or also against the excess API?
I argue that we should remove `usable_size`, remove the `_excess` API, and make all allocating and resizing methods in `Alloc` return the actual allocation size (the upper bound that `Alloc::usable_size` currently returns). We should then just say that all methods taking the `Layout` of an already-existing allocation accept layouts with sizes in the range `[requested, actual]`.
The returned upper bound on the size only needs to be correct, not accurate, which allows just returning the requested size for allocators that do not support it, or for which only a slow computation exists (e.g. one intended for debugging, not as a hot path). This allows allocators that can quickly provide an accurate allocation size to do better, and doesn't make users avoid the `_excess` or `usable_size` APIs out of fear that they would be too slow. This is zero-overhead for "reasonable" allocators, where by zero-overhead I mean that if you don't use the actual allocation size, as in `Box`, then you don't pay any run-time cost for it (you might pay some compile-time cost or similar, though).
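A thin sketch over the stable global allocator shows the zero-overhead case: a wrapper that cannot compute a real excess just reports the requested size as the actual one (the function name is illustrative, not a proposed API):

```rust
use std::alloc::{alloc, dealloc, Layout};

// Sketch of the proposal: every allocating method returns the actual
// size alongside the pointer. A plain wrapper over the global
// allocator can only truthfully report the requested size, which the
// proposal explicitly allows.
unsafe fn alloc_with_actual_size(layout: Layout) -> (*mut u8, usize) {
    (alloc(layout), layout.size())
}
```

Callers like `Box` would simply ignore the second tuple field, and its computation folds away.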
This fully solves the telemetry issue (which is IMO what adds the most value), because the allocator always knows the sizes that the user actually wanted to allocate (e.g., there is no simple way to call `Alloc::usable_size`, cache the result, and then use that, which would be impossible for an allocator to track).
This also avoids useless `usable_size` implementations for allocators with incompatible APIs (e.g. `usable_size(ptr)` vs `usable_size(Layout)`). Inside the allocation and resizing methods, both the pointer and the layout are available, so each allocator can pick what to do there.
This does not prevent allocators that have a very slow `usable_size`-like API from providing a different handle that uses it, and users can use that if they want to. It also does not prevent users from using allocator-specific APIs like `jemalloc_ctl` to introspect their allocator for debugging or performance tuning; it just encourages generic code not to do this (we could add generic introspection APIs in the future, but for them to be in the MVP and in the `Alloc` trait, the motivation would need to be very strong).
The argument that `Alloc::usable_size` is still useful in the presence of allocation methods that return the actual allocation size is very weak. The main argument supporting it seems to be that it returns the lower bound (because the alloc methods return the upper bound), but no examples of useful code that needs the lower bound have been presented.
The argument that this would make allocators with a slow size computation that mutates global state unnecessarily slow for things like `Box<T, A>` depends on the allocator implementation; e.g., these allocators could say that, for the purpose of the allocation and resizing methods, their query of the allocation size is indeed `readonly`, so that it is only executed if the result is actually used. Still, no useful allocators that behave this way have been shown, so while I can imagine that they exist, I have a harder time imagining that they are allocators anybody would actually want to use.
The argument that this complicates the implementation of the `Alloc` trait for users is debatable. This change cuts the allocator API in half (due to the removal of the `_excess` methods; the other issue, #18, removes another 4 methods and makes it even smaller), and this is not a trait that's implemented all the time. Also, we can still provide default implementations of `grow_in_place` and `shrink_in_place` without `Alloc::usable_size` that are as useful as those obtained via the default `Alloc::usable_size` implementation (that is, completely useless).
I'd like to come back to this issue. With #14 merged, and #13 being the next logical step, the `Excess` API doubles the number of methods in `AllocRef`.
I did a test:

```rust
unsafe fn main_excess() -> Result<NonNull<i32>, AllocErr> {
    let allocation = Global.alloc_excess(Layout::new::<i32>())?;
    Ok(allocation.0.cast())
}

unsafe fn main() -> Result<NonNull<i32>, AllocErr> {
    let allocation = Global.alloc(Layout::new::<i32>())?;
    Ok(allocation.cast())
}
```
Both snippets result in

```asm
mov edi, 4
mov esi, 4
jmp qword ptr [rip + __rust_alloc@GOTPCREL]
```
So replacing `alloc`, `realloc`, etc. with their `_excess` variants seems fine (I was unsure how well the compiler could optimize return values at this point).
Regarding `usable_size`, I see your point. This would mean that the default implementations of `shrink_in_place` and `grow_in_place` would return `Err(CannotReallocInPlace)`. Currently, without implementing `usable_size`, that's the same behavior.
Another thing: `_excess` returns `struct Excess(_, _)`. I'd say we just use plain tuples and ban the term "excess" altogether.
| | replace `x` with `x_excess` | close |
|---|---|---|
| @Amanieu | :+1: | |
| @Ericson2314 | | |
| @glandium | | |
| @gnzlbg | | |
| @Lokathor | | |
| @scottjmaddox | :+1: | |
| @TimDiekmann | :+1: | |
| @Wodann | :+1: | |
This is a comparison, on how this could be implemented: https://github.com/rust-lang/rust/compare/cd5441faf4e56d136d7c05d5eb55b4a41396edaf...TimDiekmann:excess
A use case for `usable_size` that is not covered by the proposed and implemented change is when you want to implement something like https://github.com/servo/heapsize or https://github.com/servo/servo/tree/master/components/malloc_size_of in an allocator-agnostic way.
Imagine this:

```rust
#[derive(MallocSizeOf)]
struct Foo {
    bar: HashMap<String, Vec<u8>>,
    baz: Arc<Mutex<Db>>,
}

fn mem_used_by(foo: &Foo) -> usize {
    foo.malloc_size_of()
}
```
How would one implement this now?
Such functionality is inherently allocator-specific. If you know what allocator your code is using, then you can call directly into e.g. jemalloc to get the size of an allocation.
Alternatively, you could create an extension trait on `AllocRef` which adds a `malloc_size_of` method.
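Such an extension trait could be sketched like this (all names here are hypothetical; a jemalloc-backed impl would call something like `malloc_usable_size` internally):

```rust
// Hypothetical extension trait: allocator-specific size queries
// layered on top of a generic allocator handle.
pub trait MallocSizeOfExt {
    /// Returns the usable size of the allocation behind `ptr`.
    unsafe fn malloc_size_of(&self, ptr: *const u8) -> usize;
}

// A trivial implementation for an allocator that hands out
// fixed-size blocks, just to show the shape of an impl.
pub struct FixedBlockAlloc(pub usize);

impl MallocSizeOfExt for FixedBlockAlloc {
    unsafe fn malloc_size_of(&self, _ptr: *const u8) -> usize {
        self.0
    }
}
```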
> Such functionality is inherently allocator-specific. If you know what allocator your code is using then you can call directly into e.g. jemalloc to get the size of an allocation.

That's the problem:

> in an allocator agnostic way.
If it's implemented in a library, nothing stops users from defining whatever global allocator they want. Maybe I'm misunderstanding the issue here, but does it apply only to local allocators, or to global ones too?
The issue here only applies to local allocators.
The `Alloc::usable_size` method is, in general, used like this:

This has two big downsides.

The main issue is that `usable_size` makes it impossible to obtain meaningful telemetry information about which allocations the program actually wants to perform, resulting in a program that is essentially "un-tuneable" from the allocator perspective. For example, if one were to dump allocation statistics with jemalloc for a program that uses `usable_size`, this program would appear to be using the allocator's size classes perfectly, independently of how bad these are.

The other issue is that code that uses `usable_size` computes the allocation size classes twice: once in the `usable_size` call, and once in the call to `alloc` (`alloc` just gets a layout; it then needs to find the size class appropriate for the layout, which is exactly what `usable_size` already did, but `alloc` does not know that).

So IMO we should remove `usable_size`, and instead make the allocation functions always return the usable size, such that code like the above is now written as:

This makes sure that allocators are always able to print accurate statistics of the requested allocations, which can then be used to tune allocator performance for particular applications, and allows allocators to perform the size-class computation only once (e.g. jemalloc's `smallocx`). Most ABIs support two return registers in their calling convention, so a pointer and a size can be returned on most platforms without incurring any extra cost (e.g. without having to pass them via memory).

Allocators that do not support size-class computation (e.g. glibc's `malloc`) can just return the requested layout size in Rust code, which can be inlined. If the usable size of the return value is not used, inlining can remove its computation for allocators that do need to perform an extra call to obtain it, as long as that function is `readonly` (e.g. jemalloc's `mallocx` + `nallocx`).

This means that all `Alloc::..._excess` functions can be removed, since the normal functions do their job by default.

closes #13