Returning `NonNull<[u8]>` with full extent adds overhead to cases that don't need it.

thomcc commented 3 years ago

The current API always returning a slice with a size is inefficient in many allocators (those that don't have the information for free) if the caller doesn't care about the full size.

In many allocator implementations this is two calls. (There are some where it's not — but it being two calls is common enough that IMO it's worth considering in the API — and even in those where it's 1 call, it's usually a bit more expensive than providing just the pointer).

When implementing AllocRef for an allocator which can provide usable_size functionality you don't know if the caller needs it so, best practice is obviously going to be to provide it. The caller may very well care about the allocation extent (they may be a collection type like Vec<T>, String, etc), in which case this is hugely beneficial.

Unfortunately, the caller may not need it — Many never will. Some examples of this are: Box<T>/Rc<T>/Arc<T>/etc for Sized T, fixed-capacity collections, containers which are not yet AllocRef-aware...

Inside the AllocRef methods as they are now you cannot know which case you're in, and neither is really an obscure use case.

So IMO there should either be:

Be another variant of the functions which either returns NonNull<u8> or a NonNull<[u8]> that's not expected to be the full extent.
There should be a parameter added to the allocation functions which indicates if caller would like the full usable size when available.

(I'm hesitant about number 2, because I'd rather not add more checks into allocation when in many contexts global context it's unlikely to get inlined — it's also hard for me to imagine someone deciding between these at runtime).

Either way this would support both cases, allocator-aware collections, and uses like Box where there's really no benefit to querying the full extent, since it won't be used.

Amanieu commented 3 years ago

I don't think it's such a big issue because the allocator can just return the same length that was passed in if calculating the exact allocation length is too expensive.

thomcc commented 3 years ago

if calculating the exact allocation length is too expensive.

This isn't my concern. Calculating the exact length is usually not overly expensive, but it's certainly not free in most allocators.

Concretely:

jemalloc ends up doing an rtree lookup for this call (it stores allocation extents in a radix tree), which is slow enough that jemalloc provides a sized deallocation API (sdallocx) to avoid doing the same kind of rtree lookup on deallocation. (And in the past when I complained about AllocRef::dealloc requiring a layout, wanting to use sdallocx was cited as an example of a non-winapi case where we'd like to use that info).
Mimalloc is better here but still involves a bunch of pointer chasing (although it's somewhat likely that the malloc call would have put them into the cache).

The thing is, you get big wins from providing the extent to collections, since they can make more intelligent decisions around allocation (so there are a lot of times when it's worth it).

The problem is that the AllocRef impl has no idea if the caller wants that info or not — currently it has to assume they might (even though many allocations won't!).

Amanieu commented 3 years ago

I should have been clearer: allocation is extremely performance sensitive so I would only expect actual extents to be returned for allocators that are specifically designed with this API in mind. Other allocators like jemalloc which offer this via a separate call would simply always return the original length since obtaining the length requires a separate call which is what I meant by "too expensive".

thomcc commented 3 years ago

Huh. Well, that avoids the performance issue I mentioned, but seems to have a lot of downsides IMO. It also seems to defeat the point of these functions being allowed to return a larger-than-requested size...

Some of the downsides:

Allocator bindings where the underlying allocator can give this information need to to provide 2 impls of AllocRef — one that returns the full extent of the allocation and one that does not.
Users need to know whether or not the collection they're using can efficiently make use of excess extent, and use a extent-providing AllocRef iff that is the case.
Collections that only sometimes need full extents don't have the choice of only sometimes requesting them.

For example Vec::into_boxed_slice — The Box<[T], SomeAllocRef> has no way of storing the extent, so it would be pointless to query it in the call to AllocRef::shrink. One answer here might be "AllocRef::shrink shouldn't return full extents", but there are other cases w

Another example: abseil's btree collections will realloc to grow the root node until it reaches a specific size, however other nodes are allocated with an exact size. For those realloc calls, it could be beneficial to know the actual extent (to reduce the number of realloc calls made), but for other calls it wouldn't care.

All this is to say, I think it would be better if AllocRef were adjusted. Some of the possible fixes might be:

An additional parameter to allocation functions hinting that the caller can make good use of a larger-than-requested result.
An additional function to get the usable size (https://github.com/rust-lang/wg-allocators/issues/75). (Note: I think this is worth having regardless of how this issue is solved)
Two sets of allocation functions, one for when the caller can't use excess, and one for when they can.

Lokathor commented 3 years ago

option 3 seems simplest, because it can be defaulted to use the "no excess reported" methods for allocators that never allocate excess.

TimDiekmann commented 3 years ago

One option to consider is to invert the logic we had previously. Instead of alloc and alloc_excess we could use alloc and alloc_exact. That would be similar to Vec::grow. It would have to be discussed, which function defaults. With this, it would also be possible to add it in a backwards-compatible way (if we decide to use this approach and let alloc_exact using alloc).

On Nov 5, 2020, 04:51, at 04:51, Lokathor notifications@github.com wrote:

option 3 seems simplest, because it can be defaulted to use the "no excess reported" methods for allocators that never allocate excess.

-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/rust-lang/wg-allocators/issues/74#issuecomment-722124669

dead-claudia commented 9 months ago

Maybe {de,}allocate_sized methods could work here, defaulting to delegating to their unsized variants.

CAD97 commented 9 months ago

Since I don't see it recorded here yet, it should be noted that std's RawVec currently deliberately ignores the potentially provided excess, since most allocators never actually provide an excess (Global never will while it's based on GlobalAlloc) and respecting it requires extra handling[^1].

[^1]: The additional code would just be to divide the capacity by the generic type, but the code path gets monomorphized a ton and thus minor changes have outsized impact on compile perf.

This is lived experience that unless allocators actually do typically report excess allocated space that it kind of evens out back in favor of not bothering to respect the slack space unless using byte length directly (as opposed to item length).

rust-lang / wg-allocators

Returning `NonNull<[u8]>` with full extent adds overhead to cases that don't need it. #74