Closed · gnzlbg closed this 4 years ago
When not using the returned `usable_size`, will the compiler be able to optimize away the code?

Do you think we should use `Excess` instead of a tuple? Or generally remove `Excess`?

With these changes it would be easier to write a wrapper implementation, because fewer methods would have to be overridden.
> When not using the returned `usable_size`, will the compiler be able to optimize away the code?
jemalloc has a function that does this (`smallocx` returns a pointer and the usable size) at no extra cost, since the allocation function has to compute the size anyway. So if you are implementing `Alloc` for a "nice" allocator, there is nothing to optimize away.
Some allocators can query the usable size by doing a separate FFI call. Just because that can be done does not mean that one should do it; e.g., some of this functionality in some allocators is for "debugging only" and not very fast. Still, you could implement `Alloc` twice, for two types: one doing the call, and one not doing the call.
Even when doing the FFI call, as mentioned in the OP, if the FFI function is read-only, e.g. because it has the `#[c_ffi_const]` or `#[c_ffi_pure]` attributes or due to cross-lang LTO, then yes, if the result is not used, LLVM can, and does, eliminate the function call.
When implementing this functionality in Rust, e.g. by just returning `layout.size()` (the requested size), LLVM can also remove that, because `Layout::size()` is `readnone`. If LLVM stops doing that, then the optimizer would be so broken that none of this would matter anyway.
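As a sketch (using the stable `Layout` type, not the unstable `Alloc` trait itself), such a Rust-side fallback is just a pair of copies of the requested size, which the optimizer can drop when unused:

```rust
use std::alloc::Layout;

// Hypothetical helper, not part of any real trait: a conservative
// usable-size computation that echoes the requested size back as both
// the lower and upper bound. Since it is side-effect free, LLVM can
// eliminate the call entirely when the result goes unused.
#[inline]
fn usable_size_conservative(layout: Layout) -> (usize, usize) {
    (layout.size(), layout.size())
}
```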
> Do you think we should use `Excess` instead of a tuple? Or generally remove `Excess`?
I find `Excess` to be a very confusing name. I would prefer something like `#[repr(C)] struct Allocation(NonNull<u8>, usize);` or something like that, to allow interfacing with jemalloc's `smallocx_return_t` (https://github.com/jemalloc/jemalloc/blob/dev/src/jemalloc.c#L2946) directly (e.g. via just a `transmute`), without having to convert it into another type. The problem with using a tuple is that its layout is unspecified (`repr(Rust)`), so it would need to be converted internally into something else.
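A minimal sketch of such a type (the name `Allocation` and the named fields are just illustrations of the proposal; the shape mirrors a pointer-then-size C struct):

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical repr(C) return type: with `repr(C)` the field order and
// layout are defined (pointer first, size second), unlike a Rust
// tuple, whose layout is unspecified.
#[repr(C)]
pub struct Allocation {
    pub ptr: NonNull<u8>,
    pub usable_size: usize,
}
```

Because the layout is defined, it could in principle be matched against a C struct of the same shape without a conversion step.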
If this were combined with #16, then the `alloc` signature could also be changed to this:

```rust
#[repr(C)] struct Allocation(NonNull<u8>, Option<NonZeroUsize>);

unsafe fn alloc(layout: Layout) -> Result<Allocation, AllocErr>;
```

If the usable size is `None`, the allocator does not handle the excess, or no excess is possible. Otherwise it returns `Some(usable_size)`.
This could be too verbose, however, and I don't know if this is an improvement, or how well the compiler can optimize the branch on `if let Some(usable_size)` compared to just using the `usize` in `Allocation(NonNull<u8>, usize)`.
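The fallback branch under discussion can be sketched as a plain helper (names are illustrative, not a real API):

```rust
use std::num::NonZeroUsize;

// If the allocator reports no usable size (`None`), fall back to the
// size the caller requested; otherwise use the reported value.
fn effective_size(reported: Option<NonZeroUsize>, requested: usize) -> usize {
    reported.map_or(requested, NonZeroUsize::get)
}
```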
EDIT: We probably shouldn't do my proposal :smile:
I experimented a bit with removing the `usable_size` and `_excess` variants; here are my experiences:
Removing `usable_size` limits the interface, as it may be useful to ask the allocator what it can do with a specified layout. Additionally, `usable_size` is used to provide `shrink_in_place` and `grow_in_place`, which would otherwise have to be implemented by the implementer themselves. IMO `usable_size` should stay.
Removing the `_excess` variants works fine, as long as you don't need `alloc_zeroed` (and `realloc_zeroed` and `grow_in_place_zeroed`, if we take #14 into account). If I call `alloc_zeroed` with a size of 10 bytes, and the allocator tells me to use 64 bytes, I expect the trailing 54 bytes to be zeroed. However, I will only use my 10 bytes, as I don't need a growable collection. The allocator is forced to `memset` 54 bytes to 0 (assuming the allocated memory is uninitialized), which are not needed at all.
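The zeroing guarantee being discussed can be demonstrated with the stable global-allocator API (this uses `std::alloc`, not the `Alloc` trait under discussion; the helper name is invented):

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// `alloc_zeroed` must hand back zeroed memory for the requested size,
// even if the caller will only ever touch a prefix of it.
fn alloc_zeroed_is_zeroed(size: usize) -> bool {
    let layout = Layout::from_size_align(size, 1).unwrap();
    unsafe {
        let p = alloc_zeroed(layout);
        assert!(!p.is_null());
        let all_zero = (0..size).all(|i| *p.add(i) == 0);
        dealloc(p, layout);
        all_zero
    }
}
```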
I don't like the name `Excess` either. As you mentioned, `Allocation` could be used. Another option would be `Block`, as it describes a block of memory with a starting address and size. Assuming `_excess` won't get removed (which I would advocate), and `Excess` will be renamed, what should the `_excess` variants be called? I think they shouldn't be renamed.
Summary (what I'd prefer):

- Keep `usable_size`
- Keep the `_excess` variants
- Rename `Excess` to `Allocation` or `Block`

If this is plausible, we may close this in favor of a new issue "Rename `Excess`".
A matter of taste, but I don't like tuple structs and would prefer `struct Block { ptr: NonNull<u8>, usable_size: usize }` for clarity.
> Removing `usable_size` limits the interface, as it may be useful to ask the allocator what it can do with a specified layout.
The main use case of `usable_size` right now is obtaining the real size of an allocation to be able to use all of it, and this is covered by the `_excess` methods.

AFAIK the only other situation in which it is useful is for debugging / introspection purposes, but this use case is in general better covered by allocator-specific APIs, which allow querying much more information than what `usable_size` is able to provide. Allocators can expose these APIs already, and we do so for jemalloc in the jemallocator and jemalloc-ctl crates.
The question is more whether it is worth it to have this API in the `Alloc` trait, and which problems it is aimed to solve.
> as long as you don't need `alloc_zeroed` (and `realloc_zeroed` and `grow_in_place_zeroed` if we take #14 into account). If I call `alloc_zeroed` with a size of 10 bytes, and the allocator tells me to use 64 bytes, I expect the trailing 54 bytes to be zeroed. However, I will only use my 10 bytes, as I don't need a growable collection.
How is this any different from calling `alloc_zeroed`, and then calling `usable_size` on the allocation? If you do so, you should be able to use the full allocation, which requires the allocator to zero it anyway. Worse, `usable_size` does not know whether the allocation came from `alloc_zeroed` or some other method, so it would always need to zero the whole bucket when requested via `alloc_zeroed`. If instead `alloc_zeroed` returns the actual allocation size, it could return 10 in your example, while for the same requested layout, `alloc` could return 64.
In practice, most allocators do not really care about the allocation size internally. The first thing they do is compute the size class, and then they just use that. The worst case is when you have a memory page where you only use 1 byte, which requires zeroing the whole page, but that only happens for "large" allocations. The main optimization that `alloc_zeroed` enables is that the allocator can request zeroed pages from the OS and avoid having to touch the memory to zero it: not because zeroing is expensive (it is not), but because accessing memory is expensive. On a system with overcommit, the OS zeroes the memory on first access, which is not necessary if it is never accessed.
> `#[repr(C)] struct Allocation(NonNull<u8>, Option<NonZeroUsize>);`
I wonder whether that's better or worse than just `Allocation(NonNull<u8>, usize)`. I like the idea of letting the allocator tell you whether it can support returning the real allocation size, but then code that would look like this:

```rust
let a = alloc(layout)?;
Self { ptr: a.0, size: a.1 }
```

would become

```rust
let a = alloc(layout)?;
Self { ptr: a.0, size: if let Some(sz) = a.1 { sz } else { layout.size() } }
```

where we have an extra branch. Is this worth it? I don't know; it seems to add little value, since the allocator could have just set `a.1 = layout.size()` anyway.
I think the point of the `Option` is to add a generic API for querying whether an allocator supports returning a larger allocation size. If that's the use case, maybe it can be addressed some other way, e.g. via a method or an associated `bool` const that communicates that. I don't know whether this problem is worth solving, but we can always add that later if we find use cases for it.
> A matter of taste, but I don't like tuple structs and would prefer `struct Block { ptr: NonNull<u8>, usable_size: usize }` for clarity.

Something like that would LGTM.
> The main use case of `usable_size` right now is obtaining the real size of an allocation to be able to use it all, and this is covered by the `_excess` methods.
`usable_size` does not return the real size of an allocation, but a guarantee about how the `Layout` can be modified when passing it into `realloc` or `dealloc`. From the current `Alloc` doc:
> Returns bounds on the guaranteed usable size of a successful allocation created with the specified `layout`. In particular, if one has a memory block allocated via a given allocator `a` and layout `k` where `a.usable_size(k)` returns `(l, u)`, then one can pass that block to `a.dealloc()` with a layout in the size range `[l, u]`.
This is also the guarantee that `grow_in_place` and `shrink_in_place` rely on. So the real use case here is not getting the current allocation size, but the possible layouts to be passed to `dealloc` and `realloc`. Every allocation with this specific layout has at least the bound returned by `usable_size`. It's basically a hint, which should be cheap to calculate. To get the real allocation size, you may use `alloc_excess`. It is provided with a fallback to the usable size, but an allocator can override it. It does not necessarily match a call to `usable_size()`.
> The question is more whether it is worth it to have this API in the `Alloc` trait, and which problems it is aimed to solve.
Yes, as only the allocator knows the actual implementation and size hints.

Additionally, `usable_size` is used to provide `grow_in_place` and `shrink_in_place`. Otherwise those methods would have to be removed as well, or be implemented every time (which is error-prone).
> How is this any different from calling `alloc_zeroed`, and then calling `usable_size` on the allocation?
Basically the same answer as above: `alloc_zeroed_excess` may return a higher upper bound on the allocation size, while `usable_size` is not tied to the specific allocation.
> If [...] `alloc_zeroed` returns the actual allocation size, it could return 10 in your example, while for the same requested layout, `alloc` could return 64.
As I have requested 10 bytes, this is fine. `usable_size` does not return the actual allocation size, but a lower and an upper bound for a specific layout. To continue with my example: if I pass a layout with a size of 10 bytes to `usable_size`, it will always return at most 10 for the lower bound and at least 64 for the upper bound. Otherwise, the safety clause on the trait is not fulfilled.
> In practice, most allocators do not really care about the allocation size internally. The first thing they do is compute the size class, and then just use that. The worst case is when you have a memory page where you only use 1 byte, and that requires zeroing the whole page, but that only happens for "large" allocations, and the main optimization that `alloc_zeroed` enables is that the allocator can request zeroed pages from the OS and avoid having to touch the memory to zero it, not because zeroing is expensive, which it is not, but because accessing memory is expensive, and on a system with overcommit the OS would be in charge of zeroing the memory on first access, which is not necessary if it is not accessed.
As we are designing a trait for the general case, I would not rely on those statements. Rust is also used in embedded contexts, where zeroing memory may be more expensive, depending on the allocator. "Most allocators" is not enough for a trait in a systems language, IMO.
> where we have an extra branch. Is this worth it? I don't know, seems to add little value, since the allocator could have just set `a.1 = layout.size()` anyways.
This is the reason I did an edit to the post and marked it as outdated :slightly_smiling_face:
> `usable_size` does not return the real size of an allocation, but a guarantee, how `Layout` can be modified when passing it into `realloc` or `dealloc`.
Sure, but I do not see why we would need a method that provides this guarantee. We can just say that all methods that need the `Layout` of an already existing allocation may only be passed layouts with the same alignment and a size in the range `[requested, actual]`, and that's it (all allocating and resizing methods would always return the actual size, and the requested one is always passed in).
In fact, `usable_size` cannot do better for any allocator I know. For the actual size, many allocators provide a simple and fast way to compute it (e.g. jemalloc's `nallocx`), but for the lower bound, `usable_size` just returns `Layout::size()` because doing better is hard.
Consider an allocator with two size classes, 8 bytes and 16 bytes, plus large allocations. When requesting 10 bytes, the allocator will quickly compute that the request fits in the 16-byte size class. However, for `usable_size` it would also need to compute that the lower bound is `9`, so that `usable_size` could correctly return `[9, 16]`. For that, the allocator also needs to query what the size class below 16 is, and return the smallest size that does not fit in it, which is something that allocators never actually need to do for themselves.

This information can be interesting, and the introspection methods of allocators allow fetching the whole size-class hierarchy and computing it, but that's often very slow (e.g. glibc's `malloc_info` returns XML, which would need to be parsed).
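The extra work can be sketched with a toy two-class allocator (the size classes and helper names are invented for illustration):

```rust
// Toy size classes: requests up to 8 bytes land in the 8-byte class,
// up to 16 bytes in the 16-byte class, anything larger is "large".
fn size_class(request: usize) -> usize {
    match request {
        0..=8 => 8,
        9..=16 => 16,
        n => n,
    }
}

// `usable_size`-style bounds need the upper bound (the class itself)
// *and* the lower bound (the smallest request mapping to the same
// class), which real allocators never compute for their own purposes.
fn usable_size_bounds(request: usize) -> (usize, usize) {
    let upper = size_class(request);
    let lower = match upper {
        8 => 1,
        16 => 9, // smallest request that no longer fits the 8-byte class
        n => n,
    };
    (lower, upper)
}
```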
If instead we just say that all methods modifying an allocation and taking its `Layout` work for layouts with sizes in the range `[requested, actual]`, then we are golden. Methods that resize an allocation get a requested size and return an actual size, so if you want to shrink 10 bytes to 9 bytes in place in our example above, you could try that, and if `shrink_in_place` succeeds, it would return an actual size of 16; but since you requested 9, you can now pass `[9, 16]`. This basically says that passing `9` before doing the `shrink_in_place` is UB, so one could only have passed `[10, 16]`, while in practice passing `9` would have worked. But when would doing so be useful? E.g. one requests 10 and gets 16; one can track either 10 or 16, but when would one actually say "I'm going to gamble that I can also deallocate this by passing 9"? If one wants to do that, requiring a call to `shrink_in_place` to succeed seems more principled.
> Additionally, `usable_size` is used to provide `grow_in_place` and `shrink_in_place`. Otherwise those methods have to be removed as well or have to be implemented every time (which is error prone).
The default implementation of `usable_size` is used to provide the default implementations of `grow/shrink_in_place`, that's indeed true, but those implementations don't do anything and are kind of useless, so for `usable_size` to be worth it, one would need to at least show that doing this is useful.
For all allocators I know for which one could override `usable_size`, e.g. TCMalloc and jemalloc, one would still want to override `grow/shrink_in_place`, because the default ones would still be useless.
So... we can just provide default implementations of `grow/shrink_in_place` that do nothing, and if some allocator can do better, it can override them. Whether the allocator calls some "usable_size"-like API internally or not is up to it; for jemalloc we wouldn't do that, since `xallocx` already returns the actual allocation size, and therefore doing so is not necessary.
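A sketch of what such do-nothing defaults could look like (trait and method names here are illustrative, not the real `Alloc` trait):

```rust
// Error type mirroring the existing `CannotReallocInPlace`.
#[derive(Debug, PartialEq)]
pub struct CannotReallocInPlace;

pub trait AllocSketch {
    // Conservative defaults: always refuse, forcing callers to fall
    // back to a full realloc. An allocator that can do better simply
    // overrides these.
    unsafe fn grow_in_place(
        &mut self,
        _ptr: *mut u8,
        _new_size: usize,
    ) -> Result<usize, CannotReallocInPlace> {
        Err(CannotReallocInPlace)
    }

    unsafe fn shrink_in_place(
        &mut self,
        _ptr: *mut u8,
        _new_size: usize,
    ) -> Result<usize, CannotReallocInPlace> {
        Err(CannotReallocInPlace)
    }
}

// An allocator that opts into the defaults and nothing else.
pub struct DummyAlloc;
impl AllocSketch for DummyAlloc {}
```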
One other aspect to consider: if `usable_size` exists, and we define the valid layouts for describing an allocation as its `[lower, upper]` bounds, then we are also requiring that allocation sizes be independent of memory addresses (because `usable_size(Layout)` doesn't take a memory address, as opposed to `malloc_usable_size(ptr)`).
> Consider an allocator with 2 size classes, 8 bytes, 16 bytes, and large allocations. Then when requesting 10 bytes, the allocator will compute quickly that it fits in the 16-byte size class. However, for `usable_size` it would also need to compute that the lower bound is `9`, so that `usable_size` could correctly return `[9, 16]`. For that the allocator needs to also query what's the size class below 16, and return the smallest size that does not fit, which is something that allocators never actually need to do. This information can be interesting, and the introspection methods of the allocators allow fetching the whole size class hierarchy and computing it, but that's often very slow (e.g. glibc's `malloc_info` returns XML, which would need to be parsed). If instead we just say that all methods modifying an allocation and taking its Layout work for Layouts in range `[requested, actual]`, then we are golden. Methods that resize an allocation get a requested size, and return an actual size, so if you want to shrink 10 bytes to 9 bytes in place in our example above, you could try that, and if `shrink_in_place` succeeds, it would return an actual size of 16, but since you requested 9, now you can pass `[9, 16]`. This basically says that passing `9` before doing the `shrink_in_place` is UB, so that one could have only passed `[10, 16]`, while in practice, passing `9` would have worked, but when would doing so be useful? E.g. one requests 10, and gets 16, one can either track 10 or 16, but when would one actually say I'm going to gamble that I can also deallocate this by passing 9? If one wants to do that, requiring a call to `shrink_in_place` to succeed seems more principled.
Thanks for pointing this out, this is indeed weird behavior! Another strange behavior arises when combining this with `alloc_zeroed`: when allocating 10 zeroed bytes in your allocator, the memory is guaranteed to be zeroed for `[0, 10)`. `usable_size`, however, returns `[0, 16)`, and thus I would expect `[10, 16)` to be zeroed as well. Sure, this guarantee is not documented, and in fact it is wrong to assume it, but it is not intuitive.
However, I don't like to assume that every allocator can trivially return an excess, so I don't want to drop `_excess`. Let's put together an example, where a user (me, for simplicity) wants to implement an allocator without any assumptions about the allocator logic, to stay conservative:

1. I implement `alloc` and `dealloc`, and everything works fine.
2. I implement `alloc_zeroed` and `realloc` as well.
3. I want to use this allocator in `Vec<T, A>` and in `Box<T, A>`, thus I want to query the excess of an allocation in 50% of my use cases.
4. I want the excess for `Vec<T, A>` but not for `Box<T, A>`, and for some reason I don't want to implement the logic twice.

I see three ways to solve this problem:
1. Implement the allocator twice. Downside: code duplication (minor, implementing `Alloc` is a rare case, and this is what testing is for)
2. Always return the excess. Downside: the allocator is unoptimized for `Box<T, A>` (major, performance loss)
3. Split the excess API into a separate trait `ExcessAlloc`. Downside: one more trait in `lib_alloc` (minor)

I think the latter is the best of both worlds. I'll play around with dropping `usable_size` and with `ExcessAlloc`, and will report whether this makes sense as soon as I have finished.
Generally, a potential performance loss is not an option. If we find one use case where performance drops, there will be a bunch of others.

By the way, even if it turns out we drop the excess API, #21 gets even more unlikely, but this is another topic.
> However, I don't like to assume that every allocator can trivially return an excess,
The problem is that the assumption is true: every allocator can trivially compute a correct excess with zero overhead and in O(1) space and time, by just returning `Layout::size()`.
An example would be an "accurate" implementation of `usable_size` for glibc's malloc. We could call `malloc_info` inside `usable_size`, parse the returned XML, and return an accurate result, but then `grow/shrink_in_place`, which use it, could become much slower than just always reallocating the memory.
> As I want to use this allocator in `Vec<T, A>` and in `Box<T, A>`, thus I want to query the excess of an allocation in 50% of my use cases

> Always return the excess. Downside: the allocator is unoptimized for `Box<T, A>` (major, performance loss)
This is incorrect. `Box` does not use the returned excess, so the computation is optimized out. This is something we already discussed above.
> `Box` does not use the excess returned, so the computation is optimized out.
https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487 claims:

> One way to reduce the problem in half is to just make the alloc family return an Excess, but it turns out that a) the compiler does a bad job at optimizing it out when it's not used
> > However, I don't like to assume that every allocator can trivially return an excess,
>
> The problem is that the assumption is true: every allocator can trivially compute the correct excess with zero overhead and in O(1) space and time by just returning `Layout::size()`. An example would be an "accurate" implementation of `usable_size` for glibc's malloc. We could call `malloc_info` inside `usable_size`, parse the returned XML, and return an accurate result, but then `grow/shrink_in_place`, which use it, could become much slower than just always reallocating the memory.
Sorry, I don't get your point. Sure, every allocator can return `Layout::size()`, and this makes sense for `Box<T, A>`. But what if I want the allocator to parse the XML to use the actual excess in `Vec<T, A>`?
> This is incorrect. `Box` does not use the excess returned, so the computation is optimized out. This is something that we already did discuss above.
The compiler cannot optimize out the branch for `Box` if the lookup may have side effects.
> https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487 claims:

That's fixed by https://github.com/rust-lang/rust/pull/58327, which is in FCP with disposition merge.
Yes, but you cannot assume that every allocator which uses FFI uses `ffi_const` or `ffi_pure`.
If your allocator is written in Rust, you don't need that; that already works today. Those attributes only help in the one case in which this does not work, which is when your excess computation is an unknown function.
Why do you assume every allocator is written in Rust or uses `ffi_const` or `ffi_pure`? The API should be designed for any case.
The only thing I assume is that computing usable size is a side-effect free operation, which is the case for all allocators that we currently support.
> The only thing I assume is that computing usable size is a side-effect free operation
I don't mean to offend you, but this is a false assumption IMO.
> all allocators that we currently support.
Could you elaborate on this? Are there allocators we won't support?
Basically, you are arguing that `usable_size` is worth it because it would allow you to use glibc's XML output to compute an accurate lower bound for `usable_size` in `Vec<T, A>`, while not making `Box<T, A>` slow.
I disagree that this is useful:

- `Vec<T, A>` does not use the lower bound of `usable_size` for anything, so computing it in an accurate way does not solve any problem AFAICT (note that the upper bound would be returned by the `_excess` methods).

> Could you elaborate on this? Are there allocators we won't support?
You are claiming that there exist allocators for which:

- the `usable_size` method is useful enough for it to need to be part of the generic `Alloc` trait right now, in the first RFC for an MVP of the `Alloc` trait, and
- `usable_size` is not `readonly`, that is, querying the `usable_size` writes to global data structures.

AFAICT this is not true for any of the widely-used `Alloc` trait implementations that exist today. If you know of an impl for which this does not hold, please show it.
Sure, I can imagine that a `usable_size` implementation that writes to global memory, or that is terribly slow, is implementable. The thing I'm failing to see is why anyone would want to use such an allocator in practice. In other words, I am viewing this discussion from the point of view that people are going to implement `Alloc` for allocators that they actually want to use, not for every type for which the trait can be implemented.
> That's fixed with rust-lang/rust#58327

I might be missing something. jemalloc's `smallocx` has the side effect of allocating, so it's not `const` or `pure`, is it?
@SimonSapin jemalloc's `smallocx` returns `(ptr, allocation_size)` and performs just like `mallocx`, and it can be used directly by the `_excess` methods. So by using that directly there is no issue.
jemalloc (also) and tcmalloc have a `nallocx(size) -> alloc_size`, which could be combined with `mallocx` in the implementation of the `_excess` methods. `nallocx` is `#[c_ffi_pure]`, and when that's used, calling it and discarding the result is currently optimized away by LLVM. This approach is not very useful for jemalloc now that it has `smallocx`.
Of the other widely-used allocators, the other similar API is `malloc_usable_size(ptr)`, which could also be used inside an `_excess` method implementation, since we have the pointer there, but not in an `Alloc::usable_size(Layout)` impl, since we don't have a pointer there. AFAICT, we can use `pure` for these as well (from the Rust POV, we don't want these to be called if the result is not used). But `malloc_usable_size` is more intended for debugging, so the implementor should decide whether using a debugging API inside the `Alloc` trait impl is a good idea or not. Even if it does not impact the performance of `Box<T, A>`, it could negatively impact the performance of `Vec<T, A>`, `String`, etc., which do use the excess size. This shows the difference between "a `usable_size` API" and "a useful `usable_size` API". If the API is not fast, then we risk people avoiding the excess (or the `_excess` methods) to work around that, so being fast is a requirement for it to be useful.
Just to get this right: do you argue only against `usable_size`, or also against the excess API?
I argue that we should remove `usable_size`, remove the `_excess` API, and make all allocating and resizing methods in `Alloc` return the actual allocation size (the upper bound that `Alloc::usable_size` currently returns). We should then just say that all methods taking the `Layout` of an already-existing allocation accept layouts with sizes in the range `[requested, actual]`.
The returned upper bound on the size only needs to be correct, not accurate, which allows just returning the requested size for allocators that do not support it, or for which only a slow computation exists (e.g. one intended for debugging, not as a hot path). This allows allocators that can quickly provide an accurate allocation size to do better, and doesn't make users avoid the `_excess` or `usable_size` APIs out of fear that they would be too slow. This is zero-overhead for "reasonable" allocators, where by zero-overhead I mean that if you don't use the actual allocation size, as in `Box`, then you don't pay any run-time cost for it (you might pay some compile-time cost or similar, though).
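A thin sketch over the stable global allocator shows the zero-overhead case: a wrapper that cannot compute a real excess just reports the requested size as the actual one (the function name is illustrative, not a proposed API):

```rust
use std::alloc::{alloc, dealloc, Layout};

// Sketch of the proposal: every allocating method returns the actual
// size alongside the pointer. A plain wrapper over the global
// allocator can only truthfully report the requested size, which the
// proposal explicitly allows.
unsafe fn alloc_with_actual_size(layout: Layout) -> (*mut u8, usize) {
    (alloc(layout), layout.size())
}
```

Callers like `Box` would simply ignore the second tuple field, and its computation folds away.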
This fully solves the telemetry issue (which is IMO what adds the most value), because the allocator always knows the sizes that the user actually wanted to allocate (e.g., there is no simple way to call `Alloc::usable_size`, cache the result, and then use that, which would be impossible for an allocator to track).
This also avoids useless `usable_size` implementations for allocators with incompatible APIs (e.g. `usable_size(ptr)` vs `usable_size(Layout)`). Inside the allocation and resizing methods, both the pointer and the layout are available, so each allocator can pick what to do there.
This does not prevent allocators that have a very slow `usable_size`-like API from providing a different handle that uses it, and users can use that if they want to. It also does not prevent users from using allocator-specific APIs like `jemalloc_ctl` to introspect their allocator for debugging or performance tuning; it just encourages generic code not to do this (we could add generic introspection APIs in the future, but for them to be in the MVP and in the `Alloc` trait, the motivation would need to be very strong).
The argument that `Alloc::usable_size` is still useful in the presence of allocation methods that return the actual allocation size is very weak. The main argument supporting it seems to be that it returns the lower bound (because the alloc methods return the upper bound), but no examples of useful code that needs the lower bound have been presented.
The argument that this would make allocators with a slow size computation that mutates global state unnecessarily slow for things like `Box<T, A>` depends on the allocator implementation; e.g., these allocators could say that, for the purpose of the allocation and resizing methods, their query of the allocation size is indeed `readonly`, so that it is only executed if the result is actually used. Still, no useful allocators that behave this way have been shown, so while I can imagine that they exist, I have a harder time imagining that they are allocators anybody would actually want to use.
The argument that this complicates the implementation of the `Alloc` trait for users is debatable. This change cuts the allocator API in half (due to the removal of the `_excess` methods; the other issue, #18, removes another 4 methods and makes it even smaller), and this is not a trait that's implemented all the time. Also, we can still provide default implementations of `grow_in_place` and `shrink_in_place` without `Alloc::usable_size` that are as useful as those obtained via the default `Alloc::usable_size` implementation (that is, completely useless).
I'd like to come back to this issue. With #14 merged, and #13 being the next logical step, the `Excess` API doubles the number of methods in `AllocRef`.
I did a test:

```rust
unsafe fn main_excess() -> Result<NonNull<i32>, AllocErr> {
    let allocation = Global.alloc_excess(Layout::new::<i32>())?;
    Ok(allocation.0.cast())
}

unsafe fn main() -> Result<NonNull<i32>, AllocErr> {
    let allocation = Global.alloc(Layout::new::<i32>())?;
    Ok(allocation.cast())
}
```
Both snippets result in

```asm
mov edi, 4
mov esi, 4
jmp qword ptr [rip + __rust_alloc@GOTPCREL]
```
So replacing `alloc`, `realloc`, etc. with their `_excess` variants seems fine (I was unsure how well the compiler could optimize return values at this point).
Regarding `usable_size`, I see your point. This would mean that the default implementations of `shrink_in_place` and `grow_in_place` would return `Err(CannotReallocInPlace)`. Currently, without implementing `usable_size`, that's the same behavior.
Another thing: `_excess` returns `struct Excess(_, _)`. I'd say we just use plain tuples and ban the term "excess" altogether.
| | replace `x` with `x_excess` | close |
|---|---|---|
| @Amanieu | :+1: | |
| @Ericson2314 | | |
| @glandium | | |
| @gnzlbg | | |
| @Lokathor | | |
| @scottjmaddox | :+1: | |
| @TimDiekmann | :+1: | |
| @Wodann | :+1: | |
This is a comparison, on how this could be implemented: https://github.com/rust-lang/rust/compare/cd5441faf4e56d136d7c05d5eb55b4a41396edaf...TimDiekmann:excess
A use case for `usable_size` that is not covered by the proposed and implemented change is when you want to implement something like https://github.com/servo/heapsize or https://github.com/servo/servo/tree/master/components/malloc_size_of in an allocator-agnostic way.
Imagine this:

```rust
#[derive(MallocSizeOf)]
struct Foo {
    bar: HashMap<String, Vec<u8>>,
    baz: Arc<Mutex<Db>>,
}

fn mem_used_by(foo: &Foo) -> usize {
    foo.malloc_size_of()
}
```
How would one implement this now?
Such functionality is inherently allocator-specific. If you know what allocator your code is using, then you can call directly into e.g. jemalloc to get the size of an allocation.
Alternatively, you could create an extension trait on `AllocRef` which adds a `malloc_size_of` method.
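Such an extension trait could be sketched like this (all names here are hypothetical; a jemalloc-backed impl would call something like `malloc_usable_size` internally):

```rust
// Hypothetical extension trait: allocator-specific size queries
// layered on top of a generic allocator handle.
pub trait MallocSizeOfExt {
    /// Returns the usable size of the allocation behind `ptr`.
    unsafe fn malloc_size_of(&self, ptr: *const u8) -> usize;
}

// A trivial implementation for an allocator that hands out
// fixed-size blocks, just to show the shape of an impl.
pub struct FixedBlockAlloc(pub usize);

impl MallocSizeOfExt for FixedBlockAlloc {
    unsafe fn malloc_size_of(&self, _ptr: *const u8) -> usize {
        self.0
    }
}
```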
> Such functionality is inherently allocator-specific. If you know what allocator your code is using then you can call directly into e.g. jemalloc to get the size of an allocation.

That's the problem:

> in an allocator agnostic way.
If it's implemented in a library, nothing stops users from defining whatever global allocator they want. Maybe I'm misunderstanding the issue here, but does it apply only to local allocators, or to global ones too?
The issue here only applies to local allocators.
The `Alloc::usable_size` method is, in general, used like this:

This has two big downsides.

The main issue is that `usable_size` makes it impossible to obtain meaningful telemetry information about which allocations the program actually wants to perform, resulting in a program that is essentially "un-tuneable" from the allocator perspective. For example, if one were to dump allocation statistics with jemalloc for a program that uses `usable_size`, this program would appear to be using the allocator's size classes perfectly, independently of how bad these are.

The other issue is that code that uses `usable_size` computes the allocation size classes twice: once in the `usable_size` call, and once in the call to `alloc` (`alloc` just gets a layout; it then needs to find the size class appropriate for the layout, which is exactly what `usable_size` already did, but `alloc` does not know that).

So IMO we should remove `usable_size`, and instead make the allocation functions always return the usable size, such that code like the above is now written as:

This makes sure that allocators are always able to print accurate statistics of the requested allocations, which can then be used to tune allocator performance for particular applications, and allows allocators to perform the size-class computation only once (e.g. jemalloc's `smallocx`). Most ABIs support two return registers in their calling convention, so a pointer and a size can be returned on most platforms without incurring any extra cost (e.g. without having to pass them via memory).

Allocators that do not support size-class computation (e.g. glibc's `malloc`) can just return the requested layout size in Rust code, which can be inlined. If the usable size of the return value is not used, inlining can remove its computation for allocators that do need to perform an extra call to obtain it, as long as that function is `readonly` (e.g. jemalloc's `mallocx` + `nallocx`).

This means that all `Alloc::..._excess` functions can be removed, since the normal functions do their job by default.

closes #13