Open kornelski opened 5 months ago
The counter-argument is that since this information is always available at compile-time, why shouldn't we pass it to the allocator? This is an important optimization for allocators since the size of an allocation may not be stored near the allocation itself.
C++ actually added sized deallocation functions in C++14 (https://open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3536.html) for this reason, but they have the problem of dealing with backwards-compatibility.
It's only available in compile-time for things like Box
and Vec
where type and count can be used to reconstruct same Layout
.
This is not universally true though, sometimes data structure has to carry Layout
. Typically size is still required for some other operations, but alignment is only for deallocation.
But IMO it's counting pennies. I can't imagine a situation where data structure is required to keep track of allocation alignment causing problems.
I know it's very useful for some allocators, but not for all of them. Specifically, for the very common case of executables using the System
allocator based on libc
malloc
/free
it's not an optimization, and only redundant unused information that makes Rust executables larger.
Memory overhead for collections isn't a problem (they have to track sizes anyway), but it's a problem of code bloat. Deallocation calls in Rust are ubiquitous due to unwinding and use of ?
, and it adds up. Even when the layout is a compile-time constant, it doesn't get optimized out. It still needs to be materialized and passed as an argument at run-time. Freeing a Box<T>
takes twice as many instructions as free()
.
I'd expect dyn Allocator
to always require Layout
. However, how about borrowing idea from std::mem::needs_drop
, and allowing allocator users to generate smaller drop code where possible. It could be something like an associated const A::NEEDS_LAYOUT
(or const fn where Self: Sized
? Or maybe just specialization on System
?) that would allow code such as:
fn drop(&mut self) {
if A::NEEDS_LAYOUT {
A::dealloc(self.ptr, Layout::new(self.size, etc))
} else {
A::layoutless_dealloc(self.ptr)
}
}
But your example passes layout into __rust_dealloc
call.
It would require Global
to be aware of NEEDS_LAYOUT
of whatever allocator is set as global one.
I'd expect
dyn Allocator
to always requireLayout
. However, how about borrowing idea fromstd::mem::needs_drop
, and allowing allocator users to generate smaller drop code where possible. It could be something like an associated constA::NEEDS_LAYOUT
(orconst fn where Self: Sized
? Or maybe just specialization onSystem
?) that would allow code such as:
Doesn't this just boil down to inlining A::dealloc
? That would allow the compiler to see that the layout is unused.
Regarding the system allocator we actually do make use of the alignment information for deallocation in some cases, for example on Windows: https://github.com/rust-lang/rust/blob/9aa232ecc7bb006a1fad404f437b049482021a3a/library/std/src/sys/pal/windows/alloc.rs#L210.
I still think that we should keep the layout for deallocation, but I could be convinced otherwise if you can show a significant size saving in a real program (e.g. ripgrep).
Regarding the system allocator we actually do make use of the alignment information for deallocation in some cases, for example on Windows: https://github.com/rust-lang/rust/blob/9aa232ecc7bb006a1fad404f437b049482021a3a/library/std/src/sys/pal/windows/alloc.rs#L210.
To be precise, it doesn't need alignment info per se. The issue is that it's like two allocators in one. The first simply wraps HeapAlloc
/HeapFree
the second additionally adds inline metadata. So it's only necessary to store one bit to tell it which dealloc to use (i.e. with or without the inline header). But admittedly there's no good place to store that bit, other than testing the alignment again.
layout.alloc
is almost always a compile-time constant, so even the Windows' two-allocators-in-one could potentially be optimized, if only the allocator could have some inlineable method at the deallocation call site.
Unfortunately, the special __rust_dealloc
indirection makes it hard ;(
The implementation of #[global_allocator]
relies on linking global symbols at the linker level, so the crates using Global
allocator don't know what they're calling.
__rust_dealloc
doesn't seem to be inlined, even with LTO. I've found an LLVM discussion on the same issue. They would also like to optimize allocator calls for the specific allocator implementation, but it seems the problem is that function attributes that make alloc/dealloc functions special are lost when inlining.
So I think for this to happen one of these needs to change. For example,
What if the custom global allocator wasn't an attribute you set anywhere, but an explicit crate dependency, where liballoc
and everything else depends on user-provider allocator-implementing crate? I think that would enable normal monomorphisation and cross-crate inlining. Probably tricky from ABI perspective to avoid recompiling liballoc
?
I guess there could be an extra LLVM pass that happens after allocation-aware passes, which removes the alloc-family attributes from __rust_alloc
functions and redoes inlining and constant propagation. That'd be expensive, but maybe ok if reserved for opt-level=3
?
Tangentially, I think it would be useful if allocators had an associated constant of flags describing their properties, such as whether they support alignment-changing realloc (posix realloc doesn't, jemalloc does), whether their realloc avoids memcpy (doing virtual memory schenanigans), whether deallocation does anything (vs. an arena that drops all allocations at once) and similar things.
Containers can then choose to optimize their behavior based on those properties.
Currently Rust requires passing
Layout
to the deallocator. However, Rust is frequently used with theSystem
allocator that wrapslibc::free
, which doesn't need this information. This means that often all the extra size and layout tracking on the Rust side is wasted effort. This is a tiny code bloat, but deallocation code paths are nearly everywhere, so it adds up.https://rust.godbolt.org/z/K3xTd9MEr
I wonder if this could be somehow made optional in the
Allocator
trait?