CAD97 opened 2 years ago
I think we should be conservative with custom (non-global) allocators and just treat them as a library concept. Some allocators can be used for extremely specialized cases, such as allocating GPU memory, where you really want to maintain strict control over what allocations are made and not leave its interpretation up to the compiler.
Also, as per #21, we will eventually want to deprecate the `GlobalAlloc` trait and have `#[global_allocator]` work with the normal `Allocator` trait.

(I was just using `GlobalAlloc` as shorthand for "whatever `#[global_allocator]` takes"; everything applies identically to `Allocator`.)
If `Storage` and `Allocator` are separate traits, and most things are parameterized over `Storage` instead of `Allocator`, then this gives an interesting split point, where the `Allocator` could be the point where the "rust allocated object" magic happens (rather than just `__rust_[de]alloc`), and `Storage` is available for guaranteed behavior.

I do concur, however, that this is quite likely too subtle/implicit, and that the magic barrier should just be in the definition of `#[global_allocator]`, leaving code using explicit allocators with normal semantics. It just seems unfortunate that a theoretical optimization using a custom allocator for short-lived values where possible, e.g. some `Box<T, FramePool<'frame>>`[^1], could directly lead to worse performance, since the box allocation can no longer be removed.
[^1]: It's a somewhat common optimization for frame-based applications (e.g. games) to have a memory pool to allocate into for objects which will be trashed before the next frame. This allows the memory pool to use a very simple allocation scheme (e.g. a bump allocator) with `dealloc` as a no-op, resetting the pool to empty at the end of each frame (when the objects in it are assumed to have all been freed). Given a `'frame` lifetime for the period between pool resets, this could be made 100% safe in Rust. The only limitation is then that any `Box<T, FramePool<'frame>>` allocation will actually hit the frame pool, and so sometimes it'd actually be better to use `Box<T, Global>` for very short-lived boxes where the compiler can optimize out the heap allocation.
Could we solve this by adding a wrapper `struct OptimizedAllocator<T>(T) where T: Allocator`? It would implement the `Allocator` trait, and its methods would nondeterministically either forward to the wrapped allocator or let the language service the allocation directly (so it can be elided).
In terms of semantics, that would probably work. The one thing I'd be careful of is making sure it's clear that `#[global_allocator]` is always removable and doesn't require a `RemovableAllocator` wrapper (importantly, the lack of a wrapper doesn't make it nonremovable).

In terms of actual implementation, removable allocation is IIRC based on LLVM treating `__rust_alloc` and related symbols as globally removable `malloc` equivalents, so actually making custom allocators removable is difficult. I just hope we don't accidentally prevent custom removable allocators.

The other potential option is some sort of `alloc::removable_alloc(impl FnOnce(Layout) -> *mut ())` wrapper to mark a closure as the removable bit. Semantically, the language would nondeterministically choose between the built-in allocation and the provided closure.
It is absolutely the intent that calls to `__rust_alloc`/`__rust_dealloc` can be removed. It happens in practice. As such, calls to `Global.alloc`, or other allocation directly on `Global`, as well as the free functions, can be removed by inlining, because they have no observable effect other than that of `__rust_[de|re]alloc`. This isn't in question.

`#[global_allocator]` is defined such that `Global` allocations can be provided directly by the language or serviced by calling the `#[global_allocator]`. Still completely[^1] uncontroversial.

[^1]: Well, if we accept that 1) spurious calls to the `#[global_allocator]` are accepted, and 2) program execution on an FFI abstract machine can emit arbitrary sound allocation calls as well as arbitrary non-data-race reads and writes of memory allocated by such. And maybe we want `#[global_allocator]` to disable the FFI machine's permission to do FFI allocation calls; otherwise a valid `#[global_allocator]` implementation is `#[cfg(FALSE)]`. So not completely uncontroversial at all.

The question here is: if I have a concrete `GlobalAlloc` implementation and I call `GlobalAlloc::alloc` on it, are its side effects considered observable?

The simplest answer is that this code is specified to print to stdout; it directly calls a function which does so. Define `__rust_[de]alloc` as working in terms of the magical AM "rust allocated objects" rather than direct calls to `#[global_allocator]`, and then everything else just falls out from that.

But this is still not great. Say, for example, I have some sea-of-pointers style graph and I want to instrument how much allocation pressure it's putting out, while ignoring allocations in the rest of the program. If calls to a normal implementation of `GlobalAlloc` can't be removed, then instrumenting doesn't actually give me an accurate picture, just an overestimate! This is because while the actual allocations can still be optimized out, the instrumentation remains there. Ideally, it'd be possible to use an observable `GlobalAlloc` implementation to accurately[^2] track the allocations that are happening after optimization.

[^2]: Allowing the accuracy to rely on Quality of Implementation is probably fine, but specifying it such that instrumented allocations can't be fully removed is extremely unfortunate.
While IIRC adding spurious (language-serviced) allocations is important for optimization, I think we can get away with only allowing explicit `GlobalAlloc` calls to be removed. If the actual allocation is known to be serviced by `__rust_alloc`, then normal optimization rules apply; it's only the state and side effects of the allocator which fall under the more restrictive rules.

I think I have a rough system that at the very least still justifies loop hoisting, which I'm working on getting written down, but I wanted to make sure that the overall thought didn't get lost.