Can explicit allocator calls be removed?

CAD97 commented 2 years ago

It is absolutely the intent that calls to __rust_alloc/__rust_dealloc can be removed. It happens in practice. As such, calls to Global.alloc or other allocation directly on Global, as well as the free functions, can be removed by inlining because they have no observable effect other than that of __rust_[de|re]alloc. This isn't in question.

#[global_allocator] is defined such that Global allocations can be provided directly by the language or serviced by calling the #[global_allocator]. Still completely[^1] uncontroversial.

[^1]: Well, if we accept that 1) spurious calls to the #[global_allocator] are accepted, and 2) program execution on an FFI abstract machine can emit arbitrary sound allocation calls as well as arbitrary non-data-race reads and writes of memory allocated by such. And maybe we want #[global_allocator] to disable the FFI machine's permission to do FFI allocation calls; otherwise a valid #[global_allocator] implementation is #[cfg(FALSE)]. So not completely uncontroversial at all.

The question here is that if I have a concrete GlobalAlloc implementation and I call GlobalAlloc::alloc on it, are its side effects considered observable?

struct TraceAlloc;
unsafe impl GlobalAlloc for TraceAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        println!("  alloc {layout:?}");
        alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        println!("dealloc {layout:?}");
        dealloc(ptr, layout)
    }
}

unsafe { TraceAlloc.alloc(Layout::from_size_align(0, 1)?) };

The simplest answer is that this code is specified to print to stdout; it directly calls a function which does so. Define __rust_[de]alloc as working in terms of the magical AM "rust allocated objects" rather than direct calls #[global_allocator] and then everything else just falls out from that.

But this is still not great. Say for example I have some sea-of-pointers style graph and I want to instrument how much allocation pressure it's putting out, but ignore allocations in the rest of the program. If calls to a normal implementation of GlobalAlloc can't be removed, then instrumenting doesn't actually give me an accurate picture, just an overestimate! This is because while the actual allocations can still be optimized out, the instrumentation remains there. Ideally, it'd be possible to use an observable GlobalAlloc implementation to accurately[^2] track the allocations that are happening after optimization.

[^2]: Allowing the accuracy to rely on Quality of Implementation is probably fine, but specifying it such that instrumented allocations can't be fully removed is extremely unfortunate.

While IIRC adding spurious (language-serviced) allocations is important for optimization, I think we can get away with only allowing explicit GlobalAlloc calls to be removed. If the actual allocation is known to be serviced by __rust_alloc, then normal optimization rules apply; it's only the state and side effects of the allocator which fall under the more restrictive rules.

I think I have a rough system that at the very least still justifies loop hoisting that I'm working on getting written down, but I wanted to make sure that the overall thought didn't get lost.

Amanieu commented 2 years ago

I think we should be conservative with custom (non-global) allocators and just treat them as a library concept. Some allocators can be used for extremely specialized cases, such as allocating GPU memory, where you really want to maintain strict control over what allocations are made and not leave its interpretation up to the compiler.

Also as per #21 we will eventually want to deprecate the GlobalAlloc trait and have #[global_allocator] work with the normal Allocator trait.

CAD97 commented 2 years ago

(I was just using GlobalAlloc as a shorthand for "whatever #[global_allocator] takes; everything applies identically to Allocator.)

If Storage and Allocator are separate traits, and most things are parameterized over Storage instead of Allocator, then this gives an interesting split point, where the Allocator could be the point where the "rust allocated object" magic happens (rather than just __rust_[de]alloc), and Storage is available for guaranteed behavior.

I do concur, however, that this is quite likely too subtle/implicit, and the magic barrier should just be in the definition of #[global_allocator], leaving code using explicit allocators to have normal semantics. It just seems unfortunate that a theoretical optimization using a custom allocator for shortlived values where possible, e.g. some Box<T, FramePool<'frame>>[^1], could directly lead to worse performance as now the box allocation cannot be removed.

[^1]: It's a somewhat common optimization for frame-based applications (e.g. games) to have a memory pool to allocate in for objects which will be trashed before the next frame. This allows the allocation overhead in the memory pool to be a very simple scheme (e.g. a bump allocator) and dealloc to be a no-op, then resetting the pool to empty at the end of each frame (when the objects in it are assumed to have all been freed). Given a 'frame lifetime for the period between GC cycles, this could be made 100% safe in Rust. The only limitation is then that this means any Box<T, FramePool<'frame>> allocation will actually hit the frame pool, and so sometimes it'd actually be better to use Box<T, Global> for very shortlived boxes where the compiler can optimize out the heap allocation.

filip-hejsek commented 1 year ago

Could we solve this by adding a wrapper struct OptimizedAllocator<T>(T) where T: Allocator? It would implement the Allocator trait and its methods would nondeterministically either:

call the corresponding method of the wrapped allocator and possibly do some magic to the provenance of the returned pointer, or
provide a "virtual"/optimized-out allocation without calling anything.

CAD97 commented 1 year ago

In terms of semantics, that would probably work. The one thing I'd be careful of is to make sure that it's clear that #[global_allocator] is always removable, and doesn't require a RemovableAllocator wrapper (importantly, the lack of a wrapper doesn't make it nonremovable).

In terms of actual implementation, removable allocation is IIRC based on LLVM treating __rust_alloc and related symbols as global removable malloc equivalents, so actually making custom allocators removable is difficult. I just hope we don't accidentally prevent custom removable allocators.

The other potential option is some sort of alloc::removable_alloc(impl FnOnce(Layout) -> *mut ()) wrapper to mark a closure section as the removable bit. Semantically, that closure would choose between the built-in allocation and the provided closure.

rust-lang / wg-allocators

Can explicit allocator calls be removed? #101