Refresh provenance of global allocator

RalfJung commented 1 year ago

It would seem nice if Miri could detect an error in the following code:

#![feature(allocator_api, slice_ptr_get)]

use std::alloc::{Allocator, Global, Layout, System};

#[global_allocator]
static A: System = System;

fn main() {
    let l = Layout::from_size_align(1, 1).unwrap();
    let ptr = Global.allocate(l).unwrap().as_non_null_ptr();
    unsafe {
        System.deallocate(ptr, l);
    }
}

That would basically reflect that the global allocator entry points are special magic and cannot be interchanged with directly calling the underlying allocator. (This doesn't catch all possible issues caused by the magic of these symbols, e.g. it does not reflect that LLVM can replace heap allocations by stack allocations or even remove them entirely under some circumstances.)
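For contrast, here is a minimal sketch of the pairings that stay on one side of that boundary, written against the stable `std::alloc` free functions and the `GlobalAlloc` impl of `System` rather than the nightly `Allocator` API used above. The helper names `roundtrip_global` and `roundtrip_system` are made up for illustration.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, GlobalAlloc, Layout, System};

/// Allocate and free through the global allocator entry points only.
fn roundtrip_global(value: u8) -> u8 {
    let layout = Layout::new::<u8>();
    unsafe {
        let ptr = alloc(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        ptr.write(value);
        let out = ptr.read();
        // Freed through the same entry points that produced the pointer.
        dealloc(ptr, layout);
        out
    }
}

/// Allocate and free by calling the underlying `System` allocator directly.
fn roundtrip_system(value: u8) -> u8 {
    let layout = Layout::new::<u8>();
    unsafe {
        let ptr = System.alloc(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        ptr.write(value);
        let out = ptr.read();
        // Freed by the same allocator value that produced the pointer.
        System.dealloc(ptr, layout);
        out
    }
}
```

Each function is consistent on its own; the case this issue is about is crossing between the two, i.e. freeing the pointer from the first function with the call from the second.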

To implement this we'll probably want the __rust_alloc shim to generate new provenance for the allocation (to distinguish it from the underlying allocation generated by System) and __rust_dealloc should undo that transformation. The details are pretty unclear though -- do we have two AllocId with the same address or do we use something more like Stacked Borrows to realize this "stacking" of allocations?
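To make the "two AllocId with the same address" option a bit more concrete, here is a toy model in plain Rust. It is only a sketch of the idea and not Miri's implementation: `Model`, `Layer`, `Ptr`, `Error`, and all method names are hypothetical, and addresses are just fake integers.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct AllocId(u64);

/// Which side of the global allocator boundary an id belongs to (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Layer {
    System,
    /// Fresh provenance handed out by the `__rust_alloc`-like shim,
    /// remembering the underlying allocation it wraps.
    Global { inner: AllocId },
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Ptr {
    addr: usize,
    id: AllocId,
}

#[derive(Debug, PartialEq, Eq)]
enum Error {
    WrongLayer,
    Dangling,
}

#[derive(Default)]
struct Model {
    next_id: u64,
    next_addr: usize,
    live: HashMap<AllocId, Layer>,
}

impl Model {
    fn fresh_id(&mut self) -> AllocId {
        self.next_id += 1;
        AllocId(self.next_id)
    }

    fn system_alloc(&mut self, size: usize) -> Ptr {
        let id = self.fresh_id();
        let addr = 0x1000 + self.next_addr;
        self.next_addr += size.max(1);
        self.live.insert(id, Layer::System);
        Ptr { addr, id }
    }

    fn system_dealloc(&mut self, ptr: Ptr) -> Result<(), Error> {
        match self.live.get(&ptr.id).copied() {
            Some(Layer::System) => {
                self.live.remove(&ptr.id);
                Ok(())
            }
            Some(Layer::Global { .. }) => Err(Error::WrongLayer),
            None => Err(Error::Dangling),
        }
    }

    /// Same address as the underlying allocation, but a new id on top of it.
    fn global_alloc(&mut self, size: usize) -> Ptr {
        let inner = self.system_alloc(size);
        let id = self.fresh_id();
        self.live.insert(id, Layer::Global { inner: inner.id });
        Ptr { addr: inner.addr, id }
    }

    /// Undo the transformation, then free the underlying allocation.
    fn global_dealloc(&mut self, ptr: Ptr) -> Result<(), Error> {
        match self.live.get(&ptr.id).copied() {
            Some(Layer::Global { inner }) => {
                self.live.remove(&ptr.id);
                self.system_dealloc(Ptr { addr: ptr.addr, id: inner })
            }
            Some(Layer::System) => Err(Error::WrongLayer),
            None => Err(Error::Dangling),
        }
    }
}
```

In this model, passing a pointer from `global_alloc` to `system_dealloc` is rejected because its id belongs to the outer layer, which is the kind of error the example at the top should trigger. The Stacked Borrows variant would track the same layering inside the borrow stack instead of in a second id.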

thomcc commented 1 year ago

> do we have two AllocId with the same address or do we use something more like Stacked Borrows to realize this "stacking" of allocations?

Given how up in the air these semantics are, I'd suggest doing whatever is simplest that catches this (that sounds like the former, but I have no knowledge of what that actually means, so I cannot say).

RalfJung commented 1 year ago

I think we'll want something like this either way, so I view this as a chance to explore different options and see what works better.

Right now I think reusing the Stacked Borrows stack (plus a special new kind of protector for "protected until dealloc is called") is easier than messing with AllocId generation.

RalfJung commented 1 year ago

It turns out that the semantics to justify the LLVM attributes are even wilder than I thought -- return-position noalias even applies to pointers that exist before the function call. Basically the returned pointer must not alias with any pointer that ever was or will be used in the program before or after the call.

I assume many custom global allocators will be incorrect under those semantics, and in practice that will work solely because LLVM doesn't know what happens on "the other side" of that allocation call.

RalfJung commented 1 year ago

There is a "missed UB" aspect here as well: if the allocator registered via #[global_allocator] returns memory that is also reachable from Rust via other means, and if everything happens via raw pointers, we won't currently detect any aliasing violation.
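A sketch of the kind of allocator this is about, assuming a simple bump allocator over a static arena. Everything here (`LeakyBump`, `Arena`, `arena_base`, the sizes) is hypothetical, and the allocator is deliberately not registered via #[global_allocator] so the snippet stays well-defined on its own.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::cell::UnsafeCell;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

const ARENA_SIZE: usize = 4096;
const ARENA_ALIGN: usize = 16; // must match the repr(align) below

#[repr(align(16))]
struct Arena(UnsafeCell<[u8; ARENA_SIZE]>);

// Access is coordinated through the atomic bump offset in `LeakyBump`.
unsafe impl Sync for Arena {}

static ARENA: Arena = Arena(UnsafeCell::new([0; ARENA_SIZE]));

/// Bump allocator that hands out pieces of `ARENA` and never reuses them.
struct LeakyBump {
    next: AtomicUsize,
}

impl LeakyBump {
    const fn new() -> Self {
        LeakyBump { next: AtomicUsize::new(0) }
    }

    /// The "other means": a raw pointer to the very memory `alloc` hands out.
    fn arena_base() -> *mut u8 {
        ARENA.0.get().cast::<u8>()
    }
}

unsafe impl GlobalAlloc for LeakyBump {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if layout.align() > ARENA_ALIGN {
            return ptr::null_mut();
        }
        let base = ARENA.0.get().cast::<u8>();
        let mut cur = self.next.load(Ordering::Relaxed);
        loop {
            // Round the current offset up to the requested alignment.
            let start = (cur + layout.align() - 1) & !(layout.align() - 1);
            let end = match start.checked_add(layout.size()) {
                Some(end) if end <= ARENA_SIZE => end,
                _ => return ptr::null_mut(),
            };
            match self
                .next
                .compare_exchange(cur, end, Ordering::Relaxed, Ordering::Relaxed)
            {
                Ok(_) => return unsafe { base.add(start) },
                Err(actual) => cur = actual,
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Bump allocators do not free individual allocations.
    }
}
```

If a type like this were registered as the global allocator, code could write through `LeakyBump::arena_base()` into memory that a `Box` believes it owns exclusively, using raw pointers only. That is the aliasing that the return-position noalias semantics forbid and that would currently go undetected.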

rust-lang / miri

Refresh provenance of global allocator #2686