Capability-based Aliasing Model

arielb1 commented 8 years ago

This is a variant of the ACA model that has some advantages over it and is preferred by @ubsan.

Capabilities

The heart of the model is memory capabilities. In this model, each pointer or reference has a capability-set that describes which borrows it can alias with.

In more detail:

data AccessKind = Read | Write

data BorrowKey = BorrowKey {
      bkId :: Unique,
      bkLifetime :: Lifetime,
}

data Capability = Capability {
      root :: BorrowKey,
      memoryRange :: MemoryRange,
      accessKinds :: Set AccessKind
      capLifetime :: Lifetime,
}

Creation

Each borrow creates a BorrowKey with the appropriate lifetime.

The capability-set of the newly-created borrow is the union of the capability keyed off the new BorrowKey and the guarantor's capability set, restricted elementwise to the borrow's lifetime, memory range, and access kind.

Propagation

Capability sets are preserved when values are stored, loaded, or operated on in ways other than reborrows. I could not find a satisfying way of specifying that when InstCombine-able integer operations and/or partial writes are allowed (and to define what an InstCombine-able integer operation is), but that does not matter much in practice (e.g. the C spec is handwavy there).

Access Capability Set

The capability set of an access from Rust code is the accessing pointer/reference's capability set, plus a new freshly-created borrow key and capability that lasts only for the duration of the access.

Adding that borrow key has the effect of prohibiting data races.

The capability set of an access from foreign code (including intrinsics) is implementation-defined.

Garbage Collection

Borrow keys and capabilities whose lifetime has expired have no effect and can be ignored.

The aliasing rule

The actual aliasing rule is pretty similar to the ACA rule. UB through an aliasing violation occurs if there are 2 accesses (in no particular order) - the asserting access and the conflicting access, and a capability - the asserted capability - such that:

The memory locations accessed by the 2 accesses, and the asserted capability's memory range overlap.
At least one of the 2 accesses is a write.
The asserted capability is within the asserting access's capability set.
Both accesses are within the asserted capabilities' borrow key's lifetime, and after these capabilities were created.
There is no access capability within the conflicting access's capability set

Where an access capability is a capability within the conflicting access's capability set such that

The conflicting access's lifetime is within that capability's lifetime
The access kind is within capability's access kinds
The access capability's memory range contains the conflicting access's memory range.
The access capability has the same borrow key as the asserted capability

arielb1 commented 8 years ago

cc @nikomatsakis

arielb1 commented 8 years ago

That's the basic model. We still need some way of formalizing RVO shields. Writing an undef to function return values at function calls would do.

nikomatsakis commented 8 years ago

Thanks, I like this writeup. It seems to be close to something that I was separately toying around with. I had a few questions:

Capability sets are preserved when values are stored, loaded, or operated on in ways other than reborrows.

This seems like a very broad statement. I guess this means that e.g. if I were to transmute a &T to a Box<T>, that Box<T> would have no more capabilities than the &T (and hence if its dtor ran, that would be bad), right?

Presuming the same is true if I round-tripped a pointer through a usize?

I'm not quite sure how raw pointers interact here. For example, if I write:

fn foo(x: &mut i32) {
    let y: *mut i32 = x;
    ...
}

what happens with the x, is it still usable?

arielb1 commented 8 years ago

This seems like a very broad statement. I guess this means that e.g. if I were to transmute a &T to a Box<T>, that Box<T> would have no more capabilities than the &T (and hence if its dtor ran, that would be bad), right?

That would totally be bad - same as if you mutated the content through that &T pointer.

I'm not quite sure how raw pointers interact here.

Raw pointers act like references, except they also keep the capability set when you reborrow them. I don't have a satisfying way of handling other types (usizes, control-flow dependencies, etc.), but I don't think that matters much.

what happens with the x, is it still usable?

By the model, sure. x and y have the same capability-set, so that they can be alternatively accessed.

arielb1 commented 8 years ago

One problem with this model, shared with the ACA model, is that it does not adequately protect things behind nested references, as this code is perfectly valid:

static mut PTR: *mut u32 = ptr::null_mut();

fn collaborator() {
    unsafe { *PTR = 0; }
}

fn question(data: & &mut u32) -> u32 {
    let before = **data;
    collaborator();
    before + **data;
}

fn main() {
    let result = unsafe {
        let mut buf = 2;
        let mut ptr = &mut buf;
        PTR = ptr;
        question(&ptr);
    };
    println!("{}", result);
}

Because PTR and ptr share the same capability-set, the access in collaborator is perfectly valid. The 2 reads from data create only short-lived capabilites, which do not encompass the call to collaborator. This means that the compiler can't merge the reads in question.

This does not conflict with LLVM or anything, because noalias only applies to direct arguments, but might be annoying sometimes.

rust-lang / rust-memory-model