assume no "pointer arithmetic" performed on an `&T`

nikomatsakis commented 8 years ago

It is a very powerful optimization for us to assume that the function foo here cannot access a or c:

let x = Foo { a: 22, b: 33, c: 44 };
foo(&x.b);

@pcwalton reports that by default in C or C++ -- or at least in LLVM -- it would be legal to have a function like this:

void foo(unsigned *x) {
    x[1] += 1;
}

and then to invoke it on foo(&x.b) in a similar scenario.

Telling LLVM to assume that Rust functions don't play such games appears likely to have a significant impact on our ability to optimize. I'll report some numbers when @pcwalton is more sure of them. :)

strega-nil commented 8 years ago

That makes me very uncomfortable. You have an object of the correct size, and you're accessing it through a pointer which is safely derived.

eternaleye commented 8 years ago

Hm; that implies either a much stricter handling of pointers in unsafe {} than the C model, or that this assumption must be disabled at function scope for functions containing unsafe {} (perhaps even transitively).

In particular, in C, it's defined so long as the (offset) pointer is still part of the same "object" (read: allocation), while this model basically introduces a subtyping relation on "objects" (in the C sense).

One pattern this would break hard is container_of() - a pattern I suspect is used in implementing intrusive collections in Rust, not just C.

container_of() not only makes use of offset-pointer-inside-same-allocation being valid, it does so wiith a negative offset.

Amanieu commented 8 years ago

I use a variant of container_of in intrusive-collections.

nikomatsakis commented 8 years ago

This code which does not cross function boundaries does a very similar thing:

let mut s = Struct { x: 0, y: 1, z: 2 };
let p: *mut i32 = &mut s.x;
*p.offset(1) += 1;

And yet I feel differently at fn boundaries. I think this is because I think about functions in terms of permissions. Having a safe function like this:

pub fn foo(p: &mut i32) {
    let q: *mut i32 = p;
    *q.offset(1) += 1;
}

seems completely and obviously wrong. It exceeds the permissions that was given to it (which applied to *p, not *(p+1), and hence (in my mind) is an illegal function.

I think before we are going to be able to settle any of these kinds of questions, we have to lay out the higher-level views that we are coming from and try to reach some agreement there.

strega-nil commented 8 years ago

I don't think function boundaries should do anything (beyond create an extra scope), for UB purposes. If people are pulling things out into functions, their code must not become undefined, IMHO.

burdges commented 8 years ago

I imagine this concerns only repr(Rust) given this repository and that it'd break C stuff.

Could these container_of style applications be dealt with by borrowing &x along with &x.b? I'm thinking along the lines of owning_ref or maybe some phantom &x?

eternaleye commented 8 years ago

I'll note that I want this to be enabled - I'm just not sure how viable it is given existing code.

@Amanieu: In the specific case of intrusive containers, how viable would it be to, instead of having Adaptor<Link> use an associated type for Container, instead be implemented on Self = Container? In that case, get_container simply becomes self. As far as I can tell, that looks workable (I've only looked at Cursor though).

Amanieu commented 8 years ago

@eternaleye That doesn't work since a Container may contain an arbitrary number of links Links. This is the case when a single object is a member of multiple intrusive collections.

Amanieu commented 8 years ago

After looking through my code a bit, I don't think it will be affected by this since it uses *const Link everywhere instead of &Link. However it is treading a very fine line.

RalfJung commented 8 years ago

I would argue this is yet another instance of how closely we "trust types", just with opposite "polarity". In the blog post, Niko discussed whether inside the body of a function that takes v: &usize as argument, that function can trust the target of v not to be mutated even by unknown code. Here, we discuss whether the caller of such a function that just takes a v: &mut usize as argument can rely on the function not to touch any memory except for globally reachable memory and the memory range indicated by value and type of its argument. In some sense, these are two sides of the same medal. (In fact, if we translate the function type to a separation logic Hoare triple, the answer to both questions would be "Yes, you can trust": A separation logic triple would give the function the permission to access v and rely on no conflicting accesses to happen, while also promising to the context that nothing else will be needed or touched by the function. This is called the "frame rule".)

A function that takes a v: &mut usize and accesses more than just the 4/8 bytes covered by the usize itself, is very much like a function that takes a raw pointer and turns it into a Box<T>: The function actually relies on more than is guaranteed by the type system (i.e., by the general contract that covers all safe interfaces). The function is just inherently unsafe to call, and should be marked as such.

It seems to me that whether the compiler is allowed to rely on this assumption ("functions that need more than is given by their arguments are unsafe") for optimizations again depends on whether this is an "unsafe context" or not -- generally, we want such optimizations as there is some potential for producing faster code, but for people writing unsafe code this may be a very dangerous prospect.

RalfJung commented 2 years ago

I am pretty sure that this is not allowed in C -- see e.g. this formalization of the C object memory model, which clearly forbids the original code.

In Rust, Stacked Borrows proposes to forbid this. Some related issues:

I will close this issue since the above seem to cover the relevant aspects.

rust-lang / rust-memory-model

assume no "pointer arithmetic" performed on an `&T` #19