rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Rewrite nomicon references section #27911

Closed: Gankra closed this issue 7 years ago

Gankra commented 9 years ago

https://doc.rust-lang.org/nightly/nomicon/references.html

This involves solving the incredibly difficult question of "what on earth are Rust's True Pointer Aliasing Rules".

CC @aturon @arielb1 @nikomatsakis @pnkfelix @sunfishcode

Gankra commented 9 years ago

It has been argued that the references section should be written in terms of lvalue paths. I believe this is what the borrow checker reasons in terms of, and it is at the very least a concrete concept. However, this section does not want to simply model how the borrow checker thinks. The entire point is that there needs to be a more fundamental model, one that the borrow checker models a subset of but is fundamentally unable to model all of. This is the model unsafe code should be written against, and that the borrow checker can grow into if improved (e.g. nonlexical borrows).

Gankra commented 9 years ago

CC @RalfJung and co who are working on formally modeling Rust's semantics.

RalfJung commented 9 years ago

Indeed, we'll have some fun figuring this out ;-)

Speaking of which, "A reference cannot outlive its referent" is already something that's not actually enforced in my model. It's only when you use a reference that you have to prove that the referent is still alive, by showing that the lifetime of the reference is still active. As long as you don't use the reference, the model doesn't care whether it is valid.

I should also mention that "path" is not a thing in my formal model. I don't even have a stack. It's all about owning locations, or knowing the protocols that some locations are currently subject to. (Like, a shared borrow to a basic datatype follows the protocol that everybody can read it, and that multiple reads are guaranteed to deliver the same result. A mutable borrow to a basic datatype has the protocol that you can temporarily exchange your borrow for actual ownership of the referent, but until you change this back, it is impossible for the lifetime of the borrow to end.) The challenge will be to translate these protocols, and the even more implicit notions of separation/disjointness, back to something that makes sense when looking at surface Rust code...

aturon commented 9 years ago

@RalfJung

> Speaking of which, "A reference cannot outlive its referent" is already something that's not actually enforced in my model. It's only when you use a reference that you have to prove that the referent is still alive, by showing that the lifetime of the reference is still active. As long as you don't use the reference, the model doesn't care whether it is valid.

This was essentially what the whole mem::forget/thread::scoped drama was about, and it's equally true of Rust the language: the type system ensures that lifetimes are not usefully reachable outside the scope they describe, but you can e.g. stash them in a leaked Rc cycle.
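The point about `mem::forget` can be illustrated with a minimal sketch in safe code. The names `Guard` and `destructor_ran` are hypothetical and exist only for this illustration; the sketch shows that a destructor is not guaranteed to run, which is why code cannot rely on a guard's `Drop` for soundness.

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical guard type: its destructor records that it ran.
struct Guard<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Returns whether the guard's destructor ran before the end of its scope.
fn destructor_ran(leak: bool) -> bool {
    let dropped = Cell::new(false);
    {
        let guard = Guard { dropped: &dropped };
        if leak {
            // Safe code: the value is consumed and its destructor never runs.
            mem::forget(guard);
        }
        // If the guard was not forgotten, it is dropped here.
    }
    dropped.get()
}
```

Calling `destructor_ran(false)` reports that the destructor ran, while `destructor_ran(true)` reports that it did not, even though no `unsafe` is involved.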

I think the "path" description in the current document is a bit of a dead end for what the book is ultimately trying to do -- describe the constraints on unsafe code. I do think that a more precise version of the path explanation would be a good way to explain borrow checking, though.

Parakleta commented 8 years ago

I assume this is the correct issue to add this question to. I've discovered through some experimentation and by reading #10488 that

let _ = Iron::new(hello_world).http("localhost:3000").unwrap();

for example causes the destructor to be run immediately (i.e. the end of the statement) and so joins the thread and blocks further execution, but

let _ = &Iron::new(hello_world).http("localhost:3000").unwrap();

extends the lifetime to the enclosing block.

I can understand that

let _listen = Iron::new(hello_world).http("localhost:3000").unwrap();

extends the lifetime to the enclosing block because that is the scope of the variable _listen, even though it is unused. What I don't understand is how the lifetime of let _ = <rvalue> differs from let _ = &<rvalue>. Is this a difference I should be relying on? What is the correct method to control the lifetime of unused/anonymous objects?

Gankra commented 8 years ago

Interesting! @eddyb any thoughts on this?

nikomatsakis commented 8 years ago

On Mon, Nov 09, 2015 at 06:05:20PM -0800, Parakleta wrote:

> I assume this is the correct issue to add this question to. I've discovered through some experimentation and by reading #10488 that
>
> let _ = Iron::new(hello_world).http("localhost:3000").unwrap();
>
> for example causes the destructor to be run immediately (i.e. the end of the statement) and so joins the thread and blocks further execution, but
>
> let _ = &Iron::new(hello_world).http("localhost:3000").unwrap();
>
> extends the lifetime to the enclosing block.
>
> I can understand that
>
> let _listen = Iron::new(hello_world).http("localhost:3000").unwrap();
>
> extends the lifetime to the enclosing block because that is the scope of the variable _listen, even though it is unused. What I don't understand is how the lifetime of _ = <rvalue> differs from _ = &<rvalue>. Is this a difference I should be relying on? What is the correct method to control the lifetime of unused/anonymous objects?

Yes, these are all different. It's kind of the intersection of two distinct rules. The mental model is roughly that the initializer is stored into a temporary which has the lifetime of the statement. When you do let <pat> = <initializer>, then, the pattern is matched against this temporary. It may move things out of the temporary and place them into fresh bindings, which then live as long as the block, but things it does not move get dropped along with the temporary.

So something like let (foo, _) = <expr> is roughly as if you did:

let foo;
{
    let temp = <expr>;
    foo = temp.0;
}

Note that _ is not an identifier, it is a pattern which means "ignore this value".
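The desugaring above can be observed with a small sketch. `Noisy`, `make`, and `tuple_pattern` are hypothetical names introduced for this illustration; `Noisy` simply appends a line to a shared log when it is dropped. The sketch assumes the initializer is a freshly built tuple (a temporary), as in the mental model described here.

```rust
use std::cell::RefCell;

/// Hypothetical helper type: records its own destruction in a shared log.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn make<'a>(name: &'static str, log: &'a RefCell<Vec<String>>) -> Noisy<'a> {
    Noisy { name, log }
}

/// Returns the order of events for `let (x, _) = <tuple temporary>`.
fn tuple_pattern() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // `_first` is moved out of the temporary tuple and lives until the
        // end of the block; the field matched by `_` is not moved, so it is
        // dropped together with the temporary at the end of the statement.
        let (_first, _) = (make("first", &log), make("second", &log));
        log.borrow_mut().push("after let".to_string());
    }
    log.into_inner()
}
```

The log returned by `tuple_pattern()` is "drop second", "after let", "drop first": the ignored field goes away with the statement, and the bound field lives to the end of the block.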

So in terms of your examples:

let _ = foo.unwrap() means: call unwrap and discard result (drops immediately).

let _x = foo.unwrap() means: call unwrap and store result into a variable called _x (drops when _x is dropped)

Meanwhile, orthogonally: &foo.unwrap() means "create a temporary stack slot and store foo.unwrap() into it". Because the resulting reference is the initializer of a let binding, the lifetime of this temporary is extended to the enclosing block.

It's possible that the lifetime of the temporary we create when doing pattern matching in a let should be the enclosing block, rather than the let statement. This would be perhaps more analogous with the & rules. But I wonder if this would break existing code; it's hard to know. I'm not 100% sure why I didn't do it this way at the time, because I remember being annoyed that let _ = foo() and let _x = foo() were not equivalent. That said, there are many who believe they should not be; I can't find the issue now, but there was at one point specific code in trans to ensure that let _ = foo() would drop the result of foo() immediately.
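The three cases discussed in this comment can be compared side by side with a minimal sketch. `Noisy`, `make`, `note`, and `drop_order` are hypothetical names used only for this illustration; they stand in for the Iron listener from the question and log the moment each value is destroyed.

```rust
use std::cell::RefCell;

/// Hypothetical helper type: records its own destruction in a shared log.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn make<'a>(name: &'static str, log: &'a RefCell<Vec<String>>) -> Noisy<'a> {
    Noisy { name, log }
}

fn note(log: &RefCell<Vec<String>>, msg: &str) {
    log.borrow_mut().push(msg.to_string());
}

/// Returns the order of events for the three `let` forms.
fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // `_` binds nothing: the temporary dies at the end of the statement.
        let _ = make("wildcard", &log);
        note(&log, "after wildcard");

        // `_named` is a real variable: the value lives until the block ends.
        let _named = make("named", &log);
        note(&log, "after named");

        // `&<expr>` as the initializer: the temporary is extended to the
        // end of the block, even though the reference itself is ignored.
        let _ = &make("borrowed", &log);
        note(&log, "after borrowed");
    }
    log.into_inner()
}
```

The log returned by `drop_order()` is "drop wildcard", "after wildcard", "after named", "after borrowed", "drop borrowed", "drop named". Only the plain `let _ =` form destroys its value before the next statement runs; the other two are destroyed at the end of the block, in reverse order of creation.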

Parakleta commented 8 years ago

The discussion in #10488 for the distinction between _ and _x makes sense to me, and I'm happy with the rationale that let _ = <rvalue> is essentially a no-op. I'm just confused by the & case. Does it mean that an & anywhere in an expression always creates a temporary that has the lifetime at least as long as the enclosing block? The statement &Iron::new().http().unwrap(); doesn't, but I assume that's because it's a statement and not an expression.

nikomatsakis commented 8 years ago

On Tue, Nov 10, 2015 at 12:39:02PM -0800, Parakleta wrote:

> Does it mean that an & anywhere in an expression always creates a temporary that has the lifetime at least as long as the enclosing block?

No. The rules are more subtle than that. Temporaries usually live until the end of the current statement (the let, in this case), but if they appear in specific places, they are extended until the end of the block. Basically, they are extended if they appear in a place where it is unambiguous that they would be stored into the result of a let.

Hence, the following temporaries will last until end of block:

let <pat> = &<expr>
let <pat> = StructName { field: &<expr>, ... }

But (under current rules) this would not:

let <pat> = method(&<expr>);

See http://doc.rust-lang.org/reference.html#temporary-lifetimes for more details and more examples.
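The extended and non-extended positions listed above can be checked with a short sketch. `Noisy`, `make`, `Holder`, `name_len`, and `extension_cases` are hypothetical names introduced for this illustration; `name_len` plays the role of `method` and `Holder` the role of `StructName`.

```rust
use std::cell::RefCell;

/// Hypothetical helper type: records its own destruction in a shared log.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn make<'a>(name: &'static str, log: &'a RefCell<Vec<String>>) -> Noisy<'a> {
    Noisy { name, log }
}

/// Hypothetical struct holding a reference, standing in for `StructName`.
#[allow(dead_code)]
struct Holder<'h, 'a> {
    field: &'h Noisy<'a>,
}

/// Hypothetical function taking a reference, standing in for `method`.
fn name_len(n: &Noisy<'_>) -> usize {
    n.name.len()
}

/// Returns the order of events for the three initializer shapes.
fn extension_cases() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // let <pat> = &<expr>: extended to the end of the block.
        let _r = &make("ref", &log);
        log.borrow_mut().push("after ref".to_string());

        // let <pat> = StructName { field: &<expr> }: also extended.
        let _h = Holder { field: &make("field", &log) };
        log.borrow_mut().push("after field".to_string());

        // let <pat> = method(&<expr>): not extended; the temporary is
        // dropped at the end of this statement.
        let _n = name_len(&make("arg", &log));
        log.borrow_mut().push("after arg".to_string());
    }
    log.into_inner()
}
```

The log returned by `extension_cases()` is "after ref", "after field", "drop arg", "after arg", "drop field", "drop ref". Only the temporary passed as a function argument is destroyed before the next statement.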

Parakleta commented 8 years ago

So the lifetime for the let _ = <rvalue>; statement and the lifetime for the let <pat> = &<rvalue>; statement are different, but my confusion comes from the idea (maybe I'm missing something, though) that _ is a valid <pat> and &<rvalue> is also a valid <rvalue>, so the statement let _ = &<rvalue> would match both rules?

Does this mean that let <pat> = &<rvalue> has higher priority than let _ = ... and should we assume this will always be true?

eddyb commented 8 years ago

@Parakleta let _ = <rvalue>; drops the RHS immediately but always evaluates it, so the rules for &<rvalue> still apply, even if the reference is dropped (which is a no-op because references are Copy).

Parakleta commented 8 years ago

Thanks, I just noticed the text "The compiler uses simple syntactic rules to decide" in the reference manual. This all seems a bit fragile, considering that the decision is made on Syntax rather than Semantics. For example, if I have let _ = nop!(&<rvalue>); the lifetime is unknown without knowing exactly the contents of the macro. It seems RFC66 (Issue #15023) has the potential to end up changing this anyway. I'll steer clear of let _ = &<rvalue> for now and stick with let _tmp = <rvalue> instead.

nikomatsakis commented 8 years ago

On Thu, Nov 12, 2015 at 04:06:03PM -0800, Parakleta wrote:

> Thanks, I just noticed the text "The compiler uses simple syntactic rules to decide" in the reference manual. This all seems a bit fragile, considering that the decision is made on Syntax rather than Semantics. For example, if I have let _ = nop!(&<rvalue>); the lifetime is unknown without knowing exactly the contents of the macro. It seems RFC66 (Issue #15023) has the potential to end up changing this anyway. I'll steer clear of let _ = &<rvalue> for now and stick with let _tmp = <rvalue> instead.

It is true that you have to know the contents of the macro. However, the motivation for using syntax was precisely to make it easier to follow -- people were nervous about relying on inference to decide when a destructor runs, since inference algorithms might be overly conservative, or change over time.

steveklabnik commented 7 years ago

Moving this to https://github.com/rust-lang-nursery/nomicon/issues/7