Closed Gankra closed 7 years ago
It has been argued that the references section should be written in terms of lvalue paths. I believe this is what the borrow checker reasons in terms of, and it is at the very least a concrete concept. However this section does not want to simply model how the borrow checker thinks -- the entire point is that there needs to be a more fundamental model that the borrow checker models a subset of, but is fundamentally unable to model all of. This is the model unsafe code should be written against, and that the borrow checker can grow into if improved (e.g. nonlexical borrows).
CC @RalfJung and co who are working on formally modeling Rust's semantics.
Indeed, we'll have some fun figuring this out ;-)
Speaking of which, "A reference cannot outlive its referent" is already something that's not actually enforced in my model. It's only when you use a reference that you have to prove that the referent is still alive, by showing that the lifetime of the reference is still active. As long as you don't use the reference, the model doesn't care whether it is valid.
I should also mention that "path" is not a thing in my formal model. I don't even have a stack. It's all about owning locations, or knowing the protocols that some locations are currently subject to. (Like, a shared borrow to a basic datatype follows the protocol that everybody can read it, and that multiple reads are guaranteed to deliver the same result. A mutable borrow to a basic datatype has the protocol that you can temporarily exchange your borrow for actual ownership of the referent, but until you change this back, it is impossible for the lifetime of the borrow to end.) The challenge will be to translate these protocols, and the even more implicit notions of separation/disjointness, back to something that makes sense when looking at surface Rust code...
@RalfJung
Speaking of which, "A reference cannot outlive its referent" is already something that's not actually enforced in my model. It's only when you use a reference that you have to prove that the referent is still alive, by showing that the lifetime of the reference is still active. As long as you don't use the reference, the model doesn't care whether it is valid.
This was essentially what the whole mem::forget/thread::scoped drama was about, and it's equally true of Rust the language: the type system ensures that lifetimes are not usefully reachable outside the scope they describe, but you can e.g. stash them in a leaked Rc cycle.
I think the "path" description in the current document is a bit of a dead end for what the book is ultimately trying to do -- describe the constraints on unsafe code. I do think that a more precise version of the path explanation would be a good way to explain borrow checking, though.
I assume this is the correct issue to add this question to. I've discovered through some experimentation and by reading #10488 that
let _ = Iron::new(hello_world).http("localhost:3000").unwrap();
for example causes the destructor to be run immediately (i.e. the end of the statement) and so joins the thread and blocks further execution, but
let _ = &Iron::new(hello_world).http("localhost:3000").unwrap();
extends the lifetime to the enclosing block.
I can understand that
let _listen = Iron::new(hello_world).http("localhost:3000").unwrap();
extends the lifetime to the enclosing block because that is the scope of the variable _listen, even though it is unused. What I don't understand is how the lifetime of let _ = <rvalue> differs from let _ = &<rvalue>. Is this a difference I should be relying on? What is the correct method to control the lifetime of unused/anonymous objects?
Interesting! @eddyb any thoughts on this?
On Mon, Nov 09, 2015 at 06:05:20PM -0800, Parakleta wrote:
I assume this is the correct issue to add this question to. I've discovered through some experimentation and by reading #10488 that
let _ = Iron::new(hello_world).http("localhost:3000").unwrap();
for example causes the destructor to be run immediately (i.e. the end of the statement) and so joins the thread and blocks further execution, but
let _ = &Iron::new(hello_world).http("localhost:3000").unwrap();
extends the lifetime to the enclosing block.
I can understand that
let _listen = Iron::new(hello_world).http("localhost:3000").unwrap();
extends the lifetime to the enclosing block because that is the scope of the variable _listen, even though it is unused. What I don't understand is how the lifetime of let _ = <rvalue> differs from let _ = &<rvalue>. Is this a difference I should be relying on? What is the correct method to control the lifetime of unused/anonymous objects?
Yes, these are all different. It's kind of the intersection of two distinct rules. The mental model is roughly that the initializer is stored into a temporary which has the lifetime of the statement. When you do let <pat> = <initializer>, then, the pattern is matched against this temporary. It may move things out of the temporary and place them into fresh bindings, which then live as long as the block, but things it does not move get dropped along with the temporary.
So something like let (foo, _) = <expr> is roughly as if you did:
let foo;
{
let temp = <expr>;
foo = temp.0;
}
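The desugaring above can be observed with a value that logs when it is dropped. This is only a sketch: the Noisy type, the make helper and the shared log are hypothetical names introduced for illustration, not anything from the thread.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

// Hypothetical helper type: records a line in the shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn make(name: &'static str, log: &Log) -> Noisy {
    Noisy { name, log: Rc::clone(log) }
}

fn destructure_pair() -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        // `_kept` takes ownership of the first element. `_` binds nothing,
        // so the second element stays behind in the temporary tuple and is
        // dropped with it at the end of the `let` statement.
        let (_kept, _) = (make("first", &log), make("second", &log));
        log.borrow_mut().push("after let".to_string());
    } // `_kept` goes out of scope here
    let entries = log.borrow().clone();
    entries
}

fn main() {
    for line in destructure_pair() {
        println!("{}", line);
    }
}
```

The unbound element is dropped before "after let" is logged, while the bound one survives until the block ends.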
Note that _ is not an identifier, it is a pattern which means "ignore this value".
So in terms of your examples:
let _ = foo.unwrap() means: call unwrap and discard the result (drops immediately).
let _x = foo.unwrap() means: call unwrap and store the result into a variable called _x (drops when _x is dropped).
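A small sketch of the two cases, assuming a hypothetical Noisy type that logs its own drop (Noisy, make and run are illustrative names, not part of any library):

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

// Hypothetical helper type: records a line in the shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn make(name: &'static str, log: &Log) -> Noisy {
    Noisy { name, log: Rc::clone(log) }
}

// Runs `body` against a fresh log and returns what was recorded.
fn run(body: impl FnOnce(&Log)) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    body(&log);
    let entries = log.borrow().clone();
    entries
}

fn with_underscore() -> Vec<String> {
    run(|log| {
        let _ = make("value", log); // `_` binds nothing: dropped right here
        log.borrow_mut().push("end of block".to_string());
    })
}

fn with_named_binding() -> Vec<String> {
    run(|log| {
        let _x = make("value", log); // `_x` is a real variable
        log.borrow_mut().push("end of block".to_string());
    }) // `_x` is dropped when the closure body ends
}

fn main() {
    println!("{:?}", with_underscore());
    println!("{:?}", with_named_binding());
}
```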
Meanwhile, orthogonally: &foo.unwrap() means "create a temporary stack slot" and store foo.unwrap() into it. Because it is being stored into a let binding, the lifetime of this temporary is extended to the enclosing block.
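To see the extension in isolation, the sketch below puts a borrowed temporary behind the _ pattern. Noisy, make and run are hypothetical helpers introduced only for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

// Hypothetical helper type: records a line in the shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn make(name: &'static str, log: &Log) -> Noisy {
    Noisy { name, log: Rc::clone(log) }
}

// Runs `body` against a fresh log and returns what was recorded.
fn run(body: impl FnOnce(&Log)) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    body(&log);
    let entries = log.borrow().clone();
    entries
}

fn borrowed_temporary_with_underscore() -> Vec<String> {
    run(|log| {
        // The `&` creates a temporary slot for the value. Because the borrow
        // is the initializer of a `let`, that slot lives until the end of the
        // enclosing block, even though `_` discards the reference itself.
        let _ = &make("value", log);
        log.borrow_mut().push("end of block".to_string());
    })
}

fn main() {
    println!("{:?}", borrowed_temporary_with_underscore());
}
```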
It's possible that the lifetime of the temporary we create when doing pattern matching in a let should be the enclosing block, rather than the let statement. This would be perhaps more analogous with the & rules. But I wonder if this would break existing code; it's hard to know. I'm not 100% sure why I didn't do it this way at the time, because I remember being annoyed that let _ = foo() and let _x = foo() were not equivalent. That said, there are many who believe they should not be; I can't find the issue now, but there was at one point specific code in trans to ensure that let _ = foo() would drop the result of foo() immediately.
The discussion in #10488 for the distinction between _ and _x makes sense to me, and I'm happy with the rationale that let _ = <rvalue> is essentially a no-op. I'm just confused by the & case. Does it mean that an & anywhere in an expression always creates a temporary that has the lifetime at least as long as the enclosing block? The statement &Iron::new().http().unwrap(); doesn't, but I assume that's because it's a statement and not an expression.
On Tue, Nov 10, 2015 at 12:39:02PM -0800, Parakleta wrote:
Does it mean that an & anywhere in an expression always creates a temporary that has the lifetime at least as long as the enclosing block?
No. The rules are more subtle than that. Temporaries usually live until the end of the current statement (the let, in this case), but if they appear in specific places, they are extended until the end of the block. Basically, this happens when they appear in a place where it is unambiguous that they would be stored into the result of a let.
Hence, the following temporaries will last until end of block:
let <pat> = &<expr>
let <pat> = StructName { field: &<expr>, ... }
But (under current rules) this would not:
let <pat> = method(&<expr>);
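The three forms above can be compared side by side. The following is a sketch under assumptions: Noisy, Holder, make, name_len and run are hypothetical names, chosen only to make the drop points visible in a log.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

// Hypothetical helper type: records a line in the shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Hypothetical struct holding a reference, standing in for `StructName`.
struct Holder<'a> {
    field: &'a Noisy,
}

fn make(name: &'static str, log: &Log) -> Noisy {
    Noisy { name, log: Rc::clone(log) }
}

// Stand-in for `method`: takes a reference and returns an owned value.
fn name_len(n: &Noisy) -> usize {
    n.name.len()
}

// Runs `body` against a fresh log and returns what was recorded.
fn run(body: impl FnOnce(&Log)) -> Vec<String> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    body(&log);
    let entries = log.borrow().clone();
    entries
}

fn borrow_in_let() -> Vec<String> {
    run(|log| {
        let _r = &make("value", log); // extended to end of block
        log.borrow_mut().push("end of block".to_string());
    })
}

fn borrow_in_struct_field() -> Vec<String> {
    run(|log| {
        let h = Holder { field: &make("value", log) }; // extended as well
        log.borrow_mut().push(format!("holding {}", h.field.name));
    })
}

fn borrow_in_call_argument() -> Vec<String> {
    run(|log| {
        let _len = name_len(&make("value", log)); // dropped at end of statement
        log.borrow_mut().push("end of block".to_string());
    })
}

fn main() {
    println!("{:?}", borrow_in_let());
    println!("{:?}", borrow_in_struct_field());
    println!("{:?}", borrow_in_call_argument());
}
```

Only the function-argument case drops the temporary before the next statement runs.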
See http://doc.rust-lang.org/reference.html#temporary-lifetimes for more details and more examples.
So the let _ = <rvalue>; statement lifetime and the let <pat> = &<rvalue>; statement lifetime are different, but my confusion comes from the idea (maybe I'm missing something though) that _ is a valid <pat> and &<rvalue> is also a valid <rvalue>, so the statement let _ = &<rvalue> would match both rules?
Does this mean that let <pat> = &<rvalue> has higher priority than let _ = ... and should we assume this will always be true?
@Parakleta let _ = <rvalue>; drops the RHS immediately but always evaluates it, so the rules for &<rvalue> still apply, even if the reference is dropped (which is a no-op because references are Copy).
Thanks, I just noticed the text "The compiler uses simple syntactic rules to decide" in the reference manual. This all seems a bit fragile, considering that the decision is made on Syntax rather than Semantics. For example, if I have let _ = nop!(&<rvalue>); the lifetime is unknown without knowing exactly the contents of the macro. It seems RFC66 (Issue #15023) has the potential to end up changing this anyway. I'll steer clear of let _ = &<rvalue> for now and stick with let _tmp = <rvalue> instead.
On Thu, Nov 12, 2015 at 04:06:03PM -0800, Parakleta wrote:
Thanks, I just noticed the text "The compiler uses simple syntactic rules to decide" in the reference manual. This all seems a bit fragile, considering that the decision is made on Syntax rather than Semantics. For example, if I have let _ = nop!(&<rvalue>); the lifetime is unknown without knowing exactly the contents of the macro. It seems RFC66 (Issue #15023) has the potential to end up changing this anyway. I'll steer clear of let _ = &<rvalue> for now and stick with let _tmp = <rvalue> instead.
It is true that you have to know the contents of the macro. However, the motivation for using syntax was precisely to make it easier to follow -- people were nervous about relying on inference to decide when a destructor runs, since inference algorithms might be overly conservative, or change over time.
Moving this to https://github.com/rust-lang-nursery/nomicon/issues/7
https://doc.rust-lang.org/nightly/nomicon/references.html
This involves solving the incredibly difficult question of "what on earth are Rust's True Pointer Aliasing Rules".
CC @aturon @arielb1 @nikomatsakis @pnkfelix @sunfishcode