Closed RalfJung closed 5 years ago
Not sure of it. It's the "disposing of &mut
pointers" problem all over again, except you force the pointer to live long enough.
example:
fn foo(var: &mut u32) -> u32 {
let ptr: *mut u32 = var;
let ref_ = &mut *var;
mk_eqty!(ref_, var);
*ref_ = 2;
unsafe { *ptr }
}
The mk_eqty
would seem to coerce the borrow to last for the entire function, and prohibit the read from ptr
.
It's the "disposing of &mut pointers" problem all over again
What are you referring to here?
I also don't see the relation between your example and RefMut
. What you are doing here (if I understand correctly) is moving the (non-duplicable) ownership associated with var
over to ref_
, but not in a way that the type system could understand. In contrast, the reference in RefMut
is only "invalidated prematurely" if RefMut
is dropped, in which case ownership moves back to the RefCell
(and whoever is to call borrow
/borrow_mut
next). In a single function, I would write this as something like
fn foo(var: &mut u32) -> u32 {
let ref = { let ptr : *mut u32 = var; let ref_ = &mut *var; mk_eqty!(ref_, var); drop(var); ref_ };
*ref = 2;
*ref
}
But drop
of an &mut
reference is a no-op, so it is equivalent. If you inline the drop
, you get:
fn foo(var: &mut u32) -> u32 {
let ptr: *mut u32 = var;
let ref_ = &mut *var;
mk_eqty!(ref_, var);
*ref_ = 2;
{ let _x = ref_; }
unsafe { *ptr }
}
You can also have a write-sided version of this, which is more likely to be misoptimized (because abstract-interpretation optimizations move writes backward and reads forward).
fn foo_will_break(var: &mut u32) -> u32 {
let ptr: *mut u32 = var;
let ref_ = &mut *var;
mk_eqty!(ref_, var);
let result = *ref_;
{ let _x = ref_; }
unsafe { *ptr = 2; }
result
}
If the compiler DSEs the write to _x
, and then NRVOs (both optimizations that we want to do), this will break.
drop
is a no-op operationally, but in terms of the ownership available to the function, it makes a huge difference. In my code above, the block after let ref =
is a weird way of writing an identity function and making sure we use the full lifetime of var
for ref
. This makes sure we actually do an ownership transfer from var
to ref
, like RefMut
does when it is dropped. All of this is just my attempt of communicating the ownership transfer going on in RefMut
via code, which obviously failed. ;-)
What I am saying in my first post is: The semantic type RefMut<'a, T>
has strictly more ownership than its syntactic description indicates. There actually is a &'a T
there (actually, it's some superlifetime of 'a
). Adding safe code that performs read-only operations on (the internals of) RefMut
cannot break the module.
This is in contrast to Ref<'a, T>
, where the semantic type is incomparable to the syntactic appearance (it's neither more nor less ownership), and where one can easily write safe code performing read-only operations on RefMut
which can lead to UB. For example, this is not safe:
impl<'b, T> Ref<'b, T> {
fn get_ptr(self) -> &'b T {
self.value
}
}
The latter is the observation I brought up in a mail to Niko, which is the source of refcell_ref.md
. I can't see any similar issue with RefMut
, and that's what this issue is about.
Unfortunately, I don't understand what this observation has to do with the code you are bringing up. There's no compiler transformations involved here, I am just looking at the flow of ownership. Here, "ownership" is a fairly wide term, it refers to anything you can "own" in Rust -- in particular, you can "own a mutable borrow of a T
", which means you have sth. of type &mut T
, just like "owning a T
" is having sth. of type T
. A type can own more than is syntactically apparent, for example, Vec
owns the heap allocation it refers to even though there's just a raw pointer there (well, there's a Unique
in a RawVec
, but you get the point). And Ref
goes to show that a type can also own less than is syntactically apparent.
The problem is that, after inlining, foo_will_break
is equivalent to
fn refcell_example(r: &mut RefCell<u32>) -> u32 {
let ref_ = r.borrow_mut();
let result = *ref_;
drop(ref_);
*r.borrow_mut() = 2;
return result;
}
An &'a mut T
controls all access to T
until 'a
ends. If you forget
it, that means that nobody can access T
until 'a
ends.
My chain-based model allows you to "expose" a reference and remove its control over accesses, but drop::<&mut T>
is a no-op, not an expose.
I would argue there is a very fundamental difference between refcell_example
and foo_will_break
: The latter removes a layer of abstraction. Also, the raw pointer ptr
fails to capture the way r
owns its content.
If we have a semantics that reasons by looking at types (e.g. to exclude aliasing), I am not surprised that doing such a type-erasing kind of inlining breaks programs.
In other words, the unsafe code in RefCell
is safe because of careful tuning of all the types exported from that module. Hence optimizations that "smear" this border must be fairly conservative, because they can quickly fail to respect the abstraction that has been so carefully established.
But the example does not mess with types. It only messes with privacy. I am a little worried about introducing privacy to the aliasing rules.
Privacy is the one mechanism in Rust which makes abstraction work. Without privacy, the entire story of "ascribing additional meaning to types" (and therefore the entire story of hiding unsafety behind an abstraction barrier) breaks down.
@RalfJung
Sure. But UB does not care about which additional meaning you ascribe to your types - if it did, then Ref
would work too.
In your way, the aliasing rules would have to say something like "if you drop a struct containing an private &mut
field, the &mut
field is then counted as destroyed". And I don't want that.
@RalfJung I do not see the difference between Ref
and RefMut
. Either way, the reference stored in the value
field is not necessarily valid for the entirety of its lifetime. Once the borrow
is dropped, it is no longer conceptually valid.
The example I gave in the original Tootsie Pop model post seems to apply. Imagine that we had this function:
impl<'b> RefMut<'b, u32> {
pub fn broken(self) {
let x = *self.value;
mem::drop(self.borrow);
use(x);
}
}
It seems to me that, at least under an aggressive set of rules, the compiler would be within its rights to reorder those statements:
impl<'b> RefMut<'b, u32> {
pub fn broken(self) {
mem::drop(self.borrow);
let x = *self.value;
use(x);
}
}
After all, self.value
should be a unique reference not aliased anywhere else. Dropping self.borrow
is a normal, safe function call, so we'd like to be able to assume that it cannot have effects exceeding those of a normal, safe function -- which wouldn't be able to access self.value
(and certainly not invalidate it early).
(And of course to make this all more deadly we can think of RefCell
as a true mutex instead of one specialized to a single thread.)
@nikomatsakis You are right. I missed the fact that it's the destructor of BorrowRefMut
, not the one of RefMut
, which decrements the counter. As written, both Ref
and RefMut
are dangerous.
However, if we instead consider an alternative where BorrowRef
is "inlined" into Ref
, and BorrowRefMut
is "inlined" into RefMut
, then RefMut
would be okay (as far as I can see), but Ref
would not. This is because the "too long" lifetime in RefMut
is attached to a mutable reference and hence it cannot just be passed to a client, whereas Ref
does the same with a Copy
type.
@RalfJung
However, if we instead consider an alternative where BorrowRef is "inlined" into Ref, and BorrowRefMut is "inlined" into RefMut, then RefMut would be okay (as far as I can see) but Ref would not. This is because the "too long" lifetime in RefMut is attached to a mutable reference and hence it cannot just be passed to a client, whereas Ref does the same with a Copy type.
Hmm, interesting. I'm not sure if it is true that RefMut
with a &'b mut u32
is OK. For example, why can't we rewrite this function to create the same scenario (IOW, we can break it with safe code inside the privacy barrier, no?):
impl<'b> RefMut<'b, u32> {
pub fn broken(self) {
let x = *self.value;
mem::drop(self); // NB, not self.borrow
use(x);
}
}
Now the version of the code which dereferences after the drop
is not well-typed, so this transformation should not be legal.
Consider
fn foo(v: Vec<i32>) {
let x = v[0];
mem::drop(v);
use(x);
}
Sure the compiler is not allowed to move the v[0]
after the drop
.
@RalfJung that seems (at least potentially) different to me; the Vec
(conceptually) owns its internal memory, whereas the RefMut
has a borrowed pointer as its field (and that borrowed pointer thus refers to memory that outlives the RefMut
)
Consider this:
struct Foo<'a>(&'a i32);
fn foo<'a>(f: Foo<'a>) {
use(f.0)
}
fn foo_prime<'a>(f: Foo<'a>) {
let x = f.0;
drop(f);
use(x);
}
Basically, foo could be rewritten to foo_prime, which is okay for Foo, but not for RefMut.
that seems (at least potentially) different to me; the Vec (conceptually) owns its internal memory, whereas the RefMut has a borrowed pointer as its field (and that borrowed pointer thus refers to memory that outlives the RefMut)
I disagree. &mut
is not duplicable, and hence it also expresses some form of ownership. That ownership is weaker than "full ownership", but that's a different story. The destructor of RefMut
consumes its argument, and it moves the ownership it gets (in particular, the &mut
), elsewhere (namely, it moves it back to the RefCell
). This is like Vec
returning ownership back to the allocator.
@mystor Foo
is very different because the field in question, &i32
, is Copy
. So one can argue that in foo_prime
, drop
gets one copy of the field and x
gets the other one. I do agree with @nikomatsakis that Ref
is bad even when BorrowRef
is "inlined"; we're talking about the case of non-Copy
references.
@RalfJung I see your point. Very interesting! Have to think about it, but what you're saying makes sense to me.
I just noticed though that this is not stable under inlining -- if the destructor is inlined in your RefMut::broken
, there's no sign any more that an ownership transfer happened. I wonder if similar cases can be constructed not involving the destructor...
Or maybe that's not a problem. After all, in the case of RefCell
, nothing else can access the data between the call to the destructor and the return, as no unknown code is run. And in the case of RwLock
, the decrement of the refcount is a release write, and as such reordering the two is not legal.
I just noticed though that this is not stable under inlining -- if the destructor is inlined in your
RefMut::broken
, there's no sign any more that an ownership transfer happened. I wonder if similar cases can be constructed not involving the destructor...
That's what I was talking about.
(Beware of the zombies!)
Now that we have done the proof, I am thinking maybe even Ref
is just fine... The reasoning for that would be essentially as follows: If we take a Ref<'a, T>
and mem::forget
it, then it is actually sound to use its value
field at lifetime 'a
. That is, we can "disassemble" the ownership captured in Ref
to justify the type &'a T
given in the struct, but this "disassembling" is irreversible -- along the way, we permanently lose our right to decrement the borrow count by 1
.
Maybe we don't want to permit this kind of "irreversibility" when checking whether the private invariants are strong enough to justify the written type, I don't know. If we do not, then I would agree with you that RefMut
is just as bad; it also requires irreversible ownership transfer to obtain a &'a mut T
.
Actually, based on @comex's comment about "Stacked Borrows", I now conclude that RefMut
is actually not fine.
Ref
has the same problem, actually:
use std::cell::{RefCell, Ref};
fn break_it(rc: &RefCell<i32>, r: Ref<'_, i32>) {
// `r` has a shared reference, it is passed in as argument and hence
// a barrier is added that marks this memory as read-only for the entire
// duration of this function.
drop(r);
// *oops* here we can mutate that memory.
*rc.borrow_mut() = 2;
}
fn main() {
let rc = RefCell::new(0);
break_it(&rc, rc.borrow())
}
refcell_ref.md
says:However, I think that applies only to
RefCell::Ref
, not toRefCell::RefMut
. (The mistake is probably mine in the original mail to Niko.) The reason for this is that&mut T
is non-Copy
, so this actually is a reference that's valid for as long as the lifetime says. If the destructor ofRefCell::RefMut
is executed, ownership is moved fromRefMut
back to theRefCell
. In some sense, having a variable of type&'a mut T
does not actually guarantee that the borrow lives for lifetime'a
, all it really says is that if we keep hold of this variable, then the borrow lasts. But if we give up the variable, pass it to someone else, that someone may well kill the borrow earlier.In contrast to that,
&T
isCopy
, so there's no "giving up" of ownership here.(I was unsure whether an issue or the internals is the right place for this discussion. Feel free to move.)