Closed withoutboats closed 5 years ago
@rfcbot fcp merge
Team member @withoutboats has proposed to merge this. The next step is review by the rest of the tagged team members:
Concerns:
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
Thanks for the detailed writeup here @withoutboats! I've also sort of been historically confused by the various guarantees of Pin
, and I've currently got a few questions about the APIs being stabilized here wrt the safety guarantees. To help sort this out in my own head though I figured I'd try to write these things down.
In trying to start writing this down though I keep running up against a wall of "what is Unpin
?" I'm sort of confused by what that is and the various guarantees around it. Can you say again what it means for T
to and also to not implement Unpin
? Also, if Unpin
is a safe trait to implement it naively seems like it could be used to easily undermine the unsafe guarantees of Pin<T>
, but I'm surely missing something
if i have got this correct, every type that is not self referential (ie: not a generator) is Unpin
It's not just self-referentiality, there are some other use-cases for stable memory addresses that Pin
can also support. They are relatively few and far-between though.
How I understand Unpin
being safe to implement is that by implementing it you may violate an invariant required of other unsafe code you have written (crucially, only you may write this unsafe code, no external code can be relying on whether you have implemented Unpin
). There is nothing you can do with Pin
's safe API that will cause unsoundness whether or not you have implemented Unpin
. By opting in to using some of Pin
's unsafe API you are guaranteeing that you will only implement Unpin
when it is safe to do so. That is covered by point 1 of the "Notes on pinning & safety" section above.
Hm I still don't really understand Unpin
. I'm at first just trying to understand what it means to implement or not to impelment Unpin
.
First off, it's probably helpful to know what types implement Unpin
automatically! It's mentioned above that common pointer types (Arc/Rc/Box/references) implement Unpin
, but I think that's it? If this is an auto trait, does that mean that a type MyType
automatically implements Unpin
if it only contains pointers? Or do no other types automatically implement Unpin
?
I keep trying to summarize or state what Unpin
guarantees and such, but I'm finding it really difficult to do so. Can someone help out by reiterating again what it means to implement Unpin
as well as what it means to not implement Unpin
?
I think I understand the guarantees of Pin<P>
where you can't move out of any of the inline members of P::Target
, but is that right?
@alexcrichton Thanks for the questions, I'm sure the pinning APIs can be a bit confusing for people who haven't been part of the group focusing on them.
First off, it's probably helpful to know what types implement Unpin automatically!
Unpin is an auto trait like Send or Sync, so most types implement it automatically. Generators and the types of async functions are !Unpin
. Types that could contain a generator or an async function body (in other words, types with type parameters) are also not automatically Unpin
unless their type parameters are.
The explicit impls for pointer types is to make the Unpin
even if their type parameters are not. The reason for this will hopefully be clearer by the end of this comment.
Can someone help out by reiterating again what it means to implement Unpin as well as what it means to not implement Unpin?
Here is the sort of fundamental idea of the pinning APIs. First, given a pointer type P
, Pin<P>
acts like P
except that unless P::Target
implements Unpin
, it is unsafe to mutably dereference Pin<P>
. Second, there are two basic invariants unsafe code related to Pin
has to maintain:
&mut P::Target
from a Pin<P>
, you must never move P::Target
.Pin<P>
, it must be guaranteed that you'll never be able to get an unpinned pointer to the data that pointer points to until the destructor runs.The implication of all of this is that if you construct a Pin<P>
, the value pointed to by P
will never move again, which is the guarantee we need for self-referential structs as well as intrusive collections. But you can opt out of this guarantee by just implementing Unpin
for your type.
So if you implement Unpin
for a type, you're saying that the type opts out of any additional safety guarantees of Pin
- its possible to mutably dereference the pointers pointing to it. This means you're saying the type does not need to be immovable to be used safely.
Moving a pointer type like Rc<T>
doesn't move T
because T
is behind a pointer. Similarly, pinning a pointer to an Rc<T>
(as in Pin<Box<Rc<T>>
) doesn't actually pin T
, it only pins that particular pointer. This is why anything which keeps its generics behind a pointer can implement Unpin
even when their generics don't.
Also, if Unpin is a safe trait to implement it naively seems like it could be used to easily undermine the unsafe guarantees of Pin
, but I'm surely missing something
This was one of the trickiest parts of the pinning API, and we got it wrong at first.
Unpin means "even once something has been put into a pin, it is safe to get a mutable reference to it." There is another trait that exists today that gives you the same access: Drop
. So what we figured out was that since Drop
is safe, Unpin
must also be safe. Does this undermine the entire system? Not quite.
To actually implement a self-referential type would require unsafe code - in practice, the only self-referential types anyone cares about are those that the compiler generates for you: generators and async function state machines. These explicitly say they don't implement Unpin
and they don't have a Drop
implementation, so you know, for these types, once you have a Pin<&mut T>
, they will never actually get a mutable reference, because they're an anonymous type that we know doesn't implement Unpin or Drop.
The problem emerges once you have a struct containing one of these anonymous types, like a future combinator. In order go from a Pin<&mut Fuse<Fut>>
to a Pin<&mut Fut>
, you have to perform a "pin projection." This is where you can run into trouble: if you pin project to the future field of a combinator, but then implement Drop for the combinator, you can move out of a field that is supposed to be pinned.
For this reason, pin projection is unsafe! In order to perform a pin projection without violating the pinning invariants, you have to guarantee that you never do several things, which I listed in the stabilization proposal.
So, the tl;dr: Drop
exists, so Unpin
must be safe. But this doesn't ruin the whole thing, it just means that pin projection is unsafe
and anyone who wants to pin project needs to uphold a set of invariants.
generators and async function state machines. These explicitly say they don't implement Unpin and they don't have a Drop implementation, so you know, for these types, once you have a Pin<&mut T>, they will never actually get a mutable reference, because they're an anonymous type that we know doesn't implement Unpin or Drop.
Shouldn't an async state machine have a Drop
implementation? Things that are on the "stack" of the async function (which probably equals fields in the state machine) need to get destructed when the async function completes or gets cancelled. Or does this happen otherwise?
I guess what matters in this context is whether an impl Drop for Foo {…}
item exists, which would run code with &mut Foo
which could use for example mem::replace
to "exfiltrate" and move the Foo
value.
This is not the same as "drop glue", which can be called through ptr::drop_in_place
. Drop glue for a given Foo
type will call Drop::drop
if it’s implemented, then recursively call the drop glue for each field. But those recursive calls never involve &mut Foo
.
Also, while a generator (and therefore an async state machine) has custom drop glue, it's just to drop the correct set of fields based on the current state, it promises to never move any of the fields during drop.
The terminology I use (though I don't think there's any standard): "drop glue" is the compiler generated recursive walking of fields, calling their destructors; "Drop implementation" is an implementation of the Drop
trait, and "destructor" is the combination of the drop glue and the Drop implementation. The drop glue never moves anything around, so we only care about Drop implementations.
Is someone on the hook to write a nomicon chapter on futures? It seems incredibly necessary given how subtle this is. In particular I think examples, especially examples of buggy/bad implementations and how they're unsound, would be illuminating.
@Gankro Sure, you can put me down for that.
Thanks for the explanation everyone!
I personally am super new to async Rust and the Pin APIs, but I have played a bit around with it during the last days (https://github.com/rust-lang-nursery/futures-rs/pull/1315 - where I tried to exploit pinning for intrusive collections in async code).
During experimentation I got some concerns with these APIs:
within
impl core::future::future::Future+futures_core::future::FusedFuture
, the traitstd::marker::Unpin
is not implemented forstd::marker::Pinned
This one sounds pretty recursive and contradicting (why should something that is pinned be not pinned?)
Pin
projection), trying to copy the methods from pin-utils
, until I came to a solution that uses 2 unsafe methods to do what I needed.async
method and a select
case. It turned out I needed to pin_mut!()
it to futures stack. Which isn't really a stack, which makes it confusing.drop()
method now again take &mut self
, instead of a Pin
.Pin
and Unpin
actually are. This seems like a bit a leak of implementation details for some more special use-cases.drop()
is called. Which gives it some kind of virtual lifetime between the current scope and 'static
. It seems confusing to have 2 concepts which are related, but still completely different. I wondered whether it can be modelled by something like 'pinned
.While getting things to compile, I often thought about C++, where most of these issues are avoided: Types can be declared unmovable by deleting the move constructor and assignment operator. When a type is not moveable, types which contain that type are also not. Thereby the property and requirements flow natively through the type hierarchy and get checked by the compiler, instead of forwarding the property inside some calls (not all, e.g. not drop()
). I think this is a lot easier to understand. But maybe not applicable to Rust, since Future
s must currently often be moved before something starts polling them? But on the other hand that might be fixable with a new definition with that trait, or separating movement and polling phase.
Regarding Alex Unpin
comment: I'm slowly grasping what Unpin
means, but I agree that it's hard to understand. Maybe another name helps, but I can't find a concise one right now. Something along ThingsInsideDontBreakIfObjectGetsMoved
, DoesntRequireStableAddress
, PinDoesntMatter
, etc.
What I haven't yet fully understood is, why getting &mut self
from Pin<&mut self>
can't be safe for all types. If Pin would just mean the address of the object itself is stable, then this should hold true for most typical Rust types. It seems there is another concern integrated into Pin, which is that also self-referential types inside it don't break. For those types manipulating them after having a Pin
is always unsafe. But I think in most cases those will be compiler-generated, and unsafety or even raw pointers there should be fine. For a manually future combinators that need to forward pinned calls to subfields there would obviously need to be unsafe calls to create pins to the fields before polling those. But in that case I see the unsafety more related to the place where it actually matters (is the pin still valid?) than to accessing other fields for which it doesn't matter at all.
Another thought that I had is whether it's possible to get a simpler system if some limitations are applied. E.g. if things that require pinning could only be used inside async methods and not with future combinators. Maybe that could simplify things to one of Pin
or Unpin
. Considering that for lots of code async/await methods with select/join support might be favorable to combinators, this might not be the hugest loss. But I haven't really put enough thoughts into whether this really helps.
On the positive side: Being able to write async/await code with "stack" borrows is pretty cool and much needed! And ability to use pinning for other use-cases, like intrusive collections, will help performance as well as targets like embedded systems or kernels. So I'm really looking forward to a solution on this.
Minor nit w.r.t. the stabilization report:
fn get_unchecked_mut(self) -> &'a mut T
I'm guessing this is actually an unsafe fn
?
@Matthias247 you can't safely get &mut T from Pin<P<T>>
if T isn't Unpin because then you could mem::swap
to move T out of the Pin, which would defeat the purpose of pinning things.
One thing I have trouble explaining to myself is what makes Future fundamentally different from other traits such that Pin needs to be part of its API. I mean, I know intuitively it's because async/await requires pinning, but does that say something specifically about Futures that's different from, say, Iterators?
Could poll take &mut self and only implement Future for Pin<P>
types or types that are Unpin?
How can I access mutable fields in my method that takes a Pin as a parameter.
This actually makes me wonder whether there's a couple of missing methods on Pin
,
impl<'a, T: ?Sized> Pin<&'a T> {
fn map<U: Unpin, F: FnOnce(&T) -> &U>(self, f: F) -> Pin<&'a U>
}
impl<'a, T: ?Sized> Pin<&'a mut T> {
fn map<U: Unpin, F: FnOnce(&mut T) -> &mut U>(self, f: F) -> Pin<&'a mut U>
}
These could be used in the pin_utils::unsafe_unpinned!
macro.
I'm trying to work out why this macro claims to be unsafe. If we are !Unpin
and have a Unpin
field, why would it be unsafe to project to this field?
The only situation where I can see it being unsound is implementing a custom !Unpin
type, taking a raw pointer to an Unpin
field of self (and relying on it having a stable address/pointing to the same instance), then acquiring an &mut
to the same field and passing it to an external function. This seems to fall under the same reasoning of why implementing Unpin
is safe though, by taking a raw pointer to a field of a !Unpin
you are opting out of being able to call some of the safe APIs.
To make the previous situation safer there could be a wrapper built on Pinned
for marking which normally Unpin
fields of a struct are actually !Unpin
instead of just adding a Pinned
to the struct as a whole:
pub struct MustPin<T: Unpin>(T, Pinned);
impl<T: Unpin> MustPin<T> {
pub const fn new(t: T) -> Self { ... }
pub fn get(self: Pin<&Self>) -> *const T { ... }
pub fn get_mut(self: Pin<&mut Self>) -> *mut T { ... }
}
This all seems backwards compatible with the current API, it could remove some of the unsafety from futures-rs
combinators (e.g. it would give safe access to extra fields like this one), but isn't necessary for the current usecases. We could probably experiment with some APIs like this for implementing custom !Unpin
types (like intrusive collections) and add them later on.
@Nemo157 Those map functions are unsafe because I could mem::swap
on the &mut T
the function passes me before returning an &mut U
. (well the mutable one is, the immutable one might not be unsafe)
EDIT: Also the pin-utils macro is different, unsafe_unpinned
doesnt have to do with the target type being Unpin
, its just an "unpinned projection" - a projection to a &mut Field
. Its safe to use as long as you never pin project to that field, even if the field is !Unpin
.
One thing I have trouble explaining to myself is what makes Future fundamentally different from other traits such that Pin needs to be part of its API. I mean, I know intuitively it's because async/await requires pinning, but does that say something specifically about Futures that's different from, say, Iterators?
There's no theoretical difference, but there are pragmatic ones. For one, Iterator
is stable. But as a result, unless we figure out something very clever, you'll never be able to run a for
loop over a self-referential generator without a double indirection, even though that should be completely safe without it (because the for loop will consume and never move the generator).
Another important pragmatic difference is that the code patterns between Iterator
and Future
are quite different. You can't go 10 minutes without wanting to borrow across an await point in a future, which is why on futures 0.1 you see these Arc
s appearing all over the place so you can use the same variable in two different and_then
calls. But we've gotten quite far without being able to express self-referential iterators at all, it's just not that important of a use case.
Those map functions are unsafe because I could
mem::swap
on the&mut T
the function passes me before returning an&mut U
Ah, damn, forgot about that part of it :frowning:
fn as_mut(&mut self) -> Pin<&mut P::Target>
fn into_ref(self) -> Pin<&'a T>`
Should these be called as_pinned_mut
and into_pinned_ref
, to avoid confusing them with methods that actually return references?
Notable implementations of Unpin in std:
We also have
Never mind, you have that on your list. ;)impl<P> Unpin for Pin<P> {}
. Should this maybe become impl<P: Unpin> Unpin for Pin<P> {}
? For the types we use this has no effect and it seems a bit safer? After all, Pin
itself does not add a ptr indirection, so the argument about pinning not being structural does not apply.
I think we have to codify the Drop
guarantee before stabilization, or else it will be too late:
It is illegal to invalidate the storage of a pinned object (the target of a
Pin
reference) without callingdrop
on the object.
"Invalidate" here can mean "deallocation", but also "repurposing": When changing x
from Ok(foo)
to Err(bar)
, the storage of foo
gets invalidated.
This has the consequence, for example, that it becomes illegal to deallocate a Pin<Box<T>>
, Pin<Rc<T>>
or Pin<Arc<T>>
without first calling drop::<T>
.
Deref
, DerefMut
I still feel slightly uneasy about how this repurposes the Deref
trait to mean "this is a smart pointer". We use Deref
for other things as well, e.g. for "inheritance". That might be an anti-pattern, but it is still in fairly common use and, frankly, it is useful. :D
I think this is not a soundness issue though.
Unpin
Shameless plug: I wrote two blog posts on this, that might help or not depending on how much you enjoy reading (semi-)formal syntax. ;) The APIs have changed since then, but the basic idea has not.
@Matthias247 I think one problem you are running into is that building abstractions that involve pinning currently almost always requires unsafe code. Using those abstractions is fine, but e.g. defining a future combinator in safe code won't work. The reason for this is that "pin projections" have constraints that we could only safely check with compiler changes. For example, you ask
Why does my drop() method now again take &mut self, instead of a Pin.
Well, drop()
is old -- it exists since Rust 1.0 -- so we cannot change it. We'd love to just make it take Pin<&mut Self>
, and then Unpin
types could get their &mut
like they do now, but that's a non-backwards-compatible change.
To make writing future combinators safe, we'd need safe pin projections, and for that we have to change the compiler, this cannot be done in a library. We could almost do it in a derive
proc_macro, except that we'd need a way to assert "this type does not have any implementation of Drop
or Unpin
".
I think it'd be worth the effort to figure out how to get such a safe API for pin projections. However, that does not have to block stabilization: The API we stabilize here should be forward compatible with safe pin projections. (Unpin
might have to become a lang item to implement the above assertion, but that doesn't seem too bad.)
I often thought about C++, where most of these issues are avoided: Types can be declared unmovable by deleting the move constructor and assignment operator. When a type is not moveable, types which contain that type are also not.
There were several attempts at defining unmovable types for Rust. It gets very complicated very quickly.
One important thing to understand is that pinning does not provide unmovable types! All types remain movable in Rust. If you have a T
, you can move it anywhere you want. Instead of unmovable types, we make use of Rust's ability to define new encapsulated APIs, as we define a pointer type from which you cannot move out. All the magic is in the pointer type (Pin<&mut T>
), not in the pointee (T
). There is no way for a type to say "I cannot get moved ever". However, there is a way for a type to say "If I got pinned, don't move me again". So a MyFuture
that I own will always be movable, but Pin<&mut MyFuture>
is a pointer to an instance of MyFuture
that cannot get moved any more.
This is a subtle point, and we probably should spend some time fleshing out the docs here to help avoid this very common misunderstanding.
But we've gotten quite far without being able to express self-referential iterators at all, it's just not that important of a use case.
This is because, so far, all iterators are defined using a type and an impl Iterator for …
block. When state needs to be kept between two iterations, there is no choice but to stash it in fields of the iterator type and work with &mut self
.
However, even if this is not the primary motivation for including generators in the language, it would be very nice to eventually be able to use generator yield
syntax to define something that can be used with a for
loop. As soon as that exists, borrowing across yield
because just as important for generator-iterators as it is for generator-futures.
(
Unpin
might have to become a lang item to implement the above assertion, but that doesn't seem too bad.)
Unpin
and Pin
already have to be lang items to support safe immovable generators.
Ok thanks for the explanations all! I agree with @Gankro that a nomicon-style chapter about Pin
and such would be massively useful here. I suspect that there's a lot of development history for why various safe methods exist and why unsafe methods are unsafe and such.
In an effort to help myself understand this though I wanted to try again to write down why each function is safe or why it's unsafe
. With the understanding from above I've got all the following, but there's a few questions here and there. If others can help me fill this in that'd be great! (or help point out where my thinking is wrong)
fn new(P) -> Pin<P> where P: Deref, P::Target: Unpin
Pin<P>
, and while the existence of
Pin<P>
normally means that P::Target
will never be moved again in the
program, P::Target
implements Unpin
which is read "the guarantees of
Pin
no longer hold". As a result, the safety here is because there's no
guarantees to uphold, so everything can proceed as usual.unsafe fn new_unchecked(P) -> Pin<P> where P: Deref
unsafe
because P
doesn't
necessarily implement Unpin
, so Pin<P>
must uphold the guarantee that
P::Target
will never move ever again after Pin<P>
is created.A trivial way to violate this guarantee, if this function is safe, looks like:
fn foo<T>(mut a: T, b: T) {
Pin::new_unchecked(&mut a); // should mean `a` can never move again
let a2 = mem::replace(&mut a, b);
// the address of `a` changed to `a2`'s stack slot
}
Consequently it's up to the user to guarantee that Pin<P>
indeed means
P::Target
is never moved again after construction, so it's unsafe
!
fn as_ref(&Pin<P>) -> Pin<&P::Target> where P: Deref
Pin<P>
we have a guarantee that P::Target
will never move. That's
part of the contract of Pin
. As a result it trivially means that
&P::Target
, another "smart pointer" to P::Target
will provide the same
guarantee, so &Pin<P>
can be safely translate to Pin<&P::Target>
.This is a generic method to go from Pin<SmartPointer<T>>
to Pin<&T>
fn as_mut(&mut Pin<P>) -> Pin<&mut P::Target> where P: DerefMut
as_ref
above.
We're not handing out mutable access, just a Pin
, so nothing can be easily
violated yet.Question: what about a "malicious" DerefMut
impl? This is a safey way
to call a user-provided DerefMut
which crates &mut P::Target
natively,
presumably allowing it to modify it as well. How's this safe?
fn set(&mut Pin<P>, P::Target); where P: DerefMut
Pin<P>
(and the fact that we don't know anything about
Unpin
), shouldn't this guarantee that P::Target
never moves? If we can
reinitialize it with another P::Target
, though, shouldn't this be unsafe?
Or is this somehow related to destructors and all that?unsafe fn map_unchecked<U, FnOnce(&T) -> &U>(Pin<&'a T>, f: F) -> Pin<&'a U>
unsafe
function so the main question here is "why isn't this
safe"? An example of violating guarantees if this were safe looks like:...
Question:: what's the counter-example here? If this were safe, what's
the example that shows violating the guarantees of Pin
?
fn get_ref(Pin<&'a T>) -> &'a T
Pin<&T>
means that T
will never move. Returning &T
doesn't allow mutation of T
, so it should be safe to do while upholding
this guarantee.One "maybe gotcha" here is interior mutability, what if T
were
RefCell<MyType>
? This, however, doesn't violate the guarantees of
Pin<&T>
because the guarantee only applies to T
as a whole, not the
interior field MyType
. While interior mutability can move around
internals, it still fundamentally can't move the entire structure behind a
&
reference.
fn into_ref(Pin<&'a mut T>) -> Pin<&'a T>
Pin<&mut T>
means that T
will never move. As a result it means Pin<&T>
provides the same guarantee. Shouldn't be much issue with this conversion,
it's mostly switching types.unsafe fn get_unchecked_mut(Pin<&'a mut T>) -> &'a mut T
Pin<&mut T>
means that T
must never move, so this is trivially unsafe
because you can use mem::replace
on the result to move T
(safely). The
unsafe
here "although I am giving you &mut T
, you're not allowed to ever
move T
".unsafe fn map_unchecked_mut<U, F: FnOnce(&mut T) -> &mut U>(Pin<&'a mut T>, f: F) -> Pin<&'a mut U>
unsafe
here is basically at least the same as above, we're
handing out &mut T
safely (can't require an unsafe closure) which could
easily be used with mem::replace
. There's probably other unsafety here too
with the projection, but it seems reasonable that it's at least unsafe
because of that.fn get_mut(Pin<&'a mut T>) -> &'a mut T where T: Unpin
Unpin
a type says "Pin<&mut T>
has no guarantees, it's
just a newtype wrapper of &mut T
". As a result, with no guarantees to
uphold, we can return &mut T
safelyimpl<P: Deref> Deref for Pin<P> { type Target = P::Target }
as_ref
followed by get_ref
, so the
safety of this impl follows from the above.impl<P: DerefMut> DerefMut for Pin<P> where T::Target: Unpin { }
as_mut
followed by get_mut
, so the
safety of this impl follows from the above.impl<T: ?Sized> Unpin for Box<T>
(and other pointer-related implementations)
Box<T>
would implement the Unpin
trait
only if T
implemented Unpin
. This implementation here codifies that even
if T
explicitly doesn't implement Unpin
, Box<T>
implements Unpin
.Question: What's an example for what this is fundamentally empowering?
For example, what safe could would otherwise require unsafe
if this impl
did not exist.
@Matthias247 you can't safely get &mut T from Pin<P
> if T isn't Unpin because then you could mem::swap to move T out of the Pin, which would defeat the purpose of pinning things.
Thanks! Yes, since swap
is no unsafe method this is obviously problematic. But could that be fixed by adding an Unpin
bound to swap()
? Since all code up to now should be Unpin
or is unsafe anyway, this shouldn't break anything.
I guess one of the things that still confuses me most is that Pin<T>
codifies multiple guarantees: That the address of the T is stable, as well as some guarantees about it's internal state (it's kind of frozen/immutable in some situations, but also not really).
It seemed to me like moving the unsafe code/projections only to the places where further Pin
based calls are needed (e.g. in poll
s on fields) might be preferable to dealing with those unsafe projections in all cases. However I now also realized that there is another issue with that: Code that has access to the mutable reference can move the field freely around, and when drop()
is then safely called on this field, it might break due to it's address being stored somewhere else. For that to the fixed the drop(Pin<T>)
overload that was talked about would be required.
@RalfJung Thanks for the explanations! I agree on the fact that I certainly tried to do something unsafe in general, and therefore it should be fine to require the extra understanding from me. I'm more concerned about people who want to write more generally safe future combinators, but now also might get confronted with all of those terms. If they can write combinators without having to understand Unpin
and pin projections at all, and then only get combinators which work in a reduced fashion (only on Unpin
futures), that seems preferable. Since I haven't tried it I can't say if it's currently the case. I think one still needs to add at least Unpin
bounds manually.
I also understand the fact that non-moveable types are different from the pin type. However I am currently more focused on use-cases than the difference. And for the use-case of intrusive collections non-moveable types work very well without introducing too much complexity. For futures obviously some research would be required, since they way from a moveable to a non-moveable type is missing. If that way would be not more ergonomic than the Pin APIs, there would also be no win.
Adding T: Unpin
to mem::swap
would mean it couldn't be used on certain types even when they're not inside a Pin.
But could that be fixed by adding an Unpin bound to swap()? Since all code up to now should be Unpin or is unsafe anyway, this shouldn't break anything.
That would break all generic code: If you write a function in today's stable rust for some T
without constraints, you can call swap
on it. This has to keep working.
non-moveable types work very well without introducing too much complexity.
Nobody has demonstrated a way to add non-movable types to Rust in a backwards-compatible way without complexity significantly exceeding what is being proposed for stabilization here. You should find some of these old proposals and discussions in the RFC repo. See https://github.com/rust-lang/rfcs/pull/1858 for one example.
The RFC at https://github.com/rust-lang/rfcs/pull/2349 as well as boat's blog series starting here should help to give you some background and some impression of the other designs that were considered. (Also notice the dates, this design has been in the works for almost 10 months!)
Also mem::swap is a red herring, because it isn't at all an interesting function. It's literally just
let temp = *a;
*a = *b;
*b = temp;
@Gankro doesn't it use unsafe code? Afaik it's not possible to write that literally.
Edit: I guess another way to think about this is that adding T: Unpin
to mem::swap
is actually changing the definition of safety at the language level. It would break all the mycrate::swap
fns in the wild.
Adding T: Unpin to mem::swap would mean it couldn't be used on certain types even when they're not inside a Pin.
If Unpin is automatically derived (in the same fashion as Sync
/Send
are) then I thought this shouldn't be an issue.
That would break all generic code: If you write a function in today's stable rust for some T without constraints, you can call swap on it. This has to keep working.
But this one obviously is. Didn't thought about the fact that trait bounds need to be explicitly propagated in Rust.
Nobody has demonstrated a way to add non-movable types to Rust in a backwards-compatible way without complexity significantly exceeding what is being proposed for stabilization here. You should find some of these old proposals and discussions in the RFC repo. See rust-lang/rfcs#1858 for one example.
Thanks, I will read up a bit more on the previous work on this if I find some time. Obviously lots of thoughts and effort already have went into this, and I certainly don't want to block things. I just wanted to provide my concerns and questions when trying to work with this.
@Matthias247
If Unpin is automatically derived (in the same fashion as Sync/Send are) then I thought this shouldn't be an issue.
I'm not sure if I was clear. To clarify, it's perfectly safe to move around a !Unpin
type and thus perfectly safe to mem::swap
them around. It's only unsafe to move a type T: !Unpin
after it's pinned, i.e. inside of a Pin<P<T>>
.
For example, in async/await code, you can move around futures returned from an async function as much as you want. You only stop being able to move them once you pin_mut!
them or put them in a Pin<Box<..>>>
or whatnot.
I have some assorted questions about this:
Given the importance of Pin<T>
for async
and for generators and potential for unsoundness when interacting with other parts of Rust (e.g. swap
& replace
), has any formal verification (e.g. by @jhjourdan or @RalfJung) been done of the variant of the pinning API that is proposed here?
Does this API give any new guarantees about the abstract machine / operational semantics of Rust? I.e., if we forget about the use case as a backing mechanism for async
& await
/ generators, could we put this inside a crate in the ecosystem and it would just work given the current guarantees we give?
What sort of APIs or type system additions become impossible as a consequence of stabilizing the pinning API? (this question is an extension of 2.)
What avenues does the to-be-stabilized API provide in terms of evolving it a language provided &pin T
type to improve upon field projection and such (which doesn't seem great with the proposed API).
I have some assorted notes:
The documentation in the standard library seems quite sparse. :(
I agree with others that the pinning construct is quite mentally taxing / complex.
The Unmovable
example in the documentation seems overly complicated and involves unsafe
; this seems sub-optimal. With gradual initialization as elaborated on in a draft RFC in the language (i.e. improving upon NLL) could instead give us:
struct Unmovable<'a> {
data: String,
slice: &'a str,
}
let um: Unmovable<'_>;
um.data = "hello".to_string();
um.slice = &um.data; // OK! we borrow self-referentially.
drop(um); // ERROR! `um.slice` is borrowing `um.data` so you cannot move `um`.
// You won't be able to take a &mut reference to `um` so no `swap` problems.
This involves zero unsafe and it's quite easy for the user to deal with this.
Additionally, the std APIs provide no safe way to pin objects to the stack. This is because there is no way to implement that safely using a function API.
What about API like this?
pub fn using_pin<T, R, F>(value: T, f: F) -> R
where F: for<'a> FnOnce(Pin<&'a mut T>) -> R {
pin_mut!(value); // Actual implementation inlines this but the point is this API is safe as long as pin_mut! is safe.
f(value)
}
I’ve not been following along with the development of the pinning APIs too closely, so this could have been mentioned or explained elsewhere and I’ve been unable to find it, apologies if that is the case:
The
Pinned
type is a ZST which does not implementUnpin
; it allows you to supress the auto-implementation ofUnpin
on stable, where!Unpin
impls would not be stable yet.
Is there an explanation anywhere about why !Unpin
impls couldn’t be made stable?
It is only safe if Foo (the containing type) is not repr(packed), because this causes fields to be moved around to realign them.
Does packed mean that fields can be moved dynamically? That is a bit scary. Are we sure that llvm will never generate code to move fields in other circumstances? Likewise is it possible that pinned values on the stack might be moved by llvm?
Despite the subtlety, this seems like a really nice API. Good work!
@Centril Ralf has written about pinning on his blog.
The pin APIs involved no language changes at all, and are entirely implemented in the standard library using pre-existing language features. It has no impact on Rust the language and doesn't foreclose on any other language feature.
Pin
is really just a clever implementation of one of Rust's most valuable and classic features: the ability to introduce invariants to an API by marking part of the API unsafe
. Pin
wraps a pointer to make one operation (DerefMut
) unsafe, requiring that people who do it uphold certain invariants (that they don't move out of the reference), and allowing other code to assume that this will never occur. A similar, much older example of this same trick is String
, which makes it unsafe to put non-UTF8 bytes into a string, allowing other code to assume that all the data in the String
is UTF8.
Is there an explanation anywhere about why !Unpin impls couldn’t be made stable?
Negative impls are currently unstable, totally unrelated to these APIs.
Negative impls are currently unstable.
🤦♂️ That makes sense, should have thought about that for a little bit longer. Thanks.
@alexcrichton That's some great analysis of this API, we should try to preserve it some place better than some comment that will get lost!
Some comments:
as_mut
: Question: what about a "malicious" DerefMut impl? This is a safey way to call a user-provided DerefMut which crates &mut P::Target natively, presumably allowing it to modify it as well. How's this safe?
Basically, when you call new_unchecked
, you make a promise about the Deref
and DerefMut
implementations for this type.
set
: Given Pin(and the fact that we don't know anything about Unpin), shouldn't this guarantee that P::Target never moves? If we can reinitialize it with another P::Target, though, shouldn't this be unsafe? Or is this somehow related to destructors and all that?
This drops the old content of the pointer, and puts new content in there. "Pinned" doesn't mean "never dropped", it means "never moved until dropped". So Dropping it or overwriting it while calling drop
is fine. Calling drop
is crucial, that's the drop guarantee I mentioned above.
Where do you see something moving here?
map_unchecked
: Question:: what's the counter-example here? If this were safe, what's the example that shows violating the guarantees of Pin?
An example would be starting with a Pin<&&T>
and using this method to get a Pin<&T>
. Pinning does not "propagate through" references.
get_ref
: One "maybe gotcha" here is interior mutability, what if T were RefCell?
Indeed, this is a gotcha, but as you observed not a soundness problem. What would be unsound is having a method that goes from Pin<RefCell<T>>
to Pin<&[mut] T>
. Basically, what happens is that RefCell
does not propagate pinning, we could impl<T> Unpin for RefCell<T>
.
has any formal verification (e.g. by @jhjourdan or @RalfJung) been done of the variant of the pinning API that is proposed here?
No, not from our side. The blog posts I mentioned above contain some thoughts for how I'd start formalizing this, but we haven't actually gone through and done it. If you give me a time machine or an interested PhD student we'll do it. ;)
if we forget about the use case as a backing mechanism for async & await / generators, could we put this inside a crate in the ecosystem and it would just work given the current guarantees we give?
That is the intention.
What sort of APIs or type system additions become impossible as a consequence of stabilizing the pinning API? (this question is an extension of 2.)
Uh, I have no idea how to answer that question with any confidence. The space of possible additions is just too large and also too high-dimensional that I'd dare making any kind of universal statement about it.
What avenues does the to-be-stabilized API provide in terms of evolving it a language provided &pin T type to improve upon field projection and such (which doesn't seem great with the proposed API).
I am not sure any more about &pin T
, it doesn't match very well the new generic Pin<T>
. In terms of handling projections, we'd need some hack to say "Unpin
and Drop
have not been implemented for this type", then we could do this safely with a macro. For additional ergonomics, we'd likely want a generic "field projection" capability in the language, one that would also cover going from &Cell<(A, B)>
to &Cell<A>
.
Is there an explanation anywhere about why !Unpin impls couldn’t be made stable?
AFAIK negative impls have some long-standing limitations, and if you add generic bounds to them they will often not work the way you think they should. (Generic bounds sometimes just get ignored, or so.)
Maybe Chalk fixes all of this, maybe just some of it, but either way we'd probably not want to block this on Chalk.
Does packed mean that fields can be moved dynamically? That is a bit scary.
It is the only sound way to call drop
on a packed field, given that drop
expects an aligned reference.
Are we sure that llvm will never generate code to move fields in other circumstances? Likewise is it possible that pinned values on the stack might be moved by llvm?
LLVM copying bytes around should be no problem, as it cannot change program behavior. This is about a higher-level concept of "moving" data, in a way that is observable in Rust (e.g. because pointers to the data no longer point to it). LLVM cannot just move data elsewhere that we may have pointers to. Likewise, LLVM cannot just move values on the stack that have their address taken.
Ok thanks @RalfJung, makes sense! Some follow-up questions...
When you say "never moved until dropped", this means that Drop
may be called where &mut self
has moved from the address of Pin<&mut Self>
? Destructors can't rely on interior pointers being accurate, right?
My worry when looking at set
was how panics would be handled, but I think that this is a non-issue after digging into the codegen a bit more.
When you say "never moved until dropped", this means that
Drop
may be called where&mut self
has moved from the address ofPin<&mut Self>
? Destructors can't rely on interior pointers being accurate, right?
@alexcrichton my understanding was that means never moved until after Drop::drop
returns. Otherwise some usecases (intrusive collections and stack allocated DMA buffers at least) become impossible.
Destructors can rely on something never having been moved if they can prove it was previously in a Pin
. For example if a state machine can only enter a state through an API which requires it to be pinned, the destructor can assume that it was pinned before it was dropped if it is in that state.
Code cannot assume that nonlocal destructors do not move members, but obviously you can assume your own destructors don't move things around because you're the one writing them.
When you say "never moved until dropped", this means that Drop may be called where &mut self has moved from the address of Pin<&mut Self>? Destructors can't rely on interior pointers being accurate, right?
I mean that the data will never move (in the sense of not going anywhere else, including not being deallocated) until drop
was called. And yes, drop
can rely on running on the pinned location, even though its type cannot express that. drop
should take Pin<&mut self>
(for all types), but alas, too late for that.
After drop
was called, the data is just meaningless bytes, you may do anything with them: Put new stuff in there, deallocate, whatever.
This allows, for example, an intrusive linked list where the destructor of an element deregisters it by adjusting the neighboring pointers. We know the memory will not go away without that destructor being called. (The element can still be leaked, and then it will just stay in that linked list forever. But in that case it will remain valid memory so there is no safety issue.)
I've been reading through all I can find on Pin
, Unpin
, and the discussion leading up to it all, and while I think I now understand what's going on, there is also a lot of subtlety around what the different types mean, as well as the contract that implementors and users must follow. Sadly, I see relatively little discussion of that in the docs for std::pin
or on Pin
and Unpin
specifically. In particular, I'd love to see some of the content from these comments:
incorporated into the docs. Specifically:
Pin
only protects "one level deep".Pin::new_unchecked
places restrictions on the impl for Deref
and DerefMut
.Pin
and Drop
.!Unpin
is only not allowed to move once placed in a Pin
.Pin<Box<T>>: Unpin where T: !Unpin
. This ties back to the "one level deep" restriction above, but I think this concrete example with a proper explanation would be helpful to readers.I found @alexcrichton's counter-examples pretty helpful too. Giving the reader an idea of what can go wrong for the different unsafe
methods beyond just prose I think would help a lot (at least it certainly did for me). In general, due to the subtlety of this API, I would like to see the Safety sections of the various unsafe methods expanded, and possibly also referenced from the std::pin
module-level docs.
I haven't used this API much yet myself, so I'll trust the technical judgement that its in a good state now. However, I think the names Pin
, Pinned
and Unpin
are too similar and inexpressive for what are very different types/traits and a relatively complex API, which makes understanding it harder (as evidenced by a few comments on this thread).
They seem to follow the naming conventions for marker traits, so I can't really complain, but I wonder if we could make a tradeoff between verbosity and self-explaining names:
Pinned
- kind of confusing because its the same word as Pin
. Given that its used like PhantomData
to augment a struct with meta information that would be missing otherwise, how about PinnedData
, PhantomPinned
, PhantomSelfRef
or even DisableUnpin
? Something that indicates usage pattern and/or the effect it is going to have.Unpin
- Like in https://github.com/rust-lang/rust/issues/55766#issuecomment-437266922, I'm confused by the name Unpin
, because "un-pin" can be understood in multiple ambiguous ways. Something like IgnorePin
, PinNeutral
maybe?Generally I'm failing and finding good alternative names myself though...
PhantomPin and PinNeutral strike me as particularly nice names.
Just to provide a counterpoint, I've found Unpin
intuitive (once I understood it). Pinned
is harder to keep straight, granted I haven't used it in my own code. What about changing Pinned
to NotUnpin
?
@rfcbot fcp merge Feature name:
pin
Stabilization target: 1.32.0 Tracking issue: #49150 Related RFCs: rust-lang/rfcs#2349This is a proposal to stabilize the
pin
library feature, making the "pinning" APIs for manipulating pinned memory usable on stable.(I've tried to write this proposal as a comprehensive "stabilization report.")
Stabilized feature or APIs
[std|core]::pin::Pin
This stabilizes the
Pin
type in thepin
submodule ofstd
/core
.Pin
is a fundamental, transparent wrapper around a generic typeP
, which is intended to be a pointer type (for example,Pin<&mut T>
andPin<Box<T>>
are both valid, intended constructs). ThePin
wrapper modifies the pointer to "pin" the memory it refers to in place, preventing the user from moving objects out of that memory.The usual way to use the
Pin
type is to construct a pinned variant of some kind of owning pointer (Box
,Rc
, etc). The std library owning pointers all provide apinned
constructor which returns this. Then, to manipulate the value inside, all of these pointers provide a way to degrade towardPin<&T>
andPin<&mut T>
. Pinned pointers can deref, giving you back&T
, but cannot safely mutably deref: this is only possible using the unsafeget_mut
function.As a result, anyone mutating data through a pin will be required to uphold the invariant that they never move out of that data. This allows other code to safely assume that the data is never moved, allowing it to contain (for example) self references.
The
Pin
type will have these stabilized APIs:impl<P> Pin<P> where P: Deref, P::Target: Unpin
fn new(pointer: P) -> Pin<P>
impl<P> Pin<P> where P: Deref
unsafe fn new_unchecked(pointer: P) -> Pin<P>
fn as_ref(&self) -> Pin<&P::Target>
impl<P> Pin<P> where P: DerefMut
fn as_mut(&mut self) -> Pin<&mut P::Target>
fn set(&mut self, P::Target);
impl<'a, T: ?Sized> Pin<&'a T>
unsafe fn map_unchecked<U, F: FnOnce(&T) -> &U>(self, f: F) -> Pin<&'a U>
fn get_ref(self) -> &'a T
impl<'a, T: ?Sized> Pin<&'a mut T>
fn into_ref(self) -> Pin<&'a T>
unsafe fn get_unchecked_mut(self) -> &'a mut T
unsafe fn map_unchecked_mut<U, F: FnOnce(&mut T) -> &mut U>(self, f: F) -> Pin<&'a mut U>
impl<'a, T: ?Sized> Pin<&'a mut T> where T: Unpin
fn get_mut(self) -> &'a mut T
Trait impls
Most of the trait impls on
Pin
are fairly rote, these two are important to its operation:impl<P: Deref> Deref for Pin<P> { type Target = P::Target }
impl<P: DerefMut> DerefMut for Pin<P> where P::Target: Unpin { }
std::marker::Unpin
Unpin is a safe auto trait which opts out of the guarantees of pinning. If the target of a pinned pointer implements
Unpin
, it is safe to mutably dereference to it.Unpin
types do not have any guarantees that they will not be moved out of aPin
.This makes it as ergonomic to deal with a pinned reference to something that does not contain self-references as it would be to deal with a non-pinned reference. The guarantees of
Pin
only matter for special case types like self-referential structures: those types do not implementUnpin
, so they have the restrictions of thePin
type.Notable implementations of
Unpin
in std:impl<'a, T: ?Sized> Unpin for &'a T
impl<'a, T: ?Sized> Unpin for &'a mut T
impl<T: ?Sized> Unpin for Box<T>
impl<T: ?Sized> Unpin for Rc<T>
impl<T: ?Sized> Unpin for Arc<T>
These codify the notion that pinnedness is not transitive across pointers. That is, a
Pin<&T>
only pins the actual memory block represented byT
in a place. Users have occassionally been confused by this and expected that a type likePin<&mut Box<T>>
pins the data ofT
in place, but it only pins the memory the pinned reference actually refers to: in this case, theBox
's representation, which a pointer into the heap.std::marker::Pinned
The
Pinned
type is a ZST which does not implementUnpin
; it allows you to supress the auto-implementation ofUnpin
on stable, where!Unpin
impls would not be stable yet.Smart pointer constructors
Constructors are added to the std smart pointers to create pinned references:
Box::pinned(data: T) -> Pin<Box<T>>
Rc::pinned(data: T) -> Pin<Rc<T>>
Arc::pinned(data: T) -> Pin<Arc<T>>
Notes on pinning & safety
Over the last 9 months the pinning APIs have gone through several iterations as we have investigated their expressive power and also the soundness of their guarantees. I would now say confidently that the pinning APIs stabilized here are sound and close enough to the local maxima in ergonomics and expressiveness; that is, ready for stabilization.
One of the trickier issues of pinning is determining when it is safe to perform a pin projection: that is, to go from a
Pin<P<Target = Foo>>
to aPin<P<Target = Bar>>
, whereBar
is a field ofFoo
. Fortunately, we have been able to codify a set of rules which can help users determine if such a projection is safe:(Foo: Unpin) implies (Bar: Unpin)
: that is, if it is never the case thatFoo
(the containing type) isUnpin
whileBar
(the projected type) is notUnpin
.Bar
is never moved during the destruction ofFoo
, meaning that eitherFoo
has no destructor, or the destructor is carefully checked to make sure that it never moves out of the field being projected to.Foo
(the containing type) is notrepr(packed)
, because this causes fields to be moved around to realign them.Additionally, the std APIs provide no safe way to pin objects to the stack. This is because there is no way to implement that safely using a function API. However, users can unsafely pin things to the stack by guaranteeing that they never move the object again after creating the pinned reference.
The
pin-utils
crate on crates.io contains macros to assist with both stack pinning and pin projection. The stack pinning macro safely pins objects to the stack using a trick involving shadowing, whereas a macro for projection exists which is unsafe, but avoids you having to write the projection boilerplate in which you could possibly introduce other incorrect unsafe code.Implementation changes prior to stabilization
Unpin
from the prelude, removepin::Unpin
re-exportAs a general rule, we don't re-export things from multiple places in std unless one is a supermodule of the real definition (e.g. shortening
std::collections::hash_map::HashMap
tostd::collections::HashMap
). For this reason, the re-export ofstd::marker::Unpin
fromstd::pin::Unpin
is out of place.At the same time, other important marker traits like Send and Sync are included in the prelude. So instead of re-exporting
Unpin
from thepin
module, by putting in the prelude we make it unnecessary to importstd::marker::Unpin
, the same reason it was put intopin
.Currently, a lot of the associated function of
Pin
do not use method syntax. In theory, this is to avoid conflicting with derefable inner methods. However, this rule has not been applied consistently, and in our experience has mostly just made things more inconvenient. Pinned pointers only implement immutable deref, not mutable deref or deref by value, limiting the ability to deref anyway. Moreover, many of these names are fairly unique (e.g.map_unchecked
) and unlikely to conflict.Instead, we prefer to give the
Pin
methods their due precedence; users who need to access an interior method always can using UFCS, just as they would be required to to access the Pin methods if we did not use method syntax.get_mut_unchecked
toget_unchecked_mut
The current ordering is inconsistent with other uses in the standard library.
impl<P> Unpin for Pin<P>
This impl is not justified by our standard justification for unpin impls: there is no pointer direction between
Pin<P>
andP
. Its usefulness is covered by the impls for pointers themselves.This futures impl will need to change to add a
P: Unpin
bound.Pin
asrepr(transparent)
Pin should be a transparent wrapper around the pointer inside of it, with the same representation.
Connected features and larger milestones
The pin APIs are important to safely manipulating sections of memory which can be guaranteed not to be moved out. If the objects in that memory do not implement
Unpin
, their address will never change. This is necessary for creating self-referential generators and asynchronous functions. As a result, thePin
type appears in the standard libraryfuture
APIs and will soon appear in the APIs for generators as well (#55704).Stabilizing the
Pin
type and its APIs is a necessary precursor to stabilizing theFuture
APIs, which is itself a necessary precursor to stabilizing theasync/await
syntax and moving the entirefutures 0.3
async IO ecosystem onto stable Rust.cc @cramertj @RalfJung