rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
94.82k stars 12.22k forks source link

[Stabilization] Pin APIs #55766

Closed withoutboats closed 5 years ago

withoutboats commented 5 years ago

@rfcbot fcp merge Feature name: pin Stabilization target: 1.32.0 Tracking issue: #49150 Related RFCs: rust-lang/rfcs#2349

This is a proposal to stabilize the pin library feature, making the "pinning" APIs for manipulating pinned memory usable on stable.

(I've tried to write this proposal as a comprehensive "stabilization report.")

Stabilized feature or APIs

[std|core]::pin::Pin

This stabilizes the Pin type in the pin submodule of std/core. Pin is a fundamental, transparent wrapper around a generic type P, which is intended to be a pointer type (for example, Pin<&mut T> and Pin<Box<T>> are both valid, intended constructs). The Pin wrapper modifies the pointer to "pin" the memory it refers to in place, preventing the user from moving objects out of that memory.

The usual way to use the Pin type is to construct a pinned variant of some kind of owning pointer (Box, Rc, etc). The std library owning pointers all provide a pinned constructor which returns this. Then, to manipulate the value inside, all of these pointers provide a way to degrade toward Pin<&T> and Pin<&mut T>. Pinned pointers can deref, giving you back &T, but cannot safely mutably deref: this is only possible using the unsafe get_mut function.

As a result, anyone mutating data through a pin will be required to uphold the invariant that they never move out of that data. This allows other code to safely assume that the data is never moved, allowing it to contain (for example) self references.

The Pin type will have these stabilized APIs:

impl<P> Pin<P> where P: Deref, P::Target: Unpin

impl<P> Pin<P> where P: Deref

impl<P> Pin<P> where P: DerefMut

impl<'a, T: ?Sized> Pin<&'a T>

impl<'a, T: ?Sized> Pin<&'a mut T>

impl<'a, T: ?Sized> Pin<&'a mut T> where T: Unpin

Trait impls

Most of the trait impls on Pin are fairly rote, these two are important to its operation:

std::marker::Unpin

Unpin is a safe auto trait which opts out of the guarantees of pinning. If the target of a pinned pointer implements Unpin, it is safe to mutably dereference to it. Unpin types do not have any guarantees that they will not be moved out of a Pin.

This makes it as ergonomic to deal with a pinned reference to something that does not contain self-references as it would be to deal with a non-pinned reference. The guarantees of Pin only matter for special case types like self-referential structures: those types do not implement Unpin, so they have the restrictions of the Pin type.

Notable implementations of Unpin in std:

These codify the notion that pinnedness is not transitive across pointers. That is, a Pin<&T> only pins the actual memory block represented by T in a place. Users have occassionally been confused by this and expected that a type like Pin<&mut Box<T>> pins the data of T in place, but it only pins the memory the pinned reference actually refers to: in this case, the Box's representation, which a pointer into the heap.

std::marker::Pinned

The Pinned type is a ZST which does not implement Unpin; it allows you to supress the auto-implementation of Unpin on stable, where !Unpin impls would not be stable yet.

Smart pointer constructors

Constructors are added to the std smart pointers to create pinned references:

Notes on pinning & safety

Over the last 9 months the pinning APIs have gone through several iterations as we have investigated their expressive power and also the soundness of their guarantees. I would now say confidently that the pinning APIs stabilized here are sound and close enough to the local maxima in ergonomics and expressiveness; that is, ready for stabilization.

One of the trickier issues of pinning is determining when it is safe to perform a pin projection: that is, to go from a Pin<P<Target = Foo>> to a Pin<P<Target = Bar>>, where Bar is a field of Foo. Fortunately, we have been able to codify a set of rules which can help users determine if such a projection is safe:

  1. It is only safe to pin project if (Foo: Unpin) implies (Bar: Unpin): that is, if it is never the case that Foo (the containing type) is Unpin while Bar (the projected type) is not Unpin.
  2. It is only safe if Bar is never moved during the destruction of Foo, meaning that either Foo has no destructor, or the destructor is carefully checked to make sure that it never moves out of the field being projected to.
  3. It is only safe if Foo (the containing type) is not repr(packed), because this causes fields to be moved around to realign them.

Additionally, the std APIs provide no safe way to pin objects to the stack. This is because there is no way to implement that safely using a function API. However, users can unsafely pin things to the stack by guaranteeing that they never move the object again after creating the pinned reference.

The pin-utils crate on crates.io contains macros to assist with both stack pinning and pin projection. The stack pinning macro safely pins objects to the stack using a trick involving shadowing, whereas a macro for projection exists which is unsafe, but avoids you having to write the projection boilerplate in which you could possibly introduce other incorrect unsafe code.

Implementation changes prior to stabilization

As a general rule, we don't re-export things from multiple places in std unless one is a supermodule of the real definition (e.g. shortening std::collections::hash_map::HashMap to std::collections::HashMap). For this reason, the re-export of std::marker::Unpin from std::pin::Unpin is out of place.

At the same time, other important marker traits like Send and Sync are included in the prelude. So instead of re-exporting Unpin from the pin module, by putting in the prelude we make it unnecessary to import std::marker::Unpin, the same reason it was put into pin.

Currently, a lot of the associated function of Pin do not use method syntax. In theory, this is to avoid conflicting with derefable inner methods. However, this rule has not been applied consistently, and in our experience has mostly just made things more inconvenient. Pinned pointers only implement immutable deref, not mutable deref or deref by value, limiting the ability to deref anyway. Moreover, many of these names are fairly unique (e.g. map_unchecked) and unlikely to conflict.

Instead, we prefer to give the Pin methods their due precedence; users who need to access an interior method always can using UFCS, just as they would be required to to access the Pin methods if we did not use method syntax.

The current ordering is inconsistent with other uses in the standard library.

This impl is not justified by our standard justification for unpin impls: there is no pointer direction between Pin<P> and P. Its usefulness is covered by the impls for pointers themselves.

This futures impl will need to change to add a P: Unpin bound.

Pin should be a transparent wrapper around the pointer inside of it, with the same representation.

Connected features and larger milestones

The pin APIs are important to safely manipulating sections of memory which can be guaranteed not to be moved out. If the objects in that memory do not implement Unpin, their address will never change. This is necessary for creating self-referential generators and asynchronous functions. As a result, the Pin type appears in the standard library future APIs and will soon appear in the APIs for generators as well (#55704).

Stabilizing the Pin type and its APIs is a necessary precursor to stabilizing the Future APIs, which is itself a necessary precursor to stabilizing the async/await syntax and moving the entire futures 0.3 async IO ecosystem onto stable Rust.

cc @cramertj @RalfJung

withoutboats commented 5 years ago

@rfcbot fcp merge

rfcbot commented 5 years ago

Team member @withoutboats has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

alexcrichton commented 5 years ago

Thanks for the detailed writeup here @withoutboats! I've also sort of been historically confused by the various guarantees of Pin, and I've currently got a few questions about the APIs being stabilized here wrt the safety guarantees. To help sort this out in my own head though I figured I'd try to write these things down.

In trying to start writing this down though I keep running up against a wall of "what is Unpin?" I'm sort of confused by what that is and the various guarantees around it. Can you say again what it means for T to and also to not implement Unpin? Also, if Unpin is a safe trait to implement it naively seems like it could be used to easily undermine the unsafe guarantees of Pin<T>, but I'm surely missing something

tinaun commented 5 years ago

if i have got this correct, every type that is not self referential (ie: not a generator) is Unpin

Nemo157 commented 5 years ago

It's not just self-referentiality, there are some other use-cases for stable memory addresses that Pin can also support. They are relatively few and far-between though.

How I understand Unpin being safe to implement is that by implementing it you may violate an invariant required of other unsafe code you have written (crucially, only you may write this unsafe code, no external code can be relying on whether you have implemented Unpin). There is nothing you can do with Pin's safe API that will cause unsoundness whether or not you have implemented Unpin. By opting in to using some of Pin's unsafe API you are guaranteeing that you will only implement Unpin when it is safe to do so. That is covered by point 1 of the "Notes on pinning & safety" section above.

alexcrichton commented 5 years ago

Hm I still don't really understand Unpin. I'm at first just trying to understand what it means to implement or not to impelment Unpin.

First off, it's probably helpful to know what types implement Unpin automatically! It's mentioned above that common pointer types (Arc/Rc/Box/references) implement Unpin, but I think that's it? If this is an auto trait, does that mean that a type MyType automatically implements Unpin if it only contains pointers? Or do no other types automatically implement Unpin?

I keep trying to summarize or state what Unpin guarantees and such, but I'm finding it really difficult to do so. Can someone help out by reiterating again what it means to implement Unpin as well as what it means to not implement Unpin?

I think I understand the guarantees of Pin<P> where you can't move out of any of the inline members of P::Target, but is that right?

withoutboats commented 5 years ago

@alexcrichton Thanks for the questions, I'm sure the pinning APIs can be a bit confusing for people who haven't been part of the group focusing on them.

First off, it's probably helpful to know what types implement Unpin automatically!

Unpin is an auto trait like Send or Sync, so most types implement it automatically. Generators and the types of async functions are !Unpin. Types that could contain a generator or an async function body (in other words, types with type parameters) are also not automatically Unpin unless their type parameters are.

The explicit impls for pointer types is to make the Unpin even if their type parameters are not. The reason for this will hopefully be clearer by the end of this comment.

Can someone help out by reiterating again what it means to implement Unpin as well as what it means to not implement Unpin?

Here is the sort of fundamental idea of the pinning APIs. First, given a pointer type P, Pin<P> acts like P except that unless P::Target implements Unpin, it is unsafe to mutably dereference Pin<P>. Second, there are two basic invariants unsafe code related to Pin has to maintain:

The implication of all of this is that if you construct a Pin<P>, the value pointed to by P will never move again, which is the guarantee we need for self-referential structs as well as intrusive collections. But you can opt out of this guarantee by just implementing Unpin for your type.

So if you implement Unpin for a type, you're saying that the type opts out of any additional safety guarantees of Pin - its possible to mutably dereference the pointers pointing to it. This means you're saying the type does not need to be immovable to be used safely.

Moving a pointer type like Rc<T> doesn't move T because T is behind a pointer. Similarly, pinning a pointer to an Rc<T> (as in Pin<Box<Rc<T>>) doesn't actually pin T, it only pins that particular pointer. This is why anything which keeps its generics behind a pointer can implement Unpin even when their generics don't.

withoutboats commented 5 years ago

Also, if Unpin is a safe trait to implement it naively seems like it could be used to easily undermine the unsafe guarantees of Pin, but I'm surely missing something

This was one of the trickiest parts of the pinning API, and we got it wrong at first.

Unpin means "even once something has been put into a pin, it is safe to get a mutable reference to it." There is another trait that exists today that gives you the same access: Drop. So what we figured out was that since Drop is safe, Unpin must also be safe. Does this undermine the entire system? Not quite.

To actually implement a self-referential type would require unsafe code - in practice, the only self-referential types anyone cares about are those that the compiler generates for you: generators and async function state machines. These explicitly say they don't implement Unpin and they don't have a Drop implementation, so you know, for these types, once you have a Pin<&mut T>, they will never actually get a mutable reference, because they're an anonymous type that we know doesn't implement Unpin or Drop.

The problem emerges once you have a struct containing one of these anonymous types, like a future combinator. In order go from a Pin<&mut Fuse<Fut>> to a Pin<&mut Fut>, you have to perform a "pin projection." This is where you can run into trouble: if you pin project to the future field of a combinator, but then implement Drop for the combinator, you can move out of a field that is supposed to be pinned.

For this reason, pin projection is unsafe! In order to perform a pin projection without violating the pinning invariants, you have to guarantee that you never do several things, which I listed in the stabilization proposal.

So, the tl;dr: Drop exists, so Unpin must be safe. But this doesn't ruin the whole thing, it just means that pin projection is unsafe and anyone who wants to pin project needs to uphold a set of invariants.

Matthias247 commented 5 years ago

generators and async function state machines. These explicitly say they don't implement Unpin and they don't have a Drop implementation, so you know, for these types, once you have a Pin<&mut T>, they will never actually get a mutable reference, because they're an anonymous type that we know doesn't implement Unpin or Drop.

Shouldn't an async state machine have a Drop implementation? Things that are on the "stack" of the async function (which probably equals fields in the state machine) need to get destructed when the async function completes or gets cancelled. Or does this happen otherwise?

SimonSapin commented 5 years ago

I guess what matters in this context is whether an impl Drop for Foo {…} item exists, which would run code with &mut Foo which could use for example mem::replace to "exfiltrate" and move the Foo value.

This is not the same as "drop glue", which can be called through ptr::drop_in_place. Drop glue for a given Foo type will call Drop::drop if it’s implemented, then recursively call the drop glue for each field. But those recursive calls never involve &mut Foo.

Nemo157 commented 5 years ago

Also, while a generator (and therefore an async state machine) has custom drop glue, it's just to drop the correct set of fields based on the current state, it promises to never move any of the fields during drop.

withoutboats commented 5 years ago

The terminology I use (though I don't think there's any standard): "drop glue" is the compiler generated recursive walking of fields, calling their destructors; "Drop implementation" is an implementation of the Drop trait, and "destructor" is the combination of the drop glue and the Drop implementation. The drop glue never moves anything around, so we only care about Drop implementations.

Gankra commented 5 years ago

Is someone on the hook to write a nomicon chapter on futures? It seems incredibly necessary given how subtle this is. In particular I think examples, especially examples of buggy/bad implementations and how they're unsound, would be illuminating.

cramertj commented 5 years ago

@Gankro Sure, you can put me down for that.

Matthias247 commented 5 years ago

Thanks for the explanation everyone!

I personally am super new to async Rust and the Pin APIs, but I have played a bit around with it during the last days (https://github.com/rust-lang-nursery/futures-rs/pull/1315 - where I tried to exploit pinning for intrusive collections in async code).

During experimentation I got some concerns with these APIs:

While getting things to compile, I often thought about C++, where most of these issues are avoided: Types can be declared unmovable by deleting the move constructor and assignment operator. When a type is not moveable, types which contain that type are also not. Thereby the property and requirements flow natively through the type hierarchy and get checked by the compiler, instead of forwarding the property inside some calls (not all, e.g. not drop()). I think this is a lot easier to understand. But maybe not applicable to Rust, since Futures must currently often be moved before something starts polling them? But on the other hand that might be fixable with a new definition with that trait, or separating movement and polling phase.

Regarding Alex Unpin comment: I'm slowly grasping what Unpin means, but I agree that it's hard to understand. Maybe another name helps, but I can't find a concise one right now. Something along ThingsInsideDontBreakIfObjectGetsMoved, DoesntRequireStableAddress, PinDoesntMatter, etc.

What I haven't yet fully understood is, why getting &mut self from Pin<&mut self> can't be safe for all types. If Pin would just mean the address of the object itself is stable, then this should hold true for most typical Rust types. It seems there is another concern integrated into Pin, which is that also self-referential types inside it don't break. For those types manipulating them after having a Pin is always unsafe. But I think in most cases those will be compiler-generated, and unsafety or even raw pointers there should be fine. For a manually future combinators that need to forward pinned calls to subfields there would obviously need to be unsafe calls to create pins to the fields before polling those. But in that case I see the unsafety more related to the place where it actually matters (is the pin still valid?) than to accessing other fields for which it doesn't matter at all.

Another thought that I had is whether it's possible to get a simpler system if some limitations are applied. E.g. if things that require pinning could only be used inside async methods and not with future combinators. Maybe that could simplify things to one of Pin or Unpin. Considering that for lots of code async/await methods with select/join support might be favorable to combinators, this might not be the hugest loss. But I haven't really put enough thoughts into whether this really helps.

On the positive side: Being able to write async/await code with "stack" borrows is pretty cool and much needed! And ability to use pinning for other use-cases, like intrusive collections, will help performance as well as targets like embedded systems or kernels. So I'm really looking forward to a solution on this.

glaebhoerl commented 5 years ago

Minor nit w.r.t. the stabilization report:

fn get_unchecked_mut(self) -> &'a mut T

I'm guessing this is actually an unsafe fn?

tikue commented 5 years ago

@Matthias247 you can't safely get &mut T from Pin<P<T>> if T isn't Unpin because then you could mem::swap to move T out of the Pin, which would defeat the purpose of pinning things.

One thing I have trouble explaining to myself is what makes Future fundamentally different from other traits such that Pin needs to be part of its API. I mean, I know intuitively it's because async/await requires pinning, but does that say something specifically about Futures that's different from, say, Iterators?

Could poll take &mut self and only implement Future for Pin<P> types or types that are Unpin?

Nemo157 commented 5 years ago

How can I access mutable fields in my method that takes a Pin as a parameter.

This actually makes me wonder whether there's a couple of missing methods on Pin,

impl<'a, T: ?Sized> Pin<&'a T> {
    fn map<U: Unpin, F: FnOnce(&T) -> &U>(self, f: F) -> Pin<&'a U>
}

impl<'a, T: ?Sized> Pin<&'a mut T> {
    fn map<U: Unpin, F: FnOnce(&mut T) -> &mut U>(self, f: F) -> Pin<&'a mut U>
}

These could be used in the pin_utils::unsafe_unpinned! macro.

I'm trying to work out why this macro claims to be unsafe. If we are !Unpin and have a Unpin field, why would it be unsafe to project to this field?

The only situation where I can see it being unsound is implementing a custom !Unpin type, taking a raw pointer to an Unpin field of self (and relying on it having a stable address/pointing to the same instance), then acquiring an &mut to the same field and passing it to an external function. This seems to fall under the same reasoning of why implementing Unpin is safe though, by taking a raw pointer to a field of a !Unpin you are opting out of being able to call some of the safe APIs.

To make the previous situation safer there could be a wrapper built on Pinned for marking which normally Unpin fields of a struct are actually !Unpin instead of just adding a Pinned to the struct as a whole:

pub struct MustPin<T: Unpin>(T, Pinned);

impl<T: Unpin> MustPin<T> {
    pub const fn new(t: T) -> Self { ... }
    pub fn get(self: Pin<&Self>) -> *const T { ... }
    pub fn get_mut(self: Pin<&mut Self>) -> *mut T { ... }
}

This all seems backwards compatible with the current API, it could remove some of the unsafety from futures-rs combinators (e.g. it would give safe access to extra fields like this one), but isn't necessary for the current usecases. We could probably experiment with some APIs like this for implementing custom !Unpin types (like intrusive collections) and add them later on.

withoutboats commented 5 years ago

@Nemo157 Those map functions are unsafe because I could mem::swap on the &mut T the function passes me before returning an &mut U. (well the mutable one is, the immutable one might not be unsafe)

EDIT: Also the pin-utils macro is different, unsafe_unpinned doesnt have to do with the target type being Unpin, its just an "unpinned projection" - a projection to a &mut Field. Its safe to use as long as you never pin project to that field, even if the field is !Unpin.

withoutboats commented 5 years ago

One thing I have trouble explaining to myself is what makes Future fundamentally different from other traits such that Pin needs to be part of its API. I mean, I know intuitively it's because async/await requires pinning, but does that say something specifically about Futures that's different from, say, Iterators?

There's no theoretical difference, but there are pragmatic ones. For one, Iterator is stable. But as a result, unless we figure out something very clever, you'll never be able to run a for loop over a self-referential generator without a double indirection, even though that should be completely safe without it (because the for loop will consume and never move the generator).

Another important pragmatic difference is that the code patterns between Iterator and Future are quite different. You can't go 10 minutes without wanting to borrow across an await point in a future, which is why on futures 0.1 you see these Arcs appearing all over the place so you can use the same variable in two different and_then calls. But we've gotten quite far without being able to express self-referential iterators at all, it's just not that important of a use case.

Nemo157 commented 5 years ago

Those map functions are unsafe because I could mem::swap on the &mut T the function passes me before returning an &mut U

Ah, damn, forgot about that part of it :frowning:

RalfJung commented 5 years ago

fn as_mut(&mut self) -> Pin<&mut P::Target> fn into_ref(self) -> Pin<&'a T>`

Should these be called as_pinned_mut and into_pinned_ref, to avoid confusing them with methods that actually return references?

Notable implementations of Unpin in std:

We also have impl<P> Unpin for Pin<P> {}. Should this maybe become impl<P: Unpin> Unpin for Pin<P> {}? For the types we use this has no effect and it seems a bit safer? After all, Pin itself does not add a ptr indirection, so the argument about pinning not being structural does not apply. Never mind, you have that on your list. ;)

Drop guarantee

I think we have to codify the Drop guarantee before stabilization, or else it will be too late:

It is illegal to invalidate the storage of a pinned object (the target of a Pin reference) without calling drop on the object.

"Invalidate" here can mean "deallocation", but also "repurposing": When changing x from Ok(foo) to Err(bar), the storage of foo gets invalidated.

This has the consequence, for example, that it becomes illegal to deallocate a Pin<Box<T>>, Pin<Rc<T>> or Pin<Arc<T>> without first calling drop::<T>.

Repurposing Deref, DerefMut

I still feel slightly uneasy about how this repurposes the Deref trait to mean "this is a smart pointer". We use Deref for other things as well, e.g. for "inheritance". That might be an anti-pattern, but it is still in fairly common use and, frankly, it is useful. :D

I think this is not a soundness issue though.

Understanding Unpin

Shameless plug: I wrote two blog posts on this, that might help or not depending on how much you enjoy reading (semi-)formal syntax. ;) The APIs have changed since then, but the basic idea has not.

Safe pin projections

@Matthias247 I think one problem you are running into is that building abstractions that involve pinning currently almost always requires unsafe code. Using those abstractions is fine, but e.g. defining a future combinator in safe code won't work. The reason for this is that "pin projections" have constraints that we could only safely check with compiler changes. For example, you ask

Why does my drop() method now again take &mut self, instead of a Pin.

Well, drop() is old -- it exists since Rust 1.0 -- so we cannot change it. We'd love to just make it take Pin<&mut Self>, and then Unpin types could get their &mut like they do now, but that's a non-backwards-compatible change.

To make writing future combinators safe, we'd need safe pin projections, and for that we have to change the compiler, this cannot be done in a library. We could almost do it in a derive proc_macro, except that we'd need a way to assert "this type does not have any implementation of Drop or Unpin".

I think it'd be worth the effort to figure out how to get such a safe API for pin projections. However, that does not have to block stabilization: The API we stabilize here should be forward compatible with safe pin projections. (Unpin might have to become a lang item to implement the above assertion, but that doesn't seem too bad.)

Pinning is not !Move

I often thought about C++, where most of these issues are avoided: Types can be declared unmovable by deleting the move constructor and assignment operator. When a type is not moveable, types which contain that type are also not.

There were several attempts at defining unmovable types for Rust. It gets very complicated very quickly.

One important thing to understand is that pinning does not provide unmovable types! All types remain movable in Rust. If you have a T, you can move it anywhere you want. Instead of unmovable types, we make use of Rust's ability to define new encapsulated APIs, as we define a pointer type from which you cannot move out. All the magic is in the pointer type (Pin<&mut T>), not in the pointee (T). There is no way for a type to say "I cannot get moved ever". However, there is a way for a type to say "If I got pinned, don't move me again". So a MyFuture that I own will always be movable, but Pin<&mut MyFuture> is a pointer to an instance of MyFuture that cannot get moved any more.

This is a subtle point, and we probably should spend some time fleshing out the docs here to help avoid this very common misunderstanding.

SimonSapin commented 5 years ago

But we've gotten quite far without being able to express self-referential iterators at all, it's just not that important of a use case.

This is because, so far, all iterators are defined using a type and an impl Iterator for … block. When state needs to be kept between two iterations, there is no choice but to stash it in fields of the iterator type and work with &mut self.

However, even if this is not the primary motivation for including generators in the language, it would be very nice to eventually be able to use generator yield syntax to define something that can be used with a for loop. As soon as that exists, borrowing across yield because just as important for generator-iterators as it is for generator-futures.

Nemo157 commented 5 years ago

(Unpin might have to become a lang item to implement the above assertion, but that doesn't seem too bad.)

Unpin and Pin already have to be lang items to support safe immovable generators.

alexcrichton commented 5 years ago

Ok thanks for the explanations all! I agree with @Gankro that a nomicon-style chapter about Pin and such would be massively useful here. I suspect that there's a lot of development history for why various safe methods exist and why unsafe methods are unsafe and such.

In an effort to help myself understand this though I wanted to try again to write down why each function is safe or why it's unsafe. With the understanding from above I've got all the following, but there's a few questions here and there. If others can help me fill this in that'd be great! (or help point out where my thinking is wrong)

Matthias247 commented 5 years ago

@Matthias247 you can't safely get &mut T from Pin<P> if T isn't Unpin because then you could mem::swap to move T out of the Pin, which would defeat the purpose of pinning things.

Thanks! Yes, since swap is no unsafe method this is obviously problematic. But could that be fixed by adding an Unpin bound to swap()? Since all code up to now should be Unpin or is unsafe anyway, this shouldn't break anything.

I guess one of the things that still confuses me most is that Pin<T> codifies multiple guarantees: That the address of the T is stable, as well as some guarantees about it's internal state (it's kind of frozen/immutable in some situations, but also not really).

It seemed to me like moving the unsafe code/projections only to the places where further Pin based calls are needed (e.g. in polls on fields) might be preferable to dealing with those unsafe projections in all cases. However I now also realized that there is another issue with that: Code that has access to the mutable reference can move the field freely around, and when drop() is then safely called on this field, it might break due to it's address being stored somewhere else. For that to the fixed the drop(Pin<T>) overload that was talked about would be required.

@RalfJung Thanks for the explanations! I agree on the fact that I certainly tried to do something unsafe in general, and therefore it should be fine to require the extra understanding from me. I'm more concerned about people who want to write more generally safe future combinators, but now also might get confronted with all of those terms. If they can write combinators without having to understand Unpin and pin projections at all, and then only get combinators which work in a reduced fashion (only on Unpin futures), that seems preferable. Since I haven't tried it I can't say if it's currently the case. I think one still needs to add at least Unpin bounds manually.

I also understand the fact that non-moveable types are different from the pin type. However I am currently more focused on use-cases than the difference. And for the use-case of intrusive collections non-moveable types work very well without introducing too much complexity. For futures obviously some research would be required, since they way from a moveable to a non-moveable type is missing. If that way would be not more ergonomic than the Pin APIs, there would also be no win.

tikue commented 5 years ago

Adding T: Unpin to mem::swap would mean it couldn't be used on certain types even when they're not inside a Pin.

RalfJung commented 5 years ago

But could that be fixed by adding an Unpin bound to swap()? Since all code up to now should be Unpin or is unsafe anyway, this shouldn't break anything.

That would break all generic code: If you write a function in today's stable rust for some T without constraints, you can call swap on it. This has to keep working.

non-moveable types work very well without introducing too much complexity.

Nobody has demonstrated a way to add non-movable types to Rust in a backwards-compatible way without complexity significantly exceeding what is being proposed for stabilization here. You should find some of these old proposals and discussions in the RFC repo. See https://github.com/rust-lang/rfcs/pull/1858 for one example.

The RFC at https://github.com/rust-lang/rfcs/pull/2349 as well as boat's blog series starting here should help to give you some background and some impression of the other designs that were considered. (Also notice the dates, this design has been in the works for almost 10 months!)

Gankra commented 5 years ago

Also mem::swap is a red herring, because it isn't at all an interesting function. It's literally just

let temp = *a; 
*a = *b; 
*b = temp;
tikue commented 5 years ago

@Gankro doesn't it use unsafe code? Afaik it's not possible to write that literally.

Edit: I guess another way to think about this is that adding T: Unpin to mem::swap is actually changing the definition of safety at the language level. It would break all the mycrate::swap fns in the wild.

Matthias247 commented 5 years ago

Adding T: Unpin to mem::swap would mean it couldn't be used on certain types even when they're not inside a Pin.

If Unpin is automatically derived (in the same fashion as Sync/Send are) then I thought this shouldn't be an issue.

That would break all generic code: If you write a function in today's stable rust for some T without constraints, you can call swap on it. This has to keep working.

But this one obviously is. Didn't thought about the fact that trait bounds need to be explicitly propagated in Rust.

Nobody has demonstrated a way to add non-movable types to Rust in a backwards-compatible way without complexity significantly exceeding what is being proposed for stabilization here. You should find some of these old proposals and discussions in the RFC repo. See rust-lang/rfcs#1858 for one example.

Thanks, I will read up a bit more on the previous work on this if I find some time. Obviously lots of thoughts and effort already have went into this, and I certainly don't want to block things. I just wanted to provide my concerns and questions when trying to work with this.

tikue commented 5 years ago

@Matthias247

If Unpin is automatically derived (in the same fashion as Sync/Send are) then I thought this shouldn't be an issue.

I'm not sure if I was clear. To clarify, it's perfectly safe to move around a !Unpin type and thus perfectly safe to mem::swap them around. It's only unsafe to move a type T: !Unpin after it's pinned, i.e. inside of a Pin<P<T>> .

For example, in async/await code, you can move around futures returned from an async function as much as you want. You only stop being able to move them once you pin_mut! them or put them in a Pin<Box<..>>> or whatnot.

Centril commented 5 years ago

I have some assorted questions about this:

  1. Given the importance of Pin<T> for async and for generators and potential for unsoundness when interacting with other parts of Rust (e.g. swap & replace), has any formal verification (e.g. by @jhjourdan or @RalfJung) been done of the variant of the pinning API that is proposed here?

  2. Does this API give any new guarantees about the abstract machine / operational semantics of Rust? I.e., if we forget about the use case as a backing mechanism for async & await / generators, could we put this inside a crate in the ecosystem and it would just work given the current guarantees we give?

  3. What sort of APIs or type system additions become impossible as a consequence of stabilizing the pinning API? (this question is an extension of 2.)

  4. What avenues does the to-be-stabilized API provide in terms of evolving it a language provided &pin T type to improve upon field projection and such (which doesn't seem great with the proposed API).

I have some assorted notes:

  1. The documentation in the standard library seems quite sparse. :(

  2. I agree with others that the pinning construct is quite mentally taxing / complex.

  3. The Unmovable example in the documentation seems overly complicated and involves unsafe; this seems sub-optimal. With gradual initialization as elaborated on in a draft RFC in the language (i.e. improving upon NLL) could instead give us:

struct Unmovable<'a> {
    data: String,
    slice: &'a str,
}

let um: Unmovable<'_>;
um.data = "hello".to_string();
um.slice = &um.data; // OK! we borrow self-referentially.

drop(um); // ERROR! `um.slice` is borrowing `um.data` so you cannot move `um`.

// You won't be able to take a &mut reference to `um` so no `swap` problems.

This involves zero unsafe and it's quite easy for the user to deal with this.

pcpthm commented 5 years ago

Additionally, the std APIs provide no safe way to pin objects to the stack. This is because there is no way to implement that safely using a function API.

What about API like this?

pub fn using_pin<T, R, F>(value: T, f: F) -> R
where F: for<'a> FnOnce(Pin<&'a mut T>) -> R {
    pin_mut!(value);    // Actual implementation inlines this but the point is this API is safe as long as pin_mut! is safe.
    f(value)
}
davidtwco commented 5 years ago

I’ve not been following along with the development of the pinning APIs too closely, so this could have been mentioned or explained elsewhere and I’ve been unable to find it, apologies if that is the case:

The Pinned type is a ZST which does not implement Unpin; it allows you to supress the auto-implementation of Unpin on stable, where !Unpin impls would not be stable yet.

Is there an explanation anywhere about why !Unpin impls couldn’t be made stable?

nrc commented 5 years ago

It is only safe if Foo (the containing type) is not repr(packed), because this causes fields to be moved around to realign them.

Does packed mean that fields can be moved dynamically? That is a bit scary. Are we sure that llvm will never generate code to move fields in other circumstances? Likewise is it possible that pinned values on the stack might be moved by llvm?

nrc commented 5 years ago

Despite the subtlety, this seems like a really nice API. Good work!

withoutboats commented 5 years ago

@Centril Ralf has written about pinning on his blog.

The pin APIs involved no language changes at all, and are entirely implemented in the standard library using pre-existing language features. It has no impact on Rust the language and doesn't foreclose on any other language feature.

Pin is really just a clever implementation of one of Rust's most valuable and classic features: the ability to introduce invariants to an API by marking part of the API unsafe. Pin wraps a pointer to make one operation (DerefMut) unsafe, requiring that people who do it uphold certain invariants (that they don't move out of the reference), and allowing other code to assume that this will never occur. A similar, much older example of this same trick is String, which makes it unsafe to put non-UTF8 bytes into a string, allowing other code to assume that all the data in the String is UTF8.

withoutboats commented 5 years ago

Is there an explanation anywhere about why !Unpin impls couldn’t be made stable?

Negative impls are currently unstable, totally unrelated to these APIs.

davidtwco commented 5 years ago

Negative impls are currently unstable.

🤦‍♂️ That makes sense, should have thought about that for a little bit longer. Thanks.

RalfJung commented 5 years ago

@alexcrichton That's some great analysis of this API, we should try to preserve it some place better than some comment that will get lost!

Some comments:

as_mut: Question: what about a "malicious" DerefMut impl? This is a safey way to call a user-provided DerefMut which crates &mut P::Target natively, presumably allowing it to modify it as well. How's this safe?

Basically, when you call new_unchecked, you make a promise about the Deref and DerefMut implementations for this type.

set: Given Pin

(and the fact that we don't know anything about Unpin), shouldn't this guarantee that P::Target never moves? If we can reinitialize it with another P::Target, though, shouldn't this be unsafe? Or is this somehow related to destructors and all that?

This drops the old content of the pointer, and puts new content in there. "Pinned" doesn't mean "never dropped", it means "never moved until dropped". So Dropping it or overwriting it while calling drop is fine. Calling drop is crucial, that's the drop guarantee I mentioned above.

Where do you see something moving here?

map_unchecked: Question:: what's the counter-example here? If this were safe, what's the example that shows violating the guarantees of Pin?

An example would be starting with a Pin<&&T> and using this method to get a Pin<&T>. Pinning does not "propagate through" references.

get_ref: One "maybe gotcha" here is interior mutability, what if T were RefCell?

Indeed, this is a gotcha, but as you observed not a soundness problem. What would be unsound is having a method that goes from Pin<RefCell<T>> to Pin<&[mut] T>. Basically, what happens is that RefCell does not propagate pinning, we could impl<T> Unpin for RefCell<T>.

RalfJung commented 5 years ago

has any formal verification (e.g. by @jhjourdan or @RalfJung) been done of the variant of the pinning API that is proposed here?

No, not from our side. The blog posts I mentioned above contain some thoughts for how I'd start formalizing this, but we haven't actually gone through and done it. If you give me a time machine or an interested PhD student we'll do it. ;)

if we forget about the use case as a backing mechanism for async & await / generators, could we put this inside a crate in the ecosystem and it would just work given the current guarantees we give?

That is the intention.

What sort of APIs or type system additions become impossible as a consequence of stabilizing the pinning API? (this question is an extension of 2.)

Uh, I have no idea how to answer that question with any confidence. The space of possible additions is just too large and also too high-dimensional that I'd dare making any kind of universal statement about it.

What avenues does the to-be-stabilized API provide in terms of evolving it a language provided &pin T type to improve upon field projection and such (which doesn't seem great with the proposed API).

I am not sure any more about &pin T, it doesn't match very well the new generic Pin<T>. In terms of handling projections, we'd need some hack to say "Unpin and Drop have not been implemented for this type", then we could do this safely with a macro. For additional ergonomics, we'd likely want a generic "field projection" capability in the language, one that would also cover going from &Cell<(A, B)> to &Cell<A>.

Is there an explanation anywhere about why !Unpin impls couldn’t be made stable?

AFAIK negative impls have some long-standing limitations, and if you add generic bounds to them they will often not work the way you think they should. (Generic bounds sometimes just get ignored, or so.)

Maybe Chalk fixes all of this, maybe just some of it, but either way we'd probably not want to block this on Chalk.

Does packed mean that fields can be moved dynamically? That is a bit scary.

It is the only sound way to call drop on a packed field, given that drop expects an aligned reference.

Are we sure that llvm will never generate code to move fields in other circumstances? Likewise is it possible that pinned values on the stack might be moved by llvm?

LLVM copying bytes around should be no problem, as it cannot change program behavior. This is about a higher-level concept of "moving" data, in a way that is observable in Rust (e.g. because pointers to the data no longer point to it). LLVM cannot just move data elsewhere that we may have pointers to. Likewise, LLVM cannot just move values on the stack that have their address taken.

alexcrichton commented 5 years ago

Ok thanks @RalfJung, makes sense! Some follow-up questions...

When you say "never moved until dropped", this means that Drop may be called where &mut self has moved from the address of Pin<&mut Self>? Destructors can't rely on interior pointers being accurate, right?

My worry when looking at set was how panics would be handled, but I think that this is a non-issue after digging into the codegen a bit more.

Nemo157 commented 5 years ago

When you say "never moved until dropped", this means that Drop may be called where &mut self has moved from the address of Pin<&mut Self>? Destructors can't rely on interior pointers being accurate, right?

@alexcrichton my understanding was that means never moved until after Drop::drop returns. Otherwise some usecases (intrusive collections and stack allocated DMA buffers at least) become impossible.

withoutboats commented 5 years ago

Destructors can rely on something never having been moved if they can prove it was previously in a Pin. For example if a state machine can only enter a state through an API which requires it to be pinned, the destructor can assume that it was pinned before it was dropped if it is in that state.

Code cannot assume that nonlocal destructors do not move members, but obviously you can assume your own destructors don't move things around because you're the one writing them.

RalfJung commented 5 years ago

When you say "never moved until dropped", this means that Drop may be called where &mut self has moved from the address of Pin<&mut Self>? Destructors can't rely on interior pointers being accurate, right?

I mean that the data will never move (in the sense of not going anywhere else, including not being deallocated) until drop was called. And yes, drop can rely on running on the pinned location, even though its type cannot express that. drop should take Pin<&mut self> (for all types), but alas, too late for that.

After drop was called, the data is just meaningless bytes, you may do anything with them: Put new stuff in there, deallocate, whatever.

This allows, for example, an intrusive linked list where the destructor of an element deregisters it by adjusting the neighboring pointers. We know the memory will not go away without that destructor being called. (The element can still be leaked, and then it will just stay in that linked list forever. But in that case it will remain valid memory so there is no safety issue.)

jonhoo commented 5 years ago

I've been reading through all I can find on Pin, Unpin, and the discussion leading up to it all, and while I think I now understand what's going on, there is also a lot of subtlety around what the different types mean, as well as the contract that implementors and users must follow. Sadly, I see relatively little discussion of that in the docs for std::pin or on Pin and Unpin specifically. In particular, I'd love to see some of the content from these comments:

incorporated into the docs. Specifically:

I found @alexcrichton's counter-examples pretty helpful too. Giving the reader an idea of what can go wrong for the different unsafe methods beyond just prose I think would help a lot (at least it certainly did for me). In general, due to the subtlety of this API, I would like to see the Safety sections of the various unsafe methods expanded, and possibly also referenced from the std::pin module-level docs.

Kimundi commented 5 years ago

I haven't used this API much yet myself, so I'll trust the technical judgement that its in a good state now. However, I think the names Pin, Pinned and Unpin are too similar and inexpressive for what are very different types/traits and a relatively complex API, which makes understanding it harder (as evidenced by a few comments on this thread).

They seem to follow the naming conventions for marker traits, so I can't really complain, but I wonder if we could make a tradeoff between verbosity and self-explaining names:

Generally I'm failing and finding good alternative names myself though...

mark-i-m commented 5 years ago

PhantomPin and PinNeutral strike me as particularly nice names.

tikue commented 5 years ago

Just to provide a counterpoint, I've found Unpin intuitive (once I understood it). Pinned is harder to keep straight, granted I haven't used it in my own code. What about changing Pinned to NotUnpin?