rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.57k stars 12.47k forks source link

Tracking issue for RFC 3519: `arbitrary_self_types` #44874

Open arielb1 opened 6 years ago

arielb1 commented 6 years ago

This is the tracking issue for RFC 3519: Arbitrary self types v2.

The feature gate for this issue is #![feature(arbitrary_self_types)].

About tracking issues

Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

Unresolved Questions

None.

Related

Implementation history

TODO.


(Below follows content that predated the accepted Arbitrary Self Types v2 RFC.)

Object Safety

See https://github.com/rust-lang/rust/issues/27941#issuecomment-332157526

Handling of inference variables

Calling a method on *const _ could now pick impls of the form

impl RandomType {
    fn foo(*const Self) {}
}

Because method dispatch wants to be "limited", this won't really work, and as with the existing situation on &_ we should be emitting an "the type of this value must be known in this context" error.

This feels like fairly standard inference breakage, but we need to check the impact of this before proceeding.

Safe virtual raw pointer methods

e.g. this is UB, so we might want to force the call <dyn Foo as Foo>::bar to be unsafe somehow - e.g. by not allowing dyn Foo to be object safe unless bar was an unsafe fn

trait Foo {
    fn bar(self: *const Self);
}

fn main() {
    // creates a raw pointer with a garbage vtable
    let foo: *const dyn Foo = unsafe { mem::transmute([0usize, 0x1000usize]) };
    // and call it
    foo.bar(); // this is UB
}

However, even today you could UB in safe code with mem::size_of_val(foo) on the above code, so this might not be actually a problem.

More information

There's no reason the self syntax has to be restricted to &T, &mut T and Box<T>, we should allow for more types there, e.g.

trait MyStuff {
    fn do_async_task(self: Rc<Self>);
}

impl MyStuff for () {
    fn do_async_task(self: Rc<Self>) {
        // ...
    }
}

Rc::new(()).do_async_stuff();
porky11 commented 6 years ago

Why would you need this? Why wouldn't you write an impl like this:

impl MyStuff for Rc<()> {
    fn do_async_task(self) {
        // ...
    }
}

I'd rather define the trait different. Maybe like this:

trait MyStuff: Rc {
    fn do_async_task(self);
}

In this case, Rc would be a trait type. If every generic type implemented a specific trait (this could be implemented automatically for generic types) this seems more understandable to me.

cuviper commented 6 years ago

This could only be allowed for trait methods, right?

For inherent methods, I can't impl Rc<MyType>, but if impl MyType can add methods with self: Rc<Self>, it seems like that would enable weird method shadowing.

arielb1 commented 6 years ago

@cuviper

This is still pending lang team decisions (I hope there will be at least 1 RFC) but I think it will only be allowed for trait method impls.

arielb1 commented 6 years ago

@porky11

You can't implement anything for Rc<YourType> from a crate that does not own the trait.

arielb1 commented 6 years ago

So changes needed:

SimonSapin commented 6 years ago

I’ll look into this.

arielb1 commented 6 years ago

Note that this is only supported to work with trait methods (and trait impl methods), aka

trait Foo {
    fn foo(self: Rc<Self>);
}
impl Foo for () {
    fn foo(self: Rc<Self>) {}
}

and is NOT supposed to work for inherent impl methods:

struct Foo;
impl Foo {
    fn foo(self: Rc<Self>) {}
}
SimonSapin commented 6 years ago

I got caught in some more Stylo work that's gonna take a while, so if someone else wants to work on this in the meantime feel free.

kennytm commented 6 years ago

Is this supposed to allow any type as long as it involves Self? Or must it impl Deref<Target=Self>?

trait MyStuff {
    fn a(self: Option<Self>);
    fn b(self: Result<Self, Self>);
    fn c(self: (Self, Self, Self));
    fn d(self: Box<Box<Self>>);
}

impl MyStuff for i32 {
   ...
}

Some(1).a();  // ok?
Ok(2).b();  // ok?
(3, 4, 5).c(); // ok?
(box box 6).d(); // ok?
mikeyhew commented 6 years ago

I've started working on this issue. You can see my progress on this branch

mikeyhew commented 6 years ago

@arielb1 You seem adamant that this should only be allowed for traits and not structs. Aside from method shadowing, are there other concerns?

arielb1 commented 6 years ago

@mikeyhew

inherent impl methods are loaded based on the type. You shouldn't be able to add a method to Rc<YourType> that is usable without any use statement.

arielb1 commented 6 years ago

That's it, if you write something like

trait Foo {
    fn bar(self: Rc<Self>);
}

Then it can only be used if the trait Foo is in-scope. Even if you do something like fn baz(self: u32); that only works for modules that use the trait.

If you write an inherent impl, then it can be called without having the trait in-scope, which means we have to be more careful to not allow these sorts of things.

mikeyhew commented 6 years ago

@arielb1 Can you give an example of what we want to avoid? I'm afraid I don't really see what the issue is. A method you define to take &self will still be callable on Rc<Self>, the same as if you define it to take self: Rc<Self>. And the latter only affectsRc<MyStruct>, not Rc<T> in general.

mikeyhew commented 6 years ago

I've been trying to figure out how we can support dynamic dispatch with arbitrary self types. Basically we need a way to take a CustomPointer<Trait>, and do two things: (1) extract the vtable, so we can call the method, and (2) turn it into a CustomPointer<T> without knowing T.

(1) is pretty straightforward: call Deref::deref and extract the vtable from that. For (2), we'll effectively need to do the opposite of how unsized coercions are implemented for ADTs. We don't know T, but we can can coerce to CustomPointer<()>, assuming CustomPointer<()> has the same layout as CustomPointer<T> for all T: Sized. (Is that true?)

The tough question is, how do we get the type CustomPointer<()>? It looks simple in this case, but what if CustomPointer had multiple type parameters and we had a CustomPointer<Trait, Trait>? Which type parameter do we switch with ()? In the case of unsized coercions, it's easy, because the type to coerce to is given to us. Here, though, we're on our own.

@arielb1 @nikomatsakis any thoughts?

nikomatsakis commented 6 years ago

@arielb1

and is NOT supposed to work for inherent impl methods:

Wait, why do you not want it work for inherent impl methods? Because of scoping? I'm confused. =)

nikomatsakis commented 6 years ago

@mikeyhew

I've been trying to figure out how we can support dynamic dispatch with arbitrary self types.

I do want to support that, but I expected it to be out of scope for this first cut. That is, I expected that if a trait uses anything other than self, &self, &mut self, or self: Box<Self> it would be considered no longer object safe.

mikeyhew commented 6 years ago

@nikomatsakis

I do want to support that, but I expected it to be out of scope for this first cut.

I know, but I couldn't help looking into it, it's all very interesting to me :)

arielb1 commented 6 years ago

Wait, why do you not want it work for inherent impl methods? Because of scoping? I'm confused. =)

We need some sort of "orphan rule" to at least prevent people from doing things like this:

struct Foo;
impl Foo {
    fn frobnicate<T>(self: Vec<T>, x: Self) { /* ... */ }
}

Because then every crate in the world can call my_vec.frobnicate(...); without importing anything, so if 2 crates do this there's a conflict when we link them together.

Maybe the best way to solve this would be to require self to be a "thin pointer to Self" in some way (we can't use Deref alone because it doesn't allow for raw pointers - but Deref + deref of raw pointers, or eventually an UnsafeDeref trait that reifies that - would be fine).

I think that if we have the deref-back requirement, there's no problem with allowing inherent methods - we just need to change inherent method search a bit to also look at defids of derefs. So that's probably a better idea than restricting to trait methods only.

Note that the CoerceSized restriction for object safety is orthogonal if we want allocators:

struct Foo;
impl Tr for Foo {
    fn frobnicate<A: Allocator+?Sized>(self: RcWithAllocator<Self, A>) { /* ... */ }
}

Where an RcWithAllocator<Self, A> can be converted to a doubly-fat RcWithAllocator<Tr, Allocator>.

mikeyhew commented 6 years ago

@arielb1

Because then every crate in the world can call my_vec.frobnicate(...); without importing anything, so if 2 crates do this there's a conflict when we link them together.

Are saying is that there would be a "conflicting symbols for architechture x86_64..." linker error?

Maybe the best way to solve this would be to require self to be a "thin pointer to Self" in some way (we can't use Deref alone because it doesn't allow for raw pointers - but Deref + deref of raw pointers, or eventually an UnsafeDeref trait that reifies that - would be fine).

I'm confused, are you still talking about frobnicate here, or have you moved on to the vtable stuff?

arielb1 commented 6 years ago

I'm confused, are you still talking about frobnicate here, or have you moved on to the vtable stuff?

The deref-back requirement is supposed to be for everything, not only object-safety. It prevents the problem when one person does

struct MyType;
impl MyType {
    fn foo<T>(self: Vec<(MyType, T)>) { /* ... */ }
}   

While another person does

struct OurType;
impl OurType {
    fn foo<T>(self: Vec<(T, OurType)>) {/* ... */ }
}   

And now you have a conflict on Vec<(MyType, OurType)>. If you include the deref-back requirement, there is no problem with allowing inherent impls.

bstrie commented 6 years ago

@arielb1 , you suggest that the point of this is to get around the coherence rules, but I can't see how this wouldn't fall afoul of the same incoherence that the orphan rules are designed to prevent. Can you explain further?

Secondly, the syntax is misleading. Given that fn foo(&mut self) allows one to write blah.foo() and have it desugar to foo(&mut blah), one would intuitively expect fn foo(self: Rc<Self>) to allow one to write blah.foo() and desugar to foo(Rc::new(blah))... which, indeed, would obviously be pointless, but the discrepancy rankles.

...Oh bleh, in my experiments I'm realizing that fn foo(self: Self), fn foo(self: &Self), and fn foo(self: &mut Self) are all surprisingly allowed alternatives to fn foo(self), fn foo(&self), and fn foo(&mut self)... and that, astonishingly, fn foo(self: Box<Self>) is already in the language and functioning in the inconsistent way that I'm grumpy about in the above paragraph. Of course we special-cased Box... :P

shepmaster commented 6 years ago

I'm all for supporting extra types for Self — I've been looking forward to it for three years.

mikeyhew commented 6 years ago

@shepmaster feel free to try them out! Now that bors has merged my PR, they're available on nightly behind the arbitrary_self_types feature gate: https://play.rust-lang.org/?gist=cb47987d3cb3275934eb974df5f8cba3&version=nightly

@bstrie I don't have as good a handle on the coherence rules as @arielb1, but I can't see how arbitrary self types would fall afoul of them. Perhaps you could give an example?

Secondly, the syntax is misleading. Given that fn foo(&mut self) allows one to write blah.foo() and have it desugar to foo(&mut blah), one would intuitively expect fn foo(self: Rc) to allow one to write blah.foo() and desugar to foo(Rc::new(blah))... which, indeed, would obviously be pointless, but the discrepancy rankles.

For the few methods that use non-standard self types, I get that this could be annoying. But they should be used sparingly, only when needed: the self type needs to Deref to Self anyway, so having the method take &self or &mut self would be preferred because it is more general. In particular, taking self: Rc<Self> should only be done if you actually need to take ownership of the Rc (to keep the value alive past the end of the method call).

arielb1 commented 6 years ago

Secondly, the syntax is misleading. Given that fn foo(&mut self) allows one to write blah.foo() and have it desugar to foo(&mut blah), one would intuitively expect fn foo(self: Rc) to allow one to write blah.foo() and desugar to foo(Rc::new(blah))

I don't find that "inversion principle" intuitive. After all, you can't call fn foo(&mut self) if your self isn't mutable. If you have a foo(&mut self), you need to pass it something that is basically an &mut Self. If you have a foo(self: Rc<Self>), you need to pass it something that is basically an Rc<Self>.

More than that, we already have self: Box<Self> today with no "autoboxing".

SimonSapin commented 6 years ago

Recent Nightlies emit a lot of warnings like this when compiling Servo:

warning[E0619]: the type of this value must be known in this context
  --> /Users/simon/projects/servo/components/script/dom/bindings/iterable.rs:81:34
   |
81 |             dict_return(cx, rval.handle_mut(), true, value.handle())
   |                                  ^^^^^^^^^^
   |
   = note: this will be made into a hard error in a future version of the compiler

First, I had to do git archeology in the rust repo to find that that warning was introduced bc0439b3880808e1385da4b99964d5d506f76e3f, which is part of #46837, which links here. The error message should probably include some URL to let people read further details and context, ask questions, etc.

Second, I have no idea what this warning is telling me. Why is this code problematic now, while it wasn’t before? What should I do to fix it? The error message should explain some more.

petrochenkov commented 6 years ago

The breakage from this change seems to be larger than it's normally allowed for compatibility warnings. A crater run wasn't performed, but if even rustc alone hits multiple instances of broken code (https://github.com/rust-lang/rust/pull/46914), it means than effect on the ecosystem in general is going to be large.

mikeyhew commented 6 years ago

@SimonSapin I'm working on changing that warning to a future-compatibility lint right now (https://github.com/rust-lang/rust/pull/46914), which will reference a tracking issue that explains the reason for the change; I still have to finish writing the issue though so right now it doesn't explain anything

mikeyhew commented 6 years ago

I guess we didn't realize there would be so much breakage from that PR. Right now, I'm finding lots of places in rust-lang/rust that produce this warning, and they are only popping up now that I'm turning it into a full-blown lint, which is turned into a hard error by deny(warnings).

bstrie commented 6 years ago

Can anyone give a summary of what the status is here? Given that we have two accepted, high-priority RFCs mentioning this feature (https://github.com/rust-lang/rfcs/pull/2349 and https://github.com/rust-lang-nursery/futures-rfcs/pull/2), it's concerning that we still don't have even anything resembling an RFC for this. Since I assume (?) that the RFCs in question are going to end up influencing things in libstd I suppose it's not absolutely imperative that we stabilize this anytime soon, but even having an idea of what the design goals of this feature are would be a good start to making sure we don't box ourselves into a corner somehow. It seems unprecedented for a feature to progress so far without even a pre-RFC...

mikeyhew commented 6 years ago

@bstrie there is an RFC PR that @withoutboats has opened. As far as implementation is concerned, everything in that RFC is implemented behind the arbitrary_self_types feature gate, except that all custom receiver types are considered non-object-safe, and dynamic dispatch hasn't been implemented. I've been working on implementing that, and you can follow this WIP PR for updates.

There are also some things that are implemented that aren't included in that RFC: notably, raw pointer method receivers, and receivers that deref transitively to Self rather than directly implementing Deref<Target=Self>, e.g. &Rc<Self>.

I hope that clears things up a bit. It doesn't completely address your concerns about RFCs being merged that are depending on an unimplemented, undocumented, and perhaps incompletely designed language feature, but hopefully, once I'm finished implementing the object-safety checks and dynamic dispatch, and that RFC gets merged, the foundation for those other RFCs will be less rocky.

bstrie commented 6 years ago

@mikeyhew thanks! Up top I've added a link to the RFC so that people can follow along easier. To be clear, do we intend not to support object safety in receivers for this initial implementation pass, or is that just yet-to-be-implemented?

mikeyhew commented 6 years ago

@bstrie the initial implementation pass is finished, and it did not include object-safety. The next pass, which I'm working on right now, will include object-safety.

burdges commented 6 years ago

There are interesting covarint uses of this :

fn foo<'a,E,F: FnOnce() -> Result<&'a mut Cat,E>>(self: F, bar: Bar) -> Result<Dog,E> {
    let h = heavy_setup(bar);
    let r = self()?.light_manipulations(h);
    Ok( heavy_conclusions(r) )
}

Now foo(|| &mut my_mutex.bla.lock()?.deref_mut().blabla) locks only briefly in the middle, but the API containing foo need not plan around any particular struct hierarchy or locking scheme, whiole still enforcing that heavy_setup be run before heavy_conclusions.

mikeyhew commented 6 years ago

@burdges that wouldn't be possible, because the self argument must Deref to Self. Why does the closure need to be a method receiver and not just a normal argument?

You could create a Thunk<T> that holds a closure that gets called when it's dereferenced. As far as I know, though, Deref impls aren't supposed to panic, so that might be an antipattern.

burdges commented 6 years ago

Yes, a normal argument works just fine of course, and works on stable. I even used it that way by mistake, instead of writing (|| &mut my_mutex.bla.lock()?.deref_mut().blabla).foo().

At first blush, this seemingly made sense as a method receiver, but yeah arbitrary self types do sounds far off now, and normal arguments might make more sense and/or improve type inference and error messages.

eliantor commented 6 years ago

Is this planned to be in 2018 edition?

mikeyhew commented 6 years ago

@eliantor no it isn't. It still isn't fully implemented, and once it is, it will probably still be a while until it is stabilized. The RFC isn't merged yet either, and I hope to get a working implementation before we do merge it.

withoutboats commented 6 years ago

I'm having some trouble with method resolution when passing through multiple dereferences:

#![feature(pin, arbitrary_self_types)]

use std::marker::Unpin;
use std::ops::{Deref, DerefMut};

#[derive(Copy, Clone)]
pub struct Pin<P> {
    pointer: P,
}

impl<P, T: ?Sized> Deref for Pin<P> where
    P: Deref<Target = T>,
{
    type Target = T;
    fn deref(&self) -> &T {
        &*self.pointer
    }
}

impl<P, T: ?Sized> DerefMut for Pin<P> where
    P: DerefMut<Target = T>,
    T: Unpin,
{
    fn deref_mut(&mut self) -> &mut T {
        &mut *self.pointer
    }
}

trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>) -> Self::Output;
}

impl Future for () {
    type Output = ();
    fn poll(self: Pin<&mut Self>) -> Self::Output { }
}

fn test(pin: Pin<&mut ()>) {
    pin.poll()
}

https://play.rust-lang.org/?gist=77af55591988d9cf2aa216e8b4fcb8cb&version=nightly&mode=debug&edition=2015

@mikeyhew does this look like a bug or am I doing something wrong?

andreytkachenko commented 6 years ago

Am I right, there are no way to declare vector of Futures yet?:

Vec<Box<Future<Output = u32>>>
mikeyhew commented 6 years ago

@withoutboats that looks like a bug. pin.poll() should work. Future::poll(pin) does work, so it is indeed a problem with method lookup.

@andreytkachenko as far as I know, the Future trait from libcore is not object safe yet. So no, that's not possible yet. Once Pin is considered object safe by the compiler, though, it will be.

withoutboats commented 6 years ago

@mikeyhew the bug seems connected to #53843 which manifests on stable and isn't tied to this feature

@andreytkachenko to work around the problem, there's currently a type called FutureObj you can use

andreytkachenko commented 6 years ago

@mikeyhew @withoutboats Thank you for your replies, I'll take a look on FutureObj

jbg commented 5 years ago

Now that FutureObj is removed in nightly, is there another workaround available?

mikeyhew commented 5 years ago

I didn't knowFutureObj was removed already, but it should be obsolete in a matter of weeks

withoutboats commented 5 years ago

Now that FutureObj is removed in nightly, is there another workaround available?

FutureObj is now being provided by the futures crate, because other changes to the API made it unnecessary for it to be in std.

ZhangHanDong commented 5 years ago

@mikeyhew need futures-preview-0.7

mikeyhew commented 5 years ago

54383 was just merged (🎉), which means that receiver types like Rc<Self> and Pin<&mut Self> are now object-safe.

The next step is to make the receiver types in the standard library be usable as method receivers without the arbitrary_self_types feature, while still requiring the feature flag for ones we don't want to stabilize. Since receiver types are checked using the Deref trait, and Deref is stable, we need some way to differentiate between receiver types that are "blessed" and those that require the feature flag. I have thought of a few ways to do this:

  1. use an unstable #[receiver] attribute on the struct definition. The problem with this is there is that we would only check the outermost struct, and in the case of Pin<P> you could make P be some random type that derefs to Self to circumvent the checks.
  2. use an unstable Receiver<T> trait, where we check if the receiver type implements Receiver<Self>. T implements Receiver<T>, Rc<T> implements Receiver<U> if T: Receiver<U>, etc. This would be cool, but unfortunately it doesn't work with the current trait system due to overlapping impls.
  3. use an unstable Receiver trait that is a subtrait of Deref, and check for it during the autoderef loop when we are checking if the receiver transitively derefs to Self. This is the most promising option.

I'm working on a branch that does option 3 right now, and will hopefully have a PR up soon.

withoutboats commented 5 years ago

@mikeyhew What about using the object safety traits as the limiter, since they are also unstable?

mikeyhew commented 5 years ago

@withoutboats That could work for trait methods, but I'm not sure how that would work with inherent methods, since Self isn't a type parameter