rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.77k stars 12.5k forks source link

Decide about generics in arbitrary self types #129147

Open traviscross opened 1 month ago

traviscross commented 1 month ago

Over here, @adetaylor gave an update about arbitrary self types that included this bit:

During the preparation of the RFC it was broadly felt that we should ban "generic" self types but we didn't really define what "generic" meant, and I didn't pin it down enough.

Some of the commentary:

It seems to be widely felt that:

impl SomeType {
 fn m<R: Deref<Target=Self>>(self: R) {}
}

would be confusing, but (per those comments) it's actually pretty hard to distinguish that from various legitimate cases for arbitrary self types with generic receivers:

impl SomeType {
  fn method1(self: MyBox<Self, impl Allocator>) {}
  fn method2<const ID: u64>(self: ListArc<Self, ID>) {}
}

I played around with different tests here on the self type in wfcheck.rs but none of them yield the right sort of filter of good/bad cases (which is unsurprising since we haven't quite defined what that means, but I thought I might hit inspiration).

From those comment threads, the most concrete proposal (from @joshtriplett) is:

just disallow the case where the top-level receiver type itself is not concrete.

I plan to have a crack at that, unless folks have other thoughts. cc @Nadrieril who had opinions here.

If we do this, I might need to revisit the diagnostics mentioned in the bits of RFC removed by this commit.

We never made a decision about this. Perhaps if we could offer more clarity, it would save @adetaylor some time here, so it may be worth us discussing.

@rustbot labels +T-lang +I-lang-nominated -needs-triage +C-discussion

cc @rust-lang/lang @adetaylor

adetaylor commented 2 weeks ago

Thanks for raising this @traviscross .

adetaylor commented 1 week ago

Current status: investigating whether I can improve the diagnostics here that can occur when generics are used and the self type doesn't unify cleanly. The canonical failure case is here and the suggestion to improve diagnostics here is from this comment by @compiler-errors.

Outcomes might be:

Once I've got to the bottom of this, I'll report back, with some thoughts on all the reasons we might have wanted to exclude generics (not just diagnostics).

adetaylor commented 1 week ago

Here's a summary of the concerns and considerations about why we might want to block generic arbitrary self types.

Also, I finally found the original suggestion to ban generics from @RalfJung , which led to the commit editing the RFC to do so.

IMO the only remaining reason to avoid generic receiver types is the desire to avoid "unknown unknowns" per this third point. This is a good reason, but there are counterarguments:

As such: I am now firmly of the opinion that we should just allow generic receivers, as the current nightly arbitrary self types v1 does.

joshtriplett commented 1 week ago

Thank you for the detailed and thorough analysis!

If @RalfJung is on board with this analysis, then 👍 for lifting the restriction on generics.

RalfJung commented 1 week ago

I just suggested that we should be conservative here. My gut feeling is that we should "know the receiver type constructor" (i.e., identify the relevant Receiver impl) without instantiating the generic, but having generics "inside" the receiver type is fine (if that Receiver impl is itself generic). However, I am not sure if that is even a well-defined notion.

I'm too out of the loop to follow the details of the suggestion here, this sounds like a question for @rust-lang/types. :)

compiler-errors commented 1 week ago

I don't see any reason why we should need to support generics that come from the method itself. That should be easy enough to detect.

adetaylor commented 1 week ago

I don't see any reason why we should need to support generics that come from the method itself. That should be easy enough to detect.

OK, to check I understand your proposal, we want to distinguish these cases:

struct MyBox<T>(T);
impl<T> Receiver for MyBox<T> {
  type Target = T;
}

struct Foo;
impl Foo {
  fn ok(self: MyBox<Foo>) {}
  fn not_ok1<X: Receiver<Target=Self>>(self: X) {}
  fn not_ok2<A: Allocator>(self: Box<Self, A>)) {}
}

To detect this I'm assuming we want to do something along these lines in wfcheck.rs:

The new diagnostic might be something like:

generic methods can't be used with arbitrary self types. Use one of the standard self types such as Self, &Self, &mut Self

(We can probably optimize to avoid two separate validation passes, but semantically, that's what you think we should aim for?)

I'm a bit concerned about the Box<T, A> case, since Box is already hard-coded as a valid receiver type. If we want to retain compatibility with code like this we would need to retain the special-casing for Box, which seems a bit sad. But perhaps we don't care since the allocator API isn't stable? Opinions?

RalfJung commented 1 week ago

I'm a bit concerned about the Box<T, A> case, since Box is already hard-coded as a valid receiver type.

I would imagine that has an impl like impl<T, A> Receiver for Box<T, A>. So fn not_ok2<A: Allocator>(self: Box<Self, A>)) should IMO be fine -- the concrete Receiver impl is determined statically.

It is only when using a Receiver bound (and maybe other traits like Deref? not sure what the system ends up looking like) that we get into situations where different instances could use Self in completely different ways. Maybe that's a problem, maybe not -- but if we allow that it should be carefully thought through.

adetaylor commented 1 week ago

After looking at the options, I agree with @compiler-errors 's plan here. Specifically, we'll reject methods where the outermost receiver type is:

All other self types pass the test, including types which refer to such type params in their generic arguments.

This filter seems to work as desired - I'm working on a PR now.

Rationale

Documenting the thought process here for posterity.

First,

It is only when using a Receiver bound (and maybe other traits like Deref)

I'm worried about explicitly filtering out Receiver because of course other traits (including as you note Deref) might implement Receiver. We could attempt to do a query for whether a given trait indirectly implements Receiver but I'd be worried we'll cause semver compatibility problems if we do that, or more generally cause surprises.

I had a various chats with @Darksonn about this this weekend (thanks!) She made me realize that the only thing which matters is the outermost type kind, not any generic parameters elsewhere in the type. I think that's what the rest of you have been telling me for days too but I hadn't figured that out :)

We discussed only allowing concrete paths as the outermost type. I think the fundamentals of that idea are sound, but it turns out to be a bit more complex:

The key case turns out to be our old friend, Pin<&mut T>:

// Both Self and A below are ty::Param
pub trait Future {
  fn poll(self: Pin<&mut Self>) {} // must compile
}

pub trait Foo {
  fn poll<A: Deref<Target=Self>>(self: Pin<&mut A>) {} // probably should NOT compile
}

So, IMO this proves that the "good vs bad" filter must involve distinguishing some ty::Params from others.

The ty::Params which are OK are those belonging to the impl block, e.g. Self. The ty::Params which are not OK are those belonging to the method. As we're evaluating the method signature, we have ready access to the latter list, so we'll blocklist them.

Hence - I'm now pretty sure that this is the right plan.

If that doesn't make any sense, don't worry - PR coming along soon which will make it clearer.

RalfJung commented 6 days ago

I'm not sure this quite aligns with my intuotion... If the receiver type is "&MyType” and we also have a "where MyType: Receiver", this is still just as generic as the cases we want to rule out, isn't it?

The key distinction is really whether we can prove that this is a receiver type without using the where clauses.

adetaylor commented 6 days ago

Is this the sort of case you're worried about?

adetaylor commented 5 days ago

I've attempted to come up with an alternative PR which rejects all self types that depend in any way on method parameters, as opposed to the previous PR which only rejected self types which were a type param.

The second approach should reject methods whose self type depends upon where clauses per the concerns raised here and explored here.

However, this second approach doesn't currently work due to the Box problem I was concerned about. In alloc::collections::LinkedList::Node:

impl<T> Node<T> {
    fn into_element<A: Allocator>(self: Box<Self, A>) -> T {
        self.element
    }
}

and this in slice:

impl<T> [T] {
    pub fn into_vec<A: Allocator>(self: Box<Self, A>) -> Vec<T, A> {
        // ...
    }
}

Presumably we'd find similar patterns in third party crates too. So... we can't just entirely ignore the type params belonging to the method.

It's very hard to think of a compromise here. Even if I could figure out a way to retain the type params but not their predicates, that won't help, because we need the : Allocator predicate to successfully match against the Deref impl. I can't think of a way to filter "good" predicates vs "bad" predicates.

Thanks to @nbdd0121 and (again) @Darksonn for help here.

I think the options are:

  1. Allow generics after all.
  2. Filter out most generics in practice using an approach like #130098. This isn't very satisfying from a language rules perspective, but if our goal is to maintain flexibility in the case of unknown unknowns, it probably does the trick by making generic self types very rare.
  3. Spot some compromise I can't spot
  4. Rewrite LinkedList::Node (seems possible by giving Node a PhantomData field), constrain this slice method to work with only the global allocator (presumably a significant compatibility break?), and accept we'll probably break some third party crates, and land something like #130120. It feels like this is probably not OK.
nbdd0121 commented 5 days ago

I feel the allocator bound is a legit use of arbitrary self types and therefore ignoring predicates on impl item would not be a good approach.

We could filter out only predicates that mention Deref/DerefMut, but it feels pretty ad-hoc and people can workaround by defining a new trait with Deref being it's super trait, so that's probably also not a good approach.

I would suggest that we drop the non-generic requirement all-together.

RalfJung commented 5 days ago

Is this the sort of case you're worried about?

No, as there we have a single impl<T> Deref for MyPtr<T>. But now imagine we have impl<T> Deref for MyPtr<Box<T>> and impl<T> Deref for MyPtr<Arc<T>> -- these impls could be doing entirely different things, and yet either of them could be relevant for dispatching fn m<T>(self: &MyPtr<T>) where MyPtr<T>: Deref<Target=Self>.

RalfJung commented 5 days ago

It's very hard to think of a compromise here

I've suggested a compromise multiple times: we have to be able to identify the unique impl that allows this to be a receiver, without knowing which values the generic type parameters take. That allows the allocator case but rejects the example from my previous comment.

I don't know how to implement that, as I know nothing about the trait solver, but as far as I can see this is entirely well-defined.

nbdd0121 commented 5 days ago

I don't think that'll work, since for the Deref case the Receiver impl can be resolved to the blanket impl that implements it for all Deref.

RalfJung commented 5 days ago

Ah, that's how these two traits interact? I was wondering about that. Yeah okay if we have such generic Receiver impls that makes it tricky then.

Anyway, if t-types says there is no problem here then that's fine for me as well.

nbdd0121 commented 5 days ago

I suppose we can first resolve Receiver impl, and if it resolves to the blanket impl then we try to resolve Deref and see if it's too generic. But that also feels a bit ad-hoc..

lcnr commented 2 days ago

Is this the sort of case you're worried about?

No, as there we have a single impl<T> Deref for MyPtr<T>. But now imagine we have impl<T> Deref for MyPtr<Box<T>> and impl<T> Deref for MyPtr<Arc<T>> -- these impls could be doing entirely different things, and yet either of them could be relevant for dispatching fn m<T>(self: &MyPtr<T>) where MyPtr<T>: Deref<Target=Self>.

so something like https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=feca59e56ec2092732906a7c10493f67?

RalfJung commented 2 days ago

Yeah, something like that. Those impls can now have completely different Target types and thus the method may be looked up quite differently... it's a bit hard for me to imagine what even happens in the general case.^^ Here's an example.

I am surprised that this even compiles... are we considering every inherent method of every type to be a potential candidate here to find out which one gets called? Won't that be a mess?