Open traviscross opened 1 month ago
Thanks for raising this @traviscross .
Current status: investigating whether I can improve the diagnostics produced when generics are used and the self type doesn't unify cleanly. The canonical failure case is here and the suggestion to improve diagnostics here is from this comment by @compiler-errors.
Outcomes might be:
wfcheck.rs to eliminate some of these classes of type mismatch - I'm not sure if this will turn out to be possible.
Once I've got to the bottom of this, I'll report back, with some thoughts on all the reasons we might have wanted to exclude generics (not just diagnostics).
Here's a summary of the concerns and considerations about why we might want to block generic arbitrary self types.
Also, I finally found the original suggestion to ban generics from @RalfJung , which led to the commit editing the RFC to do so.
Diagnostics. With this sort of code we end up with a mismatched type error. This comment from @compiler-errors says we should try to restrict the feature such that users can't end up in that circumstance quite as often.
mismatched type error except by using the type of some other function parameter to bound a receiver like in this pretty contrived example (the actual method call is down here). Other cases like this don't trigger this error because the nature of a Self type can't easily be represented; trait bounds get a bit recursive.
mismatched type errors will be sufficiently rare that we don't need to worry about constraining this feature just to avoid that confusing diagnostic.
Complex interaction with anti-shadowing algorithms
Weak<T> or NonNull<T> as self types. During the process of agreeing the RFC, we agreed that the complexity of this deshadowing approach was not worth its value - for now. Instead, such types need to be wrapped in newtype wrappers if they are to be used as self types.
A<B> it would pick B::foo rather than A::foo. This was surprising (which is the reason we dropped the idea) and the surprise levels might have been greater for generic types, where A::Target didn't simply and always point to B.
Avoiding "unknown unknowns".
IMO the only remaining reason to avoid generic receiver types is the desire to avoid "unknown unknowns" per this third point. This is a good reason, but there are counterarguments:
Box<Self>, Rc<Self>, SomeCppPtr<Self>, etc. In fact some are even more generic (Box<Self, A>).
Foo<A> as their self type, then (for example) an upstream crate changes it to Foo<A,B> and things break. It's hard to think of any definition of "generic" where we don't run into this sort of risk.
As such: I am now firmly of the opinion that we should just allow generic receivers, as the current nightly arbitrary self types v1 does.
Thank you for the detailed and thorough analysis!
If @RalfJung is on board with this analysis, then 👍 for lifting the restriction on generics.
I just suggested that we should be conservative here. My gut feeling is that we should "know the receiver type constructor" (i.e., identify the relevant Receiver impl) without instantiating the generic, but having generics "inside" the receiver type is fine (if that Receiver impl is itself generic). However, I am not sure if that is even a well-defined notion.
I'm too out of the loop to follow the details of the suggestion here, this sounds like a question for @rust-lang/types. :)
I don't see any reason why we should need to support generics that come from the method itself. That should be easy enough to detect.
> I don't see any reason why we should need to support generics that come from the method itself. That should be easy enough to detect.
OK, to check I understand your proposal, we want to distinguish these cases:
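struct MyBox<T>(T);

impl<T> Receiver for MyBox<T> {
    type Target = T;
}

struct Foo;

impl Foo {
    // Concrete receiver type: fine.
    fn ok(self: MyBox<Foo>) {}
    // The receiver is a type parameter of the method itself.
    fn not_ok1<X: Receiver<Target = Self>>(self: X) {}
    // The receiver mentions a type parameter of the method (the allocator).
    fn not_ok2<A: Allocator>(self: Box<Self, A>) {}
}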
To detect this I'm assuming we want to do something along these lines in wfcheck.rs:
X for not_ok1 and A for not_ok2)
self type and if its type contains any of those type parameters, reject it with a new diagnostic
The new diagnostic might be something like:
generic methods can't be used with arbitrary self types. Use one of the standard self types such as Self, &Self, &mut Self
(We can probably optimize to avoid two separate validation passes, but semantically, that's what you think we should aim for?)
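The two-step check described above can be sketched on a toy model. This is only an illustration: the `Ty` enum, `mentions_method_param` and `check_receiver` are hypothetical names invented here and are not rustc's real data structures or the code in wfcheck.rs.

```rust
/// Toy stand-in for a type (hypothetical; rustc's real representation is much richer).
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    /// A type parameter such as `Self`, `X` or `A`.
    Param(&'static str),
    /// A reference, `&T` or `&mut T`.
    Ref(Box<Ty>),
    /// A named type with generic arguments, e.g. `Box<Self, A>`.
    Adt(&'static str, Vec<Ty>),
}

/// Returns true if `ty` mentions any of the method's own type parameters,
/// at any depth.
pub fn mentions_method_param(ty: &Ty, method_params: &[&str]) -> bool {
    match ty {
        Ty::Param(name) => method_params.contains(name),
        Ty::Ref(inner) => mentions_method_param(inner, method_params),
        Ty::Adt(_, args) => args
            .iter()
            .any(|arg| mentions_method_param(arg, method_params)),
    }
}

/// Step 1 is assumed done by the caller (collecting `method_params`);
/// step 2 rejects the `self` type if it mentions any of them.
pub fn check_receiver(self_ty: &Ty, method_params: &[&str]) -> Result<(), String> {
    if mentions_method_param(self_ty, method_params) {
        Err("generic methods can't be used with arbitrary self types".to_string())
    } else {
        Ok(())
    }
}
```

Under this rule `MyBox<Foo>` is accepted, while both a bare `X` and `Box<Self, A>` are rejected whenever `X` or `A` is declared on the method, which matches the `ok` / `not_ok1` / `not_ok2` split in the example.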
I'm a bit concerned about the Box<T, A> case, since Box is already hard-coded as a valid receiver type. If we want to retain compatibility with code like this we would need to retain the special-casing for Box, which seems a bit sad. But perhaps we don't care since the allocator API isn't stable? Opinions?
> I'm a bit concerned about the Box<T, A> case, since Box is already hard-coded as a valid receiver type.
I would imagine that has an impl like impl<T, A> Receiver for Box<T, A>. So fn not_ok2<A: Allocator>(self: Box<Self, A>) should IMO be fine -- the concrete Receiver impl is determined statically.
It is only when using a Receiver bound (and maybe other traits like Deref? not sure what the system ends up looking like) that we get into situations where different instances could use Self in completely different ways. Maybe that's a problem, maybe not -- but if we allow that it should be carefully thought through.
After looking at the options, I agree with @compiler-errors 's plan here. Specifically, we'll reject methods where the outermost receiver type is:
&/Receiver chain, or in current "arbitrary self types", the &/Deref chain)
All other self types pass the test, including types which refer to such type params in their generic arguments.
This filter seems to work as desired - I'm working on a PR now.
Documenting the thought process here for posterity.
First,
> It is only when using a Receiver bound (and maybe other traits like Deref)
I'm worried about explicitly filtering out Receiver because of course other traits (including as you note Deref) might implement Receiver. We could attempt to do a query for whether a given trait indirectly implements Receiver but I'd be worried we'll cause semver compatibility problems if we do that, or more generally cause surprises.
I had various chats with @Darksonn about this over the weekend (thanks!). She made me realize that the only thing which matters is the outermost type kind, not any generic parameters elsewhere in the type. I think that's what the rest of you have been telling me for days too but I hadn't figured that out :)
We discussed only allowing concrete paths as the outermost type. I think the fundamentals of that idea are sound, but it turns out to be a bit more complex:
&, &mut) vs following chains of Deref/Receiver so ideally this test wouldn't distinguish either.
#![feature(arbitrary_self_types_pointers)] is enabled
str and dyn T
Self, or a param defined on the impl block instead of in the method signature. (e.g. Rust tests include some methods in impl Trait for T where T: blah {} impls, and the self type works out to be the type param T)
The key case turns out to be our old friend, Pin<&mut T>:
use std::ops::Deref;
use std::pin::Pin;

// Both Self and A below are ty::Param
pub trait Future {
    fn poll(self: Pin<&mut Self>) {} // must compile
}

pub trait Foo {
    fn poll<A: Deref<Target = Self>>(self: Pin<&mut A>) {} // probably should NOT compile
}
So, IMO this proves that the "good vs bad" filter must involve distinguishing some ty::Params from others.
The ty::Params which are OK are those belonging to the impl block, e.g. Self. The ty::Params which are not OK are those belonging to the method. As we're evaluating the method signature, we have ready access to the latter list, so we'll blocklist them.
Hence - I'm now pretty sure that this is the right plan.
If that doesn't make any sense, don't worry - PR coming along soon which will make it clearer.
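For reference, the "must compile" half of the Pin example can already be exercised on stable Rust, since `Pin<&mut Self>` is one of the long-standing receiver types. The sketch below uses a hypothetical `Tick` trait and `Counter` type invented for illustration; the point is only that `Self` here belongs to the trait/impl rather than to the method's own generics.

```rust
use std::pin::Pin;

/// Hypothetical trait for illustration; it mirrors the shape of `Future::poll`.
pub trait Tick {
    // `Self` is a parameter of the trait, not of the method.
    fn tick(self: Pin<&mut Self>) -> u32;
}

pub struct Counter {
    count: u32,
}

impl Tick for Counter {
    fn tick(mut self: Pin<&mut Self>) -> u32 {
        // `Counter` is `Unpin`, so `Pin<&mut Counter>` implements `DerefMut`
        // and the field can be updated directly.
        self.count += 1;
        self.count
    }
}

/// Calls `tick` twice through a pinned mutable reference.
pub fn tick_twice() -> u32 {
    let mut counter = Counter { count: 0 };
    Pin::new(&mut counter).tick();
    Pin::new(&mut counter).tick()
}
```

The "probably should NOT compile" half differs only in that the type inside `Pin<&mut _>` is declared on the method, which is the distinction the blocklist is meant to capture.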
I'm not sure this quite aligns with my intuition... If the receiver type is "&MyType
The key distinction is really whether we can prove that this is a receiver type without using the where clauses.
I've attempted to come up with an alternative PR which rejects all self types that depend in any way on method parameters, as opposed to the previous PR which only rejected self types which were a type param.
The second approach should reject methods whose self type depends upon where clauses per the concerns raised here and explored here.
However, this second approach doesn't currently work due to the Box problem I was concerned about. In alloc::collections::LinkedList::Node:
impl<T> Node<T> {
    fn into_element<A: Allocator>(self: Box<Self, A>) -> T {
        self.element
    }
}
and this in slice:
impl<T> [T] {
    pub fn into_vec<A: Allocator>(self: Box<Self, A>) -> Vec<T, A> {
        // ...
    }
}
Presumably we'd find similar patterns in third party crates too. So... we can't just entirely ignore the type params belonging to the method.
It's very hard to think of a compromise here. Even if I could figure out a way to retain the type params but not their predicates, that won't help, because we need the : Allocator predicate to successfully match against the Deref impl. I can't think of a way to filter "good" predicates vs "bad" predicates.
Thanks to @nbdd0121 and (again) @Darksonn for help here.
I think the options are:
LinkedList::Node (seems possible by giving Node a PhantomData field), constrain this slice method to work with only the global allocator (presumably a significant compatibility break?), and accept we'll probably break some third party crates, and land something like #130120. It feels like this is probably not OK.
I feel the allocator bound is a legit use of arbitrary self types and therefore ignoring predicates on impl item would not be a good approach.
We could filter out only predicates that mention Deref/DerefMut, but it feels pretty ad-hoc and people can work around it by defining a new trait with Deref as its supertrait, so that's probably also not a good approach.
I would suggest that we drop the non-generic requirement altogether.
No, as there we have a single impl<T> Deref for MyPtr<T>. But now imagine we have impl<T> Deref for MyPtr<Box<T>> and impl<T> Deref for MyPtr<Arc<T>> -- these impls could be doing entirely different things, and yet either of them could be relevant for dispatching fn m<T>(self: &MyPtr<T>) where MyPtr<T>: Deref<Target=Self>.
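To make the "entirely different things" point concrete, here is a small sketch that compiles on stable Rust. The choice of `str` as the second `Target` and the helper names `via_box` / `via_arc` are assumptions made up for illustration; only the two impl headers come from the comment above.

```rust
use std::ops::Deref;
use std::sync::Arc;

pub struct MyPtr<T>(pub T);

// First impl: dereferences through the Box to the value it owns.
impl<T> Deref for MyPtr<Box<T>> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.0
    }
}

// Second impl: same outer type constructor, but an unrelated Target.
// Nothing forces it to have anything to do with `T`.
impl<T> Deref for MyPtr<Arc<T>> {
    type Target = str;
    fn deref(&self) -> &str {
        "from the Arc impl"
    }
}

/// Dereferences a `MyPtr<Box<u32>>`, which yields the boxed number.
pub fn via_box() -> u32 {
    let p = MyPtr(Box::new(7u32));
    *p
}

/// Dereferences a `MyPtr<Arc<u32>>`, which yields a string instead.
pub fn via_arc() -> String {
    let p = MyPtr(Arc::new(7u32));
    let s: &str = &*p;
    s.to_string()
}
```

A method whose receiver is `&MyPtr<T>` with only a `MyPtr<T>: Deref<Target = Self>` bound cannot tell which of these two impls will apply until `T` is known.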
> It's very hard to think of a compromise here
I've suggested a compromise multiple times: we have to be able to identify the unique impl that allows this to be a receiver, without knowing which values the generic type parameters take. That allows the allocator case but rejects the example from my previous comment.
I don't know how to implement that, as I know nothing about the trait solver, but as far as I can see this is entirely well-defined.
I don't think that'll work, since for the Deref case the Receiver impl can be resolved to the blanket impl that implements it for all Deref.
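A rough sketch of why the blanket impl defeats the "identify the unique impl" idea, using a hypothetical local trait `MyReceiver` as a stand-in for `Receiver` (the real trait and its exact bounds are not reproduced here):

```rust
use std::ops::Deref;
use std::rc::Rc;

/// Hypothetical stand-in for the `Receiver` trait under discussion.
pub trait MyReceiver {
    type Target: ?Sized;
}

// Blanket impl: every `Deref` type is a receiver with the same target.
// Every lookup of `MyReceiver` for a `Deref` type lands on this one impl.
impl<P: ?Sized + Deref> MyReceiver for P {
    type Target = <P as Deref>::Target;
}

/// Compiles only if `P: MyReceiver<Target = T>`; always returns true.
pub fn has_target<P, T>() -> bool
where
    P: MyReceiver<Target = T> + ?Sized,
    T: ?Sized,
{
    true
}

/// Checks a few smart pointers against the blanket impl.
pub fn check_std_pointers() -> bool {
    has_target::<Box<i32>, i32>()
        && has_target::<Rc<str>, str>()
        && has_target::<&'static u8, u8>()
}
```

Because the receiver impl is always the same blanket impl, uniqueness at that level says nothing about which `Deref` impl sits underneath it.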
Ah, that's how these two traits interact? I was wondering about that. Yeah okay if we have such generic Receiver impls that makes it tricky then.
Anyway, if t-types says there is no problem here then that's fine for me as well.
I suppose we can first resolve Receiver impl, and if it resolves to the blanket impl then we try to resolve Deref and see if it's too generic. But that also feels a bit ad-hoc..
> No, as there we have a single impl<T> Deref for MyPtr<T>. But now imagine we have impl<T> Deref for MyPtr<Box<T>> and impl<T> Deref for MyPtr<Arc<T>> -- these impls could be doing entirely different things, and yet either of them could be relevant for dispatching fn m<T>(self: &MyPtr<T>) where MyPtr<T>: Deref<Target=Self>.
so something like https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=feca59e56ec2092732906a7c10493f67?
Yeah, something like that. Those impls can now have completely different Target types and thus the method may be looked up quite differently... it's a bit hard for me to imagine what even happens in the general case. ^^ Here's an example.
I am surprised that this even compiles... are we considering every inherent method of every type to be a potential candidate here to find out which one gets called? Won't that be a mess?
Over here, @adetaylor gave an update about arbitrary self types that included this bit:
We never made a decision about this. Perhaps if we could offer more clarity, it would save @adetaylor some time here, so it may be worth us discussing.
@rustbot labels +T-lang +I-lang-nominated -needs-triage +C-discussion
cc @rust-lang/lang @adetaylor