Coherence can be bypassed by an indirect impl for a trait object

arielb1 commented 5 years ago

Comments

The check for manual impl Object for Object only makes sure there is no direct impl Object for dyn Object - it does not consider such indirect impls. Therefore, you can write a blanket impl<T: ?Sized> Object for T that conflicts with the builtin impl Object for dyn Object.

Reproducer

edit: minimal reproducer from https://github.com/rust-lang/rust/issues/57893#issuecomment-500250283

trait Object<U> {
    type Output;
}

impl<T: ?Sized, U> Object<U> for T {
    type Output = U;
}

fn foo<T: ?Sized, U>(x: <T as Object<U>>::Output) -> U {
    x
}

fn transmute<T, U>(x: T) -> U {
    foo::<dyn Object<U, Output = T>, U>(x)
}

I had some difficulties with getting the standard "incoherence ICE" reproducer, because the object candidate supersedes the impl candidate in selection. So here's a "transmute_lifetime" reproducer.

trait Object {
    type Output;
}

trait Marker<'b> {}
impl<'b> Marker<'b> for dyn Object<Output=&'b u64> {}

impl<'b, T: ?Sized + Marker<'b>> Object for T {
    type Output = &'static u64;
}

fn foo<'a, 'b, T: Marker<'b> + ?Sized>(x: <T as Object>::Output) -> &'a u64 {
    x
}

fn transmute_lifetime<'a, 'b>(x: &'a u64) -> &'b u64 {
    foo::<dyn Object<Output=&'a u64>>(x)
}

// And yes this is a genuine `transmute_lifetime`!
fn get_dangling<'a>() -> &'a u64 {
    let x = 0;
    transmute_lifetime(&x)
}

fn main() {
    let r = get_dangling();
    println!("{}", r);
}

Duplicates, see also

https://github.com/rust-lang/rust/issues/114389

scalexm commented 5 years ago

@arielb1 Interesting. I guess we would need a deeper check, something along the lines of:

impl<T> Object for SomeType<T> where WC {
/* ... */
}

// If the following goal has a non-empty set of solutions, reject the impl.
exists<T> {
    exists<U> {
        Unify(dyn Object<Output = U>, SomeType<T>),
        dyn Object<Output = U>: WC
    }
}

cc @nikomatsakis

arielb1 commented 5 years ago

Sure.

What makes this non-trivial is auto-traits (dyn Object + Send is a distinct wrt. unification from dyn Object), and to some extent binders (unless you ignore them in unification). i.e., supposing the impl you have is

impl<T> Object<'a> for SomeType<T> where WC

Then the "ideal" lowering would be something of this format:

// If the following goal has a non-empty set of solutions, reject the impl.
exists<T: type, 'a('x): lifetime -> lifetime, U('x): lifetime -> type, AT: list of auto-traits> [
    Unify(dyn for<'s> Object<'a('s), Output = U('s)> + AT, SomeType<T>),
    dyn for<'s> Object<'a('s), Output = U('s)> + AT: WC
}

If you ignore binders in unification (i.e., you consider for<'a> Object<'a, Output=&'a T> to be the same as Object<'x, Output=&'y T> in coherence), then you don't have to deal with the HRTB problem, but you still have to deal with auto-traits:

// If the following goal has a non-empty set of solutions, reject the impl.
exists<T: type, 'a: lifetime, U: type, AT: list of auto-traits> [
    Unify(dyn Object<'a, Output = U> + AT, SomeType<T>),
    dyn Object<'a, Output = U> + AT: WC
}

While I don't think this is insurmountable, it is fairly annoying and places some restrictions on how you encode auto-traits.

Aaron1011 commented 5 years ago

I'm interested in working on this.

Aaron1011 commented 5 years ago

It turns out that only one trait is necessary to reproduce this: (playground)

trait Object {
    type Output;
}

impl<T: ?Sized> Object for T {
    type Output = &'static u64;
}

fn foo<'a, T: ?Sized>(x: <T as Object>::Output) -> &'a u64 {
    x
}

fn transmute_lifetime<'a, 'b>(x: &'a u64) -> &'b u64 {
    foo::<dyn Object<Output=&'a u64>>(x)
}

// And yes this is a genuine `transmute_lifetime`!
fn get_dangling<'a>() -> &'a u64 {
    let x = 0;
    transmute_lifetime(&x)
}

fn main() {
    let r = get_dangling();
    println!("{}", r);
}

This seems quite bad, as simply writing a blanket impl is enough to expose the issue.

Aaron1011 commented 5 years ago

One way to fix this issue would be the following:

If a trait has any associated items, and a blanket impl exists for it, that trait cannot be object-safe.

This is pretty extreme - however, the only other solution I can think of would be to deny all blanket impls of a trait with associated items - which seems even worse, given that the trait might not have even been object-safe to begin with.

Centril commented 5 years ago

Unfortunately, it gets worse. Here's an implementation of mem::transmute in safe Rust:

trait Object<U> {
    type Output;
}

impl<T: ?Sized, U> Object<U> for T {
    type Output = U;
}

fn foo<T: ?Sized, U>(x: <T as Object<U>>::Output) -> U {
    x
}

fn transmute<T, U>(x: T) -> U {
    foo::<dyn Object<U, Output = T>, U>(x)
}

I think blame should be assigned to not normalizing <T as Object<U>>::Output to U in fn foo. If that happened, foo::<dyn Object<U, Output = T>, U>(x) would not type-check.

I checked the code above with godbolt and the problem has existed since Rust 1.0.0.

jonas-schievink commented 5 years ago

The first comments in the thread seem to indicate that the problem is instead with the blanket impl itself being accepted by the compiler.

If I understood correctly, @Centril's impl above can be applied to dyn Object<U, Output=T> for any U and T, and defines Output=U. However, a trait object like dyn Object<U, Output=T> already provides an implicit impl with the same input types, but a different output type, which is incoherent.

jonas-schievink commented 5 years ago

Seems like a good idea to discuss in language and compiler team, nominating.

Centril commented 5 years ago

Notably, in my reproducer above, if you change fn foo to:

fn foo<T: ?Sized + Object<U>, U>(x: <T as Object<U>>::Output) -> U {
    x
}

then you will get:


error[E0308]: mismatched types
  --> src/lib.rs:10:5
   |
9  | fn foo<T: ?Sized + Object<U>, U>(x: <T as Object<U>>::Output) -> U {
   |                                                                  - expected `U` because of return type
10 |     x
   |     ^ expected type parameter, found associated type
   |
   = note: expected type `U`
              found type `<T as Object<U>>::Output`

jonas-schievink commented 5 years ago

Is that not just due to where-clauses having precedence over impls? AFAIK trait selection will prefer the Object<U> impl in the bounds, which doesn't have an assoc. type specified, so the compiler can't normalize it.

Aaron1011 commented 5 years ago

I made a first attempt at a fix here: https://github.com/Aaron1011/rust/tree/fix/trait-obj-coherence

My idea was to extend the obligations generated by ty::wf::obligations to include projection predicates of the form:

<dyn MyTrait<type Assoc=T> as MyTrait>::Assoc = T.

This is combeind with an extra flag in SelectionContext to force impl candidates to be chosen over object candidates. In the case of an incoherent blanket impl (e.g. any of the reproduces in this thread), <dyn MyTrait<type Assoc=T> as MyTrait>::Assoc should project to the type specified in the blanket impl. This will fail to unify with T, correctly causing a compilation error.

Unfortunately, I wasn't able to get this working, due to how the well-formed predicate generation is integrated with the rest of the compiler. I would need to significantly refactor FulfillmentContext, SelectionContext, and/orPredicatein order to keep track of the extra flag that needs to be passed toSelectionContext`.

I'm hoping that someone can come up with a better approach. However, any solution will need to deal with the fact that SelectionContext masks attempts to detect the incoherence by discarding the impl candidate in favor of the object candidate.

RalfJung commented 5 years ago

This is interesting! The fact that foo in @Centril's example even gets accepted shows that we aggressively exploit coherence during type-checking: because we saw an impl that uses a certain associated type, we type-check assuming that that is the impl that got used here. Unfortunately, as the example here shows, that is in conflict with the "implicit impl" that we get for trait objects.

I wouldn't be sad if we would fix this by being less aggressive about exploiting coherence during type-checking -- that makes analysis much more complicated, and this example shows why. But there's likely already tons of code out there that exploits this. Another fix is to refuse to create trait objects that "contradict" the kind of coherence knowledge that the type checker might exploit, but that seems extremely fragile.

@Aaron1011 your plan seems to be to encode in the type of foo the fact that we exploited coherence? However this "exploitation" happens in the body of the function, so what exactly gets exploited cannot be known yet when computing the well-formedness predicate for foo. Thus I guess what you want to / have to do is to aggressively look for all things one might be able to exploit up-front, add them all to the well-formedness obligations, and then during type-checking the boy look only at the obligations and not exploit coherence any more (as doing so would introduce scary bugs if the two systems for getting such type equalities out of coherence would ever not agree).

Once again, exploiting assumptions implicitly instead of inferring them to an explicit form proves to be an issue. This reminds me that we still implicitly exploit WF everywhere instead of turning that into explicit assumptions...

pnkfelix commented 5 years ago

triage: P-high due to unsoundness. Leaving nominated in hope of discussing today. Not assigning to anyone yet.

pnkfelix commented 5 years ago

assigning to @nikomatsakis with expectation that they will delegate. (and removing nomination label)

Centril commented 5 years ago

Still nominated for t-lang.

arielb1 commented 5 years ago

@RalfJung

I am not sure that giving up on coherence is the right choice here. I think it is too easy to split the code into functions, such that no one function sees the coherence violation:

struct Data<T: ?Sized, U> where T: Object<U> {
    data: <T as Object<U>>::Output
}

trait Object<U> {
    type Output;
}

trait Mark {
    type Output;
}

impl<T: ?Sized, U: Mark<Output=V>, V> Object<U> for T {
    type Output = V;
}

fn iso_1<T, U>(data: T) -> Data<dyn Object<U, Output=T>, U> {
    // in any coherence-less solution, this shouldn't "see" the
    // blanket impl, as it might not apply (e.g., `Local` might
    // be in a third-party crate).
    Data { data }
}

fn iso_2<X: ?Sized, U, V>(data: Data<X, U>) -> V
    where U: Mark<Output=V>
{
    // similarly, this shouldn't "see" the trait-object impl.
    data.data
}

fn transmute_m<T, U, V>(data: T) -> V
    where U: Mark<Output=V>
{
    // This function *also* shouldn't see the blanket impl - `Local`
    // might be in a third-party crate.
    iso_2::<dyn Object<U, Output=T>, U, V>(
        iso_1::<T, U>(data)
    )
}

// These 3 functions could be in a child crate

struct Local<T>(T);

impl<T> Mark for Local<T> {
    type Output = T;
}

fn transmute<T, U>(x: T) -> U {
    // and this function shouldn't see *anything* that looks like a
    // problematic associated type.
    transmute_m::<_, Local<_>, _>(x)
}

arielb1 commented 5 years ago

I think that conditioning object safety on an object type being coherent is a reasonable-enough way out of this.

If we want to be tricky, we might put the condition "directly" on the impl - i.e., have ObjectCandidate not apply if an ImplCandidate might "theoretically" apply to any substitution of this type. This would however require doing coherence-style negative reasoning during selection, which would be ugly.

RalfJung commented 5 years ago

@arielb1 your iso_2 exploits coherence, so "giving up on exploiting coherence" would not let that function type-check.

arielb1 commented 5 years ago

@RalfJung what's the specific way that makes iso_2 exploit coherence?

The wrong derivation that it makes is <X as Object<U>>::Output = V :- U: Mark<Output=V>. I don't think we want to avoid making that sort of derivation "unconditionally" (or is it implicitly making some other derivation?).

arielb1 commented 5 years ago

It seems like a basic rule of Rust that we want to keep that it is possible to follow impls. e.g., with IntoIterator, you really want typeck to be able to assume that <T as IntoIterator>::Item = U := T: Iterator, <T as Iterator>::Item = U.

RalfJung commented 5 years ago

follow impls

That is exactly what I mean by "exploiting coherence": seeing an impl that applies and relying on the fact it is the only one that can ever apply.

Centril commented 5 years ago

We are also discussing this on Zulip.

pnkfelix commented 5 years ago

Discussed at T-compiler meeting. Assigning to @centril to take point on seeing that progress is made on this issue, in some form.

nikomatsakis commented 5 years ago

We discussed this in depth at today's lang team meeting. We enumerated a few possible solutions to the problem.

The problem

Presently, if you have an object-safe trait Object, the compiler 'auto-generates' an impl like

impl Object for dyn Object { }

However, we do not forbid you from making a potentially duplicate impl:

impl<T: ?Sized> Object for T { }

This in and of itself is not so harmful, but it causes a problem if you have associated types:

trait Object<U> {
    type Output;
}

In this case, the compiler would "auto-generate" an impl like:

impl<A, B> Object<A> for dyn Object<A, Output = B> {
    type Output = B;
}

However, the user could provide an impl that has a different value for the associated type:

impl<A, B> Object<B> for A
where
    B: ?Sized,
{
    type Output = B;
}

Now the problem is that different parts of the compiler could select different impls, resulting in distinct values for Output. In @Centril's example (https://github.com/rust-lang/rust/issues/57893#issuecomment-500250283), there is a helper function foo which only knows that it has some T: ?Sized, so it uses the user's impl to resolve Output:

fn foo<T: ?Sized, U>(x: <T as Object<U>>::Output) -> U {
    //                  ^^^^^^^^^^^^^^^^^^^^^^^^^
    // Resolves to `U`, so `foo` thinks it has an argument of
    // type `U` -- therefore, it can be returned.
    x
}

On the other hand, the transmute function invokes foo with the type dyn Object<U, Output = T>:

fn transmute<T, U>(x: T) -> U {
    foo::<dyn Object<U, Output = T>, U>(x)
    //                           ^
    // Output is normalized to `T`
}

This winds up resolving using the "auto-generated" impl, and hence Output is normalized to T -- therefore, we can give an x: T, and we get back a U. (Reading the source, I'm actually not entirely sure why this happens, I might have thought it would result in ambiguity... but in any case it clearly does.)

OK -- I was supposed to start with the solutions, and @Centril was supposed to write-up the problem, but I couldn't help myself. Sorry @Centril. I wanted to be sure I understood it. I'll do the solutions in the next comment.

nikomatsakis commented 5 years ago

We explored three possible solutions.

The most obvious is to "give up on coherence". Actually we didn't go into this in depth; @arielb1 felt that it cannot be made sound, and I'm inclined to agree. I think this Zulip chat contains more details.

nikomatsakis commented 5 years ago

The next solution was modifying the rules around object safety such that the trait Object is not considered "object safe". The idea would be roughly that a trait is not "object safe" if there is some impl that could apply to the object. This was somewhat inspired by https://github.com/rust-lang/rust/issues/50781 -- which is a rather different, but also concerning, soundness hole around object types. (And one I have to revisit after writing these to see if we can somehow solve it in a similar way.)

In truth, we didn't dig too hard into this option. It would be "backwards incompatible" in a somewhat serious way, in that if someone has traits like

trait Foo { 
    fn method(&self) { }
}
impl<T: ?Sized> Foo for T { 
}

today, the Foo is "object safe" and hence one can have dyn Foo without a problem. (If you invoke method on a dyn Foo, I believe it will dispatch through the vtable, because we'll see it at monomorphization time, but I'm not 100% sure.)

This would be a bit of a departure in the sense that none of the other object safety rules have to do with the set of impls. That said, I believe that @mikeyhew did a lot of refactoring as part of the custom self-type work to make that possible, so it could work out.

Still, it has some "strangeness" to it -- for example, it requires creating a dyn Foo type so that we can test whether any impl applies to it -- but such a type is only well-formed if there is no such impl. (As noted in the next comment, we could probably do this check in a more syntactic way, actually.)

nikomatsakis commented 5 years ago

The final solution we considered was to suppress the "auto-generation" of the dyn Foo: Foo impl if there exists some user-given impl that might apply to dyn Foo. The idea then is that Foo is object-safe but if you supply an overlapping impl, that's the impl we're going to use.

This has some interesting potential to offer. It might serve as a replacement for the where Self: Sized hack that we use in types like Iterator to permit a trait to be object-safe while still employing lots of complex methods. I'll look at that in the next comment.

My first thought was that we might model this (in chalk terms) by first constructing a program environment for the trait which does not involve the automatic rule. This could be done but feels like a bit of a pain. In particular, solving those impls might require lowering other traits, and we'd have to test those traits to see if they are object safe and hence get an automatic rule, which seems to require a kind of DAG between traits. Uncool.

In practice, however, the impl in question must be be implemented for some type parameter T where T: ?Sized. I don't think there would be any other way to match a dyn Foo object type (at least, not without being generic over "trait bounds" or some such thing). So we might be able to do a simpler, more syntactic check?

The same seems to apply to option two.

nikomatsakis commented 5 years ago

One final note:

We discussed, somewhat speculatively, the idea that if we took the third option, where users can write their own impl of a trait for dyn Trait, that it might enable an alternative to the where Self: Sized hack. The idea would be that if you have a trait that contains non-object-safe methods, you can define it yourself for the dyn type. This would be a non-trivial bit of design though and it's not necessarily a big win -- it's roughly the approach we opted against in creating object safe in the first place, though with the (nice!) advantage that the compiler will automatically create the dyn Foo impl when it is easy to do so.

The basic idea would be that you can always create a dyn Foo type for any trait, but you can't actually invoke non-object-safe methods through that type. This changes the MIR definition because, for now, MIR knows nothing about virtual dispatch -- that is effectively encapsulated into a "magic impl" that the compiler generates. But in this model, the MIR would have to be extended with some notion of a virtual call, such that when one invokes obj.foo() on some variable obj: dyn Foo, that is not (from MIR's perspective) invoking Foo::foo but rather a virtual call. This lets you write the impl itself that bottoms out in virtual method dispatch.

nikomatsakis commented 5 years ago

OK, I'm taking a bit of time today to look into this. I want to start by implementing some version of the check proposed here:

The final solution we considered was to suppress the "auto-generation" of the dyn Foo: Foo impl if there exists some user-given impl that might apply to dyn Foo. The idea then is that Foo is object-safe but if you supply an overlapping impl, that's the impl we're going to use.

just to get an idea what kind of scenarios arise when we do this.

A few thoughts so far:

First off, I was wrong when I said that "the impl in question must be be implemented for some type parameter T where T: ?Sized". It seems clear that one could write (for example):

trait TraitA { type Item: ?Sized; }

trait TraitB<T> { }

impl<X: TraitA> TraitB<X> for X::Item { }

impl TraitA for () {
    type Item = dyn TraitB<()>;
}

A more elaborate version of this can be found in this playground link.

So clearly the (semi-)syntactic check I was first envisioning is too simplistic. What we want is to see whether there is some impl of SomeTrait whose self type could be dyn SomeTrait (or equivalent to it). This can happen in basically three ways:

a type parameter that is ?Sized
an associated type projection (with any parameters) that is ?Sized
a dyn SomeTrait type (which we already check for)
an opaque type, but those are disallowed and the semantics are unclear anyway
a type alias that equates to one of those things

(Our current "eager" expansion of type alias makes the last point a non-issue, but it's something to keep in mind for the future.)

So I guess we should be able to implement a suitable check that looks for those things and, if found, "disables" the compiler-supplied dyn Foo: Foo impl. I'll dig a bit more into what that actually means in my next comment.

(At first I thought cyclic traits could be a problem:

impl Trait1 for dyn Trait2 { }
impl Trait2 for dyn Trait1 { }

but actually I think I was just confused there. The key point here is that, for the type dyn Trait1, there is only one impl of Trait1 that could apply (the compiler-supplied one). The same holds for dyn Trait2.)

nikomatsakis commented 5 years ago

So say we do have a suitable test we can apply to a trait to "disable the compiler-supplied impl". What does this actually mean?

To start, it means that traits have three categories:

object safe -- in which dyn Foo is allowed, and compiler ensures that dyn Foo: Foo
object safe, degenerate -- in which dyn Foo is allowed, but dyn Foo: Foo may not hold
non-object-safe -- dyn Foo is not allowed

Now, RFC 2027 kind of merges the last two categories, which is nice. Unfortunately I've got to rebase PR https://github.com/rust-lang/rust/pull/57545 and get it landed, since poor @bovinebuddha kind of lost patience with me I think (and quite reasonably so). But I don't think that really needs to block this ultimately.

In very concrete terms, I think that the meaning of the "object safe degenerate" case is as simple as there being no "object candidate" in the select.rs (and project.rs) code. This would not prevent dyn Foo: Foo from holding, but for that to hold, some other Foo impl would have to exist.

Now one interesting issue is that the compiler present kind of hardcodes the fact that invoking Foo functions on a dyn Foo type routes to virtual dispatch. But I think ultimately this also comes from the object candidate, so actually removing that candidate should "just work".

Here are the high-level steps I think we need to take:

implement a new query that does this "syntactic check" I described to test whether a given trait should be "degenerate"
extend the notion of object safety with a 3-part enum
modify the object safety query to return degenerate if this check applies
modify select.rs to not generate the object candidate for degenerate traits
add some tests

Tests

Let me log some tests.

The weaponized transmute should error.
This playground link should print "Hello for X::Item" on the second execution, I think, because the <dyn Foo as Foo>::foo call will get dispatched through the user's impl.
We can make an equivalent version of the previous that does not use associated types.
Question: we could readily permit impl dyn Trait for dyn Trait now, but have it use these semantics. Maybe make such an example, just for fun.
Make some more examples showing that projection of associated types is going through the user impl.
Add a case where the syntactically identified, overlapping impl winds up not applying (something like this I would expect to error, because X: TraitC is not true) when we need to prove dyn Foo: Foo.

Centril commented 5 years ago

@nikomatsakis Thanks for the investigation.

Let me log some tests.

We should also include @arielb1's various versions on the weaponized transmute which "giving up on coherence" did not address.

To start, it means that traits have three categories:

object safe -- in which dyn Foo is allowed, and compiler ensures that dyn Foo: Foo

object safe, degenerate -- in which dyn Foo is allowed, but dyn Foo: Foo may not hold

non-object-safe -- dyn Foo is not allowed

Now, RFC 2027 kind of merges the last two categories, which is nice. Unfortunately I've got to rebase PR #57545 and get it landed, since poor @bovinebuddha kind of lost patience with me I think (and quite reasonably so). But I don't think that really needs to block this ultimately.

It would be good to consider what implications this has for quantifying over traits (e.g. fn foo<trait B>(...) and projecting to associated traits (e.g. MyType: Self::KeyBound).

I considered something like struct Foo<dyn trait A>(Box<dyn trait A + Send>); where dyn trait A quantifies an object safe trait. However, with the degenerate case, you can no longer assume that dyn A: A holds in a generic setting.

I believe you need to write where dyn A: A to ensure that it does. That's unfortunate syntactic salt but it seems it would do the job so we would not give up on expressiveness. Alternatively, we can assert dyn A: A implicitly and give up on expressiveness for degenerate cases. This where dyn A: A could also serve as a WF assertion which might make dyn trait A unnecessary as syntax.

nikomatsakis commented 5 years ago

@Centril

That all sounds about right. I think if we assume that we adopt RFC 2027, then dyn A would always be a legal type, and hence it seems pretty reasonable to make dyn trait A mean "some trait where dyn A: A".

On a completely different but related note, I've long wished that we make "dyn-compatibility" more explicit -- something like dyn trait Foo { .. } might be an explicit assertion that Foo is a fully dyn-compatible (aka "object safe").

nikomatsakis commented 5 years ago

Zulip thread where we're discussing work on this issue

nikomatsakis commented 5 years ago

Update: I've got a working prototype now that removes the object candidate, based on the model I described. It is able to bootstrap successfully but for some reason I do not yet understand it afils miserably at running tests, which all die with some obscure ICE while loading the stdlib. Investigating.

nikomatsakis commented 5 years ago

Update: I still don't really understand where those errors are coming from. However, I also added a warning to be emitted on every trait that has a "potentially overlapping impl". The bad news is that there are a reasonable number.

Of these, Any is obviously the biggest concern -- it is only really useful in dyn Any form. It's also kind of amazing that this overlap didn't occur to us before, given that example. (I do remember being vaguely concerned that the check in the compiler wasn't good enough, but I wish I had listened more to the concern; ah well!)

nikomatsakis commented 5 years ago

Heh, I wonder if my bug is actually caused by the Any case! It must be. I think this code is failing:

https://github.com/rust-lang/rust/blob/23f890f10202a71168c6424da0cdf94135d3c40c/src/librustc_metadata/cstore_impl.rs#L56-L58

The problem here is that downcast_ref invokes type_id -- it wants to be invoking type_id through the vtable, to get the type-id of the underlying type. However, we are now suppressing the "vtable-based implementation" and so we are getting this implementation of type_id instead:

https://github.com/rust-lang/rust/blob/23f890f10202a71168c6424da0cdf94135d3c40c/src/libcore/any.rs#L97-L100

In other words, we're getting the type-id for the type dyn Any, not the hidden type. (This is what I meant by "uh this is obviously a coherence issue".) Therefore, our call to expect fails.

nikomatsakis commented 5 years ago

I'm not really sure what the best solution is here, but one thought is that, in all of the above cases, the "vtable impl" could be considered a specialization of the overlapping, generic case.

We might therefore consider a different approach. Instead of skipping the generation of the dyn impl, we might rather make it more official -- we could add it to the coherence check (presently, it is skipped, because we only consider impls that the user actually wrote). In that case, all the above impls would generate errors, but if we added default to their functions, they would compile, and behave as before.

Moreover, the "weaponized" cases would all fail -- they would be invalid specializations unless default were added to their type parameters, and in that case, the compiler would not assume that it can normalize based on those impls.

Aaron1011 commented 5 years ago

@nikomatsakis: What would be the migration path for any crates with traits similar to Borrow or BorrowMut? I'm concerned that a specialization-based approach would leave such crates with no path forward, other than waiting on the stabilization of specialization.

nikomatsakis commented 5 years ago

I was thinking about this more over the weekend. @Aaron1011, in answer to your question, it seems obvious we're going to have to manage the timing of our fix here pretty carefully. For one thing, we've talked about stabilizing some subset of specialization, and maybe we should try to execute on that before we make any hard errors here.

One thing that's worth pointing out -- the only thing that users need to be able to do is to declare individual items as default (i.e., eligible to be specialized). They don't need to write the specializing impl themselves.

Leaving the timing question aside, I had some further thoughts.

First, we don't want to unilaterally introduce a impl Foo for dyn Foo impl, clearly -- for example, this doesn't make sense for traits that are not object safe. So you could imagine some logic like:

If trait is object safe, introduce the impl Foo for dyn Foo impl.

However, this is kind of a mess. In particular, you may not have meant for your trait to be used as a dyn Trait -- for example, I'm not sure we anticipated dyn Borrow, although there are a few users -- but now you may be getting coherence errors due to that dyn Foo impl that you never asked for.

This comes back to the original mistake (or so I view it now) of having traits implicitly become object safe if they can be. In retrospect, I think I would've preferred a design where users "opted in" to having a trait be dyn-compatible, perhaps by writing

dyn trait Iterator { .. .}

or something like that. It would then be a hard-error if the trait were not dyn compatible. If we had that design, then it would be fairly natural that dyn trait Iterator also introduces an impl of Iterator for dyn Iterator. In fact, since we are moving to a world where dyn Iterator is always a valid type, even if Iterator is not dyn-compatible, you can view dyn trait Iterator as a kind of "convenient short-hand" where the compiler generates the impl for you, but in theory you could write it yourself (as we discussed in a lang-team meeting). This obviously ignores the fact that you don't have the primitives you need to write that impl (i.e., to do virtual dispatch), but that is something we could in principle correct.

Of course, requiring dyn trait opt-in would not be a compatible change at this point. It could in principle be done across an Edition boundary, although the "warning" would be a bit tricky (in particular, we can only suggest adding dyn to the trait if we see the trait used as a dyn value locally, we can't know about all downstream uses).

One thing we could do is something like this:

We enable dyn trait declarations
- in such a case, we generate an impl and check the coherence/specialization rules I suggested
- we also require the existing rules be satisfied
For backwards compatibility, in Rust 2018 and before, a trait can still be implicitly dyn compatible, but only if there is no overlapping impl (i.e., the check I implemented here)
- if there is an overlapping impl, then we still create the object candidate -- but we error out when you create a trait object
in Rust 2021, we require dyn trait declarations

Note this last point: this is not the logic that exists on the branch. My thought here is that the "drop the object candidate" logic has been shown to change the semantics of existing code (i.e., the Any branch), so merely making such a change would be bad.

This change means that if people want to use dyn Foo for some trait Foo that has an overlapping impl, they have to explicitly declare the trait to be dyn trait Foo, in which case the coherence rules apply.

(To be clear, I think this change requires an RFC -- but it may make sense to adapt my branch such that I can create a PR and test the impact.)

nikomatsakis commented 4 years ago

So I've been distracted here. I think that the crater results here appear to be promising, but there is still a bit more analysis needed. Those results show that relatively few craters are affected, but what I didn't figure out yet is what kind of fix is needed for those crates. In particular, part of the fix I've proposed leans on specialization, and it'd be good to know if those crates would require default declarations in order to be compatible.

The answer is definitely yes for some of them, but maybe not all. It's worth doing the investigation. To do so, we basically want to look at the logs and see if there is indeed an impl.

So far I just took a quick look at the axiom 1.0 crate. It's actually raises an interesting point -- here are its crater logs. What we can see that there is a warning issues with the text "impl_potentially_overlapping_dyn_trait", indicating that we found an impl that potentially overlaps with the dyn ActorMessage (per these rules). However, [that impl requires other traits, like Serialize, and of course dyn ActorMessage: Serialize would not hold. However, if we look to the future, we would have to consider the possibility of trait objects like dyn (ActorMessage + Serialize) -- and such a trait object naturally could match against that impl. So I think that this is indeed a case where the correct impl would require a default declaration, so that it can be overridden by the (compiler-supplied) impl for dyn ActorMessage.

To track the progress of the investigation, I created a hackmd. I'm going to try and leave some notes in crates I've looked at, and if others want to pitch in, please feel free to do so!

nikomatsakis commented 4 years ago

We held a lang-team design meeting (minutes and video available here) where we discussed this issue in depth.

After the meeting, I realized that one simpler fix (at least for now) would be to revoke "dyn safety" if:

there is a potentially dyn-overlapping impl and
the trait has associated items that are not fns (@cramertj pointed out that associated types which require where Self: Sized could be permitted)

I've implemented this in PR #66037 and it is being tested on crater.

withoutboats commented 4 years ago

I just wanted to mention since its not highlighted in the notes that this intersects with #67562. The proposal here is to make the compiler treat the Any impl as specializable by the compiler generated trait object impl. There are several variations on this proposal, but we should remember: we cannot allow end users to specialize Any without making Any an unsafe trait, because the safety of the code in std::any depends on the invariant that the TypeId returned by Any's method is the correct TypeId for the erased type (currently upheld through by virtue of the sole unspecializable impl of Any which covers all possible instantiations).

Any of these solutions are sufficient:

Don't let end users specialize these impls the compiler forces to be treated as "default" for sound resolution.
Mark Any as something like default(crate) (downstream users cannot specialize)
Mark Any as an unsafe crate.

The first two are both mentioned in the document and it sounded like the lang team was in favor of both. My impression of #67562 is that the libs team would prefer not to see Any specializable by end users, and to keep its unsafety encapsulated in the any module.

nikomatsakis commented 4 years ago

I'm dumping here some notes from the lang team dropbox paper:

No progress
Relevant issue: rust-lang/rust#57893
Relevant PR: rust-lang/rust#66037
What the PR implemented:
- Making dyn Trait not dyn safe if:
  - there is a dyn overlapping impl and
  - the trait has associated non-functions
    - that do not have where Self: Sized (not implemented)
Crater impact virtually negligible, but the 1 item we did identify is interesting
- has to do with the [TryStream](https://docs.rs/futures/0.3.1/futures/stream/trait.TryStream.html) trait, which is used roughly like this:

    trait StreamX {
        type Item;  
    }

    trait TryStreamX: StreamX  {
       type Ok;
       type Error;
    }

    impl<S, T, E> TryStreamX for S
        where S: ?Sized + StreamX<Item = Result<T, E>>
    {
        type Ok = T;
        type Error = E;
    }

the crate in question has a dyn TryStream which is an error now
- Why? The concern is that S in the above impl might (in the future) become
  - dyn TryStreamX<Ok = A, Error = B> + StreamX<Item = Result<C, D>>
- in which case type Ok would be A from the default impl but C from the impl above
I’m not sure if there is some path forward apart from breaking dyn TryStream, which is a bit of a shame
- note that a trait alias is really what we want here
other alternative that I haven’t tried to implement yet, but which I discussed before
- in a dyn-overlapping impl, all associated items must be default
  - and compiler (for backwards compatibility) will add the default implicitly
    - (but user would not be permitted to specialize with their own impls, as boats noted)
  - this is to reflect that the compiler introduces its own impl
- but impact on TryStream would be pretty severe here, I think
  - you wouldn’t be able to take T: Stream<Item = Result<...>>
  - you would not be able to do T::Ok
impact of existing PR
- you cannot use dyn TryStream but you can do fn foo<T: TryStream<..>>()>
- you could write dyn Stream<Item = ..>
really this code wants to be a trait alias
- This would require trait aliases to support associated types, but that seems feasible
- Trait alias semantics: “A Stream of Result<T, E> can be called a TryStream, where the associated types Ok and Error refer to T and E respectively”
next step:
- revisit the soundness hole and see if we can “weaponize” the TryStream example in a similar way
- revisit the write-up covering the various options before us:
  - i.e., this change is modifying what dyn safety means, has very slight impact
    - concerns about the fact that “whether trait X is dyn safe depends on its impls”
  - other options might be introducing default, but this has potentially broader impact

joshtriplett commented 4 years ago

I'm un-tagging this as P-high. In practice, it is not actually P-high. We'd like to see someone work on this, but leaving it perpetually tagged P-high is not helping with that.

steffahn commented 3 years ago

I got some more example code in #80783, which I closed as it’s a duplicate.

I guess they aren’t any worse (as in “more unsafe”) than the examples that we already have in this thread.

steffahn commented 3 years ago

@nikomatsakis

revisit the soundness hole and see if we can “weaponize” the TryStream example in a similar way

It surely can:

use futures::{Stream, TryStream};

fn transmute<B, A>(x: B) -> A {
    foo::<dyn TryStream<Ok = B, Error = (), Item = Result<A, ()>>, A, B>(x)
}

fn foo<S: ?Sized,  A, B>(x: <S as TryStream>::Ok) -> A
where
    S: Stream<Item = Result<A, ()>>,
{
    x
}

static X: u8 = 0;
fn main() {
    let x = transmute::<&u8, &[u8; 1000000]>(&X);
    println!("{:?}", x);
}

(playground)

Edit:

Apparently, not even changing it to

pub trait TryStream:
    Stream<Item = Result<<Self as TryStream>::Ok, <Self as TryStream>::Error>>

changes anything o.O (playground)

steffahn commented 3 years ago

Apparently, not even changing it to
pub trait TryStream:
    Stream<Item = Result<<Self as TryStream>::Ok, <Self as TryStream>::Error>>
changes anything o.O

Which means that we can get unsafety without any dyn-overlapping implementation:

trait SuperTrait {
    type A;
    type B;
}

trait Trait: SuperTrait<A = <Self as SuperTrait>::B> {}

fn transmute<A, B>(x: A) -> B {
    foo::<A, B, dyn Trait<A = A, B = B>>(x)
}

fn foo<A, B, T: ?Sized>(x: T::A) -> B
where
    T: Trait<B = B>,
{
    x
}

static X: u8 = 0;
fn main() {
    let x = transmute::<&u8, &[u8; 1000000]>(&X);
    println!("{:?}", x);
}

(in case this isn’t known yet, it might be worth opening another issue..)

alercah commented 2 years ago

From this comment, is there not a soundness issue already with Any?

In particular, given T, U: Any + ?Sized, you cannot assume that calling type_id on values of those types will produce unique type IDs, because one of them might be dyn Any. You cannot, as far as I can tell, actually get the type ID of dyn Any using type_id; you can only get it with TypeId::of.

nikomatsakis commented 1 year ago

@bors labels +T-types +P-high

Sadly, though I'd prefer to pass it to others, the types team has to take ownership of this one. =)

bjorn3 commented 1 year ago

@rustbot label +T-types +P-high

rust-lang / rust