rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.61k stars 12.48k forks source link

`TypeId` exposes equality-by-subtyping vs normal-form-syntactic-equality unsoundness. #97156

Closed lcnr closed 6 months ago

lcnr commented 2 years ago

EDIT by @BoxyUwU playground

type One = for<'a> fn(&'a (), &'a ());
type Two = for<'a, 'b> fn(&'a (), &'b ());

mod my_api {
    use std::any::Any;
    use std::marker::PhantomData;

    pub struct Foo<T: 'static> {
        a: &'static dyn Any,
        _p: PhantomData<*mut T>, // invariant, the type of the `dyn Any`
    }

    impl<T: 'static> Foo<T> {
        pub fn deref(&self) -> &'static T {
            match self.a.downcast_ref::<T>() {
                None => unsafe { std::hint::unreachable_unchecked() },
                Some(a) => a,
            }
        }

        pub fn new(a: T) -> Foo<T> {
           Foo::<T> {
                a: Box::leak(Box::new(a)),
                _p: PhantomData,
            } 
        }
    }
}

use my_api::*;

fn main() {
    let foo = Foo::<One>::new((|_, _| ()) as One);
    foo.deref();
    let foo: Foo<Two> = foo;
    foo.deref();
}

has UB from hitting the unreachable_unchecked because TypeId::of::<One>() is not the same as TypeId::of::<Two>() despite them being considered the same types by the type checker. Originally this was thought to be a nightly-only issue with feature(generic_const_exprs) but actually the weird behaviour of TypeId can be seen on stable and result in crashes or UB in unsafe code.

original description follows below:


#![feature(const_type_id, generic_const_exprs)]

use std::any::TypeId;
// `One` and `Two` are currently considered equal types, as both
// `One <: Two` and `One :> Two` holds.
type One = for<'a> fn(&'a (), &'a ());
type Two = for<'a, 'b> fn(&'a (), &'b ());
trait AssocCt {
    const ASSOC: usize;
}
const fn to_usize<T: 'static>() -> usize {
    const WHAT_A_TYPE: TypeId = TypeId::of::<One>();
    match TypeId::of::<T>() {
        WHAT_A_TYPE => 0,
        _ => 1000,
    } 
}
impl<T: 'static> AssocCt for T {
    const ASSOC: usize = to_usize::<T>();
}

trait WithAssoc<U> {
    type Assoc;
}
impl<T: 'static> WithAssoc<()> for T where [(); <T as AssocCt>::ASSOC]: {
    type Assoc = [u8; <T as AssocCt>::ASSOC];
}

fn generic<T: 'static, U>(x: <T as WithAssoc<U>>::Assoc) -> <T as WithAssoc<U>>::Assoc
where
    [(); <T as AssocCt>::ASSOC]:,
    T: WithAssoc<U>,
{
    x
}

fn unsound<T>(x: <One as WithAssoc<T>>::Assoc) -> <Two as WithAssoc<T>>::Assoc
where
    One: WithAssoc<T>,
{
    let x: <Two as WithAssoc<T>>::Assoc = generic::<One, T>(x);
    x
}

fn main() {
    println!("{:?}", unsound::<()>([]));
}

TypeId being different for types which are considered equal types allows us to take change the value of a projection by switching between the equal types in its substs and observing that change by looking at their TypeId. This is possible as switching between equal types is allowed even in invariant positions.

This means that stabilizing const TypeId::of and allowing constants to flow into the type system, e.g. some minimal version of feature(generic_const_exprs), will be currently unsound.

I have no idea on how to fix this. I don't expect that we're able to convert higher ranked types to some canonical representation. Ah well, cc @rust-lang/project-const-generics @nikomatsakis

lcnr commented 2 years ago

I don't expect that we're able to convert higher ranked types to some canonical representation

Ok, I have an idea on how to convert higher ranked types to some canonical representation. I just don't think that idea is very good. I also don't know how I would extend this to implication types in the future

CAD97 commented 2 years ago

To clarify, is

One and Two are equal types with different type ids

the issue here, and the rest is just exploiting this to demonstrate the unsoundness? (At least this seems to break the guarantees I understand TypeId to be providing... it just isn't exploitable without a way to substitute the "equal" types.)

If this holds, then the resolution is to ether give One and Two the same type id or to make One and Two nonequal types.

Giving One and Two the same type id would be disastrous, as it would allow e.g. dynamically casting for<'a> fn(&'a T) -> &'a T to for<'a> fn(&'a T) -> &'static T.

cc also @eddyb #95845, as using the v0 mangling for TypeId would erase the lifetimes from higher ranked types AIUI. The current impl of #95845 still uses the current hash, so wouldn't be problematic, but this is something to keep in mind w.r.t. the information contained in TypeId vs the v0 type mangling.

(The issue OP ccs itself; this is likely a typo?)

eddyb commented 2 years ago

There is a lot of confusion here wrt all the pieces involved.

Both legacy and v0 mangling, and both the TypeId hash and a v0-mangling-based implementation (like #95845), use the same encoding for higher-ranked lifetimes (it's why the v0 mangling mentions lifetimes at all!), though legacy symbols/type hashes only as the input to the hash.

You couldn't not mangle them because otherwise TypeId::of::<One> and TypeId::of::<Two> would clash in their symbol names (and that's actually how I found out that v0 mangling needed to reason about for<'a>, when implementing it, heh).

That all uses "binder anonymization" so the fully normalized types are:

The issue here is this broken fragment of type unification code: https://github.com/rust-lang/rust/blob/cd282d7f75da9080fda0f1740a729516e7fbec68/compiler/rustc_infer/src/infer/equate.rs#L154-L159

This treats (T <: U) + (U <: T) as meaning T = U. But that's not sound at all!

Too much of Rust's typesystem depends on T = U matching "identity" (judgmental syntactic equality if you want) with "enough normalization", so a definition equivalent to "subtyping both ways" only works if it's compatible, and in this case it's not.

I don't think there's an "obvious choice" for which of One or Two should be "the normal form" for the other (Two seems more likely but it still feels dubious), so the correct thing to do (at least in the interim) is to stop using subtyping entirely when dealing with strong equality, and only support anonymizing the binders (which is the only thing done for the maximal normalization used by symbol names and TypeId, tho sadly I don't think we always do it, yet).

Even if we normalized one of One or Two to the other, equality should still not use subtyping, and instead normalize to a common type first, to avoid the "unproven assertion" that "both-ways subtyping" implies "equality" (as opposed to the trivial truth in the other direction, which is given by subtyping reflexivity AFAIK).

eddyb commented 2 years ago

Retitled the issue to reflect the situation as I understand it (TypeId only exposes this the same way e.g. symbol names or any other type-equality-sensitive part of rustc might).

RalfJung commented 2 years ago

One and Two are equal types with different type ids.

In which sense are these types "equal"? Two allows references with different lifetimes, One does not, so those look like different types to me.

Going off the issue title, are you saying they each are a subtype of the other? That is different from saying they are "equal" though. Might be good to update the issue description because this confused the heck out of me until I read https://github.com/rust-lang/rust/issues/97156#issuecomment-1130957415.

Cc https://github.com/rust-lang/rust/issues/97156

You are Cc'ing an issue in itself? probably a typo, not sure which issue you meant though.

oli-obk commented 2 years ago

Two allows references with different lifetimes, One does not, so those look like different types to me.

Well... from a user perspective any function that will be allowed as a function pointer of type One will also be allowed as a function pointer of type Two. Also any caller to One can just also call Two without any visible behaviour (if the function pointers actually point to the same function).

So they are structurally different, but behaviour wise they are the same. If we had sufficiently smart normalization we could normalize One to Two, just gets hard to do that normalization properly in the presence of more complicated variance (mutable references and interior mutability)

eddyb commented 2 years ago

@RalfJung FWIW I changed the issue title, it was something else before.

In which sense are these types "equal"?

Specifically unification in the Equate mode - we seem to have special-cased "each of the two types is a subtype of the other" to imply "equality" without justifying/enforcing it in a way the rest of the compiler - which mostly relies on Ty<'tcx>: Eq (and I assume that sort of "syntactic match" is equivalent to judgmental equality), especially after post-monomorphization normalization.


@oli-obk The normalization wouldn't need to be perfectly clever. You would just perform it before checking for equality (without involving subtyping), just like we do with anonymizing the lifetimes bound by the for<...> to avoid their order (or DefIds) affecting any semantics.

That is, we have to remove the unsound equality-via-subtyping logic and try to cover any breakage with a subset of the possible higher-ranked normalization (e.g. if a lifetime only appears in the outermost fn pointer type's input types, and its variance is the default for that position, so not invariant or otherwise flipped, then it can be split into one lifetime per position) - we could maybe even make this be part of anonymize_late_bound_regions, since that has to replace every one of those lifetime positions anyway.


EDIT: I regret using the word "judgmental" (or its misspelling, apparently), because I just remembered that type equality judgments can definitely involve subtyping, that distinction is more about "being built-in" vs reified proofs of type equality (or "paths" in HoTT). ("syntactic" should be correct though, I'll switch to that)

nikomatsakis commented 2 years ago

A few thoughts:

How I understand the bug

Some parts of rustc do indeed assume that T1 <: T2 and T2 <: T1 imply T1 == T2 (I think of this as "semantic equality", in this sense, though I realize it's actually a distinct concept and I should probably find another term). However, for many Rust types, there is no canonical type T3 that both T1 and T2 can be converted to, or at least, we don't know of such a type (especially given the directions I would like to go with our types, I do not think that canonical types will exist in general, see below).

For the compiler itself, lacking a canonical type T3 is not a big issue, although it does make things inconvenient, since checking semantic equality is more complex. (Associated types throw a similar wrench into these things too, hence the desire for lazy normalization.)

The typeid function however "works best" if canonical types exist, though I would say that for soundness they are not strictly required:

Whether or not the "nondeterministic type-id" is a problem depends on how you choose to draw the lines:

Ways to fix the bug

Based on the above analysis, it seems like we can ...

I'm not really aware of other solutions, I'm curious if folks have any ideas there.

RalfJung commented 2 years ago

Having two notions of equality (one more fine-grained than the other) is certainly possible, but needs some care to avoid accidentally using the wrong one in the wrong spot. I am not enough of a subtyping expert to know if there are any common pitfalls with this approach. It would of course be nicer, conceptually, to make subtyping antisymmetric, i.e., "a <= b /\ b <= a -> a == b".

I'm not really aware of other solutions, I'm curious if folks have any ideas there.

Some algebraic thoughts: whatever equality on types ends up being, the property we need is that all types in an equivalence class have the same type ID. The easiest way to achieve this is to pick a unique representative for each equivalence class, have a normalization function that returns said representative, and use its type ID. However, in principle it is conceivable that one could generate a suitable type ID directly, without having such a normalization function. I don't know if that is actually any easier in this case, but it might be worth thinking about.

Jules-Bertholet commented 1 year ago

Random thought:

For the specific case of for<'a> fn(&'a (), &'a ()) and for<'a, 'b> fn(&'a (), &'b ()), could both of these be normalized to fn(&'empty (), &'empty ()) as described here?

lcnr commented 1 year ago

Afaik we're currently moving towards removing ReEmpty so that won't work :laughing:

also, for<'a> fn(&'a ()) needs to have a different TypeId as fn(&'static ()) (as these are different types which are both 'static) and as we erase regions during codegen, changing for<'a> fn(&'a ()) to fn(&'erased ()) would make it indistinguishable from fn(&'static ()).

Jules-Bertholet commented 1 year ago

Afaik we're currently moving towards removing ReEmpty

What's the reasoning behind this change? Any discussion I can read up on?

For context, from what I understand, 'empty is needed for an impl <T> Trait for fn(T) to include impl <T> Trait for fn(&T). And that is needed to allow making coherence_leak_check a hard error without breaking people's use cases. And making coherence_leak_check a hard error might be needed to make pattern types play well with type inference.

lcnr commented 1 year ago

For context, from what I understand, 'empty is needed for an impl <T> Trait for fn(T) to include impl <T> Trait for fn(&T).

Well 'empty is the reason why we even lint coherence_leak_check right now afaik :sweat_smile:

I don't your idea proposed in the linked internals thread is possible:

T implements Trait. In this case, if any of T's supertypes or subtypes also implement Trait, those impls must come from the same impl block as the impl of Trait for T. Additionally, if U is a supertype of T, and U does not implement Trait, then none of U's supertypes may implement Trait. If any of these rules is violated, it's a coherence error.

I don't think this works, because not only does the same impl has to apply for coherence, but this must also recursively hold for all associated types. If we again look at for<'a> fn(&'a ()) and fn(&'static ()): These have to be different types with different TypeIds. But we currently have for<'a> fn(&'a ()) <: fn(&'static ()).

lcnr commented 1 year ago

What's the reasoning behind this change? Any discussion I can read up on?

Not sure, you can try asking about it in the #t-types stream on zulip :sweat_smile: i think that's mostly been mentioned as a sidenote during other discussions.

Jules-Bertholet commented 1 year ago

I don't think this works, because not only does the same impl has to apply for coherence, but this must also recursively hold for all associated types.

I'm not sure I understand what you are saying here?

lcnr commented 1 year ago

I don't think this restriction can be enforced with the best example being const fn TypeId::of (which we should stabilize at some point) for for<'a> fn(&'a ()) and fn(&'static ()).

Does that make sense to you? I don't have a clear image of why you need that invariant, so I might not make much sense here

Jules-Bertholet commented 1 year ago

The "one impl block" rule isn't necessary for soundness. It's meant to make some footguns harder to shoot, not to uphold some absolutely necessary invariants. To elaborate:

Let's say you have a trait Trait, and a type T, such that T: Trait does not hold, but there exists a type U such that T <: U and U: Trait. Then, the following code works just fine:

fn takes_a_u(_: U) {}

fn takes_a_t(t: T) {
    // `t: T` coerced to supertype `U`
    takes_a_u(t);
}

However, the following doesn't work today:

fn takes_impl_trait(_: impl Trait) {}

fn takes_a_t(t: T) {
    // Compile error!
    // `t: T` not coerced to supertype `U`
    takes_impl_trait(t);
}

To make pattern types ergonomic, the second example most likely needs to compile. The issue is that there might be more than one possible U that satisfies the trait bounds, so there needs to be a way to pick a "best" one among them. The proposed coherence rules are meant to ensure that:

  1. This "best" supertype can be chosen deterministically.
  2. It should be difficult to construct a situation where adding a new trait impl to a type breaks code. Like how implementing IntoIterator for arrays needed a per-edition hack because of the interaction with auto-ref; I want to avoid that kind of thing.

The "one impl block" rule is meant to ensure #2. But "difficult" doesn't need to mean "impossible", as type inference breakage is technically "minor breakage." It's OK in terms of soundness for trait impls to have different behavior for subtypes with different TypeId's, but it should be hard to do accidentally.

Does that make sense?

programmerjake commented 1 year ago

can someone explain how this could be a blocking issue for const type_name? naively one could think that the result of type_name is purely for debugging and not intended to be compared against and has no stable value, so it shouldn't matter if it returns different values for the same type in different places...

lcnr commented 1 year ago

can someone explain how this could be a blocking issue for const type_name?

using const type_name you can move different type names for the same type into the trait system. While this mostly just surprising at runtime, it is actually unsound at compile time. The snippet in the issue description causes a segfault without using unsafe code.

BoxyUwU commented 1 year ago

As of #103291 I cannot manage to create an unsoundness repro for this since it's not possible to compare TypeId at compile time anymore. nonetheless this is still an issue and should be fixed.

soqb commented 1 year ago

i think the real issue here is not that equal types have different TypeIds, nor that we can compare TypeIds at compile time. the issue stems from the interaction between generic_const_exprs and const_type_id.

probably the most obvious path to reconciling this issue in this view is disallowing reflexive methods TypeId::of, std::any::type_name (and maybe size_of/align_of) when defining an associated type, so in the issue's example we would get a compiler error:

impl<T: 'static> WithAssoc<()> for T where [(); <T as AssocCt>::ASSOC]: {
    // ERROR: cannot use reflexive method `TypeId::of` in a const context of an associated type.
    type Assoc = [u8; <T as AssocCt>::ASSOC];
}
BoxyUwU commented 1 year ago

I disagree, its extremely weird to have a function that returns different values when all the inputs are the same. if two types T and U are the same type then foo::<T> and foo::<U> should not do different things. TypeId::of violates this which is an odd thing to have in the language.

Jules-Bertholet commented 1 year ago

Could we decide that TypeId::of<T>() requires T: ~const Any, where any T: 'static is Any, but const Any requires T to also have no higher-ranked lifetimes?

BoxyUwU commented 1 year ago

I have no idea if that's realistic to implement. I dislike that solution regardless, it's weird to have TypeId::of have such different requirements when used at compile time when there is still weird behaviour when used at runtime. I don't think compile timeness is special here at all, soundness bugs are nice to demonstrate using the type system because you can't argue any unsafe code is wrong, its an unarguable way of demonstrating unsoundness.

The fact that the "weird behaviour" is unsound in the type system imo is a good indicator that this is just undesirable behaviour. I am confident you could create an unsoundness example involving unsafe code and library invariants that seem completely reasonable but break down from this bug.

I cannot see any issue here other than that equal types have unequal typeids, that's just wrong imo ^^

BoxyUwU commented 1 year ago

example:

type One = for<'a> fn(&'a (), &'a ());
type Two = for<'a, 'b> fn(&'a (), &'b ());

mod my_api {
    use std::any::Any;
    use std::marker::PhantomData;

    pub struct Foo<T: 'static> {
        a: &'static dyn Any,
        _p: PhantomData<*mut T>, // invariant, the type of the `dyn Any`
    }

    impl<T: 'static> Foo<T> {
        pub fn deref(&self) -> &'static T {
            match self.a.downcast_ref::<T>() {
                None => unsafe { std::hint::unreachable_unchecked() },
                Some(a) => a,
            }
        }

        pub fn new(a: T) -> Foo<T> {
           Foo::<T> {
                a: Box::leak(Box::new(a)),
                _p: PhantomData,
            } 
        }
    }
}

use my_api::*;

fn main() {
    let foo = Foo::<One>::new((|_, _| ()) as One);
    foo.deref();
    let foo: Foo<Two> = foo;
    foo.deref();
}

Imo this is completely reasonable unsafe code but is currently fn main has UB. Foo is invariant over T so the type cant change and dyn Any has type T. You should be able to assume that dyn Any does infact have type T. There is no compile time comparing of TypeId anywhere in this code.

type-system only reproductions of unsoundness are nice because you cant look at some unsafe code and argue its wrong because actually the bug in the language isn't a bug. currently there's no way to write a type-system only repro afaik and the previous way used generic_const_exprs but that doesnt mean that this bug is now fixed, or that generic_const_exprs is at fault.

BoxyUwU commented 1 year ago

Tbh given the above code example I'm just gonna remove the F-generic_const_exprs and requires-nightly labels since the TypeId weirdness is visible on stable in ways that can lead to unsoundness or crashes in code that i cant imagine us not expecting to work

danielhenrymantilla commented 8 months ago

By the way this makes ::qcell's TCell{,Owner} API unsound, as pointed out by @kvverti over Discord.

use ::qcell::{TCell, TCellOwner};

type Q1 = for<'any> fn(fn(&'any ()));
type Q2 = fn(fn(&'static ()));
let a = TCellOwner::<Q1>::new();
let b = TCellOwner::<Q2>::new(); // does not panic since `TypeId::of::<Q2>()` is distinct.
let x = TCell::new(42);
::core::mem::swap(
    x.rw(&mut { a }),
    x.rw(&mut { b }), // Q2 -> Q1 even though `TCellOwner<_>` is invariant
);

Adding to what @BoxyUwU mentioned, it is interesting to observe that we currently have TypeId : Eq rather than TypeId : PartialEq, which makes @nikomatsakis'

we have only one direction: type_id::<T1>() == type_id::<T2>() implies T1 == T2, but not vice versa

even more of a footgun for the unaware.