`TypeId` and Polymorphization

davidtwco commented 4 years ago

When polymorphization is enabled (-Zpolymorphization=on), then this can currently be observed by TypeId. For example, the following code when built with a polymorphization-enabled compiler will fail the asserts (and succeeds normally; playground):

fn nop<T>() {}

fn test_the_waters<A: 'static, B: 'static>(a: A, _: B) {
    let any: Box<dyn std::any::Any> = Box::new(a);
    assert!(any.downcast_ref::<A>().is_some());
    assert!(any.downcast_ref::<B>().is_none());
    assert!(any.downcast_ref::<fn()>().is_none());
}

fn main() {
    test_the_waters(nop::<u32>, nop::<u64>);
}

This issue was discovered while investigating an earlier regression. After #74538 landed, the example shown below, where a polymorphized closure is indirectly passed to TypeId, would result in an ICE (discussion from that investigation):

use std::any::TypeId;

pub fn foo<T: 'static>(_: T) -> TypeId {
    TypeId::of::<T>()
}

fn outer<T: 'static>() -> TypeId {
    foo(|| ())
}

fn main() {
    assert!(outer::<u8>() != outer::<u16>());
}

In #74717, the check introduced in #74538 was modified so that this case (and another unrelated case) would no longer ICE for polymorphized functions (discussion regarding that PR). There have been more recent discussion regarding this in a dedicated Zulip topic.

Unresolved questions:

[ ] What is the appropriate behaviour here w/r/t polymorphization?
- Current consensus appears to be that functions involving reflection cannot be polymorphized.
[ ] How do we fix this?
- Without triggering cycle errors, detecting when polymorphization is not appropriate due to later use of TypeId seems very challenging.

lqf96 commented 4 years ago

I actually have a crazy idea on this and am not sure if it would help with the problem... The idea is that we can polymorphize core::mem::size_of::<T>, core::mem::align_of::<T> and any::TypeId::of::<T> by implicitly including the VTable of trait Any for all T types to be instantiated. Thus, the following function

pub fn foo<T: 'static>(_: T) -> (usize, usize, TypeId) {
    (size_of::<T>, align_of::<T>, TypeId::of::<T>())
}

would be turned into

pub fn foo(vtable: &AnyVTable) -> (usize, usize, TypeId) {
    // `vtable.type_id_impl` implements virtual method `Any::type_id`
    (vtable.size, vtable.align, (vtable.type_id_impl)())
}

In case the parameter of type T needs to be used, an implicit VTable reference to type T could be added as an extra parameter of the function. This approach is basically "erasing" the T type parameter, and may be applied to any other traits in the future to support type-erased generics. However, this requires changes to the function signature and depending on the implementation of the Rust compiler this may not be viable.

davidtwco commented 4 years ago

However, this requires changes to the function signature and depending on the implementation of the Rust compiler this may not be viable.

I can't speak to how viable this exact suggestion would be, but I think it would run into the same issues that we see when detecting when not to polymorphize functions like foo.

Consider the following example,

pub fn bar<A>() {
    let _ = foo(|| ());
}

pub fn foo<B: 'static>(_: B) -> (usize, usize, TypeId) {
    (size_of::<B>, align_of::<B>, TypeId::of::<B>())
}

bar and bar::{{closure}}#0 will be polymorphized over A (their substitutions will continue to contain generic parameters). Within monomorphization collection, at the point where we are collecting foo, bar and its closure will already have been polymorphized and so the only context that we would have while collecting foo was that it was instantiated with B being polymorphized bar::{{closure}}#0, and so the intrinsics would be invoked with something polymorphized.

This is hard to fix because we'd need to know how bar::{{closure}}#0 would be used to change how we polymorphize it, and we can't do that without hitting cycle errors in the query system.

With that context, given that the caller of foo would need to change under your proposal, we would run into a similar issue that we do just trying to avoid polymorphizing in this case (if we wanted to take this approach all of the time then that might be different, but I suspect we wouldn't). It's possible that there's a subtle difference that I've missed or that I'm just incorrect.

lqf96 commented 4 years ago

Ok I kinda got it... So currently the difficulty lies more on the dependency and potential cycle issues of different queries, rather than on how polymorphization could be done theoretically. I do have one more questions though - why would it be possible to polymorphize size_of and align_of then? Shouldn't it be that you always need to look into the function body to know what intrinsics are getting called on what generic parameters, and why is TypeId different from the other two? I suspect this might have something to do with the queries used by TypeId-related intrinsics, but since I'm not familiar with the Rust compiler I am not certain about that.

davidtwco commented 4 years ago

Why would it be possible to polymorphize size_of and align_of then?

As I understand it, size_of and align_of compute the layout of a type to get those values. If a generic parameter is in a type then that is a result of polymorphization (with the current analysis). Bugs notwithstanding, that generic parameter is unused, so the layout computation won't have actually looked at the generic parameter.

Most cases look something like the example shown below. Calling size_of::<B> will mark B used in callee, and A will be marked used in caller because it is used in the substitutions to callee.

fn caller<A>(x: A) -> usize {
    callee::<A>()
}

fn callee<B>() -> usize {
    size_of::<B>()
}

It's only when the closures have inherited unused parameters, and those closures are passed to other functions, that polymorphized parameters can end up being provided to intrinsics. However, even in that case, the unused generic parameter in the closure that was passed to the function wouldn't affect the output of size_of or align_of (I think, not 100% sure, but that's my understanding).

type_name and type_id are troublesome because an unused parameter will change the output, and that's observable. type_name and type_id both work fine if you give them a polymorphized function, but you can observe that polymorphization has happened.

lcnr commented 4 years ago

However, even in that case, the unused generic parameter in the closure that was passed to the function wouldn't affect the output of size_of or align_of (I think, not 100% sure, but that's my understanding).

If the size of the closure were to depend on the unused generic parameter, that parameter would not be used, would it :laughing: And we aren't able to pull the polymorphized generic parameter out of the closure (at least we shouldn't be able to do so), so we can't look at the polymorphized param itself.

vlad20012 commented 1 year ago

Couldn't we use a next Rust edition to change the behavior of TypeId w/r/t polymorphization? It seems pretty intuitive that closures created in one instance of a generic function should have the same type as the same closure created in another instance of the same generic function if the closure does not use type parameters that differ between these instances. Of course it would require an RFC, and perhaps it's a good time for such RFC to try to land it into 2024 edition :)

bjorn3 commented 1 year ago

This can't change between editions. No type system changes are possible using editions, only syntactical changes. Even if the type_id intrinsic were to check the edition of the caller, this would be the edition of the TypeId::of function, which is whatever the edition of libcore is, not the edition of your user code. If TypeId::of were to return different values depending on the calling crate's edition somehow, I think you may get unsoundness in Any due to inconsistent type id values.

Kobzol commented 1 year ago

Would it be possible to enable polymorphization only in leaf functions, if we know that there's no reflection and no other function is called?

bjorn3 commented 1 year ago

Doing it only for leaf functions would erase most benefit I think. Almost every function in rust calls at least one other function.

For

fn nop<T>() {}

fn test_the_waters<A: 'static, B: 'static>(a: A, _: B) {
    let any: Box<dyn std::any::Any> = Box::new(a);
    assert!(any.downcast_ref::<A>().is_some());
    assert!(any.downcast_ref::<B>().is_none());
    assert!(any.downcast_ref::<fn()>().is_none());
}

fn main() {
    test_the_waters(nop::<u32>, nop::<u64>);
}

could we pass unpolymorphized nop::<u32> and nop::<u64> to test_the_waters as A and B are used type parameters? And for

use std::any::TypeId;

pub fn foo<T: 'static>(_: T) -> TypeId {
    TypeId::of::<T>()
}

fn outer<T: 'static>() -> TypeId {
    foo(|| ())
}

fn main() {
    assert!(outer::<u8>() != outer::<u16>());
}

consider the T type parameter of outer as used as it is used in the closure type passed to the used T type parameter of foo? In this case the closure itself could still be polymorphized as it is ABI compatible with the non-polymorphized versions. This would block a decent amount of polymorphization in the presence of closures and function pointers, but should be sound I think.

Kobzol commented 1 year ago

I was thinking more about small functions on generic containers, like Vec::push (or rather some simpler functions in Vec, push, might not be that small :) ). But these will probably be inlined into their caller anyway, and then the problem just moves one step further, that's tricky.

lcnr commented 1 year ago

For context: I believe we should fundamentally change polymorphization. I badly implemented that idea in https://github.com/lcnr/rust/pull/6. I stopped because I didn't understand the type system that well back then and because this is a bit :/ with the old trait solver. Also, my priorities have changed since then to focus on #107374 instead.

The tl;dr of the new approach is as follows:

query unused_generic_params instead only considers params used if they are directly used in relevant ways. Relevant ways aren 1) layout computation 2) const eval 3) fn type_of and fn type_name. It ignores nested function calls. Should probably be called query shallowly_used_params or whatever instead.
during monomorphization we collect all nested mono items for the fully monomorphic instance, but remember how much of our concrete generic arguments was used. We then polymorphize our current mono item after this.

This approach fixes the TypeId issue, because T of TypeId::of is considered as required, so any call to TypeId::of forces the monomorphic argument for T to remain fully monomorphic.

I don't think it makes sense to keep pushing on polymorphization for now as there are still bigger wins via improvements to the type systems and less involved mir opts. This approach will also be easier to get right once our type system is better :grin:

vlad20012 commented 1 year ago

I don't think it makes sense to keep pushing on polymorphization for now as there are still bigger wins via improvements to the type systems and less involved mir opts.

I believed that polymorphization is about reducing the size of produced binaries and not about compilation performance

Kobzol commented 1 year ago

I think that linkers are nowadays relatively good at removing identical functions from the final binary (not sure if rustc uses ICF when compiling Rust programs by default though, but you can probably set it yourself). My impression was that polymorphization was mostly done to reduce compilation time and the amount of IR that we stuff into LLVM.

rust-lang / rust

`TypeId` and Polymorphization #75325