Open davidtwco opened 4 years ago
I actually have a crazy idea on this and am not sure if it would help with the problem... The idea is that we can polymorphize core::mem::size_of::<T>
, core::mem::align_of::<T>
and any::TypeId::of::<T>
by implicitly including the VTable of trait Any
for all T
types to be instantiated. Thus, the following function
pub fn foo<T: 'static>(_: T) -> (usize, usize, TypeId) {
(size_of::<T>, align_of::<T>, TypeId::of::<T>())
}
would be turned into
pub fn foo(vtable: &AnyVTable) -> (usize, usize, TypeId) {
// `vtable.type_id_impl` implements virtual method `Any::type_id`
(vtable.size, vtable.align, (vtable.type_id_impl)())
}
In case the parameter of type T
needs to be used, an implicit VTable reference to type T
could be added as an extra parameter of the function. This approach is basically "erasing" the T
type parameter, and may be applied to any other traits in the future to support type-erased generics. However, this requires changes to the function signature and depending on the implementation of the Rust compiler this may not be viable.
However, this requires changes to the function signature and depending on the implementation of the Rust compiler this may not be viable.
I can't speak to how viable this exact suggestion would be, but I think it would run into the same issues that we see when detecting when not to polymorphize functions like foo
.
Consider the following example,
pub fn bar<A>() {
let _ = foo(|| ());
}
pub fn foo<B: 'static>(_: B) -> (usize, usize, TypeId) {
(size_of::<B>, align_of::<B>, TypeId::of::<B>())
}
bar
and bar::{{closure}}#0
will be polymorphized over A
(their substitutions will continue to contain generic parameters). Within monomorphization collection, at the point where we are collecting foo
, bar
and its closure will already have been polymorphized and so the only context that we would have while collecting foo
was that it was instantiated with B
being polymorphized bar::{{closure}}#0
, and so the intrinsics would be invoked with something polymorphized.
This is hard to fix because we'd need to know how bar::{{closure}}#0
would be used to change how we polymorphize it, and we can't do that without hitting cycle errors in the query system.
With that context, given that the caller of foo
would need to change under your proposal, we would run into a similar issue that we do just trying to avoid polymorphizing in this case (if we wanted to take this approach all of the time then that might be different, but I suspect we wouldn't). It's possible that there's a subtle difference that I've missed or that I'm just incorrect.
Ok I kinda got it... So currently the difficulty lies more on the dependency and potential cycle issues of different queries, rather than on how polymorphization could be done theoretically. I do have one more questions though - why would it be possible to polymorphize size_of
and align_of
then? Shouldn't it be that you always need to look into the function body to know what intrinsics are getting called on what generic parameters, and why is TypeId
different from the other two? I suspect this might have something to do with the queries used by TypeId
-related intrinsics, but since I'm not familiar with the Rust compiler I am not certain about that.
Why would it be possible to polymorphize
size_of
andalign_of
then?
As I understand it, size_of
and align_of
compute the layout of a type to get those values. If a generic parameter is in a type then that is a result of polymorphization (with the current analysis). Bugs notwithstanding, that generic parameter is unused, so the layout computation won't have actually looked at the generic parameter.
Most cases look something like the example shown below. Calling size_of::<B>
will mark B
used in callee
, and A
will be marked used in caller
because it is used in the substitutions to callee
.
fn caller<A>(x: A) -> usize {
callee::<A>()
}
fn callee<B>() -> usize {
size_of::<B>()
}
It's only when the closures have inherited unused parameters, and those closures are passed to other functions, that polymorphized parameters can end up being provided to intrinsics. However, even in that case, the unused generic parameter in the closure that was passed to the function wouldn't affect the output of size_of
or align_of
(I think, not 100% sure, but that's my understanding).
type_name
and type_id
are troublesome because an unused parameter will change the output, and that's observable. type_name
and type_id
both work fine if you give them a polymorphized function, but you can observe that polymorphization has happened.
However, even in that case, the unused generic parameter in the closure that was passed to the function wouldn't affect the output of
size_of
oralign_of
(I think, not 100% sure, but that's my understanding).
If the size of the closure were to depend on the unused generic parameter, that parameter would not be used, would it :laughing: And we aren't able to pull the polymorphized generic parameter out of the closure (at least we shouldn't be able to do so), so we can't look at the polymorphized param itself.
Couldn't we use a next Rust edition to change the behavior of TypeId
w/r/t polymorphization? It seems pretty intuitive that closures created in one instance of a generic function should have the same type as the same closure created in another instance of the same generic function if the closure does not use type parameters that differ between these instances. Of course it would require an RFC, and perhaps it's a good time for such RFC to try to land it into 2024 edition :)
This can't change between editions. No type system changes are possible using editions, only syntactical changes. Even if the type_id
intrinsic were to check the edition of the caller, this would be the edition of the TypeId::of
function, which is whatever the edition of libcore is, not the edition of your user code. If TypeId::of
were to return different values depending on the calling crate's edition somehow, I think you may get unsoundness in Any
due to inconsistent type id values.
Would it be possible to enable polymorphization only in leaf functions, if we know that there's no reflection and no other function is called?
Doing it only for leaf functions would erase most benefit I think. Almost every function in rust calls at least one other function.
For
fn nop<T>() {}
fn test_the_waters<A: 'static, B: 'static>(a: A, _: B) {
let any: Box<dyn std::any::Any> = Box::new(a);
assert!(any.downcast_ref::<A>().is_some());
assert!(any.downcast_ref::<B>().is_none());
assert!(any.downcast_ref::<fn()>().is_none());
}
fn main() {
test_the_waters(nop::<u32>, nop::<u64>);
}
could we pass unpolymorphized nop::<u32>
and nop::<u64>
to test_the_waters
as A
and B
are used type parameters? And for
use std::any::TypeId;
pub fn foo<T: 'static>(_: T) -> TypeId {
TypeId::of::<T>()
}
fn outer<T: 'static>() -> TypeId {
foo(|| ())
}
fn main() {
assert!(outer::<u8>() != outer::<u16>());
}
consider the T
type parameter of outer
as used as it is used in the closure type passed to the used T
type parameter of foo
? In this case the closure itself could still be polymorphized as it is ABI compatible with the non-polymorphized versions. This would block a decent amount of polymorphization in the presence of closures and function pointers, but should be sound I think.
I was thinking more about small functions on generic containers, like Vec::push
(or rather some simpler functions in Vec
, push
, might not be that small :) ). But these will probably be inlined into their caller anyway, and then the problem just moves one step further, that's tricky.
For context: I believe we should fundamentally change polymorphization. I badly implemented that idea in https://github.com/lcnr/rust/pull/6. I stopped because I didn't understand the type system that well back then and because this is a bit :/ with the old trait solver. Also, my priorities have changed since then to focus on #107374 instead.
The tl;dr of the new approach is as follows:
query unused_generic_params
instead only considers params used if they are directly used in relevant ways. Relevant ways aren 1) layout computation 2) const eval 3) fn type_of
and fn type_name
. It ignores nested function calls. Should probably be called query shallowly_used_params
or whatever instead.This approach fixes the TypeId
issue, because T
of TypeId::of
is considered as required, so any call to TypeId::of
forces the monomorphic argument for T
to remain fully monomorphic.
I don't think it makes sense to keep pushing on polymorphization for now as there are still bigger wins via improvements to the type systems and less involved mir opts. This approach will also be easier to get right once our type system is better :grin:
I don't think it makes sense to keep pushing on polymorphization for now as there are still bigger wins via improvements to the type systems and less involved mir opts.
I believed that polymorphization is about reducing the size of produced binaries and not about compilation performance
I think that linkers are nowadays relatively good at removing identical functions from the final binary (not sure if rustc
uses ICF when compiling Rust programs by default though, but you can probably set it yourself). My impression was that polymorphization was mostly done to reduce compilation time and the amount of IR that we stuff into LLVM.
When polymorphization is enabled (
-Zpolymorphization=on
), then this can currently be observed byTypeId
. For example, the following code when built with a polymorphization-enabled compiler will fail the asserts (and succeeds normally; playground):This issue was discovered while investigating an earlier regression. After #74538 landed, the example shown below, where a polymorphized closure is indirectly passed to
TypeId
, would result in an ICE (discussion from that investigation):In #74717, the check introduced in #74538 was modified so that this case (and another unrelated case) would no longer ICE for polymorphized functions (discussion regarding that PR). There have been more recent discussion regarding this in a dedicated Zulip topic.
Unresolved questions:
TypeId
seems very challenging.