Open KodrAus opened 4 years ago
I'm not sure I totally understand the steps needed to re-stabilize const fn TypeId::of()
.
- [ ] Come up with a new scheme for collision resistant type ids
- We need a new scheme for building type ids that's collision resistant. We'd like to do this before stabilizing type ids in constant contexts
Does this affect the soundness of const fn TypeId::of()
? If we are really worried about collisions, then shouldn't the entirety of TypeId
be unstable, and not just the const
version? Or are we worried that a future collision-resistant implementation of TypeId
may not satisfy const
, i.e. we want to keep open the possibility of a non-const implementation in the future?
- [ ] Figure out any safeguards we need to prevent messing with the bits of type id directly
- How can we prevent abuses of transmuting type ids into integers at compile time (where we have fewer tools to deal with them)? See the discussion in #75923
This seems impossible. If someone wants to transmute()
to the type of the private implementation (i.e. u64
) in an unsafe
code block, even though the documentation for TypeId
explicitly says not to do this, can we really stop them? (They can already do it at runtime, not sure what "tools" prevent this.) Can we really stop anyone from doing unsafe things with unsafe
code?
Motivation: It would be extremely useful to make const
comparisons between TypeId
in order to infer whether two types are equal at compile time. Currently this only appears to be possible using specialization (#31844), which is unsound and will not be stabilized anytime soon.
A perhaps related concern (may need a separate tracking issue) is that #derive[(PartialEq, Eq, ...)] struct TypeId
does not generate const impl
for these traits (see #77695), so even if const fn TypeId::of()
gets re-stabilized we still won't be able to compare types at compile time without writing a const impl PartialEq for TypeId
, etc.
@KodrAus Going through the comment that was the basis of reverting stabilization (https://github.com/rust-lang/rust/pull/75923#issuecomment-696676511):
TypeId
is one a few types (mem::Discriminant
is the only other one I can think of) that's a thin wrapper for a private integer, at least right now. This is ripe for abuse no matter what we say/document it as.
Sure, but this suggests that retrieving a TypeId
should be unsafe
not unstable
. Any
is currently both stable and safe despite heavily relying on const fn TypeId::of()
. If there are serious concerns that const fn TypeId::of()
may result in collisions then Any
should be unstable as well.
I'm not saying we should cater to this, but rather avoid stabilizing even worse abuse-enabling tools like
const fn TypeId::of
. Like I said, I am already aware of usecases where compile-timetransmute(TypeId::of::<T>())
is an uniquely useful tool and someone might not think twice before releasing a crate which uses it somewhere internally and becomes widely-used.
I think @benkay86 summed this up fairly well.
If someone wants to
transmute()
to the type of the private implementation (i.e.u64
) in anunsafe
code block, even though the documentation forTypeId
explicitly says not to do this, can we really stop them? (They can already do it at runtime, not sure what "tools" prevent this.) Can we really stop anyone from doing unsafe things withunsafe
code?
And finally, @eddyb remarks that we could potentially re-stabilize it if we are unlikely to change it.
If we decide we're unlikely to change the implementation of
TypeId
, then we can re-stabilize theconst fn
aspect, and live with the consequences. But if we do want to change the implementation away from a hash, we have to either do that first, or at least teachmiri
about it somehow so that it disallows using the value as a plain integer (for arithmetic or e.g. as an array type length).
It has been nearly 10 months since it was marked unstable and there have been no changes regarding this issue. At this point it seems like the implementation of TypeId
is unlikely to change any time in the near future.
Plus even if we assume all of these points are true, these concerns only apply to intrinsics::type_id::<T>() -> u64
not const fn TypeId::of::<T>() -> TypeId
.
If anything, the heavy reliance on const fn TypeId::of()
inside Any
suggests that it is stable but could potentially be unsafe to use directly. If there are concerns that TypeId
can be abused without unsafe code then const fn TypeId::of()
should be marked as unsafe instead. However I do not think this is necessary since you would need to use transmute()
to extract the raw type id integer from TypeId
.
intrinsics::type_id::<T>()
should remain unstable as it provides access to the raw type id integer, but const fn TypeId::of()
should be stabilized as it provides adequate protection.
At this point it seems like the implementation of
TypeId
is unlikely to change any time in the near future.
Nobody tried AFAIK. 10 months is pretty short on the scale of Rust evolution, especially when you take into account the year 2020.
But also I believe it got stalled because of this potential breakage: https://github.com/rust-lang/rust/pull/75923#issuecomment-699096904
In order to guarantee code that transmute
s to u64
gets broken, we can't just replace TypeId
with a pointer, but because we still want the 64-bit hash (for both "fast reject (!=
)" and Hash
, see https://github.com/rust-lang/rust/pull/75923#issuecomment-684917161), we can probably do this:
struct TypeId {
hash: u64,
// `str` can be replaced with a length-prefixed string to keep `TypeId`'s size down
v0_mangled_type: &'static str,
}
impl PartialEq for TypeId {
fn eq(&self, other: &TypeId) -> bool {
self.hash == other.hash // fast reject (!=)
&& (
self.v0_mangled_type as *const str == other.v0_mangled_type as *const str // fast accept (==)
|| self.v0_mangled_type == other.v0_mangled_type
)
}
}
That way, it's always more than 8 bytes, no matter the target pointer size, and so a transmute
to u64
will always error.
As for the linker concerns, if we take this specific approach, the only difference relying on their features would make is increasingly the situations in which the "fast accept (==
)" path gets taken, i.e. an optimization.
I could try make a PR for this, but it would likely at least require https://github.com/rust-lang/rust/pull/87194 (to avoid breaking when unstable const
generic features are used).
If there are concerns that
TypeId
can be abused without unsafe code thenconst fn TypeId::of()
should be marked as unsafe instead.
But we don't want that. You can't change TypeId
or Any
to be unsafe
, they're stable, and nearly-sound.
struct TypeId { hash: u64, // `str` can be replaced with a length-prefixed string to keep `TypeId`'s size down v0_mangled_type: &'static str, } impl PartialEq for TypeId { fn eq(&self, other: &TypeId) -> bool { self.hash == other.hash // fast reject (!=) && ( self.v0_mangled_type as *const str == other.v0_mangled_type as *const str // fast accept (==) || self.v0_mangled_type == other.v0_mangled_type ) } }
Maybe it would be better to use this to shrink TypeId
to only 16 bytes on 64-bit platforms:
use core::{ptr, slice};
#[repr(C)]
struct StrWithLen<T: ?Sized> {
len: u16, // assume all TypeId strings are <64kB
text: T,
}
pub struct TypeId {
hash: u64,
text: &'static StrWithLen<()>, // actually StrWithLen<str>
}
impl PartialEq for TypeId {
fn eq(&self, rhs: &Self) -> bool {
if self.hash != rhs.hash {
false
} else if ptr::eq(self.text, rhs.text) {
true
} else {
unsafe {
let lhs_text = slice::from_raw_parts(
&self.text.text as *const _ as *const u8,
self.text.len as usize,
);
let rhs_text = slice::from_raw_parts(
&rhs.text.text as *const _ as *const u8,
rhs.text.len as usize,
);
lhs_text == rhs_text
}
}
}
}
Yeah, that's what I meant by "length-prefixed". There's a "standard" way to do that using extern { type }
, I just didn't want to type it in.
Update: finally opened #95845 with a (v0) mangling-based TypeId
representation (should unblock const fn
TypeId::of
, while at the same time making specific uses of TypeId
incompatible with CTFE, for now).
blocking the stabilization of this on #97156
Why would anyone use transmute
to extract integer value from TypeId
?
Use Hash::hash
to avoid unsafe
ty :)
I know that Hash
implementation for TypeId
does not guarantee to emit inner value from TypeId
unmodified, but transmuting opaque type is no better.
And why it is a problem anyway? I thought that transmuting arbitrary u64
back to TypeId
is.
I'd say this is blocked on #99189 for now, if for no other reason than the "structural match" aspect.
(i.e. future-proofing TypeId
by disallowing using TypeId
constants as patterns and requiring ==
)
to work around #97156 and #99189, surely we can just add a new const unsafe fn const_of() -> TypeId
to TypeId
or even a new ConstTypeId
type that uses the v0 mangled type. it seems the only problem with making TypeId::of()
const
is that it must be either non-const
and safe or const
and unsafe
and since making a method unsafe
is a breaking change, we cannot make it const
. it seems obvious to me just to add a new const unsafe fn
method and/or type that can explicitly warn the user of its soundness holes and potentially deprecate TypeId::of()
.
I'd say this is blocked on #99189 for now, if for no other reason than the "structural match" aspect. (i.e. future-proofing
TypeId
by disallowing usingTypeId
constants as patterns and requiring==
)
@eddyb What is the reasoning behind disallowing a structural match? While I can not name any specific reasons for its inclusion, I also can not think of any reasons to remove it either.
Take for example this code. It does not perform any complex pattern matching or require that a match
statement be used, but it arguably makes the code a bit cleaner and easier to read than the alternative.
trait ConstTypeId {
const ID: TypeId;
}
impl<T: 'static> ConstTypeId for T {
const ID: TypeId = TypeId::of::<T>();
}
match foo.type_id {
u16::ID | u32::ID | u64::ID => do_foo(&foo.data),
Foo::ID | Bar::ID | Baz::ID => do_bar(&foo.data),
_ => do_default(&foo.data),
}
This would generally serve the use case of "I should probably be using an enum
, but there are a lot of variants and it is a pain to write it out". I'm partially joking, but there are some use cases where an enum
can be more cumbersome such as when dealing with highly generic systems and associating resources with anonymous traits. That being said, I am unsure if those systems actually make use of use a match
as to ==
.
The problem is any future changes to TypeId
that would require address-based comparison in a custom PartialEq
implementation.
(See all previous discussion across this issue and my two closed PRs on the matter of non-trivial TypeId
representations)
Honestly, our structural match rules are very loose and prone to bite us eventually.
What that is means that in principle no core/alloc/std type with private and unguaranteed internals should auto-derive PartialEq, Eq
and provide a compile-time constructor, because that "locks in" the fact that the impls are auto-derived, i.e. that all fields themselves are Eq
.
It's not a clear independent opt-in, for ergonomics reasons, but that also makes it forwards-incompatable to "let it happen". (This applies to any crates but the standard library doesn't get to just release a new semver-incompat version, and editions offer only limited protection, and tend to have 3yr timeframes)
EDIT: oh, another note on structural match: I believe it was never an intentionally designed feature, just match
falling back to (even overloaded!) ==
, from pre-1.0 days, that we didn't remove in time, so we had to paper over it with whatever semantics let most existing code keep compiling (IIRC we only took a stance on floats and even that took many years).
This would generally serve the use case of "I should probably be using an enum, but there are a lot of variants and it is a pain to write it out". but there are some use cases where an enum can be more cumbersome such as when dealing with highly generic systems and associating resources with anonymous traits.
I actually am just writing such a system, and that's how I found out about this issue. And given the prospect that structural matching of TypeId
could be gone in the future, I ended up using a set of macros to generate a big enum and a set of helper function to make working with it easier.
Not arguing for keeping it, just want to provide a data point.
It seems const fn type_id()
is going to take a long time to stabilize due to many concerns about soundness and corner cases of people doing something with the returned TypeId
other than testing equality against another TypeId
. Would it be possible instead to make Any::is<T>(&self)
const
? This way all the internals of TypeId
stay inside the standard library.
I think a const-enabled Any::is
that hid the specific equality mechanism would satisfy my use-case for const type ids too.
@eddyb did we have any lingering objection to being able to do this kind of type inspection at all in CTFE, or was it just the issue of structural match on TypeId
that was the blocker?
I guess that most of the other use-cases would be covered by allowing TypeId::of::<T>()
in const context, without allowing to do anything with it but assigning to a constant.
Looks like TypeId suddenly stopped implementing StructuralEq. Why? I was previously able to use this function without a problem:
const PIXMAP_TYPE_ID: TypeId = TypeId::of::<Pixmap>();
const MASK_TYPE_ID: TypeId = TypeId::of::<Mask>();
const VEC_U8_TYPE_ID: TypeId = TypeId::of::<Vec<u8>>();
fn type_name_for_id(id: TypeId) -> &'static str {
match id {
PIXMAP_TYPE_ID => "Pixmap",
MASK_TYPE_ID => "Mask",
VEC_U8_TYPE_ID => "Vec<u8>",
_ => "Unknown"
}
}
Removing structural match for TypeId
was done in https://github.com/rust-lang/rust/pull/103291 in part to unblock exactly this feature, see the discussion above for reasoning behind the removal (shortened in https://github.com/rust-lang/rust/issues/77125#issuecomment-1221451165).
Would it be possible to restore backward compatibility by having the compiler convert a match
with TypeId constants into an else if
chain when there were only a handful of branches, and a lookup in a const std::collections::HashMap
otherwise?
Yes, but that will require stabilization of const_trait_impls
first, as otherwise we can't call PartialEq methods inside the const fn.
Most discussion about type ids in const
contexts is happening here (naturally, since there's nothing to do with a type id except compare them), but there's also #101871 tracking using ==
in const contexts.
I think https://github.com/rust-lang/rust/issues/10389 needs to be resolved first, becuase depending on the resolution, it may not be possible to calculate the TypeId at compile time; it might end up like C++ std::type_index
that uses the address of the RTTI structure as the ID.
We can compute and equate type ids that are addresses just fine at compile time. You just can't inspect the address (so no sorting type ids, no hashing them, no printing them)
blocking the stabilization of this on #97156
This was recently marked fixed with https://github.com/rust-lang/rust/pull/118247
Edit: started a discussion https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Const.20.60TypeId.3A.3Aof.60.2C.20.60TypeId.3A.3Amatches.60.2C.20.60type_name.60.2C
Edit2: From the discussion, this is still blocked by the possibility of collision https://github.com/rust-lang/rust/issues/10389
Any further updates on this ?
Still blocked on https://github.com/rust-lang/rust/issues/10389.
This is a tracking issue for using
TypeId::of
in constant contexts. The feature gate for the issue is#![feature(const_type_id)]
.About tracking issues
Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.
Steps
Unresolved Questions
Implementation history