rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.3k stars 12.72k forks source link

Tracking Issue for `const fn` `type_id` #77125

Open KodrAus opened 4 years ago

KodrAus commented 4 years ago

This is a tracking issue for using TypeId::of in constant contexts. The feature gate for the issue is #![feature(const_type_id)].

About tracking issues

Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

Unresolved Questions

Implementation history

benkay86 commented 3 years ago

I'm not sure I totally understand the steps needed to re-stabilize const fn TypeId::of().

  • [ ] Come up with a new scheme for collision resistant type ids
  • We need a new scheme for building type ids that's collision resistant. We'd like to do this before stabilizing type ids in constant contexts

Does this affect the soundness of const fn TypeId::of()? If we are really worried about collisions, then shouldn't the entirety of TypeId be unstable, and not just the const version? Or are we worried that a future collision-resistant implementation of TypeId may not satisfy const, i.e. we want to keep open the possibility of a non-const implementation in the future?

  • [ ] Figure out any safeguards we need to prevent messing with the bits of type id directly
  • How can we prevent abuses of transmuting type ids into integers at compile time (where we have fewer tools to deal with them)? See the discussion in #75923

This seems impossible. If someone wants to transmute() to the type of the private implementation (i.e. u64) in an unsafe code block, even though the documentation for TypeId explicitly says not to do this, can we really stop them? (They can already do it at runtime, not sure what "tools" prevent this.) Can we really stop anyone from doing unsafe things with unsafe code?

Motivation: It would be extremely useful to make const comparisons between TypeId in order to infer whether two types are equal at compile time. Currently this only appears to be possible using specialization (#31844), which is unsound and will not be stabilized anytime soon.

A perhaps related concern (may need a separate tracking issue) is that #derive[(PartialEq, Eq, ...)] struct TypeId does not generate const impl for these traits (see #77695), so even if const fn TypeId::of() gets re-stabilized we still won't be able to compare types at compile time without writing a const impl PartialEq for TypeId, etc.

jmeggitt commented 3 years ago

Can this be re-stabilized now?

@KodrAus Going through the comment that was the basis of reverting stabilization (https://github.com/rust-lang/rust/pull/75923#issuecomment-696676511):

Part 1

TypeId is one a few types (mem::Discriminant is the only other one I can think of) that's a thin wrapper for a private integer, at least right now. This is ripe for abuse no matter what we say/document it as.

Sure, but this suggests that retrieving a TypeId should be unsafe not unstable. Any is currently both stable and safe despite heavily relying on const fn TypeId::of(). If there are serious concerns that const fn TypeId::of() may result in collisions then Any should be unstable as well.

Part 2

I'm not saying we should cater to this, but rather avoid stabilizing even worse abuse-enabling tools like const fn TypeId::of. Like I said, I am already aware of usecases where compile-time transmute(TypeId::of::<T>()) is an uniquely useful tool and someone might not think twice before releasing a crate which uses it somewhere internally and becomes widely-used.

I think @benkay86 summed this up fairly well.

If someone wants to transmute() to the type of the private implementation (i.e. u64) in an unsafe code block, even though the documentation for TypeId explicitly says not to do this, can we really stop them? (They can already do it at runtime, not sure what "tools" prevent this.) Can we really stop anyone from doing unsafe things with unsafe code?

Part 3

And finally, @eddyb remarks that we could potentially re-stabilize it if we are unlikely to change it.

If we decide we're unlikely to change the implementation of TypeId, then we can re-stabilize the const fn aspect, and live with the consequences. But if we do want to change the implementation away from a hash, we have to either do that first, or at least teach miri about it somehow so that it disallows using the value as a plain integer (for arithmetic or e.g. as an array type length).

It has been nearly 10 months since it was marked unstable and there have been no changes regarding this issue. At this point it seems like the implementation of TypeId is unlikely to change any time in the near future.

TL;DR

Plus even if we assume all of these points are true, these concerns only apply to intrinsics::type_id::<T>() -> u64 not const fn TypeId::of::<T>() -> TypeId.

If anything, the heavy reliance on const fn TypeId::of() inside Any suggests that it is stable but could potentially be unsafe to use directly. If there are concerns that TypeId can be abused without unsafe code then const fn TypeId::of() should be marked as unsafe instead. However I do not think this is necessary since you would need to use transmute() to extract the raw type id integer from TypeId.

intrinsics::type_id::<T>() should remain unstable as it provides access to the raw type id integer, but const fn TypeId::of() should be stabilized as it provides adequate protection.

eddyb commented 3 years ago

At this point it seems like the implementation of TypeId is unlikely to change any time in the near future.

Nobody tried AFAIK. 10 months is pretty short on the scale of Rust evolution, especially when you take into account the year 2020.

But also I believe it got stalled because of this potential breakage: https://github.com/rust-lang/rust/pull/75923#issuecomment-699096904

In order to guarantee code that transmutes to u64 gets broken, we can't just replace TypeId with a pointer, but because we still want the 64-bit hash (for both "fast reject (!=)" and Hash, see https://github.com/rust-lang/rust/pull/75923#issuecomment-684917161), we can probably do this:

struct TypeId {
    hash: u64,
    // `str` can be replaced with a length-prefixed string to keep `TypeId`'s size down
    v0_mangled_type: &'static str,
}
impl PartialEq for TypeId {
    fn eq(&self, other: &TypeId) -> bool {
        self.hash == other.hash // fast reject (!=)
        && (
            self.v0_mangled_type as *const str == other.v0_mangled_type as *const str // fast accept (==)
            || self.v0_mangled_type == other.v0_mangled_type
        )
    }
}

That way, it's always more than 8 bytes, no matter the target pointer size, and so a transmute to u64 will always error.

As for the linker concerns, if we take this specific approach, the only difference relying on their features would make is increasingly the situations in which the "fast accept (==)" path gets taken, i.e. an optimization.

I could try make a PR for this, but it would likely at least require https://github.com/rust-lang/rust/pull/87194 (to avoid breaking when unstable const generic features are used).

If there are concerns that TypeId can be abused without unsafe code then const fn TypeId::of() should be marked as unsafe instead.

But we don't want that. You can't change TypeId or Any to be unsafe, they're stable, and nearly-sound.

programmerjake commented 3 years ago
struct TypeId {
    hash: u64,
    // `str` can be replaced with a length-prefixed string to keep `TypeId`'s size down
    v0_mangled_type: &'static str,
}
impl PartialEq for TypeId {
    fn eq(&self, other: &TypeId) -> bool {
        self.hash == other.hash // fast reject (!=)
        && (
            self.v0_mangled_type as *const str == other.v0_mangled_type as *const str // fast accept (==)
            || self.v0_mangled_type == other.v0_mangled_type
        )
    }
}

Maybe it would be better to use this to shrink TypeId to only 16 bytes on 64-bit platforms:

use core::{ptr, slice};

#[repr(C)]
struct StrWithLen<T: ?Sized> {
    len: u16, // assume all TypeId strings are <64kB
    text: T,
}

pub struct TypeId {
    hash: u64,
    text: &'static StrWithLen<()>, // actually StrWithLen<str>
}

impl PartialEq for TypeId {
    fn eq(&self, rhs: &Self) -> bool {
        if self.hash != rhs.hash {
            false
        } else if ptr::eq(self.text, rhs.text) {
            true
        } else {
            unsafe {
                let lhs_text = slice::from_raw_parts(
                    &self.text.text as *const _ as *const u8,
                    self.text.len as usize,
                );
                let rhs_text = slice::from_raw_parts(
                    &rhs.text.text as *const _ as *const u8,
                    rhs.text.len as usize,
                );
                lhs_text == rhs_text
            }
        }
    }
}
eddyb commented 3 years ago

Yeah, that's what I meant by "length-prefixed". There's a "standard" way to do that using extern { type }, I just didn't want to type it in.

eddyb commented 2 years ago

Update: finally opened #95845 with a (v0) mangling-based TypeId representation (should unblock const fn TypeId::of, while at the same time making specific uses of TypeId incompatible with CTFE, for now).

lcnr commented 2 years ago

blocking the stabilization of this on #97156

zakarumych commented 2 years ago

Why would anyone use transmute to extract integer value from TypeId? Use Hash::hash to avoid unsafety :)

I know that Hash implementation for TypeId does not guarantee to emit inner value from TypeId unmodified, but transmuting opaque type is no better.

And why it is a problem anyway? I thought that transmuting arbitrary u64 back to TypeId is.

eddyb commented 2 years ago

I'd say this is blocked on #99189 for now, if for no other reason than the "structural match" aspect. (i.e. future-proofing TypeId by disallowing using TypeId constants as patterns and requiring ==)

soqb commented 2 years ago

to work around #97156 and #99189, surely we can just add a new const unsafe fn const_of() -> TypeId to TypeId or even a new ConstTypeId type that uses the v0 mangled type. it seems the only problem with making TypeId::of() const is that it must be either non-const and safe or const and unsafe and since making a method unsafe is a breaking change, we cannot make it const. it seems obvious to me just to add a new const unsafe fn method and/or type that can explicitly warn the user of its soundness holes and potentially deprecate TypeId::of().

jmeggitt commented 2 years ago

I'd say this is blocked on #99189 for now, if for no other reason than the "structural match" aspect. (i.e. future-proofing TypeId by disallowing using TypeId constants as patterns and requiring ==)

@eddyb What is the reasoning behind disallowing a structural match? While I can not name any specific reasons for its inclusion, I also can not think of any reasons to remove it either.

Take for example this code. It does not perform any complex pattern matching or require that a match statement be used, but it arguably makes the code a bit cleaner and easier to read than the alternative.

trait ConstTypeId {
    const ID: TypeId;
}

impl<T: 'static> ConstTypeId for T {
    const ID: TypeId = TypeId::of::<T>();
}

match foo.type_id {
    u16::ID | u32::ID | u64::ID => do_foo(&foo.data),
    Foo::ID | Bar::ID | Baz::ID => do_bar(&foo.data),
    _ => do_default(&foo.data),
}

This would generally serve the use case of "I should probably be using an enum, but there are a lot of variants and it is a pain to write it out". I'm partially joking, but there are some use cases where an enum can be more cumbersome such as when dealing with highly generic systems and associating resources with anonymous traits. That being said, I am unsure if those systems actually make use of use a match as to ==.

eddyb commented 2 years ago

The problem is any future changes to TypeId that would require address-based comparison in a custom PartialEq implementation.

(See all previous discussion across this issue and my two closed PRs on the matter of non-trivial TypeId representations)

Honestly, our structural match rules are very loose and prone to bite us eventually. What that is means that in principle no core/alloc/std type with private and unguaranteed internals should auto-derive PartialEq, Eq and provide a compile-time constructor, because that "locks in" the fact that the impls are auto-derived, i.e. that all fields themselves are Eq.

It's not a clear independent opt-in, for ergonomics reasons, but that also makes it forwards-incompatable to "let it happen". (This applies to any crates but the standard library doesn't get to just release a new semver-incompat version, and editions offer only limited protection, and tend to have 3yr timeframes)

EDIT: oh, another note on structural match: I believe it was never an intentionally designed feature, just match falling back to (even overloaded!) ==, from pre-1.0 days, that we didn't remove in time, so we had to paper over it with whatever semantics let most existing code keep compiling (IIRC we only took a stance on floats and even that took many years).

yshui commented 1 year ago

This would generally serve the use case of "I should probably be using an enum, but there are a lot of variants and it is a pain to write it out". but there are some use cases where an enum can be more cumbersome such as when dealing with highly generic systems and associating resources with anonymous traits.

I actually am just writing such a system, and that's how I found out about this issue. And given the prospect that structural matching of TypeId could be gone in the future, I ended up using a set of macros to generate a big enum and a set of helper function to make working with it easier.

Not arguing for keeping it, just want to provide a data point.

benkay86 commented 1 year ago

It seems const fn type_id() is going to take a long time to stabilize due to many concerns about soundness and corner cases of people doing something with the returned TypeId other than testing equality against another TypeId. Would it be possible instead to make Any::is<T>(&self) const? This way all the internals of TypeId stay inside the standard library.

KodrAus commented 1 year ago

I think a const-enabled Any::is that hid the specific equality mechanism would satisfy my use-case for const type ids too.

@eddyb did we have any lingering objection to being able to do this kind of type inspection at all in CTFE, or was it just the issue of structural match on TypeId that was the blocker?

zakarumych commented 1 year ago

I guess that most of the other use-cases would be covered by allowing TypeId::of::<T>() in const context, without allowing to do anything with it but assigning to a constant.

Pr0methean commented 1 year ago

Looks like TypeId suddenly stopped implementing StructuralEq. Why? I was previously able to use this function without a problem:

const PIXMAP_TYPE_ID: TypeId = TypeId::of::<Pixmap>();
const MASK_TYPE_ID: TypeId = TypeId::of::<Mask>();
const VEC_U8_TYPE_ID: TypeId = TypeId::of::<Vec<u8>>();

fn type_name_for_id(id: TypeId) -> &'static str {
    match id {
        PIXMAP_TYPE_ID => "Pixmap",
        MASK_TYPE_ID => "Mask",
        VEC_U8_TYPE_ID => "Vec<u8>",
        _ => "Unknown"
    }
}
madsmtm commented 1 year ago

Removing structural match for TypeId was done in https://github.com/rust-lang/rust/pull/103291 in part to unblock exactly this feature, see the discussion above for reasoning behind the removal (shortened in https://github.com/rust-lang/rust/issues/77125#issuecomment-1221451165).

Pr0methean commented 1 year ago

Would it be possible to restore backward compatibility by having the compiler convert a match with TypeId constants into an else if chain when there were only a handful of branches, and a lookup in a const std::collections::HashMap otherwise?

oli-obk commented 1 year ago

Yes, but that will require stabilization of const_trait_impls first, as otherwise we can't call PartialEq methods inside the const fn.

KodrAus commented 1 year ago

Most discussion about type ids in const contexts is happening here (naturally, since there's nothing to do with a type id except compare them), but there's also #101871 tracking using == in const contexts.

briansmith commented 10 months ago

I think https://github.com/rust-lang/rust/issues/10389 needs to be resolved first, becuase depending on the resolution, it may not be possible to calculate the TypeId at compile time; it might end up like C++ std::type_index that uses the address of the RTTI structure as the ID.

oli-obk commented 10 months ago

We can compute and equate type ids that are addresses just fine at compile time. You just can't inspect the address (so no sorting type ids, no hashing them, no printing them)

tgross35 commented 8 months ago

blocking the stabilization of this on #97156

This was recently marked fixed with https://github.com/rust-lang/rust/pull/118247

Edit: started a discussion https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Const.20.60TypeId.3A.3Aof.60.2C.20.60TypeId.3A.3Amatches.60.2C.20.60type_name.60.2C

Edit2: From the discussion, this is still blocked by the possibility of collision https://github.com/rust-lang/rust/issues/10389

avhz commented 3 months ago

Any further updates on this ?

bjorn3 commented 3 months ago

Still blocked on https://github.com/rust-lang/rust/issues/10389.