Tracking Issue for pointer metadata APIs

KodrAus commented 3 years ago

This is a tracking issue for the RFC 2580 "Pointer metadata & VTable" (rust-lang/rfcs#2580). The feature gate for the issue is #![feature(ptr_metadata)].

About tracking issues

Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

[x] Implement the RFC (cc @rust-lang/libs @rust-lang/lang -- can anyone write up mentoring instructions?)
[ ] Adjust documentation (see instructions on rustc-dev-guide)
[ ] Stabilization PR (see instructions on rustc-dev-guide)

Unresolved Questions

Language-level:

Is it, or should it be UB (through validity or safety invariants) to have a raw trait object wide pointer with a dangling vtable pointer? A null vtable pointer? If not, DynMetadata methods like size may need to be unsafe fn. Or maybe something like *const () should be metadata of trait objects instead of DynMetadata.
Right now, there is some inconsistency here: size_of_val_raw(ptr) is unsafe, but metadata(ptr).size_of() does the same thing and is safe.
Should Metadata be required to be Freeze?

API level:

Is *const () appropriate for the data component of pointers? Or should it be *const u8? Or *const Opaque with some new Opaque type? (Respectively *mut () and NonNull<()>)
Should ptr::from_raw_parts and friends be unsafe fn?
Should Thin be added as a supertrait of Sized? Or could it ever make sense to have fat pointers to statically-sized types?
Should DynMetadata not have a type parameter? This might reduce monomorphization cost, but would force that the size, alignment, and destruction pointers be in the same location (offset) for every vtable. But keeping them in the same location is probably desirable anyway to keep code size small.

API bikesheds:

Name of new items: Pointee (vs. Referent?), Thin (ThinPointee?), DynMetadata (VTablePtr?), etc.
Location of new items in core::ptr. For example: should Thin be in core::marker instead?

Implementation history

[ ] #81172 Initial implementation

Tracked APIs

Last updated for https://github.com/rust-lang/rust/pull/81172.

pub trait Pointee {
    /// One of `()`, `usize`, or `DynMetadata<dyn SomeTrait>`
    type Metadata;
}

pub trait Thin = Pointee<Metadata = ()>;

pub const fn metadata<T: ?Sized>(ptr: *const T) -> <T as Pointee>::Metadata {}

pub const fn from_raw_parts<T: ?Sized>(*const (), <T as Pointee>::Metadata) -> *const T {}
pub const fn from_raw_parts_mut<T: ?Sized>(*mut (), <T as Pointee>::Metadata) -> *mut T {}

impl<T: ?Sized> NonNull<T> {
    pub const fn from_raw_parts(NonNull<()>, <T as Pointee>::Metadata) -> NonNull<T> {}

    /// Convenience for `(ptr.cast(), metadata(ptr))`
    pub const fn to_raw_parts(self) -> (NonNull<()>, <T as Pointee>::Metadata) {}
}

impl<T: ?Sized> *const T {
    pub const fn to_raw_parts(self) -> (*const (), <T as Pointee>::Metadata) {}
}

impl<T: ?Sized> *mut T {
    pub const fn to_raw_parts(self) -> (*mut (), <T as Pointee>::Metadata) {}
}

/// `<dyn SomeTrait as Pointee>::Metadata == DynMetadata<dyn SomeTrait>`
pub struct DynMetadata<Dyn: ?Sized> {
    // Private pointer to vtable
}

impl<Dyn: ?Sized> DynMetadata<Dyn> {
    pub fn size_of(self) -> usize {}
    pub fn align_of(self) -> usize {}
    pub fn layout(self) -> crate::alloc::Layout {}
}

unsafe impl<Dyn: ?Sized> Send for DynMetadata<Dyn> {}
unsafe impl<Dyn: ?Sized> Sync for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Debug for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Unpin for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Copy for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Clone for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Eq for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> PartialEq for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Ord for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> PartialOrd for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Hash for DynMetadata<Dyn> {}
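
As a rough illustration of the three metadata kinds listed above (`()`, `usize`, and a vtable pointer), the sketch below measures raw pointer widths on stable Rust. It does not use the gated `ptr_metadata` APIs. The helper name `pointer_words` is made up for this example, and the two-word width of wide pointers is what current rustc does rather than a documented guarantee.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Number of `usize`-sized words occupied by a raw pointer to `T`.
/// (Hypothetical helper, not part of the tracked API.)
fn pointer_words<T: ?Sized>() -> usize {
    size_of::<*const T>() / size_of::<usize>()
}

fn main() {
    // Sized pointee: no metadata, so the pointer is a single word.
    println!("*const u32       : {} word(s)", pointer_words::<u32>());
    // Slice and str pointees: the pointer also carries a length.
    println!("*const [u32]     : {} word(s)", pointer_words::<[u32]>());
    println!("*const str       : {} word(s)", pointer_words::<str>());
    // Trait object pointee: the pointer also carries a vtable pointer.
    println!("*const dyn Debug : {} word(s)", pointer_words::<dyn Debug>());
}
```

The extra word in the wide cases is the part that `ptr::metadata` would return on nightly.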

SimonSapin commented 2 years ago

There needs to be some pointer or reference that is "unsized" to a trait object at some point in order for the appropriate vtable to be generated. I think that can be a raw pointer with null data component: std::ptr::metadata::<dyn Trait>(std::ptr::null::<T>())
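
A minimal stable-Rust sketch of the unsizing step described here, assuming `u32` and `Debug` as stand-ins for `T` and `Trait`. It only performs the coercion of a null thin pointer into a trait object pointer. Extracting the metadata afterwards would still need the nightly `ptr::metadata` function. The function name is illustrative.

```rust
use std::fmt::Debug;
use std::mem::size_of_val;
use std::ptr;

/// Unsizes a null `*const u32` into a `*const dyn Debug`.
/// Nothing is dereferenced, so this is entirely safe code.
fn null_dyn_debug() -> *const dyn Debug {
    let thin: *const u32 = ptr::null();
    // Unsizing coercion: the compiler attaches the vtable for `u32 as Debug`.
    thin
}

fn main() {
    let wide = null_dyn_debug();
    // The data component is still null after the coercion.
    println!("is_null = {}", wide.is_null());
    // The pointer now occupies more than one word because it carries metadata.
    println!("size    = {} bytes", size_of_val(&wide));
}
```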

kupiakos commented 2 years ago

What if the Metadata for trait objects were MaybeUninit<DynMetadata<dyn Trait>>? That way in order to use it you would have to first go through an unsafe assume_init while still allowing safe access to raw slice length data.

fogti commented 2 years ago

How should the code determine if it is ok to .assume_init()? If there is an automated way to do so, I think Option<DynMetadata<dyn Trait>> might be a better idea (if this is still insufficient, maybe something more opaque should be returned which allows inspection, although DynMetadata should already have that "role")

kupiakos commented 2 years ago

How should the code determine if it is ok to .assume_init()?

You shouldn't be able to for raw pointers - simple as that. The information necessary to do so isn't available at runtime since the underlying type of the trait object is not known. The state of thin raw pointers in Rust is that it is always safe to construct one, but in order to use/dereference it, you need to have unsafe somewhere to assert that the invariants of the type are being upheld (the simplest being "this has to be a valid memory location", but this includes other rules like "is the bit pattern at this location valid for type T"). When you call the safe methods on DynMetadata, you're implicitly asserting that the DynMetadata is valid by dereferencing the internal pointer, when it should be explicit because it may invoke UB.

By calling assume_init, you declare that you're upholding the invariants of DynMetadata and as such its safe methods can be called. Otherwise, you can cause UB based on whether the raw pointer is legit as shown in https://github.com/rust-lang/rust/issues/81513#issuecomment-798158332. With the change, it'd end up looking like:

let ptr: *const dyn Send = make_weird_raw_ptr();
let meta: MaybeUninit<DynMetadata<dyn Send>> = metadata(ptr);
// I'm asserting that the `DynMetadata` is valid, but I violated the invariant that it's derived from valid metadata!
// This causes UB, but that's expected since I didn't uphold the invariants of unsafe code.
let size = unsafe { meta.assume_init() }.size_of();

That invariant is currently phrased in ptr::from_raw_parts as:

For trait objects, the metadata must come from a pointer to the same underlying erased type.

This is what the documentation states, but there is no unsafe for the programmer to assert this invariant anywhere, which breaks a common assumption Rust programmers make about raw pointers: safe operations won't dereference them without some declaration or check that invariants are upheld. I personally like this language a bit better:

For trait object pointers to dyn Trait with an underlying type T, the metadata must have been derived from a valid dyn Trait reference of the same underlying type, such as what is returned by ptr::metadata(&T as &dyn Trait).

I believe that should be sufficient, since you can't create a raw pointer to a dyn Trait without first going through an intermediate reference, which ensures the vtable exists. This is a more strict version, requiring that vtables can't ever be duplicated, which we may not want to guarantee:

Given any two wide pointers to trait objects dyn Trait of the same underlying type, the DynMetadata<dyn Trait> metadata must be identical.

The only way to assert that the metadata is valid automatically is if it's a reference, since that's an invariant of references. This is why @RalfJung suggested having a safe version of from_raw_parts for references and I presume an unsafe version for raw pointers. That may end up being the most tractable option.

Anyways, this is a long way to say that raw pointers, wide and thin, shouldn't have any invariants on them - that's why you use raw pointers after all! It's when using the pointers where the invariants should kick in, and what the assume_init is meant to represent.
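
To make the thin-pointer baseline concrete, here is a small stable sketch: building a raw pointer needs no `unsafe`, while reading through it does. The function names are made up for illustration.

```rust
use std::ptr::NonNull;

/// Produces a well-aligned but dangling raw pointer. This is safe:
/// constructing a raw pointer never asserts anything about its target.
fn dangling() -> *const u32 {
    NonNull::<u32>::dangling().as_ptr()
}

/// Reads a `u32` through a raw pointer.
///
/// # Safety
/// `ptr` must be non-null, aligned, and point to an initialized `u32`.
unsafe fn read_u32(ptr: *const u32) -> u32 {
    // The invariants are asserted here, at the point of use.
    unsafe { ptr.read() }
}

fn main() {
    let bogus = dangling();
    // Holding and inspecting the pointer is fine.
    println!("dangling is_null = {}", bogus.is_null());

    let value = 7u32;
    // SAFETY: `&value` is a valid, aligned, initialized `u32`.
    let read = unsafe { read_u32(&value) };
    println!("read = {read}");
}
```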

RalfJung commented 2 years ago

This is what the documentation states, but there is no unsafe for the programmer to assert this invariant anywhere

Well, there is -- when dereferencing the raw pointer.

Anyways, this is a long way to say that raw pointers, wide and thin, shouldn't have any invariants on them - that's why you use raw pointers after all! It's when using the pointers where the invariants should kick in, and what the assume_init is meant to represent.

There IMO should be one invariant even on thin raw pointers: they must not be uninit. IOW, I think that MaybeUninit::<*const u8>::uninit().assume_init() should be UB (and similar for uninit integers). Also see https://github.com/rust-lang/unsafe-code-guidelines/issues/71.

So I don't think using MaybeUninit for the metadata is a good idea, since I don't think we want to allow literally uninit memory in metadata.
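
For contrast, this is the usage pattern `MaybeUninit` is designed for: memory that really is uninitialized until it is written. The sketch is only meant to show why reusing the type for "initialized but possibly invalid" metadata would send the wrong signal. The function name is illustrative.

```rust
use std::mem::MaybeUninit;

/// Initializes a slot and only then asserts that it is initialized.
fn init_then_read() -> u32 {
    let mut slot = MaybeUninit::<u32>::uninit();
    // Until this write, the slot holds uninitialized memory.
    slot.write(42);
    // SAFETY: `slot` was fully initialized by the `write` above.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("{}", init_then_read());
}
```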

kupiakos commented 2 years ago

This is what the documentation states, but there is no unsafe for the programmer to assert this invariant anywhere

Well, there is -- when dereferencing the raw pointer.

Yes, but dereferencing a raw pointer requires an unsafe block. metadata(dyn_ptr).layout() as-is does not, and that's the problem.

So I don't think using MaybeUninit for the metadata is a good idea, since I don't think we want to allow literally uninit memory in metadata.

I agree. I initially chose MaybeUninit<T> because it is a common way to represent a T that may have an invalid bitfield that can be asserted valid through an unsafe gate, like Pin has with its Drop guarantee. Of course, it's actually the canonical way to represent uninitialized data and the typed way to represent that safely. Especially because it'd be in the standard library, it would imply that raw wide pointers may contain uninitialized data, and they should not be able to, as integer-represented types.

Should there just be a wrapper type then? <dyn Trait as Pointee>::Metadata then becomes:

#[derive(Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
pub struct RawDynMetadata<T: ?Sized>(DynMetadata<T>);
impl<T: ?Sized> RawDynMetadata<T> {
  // or some better name
  /// # Safety
  /// - `T` must be a trait object.
  /// - This `RawDynMetadata` must have been derived from the metadata of a valid reference to `T`.
  pub unsafe fn assume_valid(self) -> DynMetadata<T> {
    self.0
  }
}

A nice advantage is that it is still safe for slice metadata on raw pointers, especially since we don't have #71146 yet. One can also already safely store and use the accessible metadata of a trait object reference with Layout::for_value.
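
The last point can be sketched on stable Rust as follows. `Debug` is an arbitrary trait chosen for the example and `layout_of` is a made-up helper name.

```rust
use std::alloc::Layout;
use std::fmt::Debug;

/// Recovers the size and alignment of the erased type behind a trait object
/// reference. This is safe because a reference guarantees a valid vtable.
fn layout_of(value: &dyn Debug) -> Layout {
    Layout::for_value(value)
}

fn main() {
    // The concrete types are erased at the call boundary.
    println!("{:?}", layout_of(&0u64));
    println!("{:?}", layout_of(&[0u8; 3]));
}
```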

RalfJung commented 2 years ago

Yes, but dereferencing a raw pointer requires an unsafe block. metadata(dyn_ptr).layout() as-is does not, and that's the problem.

Could you specify what exactly the problem is? The current API is sound, i.e., you will (to my knowledge) not get UB here without using unsafe code. So it must be some other property you are looking for that is violated.

However, there is indeed an inconsistency here in that size_of_val_raw is unsafe, but using metadata().size_of() one can implement the same thing entirely in safe code.

SimonSapin commented 2 years ago

Has the lang team formally decided what validity invariants *const dyn Trait has? (Or more generally, pointer metadata) I feel that should be the starting point.

WaffleLapkin commented 2 years ago

Currently, Pointee::Metadata is bound by the following traits: Copy, Send, Sync, Ord, Hash, Unpin. While all of them are required, they are not actually sufficient for being Metadata. The compiler must also be able to know the size (and align) of the pointed value to generate code for std::mem::size_of_val (align_of_val).

Shouldn't Metadata be bound by a trait that would provide such functions? If we ever want to support DSTs with custom metadata, this seems required

pub trait Pointee {
    #[lang = "metadata_type"]
    type Metadata: Copy + Send + Sync + Ord + Hash + Unpin + const PointerMetadata<Self>;

    // P.S. I'm not sure if `const Trait` bounds are currently possible, 
    //      we may need to wait until they are implemented, before stabilizing this feature.
}

pub unsafe trait PointerMetadata<Target> {
    fn size_of_val(val: &Target) -> usize;

    fn align_of_val(val: &Target) -> usize;
}

Example implementations

```rs
// Naming is bikesheddable

#[non_exhaustive]
pub struct SizedMetadata;

// Compiler can probably have a fast-path for sized types/slices/strs/trait objects to lower costs
unsafe impl<T> const PointerMetadata<T> for SizedMetadata {
    fn size_of_val(_: &T) -> usize { mem::size_of::<T>() }
    fn align_of_val(_: &T) -> usize { mem::align_of::<T>() }
}

pub struct SliceLen(usize);

unsafe impl<T> const PointerMetadata<[T]> for SliceLen {
    fn size_of_val(val: &[T]) -> usize {
        let Self(len) = metadata(val);
        len * mem::size_of::<T>()
    }
    fn align_of_val(_: &[T]) -> usize { mem::align_of::<T>() }
}

pub struct StrLen(usize);

unsafe impl const PointerMetadata<str> for StrLen {
    fn size_of_val(val: &str) -> usize {
        let this = metadata(val);
        this.0
    }
    fn align_of_val(_: &str) -> usize { 1 }
}

unsafe impl<T: ?Sized> const PointerMetadata<T> for DynMetadata<T> {
    fn size_of_val(val: &T) -> usize {
        let this = metadata(val);
        this.size_of()
    }
    fn align_of_val(val: &T) -> usize {
        let this = metadata(val);
        this.align_of()
    }
}

// Theoretical

#[non_exhaustive]
pub struct ThinByteSliceMetadata;

// ThinByteSlice is a custom DST type (usize, [u8])
unsafe impl const PointerMetadata<ThinByteSlice> for ThinByteSliceMetadata {
    fn size_of_val(val: &ThinByteSlice) -> usize {
        // Safety: `ThinByteSlice` is guaranteed to have `len: usize` as its first field
        unsafe { *val.to_raw_parts().0.cast::<usize>() }
    }
    fn align_of_val(val: &ThinByteSlice) -> usize { mem::align_of::<usize>() }
}
```

Note that size_of_val_raw and align_of_val_raw currently prohibit calling themselves with a pointee type that doesn't have a slice, trait object or extern type as its last field, so it's ok to require a reference in PointerMetadata methods.

While such a trait could be implemented for usize/()/etc. instead of new types, I would strongly argue against this, as it makes extensibility harder and adds weird methods like usize::align_of_val.

An extension that could make `align_of_val_raw` safe(r?)

```rs
// Implemented for SliceLen<_>, StrLen<_>, DynMetadata<_>, SizedMetadata<_>
pub unsafe trait PointerMetadataThatDoesNotNeedReference<Target: ?Sized>: PointerMetadata<Target> {
    fn size_of_val_raw(val: *const Target) -> usize;
    fn align_of_val_raw(val: *const Target) -> usize;
}

// std::mem
pub /* unsafe ? */ const fn align_of_val_raw<T>(val: *const T) -> usize
where
    T: ?Sized,
    <T as Pointee>::Metadata: ~const PointerMetadataThatDoesNotNeedReference<T>,
{
    <T as Pointee>::Metadata::align_of_val_raw(val)
}

// same for size
```

The only downside of this proposal that I see is that the compiler would be forced to generate PointerMetadata impls like this one:

struct S<T: ?Sized> {
    a: A,
    tail: T,
}

// A lot of compiler magic required to actually support this
unsafe impl<T: ?Sized> PointerMetadata<S<T>> for T::Metadata {
    fn size_of_val(val: &S<T>) -> usize {
        mem::size_of::<A>() + T::Metadata::size_of_val(&val.tail)
    }

    fn align_of_val(val: &S<T>) -> usize {
        cmp::max(mem::align_of::<A>(), T::Metadata::align_of_val(&val.tail))
    }
}

Or otherwise, we lose the guarantee that metadata of (..., T) has the same type as metadata of T.
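
The existing behaviour that this guarantee describes can be observed on stable Rust. In the sketch below, a struct with an unsized tail is reached through a pointer of the same width as a pointer to the tail type, and the compiler computes the struct's size including padding. The concrete field types, `#[repr(C)]` (used so the layout is predictable), and the function name are assumptions for the example.

```rust
use std::mem::{size_of, size_of_val};

#[repr(C)]
struct S<T: ?Sized> {
    a: u16,
    tail: T,
}

/// Builds an `S<[u8; 3]>` and unsizes it to `S<[u8]>`.
fn unsized_s() -> Box<S<[u8]>> {
    // Unsizing coercion on the last field: the slice length becomes metadata.
    Box::new(S { a: 1, tail: [0u8; 3] })
}

fn main() {
    let s = unsized_s();
    println!("a = {}, tail.len() = {}", s.a, s.tail.len());
    // A pointer to `S<[u8]>` is as wide as a pointer to `[u8]`.
    println!(
        "pointer widths: {} vs {}",
        size_of::<&S<[u8]>>(),
        size_of::<&[u8]>()
    );
    // The dynamic size includes trailing padding up to the struct's alignment.
    println!("size_of_val = {}", size_of_val(&*s));
}
```

With these field types the dynamic size is larger than the plain sum of the field sizes, so a generated impl along the lines of the pseudocode above would also have to account for padding.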

I think there should be a way around this, but I can't quite see how we can make this part better...

WaffleLapkin commented 2 years ago

Actually, now that I've thought about size_of_val/align_of_val functions a little bit more, I may have an idea how to remove the downside from https://github.com/rust-lang/rust/issues/81513#issuecomment-976781622.

The solution is just to invert the dependencies. Instead of SliceLen<_> implementing PointerMetadata, size_of_val/align_of_val should be implemented by the type itself. This would allow the compiler-generated impl to be a lot less magical and even allowed by the current coherence rules/etc.

pub trait Pointee {
    #[lang = "metadata_type"]
    type Metadata: Copy + Send + Sync + Ord + Hash + Unpin;

    #[rustc_only_trait_resolvable]
    const fn size_of_self(&self) -> usize;

    #[rustc_only_trait_resolvable]
    const fn align_of_self(&self) -> usize;

    // P.S. `const fn` in traits aren't currently supported, so this isn't implementable yet
}

(#[rustc_only_trait_resolvable] is a theoretical annotation similar to #[rustc_skip_array_during_method_dispatch] that prevents x.size_of_self() and T::size_of_self(x) from being resolved, while still allowing Pointee::size_of_self(x) or <T as Pointee>::size_of_self(x))

This design actually seems a lot cleaner and simpler than the one I've previously proposed while still allowing for future extension with custom DSTs.

SimonSapin commented 2 years ago

they are not actually sufficient for being Metadata

The compiler has no need to be able to work with arbitrary impl Pointee for $Something {…} custom definitions. In Rust 1.56 such an impl is always disallowed because the compiler already generates impls of Pointee for all types.

size_of_val is not limited to methods available through the Pointee trait (in the way generic library code would be) since it is a compiler intrinsic that can have special cases for all "kinds" of types supported by that compiler: arrays, trait objects, etc.

If the language ever gains support for custom DSTs (I’m not sure this is a necessity), then the RFC proposing to add them will need to define a mechanism for how size_of_val should work for the new kind(s) of types. Maybe that would involve new methods in the Pointee trait, maybe not.

I believe that stabilizing the Pointee trait as-is does not prevent adding those methods later if needed for custom DSTs, because no custom impl of Pointee is allowed today.

matthieu-m commented 2 years ago

Experience report, and the lack of CoerceUnsized.

I ported this afternoon the storage-poc repository from the rfc2580 crate to the core::ptr::Pointee, and it was a painless experience.

The resulting code is cleaner, thanks to the integration of from_raw_parts and to_raw_parts with *const T and NonNull<T>, and works just as well.

The one slight disappointment I have is that the CoerceUnsized situation is unfortunately not solved. That is, if we look at the RawBox type, where S::Handle<T> would be NonNull<T> for a regular allocator, and is (T::Metadata) in the test case at line #159:

pub struct RawBox<T: ?Sized + Pointee, S: SingleElementStorage> {
    storage: ManuallyDrop<S>,
    handle: S::Handle<T>,
}

impl<T, U, S> CoerceUnsized<RawBox<U, S>> for RawBox<T, S>
    where
        T: ?Sized + Pointee,
        U: ?Sized + Pointee,
        S: SingleElementStorage,
        S::Handle<T>: CoerceUnsized<S::Handle<U>>,
{
}

#[test]
fn slice_storage() {
    let storage = SingleElement::<[u8; 4]>::new();
    let mut boxed: RawBox<[u8], _> = RawBox::new([1u8, 2, 3], storage).unwrap().coerce();

    assert_eq!([1u8, 2, 3], &*boxed);

    boxed[2] = 4;

    assert_eq!([1u8, 2, 4], &*boxed);
}

We would hope that the call to coerce() is unnecessary, as we would expect that the [u8; 3]::Metadata could be coerced into [u8]::Metadata, however it is not the case.

It is not clear to me whether the issue comes from [u8; 3]::Metadata == (), but the plain fact is that we have a Box that is not automatically coerced, which is a slight blow to usability.

SimonSapin commented 2 years ago

Sorry, I don’t quite follow as there seems to be a lot of context involved as to what traits exist in storage-poc. However isn’t this an issue with CoerceUnsized unrelated to pointer metadata?

By the way explicit T: Pointee bounds should be unnecessary. The trait resolver has built-in knowledge that T: Pointee for any T, so the associated type can be used without a bound. For example:

https://github.com/rust-lang/rust/blob/84f962a89bac3948ed116f1ad04c2f4793fb69ea/library/core/src/ptr/metadata.rs#L93

matthieu-m commented 2 years ago

Sorry, I don’t quite follow as there seems to be a lot of context involved as to what traits exist in storage-poc. However isn’t this an issue with CoerceUnsized unrelated to pointer metadata?

It's not clear to me where the issue lies, since it comes up with the interaction of the two features. Here is a reduced example on the playground:

#![feature(coerce_unsized)]
#![feature(ptr_metadata)]

use core::{ops::CoerceUnsized, ptr::Pointee};

struct Handle<T: ?Sized>(<T as Pointee>::Metadata);

impl<T, U> CoerceUnsized<Handle<U>> for Handle<T>
    where
        T: ?Sized,
        U: ?Sized,
        <T as Pointee>::Metadata: CoerceUnsized<<U as Pointee>::Metadata>,
{
}

fn main() {
    //let _: Box<[u8]> = Box::new([1u8, 2, 3]);
    let _: Handle<[u8]> = Handle::<[u8; 3]>(());
}

The Box line compiles, the Handle line doesn't.

If we dig further, we see that Metadata doesn't properly implement CoerceUnsized:

#![feature(coerce_unsized)]
#![feature(ptr_metadata)]

use core::{ops::CoerceUnsized, ptr::Pointee};

struct Foo<T: CoerceUnsized<<[u8] as Pointee>::Metadata> >(T);

fn foo(_: Foo<<[u8; 3] as Pointee>::Metadata>) {}

Fails with:

error[E0277]: the trait bound `(): CoerceUnsized<usize>` is not satisfied
  --> src/main.rs:18:11
   |
18 | fn foo(_: Foo<<[u8; 3] as Pointee>::Metadata>) {}
   |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `CoerceUnsized<usize>` is not implemented for `()`
   |
note: required by a bound in `Foo`
  --> src/main.rs:16:15
   |
16 | struct Foo<T: CoerceUnsized<<[u8] as Pointee>::Metadata> >(T);
   |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `Foo`

And given that <[u8; 3] as Pointee>::Metadata is just (), I don't see how it could, actually, implement CoerceUnsized properly, so I am tempted to think that the problem is here.

By the way explicit T: Pointee bounds should be unnecessary. The trait resolver has built-in knowledge that T: Pointee for any T, so the associated type can be used without a bound.

Ah nice! Thanks for the hint.

RalfJung commented 2 years ago

A CoerceUnsized bound on the metadata does not really make any sense. It's not the metadata that gets unsized, after all. The metadata is produced during unsizing. E.g. when unsizing [u32; 3] to [u32] the metadata 3 is produced. But nowhere are we "unsizing" the original metadata type () to the new metadata type usize.

This feels like an XY-problem to me, so maybe try explaining what it is you want to achieve with the metadata APIs here (instead of how you want to achieve that).

matthieu-m commented 2 years ago

This feels like an XY-problem to me, so maybe try explaining what it is you want to achieve with the metadata APIs here (instead of how you want to achieve that).

It may well be!

I pointed to the storage-poc crate to provide the background; the idea of the crate is that instead of storing a pointer (NonNull<T>), one stores a handle (Handle<T>) which may or may not be a pointer under the hood.

In the case of the RawBox<T, S> example above, it uses inline storage (an embedded array of bytes) in which the handle "points". The handle is not a pointer, though, as the box can be moved around, and since a box only ever stores a single element, the handle doesn't need any index or anything. It just needs the pointee metadata.

Therefore, in the example above, the handle is Handle<T>(<T as Pointee>::Metadata), and the box is RawBox<T, S>(S, S::Handle<T>).

The goal is for RawBox to be coercible. At the moment, the rules for implementing CoerceUnsized require that the last field be CoerceUnsized and do not allow any computation -- it just magically happens -- and therefore I conclude that <T as Pointee>::Metadata must be CoerceUnsized, as this is necessary for Handle<T> to be CoerceUnsized, which in turn is necessary for RawBox<T, S> to be CoerceUnsized.

I may very well be misunderstanding the rules, though, and of course there is the option of allowing custom logic in CoerceUnsized although this seems overkill to me.

RalfJung commented 2 years ago

My first impression is that the Rust unsizing system is simply not up to the task you are asking for here. Rust unsizing only works on pointers, and this is hard-coded pretty deeply in the compiler. You would need that system to be made more flexible so that one can talk about just the metadata generation part of the unsizing coercion without the part where that metadata is used to create a new wide pointer. Presumably such an extension of the unsizing system should suitably interact with the metadata APIs tracked in this issue, but that extension goes way beyond the scope of the metadata APIs.

At the moment, the rules for implementing CoerceUnsized require that the last field be CoerceUnsized

This is because that last field is the one that becomes unsized during the coercion (e.g. when RcBox<[u32; 3]> is coerced to RcBox<[u32]>). That's not what happens in your case so going further down this track won't lead anywhere.
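
For reference, this is what the built-in coercion looks like with standard smart pointers on stable Rust. In both cases a pointer is unsized and the slice length is generated as part of the new wide pointer. The function names are illustrative.

```rust
use std::rc::Rc;

/// `Box<[u32; 3]>` coerces to `Box<[u32]>` at the return site.
fn boxed_slice() -> Box<[u32]> {
    let sized: Box<[u32; 3]> = Box::new([1, 2, 3]);
    sized
}

/// The same coercion through `Rc`, whose inner allocation has the pointee
/// as its last field.
fn rc_slice() -> Rc<[u32]> {
    let sized: Rc<[u32; 3]> = Rc::new([1, 2, 3]);
    sized
}

fn main() {
    println!("box len = {}", boxed_slice().len());
    println!("rc  len = {}", rc_slice().len());
}
```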

matthieu-m commented 2 years ago

Presumably such an extension of the unsizing system should suitably interact with the metadata APIs tracked in this issue, but that extension goes way beyond the scope of the metadata APIs.

Thanks, that's very helpful. This means I'm not doing anything wrong and it's just not supported today (at all).

The next question, then, is: Does the current implementation, using () as metadata for Sized types, allow such an extension of the unsizing system?

Or, more prosaically: Are we comfortable stabilizing () as metadata for Sized types, knowing that it likely closes the door to ever implementing CoerceUnsized for it?

(The same question does not immediately apply to usize as metadata for slice types, as those cannot be unsized further)

Simon had expressed concerns about a strongly-typed metadata piece for Sized types due to the amount of types this could explode into, however I note that the current implementation seems to use a strong-typed metadata piece for trait types already, so it's not clear if this is still a concern.

RalfJung commented 2 years ago

Or, more prosaically: Are we comfortable stabilizing () as metadata for Sized types, knowing that it likely closes the door to ever implementing CoerceUnsized for it?

I don't think we would ever want to implement CoerceUnsized for the metadata of Sized types. It's not the metadata that is being unsized, after all! So IMO that would just make no sense.

However I should also add that I do not consider myself an unsizing expert. I have no idea what it would take to support types like yours.

nbdd0121 commented 2 years ago

Metadata unsizing could be done today:

#![feature(unsize)]
#![feature(ptr_metadata)]

use core::marker::Unsize;
use core::ptr::Pointee;

fn unsize_metadata<T: ?Sized, U: ?Sized>(t: <T as Pointee>::Metadata) -> <U as Pointee>::Metadata
where
    T: Unsize<U>,
{
    (core::ptr::from_raw_parts::<T>(core::ptr::null(), t) as *const U)
        .to_raw_parts()
        .1
}

fn main() {
    let len = unsize_metadata::<[u8; 3], [u8]>(());
    println!("{}", len);
}

So we could have a new-type wrapping around metadata to create a "strongly typed" metadata:

#![feature(unsize)]
#![feature(ptr_metadata)]

use core::marker::Unsize;
use core::ptr::Pointee;

struct TypedMetadata<T: ?Sized>(pub <T as Pointee>::Metadata);

impl<T> TypedMetadata<T> {
    fn of() -> Self {
        TypedMetadata(core::ptr::null::<T>().to_raw_parts().1)
    }
}

impl<T: ?Sized> TypedMetadata<T> {
    fn unsize<U: ?Sized>(self) -> TypedMetadata<U> where T: Unsize<U> {
        TypedMetadata(
            (core::ptr::from_raw_parts::<T>(core::ptr::null(), self.0) as *const U)
                .to_raw_parts()
                .1,
        )
    }
}

// This couldn't be done today, but we could support this in compiler similar to pointers.
// impl<T: ?Sized + Unsize<U>, U: ?Sized> CoerceUnsized<TypedMetadata<U>> for TypedMetadata<T> {}

fn main() {
    let len = TypedMetadata::<[u8; 3]>::of().unsize::<[u8]>().0;
    println!("{}", len);
}

SimonSapin commented 2 years ago

This is not metadata that is being "unsized". Unsizing means going from (a pointer/reference to) a statically-sized type to (a pointer/reference to) a dynamically-sized type.

In your example, *const [u8; 3] is unsized to *const [u8], where [u8] is dynamically-sized. Your unsize_metadata function then extracts the metadata of that unsized pointer. It also relies on ptr::null() which has an implicit T: Sized bound.
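
The same sequence of steps can be reproduced without nightly features for the slice case, assuming a toolchain where `len` on raw slice pointers is stable (Rust 1.79 or newer, as far as I know). A null thin pointer is unsized, then the length is read from the wide pointer without dereferencing anything. The function name is made up for this sketch.

```rust
use std::ptr;

/// Unsizes a null `*const [u8; 3]` and reads the length carried by the
/// resulting wide pointer. No memory is accessed.
fn len_from_null_unsize() -> usize {
    // `ptr::null` requires a sized pointee, so we start from the array type.
    let thin: *const [u8; 3] = ptr::null();
    // The pointer is unsized here; the length 3 is produced at this step.
    let wide: *const [u8] = thin;
    // Reads only the metadata component of the wide pointer.
    wide.len()
}

fn main() {
    println!("{}", len_from_null_unsize());
}
```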

Implementing the CoerceUnsized trait for metadata types (even if more-strongly-typed) does not make sense. Maybe this "return pointer metadata after unsizing null()" operation can be useful but it’s a different operation from "unsize this pointer or reference" and shouldn’t be shoehorned into CoerceUnsized.

nbdd0121 commented 2 years ago

TypedMetadata could be understood as a ZST pointer to null.

SimonSapin commented 2 years ago

That’s the kind of thing I mean by "shoehorning". Maybe not impossible but I don’t think it’s a good idea because that’s just not what CoerceUnsized means / is for.

matthieu-m commented 2 years ago

Implementing the CoerceUnsized trait for metadata types (even if more-strongly-typed) does not make sense. Maybe this "return pointer metadata after unsizing null()" operation can be useful but it’s a different operation from "unsize this pointer or reference" and shouldn’t be shoehorned into CoerceUnsized.

This is fair, but leaves us with the problem unsolved.

A great benefit of the ability to split/join a pointer and its associated metadata is the creation of custom handles which contain "something" (not necessarily a pointer) and the associated metadata, and today such custom handles cannot implement CoerceUnsized making them less ergonomic than the language-supported pointers they emulate.

Implementing CoerceUnsized for metadata would solve the issue quite naturally and offer the side benefit that no user-written code (with arbitrarily complex logic) would run during coercion.

Other possibilities include having a specific lang-item typed metadata for this situation, which itself implements CoerceUnsized, perhaps a Null<T> (as a parallel with NonNull<T>), which I guess would be a different proposal and have your preference?

Kixunil commented 2 years ago

I'm writing a new crate with a bunch of unsafe code that needs to avoid core::mem::swap() so I had to opt into creating my own struct RefMut<'a> (the real signature is a bit more complicated but that's not important here). I'd prefer this to be a custom DST to not need a reborrow() method and other things.

However my metadata happens to be &'same_lifetime_as_t mut Meta. It'd be much nicer to have this as a true reference, not a pointer so that the compiler can take advantage of optimizations and maybe it could provide additional checking. The Copy bound prevents this though. Thinking about this, I have some ideas on how to solve this without causing problems around lack of Copy:

pub trait Pointee {
    type Metadata: Copy + Send + Sync + Ord + Hash + Unpin;
    type MetadataMut: CopyOrReborrow + Send + Sync + Ord + Hash + Unpin + Into<Self::Metadata>;
}

pub trait CopyOrReborrow {
    type Output<'a>; // maybe some funny lifetime bounds
    fn copy_or_reborrow(&mut self) -> Self::Output<'_>;
}

impl<T: Copy> CopyOrReborrow for T {
    type Output<'a> = T;

    fn copy_or_reborrow(&mut self) -> Self::Output<'_> {
        *self
    }
}

impl<T> CopyOrReborrow for &'_ mut T {
    type Output<'a> = &'a mut T;

    fn copy_or_reborrow(&mut self) -> Self::Output<'_> {
        &mut *self
    }
}

fn metadata<T: Pointee>(value: &T) -> T::Metadata { ... }

fn metadata_mut<T: Pointee>(value: &mut T) -> T::MetadataMut { ... }

I believe that stabilizing the Pointee trait as-is does not prevent adding those methods later if needed for custom DSTs, because no custom impl of Pointee is allowed today.

By the way, explicit T: Pointee bounds should be unnecessary. The trait resolver has built-in knowledge that T: Pointee for any T, so the associated type can be used without a bound.

I see this as a problem because such would imply being able to do mem::swap() of DSTs which would destroy my use case.

cuviper commented 2 years ago

It seems that <T as Pointee>::Metadata is always invariant in T -- is that expected?

For example, these two constructs seem like they should be equivalent, but the latter fails: playground

#![feature(ptr_metadata)]

use std::ptr::{NonNull, Pointee};

pub struct Pointer<T: ?Sized>(NonNull<T>);

pub fn covariant_pointer<'a>(pointer: Pointer<&'static str>) -> Pointer<&'a str> {
    pointer
}

pub struct Parts<T: ?Sized>(NonNull<()>, <T as Pointee>::Metadata);

pub fn covariant_parts<'a>(parts: Parts<&'static str>) -> Parts<&'a str> {
    parts
}

error[E0308]: mismatched types
  --> src/lib.rs:14:5
   |
14 |     parts
   |     ^^^^^ lifetime mismatch
   |
   = note: expected struct `Parts<&'a str>`
              found struct `Parts<&'static str>`
note: the lifetime `'a` as defined here...
  --> src/lib.rs:13:24
   |
13 | pub fn covariant_parts<'a>(parts: Parts<&'static str>) -> Parts<&'a str> {
   |                        ^^
   = note: ...does not necessarily outlive the static lifetime

The Metadata is just () here, since &str is sized, but it also fails if you make that a slice or trait object with a lifetime inside. I guess it is expected that trait objects are invariant, but I think with sized and slice versions of T it should be covariant.

petertodd commented 2 years ago

Rust really needs a way to opt-in to bivariance in this case. Maybe for associated types that are constrained to be 'static?

nikomatsakis commented 2 years ago

Invariance is definitely expected (all projection is invariant), but also problematic here.
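
A minimal sketch of the effect on stable Rust, assuming a made-up HasMeta trait as a stand-in for Pointee (HasMeta, Meta and shorten_parts are hypothetical names, not part of any proposal). The struct is invariant in T because T only appears inside a projection, yet rebuilding the value field by field still type-checks, since each field type normalizes to something that no longer mentions the lifetime:

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for `Pointee`, so that the sketch builds on stable.
pub trait HasMeta {
    type Meta;
}

impl<T: ?Sized> HasMeta for T {
    type Meta = ();
}

// Invariant in `T`, because `T` only appears inside a projection.
pub struct Parts<T: ?Sized>(NonNull<()>, <T as HasMeta>::Meta);

// Returning `parts` directly is rejected with a lifetime mismatch.
// Moving the fields into a new value is accepted, because
// `<&str as HasMeta>::Meta` normalizes to `()` for either lifetime.
pub fn shorten_parts<'a>(parts: Parts<&'static str>) -> Parts<&'a str> {
    Parts(parts.0, parts.1)
}

fn main() {
    let data = NonNull::<()>::dangling();
    let parts: Parts<&'static str> = Parts(data, ());
    let short: Parts<&str> = shorten_parts(parts);
    assert_eq!(short.0, data);
}
```

This only works where the metadata type itself does not carry the lifetime, and it is an explicit conversion rather than subtyping, so it does not remove the need for a real variance story.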

SkiFire13 commented 2 years ago

Probably OT, but isn't this the same issue that's preventing TyKind from being refactored into its own crate for chalk integration?

dimpolo commented 2 years ago

There seems to be a problem with the combination of the ptr_metadata and trait_upcasting features. I'm not sure if posting this over at #65991 might have been better, please let me know.

I'll use ThinBox as a motivating example, but it applies to other custom "thin" implementations of data structures.

The problem is that you can't implement

impl<T: ?Sized + Unsize<U>, U: ?Sized> CoerceUnsized<ThinBox<U>> for ThinBox<T> {} // error[E0277]: the trait bound `WithHeader<<T as Pointee>::Metadata>: CoerceUnsized<WithHeader<<U as Pointee>::Metadata>>` is not satisfied

which then leads to:

#![feature(trait_upcasting)]
trait Foo {}
trait Bar: Foo {}

impl Foo for i32 {}
impl Bar for i32 {}

let bar: Box<dyn Bar> = Box::new(123);
let foo: Box<dyn Foo> = bar;  // works fine

let bar: ThinBox<dyn Bar> = ThinBox::new_unsize(123);
let foo: ThinBox<dyn Foo> = bar;  // error[E0308]: mismatched types

This might be similar to what @matthieu-m mentioned here in relation to storage-poc

I don't know enough about these features to suggest a solution, but it would seem highly desirable to not prematurely close the door on such use-cases.

SimonSapin commented 2 years ago

The problem is that you can't implement […] CoerceUnsized<ThinBox<U>> for ThinBox<T>

That sounds expected to me.

CoerceUnsized and the implicit coercion it enables are all about having the compiler automatically create a wide pointer-like value from the corresponding thin pointer-like value and target !Sized type. ThinBox however is still thin, so the compiler can’t know where to put the new metadata. "Unsizing" a ThinBox necessarily has to be a library API.

SimonSapin commented 2 years ago

It’s even worse than that for ThinBox: if T and U have differently-sized metadata, then ThinBox<T> and ThinBox<U> allocate heap memory with different layouts, so converting between them requires a new allocation. So users who want a ThinBox<U> would likely prefer some other API that does the "unsizing" while creating their first ThinBox, without going through an intermediate ThinBox<T>.

This lesser flexibility related to allocation layout is the tradeoff to make in exchange for thin pointers to DSTs.

dimpolo commented 2 years ago

If I understand correctly, you're saying that CoerceUnsized only does "thin" to "wide". That then rules out any "thin" to "thin" conversions. Is it then fair to say that trait upcasting is misusing CoerceUnsized because it does a "wide" to "wide" cast?

SimonSapin commented 2 years ago

I didn’t know that trait upcasting also involved CoerceUnsized. The important part of https://github.com/rust-lang/rust/issues/81513#issuecomment-1031355360 is that when the compiler implicitly generates new pointer metadata it puts it in a wide pointer, the target of the conversion. Although in the upcasting case, it probably also needs to know where to read the previous metadata. So if there is metadata in the source pointer, it needs to be in a wide pointer too.

dimpolo commented 2 years ago

I think I understand. There is currently only one place where the compiler knows how to find the vtable and that is inside a wide pointer. If it's somewhere else, no upcasting coercion for you.

Maybe we could have something like this in the future though to make manual implementations less error prone:

impl<Dyn: ?Sized> DynMetadata<Dyn> {
    pub fn upcast<Dyn2: ?Sized>(self) -> DynMetadata<Dyn2>
    where
        Dyn: Unsize<Dyn2>,
    {}
}

SimonSapin commented 2 years ago

Calling a method on the metadata type means you already have access to the metadata, which doesn’t help with the compiler not knowing where to find it in ThinBox.

Instead I would expect something like:

impl<T: ?Sized> ThinBox<T> {
    fn try_upcast<U: ?Sized>(self) -> Result<ThinBox<U>, Self> where /* something */ {…}
}

to be provided by the thin box library. The library implementation of this might create a temporary raw pointer with the current metadata, upcast it with as, then extract the metadata of that new raw pointer.

Anything more integrated into the language gets into Custom DST territory and opens a lot more design questions IMO.

vojtechkral commented 2 years ago

As mentioned in the PR linked above (formatted debug prints of pointers based on what kind of pointer they are), I am wondering whether there should be a distinct trait for Pointee::Metadata, rather than a list of constraints in the Pointee trait itself. (Named something like PointerMetadata or so.)

Arguably it's not much different re the Pointee interface in practice, but it would make it possible to write fns/impls for T: PointerMetadata in the future...

droundy commented 2 years ago

I'm wondering whether the pointer could be changed from *const () to a new *const Opaque<T>, so users could have a little help from the compiler in keeping straight which "artificially thin" pointers are to which types.

Maybe not worth the added complexity, but it would feel nicer not to lose all type information on to_raw_parts().

SimonSapin commented 2 years ago

What would Opaque be? Could you spell out some more the specific definitions and signatures you have in mind?

alercah commented 2 years ago

I'm not in a position to really write this up properly right now, but I think I may have a good case for adding a new type of pointer with additional, entirely dynamic metadata that couldn't be uniquely associated with the pointee type.

This wouldn't affect Pointee::Metadata but it would maybe mean we should consider pointee_metadata instead of metadata. Hopefully I will be able to write up the full idea someplace.

droundy commented 2 years ago

I was thinking of something like

struct Opaque<T>(PhantomData<T>);

Hopefully with a nicer name than Opaque.
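
A small sketch of how such a marker could keep "artificially thin" pointers apart, written for stable Rust. The T: ?Sized bound and the helper functions thin_slice and thin_str are assumptions added for illustration; nothing here is an existing standard-library API:

```rust
use std::marker::PhantomData;

// Hypothetical marker type following the one-line definition above.
// `T: ?Sized` is an assumption so that it can name unsized pointees.
pub struct Opaque<T: ?Sized>(PhantomData<T>);

// Drops the slice length but keeps the pointee type in the pointer's type.
pub fn thin_slice<T>(ptr: *const [T]) -> *const Opaque<[T]> {
    ptr as *const Opaque<[T]>
}

// Same idea for `str`.
pub fn thin_str(ptr: *const str) -> *const Opaque<str> {
    ptr as *const Opaque<str>
}

fn main() {
    let bytes: &[u8] = &[1, 2, 3];
    let text: &str = "abc";

    let a: *const Opaque<[u8]> = thin_slice(bytes);
    let b: *const Opaque<str> = thin_str(text);

    // The two thin pointers have different types, so mixing them up
    // (for example passing `b` where `a` is expected) is a type error.
    assert_eq!(a as usize, bytes.as_ptr() as usize);
    assert_eq!(b as usize, text.as_ptr() as usize);
}
```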

eddyb commented 2 years ago

I think <[T] as Pointee>::Metadata == usize is a forward compatibility hazard: if we ever want to allow [T] to be used with T: !Sized element types, it forces us into either:

giving up and staying with the current slice types ~forever
using some ~~(unsound?)~~ specialization to branch on whether T: Sized and use SliceMetadata<T> instead of usize in the T: !Sized case

It seems preferable to introduce SliceMetadata<T> today, before Pointee can be stabilized. (And to avoid any unforeseen typesystem interactions, it should include a <T as Pointee>::Metadata even if T: Sized would hold for now)

EDIT: @BoxyUwU has pointed out to me that while specializing on T: Sized would probably not run into soundness issues (assuming custom DSTs don't accidentally require fragile bounds or something), there's a worse problem:

This compiles today, but if Pointee impls start using specialization, that will block normalization and break it:

fn foo<T>(x: usize) -> <[T] as Pointee>::Metadata {
    x
}

(it's possible for us to eventually allow normalization of associated types that involve specialization, in some cases, but it's again a bunch of additional complexity to work around the hardcoded usize)

nikomatsakis commented 2 years ago

@eddyb what would the metadata for a [[T]] be, do you think? I do wonder whether it makes sense to use slice types there or to encourage people to build their own types (it seems to me like there is no "obviously correct" semantics to assign)

eddyb commented 2 years ago

(it seems to me like there is no "obviously correct" semantics to assign)

Even if that is the case, a lot of decisions we make now could lock us into a future where all existing types are limited and users will end up having to use custom DSTs for everything else - this does not seem optimal to me.

I think it's even worse than just [T] with unsized T: tuples and structs do not wrap their inner metadata, so if we ever want to e.g. support (T, U) with unsized T and U, we'd have:

when T: Sized: <(T, U)>::Metadata == U::Metadata
when T: !Sized: <(T, U)>::Metadata == (T::Metadata, U::Metadata)
- ideally not a tuple though (@Gankra's blog post uses an Aggregate type here)

This is, again, a "type(class) match" on T: Sized, which I would think we'd really want to avoid.

Maybe exposing actually user-visible types was a mistake altogether and we should wrap them in a type invariant on the pointee type, with no user-visible normalization.

That is, we could hide metadata types in something like this (with perma-unstable RustcProvided*):

struct MetadataOf<T: ?Sized>(<T as RustcProvidedPointee>::RustcProvidedMetadata);
impl<T: ?Sized> Pointee for T {
    type Metadata = MetadataOf<T>; // maybe not even an assoc type at that point.
}

It's not perfect, but at least the Metadata is bound by enough auto traits that I don't think anything other than the size would "leak out" of the definition of MetadataOf.

eddyb commented 2 years ago

Another problem I've come across is "dynamic alignment". Consider something like this:

struct WithPrefixes<T: ?Sized> {
    _prefix16: u16,
    _prefix8: u8,
    tail: T,
}

Today, we have two kinds of DSTs wrt alignment (static vs dynamic), which results in:

WithPrefixes<[T]> has a static tail offset of round_up(3, align_of::<T>())
WithPrefixes<dyn Trait> has a dynamically computed tail offset
- round_up(3, max(2, align_of_val(&self.tail))) with align_of_val reading from dyn Trait's vtable
- the metadata is (today) DynMetadata<dyn Trait>, i.e. just the dyn Trait vtable
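
The dynamically computed offset can be observed on stable Rust. The sketch below assumes #[repr(C)] to pin the field order (the struct above does not specify a representation), and the helpers round_up, tail_offset and predicted_offset are names made up for this illustration. For the tail alignments used here (at least 2), round_up(3, align_of_val(..)) agrees with the formula in the list above:

```rust
use std::fmt::Debug;
use std::mem::align_of_val;

// `repr(C)` is an assumption made here to pin the field order.
#[repr(C)]
struct WithPrefixes<T: ?Sized> {
    _prefix16: u16,
    _prefix8: u8,
    tail: T,
}

fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) / align * align
}

// Observed offset of `tail`, measured as an address difference.
fn tail_offset<T: ?Sized>(w: &WithPrefixes<T>) -> usize {
    let base = w as *const WithPrefixes<T> as *const u8 as usize;
    let tail = &w.tail as *const T as *const u8 as usize;
    tail - base
}

// Offset predicted from the alignment read through the tail's metadata
// (the vtable for `dyn Trait`, the element type for slices).
fn predicted_offset<T: ?Sized>(w: &WithPrefixes<T>) -> usize {
    round_up(3, align_of_val(&w.tail))
}

fn main() {
    let a = WithPrefixes { _prefix16: 0, _prefix8: 0, tail: 7u32 };
    let b = WithPrefixes { _prefix16: 0, _prefix8: 0, tail: 7u64 };

    // Same static type, different tail offsets depending on the vtable.
    let objects: [&WithPrefixes<dyn Debug>; 2] = [&a, &b];
    for w in objects {
        assert_eq!(tail_offset(w), predicted_offset(w));
    }
}
```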

We could remove the runtime max and round_up by instead having this setup:

// Assuming this is `DynMetadata` today and `DynVtable` exists:
struct DynMetadata<T: ?Sized>(&'static DynVtable<T>);

struct WithPrefixesDynVtable<T: ?Sized> {
    tail_vtable: DynVtable<T>,
    tail_offset: usize,
}
struct WithPrefixesDynMetadata<T: ?Sized>(&'static WithPrefixesDynVtable<T>);

Then getting the offset of the tail field is as cheap as align_of_val(&self.tail) today.

However, note that WithPrefixes<T> now can have different metadata from T, which is another change from the current Pointee setup (and would also benefit from having the real Metadata type hidden).

Also, long-term we probably need something like this to support Option<dyn Trait>, if we want to ever attempt it - at least, it makes far more sense to have precomputed field offsets (and a tag decoder fn pointer, pretty much exactly mem::discriminant as fn(_) -> _) in an "extended vtable", than try to generate some code that uses only dyn Trait vtable, somehow.

SimonSapin commented 2 years ago

using some ~(unsound?)~ specialization to branch on whether T: Sized

Today’s impls of the Pointee trait are made up through compiler magic, so this kind of branch/special case would be possible without #![feature(specialization)]

eddyb commented 2 years ago

using some ~(unsound?)~ specialization to branch on whether T: Sized

Today’s impls of the Pointee trait are made up through compiler magic, so this kind of branch/special case would be possible without #![feature(specialization)]

That's worse - the "compiler magic" has none of the checks or analyses in place that we've at least tried to add to specialization. And unless we make that "compiler magic" work more like specialization before stabilization, Pointee will be able to do things on stable that specialization is disallowed from doing even on nightly!

That is, I'm not aware of a way to write a T: Sized "type-level branch" that resolves during user-facing type-checking, as associated type specialization is intentionally firewalled from it (and I would expect we'd hold CTFE to the same standard, but I haven't done a thorough investigation of e.g. const fns in traits).

(Meanwhile, the way raw pointer casts work seems to imply the equality of metadata, just like the Pointee docs, despite that not really being something we should've ever promised in any way)

We've done a much better job with std::mem::Discriminant, of hiding the "compiler magic" in a newtype, and we should be doing the same thing here IMO.

Anyway, I can't really block any of this and I'm guessing people are more interested in custom DSTs than removing artificial limitations that only really exist because we barely managed to get the built-in DSTs we have today.

The tragic irony being, of course, that most of the implementation work on custom DSTs would be passing arbitrary amounts of metadata around. The same blocker as for a lot of the built-in stuff we never got to do.

bjorn3 commented 2 years ago

I think <[T] as Pointee>::Metadata == usize is a forward compatibility hazard: if we ever want to allow [T] to be used with T: !Sized element types, it forces us into either:
* giving up and staying with the current slice types ~forever
* using some ~(unsound?)~ specialization to branch on whether T: Sized and use SliceMetadata<T> instead of usize in the T: !Sized case

It seems preferable to introduce SliceMetadata<T> today, before Pointee can be stabilized. (And to avoid any unforeseen typesystem interactions, it should include a <T as Pointee>::Metadata even if T: Sized would hold for now)

Using a SliceMetadata wrapper wouldn't solve anything I think. [[T]] would need unsized metadata, right? Unsized metadata is impossible as std::ptr::metadata has to work for any *const T where T: ?Sized. A function can't return an unsized value.

vojtechkral commented 2 years ago

Excuse my ignorance, but how would [[T]] even work?

alercah commented 2 years ago

I think [U] where the metadata of &U is allowed to vary between elements would be a ludicrous thing[1], because it would require &[U] to store a metadata for each element, which would make the pointer itself unsized. But there are some imaginable variations:

[U] where U is unsized could be syntactic sugar for [(<U as Pointee>::Metadata, U)], sort of a Pascal-style array. In this case, no change to the metadata of &[U] would be needed. So while there's definitely some potential use cases for Pascal-style strings, say, there's no need for any further conversation here I think.
[U] could require that all elements have the same metadata. This would make &[[T]] into a typical multi-dimensional array, definitely a case that sees use in practice. The expectation would presumably be that the outer reference would carry the (singular) metadata to avoid needing multiple copies of it. I could see some special syntax to remind users that the inner size must be the same, e.g. &[[T]; dyn] but that's neither here nor there.
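
To make the second variation concrete, here is a library-level sketch of a view whose "outer reference" carries the single shared inner length. Rows and its methods are hypothetical names for illustration only, not a proposed API:

```rust
/// Hypothetical borrowed view over rows that all share one length.
pub struct Rows<'a, T> {
    data: &'a [T],
    // The single "inner" metadata, stored once for all rows.
    row_len: usize,
}

impl<'a, T> Rows<'a, T> {
    /// Returns `None` if the data cannot be split into whole rows.
    pub fn new(data: &'a [T], row_len: usize) -> Option<Self> {
        if row_len == 0 || data.len() % row_len != 0 {
            return None;
        }
        Some(Rows { data, row_len })
    }

    /// Number of rows (the "outer" length).
    pub fn len(&self) -> usize {
        self.data.len() / self.row_len
    }

    /// Borrows one row, or `None` if `index` is out of range.
    pub fn row(&self, index: usize) -> Option<&'a [T]> {
        let start = index.checked_mul(self.row_len)?;
        let end = start.checked_add(self.row_len)?;
        self.data.get(start..end)
    }
}

fn main() {
    let cells = [1, 2, 3, 4, 5, 6];
    let rows = Rows::new(&cells, 3).unwrap();
    assert_eq!(rows.len(), 2);
    assert_eq!(rows.row(1), Some(&cells[3..]));
    assert_eq!(rows.row(2), None);
}
```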

Honestly, I think the future-compatibility concerns are worth heeding here. Stabilising that size==stride has caused me a significant amount of headache, and completely cut us off from ABI interoperability with languages where that isn't true (h/t @Gankra for bringing other examples to light), so I'm all for conservatism here.

Given the options, I'd be inclined to just make SliceMetadata opaque for now like DynMetadata is, and throw a len() method onto it to get the actual length. I don't think there's any reason to consider unification with usize a feature, more of a design accident.
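
Roughly what that could look like, as a stable-Rust sketch. SliceMetadata here is a hypothetical user-defined type, to_parts and from_parts are made-up helpers, and the data pointer is kept as *const T for simplicity rather than the *const () used by the unstable API:

```rust
use std::marker::PhantomData;

/// Hypothetical opaque metadata for `[T]`; not a real standard-library type.
pub struct SliceMetadata<T> {
    len: usize,
    _elem: PhantomData<fn() -> T>,
}

// Manual impls so that `T` itself does not need to be `Copy`.
impl<T> Clone for SliceMetadata<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for SliceMetadata<T> {}

impl<T> SliceMetadata<T> {
    /// The only way to observe the length; the representation stays private.
    pub fn len(self) -> usize {
        self.len
    }
}

/// Splits a slice into its data pointer and opaque metadata.
pub fn to_parts<T>(slice: &[T]) -> (*const T, SliceMetadata<T>) {
    let meta = SliceMetadata { len: slice.len(), _elem: PhantomData };
    (slice.as_ptr(), meta)
}

/// Rebuilds a slice.
///
/// Safety: `data` and `meta` must come from `to_parts` on a slice that is
/// still live and not mutated for the lifetime `'a`.
pub unsafe fn from_parts<'a, T>(data: *const T, meta: SliceMetadata<T>) -> &'a [T] {
    unsafe { std::slice::from_raw_parts(data, meta.len()) }
}

fn main() {
    let values = [10u32, 20, 30];
    let (data, meta) = to_parts(&values);
    assert_eq!(meta.len(), 3);

    let back = unsafe { from_parts(data, meta) };
    assert_eq!(back, &values[..]);
}
```

Because callers can only reach the length through len(), the representation could later grow (for example to carry an element's own metadata) without breaking them.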

[1]: U being implicitly unsized since if it is sized, its metadata is () and does not vary.
