rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
94.73k stars 12.21k forks source link

Tracking Issue for pointer metadata APIs #81513

Open KodrAus opened 3 years ago

KodrAus commented 3 years ago

This is a tracking issue for the RFC 2580 "Pointer metadata & VTable" (rust-lang/rfcs#2580). The feature gate for the issue is #![feature(ptr_metadata)].

About tracking issues

Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

Unresolved Questions

Language-level:

API level:

API bikesheds:

Implementation history

Tracked APIs

Last updated for https://github.com/rust-lang/rust/pull/81172.

pub trait Pointee {
    /// One of `()`, `usize`, or `DynMetadata<dyn SomeTrait>`
    type Metadata;
}

pub trait Thin = Pointee<Metadata = ()>;

pub const fn metadata<T: ?Sized>(ptr: *const T) -> <T as Pointee>::Metadata {}

pub const fn from_raw_parts<T: ?Sized>(*const (), <T as Pointee>::Metadata) -> *const T {}
pub const fn from_raw_parts_mut<T: ?Sized>(*mut (), <T as Pointee>::Metadata) -> *mut T {}

impl<T: ?Sized> NonNull<T> {
    pub const fn from_raw_parts(NonNull<()>, <T as Pointee>::Metadata) -> NonNull<T> {}

    /// Convenience for `(ptr.cast(), metadata(ptr))`
    pub const fn to_raw_parts(self) -> (NonNull<()>, <T as Pointee>::Metadata) {}
}

impl<T: ?Sized> *const T {
    pub const fn to_raw_parts(self) -> (*const (), <T as Pointee>::Metadata) {}
}

impl<T: ?Sized> *mut T {
    pub const fn to_raw_parts(self) -> (*mut (), <T as Pointee>::Metadata) {}
}

/// `<dyn SomeTrait as Pointee>::Metadata == DynMetadata<dyn SomeTrait>`
pub struct DynMetadata<Dyn: ?Sized> {
    // Private pointer to vtable
}

impl<Dyn: ?Sized> DynMetadata<Dyn> {
    pub fn size_of(self) -> usize {}
    pub fn align_of(self) -> usize {}
    pub fn layout(self) -> crate::alloc::Layout {}
}

unsafe impl<Dyn: ?Sized> Send for DynMetadata<Dyn> {}
unsafe impl<Dyn: ?Sized> Sync for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Debug for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Unpin for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Copy for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Clone for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Eq for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> PartialEq for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Ord for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> PartialOrd for DynMetadata<Dyn> {}
impl<Dyn: ?Sized> Hash for DynMetadata<Dyn> {}
matthieu-m commented 3 years ago

After experimenting with custom implementations of Box, I think there is a strong case for having strongly typed meta-data for all kinds of pointers.

The pre-allocator representation of Box is:

struct Box<T: ?Sized> { ptr: NonNull<T>, }

The post-allocator representation is very similar:

struct Box<T: ?Sized, A: Allocator = Global> {
    allocator: A,
    ptr: NonNull<T>,
}

Both automatically implements CoerceUnsized<Box<U>> where T: Unsize<U>, and all is well.

If one wants to make Box generic over its storage, then the representation becomes:

pub struct RawBox<T: ?Sized + Pointee, S: SingleElementStorage> {
    storage: S,
    handle: S::Handle<T>,
}

If S::Handle<T> == NonNull<T>, then Box is still coercible; however, in the case of inline storage, that is:

Hence, in the case of inline storage, S::Handle<T> is best defined as <T as Pointee>::Metadata.

In order to have Box<T> : CoerceUnsized<Box<U>> where T: Unsize<U>:

And of course, Box being coercible is very much desirable.


As a result, I believe a slight change of course is necessary:

  1. All metadata should be strongly typed -- be it Metadata<dyn Debug>, Metadata<[u8]> or Metadata<[u8; 3]> -- no more () or usize.
  2. The compiler should automatically implement Metadata<T>: CoerceUnsized<Metadata<U>> where T: Unsize<U>.

I would note that having a single Metadata<T> type rather than SizedMetadata<T>, SliceMetadata<[T]>, DynMetadata<dyn T> is not necessary, only the coercion is, and since the compiler is generating those, it's perfectly free to create them "cross type". I just used the same name as a short-cut.


Addendum: What's all that jazz about inline storage?

At a high-level, Box is not so much about where memory comes from, it's a container which allows:

Having the memory inline in the Box type preserves those 2 key properties whilst offering a self-contained type (not tied to any lifetime, nor any thread). It's allocation-less type-erasure.

A motivating example is therefore fn foo<T>() -> Box<dyn Future<T>, SomeInlineStorage>: it returns a stack-allocated container which contains any future type (fitting in the storage) which can evaluate to T.

SimonSapin commented 3 years ago

Box<dyn Future<T>, SomeInlineStorage> would have to be dynamically-sized itself, right? So in order to manipulate it without another lifetime or heap-allocated indirection you’d need the unsized locals language feature. And if you have that you can manipulate dyn Future<T> directly, so what’s the point of a box with inline storage?

IMO this is different from the case of Vec, which provides useful functionality on top of its storage so that ArrayVec (a.k.a. Vec with inline storage) makes sens. But Box pretty much is its storage.

matthieu-m commented 3 years ago

Box<dyn Future<T>, SomeInlineStorage> would have to be dynamically-sized itself, right?

No, that's the whole point of it actually.

In C++, you have std::string and std::function implementation typically using the "short string optimization", that is a sufficiently small payload is just embedded inside, and larger ones require a heap-allocation.

This is exactly the same principle:

So, here, SomeInlineStorage is generally speak over-reserving. You settle on a fixed alignment and size, and then you may get mem::size_of::<Box<dyn Future, SomeInlineStorage>>() == 128 regardless of what's stored inside.

If you stored a single pointer (+v-table), well, you're paying a bit too much, but that's the price for flexibility. It's up to you size it appropriately for the largest variant.

In any case, unsized locals is strictly unnecessary, as can be seen in the tests of storage-poc's RawBox.

SimonSapin commented 3 years ago

Oh I see, so this is more like SmallVec than ArrayVec and "inline" really means inline up to a certain size chosen a compile-time, and heap-allocated for values that turn out at run-time to be larger?

Back to pointer metadata though, I have a bit of a hard time following the CoerceUnsized discussion. But could you manage what you want if the handle for storage-generic Box<T> is not T::Metadata directly but another generic struct that contains that together with PhandomData<T>?

matthieu-m commented 3 years ago

Oh I see, so this is more like SmallVec than ArrayVec and "inline" really means inline up to a certain size chosen a compile-time, and heap-allocated for values that turn out at run-time to be larger?

It's up to you, you can have either a purely inline storage, or you can have "small" inline storage with heap fallback.

The main point is that the "inline" portion is always of fixed size and alignment (up to the storage) and therefore RawBox itself is always Sized.

_(You can an equivalent of ArrayVec instantiated in this test-suite: RawVec<T, inline::SingleRange<...>>)_

Back to pointer metadata though, I have a bit of a hard time following the CoerceUnsized discussion. But could you manage what you want if the handle for storage-generic Box<T> is not T::Metadata directly but another generic struct that contains that together with PhandomData<T>?

I don't think so, given the language from the documentation of CoerceUnsized:

For custom types, the coercion here works by coercing Foo<T> to Foo<U> provided an impl of CoerceUnsized<Foo<U>> for Foo<T> exists.

Such an impl can only be written if Foo<T> has only a single non-phantomdata field involving T.

If the type of that field is Bar<T>, an implementation of CoerceUnsized<Bar<U>> for Bar<T> must exist. The coercion will work by coercing the Bar<T> field into Bar<U> and filling in the rest of the fields from Foo<T> to create a Foo<U>. This will effectively drill down to a pointer field and coerce that.

It appears that PhantomData fields are ignored for the purpose of coercion.

petertodd commented 3 years ago

Note how a SliceLen<T> would be the ideal metadata for a [T] slice, as it could express the fact that the range of valid lengths for a slice reference depends on the size of T.

However, as there's quite a few ways of manipulating slice pointers without unsafe, eg via ptr::slice_from_raw_parts, I don't know if such types can actually enforce all that much at compile time.

SimonSapin commented 3 years ago

there's quite a few ways of manipulating slice pointers without unsafe, eg via ptr::slice_from_raw_parts

Yes that’s the idea behind this RFC: generalize slice_from_raw_parts to other kinds of DSTs

Manishearth commented 3 years ago

So a thing that seems to be missing here is a stable layout for DynMetadata itself.

A really annoying thing is that currently you cannot opaquely pass trait objects across FFI without doing a second allocation, because Box<dyn Trait> has unknown layout. Are there plans to make this feasible? To me this has been the main use case for work on DST APIs

SimonSapin commented 3 years ago

Would it make sense to add conversions between DynMetada and some raw pointer type? Would that help the FFI use case?

Manishearth commented 3 years ago

Yes, it would. It would be annoying to use, but it would suffice.

Manishearth commented 3 years ago

Doesn't even need to be a pointer type, jsut an opaque type with a well-defined layout. Though pointers makes it easier for other tools to use, definitely.

SimonSapin commented 3 years ago

We could document that DynMetadata itself has pointer size and ABI. (And introduce some other metadata type if there’s ever a new kind of DST that needs something else.)

Manishearth commented 3 years ago

We could document that DynMetadata itself has pointer size and ABI. (And introduce some other metadata type if there’s ever a new kind of DST that needs something else.)

That would be nice, and it would be nice if it had explicit conversion functions to *const !

SimonSapin commented 3 years ago

*const (), sure, why not.

A *const ! pointer though would always be UB to dereference. So it… encodes in the type system that it is always dangling? That doesn’t seem a good fit for vtables.

Manishearth commented 3 years ago

That works too yeah

RalfJung commented 3 years ago

It was brought to my attention that this feature has some very subtle interaction with unsafe code. Specifically, the following function is currently sound in the sense that safe code cannot cause any UB even when it calls this function:

pub fn make_weird_raw_ptr() -> *const dyn Send {
    unsafe { std::mem::transmute((0x100usize, 0x100usize)) }
}

This RFC is a breaking change in that it makes the above function unsound:

let ptr = make_weird_raw_ptr();
let meta = metadata(ptr);
let size = meta.size(); // *oops* UB

At the very least, this should be listed as an open concern to be resolved.

Maybe metadata should only be safe on references, not raw pointers?

RalfJung commented 3 years ago

Should DynMetadata not have a type parameter? This might reduce monomorphization cost, but would force that the size, alignment, and destruction pointers be in the same location (offset) for every vtable. But keeping them in the same location is probaly desirable anyway to keep code size small.

Don't size and align already have to be in the same location? Certainly Miri assumes this in its implementations of size_of_val and align_of_val for trait objects -- and I don't see a way to implement this without that information being at a consistent location.

For drop, I don't understand why it is mentioned here as also having to be in the same location.

petertodd commented 3 years ago

Re: operate on references, Weak<T> needs to be able to get a layout for a dropped value to deallocate memory. So what we probably want is a MaybeDropped<T> wrapper to represent a type whose value may have been dropped, and have metadata operate on it.

MaybeDropped can be implemented as a wrapper around ManuallyDropped that prohibits (safe) access to the inner value.

On March 13, 2021 6:01:58 AM EST, Ralf Jung @.***> wrote:

It was brought to my attention that this feature has some very subtle interaction with unsafe code. Specifically, the following function is currently sound in the sense that safe code cannot cause any UB even when it calls this function:

pub fn make_weird_raw_ptr() -> *const dyn Send {
   unsafe { std::mem::transmute((1usize, 1usize)) }
}

This RFC is a breaking change in that it makes the above function unsound:

let ptr = make_weird_raw_ptr();
let meta = metadata(ptr);
let size = meta.size(); // *oops* UB

At the very least, this should be listed as an open concern to be resolved.

Maybe metadata should only be safe on references, not raw pointers?

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/rust-lang/rust/issues/81513#issuecomment-798158332

SimonSapin commented 3 years ago

@RalfJung Yes, this is an important point. It came up in RFC discussions but I forgot to incorporate it in unresolved questions then. I’ve done so in the issue description here.

My understanding was that make_weird_raw_ptr is sound in current compilers but that related language rules are still mostly undecided. Has that changed?

Don't size and align already have to be in the same location?

I’m also skeptical that it could be any other way, this point is mostly copied from a comment on the RFC

Maybe metadata should only be safe on references, not raw pointers?

I think that would be a serious limitation. I don’t see a reason extracting the components of any raw pointer and putting them back together shouldn’t be sound.

However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetada methods like size could be made unsafe fns.

RalfJung commented 3 years ago

I’ve done so in the issue description here.

Thanks. Notice however that for this RFC to work, it is not required to make "valid vtable" part of the validity invariant. Making it part of the safety invariant would be sufficient, since this is a library-level concern.

My understanding was that make_weird_raw_ptr is sound in current compilers but that related language rules are still mostly undecided. Has that changed?

Indeed, neither the validity invariant nor the safety invariant of raw pointers are pinned down exactly. I think it is safe to say though that people would expect these invaraints to be as weak as is at all possible, i.e., to require as little as possible. Even the fact that the vtable pointer is non-NULL is tripping people up and we might want to change that.

size_of_val_raw and similar APIs are unsafe fn for this exact reason.

However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetada methods like size could be made unsafe fns.

That would be another option, yes. (There could also be SafeDynMetadata as the metadata of dyn Trait references where these methods could still be safe.)

SimonSapin commented 3 years ago

There could also be SafeDynMetadata as the metadata of dyn Trait references where these methods could still be safe.

This would require a design different from the current Pointee trait where the metadata type <dyn Trait as Pointee>::Metadata is independent of the kind of pointer/reference.

dimpolo commented 3 years ago

Is there a reason DynMetadata::drop_in_place isn't exposed similar to how DynMetadata::size_of and DynMetadata::align_of is?

This would be useful for a type-erased ThinBox.

SimonSapin commented 3 years ago

You’d need a data pointer anyway to call it, and if you have that you can use ptr::from_raw_parts_mut and then ptr::drop_in_place.

dimpolo commented 3 years ago

Oh you're right somehow I didn't think of that.

dimpolo commented 3 years ago

Ah wait. That wouldn't work for a type-erased ThinBox because ptr::from_raw_parts_mut is generic over T be we don't know what T is anymore. For a "real world" example: RustPython makes their own vtable which stores the T's drop_in_place inline before erasing T. See pyobject.rs

SimonSapin commented 3 years ago

Let’s say you have impl SomeTrait for SomeStruct. You’d use from_raw_parts_mut::<dyn SomeTrait>, that is from_raw_parts_mut::<T> with T = dyn SomeTrait. Your code before type erasure may have another type variable T = SomeStruct that happens to share the same name T, but the two variables don’t refer to the same type.

Similarly you don’t need to call drop_in_place::<SomeStruct> directly, instead then use drop_in_place::<dyn SomeTrait> which takes care of finding the destructor function pointer in the vtable.

dimpolo commented 3 years ago

You're right again. Thanks!

SimonSapin commented 3 years ago

I’ve added an unresolved question the use of *const () for the data component

rijenkii commented 3 years ago

Maybe *const !? It's still unstable though.

Amanieu commented 3 years ago

I have a slight preference for *const u8 since this is what we use for opaque pointers in the allocator API.

nagisa commented 3 years ago

Not sure if this was mentioned yet, but Pointee looks a lot like a marker trait, and traditionally we'd put them under the marker module, rather than others. For example DiscriminantKind. To me the fact that all types implement Pointee suggests that it also should be in marker rather than ptr.

nbdd0121 commented 3 years ago

However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetada methods like size could be made unsafe fns.

That would be another option, yes. (There could also be SafeDynMetadata as the metadata of dyn Trait references where these methods could still be safe.)

How about have:

pub const fn from_raw_parts<T: ?Sized>(*const (), MaybeUninit<<T as Pointee>::Metadata>) -> *const T {}
pub const fn from_raw_parts_mut<T: ?Sized>(*mut (), MaybeUninit<<T as Pointee>::Metadata>) -> *mut T {}

impl<T: ?Sized> *const T {
    pub const fn to_raw_parts(self) -> (*const (), MaybeUninit<<T as Pointee>::Metadata>) {}
}

impl<T: ?Sized> *mut T {
    pub const fn to_raw_parts(self) -> (*mut (), MaybeUninit<<T as Pointee>::Metadata>) {}
}
SimonSapin commented 3 years ago

@nbdd0121 That looks like exactly what we have in Nightly today. What am I missing?

nbdd0121 commented 3 years ago

@nbdd0121 That looks like exactly what we have in Nightly today. What am I missing?

There are MaybeUninit<> around <T as Pointee>::Metadata, so we don't have to place any safety invariants on raw pointers.

This way @RalfJung's make_weird_raw_ptr example would still be sound:

pub fn make_weird_raw_ptr() -> *const dyn Send {
    unsafe { std::mem::transmute((0x100usize, 0x100usize)) }
}

let ptr = make_weird_raw_ptr();
let meta = metadata(ptr);
// this can't compile now without unsafe `assume_init`
// let size = meta.size();
SkiFire13 commented 2 years ago

I’ve added an unresolved question the use of *const () for the data component

Would it be possible to have *const T be the data component for this pointers and slice pointers and *const ()/*const !/*const u8/other the data component for trait objects?

SimonSapin commented 2 years ago

That would require another automagic associated type like Metadata. It’s possible but not simple.

Regarding *const ! specifically, I don’t think it would be appropriate here. There can be no value of type !, so a *const ! has the semantics of never point to a valid value. However we want the data pointer of &dyn SomeTrait to be very much pointing to a valid value, just one of specified type and size.

matthieu-m commented 2 years ago

I realized that the question I raised about having <T as Pointee>::Metadata being a strong type (and kinda forgot about, sorry) never quite got an answer. For reference https://github.com/rust-lang/rust/issues/81513#issuecomment-779353340 .

Summing it up: when experimenting with the Storage API, it came up that in order for Box<T> to be coercible into Box<U>, I needed:

<T as Pointee>::Metadata: CoerceUnsized<<U as Pointee>::Metadata>> where T: Unsize<U>

It is not clear whether this is the best way to achieve the functionality above (it's not even clear if there's any other); still, it raises the specter that such implementation may be necessary, and as far as I can see this would require ensuring that <T as Pointee>::Metadata is always a strong type such as Metadata<T> and not () or usize as it is currently for Sized types and slice types respectively.

Since "downgrading" Metadata<T> to an alias for () or usize is easier than upgrading () or usize to Metadata<T> down the line for backward compatibility reasons. It seems conservative to ensure that <T as Pointee>::Metadata is a strong type, one for which traits can be implemented.

So:

SimonSapin commented 2 years ago

Since "downgrading" Metadata<T> to an alias for () or usize is easier

Nope, that’s also a breaking change. A downstream crate might have separate impls for one of their trait. If the two types are later unified, those impls start colliding.

yshui commented 2 years ago

As implemented by #81172, DynMetadata is not repr(C)? Are we not supposed to pass this type through FFI?

bjorn3 commented 2 years ago

There is no guarantee about what DynMetadata contains. Currently it is a single pointer to a vtable, but a single pointer per trait of a trait object is currently also a valid implementation strategy. It probably won't happen, but this was one of the suggested implementation methods to allow upcasting dyn TraitA + TraitB to dyn TraitA.

yshui commented 2 years ago

Is this feature not intended to provide a better FFI experience, i.e. we don't have to transmute to/from TraitObject? If so, what should I do when I need to pass a dyn pointer to/from FFI using the new interface? Should I transmute the DynMetadata, or put it in a Box and pass the pointer?

The RFC itself suggests putting DynMetadata in a repr(C): https://github.com/rust-lang/rfcs/pull/2580/files#diff-596b2a682c8f7063a250fdf3a1541d6249c84f01eed706d2b631a3a754a3ba66R95

SimonSapin commented 2 years ago

Only adding repr(C) to DynMetadata would not help with FFI (or any other use case). Since the fields of DynMetadata are private there is no stability guarantee about them. Like bjorn3 mentioned it could be more than one pointer for some dyn types in the future.

For your FFI use case we would need the Language Team to decide that DynMetadata will always be a single pointer (at least for some dyn types), then the standard library to provide conversion to/from something like *const ().

The RFC itself suggests putting DynMetadata in a repr(C):

A struct being repr(C) (in this case WithMeta) does not say anything about the repr of the types of its fields. For example this compiles without any warning: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=dbb86dae265ad1938d4002b2d7a2857a

#[repr(C)]
pub struct Foo {
    a: Bar,
    b: Bar,
}

struct Bar(u8, u32, u8); // likely reordered
yshui commented 2 years ago

A struct being repr(C) ...

While being technically true, I assumed by putting repr(C) on a type the RFC expresses the intention of using that type for FFI. Having a FFI-unsafe field type in a repr(C) type causes the outer type to be FFI-unsafe too (or so the compiler warns me), kind of destroy the purpose of putting repr(C). Of course my assumption could be wrong.

SimonSapin commented 2 years ago

Having written the ThinBox example you linked, I can confirm that the intention there was to use repr(C) in order to disable potential field re-ordering, so that the unsafe code dereferencing a *mut WithMeta<()> that was initialized as WithMeta<S> stays valid. In that example, WithMeta is a private implementation detail of the ThinBox library and is unrelated to FFI.

kupiakos commented 2 years ago

What about instead of having <dyn Trait as Pointee>::Metadata be DynMetadata storing a vtable pointer, it's instead *const DynMetadata pointing to the vtable itself? It would always require unsafe to access dyn metadata, but you need that unsafe somewhere to get dyn metadata from *const dyn Trait.

bjorn3 commented 2 years ago

That would prevent the trait object metadata from ever being more than a single pointer. (eg one for each trait vtable) Also accessing the vtables isn't just unsafe, you are depending on implementation details, so it may break at any time.

js2xxx commented 2 years ago

Should there be a direct API to construct a DynMetadata without specific pointers? Much like:

impl<Dyn: ?Sized> DynMetadata<Dyn> {
    pub fn new<T>() -> Self where T: Dyn {}
}

I don't know much about the implementation details, but I think its safe to some degree because DynMetadata doesn't store information about the content or the address of the raw pointer. If such API exists, we'll be able to manually construct DST pointers, which sounds great to me.

nbdd0121 commented 2 years ago

T: Dyn wouldn't work; Dyn is a type not a trait here.

js2xxx commented 2 years ago

On yeah that's true, but I hope there's some other way to do the stuff though. :)

nbdd0121 commented 2 years ago

This works today:

#![feature(const_fn_trait_bound)]
#![feature(unsize)]
#![feature(ptr_metadata)]

use core::marker::Unsize;
use core::ptr::DynMetadata;
use core::ptr::Pointee;

const fn new_metadata<Dyn: ?Sized, T>() -> DynMetadata<Dyn>
where
    T: Unsize<Dyn>,
    Dyn: Pointee<Metadata = DynMetadata<Dyn>>,
{
    (core::ptr::null::<T>() as *const Dyn).to_raw_parts().1
}

fn main() {
    println!("{:?}", new_metadata::<dyn core::any::Any, i32>());
}