Open KodrAus opened 3 years ago
After experimenting with custom implementations of Box
, I think there is a strong case for having strongly typed meta-data for all kinds of pointers.
The pre-allocator representation of Box
is:
struct Box<T: ?Sized> { ptr: NonNull<T>, }
The post-allocator representation is very similar:
struct Box<T: ?Sized, A: Allocator = Global> {
allocator: A,
ptr: NonNull<T>,
}
Both automatically implements CoerceUnsized<Box<U>> where T: Unsize<U>
, and all is well.
If one wants to make Box generic over its storage, then the representation becomes:
pub struct RawBox<T: ?Sized + Pointee, S: SingleElementStorage> {
storage: S,
handle: S::Handle<T>,
}
If S::Handle<T> == NonNull<T>
, then Box is still coercible; however, in the case of inline storage, that is:
Box
is moved, so is the storage, and therefore any pointer into the storage is invalidated.Hence, in the case of inline storage, S::Handle<T>
is best defined as <T as Pointee>::Metadata
.
In order to have Box<T> : CoerceUnsized<Box<U>> where T: Unsize<U>
:
S::Handle<T>: CoerceUnsized<S::Handle<U>> where T: Unsize<U>
,<T as Pointee>::Metadata: CoerceUnsized<<U as Pointee>::Metadata>> where T: Unsize<U>
.And of course, Box being coercible is very much desirable.
As a result, I believe a slight change of course is necessary:
Metadata<dyn Debug>
, Metadata<[u8]>
or Metadata<[u8; 3]>
-- no more ()
or usize
.Metadata<T>: CoerceUnsized<Metadata<U>> where T: Unsize<U>
.I would note that having a single Metadata<T>
type rather than SizedMetadata<T>
, SliceMetadata<[T]>
, DynMetadata<dyn T>
is not necessary, only the coercion is, and since the compiler is generating those, it's perfectly free to create them "cross type". I just used the same name as a short-cut.
Addendum: What's all that jazz about inline storage?
At a high-level, Box
is not so much about where memory comes from, it's a container which allows:
Having the memory inline in the Box
type preserves those 2 key properties whilst offering a self-contained type (not tied to any lifetime, nor any thread). It's allocation-less type-erasure.
A motivating example is therefore fn foo<T>() -> Box<dyn Future<T>, SomeInlineStorage>
: it returns a stack-allocated container which contains any future type (fitting in the storage) which can evaluate to T
.
Box<dyn Future<T>, SomeInlineStorage>
would have to be dynamically-sized itself, right? So in order to manipulate it without another lifetime or heap-allocated indirection you’d need the unsized locals language feature. And if you have that you can manipulate dyn Future<T>
directly, so what’s the point of a box with inline storage?
IMO this is different from the case of Vec
, which provides useful functionality on top of its storage so that ArrayVec
(a.k.a. Vec
with inline storage) makes sens. But Box
pretty much is its storage.
Box<dyn Future<T>, SomeInlineStorage>
would have to be dynamically-sized itself, right?
No, that's the whole point of it actually.
In C++, you have std::string
and std::function
implementation typically using the "short string optimization", that is a sufficiently small payload is just embedded inside, and larger ones require a heap-allocation.
This is exactly the same principle:
std::string
can contain up to 15 non-NUL characters without heap allocation on 64-bits platform.sizeof(std::string) == 24
, regardless of whether it's empty, contains a single character, or contains 15.So, here, SomeInlineStorage
is generally speak over-reserving. You settle on a fixed alignment and size, and then you may get mem::size_of::<Box<dyn Future, SomeInlineStorage>>() == 128
regardless of what's stored inside.
If you stored a single pointer (+v-table), well, you're paying a bit too much, but that's the price for flexibility. It's up to you size it appropriately for the largest variant.
In any case, unsized locals is strictly unnecessary, as can be seen in the tests of storage-poc's RawBox
.
Oh I see, so this is more like SmallVec
than ArrayVec
and "inline" really means inline up to a certain size chosen a compile-time, and heap-allocated for values that turn out at run-time to be larger?
Back to pointer metadata though, I have a bit of a hard time following the CoerceUnsized
discussion. But could you manage what you want if the handle for storage-generic Box<T>
is not T::Metadata
directly but another generic struct that contains that together with PhandomData<T>
?
Oh I see, so this is more like
SmallVec
thanArrayVec
and "inline" really means inline up to a certain size chosen a compile-time, and heap-allocated for values that turn out at run-time to be larger?
It's up to you, you can have either a purely inline storage, or you can have "small" inline storage with heap fallback.
The main point is that the "inline" portion is always of fixed size and alignment (up to the storage) and therefore RawBox
itself is always Sized
.
_(You can an equivalent of ArrayVec
instantiated in this test-suite: RawVec<T, inline::SingleRange<...>>
)_
Back to pointer metadata though, I have a bit of a hard time following the
CoerceUnsized
discussion. But could you manage what you want if the handle for storage-genericBox<T>
is notT::Metadata
directly but another generic struct that contains that together withPhandomData<T>
?
I don't think so, given the language from the documentation of CoerceUnsized
:
For custom types, the coercion here works by coercing
Foo<T>
toFoo<U>
provided an impl ofCoerceUnsized<Foo<U>>
forFoo<T>
exists.Such an impl can only be written if
Foo<T>
has only a single non-phantomdata field involvingT
.If the type of that field is
Bar<T>
, an implementation ofCoerceUnsized<Bar<U>>
forBar<T>
must exist. The coercion will work by coercing theBar<T>
field intoBar<U>
and filling in the rest of the fields fromFoo<T>
to create aFoo<U>
. This will effectively drill down to a pointer field and coerce that.
It appears that PhantomData
fields are ignored for the purpose of coercion.
Note how a SliceLen<T>
would be the ideal metadata for a [T]
slice, as it could express the fact that the range of valid lengths for a slice reference depends on the size of T
.
However, as there's quite a few ways of manipulating slice pointers without unsafe
, eg via ptr::slice_from_raw_parts
, I don't know if such types can actually enforce all that much at compile time.
there's quite a few ways of manipulating slice pointers without
unsafe
, eg viaptr::slice_from_raw_parts
Yes that’s the idea behind this RFC: generalize slice_from_raw_parts
to other kinds of DSTs
So a thing that seems to be missing here is a stable layout for DynMetadata
itself.
A really annoying thing is that currently you cannot opaquely pass trait objects across FFI without doing a second allocation, because Box<dyn Trait>
has unknown layout. Are there plans to make this feasible? To me this has been the main use case for work on DST APIs
Would it make sense to add conversions between DynMetada
and some raw pointer type? Would that help the FFI use case?
Yes, it would. It would be annoying to use, but it would suffice.
Doesn't even need to be a pointer type, jsut an opaque type with a well-defined layout. Though pointers makes it easier for other tools to use, definitely.
We could document that DynMetadata
itself has pointer size and ABI. (And introduce some other metadata type if there’s ever a new kind of DST that needs something else.)
We could document that DynMetadata itself has pointer size and ABI. (And introduce some other metadata type if there’s ever a new kind of DST that needs something else.)
That would be nice, and it would be nice if it had explicit conversion functions to *const !
*const ()
, sure, why not.
A *const !
pointer though would always be UB to dereference. So it… encodes in the type system that it is always dangling? That doesn’t seem a good fit for vtables.
That works too yeah
It was brought to my attention that this feature has some very subtle interaction with unsafe code. Specifically, the following function is currently sound in the sense that safe code cannot cause any UB even when it calls this function:
pub fn make_weird_raw_ptr() -> *const dyn Send {
unsafe { std::mem::transmute((0x100usize, 0x100usize)) }
}
This RFC is a breaking change in that it makes the above function unsound:
let ptr = make_weird_raw_ptr();
let meta = metadata(ptr);
let size = meta.size(); // *oops* UB
At the very least, this should be listed as an open concern to be resolved.
Maybe metadata
should only be safe on references, not raw pointers?
Should DynMetadata not have a type parameter? This might reduce monomorphization cost, but would force that the size, alignment, and destruction pointers be in the same location (offset) for every vtable. But keeping them in the same location is probaly desirable anyway to keep code size small.
Don't size and align already have to be in the same location? Certainly Miri assumes this in its implementations of size_of_val
and align_of_val
for trait objects -- and I don't see a way to implement this without that information being at a consistent location.
For drop
, I don't understand why it is mentioned here as also having to be in the same location.
Re: operate on references, Weak<T>
needs to be able to get a layout for a dropped value to deallocate memory. So what we probably want is a MaybeDropped<T>
wrapper to represent a type whose value may have been dropped, and have metadata
operate on it.
MaybeDropped
can be implemented as a wrapper around ManuallyDropped
that prohibits (safe) access to the inner value.
On March 13, 2021 6:01:58 AM EST, Ralf Jung @.***> wrote:
It was brought to my attention that this feature has some very subtle interaction with unsafe code. Specifically, the following function is currently sound in the sense that safe code cannot cause any UB even when it calls this function:
pub fn make_weird_raw_ptr() -> *const dyn Send { unsafe { std::mem::transmute((1usize, 1usize)) } }
This RFC is a breaking change in that it makes the above function unsound:
let ptr = make_weird_raw_ptr(); let meta = metadata(ptr); let size = meta.size(); // *oops* UB
At the very least, this should be listed as an open concern to be resolved.
Maybe
metadata
should only be safe on references, not raw pointers?-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/rust-lang/rust/issues/81513#issuecomment-798158332
@RalfJung Yes, this is an important point. It came up in RFC discussions but I forgot to incorporate it in unresolved questions then. I’ve done so in the issue description here.
My understanding was that make_weird_raw_ptr
is sound in current compilers but that related language rules are still mostly undecided. Has that changed?
Don't size and align already have to be in the same location?
I’m also skeptical that it could be any other way, this point is mostly copied from a comment on the RFC
Maybe
metadata
should only be safe on references, not raw pointers?
I think that would be a serious limitation. I don’t see a reason extracting the components of any raw pointer and putting them back together shouldn’t be sound.
However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetada
methods like size
could be made unsafe fn
s.
I’ve done so in the issue description here.
Thanks. Notice however that for this RFC to work, it is not required to make "valid vtable" part of the validity invariant. Making it part of the safety invariant would be sufficient, since this is a library-level concern.
My understanding was that make_weird_raw_ptr is sound in current compilers but that related language rules are still mostly undecided. Has that changed?
Indeed, neither the validity invariant nor the safety invariant of raw pointers are pinned down exactly. I think it is safe to say though that people would expect these invaraints to be as weak as is at all possible, i.e., to require as little as possible. Even the fact that the vtable pointer is non-NULL is tripping people up and we might want to change that.
size_of_val_raw
and similar APIs are unsafe fn
for this exact reason.
However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetada methods like size could be made unsafe fns.
That would be another option, yes. (There could also be SafeDynMetadata
as the metadata of dyn Trait
references where these methods could still be safe.)
There could also be
SafeDynMetadata
as the metadata ofdyn Trait
references where these methods could still be safe.
This would require a design different from the current Pointee
trait where the metadata type <dyn Trait as Pointee>::Metadata
is independent of the kind of pointer/reference.
Is there a reason DynMetadata::drop_in_place
isn't exposed similar to how DynMetadata::size_of
and DynMetadata::align_of
is?
This would be useful for a type-erased ThinBox.
You’d need a data pointer anyway to call it, and if you have that you can use ptr::from_raw_parts_mut
and then ptr::drop_in_place
.
Oh you're right somehow I didn't think of that.
Ah wait. That wouldn't work for a type-erased ThinBox because ptr::from_raw_parts_mut
is generic over T
be we don't know what T
is anymore.
For a "real world" example: RustPython makes their own vtable which stores the T
's drop_in_place
inline before erasing T
.
See pyobject.rs
Let’s say you have impl SomeTrait for SomeStruct
. You’d use from_raw_parts_mut::<dyn SomeTrait>
, that is from_raw_parts_mut::<T>
with T = dyn SomeTrait
. Your code before type erasure may have another type variable T = SomeStruct
that happens to share the same name T
, but the two variables don’t refer to the same type.
Similarly you don’t need to call drop_in_place::<SomeStruct>
directly, instead then use drop_in_place::<dyn SomeTrait>
which takes care of finding the destructor function pointer in the vtable.
You're right again. Thanks!
I’ve added an unresolved question the use of *const ()
for the data component
Maybe *const !
? It's still unstable though.
I have a slight preference for *const u8
since this is what we use for opaque pointers in the allocator API.
Not sure if this was mentioned yet, but Pointee
looks a lot like a marker trait, and traditionally we'd put them under the marker
module, rather than others. For example DiscriminantKind
. To me the fact that all types implement Pointee
suggests that it also should be in marker
rather than ptr
.
However if we end up deciding that raw trait object pointers shouldn’t have any validity invariant for their vtable pointer then DynMetada methods like size could be made unsafe fns.
That would be another option, yes. (There could also be
SafeDynMetadata
as the metadata ofdyn Trait
references where these methods could still be safe.)
How about have:
pub const fn from_raw_parts<T: ?Sized>(*const (), MaybeUninit<<T as Pointee>::Metadata>) -> *const T {}
pub const fn from_raw_parts_mut<T: ?Sized>(*mut (), MaybeUninit<<T as Pointee>::Metadata>) -> *mut T {}
impl<T: ?Sized> *const T {
pub const fn to_raw_parts(self) -> (*const (), MaybeUninit<<T as Pointee>::Metadata>) {}
}
impl<T: ?Sized> *mut T {
pub const fn to_raw_parts(self) -> (*mut (), MaybeUninit<<T as Pointee>::Metadata>) {}
}
@nbdd0121 That looks like exactly what we have in Nightly today. What am I missing?
@nbdd0121 That looks like exactly what we have in Nightly today. What am I missing?
There are MaybeUninit<>
around <T as Pointee>::Metadata
, so we don't have to place any safety invariants on raw pointers.
This way @RalfJung's make_weird_raw_ptr
example would still be sound:
pub fn make_weird_raw_ptr() -> *const dyn Send {
unsafe { std::mem::transmute((0x100usize, 0x100usize)) }
}
let ptr = make_weird_raw_ptr();
let meta = metadata(ptr);
// this can't compile now without unsafe `assume_init`
// let size = meta.size();
I’ve added an unresolved question the use of *const () for the data component
Would it be possible to have *const T
be the data component for this pointers and slice pointers and *const ()
/*const !
/*const u8
/other the data component for trait objects?
That would require another automagic associated type like Metadata
. It’s possible but not simple.
Regarding *const !
specifically, I don’t think it would be appropriate here. There can be no value of type !
, so a *const !
has the semantics of never point to a valid value. However we want the data pointer of &dyn SomeTrait
to be very much pointing to a valid value, just one of specified type and size.
I realized that the question I raised about having <T as Pointee>::Metadata
being a strong type (and kinda forgot about, sorry) never quite got an answer. For reference https://github.com/rust-lang/rust/issues/81513#issuecomment-779353340 .
Summing it up: when experimenting with the Storage API, it came up that in order for Box<T>
to be coercible into Box<U>
, I needed:
<T as Pointee>::Metadata: CoerceUnsized<<U as Pointee>::Metadata>> where T: Unsize<U>
It is not clear whether this is the best way to achieve the functionality above (it's not even clear if there's any other); still, it raises the specter that such implementation may be necessary, and as far as I can see this would require ensuring that <T as Pointee>::Metadata
is always a strong type such as Metadata<T>
and not ()
or usize
as it is currently for Sized types and slice types respectively.
Since "downgrading" Metadata<T>
to an alias for ()
or usize
is easier than upgrading ()
or usize
to Metadata<T>
down the line for backward compatibility reasons. It seems conservative to ensure that <T as Pointee>::Metadata
is a strong type, one for which traits can be implemented.
So:
<T as Pointee>::Metadata
be a strong type?Since "downgrading"
Metadata<T>
to an alias for()
orusize
is easier
Nope, that’s also a breaking change. A downstream crate might have separate impl
s for one of their trait. If the two types are later unified, those impls start colliding.
As implemented by #81172, DynMetadata
is not repr(C)
? Are we not supposed to pass this type through FFI?
There is no guarantee about what DynMetadata
contains. Currently it is a single pointer to a vtable, but a single pointer per trait of a trait object is currently also a valid implementation strategy. It probably won't happen, but this was one of the suggested implementation methods to allow upcasting dyn TraitA + TraitB
to dyn TraitA
.
Is this feature not intended to provide a better FFI experience, i.e. we don't have to transmute to/from TraitObject? If so, what should I do when I need to pass a dyn pointer to/from FFI using the new interface? Should I transmute the DynMetadata
, or put it in a Box and pass the pointer?
The RFC itself suggests putting DynMetadata
in a repr(C)
: https://github.com/rust-lang/rfcs/pull/2580/files#diff-596b2a682c8f7063a250fdf3a1541d6249c84f01eed706d2b631a3a754a3ba66R95
Only adding repr(C)
to DynMetadata
would not help with FFI (or any other use case). Since the fields of DynMetadata
are private there is no stability guarantee about them. Like bjorn3 mentioned it could be more than one pointer for some dyn
types in the future.
For your FFI use case we would need the Language Team to decide that DynMetadata
will always be a single pointer (at least for some dyn
types), then the standard library to provide conversion to/from something like *const ()
.
The RFC itself suggests putting
DynMetadata
in arepr(C)
:
A struct being repr(C)
(in this case WithMeta
) does not say anything about the repr
of the types of its fields. For example this compiles without any warning: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=dbb86dae265ad1938d4002b2d7a2857a
#[repr(C)]
pub struct Foo {
a: Bar,
b: Bar,
}
struct Bar(u8, u32, u8); // likely reordered
A struct being repr(C) ...
While being technically true, I assumed by putting repr(C)
on a type the RFC expresses the intention of using that type for FFI. Having a FFI-unsafe field type in a repr(C)
type causes the outer type to be FFI-unsafe too (or so the compiler warns me), kind of destroy the purpose of putting repr(C)
. Of course my assumption could be wrong.
Having written the ThinBox
example you linked, I can confirm that the intention there was to use repr(C)
in order to disable potential field re-ordering, so that the unsafe
code dereferencing a *mut WithMeta<()>
that was initialized as WithMeta<S>
stays valid. In that example, WithMeta
is a private implementation detail of the ThinBox
library and is unrelated to FFI.
What about instead of having <dyn Trait as Pointee>::Metadata
be DynMetadata
storing a vtable pointer, it's instead *const DynMetadata
pointing to the vtable itself? It would always require unsafe to access dyn metadata, but you need that unsafe
somewhere to get dyn metadata from *const dyn Trait
.
That would prevent the trait object metadata from ever being more than a single pointer. (eg one for each trait vtable) Also accessing the vtables isn't just unsafe, you are depending on implementation details, so it may break at any time.
Should there be a direct API to construct a DynMetadata without specific pointers? Much like:
impl<Dyn: ?Sized> DynMetadata<Dyn> {
pub fn new<T>() -> Self where T: Dyn {}
}
I don't know much about the implementation details, but I think its safe to some degree because DynMetadata doesn't store information about the content or the address of the raw pointer. If such API exists, we'll be able to manually construct DST pointers, which sounds great to me.
T: Dyn
wouldn't work; Dyn
is a type not a trait here.
On yeah that's true, but I hope there's some other way to do the stuff though. :)
This works today:
#![feature(const_fn_trait_bound)]
#![feature(unsize)]
#![feature(ptr_metadata)]
use core::marker::Unsize;
use core::ptr::DynMetadata;
use core::ptr::Pointee;
const fn new_metadata<Dyn: ?Sized, T>() -> DynMetadata<Dyn>
where
T: Unsize<Dyn>,
Dyn: Pointee<Metadata = DynMetadata<Dyn>>,
{
(core::ptr::null::<T>() as *const Dyn).to_raw_parts().1
}
fn main() {
println!("{:?}", new_metadata::<dyn core::any::Any, i32>());
}
This is a tracking issue for the RFC 2580 "Pointer metadata & VTable" (rust-lang/rfcs#2580). The feature gate for the issue is
#![feature(ptr_metadata)]
.About tracking issues
Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.
Steps
Unresolved Questions
Language-level:
DynMetadata
methods likesize
may need to beunsafe fn
. Or maybe something like*const ()
should be metadata of trait objects instead ofDynMetadata
.Right now, there is some inconsistency here:
size_of_val_raw(ptr)
is unsafe, butmetadta(ptr).size_of()
does the same thing and is safe.Metadata
be required to beFreeze
API level:
*const ()
appropriate for the data component of pointers? Or should it be*const u8
? Or*const Opaque
with some newOpaque
type? (Respectively*mut ()
andNonNull<()>
)ptr::from_raw_parts
and friends beunsafe fn
?Thin
be added as a supertrait ofSized
? Or could it ever make sense to have fat pointers to statically-sized types?DynMetadata
not have a type parameter? This might reduce monomorphization cost, but would force that the size, alignment, and destruction pointers be in the same location (offset) for every vtable. But keeping them in the same location is probaly desirable anyway to keep code size small.API bikesheds:
Pointee
(v.s. Referent?),Thin
(ThinPointee
?),DynMetadata
(VTablePtr
?), etccore::ptr
. For example: shouldThin
be incore::marker
instead?Implementation history
Tracked APIs
Last updated for https://github.com/rust-lang/rust/pull/81172.