The KnownLayout trait encodes enough information about a type's layout to be able to synthesize pointers to values of that type at runtime. While we can already do this for sized types and slice types, KnownLayout generalizes this to "slice DSTs" - types with leading fixed-sized fields followed by a trailing slice.
Progress
[x] Encode the layout of unsized types in a value that can be operated on in const code
[x] Support KnownLayout in Ref, and use this to support KnownLayout in other places
[x] Add #[doc(hidden)]Ref constructor which takes T: ?Sized + KnownLayout, and is named something like new_known_layout_name_to_be_bikeshedded
967
[x] Add private Ref method named something like deref_known_layout_name_to_be_bikeshedded (and similar for deref_mut). This method should not have a T: FromBytes bound, and should instead be unsafe and require that the contained T is valid as a safety precondition (this allows it to be used in TryFromBytes).
[x] Updating existing uses (such as TryFromBytes::try_from_ref) to internally construct a Ref and then use the appropriate private deref method
967
[x] Figure out how to support methods that provide explicit lengths for the trailing slice element
[x] #1162
[x] Require KnownLayout to apply recursively to all fields to remain forwards-compatible with #494
We're still going to do this, but not for this reason. It's tracked elsewhere.
[x] Figure out how to phrase "size validity" in APIs that support unsized types (currently, we just refer to size_of::<T>())
[x] #1263
[x] Support unsized repr(transparent) types (and maybe other reprs?)
Won't do this now; filed to track: #1477
[x] Add FromBytes methods and/or Ref constructors which operate on the whole buffer (rather than just the prefix/suffix) and take an explicit trailing element count
1258
[x] Resolve all TODO(#29) comments
[ ] #1292
[x] Once we're ready to make KnownLayout public (as of this writing, this will be in 0.8 per #671):
[x] Finalize all names and remove #[doc(hidden)] from all items
[x] Mark all slice functions/methods as deprecated
[x] For APIs which only operate on references (not values), replace the implicit T: Sized bound with T: ?Sized + KnownLayout; merge these APIs with any private APIs which were added during development (such as Ref::new_known_layout_name_to_be_bikeshedded)
[x] Audit all doc comments to make sure they're up to date. Especially:
[x] FromBytes:: ref_from_prefix and ref_from_suffix have stale doc comments that refer to size_of::<Self>()
[x] Ref::new, new_from_prefix, and new_from_suffix have stale doc comments that refer to size_of::<T>()
Motivation
Many users use zerocopy by defining types whose layouts match the layouts of data that needs to be parsed, serialized, or otherwise interacted with. Currently, zerocopy supports types with fixed size ("sized" types) and slices (a repeated structure of 0 or more instances of a sized type). However, many of our users have data which is not shaped like either of these.
In particular, many of our users have data which is shaped like a "slice-based dynamically-sized type" or "slice DST". A slice DST is a type with some number of fixed-sized fields followed by a trailing slice field. For example:
In this example, all UdpPackets have the same fixed UdpHeader, but may have a variable number of bytes in the trailing body field. The layout of UdpPacket exactly matches the layout of a UDP packet on the wire.
This design aims to support users whose data can be modeled by slice DSTs.
Design
What KnownLayout needs to encode
KnownLayout needs to provide any information required in order to perform a conversion from &[u8] to &T where T is a sized type, a slice type, or a slice DST. This conversion may be fallible due to size or alignment.
In order to support these conversions, zerocopy must know the following:
Alignment
For sized types, size
For slices and slice DSTs:
The offset to the trailing slice field (0 for slice types)
The size of the element type of the trailing slice field
In order to compute KnownLayout's associated LAYOUT constant, we use a recursive approach:
For sized types, we query the alignment and size via align_of and size_of
For slice types, we rely on align_of and size_of to get the alignment and size of the element type; the slice's layout can be computed from these values
For composite types (namely in the context of #[derive(KnownLayout)]), we need to extract a few pieces of information:
The alignment of the overall type
The offset of the trailing slice field (if any)
From these, we can compute the LAYOUT. However, computing these in a generic context is highly non-trivial.
There are multiple avenues we could approach, but they all have pros and cons, and many are subject to timelines beyond our control since they require stabilizing currently-unstable Rust features. Our plan is to "race" them - try our best to drive progress on all fronts and see which approach gets to the finish line first. Since stabilizing unstable Rust features is a low-bandwidth/high-latency process, we expect this to be a reasonable use of our time.
Option 1: offset_of! and unsized align_of
The offset_of! macro provides the ability to directly query for the offset of a field within a type. It is currently unstable, and only supports sized types. If we could stabilize offset_of! and add support for unsized types, we could use it to query the offset of a type's trailing slice field.
We also need to support alignment. For this, we'd need align_of to drop its T: Sized bound.
Ideally, we'd like KnownLayout to be a super-trait of FromZeroes (or TryFromBytes (#5) once FromZeroes: TryFromBytes). Unfortunately, FromZeroes supports types which KnownLayout cannot support: unsized types without a fixed representation (ie, repr(rust)). These types do not provide sufficient guarantees about their layout in order to satisfy KnownLayout's safety conditions. We have two options:
Restrict FromZeroes to only support types which KnownLayout can support, and then have FromZeroes: KnownLayout
Use KnownLayout as a bound on individual functions and methods
We intend for zerocopy to be a general-purpose library which supports use cases that do not require known representations (any use case which doesn't require a type's layout to correspond to a fixed specification, but only to be consistent within the context of a program's execution). The first option would preclude us from supporting these use cases, so we opt for the latter: we will use KnownLayout as a bound on individual functions and methods.
...the trailing tail: [u8] is at offset 4 (after 2 bytes for Bar.u and 2 bytes for Foo.u).
In order to compute trailing slice offset in the context of a custom derive, it's necessary to compute recursively. In particular, given the offset of the trailing field within the outermost type (in this case, Bar.foo) and the offset of the trailing slice within the inner type (in this case, Foo.tail), it's possible to compute the offset of the trailing slice within the outer type as the sum of these two values. In all cases, this is a recursive computation that bottoms out at an actual slice type, [T], whose trailing slice offset is 0.
Thus the challenge is, given the AST of an arbitrary, possibly dynamically-sized type, to produce code which computes the byte offset of the trailing field. We have a few options.
Another option is to rely on the standard library's addr_of! macro. Given a raw pointer to the beginning of a type, the expression addr_of!((*ptr).trailing_field) will compute the address of trailing field, and the raw pointer method offset_from can be used to compute the byte offset between these two pointers, effectively computing the trailing field offset.
Unfortunately, place projection the addr_of! macro performs a place projection, and even if the pointers are never dereferenced, place projections may only happen inside the bounds of a valid allocation. Per The Reference, the following is undefined behavior:
Thus, in order to use addr_of!, we need a valid allocation which is large enough that the field projection will not go out-of-bounds (the allocation does not need to be properly aligned). It is fairly trivial to produce a large allocation, even at const time:
This, however, isn't enough. We also need to bounds check our allocation to make sure that the field projection won't go out-of-bounds. While it's unlikely we'd ever encounter a type with a trailing field offset of 2^16 bytes, it's not impossible, and we can't exhibit undefined behavior if we encounter such a type.
In order to perform the bounds check, we need some way of obtaining an upper bound for the trailing field offset before we've actually computed it. The only way of doing this (to my knowledge) is to calculate the size of the smallest possible value of the type (ie, the value with 0 trailing slice elements). We can't construct such an instance in the general case, but we can construct a raw pointer to such an instance:
// A `*const [()]` with 0 elements.
let slc = core::ptr::slice_from_raw_parts(&() as *const _, 0);
let t = slc as *const T;
This relies on behavior which is currently not well-defined, but is in review as of this writing.
However, once we have this raw pointer, we need to know its size. The only way to do this is with the currently-unstable size_of_val_raw. Once we've computed the size of the smallest possible value for our type (ie, size_of_val_raw(t)), we have an upper bound for the offset of the trailing field; so long as this value is not larger than the size of our allocation, the field projection is guaranteed to be in-bounds, allowing us to soundly compute the trailing field offset.
Old text
Add a replacement for MaybeUninit which supports unsized types. Add support for conversions on custom dynamically sized-types (DSTs).
Status
Phase 1 is in review in #312.
Motivation
This proposal aims to solve two problems using a single, unified design.
Unsized MaybeUninit
The standard library's MaybeUninit type don't support wrapping unsized types. MaybeUninit is a core building block of important designs like TryFromBytes. So long as it doesn't support unsized types, designs like TryFromBytes also can't support unsized types.
Custom DSTs
While zerocopy supports conversions on slice types, it doesn't support conversions on custom dynamically-sized types (DSTs) like:
#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpHeader { ... }
#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpPacket {
header: UdpHeader,
body: [u8], // Unsized field makes this type a "custom DST"
}
Currently, users such as packet-formats instead write code like:
The latter code is more cumbersome, and requires storing an extra pointer in order to refer to a UDP packet in memory.
The ability to support custom DSTs is a frequent request from our users.
Design
This design comes in two phases; the first phase can be implemented without the second phase, and will provide value on its own.
In Phase 1, support is added for a MaybeUninit-like type that supports wrapping both T: Sized and [T] where T: Sized. This will unlock the TryFromBytes design as described above. In the Phase 2, support is added for a KnownLayout trait which provides a superset of the functionality from Phase 1, and supports zero-copy conversion of arbitrary custom DSTs. A custom derive is provided KnownLayout.
Phase 1: MaybeUninit
The standard library's MaybeUninit<T> type has the same layout as T, but it has no bit validity constraints - any byte value, including an uninitialized byte, may be written at any byte offset in a MaybeUninit<T>. A replacement for this type just needs to have these semantics, and also needs to support unsized types.
Our design builds upon the fact that MaybeUninit exists and works for sized types. We define the following trait:
pub unsafe trait AsMaybeUninit {
/// A type which has the same layout as `Self`, but which has no validity
/// constraints.
///
/// Roughly speaking, this type is equivalent to what the standard library's
/// [`MaybeUninit<Self>`] would be if `MaybeUninit` supported unsized types.
type MaybeUninit: ?Sized;
}
For Sized types, the implementation is trivial:
unsafe impl<T: Sized> AsMaybeUninit for T {
type MaybeUninit = core::mem::MaybeUninit<T>;
}
For all other types, we use the standard library's MaybeUninit type as a building block to build up a type whose layout is the same as what MaybeUninit's would be if it supported unsized types:
unsafe impl<T: Sized> AsMaybeUninit for [T] {
type MaybeUninit = [core::mem::MaybeUninit<T>];
}
unsafe impl AsMaybeUninit for str {
type MaybeUninit = <[u8] as AsMaybeUninit>::MaybeUninit;
}
Finally, we add our own MaybeUninit type, which is simply a convenience wrapper:
#[repr(transparent)]
pub struct MaybeUninit<T: AsMaybeUninit + ?Sized> {
inner: T::MaybeUninit,
}
// The equivalent impl for `MaybeUninit<T>` is already covered by the blanket impl for `T: Sized`.
unsafe impl<T: Sized> AsMaybeUninit for MaybeUninit<[T]> {
type MaybeUninit = [<T as AsMaybeUninit>::MaybeUninit];
}
In Phase 1, these are the only supported unsized types. In Phase 2, we allow deriving AsMaybeUninit on arbitrary types, which adds support for custom DSTs.
Safety invariants
The safety invariants on AsMaybeUninit are somewhat complex. This is mostly a result of needing to support unsized types. For a more in-depth explanation of why supporting unsized types introduces a lot of complexity, see here.
pub unsafe trait AsMaybeUninit {
/// A type which has the same layout as `Self`, but which has no validity
/// constraints.
///
/// Roughly speaking, this type is equivalent to what the standard library's
/// [`MaybeUninit<Self>`] would be if `MaybeUninit` supported unsized types.
///
/// # Safety
///
/// For `T: AsMaybeUninit`, the following must hold:
/// - Given `m: T::MaybeUninit`, it is sound to write any byte value,
/// including an uninitialized byte, at any byte offset in `m`
/// - `T` and `T::MaybeUninit` have the same alignment requirement
/// - It is valid to use an `as` cast to convert a `t: *const T` to a `m:
/// *const T::MaybeUninit` and vice-versa (and likewise for `*mut T`/`*mut
/// T::MaybeUninit`). Regardless of which direction the conversion was
/// performed, the sizes of the pointers' referents are always equal (in
/// terms of an API which is not yet stable, `size_of_val_raw(t) ==
/// size_of_val_raw(m)`).
/// - `T::MaybeUninit` contains [`UnsafeCell`]s at exactly the same byte
/// ranges that `T` does.
///
/// [`MaybeUninit<Self>`]: core::mem::MaybeUninit
/// [`UnsafeCell`]: core::cell::UnsafeCell
type MaybeUninit: ?Sized;
}
Let's walk through these:
The reason that we describe bit validity in terms of writing to an existing value (rather than producing a new value) is that it allows us to sidestep a lot of the complexity of defining size equivalence between unsized types. This requirement is just a generalization of what you'd write for sized types: something like, "it must be valid to produce a T::MaybeUninit containing any bit pattern, including uninitialized bytes."
We speak of T and T::MaybeUninit's alignment in prose rather than by referring to core::mem::align_of because that function requires that its type argument is sized.
The as cast requirement ensures that T and T::MaybeUninit are either both sized or have compatible unsized types. In particular, Rust prohibits sized-to-unsized casts; requiring both directions to be valid ensures that casts are sized-to-sized or unsized-to-unsized. The size equality constraint ensures that:
For sized types, sizes are equal
For custom DSTs, the trailing slice element's sizes are equal; thus, a T and a T::MaybeUninit whose trailing slices are of the same length have the same size
It is intended to be sound to synthesize two references to the same memory region where only one reference treats that region as containing an UnsafeCell. In other words, interior mutability soundness is a runtime property that is only violated when an UnsafeCell is used, not merely when a reference to it is constructed. Unfortunately, this is not actually currently guaranteed, and is unsound under the stacked borrows memory model. Thus, we need to ensure that UnsafeCells in T and T::MaybeUninit line up perfectly.
Pointer conversion
We want to be able to provide unsized equivalents of the assume_init_ref and assume_init_mut methods. However, the naive implementation doesn't work:
impl<T: AsMaybeUninit + ?Sized> MaybeUninit<T> {
pub unsafe fn assume_init_ref(&self) -> &T {
let ptr = (&self.inner) as *const T::MaybeUninit as *const T;
unsafe { &*ptr }
}
}
The *const T::MaybeUninit as *mut T cast isn't valid in a generic context where T: ?Sized because Rust doesn't know what type of pointers these are (thin, fat, what kind of fat pointer, etc). In order to get around this problem, we add the following methods to AsMaybeUninit:
TODO: Mention that this will require removing the blanket impl of AsMaybeUninit for T: Sized, which will be a breaking change. We need to not publish in between Phases 1 and 2.
Alternatives
For MaybeUninit, we could propose a change to Rust to add limited support for unsized unions, which would allow MaybeUninit (which is a union type) to support unsized types. Even if such a proposal were accepted, it would likely take months or years to be stabilized and thus available for our use.
For custom DSTs, we could wait until the ptr_metadata and layout_for_ptr features stabilize, which would allow us to rewrite Ref like this:
use core::ptr::Pointee;
pub struct Ref<B, T: ?Sized> {
data: *mut (),
meta: <T as Pointee>::Metadata,
_marker: PhantomData<(B, T)>,
}
For Sized types, T::Metadata would be (), and so Ref would consume only a single word (rather than two, as it does today).
On its own, we could still support slice types (T = [U]) as we do today using from_raw_parts and from_raw_parts_mut. However, using layout_for_ptr, we could also determine a DST's prefix and element sizes and thus support arbitrary DSTs rather than just slices.
This design would work, but it's unlikely that either of these features will stabilize soon, and we need something in the meantime. The design presented also unblocks unsized MaybeUninit, which these features wouldn't help with.
Overview
The
KnownLayout
trait encodes enough information about a type's layout to be able to synthesize pointers to values of that type at runtime. While we can already do this for sized types and slice types,KnownLayout
generalizes this to "slice DSTs" - types with leading fixed-sized fields followed by a trailing slice.Progress
const
codeKnownLayout
for sized typesKnownLayout
for unsized typesT: Sized
bound fromalign_of<T>()
offset_of!
and expand it to support unsized typesalign_of_val_raw
offset_of!
and expand it to support unsized typessize_of_val_raw
repr(C)
layout algorithmKnownLayout
inRef
, and use this to supportKnownLayout
in other places#[doc(hidden)]
Ref
constructor which takesT: ?Sized + KnownLayout
, and is named something likenew_known_layout_name_to_be_bikeshedded
967
Ref
method named something likederef_known_layout_name_to_be_bikeshedded
(and similar forderef_mut
). This method should not have aT: FromBytes
bound, and should instead beunsafe
and require that the containedT
is valid as a safety precondition (this allows it to be used inTryFromBytes
).967
TryFromBytes::try_from_ref
) to internally construct aRef
and then use the appropriate private deref method967
RequireKnownLayout
to apply recursively to all fields to remain forwards-compatible with #494size_of::<T>()
)repr(transparent)
types (and maybe other reprs?)FromBytes
methods and/orRef
constructors which operate on the whole buffer (rather than just the prefix/suffix) and take an explicit trailing element count1258
TODO(#29)
commentsKnownLayout
public (as of this writing, this will be in 0.8 per #671):#[doc(hidden)]
from all itemsT: Sized
bound withT: ?Sized + KnownLayout
; merge these APIs with any private APIs which were added during development (such asRef::new_known_layout_name_to_be_bikeshedded
)FromBytes:: ref_from_prefix
andref_from_suffix
have stale doc comments that refer tosize_of::<Self>()
Ref::new
,new_from_prefix
, andnew_from_suffix
have stale doc comments that refer tosize_of::<T>()
Motivation
Many users use zerocopy by defining types whose layouts match the layouts of data that needs to be parsed, serialized, or otherwise interacted with. Currently, zerocopy supports types with fixed size ("sized" types) and slices (a repeated structure of 0 or more instances of a sized type). However, many of our users have data which is not shaped like either of these.
In particular, many of our users have data which is shaped like a "slice-based dynamically-sized type" or "slice DST". A slice DST is a type with some number of fixed-sized fields followed by a trailing slice field. For example:
In this example, all
UdpPacket
s have the same fixedUdpHeader
, but may have a variable number of bytes in the trailingbody
field. The layout ofUdpPacket
exactly matches the layout of a UDP packet on the wire.This design aims to support users whose data can be modeled by slice DSTs.
Design
What
KnownLayout
needs to encodeKnownLayout
needs to provide any information required in order to perform a conversion from&[u8]
to&T
whereT
is a sized type, a slice type, or a slice DST. This conversion may be fallible due to size or alignment.In order to support these conversions, zerocopy must know the following:
This is encoded in the
DstLayout
type:Computing layout
In order to compute
KnownLayout
's associatedLAYOUT
constant, we use a recursive approach:align_of
andsize_of
align_of
andsize_of
to get the alignment and size of the element type; the slice's layout can be computed from these valuesFor composite types (namely in the context of
#[derive(KnownLayout)]
), we need to extract a few pieces of information:From these, we can compute the
LAYOUT
. However, computing these in a generic context is highly non-trivial.There are multiple avenues we could approach, but they all have pros and cons, and many are subject to timelines beyond our control since they require stabilizing currently-unstable Rust features. Our plan is to "race" them - try our best to drive progress on all fronts and see which approach gets to the finish line first. Since stabilizing unstable Rust features is a low-bandwidth/high-latency process, we expect this to be a reasonable use of our time.
Option 1:
offset_of!
and unsizedalign_of
The
offset_of!
macro provides the ability to directly query for the offset of a field within a type. It is currently unstable, and only supports sized types. If we could stabilizeoffset_of!
and add support for unsized types, we could use it to query the offset of a type's trailing slice field.We also need to support alignment. For this, we'd need
align_of
to drop itsT: Sized
bound.Option 2:
size_of_val_raw
andalign_of_val_raw
TODO
Option N: Manually implement Rust's
repr(C)
layout algorithmTODO
Validating and computing casts
TODO:
validate_cast_and_convert_metadata
Relationship to other traits
Ideally, we'd like
KnownLayout
to be a super-trait ofFromZeroes
(orTryFromBytes
(#5) onceFromZeroes: TryFromBytes
). Unfortunately,FromZeroes
supports types whichKnownLayout
cannot support: unsized types without a fixed representation (ie,repr(rust)
). These types do not provide sufficient guarantees about their layout in order to satisfyKnownLayout
's safety conditions. We have two options:FromZeroes
to only support types whichKnownLayout
can support, and then haveFromZeroes: KnownLayout
KnownLayout
as a bound on individual functions and methodsWe intend for zerocopy to be a general-purpose library which supports use cases that do not require known representations (any use case which doesn't require a type's layout to correspond to a fixed specification, but only to be consistent within the context of a program's execution). The first option would preclude us from supporting these use cases, so we opt for the latter: we will use
KnownLayout
as a bound on individual functions and methods.Deriving
KnownLayout
Computing alignment
Use
align_of_val_raw
Blocked on https://github.com/rust-lang/rust/issues/69835
TODO
Use a
#[repr(C)]
type and field offsetBlocked on either https://github.com/rust-lang/rust/issues/106655 or https://github.com/rust-lang/rust/issues/69835
This design is prototyped in https://github.com/google/zerocopy/pull/576:
This design relies on the
trailing_field_offset!
macro added in https://github.com/google/zerocopy/pull/540 and described below. This macro relies on stabilizing https://github.com/rust-lang/rust/issues/69835. Alternatively, if we can stabilizeoffset_of!
(https://github.com/rust-lang/rust/issues/106655) and add support for usingoffset_of!
with unsized types, then we can replacetrailing_field_offset!
here withoffset_of!
.Computing trailing field offset
The
DstLayout
type needs to know the offset of the trailing slice within a DST. For example, givenBar
:...the trailing
tail: [u8]
is at offset 4 (after 2 bytes forBar.u
and 2 bytes forFoo.u
).In order to compute trailing slice offset in the context of a custom derive, it's necessary to compute recursively. In particular, given the offset of the trailing field within the outermost type (in this case,
Bar.foo
) and the offset of the trailing slice within the inner type (in this case,Foo.tail
), it's possible to compute the offset of the trailing slice within the outer type as the sum of these two values. In all cases, this is a recursive computation that bottoms out at an actual slice type,[T]
, whose trailing slice offset is 0.Thus the challenge is, given the AST of an arbitrary, possibly dynamically-sized type, to produce code which computes the byte offset of the trailing field. We have a few options.
offset_of!
Blocked on https://github.com/rust-lang/rust/issues/106655
We could simply wait for the standard library's
offset_of!
macro to stabilize and add support for unsized types.addr_of!
Blocked on https://github.com/rust-lang/rust/issues/69835
Another option is to rely on the standard library's
addr_of!
macro. Given a raw pointer to the beginning of a type, the expressionaddr_of!((*ptr).trailing_field)
will compute the address of trailing field, and the raw pointer methodoffset_from
can be used to compute the byte offset between these two pointers, effectively computing the trailing field offset.Unfortunately, place projection the
addr_of!
macro performs a place projection, and even if the pointers are never dereferenced, place projections may only happen inside the bounds of a valid allocation. Per The Reference, the following is undefined behavior:Thus, in order to use
addr_of!
, we need a valid allocation which is large enough that the field projection will not go out-of-bounds (the allocation does not need to be properly aligned). It is fairly trivial to produce a large allocation, even at const time:This, however, isn't enough. We also need to bounds check our allocation to make sure that the field projection won't go out-of-bounds. While it's unlikely we'd ever encounter a type with a trailing field offset of 2^16 bytes, it's not impossible, and we can't exhibit undefined behavior if we encounter such a type.
In order to perform the bounds check, we need some way of obtaining an upper bound for the trailing field offset before we've actually computed it. The only way of doing this (to my knowledge) is to calculate the size of the smallest possible value of the type (ie, the value with 0 trailing slice elements). We can't construct such an instance in the general case, but we can construct a raw pointer to such an instance:
This relies on behavior which is currently not well-defined, but is in review as of this writing.
However, once we have this raw pointer, we need to know its size. The only way to do this is with the currently-unstable
size_of_val_raw
. Once we've computed the size of the smallest possible value for our type (ie,size_of_val_raw(t)
), we have an upper bound for the offset of the trailing field; so long as this value is not larger than the size of our allocation, the field projection is guaranteed to be in-bounds, allowing us to soundly compute the trailing field offset.Old text
Add a replacement for
MaybeUninit
which supports unsized types. Add support for conversions on custom dynamically sized-types (DSTs).Status
Phase 1 is in review in #312.
Motivation
This proposal aims to solve two problems using a single, unified design.
Unsized
MaybeUninit
The standard library's
MaybeUninit
type don't support wrapping unsized types.MaybeUninit
is a core building block of important designs likeTryFromBytes
. So long as it doesn't support unsized types, designs likeTryFromBytes
also can't support unsized types.Custom DSTs
While zerocopy supports conversions on slice types, it doesn't support conversions on custom dynamically-sized types (DSTs) like:
Currently, users such as packet-formats instead write code like:
The latter code is more cumbersome, and requires storing an extra pointer in order to refer to a UDP packet in memory.
The ability to support custom DSTs is a frequent request from our users.
Design
This design comes in two phases; the first phase can be implemented without the second phase, and will provide value on its own.
In Phase 1, support is added for a
MaybeUninit
-like type that supports wrapping bothT: Sized
and[T] where T: Sized
. This will unlock theTryFromBytes
design as described above. In the Phase 2, support is added for aKnownLayout
trait which provides a superset of the functionality from Phase 1, and supports zero-copy conversion of arbitrary custom DSTs. A custom derive is providedKnownLayout
.Phase 1:
MaybeUninit
The standard library's
MaybeUninit<T>
type has the same layout asT
, but it has no bit validity constraints - any byte value, including an uninitialized byte, may be written at any byte offset in aMaybeUninit<T>
. A replacement for this type just needs to have these semantics, and also needs to support unsized types.Our design builds upon the fact that
MaybeUninit
exists and works for sized types. We define the following trait:For
Sized
types, the implementation is trivial:For all other types, we use the standard library's
MaybeUninit
type as a building block to build up a type whose layout is the same as whatMaybeUninit
's would be if it supported unsized types:Finally, we add our own
MaybeUninit
type, which is simply a convenience wrapper:In Phase 1, these are the only supported unsized types. In Phase 2, we allow deriving
AsMaybeUninit
on arbitrary types, which adds support for custom DSTs.Safety invariants
The safety invariants on
AsMaybeUninit
are somewhat complex. This is mostly a result of needing to support unsized types. For a more in-depth explanation of why supporting unsized types introduces a lot of complexity, see here.Let's walk through these:
T::MaybeUninit
containing any bit pattern, including uninitialized bytes."T
andT::MaybeUninit
's alignment in prose rather than by referring tocore::mem::align_of
because that function requires that its type argument is sized.as
cast requirement ensures thatT
andT::MaybeUninit
are either both sized or have compatible unsized types. In particular, Rust prohibits sized-to-unsized casts; requiring both directions to be valid ensures that casts are sized-to-sized or unsized-to-unsized. The size equality constraint ensures that:T
and aT::MaybeUninit
whose trailing slices are of the same length have the same sizeUnsafeCell
. In other words, interior mutability soundness is a runtime property that is only violated when anUnsafeCell
is used, not merely when a reference to it is constructed. Unfortunately, this is not actually currently guaranteed, and is unsound under the stacked borrows memory model. Thus, we need to ensure thatUnsafeCell
s inT
andT::MaybeUninit
line up perfectly.Pointer conversion
We want to be able to provide unsized equivalents of the
assume_init_ref
andassume_init_mut
methods. However, the naive implementation doesn't work:The
*const T::MaybeUninit as *mut T
cast isn't valid in a generic context whereT: ?Sized
because Rust doesn't know what type of pointers these are (thin, fat, what kind of fat pointer, etc). In order to get around this problem, we add the following methods toAsMaybeUninit
:This allows us to get
assume_init_ref
andassume_init_mut
to work:Phase 2:
KnownLayout
TODO
TODO: Mention that this will require removing the blanket impl of
AsMaybeUninit
forT: Sized
, which will be a breaking change. We need to not publish in between Phases 1 and 2.Alternatives
MaybeUninit
, we could propose a change to Rust to add limited support for unsized unions, which would allowMaybeUninit
(which is a union type) to support unsized types. Even if such a proposal were accepted, it would likely take months or years to be stabilized and thus available for our use.For custom DSTs, we could wait until the
ptr_metadata
andlayout_for_ptr
features stabilize, which would allow us to rewriteRef
like this:For
Sized
types,T::Metadata
would be()
, and soRef
would consume only a single word (rather than two, as it does today).On its own, we could still support slice types (
T = [U]
) as we do today usingfrom_raw_parts
andfrom_raw_parts_mut
. However, usinglayout_for_ptr
, we could also determine a DST's prefix and element sizes and thus support arbitrary DSTs rather than just slices.This design would work, but it's unlikely that either of these features will stabilize soon, and we need something in the meantime. The design presented also unblocks unsized
MaybeUninit
, which these features wouldn't help with.