Safer constructor writing for self-referential types

Currently to write a constructor/move-constructor/copy-constructor, you have to deal with a lot of unsafe code and boilerplate steps. Each constructor has to do:

un-pin a MaybeUninitialized reference
create the being-constructed type T
initialize the MaybeUninitialized reference with the object from (2)
get a pointer, and then a reference, to the MaybeUninitialized memory
use the reference from (4) to set up self-referential pointers in the object from (2)

The only steps that the type T actually cares about and wants to define are (2) and (5), but today it has to write the whole algorithm in each constructor/move-constructor/copy-constructor. Like so:

fn myctor(dest: Pin<&mut MaybeUninit<<T as SafeCtor>::Output>>) {
  unsafe {
    let maybe_uninit_ref = Pin::into_inner_unchecked(dest);  // 1
    let (out, init) = T { ... };  // 2 (using the copy- or moved-from object in copy- or move-constructors)
    *maybe_uninit_ref = MaybeUninit::new(out);  // 3
    let t_ref = &mut *maybe_uninit_ref.as_mut_ptr();  // 4
    setup_self_references(t_ref, init); // 5
  }
}

I have implemented a set of "SafeCtor" safe traits that a type can implement in order to have minimal unsafe code in their implementations (even none if they do not require moving/offsetting pointers).

These traits expose the step (2) above as construct() -> (Self, Data) or copy_construct(from: &Self) -> (Self, Data) or move_construct(from: Pin<&mut Self>) -> (Self, Data). These trait functions provide the "constructed-from" object as a reference that can be directly used from safe rust. And they output Data that can be collected from the "constructed-from" object and represented in a useful way for initialization of self-references.

And they expose step (5) above as an initialize(this: &mut Self, d: Data) that consumes the constructor's side-channel Data to set up any self-referential state in the constructed object.

The other steps (1), (3) and (4) are performed in unsafe implementations in the unsafe CopyCtor and unsafe MoveCtor traits, or in a ctor::construct() function that replaces the direct use of ctor::from_placement_fn() from the being-constructed type.

@mcy has pointed out initial self-referential object construction could be done by exposing this 2-stage construction paradigm (i.e. steps (2) and (5)) through the constructing function such as ctor::new_with<T>(init: T, setup: FnOnce(Pin<&mut T>)), instead of using traits and requiring T : SafeCtor<Output = T>.

It presents a nice advantage: It would force the caller to construct the type T, which is reasonable for initial construction, and it even allows calling different constructors.

But it has a challenge as well, as copy- and move-constructors can not be implemented in the same way:

The copy and move requires boilerplate to execute before constructing the object T, such that a reference to the copied-from or moved-from object can be given to it. So the caller can not just construct a T themselves.
A data channel between the copied-from/moved-from object and the self-referential initialization step is required, which is provided by the Data passed between the traits' copy/move constructors and initialize.

Here is an example of a self-referential type that points into a buffer. When moving, the move-constructed type must be able to set up a new self-reference at the same offset (or in this case, the next offset).

use moveit::*;
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr::null_mut;

struct Cycle {
    num: [i32; 10],
    ptr: *mut i32,
    _pin: PhantomPinned,
}
impl Cycle {}

struct CycleInit {
    offset: usize,
}

impl ctor::SafeCtor for Cycle {
    type Output = Self;
    type InitializeData = CycleInit;
    fn construct() -> (Self, CycleInit) {
        (
            Cycle {
                num: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
                ptr: null_mut(), // Set by initialize().
                _pin: PhantomPinned,
            },
            CycleInit { offset: 0 },
        )
    }
    fn initialize(this: &mut Self, init: CycleInit) {
        this.ptr = unsafe { (&mut this.num).as_mut_ptr().add(init.offset) }
    }
}
impl ctor::SafeMoveCtor for Cycle {
    fn move_construct(other: Pin<&mut Self>) -> (Self, CycleInit) {
        (
            Cycle {
                num: other.num,
                ptr: null_mut(), // Set by initialize().
                _pin: PhantomPinned,
            },
            CycleInit {
                // Move-constructing will move the `ptr` to the right one
                // so we can observe it.
                offset: 1 + unsafe { other.ptr.offset_from(&other.num[0]) as usize },
            },
        )
    }
}

The only unsafe code here is that which deals with the pointer into num. The same self-referential initialization is shared between each of construction, copy-construction, and move-construction. It's not obvious if forcing that behaviour to be shared is undesirable.

Here are the traits, which would appear in ctor.rs, as they provide generic implementations of CopyCtor and MoveCtor for any type implementing SafeCopyCtor or SafeMoveCtor respectively. The SafeCtor trait provides the self-referential setup step (2nd stage of construction) used in all construction paths, but only provides a single type-construction function without any parameters. A way to provide multiple constructors is needed.

pub trait SafeCtor {
  type Output;
  type InitializeData;
  fn construct() -> (Self::Output, Self::InitializeData);
  fn initialize(o: &mut Self::Output, init: Self::InitializeData);
}
pub trait SafeCopyCtor: CopyCtor + SafeCtor {
  fn copy_construct(other: &Self::Output) -> (Self::Output, Self::InitializeData);
}
pub trait SafeMoveCtor: MoveCtor + SafeCtor {
  fn move_construct(other: Pin<&mut Self::Output>) -> (Self::Output, Self::InitializeData);
}

unsafe impl<T: SafeCopyCtor + SafeCtor<Output = T>> CopyCtor for T {
  unsafe fn copy_ctor(src: &<T as SafeCtor>::Output, dest: Pin<&mut MaybeUninit<<T as SafeCtor>::Output>>) {
    let dest_uninit = Pin::into_inner_unchecked(dest);
    let (out, init) = <T as SafeCopyCtor>::copy_construct(src);
    *dest_uninit = MaybeUninit::new(out);
    <T as SafeCtor>::initialize(&mut *dest_uninit.as_mut_ptr(), init);
  }
}

unsafe impl<T: SafeMoveCtor + SafeCtor<Output = T>> MoveCtor for T {
unsafe fn move_ctor(
  mut src: Pin<MoveRef<<T as SafeCtor>::Output>>,
  dest: Pin<&mut MaybeUninit<<T as SafeCtor>::Output>>,
) {
  let dest_uninit = Pin::into_inner_unchecked(dest);
  let (out, init) = <T as SafeMoveCtor>::move_construct(src.as_mut());
  *dest_uninit = MaybeUninit::new(out);
  <T as SafeCtor>::initialize(&mut *dest_uninit.as_mut_ptr(), init);
}
}

pub fn construct<T: SafeCtor<Output = T>>() -> impl Ctor<Output = T> {
  unsafe {
      from_placement_fn(|dest| {
          let dest_uninit = Pin::into_inner_unchecked(dest);
          let (out, init) = <T as SafeCtor>::construct();
          *dest_uninit = MaybeUninit::new(out);
          <T as SafeCtor>::initialize(&mut *dest_uninit.as_mut_ptr(), init);
      })
  }
}

Using the safe traits looks much like using the existing ones, except that T::new() (or similar) is not called directly.

fn main() {
    moveit!(let x = new ctor::construct::<Cycle>());
    moveit!(let y = new ctor::mov(x));
    println!("{}", unsafe { *y.ptr });
}

Perhaps SafeCtor could be renamed to SelfReferenceSetup, the contruct() trait functions removed, and the ctor::construct() function modified to receive a T by value instead. That would handle both multiple construction paths, while still sharing the setup of self-references.

google / moveit

Safer constructor writing for self-referential types #14