mystor / rust-cpp

Embed C++ directly inside your rust code!
Apache License 2.0
807 stars 44 forks source link

Add option to cpp_class! macro for std::marker::PhantomPinned #72

Open benkay86 opened 4 years ago

benkay86 commented 4 years ago

C++ allows the use of self-referential structures by convention; Rust does not. Using cpp_class! to wrap a class that is not trivially relocatable is unsafe because Rust might move the object and break internal references. Unfortunately this precludes the use of cpp_class! for most of the standard library.

Rust now supports the concept of pinning, or preventing an object from being moved/relocated in memory. With the pinning API it is now possible to do:

#![feature(optin_builtin_traits)]
cpp_class!(pub unsafe struct CppString as "std::string");
impl !Unpin for CppString {} // requires optin_builtin_traits
impl CppString {
    pub fn new_box() -> std::pin::Pin<std::boxed::Box<CppString>> {
        // allocate memory
        let mut memory = std::boxed::Box::pin(std::mem::MaybeUninit::<CppString>::uninit());

        // get address of allocated memory
        let address : *mut CppString =  unsafe { memory.as_mut().get_unchecked_mut() }.as_mut_ptr();

        // use placement new to construct a new string at the address
        cpp!(unsafe [address as "void*"] {
            new (address) std::string(cstring_ptr);
        }

        // unwrap the MaybeUninit
        unsafe { std::mem::transmute::<std::pin::Pin<std::boxed::Box<std::mem::MaybeUninit::<CppString>>>, std::pin::Pin<std::boxed::Box<CppString>>>(memory) }
    }
}

The only problem is Rust does not know that struct CppString is not relocatable and will happily move it out of the Pin<>. We must tell Rust that CppString is not Unpin, which is achieved in the example above using the nightly feature impl !Unpin for CppString {}. Unfortunately this means the above code doesn't compile with stable Rust.

It is possible to achieve the same effect with stable Rust using the marker type PhantonPinned, which implements !Unpin and thereby makes any structure containing it also !Unpin. This would conceptually be used to make a structure MyStruct implement !Unpin like this. Of note, the presence of phantom types does not increase the size of the structure, affect is alignment, or have any performance drawbacks.

struct MyStruct {
    my_data: i32, // or whatever type we want
    _phantom_pinned: std::marker::PhantomPinned
}

It would seem that to do this for C++ classes would require modification of the cpp_class! macro. I confess I have yet to learn much macro programming and had some trouble following all the macro_rules! and downstream parsing functions for the cpp_class! macro, so I do apologize that this is not a pull request. Some ways this could possibly look:

// Option 1
cpp_class!(pub unsafe !Unpin struct CppString as "std::string");

// Option 2
struct CppString {
    cpp_class_data!(pub unsafe struct CppString as "std::string"),
    _phantom_pinned: std::marker::PhantomPinned
}

// Option 3: Automatically detect if C++ class is relocatable?
ogoffart commented 4 years ago

Thanks for the bug report. Pinning the struct would not really help as it removes the benefit of using cpp_class at all, that is, generating the impl for Clone, Default, Drop, Copy, ... (Ok, Drop would still work)

I would recommend to do something like:

cpp_class!(pub unsafe struct CppString as "std::unique_ptr<std::string>");

That is: wrap your string class within a unique_ptr. in C++ rather than with a Box in rust.

(unique_ptr cannot be copied. Maybe it would make sense to have a C++ box where copying the box copy the content)

// Option 3: Automatically detect if C++ class is relocatable?

Yeah, I whish this was possible. Maybe in C++23 or later? (https://wg21.link/P1144 / https://wg21.link/P1029 )

benkay86 commented 4 years ago

In my view the greatest "benefit" of cpp_class! is generating a chunk of memory in Rust with the same size and alignment as a C++ object. It's true, cpp_class! does also generate the impl for Drop, which is very useful, in addition to some other traits, which are useful much of the time but perhaps should not be implemented automatically all of the time. As you correctly point out, automatically generating the Default trait for the CppString rather than for Pin<Box<CppString> allows the user to unsafely instantiate a CppString outside of a Pin using CppString::default().

I wonder if you've every thought about separating out these features into one macro that makes the object/memory and another macro that implements some traits?

I agree on some level it would be easier, and perhaps even more ergonomic, to expose C++ classes to Rust indirectly through C++ smart pointers (although some of them are not relocatable either) or through custom wrappers written in Rust itself. However, the idea of cpp_class! appeals to me precisely because of its direct style of integration: the resultant struct is the C++ object.

ogoffart commented 4 years ago

the idea of cpp_class! appeals to me precisely because of its direct style of integration: the resultant struct is the C++ object.

Yes, that's right. maybe we could have something like

cpp_class_boxed!(pub unsafe struct CppString as "std::string"))

It would add the PhantomPinned, and implement stuff like:

impl CppString {
    pub fn new_box() -> Pin<Box<Self> {...}
}
impl Default for Pin<Box<CppString>> { ... }
impl Clone for Pin<Box<CppString>> { ... }
benkay86 commented 4 years ago

I like the idea, but would maybe have the macro do even less. In my initial comment I gave an example of CppString::new_box() that default-initializes a CppString into Pin<Box<CppString>>. More generally, we might want to put the object in all kinds of heap-allocated containers like Rc, Arc, etc. What I have been doing in my actual code is defining an unsafe CppString::new_at() function that handles the logic of construction (e.g. copying a rust &str into C++ in a non-trivial implementation) and then calls C++ placement new. Then I write safe wrappers CppString::new_box(), CppString::new_rc(), etc that simply forward their arguments to CppString::new_at() and then pin the resultant object into the desired container. For example:

#![feature(optin_builtin_traits)]
cpp_class!(pub unsafe struct CppString as "std::string");
impl !Unpin for CppString {} // make !Unpin with unstable feature

impl CppString {
    // Construct a new instance of CppString at a given memory address.
    // This method is unsafe because it assumes but does not guarantee that the address is pinned (i.e. will not move).
    // Ideally use one of the safe helped methdos `like new_box()` instead.
    pub unsafe fn new_at(address: *mut CppString, string: &str) {
        // Temporarily convert the supplied string into a const char* pointer we can pass to the constructor of std::string.
        // The string value behind the pointer will be copied into the std::string, therefore the pointer need not outlive this function.
        // Unfortunately this requires a double copy, once into cstring and once more into std::string.
        let cstring = std::ffi::CString::new(string).unwrap();
        let cstring_ptr : *const std::os::raw::c_char = cstring.as_ptr();

        // Pass raw pointer at which we are going to construct the string and the cstring_ptr into the constructor for std::string.
        // The constructor will copy (i.e. clone) `cstring`.
        cpp!(unsafe [address as "void*", cstring_ptr as "const char*"] {
            // Use placement new to construct a new std::string in the memory allocated by Rust.
            new (address) std::string(cstring_ptr);
        });
    }

    pub fn new_box(string: &str) -> std::pin::Pin<std::boxed::Box<CppString>> {
        // Allocate memory to contain the std::string on the heap.
        // The memory is pinned in place because C++ does not expect it to move.
        let mut memory = std::boxed::Box::pin(std::mem::MaybeUninit::<CppString>::uninit());
        let address : *mut CppString =  unsafe { memory.as_mut().get_unchecked_mut() }.as_mut_ptr();

        // Call `new_at()` to construct a CppString in the memory.
        // This is safe because:
        // 1. The memory is allocated to the size/alignment of a CppString.
        // 2. The memory is pinned.
        unsafe { Self::new_at(address, string); }

        // The memory was initialized by the std::string constructor.
        // Now we can unwrap the MaybeUninit and return the pinned memory.
        unsafe { std::mem::transmute::<std::pin::Pin<std::boxed::Box<std::mem::MaybeUninit::<CppString>>>, std::pin::Pin<std::boxed::Box<CppString>>>(memory) }
    }
}

Unfortunately, due to the lack of support for higher-kinded types in Rust, and the profundity of Rust standard and non-standard heap-allocated containers, there is no easy way to automate generate these wrappers. Rather than create a Swiss Army knife macro that tries to do everything, would it be easier to have a:

cpp_class_nounpin!(pub unsafe struct CppString as "std::string")

That just generates code like this:

struct CppString {
    #[repr(C)]
    _opaque: [type, size], // alignment and size of std::string
    _pinned: std::marker::PhantomPinned
}

And maybe also implements Drop (since I can't think of any examples right now where it would cause problems). The binding author would then have to manually implement unsafe new_at(), new_[container](), Default, and Clone, which would be a little bit tedious but also afford complete control over how the memory is used.

If an idiomatic usage pattern emerges then you could always write a macro that builds on this functionality to do more. It's much harder to write a macro that does less.