rust-lang / rust-bindgen

Automatically generates Rust FFI bindings to C (and some C++) libraries.
https://rust-lang.github.io/rust-bindgen/
BSD 3-Clause "New" or "Revised" License
4.23k stars 679 forks source link

feature request: Enable deriving `Clone`, `Copy`, etc... for `__IncompleteArrayField` and containing structs. #2832

Open Easyoakland opened 1 month ago

Easyoakland commented 1 month ago

Preface: Bindgen is a very useful tool. Thank you for releasing it!

Background

As indicated in #1431 and #2791 __IncompleteArrayField purposefully doesn't implement Clone or Copy and structs containing it don't either. #1093 describes Eq and Hash.

The reasoning seems to be that a struct H which contains the __IncompleteArrayField implicitly refers to a larger collection and copying the struct would break this. For the other traits the argument is that the default doesn't make sense for the struct.

As indicated in #1892 the stable usage in the book violates stacked borrows.

The book also mentions that on nightly the fam containing structs can be emulated with a DST which duplicates the size (once in the original c definition and once in the fat pointer itself) which is an unnecessary inefficiency.

The fambox crate I wrote allows using these c structs on stable without nightly DST features in a way that doesn't violate stacked borrows (and therefore passes miri). It is implemented as a thin pointer without duplicating the struct's contained length. The main idea is that a type containing a flexible array member implements https://docs.rs/fambox/0.1.1/fambox/trait.FamHeader.html and is treated as a sized type with the necessary metadata to describe the buffer stored immediately after. This seems better than both advocated solutions in the book.

Under this conceptualization the struct H doesn't have an unknown size that must be carefully managed, and it is makes perfect sense to Clone, Copy, cmp, hash, etc... the Sized header. In fact, FamBox<H> can only implement Clone if H implements Clone (and same for the other traits).

It's not a big deal that __IncompleteArrayField doesn't implement these because one can write

    include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
    impl<T: std::marker::Copy> Copy for __IncompleteArrayField<T> {}
//    etc..

though it is annoying.

The larger problem is that H won't derive Copy even if __IncompleteArrayField does (since that won't be known at bindgen's build time) and so you wind up with something like this

struct DeriveImpls;
impl DeriveImpls {
    const COPY_CLONE_BLACKLIST: &'static [&'static str] = &["struct_that_already_implements_Clone", "other_struct_that_already_implements_Copy", ...];
    const PARTIAL_EQ_BLACKLIST: &'static [&'static str] = &["struct_already_implements_PartialEq", ...]
    // etc...
}
impl ParseCallbacks for DeriveImpls {
    fn add_derives(&self, _info: &DeriveInfo<'_>) -> Vec<String> {
        let mut out = vec![];
        out.extend_from_slice(&["bytemuck::Pod".to_owned(), "bytemuck::Zeroable".to_owned()]);
        if !Self::COPY_CLONE_BLACKLIST.contains(&_info.name) {
            out.extend_from_slice(&["Copy".to_owned()]);
            out.extend_from_slice(&["Clone".to_owned()]);
        }
        if !Self::PARTIAL_EQ_BLACKLIST.contains(&_info.name) {
            out.extend_from_slice(&["PartialEq".to_owned()]);
        }
        // etc...
        out
    }
}

which is very annoying and tedious.

Proposal

There seems like two reasonable options to enable this feature. Both seem reasonable but the second would also help with Serialize and other custom derives so would probably be more important.

  1. A method on the builder which opts into derives on __IncompleteArrayField. By enabling this the user acknowledges that they won't be using the translated structs as DST but rather as headers.
  2. Pass __IncompleteAraryField to the add_derives callback, and pass the already present derives to the add_derives callback so #[derive(Clone, Clone)] can be avoided with a _info.derives.contains("Clone) check. This would also enable bindgen detecting that __IncompleteArrayField implements Clone while building since it would be added during the build.
emilio commented 1 month ago

As indicated in https://github.com/rust-lang/rust-bindgen/issues/1892 the stable usage in the book violates stacked borrows.

This kind of pattern is tracked in https://github.com/rust-lang/unsafe-code-guidelines/issues/256. I guess the nice thing is that we don't have interior mutability in bindgen-generated stuff, so right now it kinda works?

The larger problem is that H won't derive Copy even if __IncompleteArrayField does (since that won't be known at bindgen's build time) and so you wind up with something like this

I guess you'd need to use something like clone_into or ptr::write or something to get the "Clone" into the right allocation? Implementing a safe Clone for Foo would be totally unsound, afaict... Given that you're effectively relying on memmoving the header into the allocation anyways, why can't you just do that (memcpy-ing it)? Exposing clone() and such is kinda unsound by default. for _ in foo.clone().fam_member.as_slice() { -> boom...

Easyoakland commented 1 month ago

As indicated in #1892 the stable usage in the book violates stacked borrows.

This kind of pattern is tracked in rust-lang/unsafe-code-guidelines#256. I guess the nice thing is that we don't have interior mutability in bindgen-generated stuff, so right now it kinda works?

I don't see where interior mutability becomes a problem. The test here seems to be fine?

I did see that tracking issue. There are several implementation examples that are similar in the thread to the fambox implementation. The workaround described in the issue is fine. Use a pointer:

pub struct FamBox<H: FamHeader, O: Owner> {
    // Pointer to backing buffer including fam.
    ptr: NonNull<u8>,
    // Type markers
    ty: PhantomData<(H, O)>,
}

The larger problem is that H won't derive Copy even if __IncompleteArrayField does (since that won't be known at bindgen's build time) and so you wind up with something like this

I guess you'd need to use something like clone_into or ptr::write or something to get the "Clone" into the right allocation? Implementing a safe Clone for Foo would be totally unsound, afaict... Given that you're effectively relying on memmoving the header into the allocation anyways, why can't you just do that (memcpy-ing it)? Exposing clone() and such is kinda unsound by default. for _ in foo.clone().fam_member.as_slice() { -> boom...

You seem to keep implying that the struct is only valid while allocated next to its fam members. What I'm trying to say is under another conceptualization, the struct is valid as long as it describes a fam even if the fam doesn't exist. To actually access the fam with safe code requires already having provided it into the allocation.

As an example from the README.md:

First you create the header and buffers which do not reside in their final location (or use the builder):

let data_buffer = [0, 1, 2, 3, 4, 5];
let header = encapsulated_struct::protocol_msg::new(1, 2, data_buffer.len())?;

At this point the header is not followed by the data it will eventually reside with. However, it can also not safely access the fam's data in this form so it doesn't matter.

then you allocate the struct in a usable form

let header_and_fam = FamBoxOwned::from_fn(header, |i| data_buffer[i]);
let header = encapsulated_struct::protocol_msg::new(1, 2, data_buffer.len())?;
assert_eq!(header_and_fam.header(), &header);
assert_eq!(header_and_fam.fam(), data_buffer); // <- Notice that this is now safe for users of the data-structure.
assert_eq!(
    usize::from(*header_and_fam.header().length()),
    size_of::<protocol_msg>() + core::mem::size_of_val(&data_buffer)
);

A clone of the header would be a valid header not attached to any data.

let new_header = header_and_fam.header().clone()

but only the header which precedes the fam's data is able to access it:

let fam = header_and_fam.fam(); // <- safe and valid
let fam = new_header.fam() // <- no such method

Yes, you can do

// Safety: Not safe.
let fam = unsafe { new_header.elements.as_slice(1) }

But you violated memory safety with unsafe which is not an issue.

Hopefully this makes it clear why H: Clone, H::Element: Clone is needed for FamBox<H, Owned>: Clone and memcpy would not be valid? You're making clones, not moving the structs with memcpy.

/** [`Owned`] impls */
impl<H: FamHeader + Clone> Clone for FamBox<H, Owned>
where
    H::Element: Clone,
{
    /// The buffer, including the header and fam, is copied into a new allocation.
    #[inline]
    fn clone(&self) -> Self {
        let (header, fam) = self.as_parts();
        FamBox::from_fn(header.clone(), |i| fam[i].clone())
    }
}