rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

FFI and union #5492

Closed sanxiyn closed 8 years ago

sanxiyn commented 11 years ago

How would one call C functions involving union with Rust FFI?

SpiderMonkey's jsval is one example.

thestinger commented 11 years ago

There could be an unsafe enum with its layout defined to be the same as C's for interoperability. The only other way to deal with it would be to find the alignof and sizeof of the union in C for each platform and then translate that to Rust.

sanxiyn commented 11 years ago

Referencing Aatch/rust-xcb#2.

yichoi commented 11 years ago

referencing https://github.com/mozilla/servo/pull/398

referencing https://github.com/mozilla-servo/rust-mozjs/pull/9

Aatch commented 11 years ago

The unsafe enum idea appeals to me, since I thought about it as an option when trying to solve the union issue in rust-xcb, but decided that relying on the representation of enums was too "hacky" and fragile.

pnkfelix commented 11 years ago

brson mentions in the description for #6346 that a "macro based solution" would be appropriate here, though I do not currently know what that would entail. (It sounds to me like a potential alternative to the grammar changes for adding unsafe enum that have been discussed here.)

pnkfelix commented 11 years ago

Nominating for milestone 3, feature complete.

emberian commented 11 years ago

I don't think a "macro-based solution" would be appropriate, as you need to restrict the valid range of values at the site of usage, which macros cannot do.

graydon commented 11 years ago

An attribute on an enum that makes it have no discriminant and makes any match on the variant-part succeed should be sufficient. Not pretty, but neither are C union semantics.

graydon commented 11 years ago

accepted for feature-complete milestone

Skrylar commented 10 years ago

I ran into this problem recently as well; Allegro makes use of unions for passing events around in C, which turns out to be a pain to deal with in Rust.

pnkfelix commented 10 years ago

We do want to solve this problem eventually, but it need not block 1.0. Assigning P-low.

alxkolm commented 10 years ago

What's the status?

alexchandel commented 9 years ago

What's the recommended way to do FFI-compatible unions?

jdm commented 9 years ago

I believe structs containing a field which is at least as big as the largest type the union can represent and manual transmutes is the state of the art right now.

mzabaluev commented 9 years ago

> I believe structs containing a field which is at least as big as the largest type the union can represent and manual transmutes is the state of the art right now.

Make sure you get the alignments right. The struct should have #[repr(C)], and the field posing as the union (or the inner type, in case the newtype struct emulates the union itself) must have the alignment of the most-aligned variant.

alexchandel commented 9 years ago

@jdm Even when variants are different sizes? transmute fails to compile when T and U have different sizes, and transmute_copy is just as dangerous since it copies sizeof(U) bytes, triggering undefined behavior when U is larger than T.
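To make the size concern concrete, here is a minimal sketch, not anyone's actual bindings. `Storage` and `read_as_u16` are hypothetical names, and the 8-byte storage stands in for a union whose largest variant is 8 bytes. Reading a smaller variant through a pointer cast only touches the bytes of that variant, so it stays inside the storage:

```rust
use std::ptr;

/// Hypothetical 8-byte storage standing in for a C union.
#[repr(C)]
pub struct Storage {
    pub data: [u32; 2],
}

/// Reads the start of the storage as a smaller `u16` variant.
pub fn read_as_u16(s: &Storage) -> u16 {
    // `transmute::<Storage, u16>` would be rejected at compile time
    // because the sizes differ (8 bytes vs 2 bytes).
    // The pointer read copies only `size_of::<u16>()` bytes, all of which
    // lie inside `Storage`, and `Storage` is at least as aligned as `u16`.
    unsafe { ptr::read(s as *const Storage as *const u16) }
}
```

The dangerous direction is the opposite one: reading a type larger than the storage would run past the end of it.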

mzabaluev commented 9 years ago

Also, the overall size of the union is a multiple of the alignment of its most-aligned variant. This union has the size of 8:

union A {
    int32_t intval;
    char chars[5];
};

Which would require a Rust representation like:

#[repr(C)]
struct A {
    union_data: [i32; 2]
}

So yes, representing unions is not for the unwary.
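A minimal sketch of how that representation could be used and checked, assuming the usual 4-byte `int32_t`. The accessor names and `layout_of_a` are made up for illustration, and `u8` is used for the C `char` elements:

```rust
use std::mem::{align_of, size_of};

/// Stand-in for the C `union A { int32_t intval; char chars[5]; }`.
/// `[i32; 2]` gives 8 bytes of storage with the alignment of `i32`.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct A {
    union_data: [i32; 2],
}

impl A {
    /// Builds the union with the `intval` variant set and the rest zeroed.
    pub fn from_intval(v: i32) -> A {
        A { union_data: [v, 0] }
    }

    /// Reads the storage as the `intval` variant.
    pub fn intval(&self) -> i32 {
        self.union_data[0]
    }

    /// Views the first five bytes as the `chars` variant.
    pub fn chars(&self) -> &[u8; 5] {
        // The storage is 8 initialized bytes and `[u8; 5]` has alignment 1,
        // so this reinterpretation stays in bounds and is properly aligned.
        unsafe { &*(self.union_data.as_ptr() as *const [u8; 5]) }
    }
}

/// Returns (size, alignment) of the emulated union.
pub fn layout_of_a() -> (usize, usize) {
    (size_of::<A>(), align_of::<A>())
}
```

Comparing `layout_of_a()` against `sizeof` and `alignof` on the C side for each target is what keeps this honest.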

alexchandel commented 9 years ago

@mzabaluev For a C union like this:

struct INPUT {
  DWORD type;
  union {
    MOUSEINPUT    mi;
    KEYBDINPUT    ki;
    HARDWAREINPUT hi;
  };
};

I use a struct field rather than bytes. It's easier because the size and alignment change between platforms, and you can't do [u8; size_of::<MOUSEINPUT>()]

#[repr(C)]
pub struct MOUSEINPUT { ... }
#[repr(C)]
pub struct KEYBDINPUT { ... }
#[repr(C)]
pub struct HARDWAREINPUT { ... }

#[repr(C)]
pub struct INPUT {
    pub tag_: DWORD,
    pub union_: MOUSEINPUT, // MOUSEINPUT largest and most aligned
}

mzabaluev commented 9 years ago

@alexchandel Good when it works, but sometimes the largest variant is not the most aligned, like in my example above.
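One way to handle that case is a zero-length array of the most-aligned type next to a byte array sized for the whole union. This is only a sketch: `B` and `layout_of_b` are hypothetical, standing in for a C `union { int16_t s; char c[3]; }`, where the largest variant (3 bytes) is not the most aligned one (2 bytes):

```rust
use std::mem::{align_of, size_of};

/// Stand-in for `union { int16_t s; char c[3]; }`.
#[repr(C)]
pub struct B {
    /// Zero-sized, but forces the struct to the alignment of `i16`.
    pub _align: [i16; 0],
    /// 3 bytes for the largest variant, rounded up to a multiple of 2.
    pub bytes: [u8; 4],
}

/// Returns (size, alignment) of the emulated union.
pub fn layout_of_b() -> (usize, usize) {
    (size_of::<B>(), align_of::<B>())
}
```

The byte count still has to be worked out by hand per platform, so this only fixes the alignment half of the problem.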

niconiconico9 commented 9 years ago

Is there a reason why this bug is tagged as "P-low"? The alternatives that have been proposed, and I guess are currently in use, require great care to handle alignment properly. The last example of how this can be worked around without any language addition is a perfect example of how the language encourages writing incorrect code because it doesn't provide a proper solution.

ghost commented 9 years ago

I don't know how feasible it would be to implement, but an example usage could be:

#[repr(union)]
pub struct XEvent {
  pub type_: c_int,
  pub xany: XAnyEvent,
  // ...
  pub pad: [c_long; 24],
}

Like C unions, each field would start at the beginning of the struct, and the size of the struct would be that of its longest field. This wouldn't require adding union as a language keyword. The only limitation I can think of would be that accessing a field in the union would require unsafe, which is already used often when interfacing with C libraries.

A macro based solution could look something like:

union! {
  pub union XEvent {
    pub type_: c_int,
    pub xany: XAnyEvent,
    // ...
    pub pad: [c_long; 24],
  }
}

// functions generated by macro:
impl XEvent {
  pub unsafe fn type_<'a> (&'a self) -> &'a c_int { ::std::mem::transmute(self) }
  pub unsafe fn type__mut<'a> (&'a mut self) -> &'a mut c_int { ::std::mem::transmute(self) }
  pub unsafe fn xany<'a> (&'a self) -> &'a XAnyEvent { ::std::mem::transmute(self) }
  pub unsafe fn xany_mut<'a> (&'a mut self) -> &'a mut XAnyEvent { ::std::mem::transmute(self) }
  // ...
  pub unsafe fn pad<'a> (&'a self) -> &'a [c_long; 24] { ::std::mem::transmute(self) }
  pub unsafe fn pad_mut<'a> (&'a mut self) -> &'a mut [c_long; 24] { ::std::mem::transmute(self) }
}

The only thing that prevented me from writing this macro is the inability to determine the size of the union at compile time. The best workaround I could come up with is providing a guess of the size of the largest field and making the macro generate tests to verify this.

union! {
  pub union XEvent : [c_long; 24] {
    pub type_: c_int,
    pub xany: XAnyEvent,
    // ...
    pub pad: [c_long; 24],
  }
}

// test generated by macro:
#[test]
fn test_union_size_XEvent () {
  use std::cmp::max;
  use std::mem::size_of;
  let sizes = [
    size_of::<c_int>(),
    size_of::<XAnyEvent>(),
    // ...
    size_of::<[c_long; 24]>(),
  ];
  assert!(sizes.iter().fold(0, |a, b| max(a, *b)) == size_of::<[c_long; 24]>());
}

Of course, it would be much easier on developers of language bindings to have unions available as a language feature.
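For what it's worth, a rough sketch of such a macro with `macro_rules!`, under a few assumptions. The `c_union!` name, the `field / field_mut` syntax (the caller spells out both accessor names), `layout_ok`, and the `Event` type are all made up for illustration, and the storage type is still a hand-supplied guess:

```rust
use std::mem::{align_of, size_of};

macro_rules! c_union {
    (
        pub union $name:ident : $storage:ty {
            $( $field:ident / $field_mut:ident : $ty:ty ),* $(,)?
        }
    ) => {
        #[repr(C)]
        pub struct $name {
            storage: $storage,
        }

        impl $name {
            /// Wraps caller-provided storage.
            pub fn new(storage: $storage) -> Self {
                $name { storage }
            }

            $(
                /// Unsafe: the caller must know this field is the live one.
                pub unsafe fn $field(&self) -> &$ty {
                    unsafe { &*(self as *const Self as *const $ty) }
                }
                pub unsafe fn $field_mut(&mut self) -> &mut $ty {
                    unsafe { &mut *(self as *mut Self as *mut $ty) }
                }
            )*

            /// True when the guessed storage is large and aligned enough
            /// for every declared field.
            pub fn layout_ok() -> bool {
                true $(
                    && size_of::<$ty>() <= size_of::<$storage>()
                    && align_of::<$ty>() <= align_of::<$storage>()
                )*
            }
        }
    };
}

// Hypothetical union with a small tag-like field and a padding field.
c_union! {
    pub union Event : [i64; 4] {
        kind / kind_mut : i32,
        pad / pad_mut : [i64; 4],
    }
}
```

A generated test would then just assert `Event::layout_ok()`. It checks that the guess is big enough, not that it matches the C layout exactly.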

retep998 commented 9 years ago

winapi would benefit massively from unions as part of the core language. I currently use a macro to make do, but it's just not the same.

joshtriplett commented 9 years ago

I'm interested in unions as well, for several Linux kernel APIs. The proposal of having an "unsafe union", guaranteed to match the C layout, would work perfectly; almost any non-trivial instance of such a C union only makes sense to access in an unsafe block, given its trivial equivalence to the unsafe std::mem::transmute.

serprex commented 9 years ago

Most unions in C have a descriptor field, therefore there's a need for 2 cases (has-descriptor & has-no-descriptor). Being able to specify a struct-unique enum with a custom type descriptor & the fields' corresponding values would allow Rust to use the union in a type-safe manner while being able to interoperate with C APIs.

Essentially something like:

#[enum_explicit_descriptor(t)]
#[enum_explicit_values = "I: 0, N: 1"]
unsafe struct TValue{
  t: u8,
  val: unsafe enum IntOrFloat{
    I(i32),
    N(f32),
  },
}

Using unsafe struct to handle cases where the type descriptor isn't adjacent to the union. Even then, something could be done like:

#[enum_explicit_descriptor_type(u8)]
#[enum_explicit_descriptor_typeoffset(-1)] // This could be behind-the-struct by default
#[enum_explicit_values = "I: 0, N: 1"]
enum IntOrFloat{
  I(i32),
  N(f32),
}

Then there'd need to be compile-time machinery that makes sure there's a valid u8 behind the enum in definitions, though user code would access a struct TValue{ t:u8, val: IntOrFloat }

The issue of having typeoffset could be resolved by requiring that explicit enums only be contained in structs & have enum_explicit_layout_typeoffset be specified by the struct. This would require a bit more strictness, though, since one wouldn't be able to know how to find the descriptor of an &IntOrFloat parameter.

mzabaluev commented 9 years ago

@serprex: I don't think it's worthwhile to add language support for external descriptors of unions, even in cases where there is a 1:1 match between a single descriptor field value and a union variant. The code using unions is expected to be close to FFI, where unsafe is the norm; so variant matching can be always unsafe, and the burden of ensuring the correct variant would be completely on the programmer, as it is in C.

joshtriplett commented 9 years ago

@mzabaluev I agree. For a first pass, at least, we just need an unsafe construct to access fields of a C union in a C-compatible, interoperable way. We can always produce a safe wrapper around that, and even produce macros to generate such wrappers for common cases.
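A sketch of what such a safe wrapper could look like, assuming a compiler with `#[repr(C)] union` support (which landed in Rust after this thread, so it was not available at the time of writing). The type and tag values reuse the IntOrFloat example from above; the `Value` enum and the method names are hypothetical:

```rust
/// C-compatible union: both fields start at offset 0.
#[repr(C)]
#[derive(Clone, Copy)]
pub union IntOrFloat {
    pub i: i32,
    pub n: f32,
}

/// C-compatible struct carrying the descriptor next to the union.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct TValue {
    pub t: u8, // 0 = I, 1 = N (same mapping as in the example above)
    pub val: IntOrFloat,
}

/// Safe Rust-side view of the value.
#[derive(Debug, PartialEq)]
pub enum Value {
    I(i32),
    N(f32),
}

impl TValue {
    pub fn from_int(i: i32) -> TValue {
        TValue { t: 0, val: IntOrFloat { i } }
    }

    pub fn from_float(n: f32) -> TValue {
        TValue { t: 1, val: IntOrFloat { n } }
    }

    /// Picks the union field based on the descriptor; unknown tags give `None`.
    pub fn get(&self) -> Option<Value> {
        match self.t {
            // Reading a union field is unsafe; here both fields are 4-byte
            // plain data for which every bit pattern is a valid value.
            0 => Some(Value::I(unsafe { self.val.i })),
            1 => Some(Value::N(unsafe { self.val.n })),
            _ => None,
        }
    }
}
```

The unsafe part stays confined to `get`, and callers only ever see the `Value` enum.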

joshtriplett commented 8 years ago

I posted a preliminary proposal using #[repr(C,union)] struct { ... } (requiring unsafe blocks for field accesses, assignments, or initializations) to https://internals.rust-lang.org/t/pre-rfc-unsafe-enums/2873/23.

huonw commented 8 years ago

Closing in favour of https://github.com/rust-lang/rfcs/issues/877.