Closed sanxiyn closed 8 years ago
There could be an unsafe enum with the layout defined to be the same as C for interoperability. The only other way to deal with it would be finding the alignof and sizeof of the union in C for each platform and then translating that to Rust.
Referencing Aatch/rust-xcb#2.
referencing https://github.com/mozilla/servo/pull/398
referencing https://github.com/mozilla-servo/rust-mozjs/pull/9
The unsafe enum idea appeals to me, since I thought about it as an option when trying to solve the union issue in rust-xcb, but decided that relying on the representation of enums was too "hacky" and fragile.
brson mentions in the description for #6346 that a "macro based solution" would be appropriate here, though I do not currently know what that would entail. (It sounds to me like a potential alternative to the grammar changes for adding unsafe enum that have been discussed here.)
Nominating for milestone 3, feature complete.
I don't think a "macro-based solution" would be appropriate, as you need to restrict the valid range of values at the site of usage, which macros cannot do.
An attribute on an enum that makes it have no discriminant and makes any match on the variant-part succeed, should be sufficient. Not pretty but neither are C union semantics.
accepted for feature-complete milestone
I ran in to this problem recently as well; Allegro makes use of Unions for passing events around in C, which turns out to be a pain to deal with in Rust.
We do want to solve this problem eventually, but it need not block 1.0. Assigning P-low.
What status?
What's the recommended way to do FFI-compatible unions?
I believe structs containing a field which is at least as big as the largest type the union can represent and manual transmutes is the state of the art right now.
Make sure you get the alignments right. The struct should have #[repr(C)], and the field posing as the union (or the inner type, in case the newtype struct emulates the union itself) should have the alignment of the most-aligned variant.
@jdm Even when variants are different sizes? transmute errors when T and U have different sizes, and transmute_copy is just as dangerous since it copies sizeof(U) bytes, triggering "undefined behavior" when U is larger than T.
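The direction of the copy is what matters here. A minimal sketch, assuming the union is stored as a [u32; 2] (a hypothetical storage type chosen for illustration) and the variants are no larger than that storage: copying from the larger storage into a smaller variant stays within bounds, while the reverse would read past the source.

```rust
use std::mem::transmute_copy;

/// Reads a 4-byte variant out of the 8-byte storage.
pub fn read_int(storage: &[u32; 2]) -> u32 {
    // In bounds: the destination (4 bytes) is no larger than the source (8 bytes).
    unsafe { transmute_copy::<[u32; 2], u32>(storage) }
}

/// Reads a 5-byte variant out of the 8-byte storage.
pub fn read_bytes(storage: &[u32; 2]) -> [u8; 5] {
    // In bounds: 5 bytes are copied out of 8. Any bit pattern is a valid u8.
    unsafe { transmute_copy::<[u32; 2], [u8; 5]>(storage) }
}
```

Going the other way (for example, from a u32 to a [u8; 5]) would ask for more bytes than the source holds, which is the hazard described above.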
Also, the overall size of the union is a multiple of the alignment of its most-aligned variant. This union has the size of 8:
union A {
    int32_t intval;
    char chars[5];
};
Which would require a Rust representation like:
#[repr(C)]
struct A {
    union_data: [i32; 2],
}
So yes, representing unions is not for the unwary.
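As a rough sketch of how that representation might be used, the accessors below read each variant out of the shared storage. The method names (from_intval, intval, chars) and the layout_matches_c helper are hypothetical, and u8 stands in for C char, which is signed on some platforms.

```rust
use std::mem::{align_of, size_of};

/// Emulates `union A { int32_t intval; char chars[5]; }`.
/// Two i32 values give 8 bytes of storage with the alignment of i32.
#[repr(C)]
pub struct A {
    union_data: [i32; 2],
}

impl A {
    /// Builds the union with the `intval` variant set and the rest zeroed.
    pub fn from_intval(v: i32) -> A {
        A { union_data: [v, 0] }
    }

    /// Reads the storage as the `intval` variant.
    pub fn intval(&self) -> i32 {
        self.union_data[0]
    }

    /// Views the first five bytes of the storage as the `chars` variant.
    pub fn chars(&self) -> &[u8; 5] {
        // Sound here: u8 has alignment 1, every bit pattern is a valid u8,
        // and the 8-byte storage covers the 5 bytes being viewed.
        unsafe { &*(self.union_data.as_ptr() as *const [u8; 5]) }
    }
}

/// Checks the layout facts the emulation relies on.
pub fn layout_matches_c() -> bool {
    size_of::<A>() == 8 && align_of::<A>() == align_of::<i32>()
}
```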
@mzabaluev For a C union like this:
struct INPUT {
    DWORD type;
    union {
        MOUSEINPUT mi;
        KEYBDINPUT ki;
        HARDWAREINPUT hi;
    };
};
I use a struct field rather than bytes. It's easier because the size and alignment change between platforms, and you can't do [u8; size_of::<MOUSEINPUT>()]:
#[repr(C)]
pub struct MOUSEINPUT { ... }
#[repr(C)]
pub struct KEYBDINPUT { ... }
#[repr(C)]
pub struct HARDWAREINPUT { ... }
#[repr(C)]
pub struct INPUT {
    pub tag_: DWORD,
    pub union_: MOUSEINPUT, // MOUSEINPUT is the largest and most aligned variant
}
@alexchandel Good when it works, but sometimes the largest variant is not the most aligned, like in my example above.
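To make the size and alignment rule concrete, here is a small sketch that computes a C union's layout from the (size, alignment) pair of each variant. The function names union_layout and layout_of_a are hypothetical, and [u8; 5] stands in for char[5].

```rust
use std::mem::{align_of, size_of};

/// Computes (size, alignment) of a C union from the
/// (size, alignment) pair of each of its variants.
pub fn union_layout(variants: &[(usize, usize)]) -> (usize, usize) {
    let size = variants.iter().map(|v| v.0).max().unwrap_or(0);
    let align = variants.iter().map(|v| v.1).max().unwrap_or(1);
    // Round the largest size up to a multiple of the strictest alignment.
    let padded = (size + align - 1) / align * align;
    (padded, align)
}

/// Layout of `union A { int32_t intval; char chars[5]; }`.
pub fn layout_of_a() -> (usize, usize) {
    union_layout(&[
        (size_of::<i32>(), align_of::<i32>()),
        (size_of::<[u8; 5]>(), align_of::<[u8; 5]>()),
    ])
}
```

For that union the largest variant (5 bytes, alignment 1) and the most aligned variant (4 bytes, alignment 4) differ, so neither one alone can serve as the storage field.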
Is there a reason why this bug is tagged as "P-low"? The alternatives that are proposed, and I guess currently used, require great care to handle alignment properly. The last example of how this can be worked around without any language addition is a perfect example of how the language encourages writing incorrect code because it doesn't provide a proper solution.
I don't know how feasible it would be to implement, but an example usage could be:
#[repr(union)]
pub struct XEvent {
    pub type_: c_int,
    pub xany: XAnyEvent,
    // ...
    pub pad: [c_long; 24],
}
Like C unions, each field would start at the beginning of the struct, and the size of the struct would be that of its largest field. This wouldn't require adding union as a language keyword. The only limitation I can think of would be that accessing a field in the union would require unsafe, which is already used often when interfacing with C libraries.
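For comparison, later Rust releases gained a union item with much the same semantics as described here (fields overlapping at offset 0, reads requiring unsafe). A minimal sketch under that assumption follows; AnyEventStandIn is a hypothetical placeholder, since the real XAnyEvent has more fields than shown.

```rust
use std::os::raw::{c_int, c_long};

// Hypothetical stand-in for XAnyEvent; the real struct has more fields.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct AnyEventStandIn {
    pub type_: c_int,
    pub serial: c_long,
}

// All fields start at offset 0; the size is that of the largest field,
// rounded up to the strictest alignment.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Event {
    pub type_: c_int,
    pub any: AnyEventStandIn,
    pub pad: [c_long; 24],
}

/// Reads the leading type code shared by every variant.
pub fn event_type(ev: &Event) -> c_int {
    // Reading a union field needs `unsafe`: the compiler cannot tell
    // which field was last written.
    unsafe { ev.type_ }
}
```

Initializing a field, as in `Event { type_: 2 }`, does not need unsafe; only reads do.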
A macro based solution could look something like:
union! {
    pub union XEvent {
        pub type_: c_int,
        pub xany: XAnyEvent,
        // ...
        pub pad: [c_long; 24],
    }
}
// functions generated by macro:
impl XEvent {
    pub unsafe fn type_<'a>(&'a self) -> &'a c_int { ::std::mem::transmute(self) }
    pub unsafe fn type__mut<'a>(&'a mut self) -> &'a mut c_int { ::std::mem::transmute(self) }
    pub unsafe fn xany<'a>(&'a self) -> &'a XAnyEvent { ::std::mem::transmute(self) }
    pub unsafe fn xany_mut<'a>(&'a mut self) -> &'a mut XAnyEvent { ::std::mem::transmute(self) }
    // ...
    pub unsafe fn pad<'a>(&'a self) -> &'a [c_long; 24] { ::std::mem::transmute(self) }
    pub unsafe fn pad_mut<'a>(&'a mut self) -> &'a mut [c_long; 24] { ::std::mem::transmute(self) }
}
The only thing that prevented me from writing this macro is the inability to determine the size of the union at compile time. The best workaround I could come up with is providing a guess of the size of the largest field and making the macro generate tests to verify this.
union! {
    pub union XEvent : [c_long; 24] {
        pub type_: c_int,
        pub xany: XAnyEvent,
        // ...
        pub pad: [c_long; 24],
    }
}
// test generated by macro:
#[test]
fn test_union_size_XEvent () {
    use std::cmp::max;
    use std::mem::size_of;
    let sizes = [
        size_of::<c_int>(),
        size_of::<XAnyEvent>(),
        // ...
        size_of::<[c_long; 24]>(),
    ];
    assert!(sizes.iter().fold(0, |a, b| max(a, *b)) == size_of::<[c_long; 24]>());
}
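A rough sketch of what such a macro could look like with macro_rules! follows. Everything here is an assumption for illustration: the macro name c_union, the explicit getter / getter_mut naming syntax (used because the macro cannot derive the _mut names itself), the from_storage constructor, and the IntOrBytes example type. It also assumes a toolchain where size_of and assert! are usable in constant expressions, which turns the generated test into a compile-time check.

```rust
macro_rules! c_union {
    (
        pub union $name:ident : $storage:ty {
            $( $get:ident / $get_mut:ident : $ty:ty ),* $(,)?
        }
    ) => {
        #[repr(C)]
        pub struct $name {
            storage: $storage,
        }

        // Compilation fails if any variant does not fit the guessed storage.
        const _: () = {
            $(
                assert!(::std::mem::size_of::<$ty>() <= ::std::mem::size_of::<$storage>());
                assert!(::std::mem::align_of::<$ty>() <= ::std::mem::align_of::<$storage>());
            )*
        };

        impl $name {
            pub fn from_storage(storage: $storage) -> Self {
                $name { storage }
            }

            $(
                /// Caller must ensure the stored bytes are valid for this variant.
                pub unsafe fn $get(&self) -> &$ty {
                    unsafe { &*(self as *const Self as *const $ty) }
                }

                /// Caller must ensure the stored bytes are valid for this variant.
                pub unsafe fn $get_mut(&mut self) -> &mut $ty {
                    unsafe { &mut *(self as *mut Self as *mut $ty) }
                }
            )*
        }
    };
}

// Example: storage of two u32 values covers both variants.
c_union! {
    pub union IntOrBytes: [u32; 2] {
        int / int_mut: u32,
        bytes / bytes_mut: [u8; 5],
    }
}
```

The storage type still has to be guessed by hand, so this only moves the verification from test time to compile time.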
Of course, it would be much easier on developers of language bindings to have unions available as a language feature.
winapi would benefit massively from unions as part of the core language. I currently use a macro to make do, but it's just not the same.
I'm interested in unions as well, for several Linux kernel APIs. The proposal of having an "unsafe union", guaranteed to match the C layout, would work perfectly; almost any non-trivial instance of such a C union only makes sense to access in an unsafe block, given its trivial equivalence to the unsafe std::mem::transmute.
Most unions in C have a descriptor field, therefore there's a need for 2 cases (has-descriptor & has-no-descriptor). Being able to specify a struct-unique enum with a custom type descriptor & the fields' corresponding values would allow Rust to use the union in a type-safe manner while being able to interoperate with C APIs.
Essentially something like:
#[enum_explicit_descriptor(t)]
#[enum_explicit_values = "I: 0, N: 1"]
unsafe struct TValue {
    t: u8,
    val: unsafe enum IntOrFloat {
        I(i32),
        N(f32),
    },
}
The unsafe struct is there to handle cases where the type descriptor isn't adjacent to the union. Even then, something like this could be done:
#[enum_explicit_descriptor_type(u8)]
#[enum_explicit_descriptor_typeoffset(-1)] // This could be behind-the-struct by default
#[enum_explicit_values = "I: 0, N: 1"]
enum IntOrFloat {
    I(i32),
    N(f32),
}
Then there'd need to be compile-time machinery that makes sure there's a valid u8 behind the enum in definitions, though user code would access a struct TValue{ t:u8, val: IntOrFloat }.
The issue of having typeoffset could be resolved by requiring that explicit enums only be contained in structs & have enum_explicit_layout_typeoffset be specified by the struct. That would require a bit more strictness, though, since one wouldn't be able to know how to find the descriptor of an &IntOrFloat parameter.
@serprex: I don't think it's worthwhile to add language support for external descriptors of unions, even in cases where there is a 1:1 match between a single descriptor field value and a union variant. The code using unions is expected to be close to FFI, where unsafe is the norm; so variant matching can be always unsafe, and the burden of ensuring the correct variant would be completely on the programmer, as it is in C.
@mzabaluev I agree. For a first pass, at least, we just need an unsafe construct to access fields of a C union in a C-compatible, interoperable way. We can always produce a safe wrapper around that, and even produce macros to generate such wrappers for common cases.
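A safe wrapper of that kind might look roughly like the sketch below. It assumes the union item that later Rust releases provide, and reuses the tag values from the earlier TValue example (0 for the integer, 1 for the float); the Value enum and the constructor names are hypothetical.

```rust
// Raw, C-compatible layout: a tag byte followed by the union.
#[repr(C)]
#[derive(Clone, Copy)]
pub union IntOrFloat {
    i: i32,
    n: f32,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct TValue {
    t: u8,
    val: IntOrFloat,
}

/// Safe view of a TValue.
#[derive(Debug, PartialEq)]
pub enum Value {
    Int(i32),
    Number(f32),
}

impl TValue {
    pub fn from_int(i: i32) -> TValue {
        TValue { t: 0, val: IntOrFloat { i } }
    }

    pub fn from_number(n: f32) -> TValue {
        TValue { t: 1, val: IntOrFloat { n } }
    }

    /// Returns None when the tag is not one this wrapper knows about.
    pub fn get(&self) -> Option<Value> {
        match self.t {
            // The fields are private, so within this module the tag always
            // matches the variant that was written.
            0 => Some(Value::Int(unsafe { self.val.i })),
            1 => Some(Value::Number(unsafe { self.val.n })),
            _ => None,
        }
    }
}
```

Values that arrive from C would still need their tag trusted or validated at the FFI boundary; the wrapper only keeps the unsafe reads in one place.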
I posted a preliminary proposal using #[repr(C,union)] struct { ... } (requiring unsafe blocks for field accesses, assignments, or initializations) to https://internals.rust-lang.org/t/pre-rfc-unsafe-enums/2873/23.
Closing in favour of https://github.com/rust-lang/rfcs/issues/877.
How would one call C functions involving union with Rust FFI?
SpiderMonkey's jsval is one example.