Open nbigaouette opened 4 years ago
Because that's what the underlying platform is using, see https://github.com/rust-lang/rust-bindgen/issues/1361. So that part of the report is invalid (MSVC is weird here :/).
A callback or such to override the enum repr may be worth it as an option, but note that as soon as you do that (if you make it i64
for example) then structs containing that enum can break and so on, so it's a bit of a footgun. Repurposing this to track that feature request.
Bindgen seems to be doing the wrong thing based on the C standard.
I understand that making things i64
everywhere could be problematic. enum
s in C aren't 64-bits unless int
is 64 bit (more on this later), so converting enum ordinals as i64
would be a bad idea.
On the other hand, MSVC is not the weird one in this case. It is the Unix behaviour that seems wrong. First, lets look at the C standard for what type enums should be:
In both cases you're looking for Section 6.7.2.2, paragraph 3 - Semantics. There it writes (and I quote directly here):
The identifiers in an enumerator list are declared as constants that have type
int
and may appear wherever such are permitted.
They always have type int
in C, which is a signed type. So in all cases on x86_64 (which I assume is what we're targeting here) we would want bindgen to do the correct thing and treat these as i32
. On other platforms this may change as int
may be i16
on some older platforms. If such a platform existed that int
was i64
, then perhaps we might consider i64
a correct representation for that platform.
Beyond that, Modern C provides similar advice:
Takeaway 1.5.6.6 Enumeration constants are of type
signed int
I'm not sure why u32
is being purported as "correct" behaviour by bindgen here. I think given the above (and the fact that enums can contain signed ordinals), bindgen should reconsider and represent everything as i32
constants on x86_64. If we're being a bit more particular, it should instead look at what type signed int
actually compares to, and use that.
Why is #[repr(C)]
not used here? From the nomicon:
repr(C) is equivalent to one of repr(u*) (see the next section) for fieldless enums. The chosen size is the default enum size for the target platform's C application binary interface (ABI). Note that enum representation in C is implementation defined, so this is really a "best guess". In particular, this may be incorrect when the C code of interest is compiled with certain flags.
As noted, a C enum
is always the size of int
, so [repr(C)]
seems like a better choice than [repr(i32)]
. In fact, that would make it safe to make the default enum style be rust
or rust_non_exhaustive
, instead of const
, which I would strongly vouch for (much cleaner and a better representation of the original code).
Also https://github.com/rust-lang/rust-bindgen/issues/1966 is somewhat related to this.
Why is
#[repr(C)]
not used here? From the nomicon:repr(C) is equivalent to one of repr(u*) (see the next section) for fieldless enums. The chosen size is the default enum size for the target platform's C application binary interface (ABI). Note that enum representation in C is implementation defined, so this is really a "best guess". In particular, this may be incorrect when the C code of interest is compiled with certain flags.
As noted, a C
enum
is always the size ofint
, so[repr(C)]
seems like a better choice than[repr(i32)]
. In fact, that would make it safe to make the default enum style berust
orrust_non_exhaustive
, instead ofconst
, which I would strongly vouch for (much cleaner and a better representation of the original code).
we could add a flag to switch between #[repr(C)]
and #[repr(whatever_int_is_for_this_target)]
. Would that be a sensible option? @tgross35 @emilio ?
That would be great for my case, just another thing that works a little bit better cross platform without necessarily needing to regenerate bindings.
Is there any reason not to make this option the default for rust style enums?
I don't have strong arguments against enabling #[repr(C)]
by default. The only thing I can see on the horizon would be that someone is relying on it being #[repr(i32)]
for some odd reason.
One "open question" that I see here is if we can use #[repr(C)]
for every C enum
or not.
Apparently, C23 allows to specify the underlying type of an enum
. So, I guess, in this particular case, we should use #[repr(the_int_that_c_said)]
instead of #[repr(C)]
.
I'm not sure if there's another way this feature could break something or not.
Edit: An alternative would be to leave this reasoning to the user :trollface:. What I mean is we could use regular expressions so people can use .*
if they want to use #[repr(C)]
for every enum
or filter accordingly.
Interesting caveat with the C23 syntax! I guess in those cases it's already determined.
The only thing I can see on the horizon would be that someone is relying on it being #[repr(i32)] for some odd reason.
I think this is pretty unlikely, and there are probably more cases the other way around where i32 is not the correct enum representation. Since the enum size is ABI defined (per the nomicon) and the size of c_int isn't fixed anyway (16 bits on avr and msp430, looking at the source), I don't think defaulting to #[repr(C)]
would break anything that doesn't already have to use some sort of workarounds because i32
isn't right anyway
I'll start working on this but I'll wait for @emilio's opinion on which should be the default.
In https://github.com/nbigaouette/onnxruntime-rs I use bindgen in
onnxruntime-sys/build.rs
to generate bindings on different platforms (macos, linux, windows).The original C header file is https://github.com/microsoft/onnxruntime/blob/v1.5.2/include/onnxruntime/core/session/onnxruntime_c_api.h (which is quite large).
This header contains a couple of enums, for example
OrtLoggingLevel
that gets wrapper asu32
: https://docs.rs/onnxruntime-sys/0.0.9/onnxruntime_sys/type.OrtLoggingLevel.htmlBut on Windows, I get
i32
s for them (docs.rs doesn't contain windows values, yet).Because of this I have trouble creating an API that can work both on Windows an other platforms.
Input C/C++ Header
wrapper.h
onnxruntime_c_api.h
Bindgen Invocation
On macOS:
On Windows:
Actual Results
On macOS:
On Windows:
Using
--rustified-enum="*"
I get:Expected Results
Note how macOS gives me a
std::os::raw::c_uint
and Windows gives me astd::os::raw::c_int
(or#[repr(u32)]
vs#[repr(i32)]
) for the enum representation.Why does bindgen generates different code on macOS/Linux and Windows? Can I tell bindgen to use
std::os::raw::c_int
everywhere? Ori64
, or whatever else?Thank you!