rust-lang / rust-bindgen

Automatically generates Rust FFI bindings to C (and some C++) libraries.
https://rust-lang.github.io/rust-bindgen/
BSD 3-Clause "New" or "Revised" License
4.5k stars 702 forks source link

Generating Rust enums for C++ enums causes NonZero-optimization footguns #225

Closed bholley closed 7 years ago

bholley commented 8 years ago

Finally got to the bottom of what was making my incremental restyle stuff crash.

Gecko has nsChangeHint, which is a bitmask declared as an enum [1]. Bindgen currently generates a compatible rust enum as shown in [2] ([3] in case the former goes away).

The problem is that Rust makes very strong assumptions about enums, and in particular assumes that enums will never have representations other than the ones listed in the declaration. And if there is no zero value declared, then [4] causes the NonZero optimization to kick in, which means that Option will use the inner enum as the discriminant. And so if somebody happens to pass nsChangeHint(0), Rust will think the entire outer struct is None.

Gecko is arguably abusing |enum| here, but people do this all the time in C++. So we should either find a safer representation, or, barring that, at least generate checked conversion routines (like bitflags! has) that callers can use instead of transmute.

Thoughts?

CC @Manishearth @emilio @nox @fitzgen @vlad @heycam @upsuper

[1] http://searchfox.org/mozilla-central/rev/f5c9e9a249637c9abd88754c8963ecb3838475cb/layout/base/nsChangeHint.h#19 [2] https://github.com/bholley/gecko-dev/blob/59bfa64706f8dc5743fbf7c7fdcd8c1478af4864/servo/components/style/gecko_bindings/structs_debug.rs#L5067 [3] https://pastebin.mozilla.org/8926070 [4] https://github.com/rust-lang/rust/pull/37224

heycam commented 8 years ago

Nice find. Can we make rust-bindgen generate a value for 0 if the C++ enum didn't have one? That might be awkward if we match against such enums, since we now have an extra (useless) case to handle, but for bitflag-like enums it's not so likely we'd be using match anyway.

upsuper commented 8 years ago

Maybe we should add an option to declare all bitflag enums, and make bindgen generate a tuple struct as well as corresponding bit operators and constants for those enums instead of a Rust enum.

upsuper commented 8 years ago

This can probably be done in regen script. We could just add those types into blacklist, and add template for generating code manually.

upsuper commented 8 years ago

Oh, hmmm, that is impossible to be done completely in regen script, because we need the constants... Probably we need to add some options to bindgen to support those kind of things, then.

emilio commented 8 years ago

Yeah, it's sort of straight forward to generate constants for enums, though probably a wrapper on top of it, something like:

struct ChangeHint(u32);

impl BitOr<nsChangeHint> for ChangeHint {
    type Output = Self;
    fn bitor(self, rhs: nsChangeHint) -> Self {
        ChangeHint(self.0 | rhs as u32)
    }
}

Though given we need to pass that to Gecko, seems straight-forward to generate something like:

type nsChangeHint = u32;

const nsChangeHint_XXX: nsChangeHint = 1;
// And so on...
upsuper commented 8 years ago

What in my mind is something like

struct nsChangeHint(u32);
impl BitOr<nsChangeHint> for nsChanghint { ... }
const nsChangeHint_XXX: nsChangeHint = nsChangeHint(1);
emilio commented 8 years ago

That sounds more complete, I like it... I'll try to write something today, not totally sure I'll find the time though.

Manishearth commented 8 years ago

I had a couple solutions in mind:

Manishearth commented 8 years ago

Actually, if nsChangeHint is a bitflag-thingy, it's very wrong to use it as an enum on the Rust side even if we do handle the zero thing. When matching a composite mask, Rust will go hit a hidden match path (fortunately not UB), print something about missing discriminants, and abort. We definitely should be creating a bitmask for this.

upsuper commented 8 years ago

I guess bitflag is probably the right approach. It seems to me bitflags! is designed for this kind of scenarios.

emilio commented 8 years ago

The problem with bitflags! is that you force a new dependency on bindgen consumers, plus it's not repr(C), plus generating code is probably a lot harder and a one-off. I've written #226 real quick, let me know if you think something like that (probably refined with default implementations for common stuff and similar) would do it.

upsuper commented 8 years ago

Yeah, I was concerned about the additional dependency...

Bitflags can be set repr(C), anyway. IIUC, you can do

bitflags! {
    #[repr(C)] flags Flags: u32 {
        const FLAG_A       = 0b00000001,
        const FLAG_B       = 0b00000010,
        const FLAG_C       = 0b00000100,
        const FLAG_ABC     = FLAG_A.bits
                           | FLAG_B.bits
                           | FLAG_C.bits,
    }
}
emilio commented 8 years ago

Oh really? TIL. I guess it's a nice to have, will fill an issue for that

nox commented 8 years ago

I'm +1 for generating bitflags code, and I think upstream already does that.

emilio commented 8 years ago

It doesn't (https://github.com/Yamakaky/rust-bindgen/pull/284), but yeah.

petrochenkov commented 8 years ago

@Manishearth Treating zero specially is not future-proof. The plans are to "compact" all enums using their invalid discriminant values, not necessarily zero. (E.g. 3 will be used as a special value for enum E { A = 0, B = 1, C = 2 }.) So, creating invalid enums through FFI is bad regardless of concrete values.

emilio commented 8 years ago

@petrochenkov I think, though, that doing layout or range optimizations on #[repr(C)] enums is kind of unfortunate.

eddyb commented 8 years ago

When matching a composite mask, Rust will go hit a hidden match path (fortunately not UB), print something about missing discriminants, and abort.

What do you mean? Using values not declared is and has been UB since long before 1.0.

Manishearth commented 8 years ago

What do you mean? Using values not declared is and has been UB since long before 1.0.

TIL, I thought it wasn't UB for C like enums. My bad.

emilio commented 7 years ago

I think this can be closed now, we have support for defining bitfield-like enums that avoids this.