Closed nikomatsakis closed 5 years ago
With the current implementation this is UB - we emit !range
metadata on loads.
I think that if we want to allow "invalid" values in enums, we need to think about it more - we e.g. probably want to allow things like comparisons.
From the programmer's (and FFI bindings) perspective, it would be ideal if C-like enums with explicit #[repr(...)]
were considered valid for the entire range of the underlying int* type. It would be basically saying "the enum is allowed to have values that don't have name assigned". However, exhaustive matching would not be possible for such enums.
C++ seems to treat out-of-rage casts to enum as "unspecified values", which means after the cast it can have any valid value of the target range. Casting back to int does not have to yield the same result, it could return anything. I couldn't find the relevant part in the C standard.
It says in Annex I of N1570 (C11) that an implementation may generate warnings when
A value is given to an object of an enumerated type other than by assignment of an enumeration constant that is a member of that type, or an enumeration object that has the same type, or the value of a function that returns the same enumerated type (6.7.2.2).
I couldn't find any other mention anywhere, and I checked the lists of implementation-defined and undefined behaviors.
@RalfJung incorrect - http://eel.is/c++draft/expr.static.cast#10
A value of integral or enumeration type can be explicitly converted to a complete enumeration type. The value is unchanged if the original value is within the range of the enumeration values ([dcl.enum]). Otherwise, the behavior is undefined.
http://eel.is/c++draft/dcl.enum#8
For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type. Otherwise, for an enumeration where
e[min]
is the smallest enumerator ande[max]
is the largest, the values of the enumeration are the values in the rangeb[min]
tob[max]
, defined as follows: LetK
be1
for a two's complement representation and0
for a ones' complement or sign-magnitude representation.b[max]
is the smallest value greater than or equal tomax(|e[min]| − K, |e[max]|)
and equal to2M − 1
, whereM
is a non-negative integer.b[min]
is zero ife[min]
is non-negative and−(b[max] + K)
otherwise. The size of the smallest bit-field large enough to hold all the values of the enumeration type ismax(M, 1)
ifb[min]
is zero andM + 1
otherwise. It is possible to define an enumeration that has values not defined by any of its enumerators. If the enumerator-list is empty, the values of the enumeration are as if the enumeration had a single enumerator with value0
.
(basically, this all says that the range of an enum is enough to hold all of the variants or'd together, for bitflags)
I stand corrected. And it seems the rules changed, because the one I quoted above still says out-of-range is unspecified value, not UB. That was C++14, your link seems to be a draft of C++17. Thanks!
@RalfJung it looks like that was changed with CWG1766. Seems it was to make it non-constexpr?
AFAIU in C it's not undefined behaviour (or implementation-defined behaviour) to use values that are in range of the underlying integer type but not part of the enum. It basically is just syntactic sugar around integers and giving names to specific ones. Compiler warnings for using values outside the range of the enum were only added relatively recently (few years ago) too.
I guess it would be useful if repr(C)
enums could have the same behaviour, it makes interfacing with C libraries a bit more convenient (if you have to otherwise use integers and explicitly convert between those and the enum), or safer (if you still use repr(C)
enums but the C library could give you values outside the range of the enum defined in Rust, e.g. because a newer version added a new value in a backwards compatible way).
For C++ the story seems to be a bit different as @ubsan is quoting above.
It's also worth noting that enum
s are often not even used in C, due to their abysmal benefit. Instead a lot of enumerations are just a bunch of #define
constants.
@sdroege
I consider the Rust FFI equivalent of a C enum to be a newtype integer with a set of constants. All sorts of "semi-invalid" values are basically armed bombs waiting to UB up your program, to be touched very carefully, but with C-style enums, out-of-range values are not "invalid" in any way or form - using an "old" header with a "new" library is a to-be-explicitly-supported usecase.
I guess one question is whether repr
should really have such a fundamental effect on the type (giving up on exhaustiveness) -- that goes beyond just deciding how values are represented in memory. Maybe there should instead just be a macro to define a newtype + constants like @arielb1 mentioned.
Or it could piggyback on the non-exhaustive enum RFC that's currently (hopefully) in the process of being accepted. We could define that #[repr(C/iXX/uXX)]
+ non-exhaustive => well defined for all values of the underlying type.
I consider the Rust FFI equivalent of a C enum to be a newtype integer with a set of constants.
What's the exact meaning of repr(C)
for an enum then, if not that it is "like an enum in C and compatible with that over FFI boundaries"?
@sdroege as repr
in the attribute implies, it only affects the representation in memory, not semantics. This is true for all items which admit a repr(C)
attribute.
@sdroege as repr in the attribute implies, it only affects the representation in memory, not semantics. This is true for all items which admit a repr(C) attribute.
While that makes sense, it seems like a potential footgun, at least I expected it to also behave like in C :) Using repr(C)
enums for bindings to C libraries seems rather dangerous and not a good idea in that case, and like @arielb1 said some newtype around an integer of the correct size would seem like the better option. But what would be the point of repr(C)
enums then?
As long as #[repr(C/u*/i*)]
on enums exist without a convenient alternative, they will be used incorrectly. Newtype integer with constants is something that looks seriously out of place in Rust, and safely converting from int to enum is currently very inconvenient. It might seem that I am overly pessimistic, but I have seen way too many transmutes of external integers to be anything but.
@sdroege as repr in the attribute implies, it only affects the representation in memory, not semantics. This is true for all items which admit a repr(C) attribute.
The question is -- how does it affect representation in memory? C doesn't have tagged unions, so "this is laid out like C would" doesn't make any sense for enum
.
@sdroege you seem to misunderstand the section I quoted - this is only for enums with no fixed underlying type. Any enums with fixed underlying type are treated as if they are newtyped integers.
@RalfJung I agree. I don't think people should be using enum
s for this - they should be using unions wrapped in structs. It would be nice to be able to overload matching for types like this.
I believe that the original question stated:
if i have a repr(C) enum with values A=1, B=2.
which we only accept if none of the variants in the enum have data associated with them. So @RalfJung this question:
C doesn't have tagged unions, so "this is laid out like C would" doesn't make any sense for enum.
is sort of besides the point, I think.
That said, I think I would agree with @arielb1's original comment:
With the current implementation this is UB - we emit !range metadata on loads.
In other words, this is an error today, which presumably implies some speed win in some scenarios, and I see no reason not to keep this as UB. Enums in Rust are more meaningful than in C, where they are often used (righly or wrongly) as glorified integers. I think that's ok.
(But I do think we need to arrive at some better set of guiding principles by which to make such decisions.)
I see no reason not to keep this as UB.
I stand by my earlier comment. It's very hazardous to use C-like enums in FFI, and the danger is immensely counterintuitive to boot. Saying "eh, well, it shouldn't be used that way" is just hiding head in sand. The issue exists and it needs to be fixed somehow, even if just by finding a way to teach everyone the proper way to do enums in FFI. Personally, I'd consider pretty much any other alternative easier to achieve.
I also disagree: if you see a repr(C) enum you would intuitively assume that it behaves like an enum in C. If it doesn't (i.e. is not a glorified integer), that is surprising and can result in hard to track down bugs and maybe worse, makes repr(C) enums basically useless. What else than C FFI would you use it for? For all the other cases you can just use plain enums, or any of the other repr(X) variants if you want to ensure that it's stored as some kind of integer.
IMHO it would make sense to deprecated repr(C) enums in that case, and make definition of them a compiler warning at least.
Is it possible to mark repr(C) enum's as unsafe, so access to them will be allowed in unsafe blocks only, but keep C-like behaviour in unsafe blocks? Or maybe we should add _Unknown_(i32)
variant by default to repr(C) enum's somewhat.
This behaviour affects Prost: i32 type is used instead of enum type, because Protobuf defines that enum variable must be able to hold values outside of enum range to be compatible with future versions. As alternative solution, it's proposed to use _Unknonwn_(i32)
variant, but this solution cannot be implemented in current version of Rust, because C-like enums cannot contain tags.
In turn, this problem hurts me as user of Prost. I want to have easy to use interface, out of box serialization/deserialization to JSON, and so on.
Was there any update on this? It should probably also be mentioned in the unsafe code guidelines in the enums section.
Edit: Created an issue there https://github.com/rust-lang/unsafe-code-guidelines/issues/137
This is basically subsumed by https://github.com/rust-lang/unsafe-code-guidelines/issues/69: any enum value must have a valid discriminant value.
@sdroege asked me the following on IRC, and I .. wasn't sure of the answer:
I wasn't sure what I thought the answer should be.