ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.79k stars 2.47k forks source link

enums should disallow multiple enumerations with the same value #2115

Open tiehuis opened 5 years ago

tiehuis commented 5 years ago

@andrewrk: I re-opened this to propose reversing the decision


A C enum allows multiple values. e.g.

enum Foo {
    OPTION1 = 0,
    OPTION2 = 1,
    DEFAULT = OPTION1,
};

However, zig extern enum's do not (note: we cannot declare a member based on another member, another possibly improvement)

const Foo = extern enum {
    Option1 = 0,
    Option2 = 1,
    Default = 0,
};

test "asd" {
    const a = Foo.Default;
}

Fails with compilation

/tmp/t.zig:4:15: error: enum tag value 0 already taken
    Default = 0,
              ^
/tmp/t.zig:2:15: note: other occurrence here
    Option1 = 0,
              ^

We should allow extern enums to have the same tag value for multiple members. The specific use-case here would be for translate-c. Encountered this when trying to process gmp.h which has this enum present:

/* Available random number generation algorithms.  */
typedef enum
{
  GMP_RAND_ALG_DEFAULT = 0,
  GMP_RAND_ALG_LC = GMP_RAND_ALG_DEFAULT /* Linear congruential.  */
} gmp_randalg_t;

Alternatively for translate-c enums could be translated as sets of constants but this feels wrong. There are possibly some questions regarding switch cases and handling of duplicate values there that I haven't considered.

Hejsil commented 5 years ago

I'll add something related to this, that I've hit before. C users also uses enums for flags, so sometimes you'll see this in headers:

enum flags {
    A = 1 << 0,
    B = 1 << 1,
};
enum flags all = A | B;

translate_c will also fail to translate something like this if all is ever used.

tgschultz commented 5 years ago

It's possible to do this using definitions:

pub const Compression = extern enum(u32)
{
    Uncompressed    =   0,
    RLE8            =   1,  //8bpp only
    RLE4            =   2,  //4bpp only
    Bitfields       =   3,  //Windows, 16bpp/32bpp

    //These seem to only exist for passthrough to
    // devices like printers and don't actually
    // exist in .bmp files
    JPEG            =   4,  //Windows v3
    PNG             =   5,  //Windows v3

    //This is legit
    AlphaBitfields  =   6,  //Windows v4

    //I think these only show up in Windows Metafiles,
    // not actual .bmp files
    CMYK            =   11,
    RLE8CMYK        =   12,
    RLE4CMYK        =   13,

    //These only appear in OS/2 files
    pub const Huffman1D = @This().Bitfields; //OS/2, Monochrome only
    pub const RLE24   = @This().JPEG;        //OS/2, 24bpp only
};

However the definitions don't work as enum literals.

eira-fransham commented 5 years ago

I think that C enums should be converted to:

pub const mylib_enumname: type = struct {
    pub const A: comptime_int = 0;
    pub const B: comptime_int = 1;
    // ...
};

Since that matches how C sees them. At the moment there are plenty of C headers that cannot be imported without modification.

kkartaltepe commented 4 years ago

This also shows up in clang-c/libclang https://clang.llvm.org/doxygen/Index_8h_source.html in the CXCursor enumeration.

gruebite commented 4 years ago

Encountered this when creating bindings for Godot script API: https://github.com/GodotNativeTools/godot_headers/blob/master/nativescript/godot_nativescript.h#L45

SpexGuy commented 4 years ago

I've been thinking about the current definition of extern enum, and I've come to the conclusion that @cImport should never use it. Here's a summary of my thoughts:

Zig's C translator replaces enums with integer types of the correct size. The constant values, however, are generated as extern enum. At first glance, this means that for all code that touches the external API, the zig code must use @enumToInt when passing values and @intToEnum when receiving values. But there's a problem that arises here related to Zig's strict rules about enum semantics. Consider the case where a C function returns an error code of an enumerated type. The temptation here is to write switch(@intToEnum(returnedValue)). But this approach has the potential to introduce dangerous undefined behaviour in the event that the C API returns an unnamed value. Such a case is totally valid according to the C API, and can happen easily if a DLL is updated. This means that the generated enum values can never be used safely without wrapping them in @enumToInt. Since Zig is trying to 'beat C at its own game', this presents an ergonomics problem that should be addressed. I see two valid fixes:

Personally I like the second option better, but either of these would improve ergonomics and make interfacing with C code safer.

daurnimator commented 4 years ago

The temptation here is to write switch(@intToEnum(returnedValue)). But this approach has the potential to introduce dangerous undefined behaviour in the event that the C API returns an unnamed value.

@intToEnum should have checked behaviour (you should get the error "has no tag matching integer value"); if you expect such a thing to occur, that's the usecase for non-exhaustive enums

SpexGuy commented 4 years ago

The behaviour is checked, but only in debug and safe builds. Mismatched dll versions are the sort of thing that happens often in production and very rarely during development.

SpexGuy commented 4 years ago

Ah, I hadn't seen #2524. Having @cImport generate non-exhaustive enums solves the UB problem, but there is still a usability issue with C APIs that use flags requiring @enumToInt and @intToEnum. Defining bitwise operations on non-exhaustive enums of the same type might help but that's not really in line with the intended semantics of non-exhaustive enums. I think I'm still looking for a slightly different use case to make dealing with imported C APIs less painful.

Vexu commented 4 years ago

I think the best way extern enum can work is @SpexGuy's second option with integer operations and exhaustive enums.

Pros:

Cons:

I have enums implemented in translate-c-2 in a way that would benefit from this proposal getting accepted.

Vexu commented 4 years ago

I think the best way extern enum can work is ...

Implemented here https://github.com/Vexu/zig/commit/7353e1798f6aca80194276dd335d5225d40f76d2 for reference.

andrewrk commented 4 years ago

Here's what's accepted:

What I'm not convinced of:

Note that extern enums do not necessarily mean "compatible with the semantics of C" - they mean "compatible with the ABI of C". Enums are not a valid way to represent flags. An enum is a scalar value; flags is a set of booleans. As far as what this means for translate-c, I think the @enumToInt / @intToEnum conversions will be fine. The messiness is a good signal that this code could be better if it were converted to zig types properly, e.g. a struct with align(0) bools. One thing to note is that @intToEnum for extern enums would have no safety check, since an extern enum would allow values that are not named.

"Beating C at its own game" -> as long as the .h file does proper C types. Using enums for flags even in C is an abuse of the type system, and so I'm happy with that being awkward to deal with. The awkwardness is correct.

andrewrk commented 4 years ago

This is now implemented and in master branch. However, note the bug #3991

andrewrk commented 3 years ago

I re-opened this to push back on this decision. Here is my proposal:

Reasoning: nearly all the instances of extern enum in the standard library today are incorrect. It is not obvious that one should choose a regular enum with an explicit tag type over an extern enum.

There are real consequences for field aliases; for example trying to use std.enums.directEnumArray on an extern enum tries to run O(N^2) logic at compile time due to the possible existence of field aliases.

Enums with field aliases are a different mathematical entity, one that is more complicated to reason about, one that requires costly compromises in the compiler implementation, and one that should not be in Zig.

cryptocode commented 4 months ago

People mentioned enum flags being used in C. Here's an idiom I've been using when interacting with certain OS data structures whose enums have certain values based on other enum values:

/// Page protection flags
pub const Protection = EnumFromDecl(vm_prot_t, false, struct {
    pub const READ: u32 = 0x01;
    pub const WRITE: u32 = 0x02;
    pub const READ_WRITE: u32 = READ | WRITE;
    ...
});

This mirrors the C definition/documentation nicely, while allowing use-sites to switch on the enum:

switch (seg64.initprot) {
    ...
    .READ_WRITE => std.debug.print("Read-write case\n", .{}),
    ...
}

EnumFromDecl just walks through the declarations and produce enum fields, basically the inverse to EnumFieldStruct which already exists in the std library.

The bool parameter determines if the resulting enum should be exhaustive or not.

mlugg commented 4 months ago

@cryptocode, enum flags should be represented using packed struct.

pub const Protection = packed struct(u32) {
    read: bool, // this is the LSB
    write: bool, // and the next bit
    execute: bool, // etc
    _: u29 = 0, // remaining (padding) bits
};
cryptocode commented 4 months ago

@mlugg yeah most of time either that or just constants. But sometimes you really do want an enum, it just so happens that some fields are based on others. That is, on use-sites you wanna switch on the named combinations rather than inspecting individual flags. Narrow usecase, but when it fits it's really nice as you get the benefits of using switch (exhaustiveness being one)

Not proposing any changes to language or std, just sharing an idiom that's been useful, say, when all named flag combinations must be handled.