ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Proposal: require `noreturn` as backing/tag of uninstantiable `enum` #19855

Open rohlem opened 2 months ago

rohlem commented 2 months ago

Empty-AND-exhaustive enums are uninstantiable ("noreturn-like") types; see #3257, #15909, and other issues for an explanation.

In status quo (tested on 0.13.0-dev.46+3648d7df1), the compiler chooses u0 as the backing/tag type of enum{}. While that is not strictly wrong, it is the same backing/tag type the compiler chooses for an exhaustive enum with one value, enum{foo}.
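A minimal sketch of how the deduced tag types can be inspected, assuming the 0.13-era reflection names (`.Enum`; newer releases spell the field `.@"enum"`). The assertions only restate the behavior reported above; `EmptyTag` and `SingleTag` are illustrative names.

```zig
const std = @import("std");

// Tag types the compiler deduces when none is written explicitly.
const EmptyTag = @typeInfo(enum {}).Enum.tag_type;
const SingleTag = @typeInfo(enum { foo }).Enum.tag_type;

comptime {
    // Both assertions restate the behavior reported for 0.13.0-dev.46.
    std.debug.assert(EmptyTag == u0);
    std.debug.assert(SingleTag == u0);
}
```

Seen only through their tag type, the zero-state and one-state enums are indistinguishable here.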

With a manual override, the compiler actually allows you to choose any integer type as the backing/tag of an uninstantiable enum. With a non-zero-bit type, fields of such an enum even contribute to the size of the containing aggregate despite being uninstantiable, which I believe to be nonsensical behavior and potentially a bit confusing:

```zig
pub const A = struct {
    bar: enum(u8) {},
};
pub const B = packed union {
    foo: u8,
    bar: enum(u32) {},
};

comptime {
    const assert = @import("std").debug.assert;
    assert(@sizeOf(A) == 1); // uninstantiable type, so @sizeOf is not really meaningful
    assert(@sizeOf(B) == 4); // fits into a single byte, therefore could be argued that it should be 1 instead
}
```

I propose that instead, an uninstantiable enum should always have a tag type of noreturn (whether explicitly specified or compiler-deduced), which should make its nature more clear to all parties.

When manually writing an uninstantiable enum type, the tag/backing type specified in status quo is virtually meaningless. Still having an integer type specified may confuse readers, and the only reason I can think of for writing one is that the author expected an instantiable backing/tag type to make the enum itself instantiable. The proposal instead turns this into a compile error, clearly stating that either the backing/tag type needs to change to noreturn, the enum needs at least one state/field, or it needs to be made non-exhaustive.

As another small benefit, it becomes easier to account for the possibility of uninstantiable enum types in userland reflection code. Where today such code might use the backing/tag type's bit size in a calculation, like the compiler does in status quo, noreturn isn't an integer type, so it will trigger a compile error pointing directly at where the logic needs to diverge. Code that deals with this in status quo needs to special-case based on the enum having no fields AND being exhaustive, which is more complex and leads to less uniform code.
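The status-quo special case described above might look roughly like the following sketch. Both helpers (`isUninstantiableEnum`, `enumBitSize`) are hypothetical names, and the `.Enum` spelling assumes a 0.13-era compiler (newer releases use `.@"enum"`).

```zig
const std = @import("std");

/// Hypothetical helper: true when `T` is an exhaustive enum with no fields.
fn isUninstantiableEnum(comptime T: type) bool {
    return switch (@typeInfo(T)) {
        .Enum => |info| info.is_exhaustive and info.fields.len == 0,
        else => false,
    };
}

/// Hypothetical helper: bits needed to store a value of the enum type `T`.
/// Passing a non-enum type is a compile error at the `.Enum` access.
fn enumBitSize(comptime T: type) u16 {
    // Status quo: the tag type alone does not reveal uninstantiability,
    // so the check has to be written out by hand.
    if (isUninstantiableEnum(T)) return 0;
    return @bitSizeOf(@typeInfo(T).Enum.tag_type);
}
```

Under the proposal, forgetting the first branch would surface as a compile error wherever the code treats the tag type as an integer, rather than silently reporting 8 bits for `enum(u8) {}`.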

EDIT: Note that non-exhaustive enums already require specifying a backing/tag type in status quo (e.g. enum{_} leads to a compile error). Further, because _ already checks that there are unused state/field values left (enum(u1){a,b,_} errors due to this), it seems most consistent to disallow enum(noreturn){_} as well, since there is no value for _ to represent.
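To make these cases concrete, here is a small sketch (0.13-era `.Enum` naming assumed; `Open` is an illustrative name). The rejected declarations are left as comments because they do not compile, per the note above.

```zig
const std = @import("std");

// Non-exhaustive with an explicit tag type and no named states: accepted.
const Open = enum(u8) { _ };

// Rejected in status quo, as described above:
// const NoTag = enum { _ };          // non-exhaustive enum without an explicit tag type
// const Full = enum(u1) { a, b, _ }; // both u1 values are named, nothing left for `_`

comptime {
    const info = @typeInfo(Open).Enum;
    std.debug.assert(!info.is_exhaustive);
    std.debug.assert(info.fields.len == 0);
    std.debug.assert(info.tag_type == u8);
}
```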

nektro commented 2 months ago

enums are fancy named integers. sometimes a particular backing int is required for adhering to a particular abi. an enum with zero fields being backed by a u0 is just as fine as an enum with 100 fields being backed by a u7. it is not required for the field count of an enum to exhaustively fill its backing int, regardless of size.
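A short sketch of the ABI-driven case mentioned here, using a hypothetical `Color` enum that mirrors a C declaration. The backing type is chosen to match the C side, not to fit the field count.

```zig
const std = @import("std");

// Hypothetical C-facing enum: three named values, backed by c_int
// because that is what the foreign declaration is assumed to use.
const Color = enum(c_int) { red, green, blue };

comptime {
    const info = @typeInfo(Color).Enum; // `.@"enum"` on newer compilers
    std.debug.assert(info.tag_type == c_int);
    std.debug.assert(info.fields.len == 3);
    std.debug.assert(@sizeOf(Color) == @sizeOf(c_int));
}
```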

rohlem commented 2 months ago

EDIT: I realize now there may be an argument that a non-exhaustive enum without named states, enum(T){_}, could also be called "empty". I've changed the title to be more clearly about uninstantiable enums.

@nektro (Original response, assuming you included uninstantiable enums in your statement:) I'm not sure in what scenario an uninstantiable enum type being passed via an ABI makes sense - on Zig's side it has no valid value (like noreturn), so any access becomes illegal behavior. (It's then, arguably, not a fancy named integer, but a fancy named noreturn - a concept which I assume some ABIs, like system C ABIs, may not support.) For a use case where the Zig side simply doesn't care about any of the represented values I would have suggested using a non-exhaustive enum (instantiable, even without named states) - do you have an example use case in mind where using an uninstantiable enum is more sensible?
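The non-exhaustive alternative suggested here could be sketched as follows. `Handle`, `fromRaw`, and `toRaw` are hypothetical names, and the foreign side is assumed to hand over a plain 32-bit integer.

```zig
const std = @import("std");

// Hypothetical handle type: the Zig side names no states and never
// interprets the value, yet every u32 bit pattern remains valid.
const Handle = enum(u32) { _ };

fn fromRaw(raw: u32) Handle {
    return @enumFromInt(raw);
}

fn toRaw(handle: Handle) u32 {
    return @intFromEnum(handle);
}
```

Unlike `enum(u32) {}`, this type has valid values, so receiving one across an ABI boundary and passing it back is well-defined.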