ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.99k stars 2.49k forks source link

Support for using both '_' and else prongs at the same time in switch statements for non-exhaustive enums. #12250

Open Flaminator opened 2 years ago

Flaminator commented 2 years ago

Currently when using switch statements with non-exhaustive enums you are able to use it in the following 2 ways as explained in the documentation:

A switch on a non-exhaustive enum can include a '_' prong as an alternative to an else prong with the difference being that it makes it a compile error if all the known tag names are not handled by the switch.


const Enum = enum(u32)
{
A = 1,
B = 2,
C = 44,
_
};

fn somefunction(value: Enum) void { // Compiles as all tags are handled switch(value) { .A => {}, .B => {}, .C => {}, => {}, // Unnamed tags go here }

// Compiles as all tags are handled
 switch(value)
{
    .A => {},
    .B => {},
    else => {}, // Both named and unnamed tags go here
}

}

As I had not read that part of the documentation and I have only been using '_' prongs, I was expecting the following code to work as well:
```zig
fn some_other_function(value: Enum) void
{
    // Does not compile giving "error: else and '_' prong in switch expression"
    switch(value)
    {
        .A => {},
        .C => {},
        else => {},   // Named tags go here so .B in this case
        _ => {},      // Unnamed tags go here
    }
}

The idea I had in my head was that the '_' prong would catch the unnamed tags and the else prong the named tags. I was running into this when I was using the Win32 API and had written my own enum for the WM_NOTIFY values. As the part of the code I was working on did not need all of the named tags I thought I could add the else prong to filter those out.

I asked about this on IRC and more people thought it worked the way I thought it would work. So I opened this issue to see if it is viable to add something like this to the language. I have no real preference for any kind of syntax for this but the way I had written it down seemed really natural. One thing that did struck me when I was making this issue was that in my head I saw else as something for named tags and '_' for unnamed tags so it didn't even come up to me that using else in non-exhausted enums would actually also catch the unnamed tags.

mk12 commented 2 years ago

If you think of non-exhaustive enums as "Allow arbitrary u32 values in addition to the named ones," then this proposal makes sense. Even though you aren't specifically handling .B, you care about named vs. unnamed.

But the main use case for them is, "Allow me to add new named values in the future". And in particular, "Force client code to decide whether they want to be broken when I add new members". When clients write exhaustive switches, they use _ to say, "Yes, please break so I can handle the new member" or else to say "No, do not break my code." Using both here seems strange: "No, don't break my code, but let me react differently if the member was added before I compiled or after".

There are also a couple alternatives to changing the language:

I'm not necessarily opposed to your idea — you could argue it's simplifying the language rather than complicating it, by removing a barrier. My only qualm is that my intuition gets else and _ backwards here. You said:

it didn't even come up to me that using else in non-exhausted enums would actually also catch the unnamed tags.

To me it seems intuitive because else makes me think of converting the switch to an if-else chain, where it becomes the final else that handles everything. But in your proposal else is the first level catch-all, and _ is the real catch-all.

At the very least if this is accepted I think the compiler should enforce putting them in the order you showed: else, then _.

tsmanner commented 1 year ago

I was just toying with an idea after question about whether switches cases could handle using a set of comptime-known tags/values for a prong rather than requiring each value to be written out individually. The example given was a way to define punctuation characters as the return value from a comptime function (presumably so the same set of them can be reused in multiple places), and have a switch prong catch those. Allowing else and _ prongs in non-exhaustive enum switches would allow for a user-space solution to that problem by reifying a new enum type rather than returning a slice. Example:

const std = @import("std");

fn Punctuation() type {
    return @Type(.{
        .Enum = .{
            .layout = .Auto,
            .tag_type = u8,
            .fields = &[_]std.builtin.Type.EnumField{
                .{ .name = ".", .value = '.' },
                .{ .name = ",", .value = ',' },
            },
            .decls = &[_]std.builtin.Type.Declaration{},
            .is_exhaustive = false,
        }
    });
}

pub fn main() !void {
    var x: u8 = ' ';
    switch (@intToEnum(Punctuation(), x)) {
        inline else => |punc| std.debug.print("Punctuation {s}\n", .{[_]u8{@enumToInt(punc)}}),
        _ => |non_punc| std.debug.print("Non-punctuation {s}\n", .{[_]u8{@enumToInt(non_punc)}}),
    }
}

It's a little bit unfortunate that it has to use intToEnum and enumToInt all over the place, but it's also being truthful and could be cleaned up or encapsulated better, I'm sure.

Flaminator commented 1 year ago

@mk12 I only just saw your comment, in my use case as I am wrapping around an existing c api so I am indeed just using an enum to give names to an u32. Of course that also means there are values that should be in the enum that I just haven't encountered or seen yet that exist.

I don't necessarily see it as something to do with breakage and more something to do with there might be values I don't know about or that might be added later on. Adding a new tag to a normal enum would also break client code after all. I kinda forgot what the idea behind adding non-exhaustive enums was again.

I still think having else for tagged and '_' for untagged things makes the most sense. I just wasn't necessarily thinking else would also cover '_'.

hryx commented 1 year ago

I've recently wanted this feature when dealing with a collection of fairly long non-exhaustive enums. Here is an example: https://github.com/hryx/llvm-bitcode/blob/6b60fd4/src/Bitcode.zig#L352-L432

In this case, I need to switch on a subset of the enum for partial processing, then switch on a different subset for further processing. On either of these passes, it would be nice to use else to catch values which are known but irrelevant, and _ to catch values which are unknown/invalid. (Or vice-versa, keyword choice is besides the point.) As it is, I have to either duplicate the large list of values or lose the distinction between irrelevant and unknown values.

Khitiara commented 8 months ago

my biggest use case here is to use an inline else for handling known values which is presently illegal as that isnt exhaustive, and i have to fall back to an inline for and hope the optimizer realises its a switch with the _ case being after the for

kj4tmp commented 2 weeks ago

I also expected similar behavior as explained by the original poster.

In my use case, I am deserailizing a byte stream. I don't trust the byte stream so my Enum must be able to handle all possible values. I also want to make sure I get the right value of the enum.

/// Mailbox Types
///
/// Ref: IEC 61158-4-12:2019 5.6
pub const MailboxType = enum(u4) {
    /// error
    ERR = 0x00,
    /// ADS over EtherCAT (AoE)
    AoE,
    /// Ethernet over EtherCAT (EoE)
    EoE,
    /// CAN Application Protocol over EtherCAT (CoE)
    CoE,
    /// File Access over EtherCAT (FoE)
    FoE,
    /// Servo Drive Profile over EtherCAT (SoE)
    SoE,
    /// Vendor Specfic over EtherCAT (VoE)
    VoE = 0x0f,
    _,
};

pub const InContent = union(enum) {
    coe: coe.InContent,

    // TODO: implement other protocols

    /// identifiy the content of the mailbox in buffer
    fn identify(buf: []const u8) !std.meta.Tag(InContent) {
        var fbs = std.io.fixedBufferStream(buf);
        const reader = fbs.reader();
        const mbx_header = try wire.packFromECatReader(Header, reader);

        return switch (mbx_header.type) {
            .CoE => return .coe,
            _ => return error.InvalidMbxProtocol,
            else => return error.NotImplemented,
        };
    }
};