Allow returning a value with an error

CurtisFenner commented 5 years ago

Sometimes when a function fails, there is extra information that you have on hand that may help the caller respond to the problem or produce a diagnostic. For example, in the parseU64 example by andrewrk here,

const ParseError = error {
    InvalidChar,
    Overflow,
};

pub fn parseU64(buf: []const u8, radix: u8) ParseError!u64 {

it would be useful for the function could return the position of the invalid character so that the caller could produce a diagnostic message.

Because Zig treats error types specially, when using errors you get a bunch of nice features, such as ! error-set inference, try/catch, and errdefer; you currently lose these features if you want to return extra diagnostic information since that information is no longer an error type.

While something like index-of-bad-character is less useful for parsing an integer, getting "bad character" with no location when parsing a 2KiB JSON blob is very frustrating! -- this is the current state of the standard library's JSON parser.

There are currently two workarounds possible today to let this extra information get out, neither of which are very ergonomic and which work against Zig's error types:

Workaround 1: Return a tagged union

You could explicitly return a tagged union that has the extra information:

const ParseError = error {
    Overflow,
}

const ParseResult = union(enum) {
    Result: u64,
    InvalidChar: usize,
}

pub fn parseU64(buf: []const u8, radix: u8) ParseError!ParseResult {

This is unfortunate in a number of ways. First, because InvalidChar is no longer an error, you cannot propagate/handle the failure with try/catch. Second, because the InvalidChar case is no longer an error, you cannot use errdefer to cleanup partially constructed state in the parser. Finally, calling the function is made messy because it can fail in two separate ways -- either in the error union, or in the explicitly returned union. This means calls that distinguish different errors (as opposed to just propagating with try) need nested switches.

Workaround 2: Write to an out parameter

You could also leave the error set alone, and instead expand the contract of parseU64 to write to an out parameter whenever it returns a InvalidChar error:

pub fn parseU64(buf: []const u8, radix: u8, invalid_char_index: *usize) ParseError!u64{

However, this makes the function's interface much messier: it now includes mutation, and it makes it impossible to indicate that it's being called in such a way that it cannot fail, since the pointer parameter is required (where previously a catch unreachable could handle). Also, it won't be immediately obvious which out parameters are associated with which errors, especially if inferred error sets are being used. In particular, it gives libraries writes the opportunity to sometimes re-use out parameters (in order to prevent function signatures from growing out of hand) and sometimes not (they at least cannot when the types aren't the same).

Proposal: Associate each error with a type

EDIT: Scroll down to a comment for a refreshed proposal. It looks essentially the same as here but with a bit more detail. The primary difference is not associating errors with value types, but an error within a particular error-set with a type. This means no changes to the anyerror type are necessary.

I propose allowing a type to be associated with each error:

const ParseError = error {
    InvalidChar: usize,
    Overflow, // equivalent to `Overflow: void`
};

pub fn parseU64(buf: []const u8, radix: u8) ParseError!u64 {
    ......
        if (digit >= radix) {
            return error.InvalidChar(index);
        }
    ......

The value returned would be available in switchs:

if (parseU64(str, 10)) |number| {
    ......
} else |err| switch (err) {
    error.Overflow => {
        ......
    },
    error.InvalidChar => |index| {
        ......
    }
}

This allows a function which can fail in multiple ways to associate different value types with different kinds of failures, or just return some plain errors that worked how they did before.

With this proposal, the caller can use inferred error sets to automatically propagate extra information, and the callsite isn't made messy with extra out-parameters/an extra non-error failure handling switch. In addition, all of the features special to errors, like errdefer and try/catch, continue to work.

Errors in the global set would now be associated with a type, so that the same error name assigned two different types would be given different error numbers.

I'm not sure what happens when you have an error set with the same name twice with different types. This could possibly be a limited case where "overloading" a single name is OK, since instantiating an error is always zero-cost, but I'll ask what others think.

I'm fairly new to Zig, so some of the details may not be quite right, but hopefully the overall concept and proposal makes sense and isn't unfixably broken.

hryx commented 5 years ago

I see potential in that. A world where error sets are just regular unions, but given all the syntax-level amenities of today's errors.

// a regular-ass type
const InvalidChar = struct {
    pos: usize,
};

// an error set containing different types
const ParseError = error {
    InvalidChar: InvalidChar,
    Overflow, // void
};

// merge like ya would today
const Error = ParseError || error{OutOfMemory};

fn f() void {
    parse(something) catch |err| switch (err) {
        .InvalidChar => |e| warn("bad character at {}", e.pos),
        .Overflow => warn("overflow"),
        .OutOfMemory => warn("out of memory"),
    };
}

Taking it further, perhaps all today's good stuff about errors could be applied to any type, not just unions. Maybe the error keyword "taints" a type as an error type. (Although, making errors non-unions would probably have too many effects on the language.)

const SomeError1 = error struct {
    val: usize,
    reason: []const u8,
};

const SomeError2 = error union(enum) {
    OutOfOrder,
    OutOfBounds,
    OutOfIdeas,
};

// today's, which is sugar for above
const SomeError3 = error {
    ResourceExhausted,
    DeadlineExceeded,
};

Because you could now "bloat" an error set with types of larger size, this might affect how strongly use of the global error set is discouraged.

daurnimator commented 5 years ago

I remember seeing this proposed before but I can't find the issue for it. Maybe it was only on IRC?

andrewrk commented 5 years ago

Thank you @CurtisFenner for a well written proposal

shawnl commented 5 years ago

This is just a tagged union.

And as they seem so useful, maybe we can add anonymous structs, so we can just use tagged unions instead of multiple return values.

Don't worry about the optimizations here. The compiler can handle that.

ghost commented 5 years ago

There's a previous issue here #572 (just for the record)

emekoi commented 5 years ago

because errors are assigned a unique value, how about allowing for tagged unions to use errors as the tag value? this would avoid adding new syntax to language and making this feature consistent with other constructs in the language. this tangentially relies on #1945.

/// stolen from above

const ParseError = union(error) {
    InvalidChar: usize,
    Overflow, // equivalent to `Overflow: void`
};

pub fn parseU64(buf: []const u8, radix: u8) ParseError!u64 {
    // ......
        if (digit >= radix) {
            return error{ .InvalidChar = index };
        }
    // ......
}

test "parseU64" {
    if (parseU64(str, 10)) |number| {
        // ......
    } else |err| switch (err) {
        error.Overflow => {
            // ......
        },
        error.InvalidChar => |index| {
            // ......
        }
    }
}

shawnl commented 5 years ago

Agreeing with @emoki I'd like some syntactic sugar for multiple arguments to an error switch, if the type is defined in the same tagged union:

/// stolen from above

const ParseError = union(error) {
    InvalidChar: InvalidCharStruct,
    Overflow, // equivalent to `Overflow: void`

    pub const InvalidCharStruct = {
        i: usize,
        o: bool,
    }
};

pub fn parseU64(buf: []const u8, radix: u8) ParseError!u64 {
    // ......
        if (digit >= radix) {
            return error{ .InvalidChar = .InvalidCharStruct{index, false} };
        }
    // ......
}

test "parseU64" {
    if (parseU64(str, 10)) |number| {
        // ......
    } else |err| switch (err) {
        error.Overflow => {
            // ......
        },
        error.InvalidChar => |index, boolish| {
            // ......
        }
    }
}

CurtisFenner commented 5 years ago

I think what @emekoi suggested is excellent, as it removes the need for extra syntax and sidesteps the issues of increasing the size of anyerror and dealing with error names assigned different types, while still enabling the core idea here!

daurnimator commented 5 years ago

return error{ .InvalidChar = index };

I assume this should be:

return ParseError{ .InvalidChar = index };

Otherwise I love the idea!

emekoi commented 5 years ago

that's what i wasn't sure about. would you still have to explicitly name the error even when using an inferred error set? or would you just use error as you normally would with an inferred error set?

ghost commented 5 years ago

Not a proposal, but something possible currently: here's a variation on OP's "Workaround 2" (the out parameter). A struct member instead of an "out" parameter. It's still not perfect, but this or Workaround 2 is still the most flexible as they make it possible to allocate memory for the error value (e.g. a formatted error message).

const Thing = struct {
    const ErrorInfo = struct {
        message: []u8,
    };

    error_info: ?ErrorInfo,

    // `allocator` could also be a parameter of an init function
    fn doSomething(self: *Thing, allocator: ...) !void {
        if (bad thing 1) {
            self.error_info = ErrorInfo {
                .message = try ...allocate a string...,
            };
            return error.ThingError;
        } else if (bad thing 2) {
            self.error_info = ErrorInfo {
                .message = try ...allocate a different string...,
            };
            return error.ThingError;
        } else {
            // happy
        }
    }
};

fn caller() void {
    var thing = Thing.init();
    defer thing.deinit(); // free allocated stuff in error_info if present

    thing.doSomething(some_allocator) catch |err| {
        switch (err) { 
            error.ThingError => {
                // this `.?` is the smelliest part of this idea
                std.debug.warn("error: {}\n", thing.error_info.?.message);
            },
            else => {
                // e.g. an OOM error from when we tried to alloc for the error message
                std.debug.warn("some other error\n");
            },
        }
        return;
    }

    std.debug.warn("success\n");
}

This might be a solution for std InStream and OutStream which currently have that annoying generic error parameter?

Also, for parsers and line numbers specifically, you don't need to include the line number in the error value itself. Just maintain it in a struct member and the caller can pull it out when catching. If these struct members aren't exclusive to failed states, then there's no smell at all here.

const Parser = struct {
    ...
    line_index: usize,

    parse(self: *Parser) !?Token {
        // continually update line_index, return a regular zig error if something goes wrong
    }
};

Tetralux commented 5 years ago

I like @emekoi's suggestion here, but I'll note that I'd like to be able to have parseU64 return !u64 and have the error type inferred, just as we do now, and still be able to do ~~return error{ .InvalidIndex = index };.~~

Tetralux commented 5 years ago

I guess it would actually be return error.InvalidChar{ .index = index }; - But that's still fine by me :)

emekoi commented 5 years ago

in your example doSomething can be cleaned up using errdefer

marler8997 commented 5 years ago

I think the issue here can be summarized by noting that zig has 2 concepts that are tied together that probably don't need to be.

Error Control Flow
Error Codes

Zig has some nice constructs that make error control flow easy to work with (try, errdefer, catch, orelse, etc). However, the only way to use them is if you return "Error Codes". If Zig provides a way to enable "Error Control Flow" with more than just "Error Codes" then applications are free to choose the best type to return error information.

Maybe Zig should be able to infer an error set that includes any type, not just error codes?

fn foo() !void{
    if (...)
        return error SomeStruct.init(...);
    if (...)
        return error.SomeErrorCode;
}

gggin commented 5 years ago

this c++ Proposal is so cool with this zig Proposal, so maybe a consider.l http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0709r3.pdf

gggin commented 5 years ago

I find that the boost library outcome also support custom types. https://ned14.github.io/outcome/tutorial/advanced/payload/copy_file2/

daurnimator commented 4 years ago

I additionally propose that coercing a union(error) to an error should be possible. That way you can still have a "returns all errors" function fn foo() anyerror!void but it only returns the error code, and not the error value.

nodefish commented 4 years ago

Is this not typically a job for some kind of interfaces, i.e. allow anything that implements IError to be used for the error control flow syntax? This would play nicely with proposals for wrapped primitives.

andrewrk commented 4 years ago

Here's a pattern that I would consider to be an alternative to this proposal:

https://github.com/ziglang/zig/blob/570973761a219cde2e7ab0b3a22fe0696a9b1e3e/lib/std/target.zig#L717-L826

I think it's quite reasonable.

Edit: here's usage example at the callsite:

https://github.com/ziglang/zig/blob/570973761a219cde2e7ab0b3a22fe0696a9b1e3e/src-self-hosted/stage2.zig#L677-L708

shawnl commented 4 years ago

The problem with returning a value with an error, is that it is the same as returning a polymorphic type, and defining that return type inside the function, instead of in the function signature. While I think we still need inferred return types (#447) for some things, this is an advanced feature, and fancier patterns, like the example, should be required in order to utilize these features.

We also need inferred types as a way of sneaking in multiple return values (through the anonymous structs we already have), which LLVM supports, but in C requires an ugly one-use-but-defined struct (and where the C ABI layout of that struct is ignored by the optimization pass).

CurtisFenner commented 4 years ago

I think it's reasonable, but I think it could be better by making minimal changes to the language. Specifically, I think Zig is already expressive enough to more tightly reflect the function's interface in its type signature; we just need to apply the right already-existing features. My two main complaints with what you can currently achieve:

IDEs cannot help you associate returned errors to particular fields on the returned struct
You cannot use a tagged union, because you want to avoid the "happy path" fields being behind a superfluous .successful case. However, using a struct instead of a tagged union means
- Understanding how the struct is populated requires reading the documentation (if it can be trusted) or otherwise the code, whereas a tagged-union is an established pattern that doesn't require you to look elsewhere
- You don't get immediate runtime checks that you are only reading from the correct field (see above); this requires boilerplate for setting defaults or setting up undefined
- You potentially waste memory in the struct layout for diagnostic fields that aren't populated in other error cases / in success

An error set like error { A, B } is essentially a enum. An error union like A!B is already essentially a tagged union, union(enum) { success: B, err: A }.

This (modified) proposal is to optionally transform the "error set" concept into an "error union" concept -- noting that a tagged union with all void field types is essentially the same thing as a enum; ie, we have just strengthened an existing concept (error sets) to another existing concept (tagged unions).

I don't think it's necessary to automatically generate the union, as emekoi suggested earlier -- we just use unions, but with tags that are errors instead of enums.

The example would look something like

pub const DiagnosticsErr = union(error) {
    UnknownCpuFeature: ?[]const u8,
    MissingArchitecture: void,
    // same as UnknownCpu: void
    UnknownCpu,
}

 pub fn parse(args: ParseOptions) !Target { // infer DiagnosticsErr!Target
    ......
    // equivalent to `return DiagnosticsErr{.MissingArchitecture={}};`
    return error.MissingArchitecture;
    ......
    return DiagnosticsErr{.UnknownCpuFeature = feature_name};
}

var result = Target.parse(.......) catch |err| switch (err) {
     error.UnknownCpuFeature => |unknown_feature_name| {
        ......
    },
    else => |e| return e,
}

I think this is a relatively small change to the language (since it re-uses existing concepts and is totally backwards compatible) to make some situations much clearer without any additional runtime/compiletime cost.

emekoi commented 4 years ago

related: #786

CurtisFenner commented 4 years ago

I'm going to restate the proposal a little more fully now that some discussion has improved it.

Currently, Zig has a concept of error sets, which are a collection of distinct error values. They are conceptually very similar to enums, except that they have globally unique values and can be cast to anyerror. Error sets can also be combined using ||. They are written like error { MyErr1, MyErr2 }.

Error sets can be combined with any type into error union types, written E!T where E is an error set and T is any type.

Instances of an error set are error values, written like error.MyError or ErrorSet.MyError.

Error unions have a few language features that make robustly handling errors easy: they trigger errdefers, and they have try and catch operators. In addition, the E part of E!T can be inferred.

I propose generalizing error sets into a new feature which I will call tagged error sets.

They might look like this:

const TaggedErrorSet = error {
    UnknownCpuFeature: ?[]const u8,
    MissingArchitecture: void,
    UnknownCpu, // same as UnknownCpu: void
    InvalidChar: u8,
};

Note that the existing syntax error {A, B, C} still works, and means the same thing as before.

A || B works as before, but requires that any errors in common to both are associated with the same type.

I propose that an instance of a tagged-error-set can be written like this:

error.InvalidChar('$')
TaggedErrorSet.InvalidChar('$')
// or perhaps
error.InvalidChar{'$'}
TaggedErrorSet.InvalidChar{'$'}

// There are other alternatives, but I find them less parallel with the existing shorthand that
// error.E
// is equivalent to
// (error {E}).E
error { .InvalidChar = 'x' }
TaggedErrorSet { .InvalidChar = 'x' }

An error union works as before. ! combines a (tagged) error set and a type. catch and try work as before.

A switch on an tagged error set gains an argument: error.InvalidChar => |c| This is the value that the tagged-error-set "constructor" was passed.

Like error unions, these tagged error sets are implemented as unions. Tag integer values are assigned as before in the global anyerror enum.

I don't believe this dramatically affects error-inference. The error sets are still just "combined" in a way analogous to ||, with an error raised if two errors mismatch their types.

Explicitly casting (using @as) one error set to another can drop (ie, replacing with void) any value type, including to anyerror. Casting from a smaller error set to a bigger error set may require doing a copy in the event that the union of one is larger than the union of the other. For that reason and others, I don't think implicit casting should be allowed when any of the errors have an associated non-empty type.

I'm currently writing code with a very many number of exit points which throw an error and are required to also set a message, and being diligent to always do both is dragging on me. I believe this proposal is a good fit for Zig because

this proposal is essentially backwards compatible (except possibly some introspection features)
this proposal only uses features that already existing in Zig (ie, tagged unions) but simply weren't a part of this part of the language before
this fits well into the "Zig zen":
- it clarifies intent by moving a common need (reporting information alongside an error) into the type system
- it reduces runtime crashes/bugs by
  - letting you use errdefer and try in more cases (when having to choose to use a union instead of an error)
  - ensuring that you are writing diagnostic information rather than leaving it uninitialized (when having to choose to use struct mutation with a returned error, like it's a tagged union)
- it provides an obvious way to return diagnostic information along with an error, replacing the need to decide between mutating structs, returning non-error unions, etc

ghost commented 4 years ago

One problem: what if a user tries to define an error with a name already used in another error set with a different payload type? With current semantics that's no problem, but under this proposal it has to be an error. The user now has to take into account every other error set in the program, and library authors have to make sure they don't use names that anyone else might want -- #include all over again.

This could be worked around by disallowing error type inference on functions that originate tagged errors, and requiring all tagged errors to be associated with an explicit error type, but then you have the same problem naming error types, unless you unpack and propagate errors manually, which is about as much work as just passing a diagnostic pointer.

I like the concept of this proposal, but I just don't think there's a way to do it cleanly.

CurtisFenner commented 4 years ago

That's a good callout @EleanorNB .

Here are my initial thoughts:

We could do nothing.

In general, adding a new error to an error set is a "breaking change" since callers need to be handle the new errors, so this would only affect a function which 'passes through' error sets it didn't define. (If it handles them [exhaustively] it's not surprising that it would be broken by addition of a new error set to one of the functions it calls) Perhaps this is uncommon enough that simply making this a caveat is OK.

In addition, this is already kind of a problem -- if two independently maintained error sets both coincidentally (i.e., without coordinating with each other) choose the same error name, a caller of a function which propagates both error sets can't distinguish between the two error cases (I am assuming that since they were chosen without coordination, they don't [necessarily] indicate the same kind of recovery must be done), which can result in incorrect error recovery. So, in a sense, failing compilation when a situation like this arises could actually be considered an improvement over the status quo, since otherwise code might 'coincidentally compile' even though it is not correctly handling a new type of error that was accidentally give the same name ("Compile errors are better than runtime crashes.")

Alternatively, we could make the type associated with the error part of its value, so error.MyErr and error.MyErr(usize) are assigned different values in the error-enum and are considered as distinct as error.Err1 and error.Err2

This would make handling errors slightly less convenient, since a switch case would need to indicate the type ('inferring' it would come back to this problem, since I would guess approximately 0 code would take advantage of any 'optional' disambiguation), but is a relatively small price to pay if this is the only issue with this proposal.

Mouvedia commented 4 years ago

This is the top 1 non-accepted proposal based on 👍

bgrundmann commented 3 years ago

Relevant language features in other languages, seem to be polymorphic variants in OCaml. And of course exceptions in OCaml and SML. Ignore the fact that exceptions are thrown in OCaml / SML and only look at how when you declare them you can extend them. Indeed the SML dialect Alice had the ability to declare new extensible datatypes https://www.ps.uni-saarland.de/alice/manual/types.html#exttype.

The trickiest part wrt implementation seems to be that all of the above languages have a GC and therefore can use a uniform representation of the error type that is probably not available as easily to ZIG that is the pointer.

But the representation of polymorphic variants in OCaml (basically a integer derived via a hash function of the name) seems to be otherwise available. And you can do the usual set lowest bit if the whole representation fits into an integer representation (that is if the error has no associated data).

Polymorphic variants in OCaml are allowed to have payloads of different types for the same name, it's just that you can't write a function that can return both. That is it is a compile error to write a function that tries to use two different payload types for the same tag.

bgrundmann commented 3 years ago

An explanation of the representation can be found here: https://dev.realworldocaml.org/runtime-memory-layout.html

happyalu commented 3 years ago

I'm very new to zig, and loving everything so far :) I did stumble when trying to return errors with more context, and found a lot of help on the discord community (thanks again!) and that also brought me to this issue. I think something like this proposal would be very helpful, especially for writing general purpose libraries.

I'm perhaps biased by error handling in go, python and c++, so it's possible my thoughts will change with more experience in zig, but this is my today's wish for errors:

error data (e.g. line_num for parse errors, invalid byte value for stream decoders, etc.) should be part of the error value, rather than a separate context that the function receives. If main() called lib1.foo() that called lib2.bar(), and both lib1 and lib2 used diagnostic contexts to return errors, lib1 would have to unwrap and sometimes rewrap all of lib2's errors. "try lib2.bar()" would otherwise just return the error value without the related context that main() might have found useful to report to the user.
It would be useful if the (error+error_data) could itself provide a format() method. This would perhaps allow more dynamic error messages without needing allocators to generate the messages.
The ability to wrap errors with additional context might also be useful for more helpful end-user error messages (e.g. "unable to save report: libcsv failed writing /path/file.csv: No space left on device")

jumpnbrownweasel commented 3 years ago

I'm not sure what happens when you have an error set with the same name twice with different types. This could possibly be a limited case where "overloading" a single name is OK, since instantiating an error is always zero-cost, but I'll ask what others think.

Would this preclude having a format method associated with individual errors, since there is no way to tell which method to use when two types are merged? A format method would be extremely useful to avoid allocation for error messages at the time the error is created, as @happyalu mentioned. But I don't know if it's possible, given the merging of errors.

CurtisFenner commented 3 years ago

@jumpnbrownweasel The revised version of the proposal linked in the original post makes clear that types shouldn't be associated with errors in the "global error set", but instead within individual error sets. In that formulation, an error set is basically just a tagged union. As long as you can identify a "field" inside the tagged union, you could get the associated type.

This comment discusses what the result of merging two error sets would be, proposing basically two alternatives:

It's illegal. This reifies into a compiler error an existing potential issue due to the non-namespaced natured of error values. In general, two sources could independently choose to use the same error name; a "generic" caller of a union of them is thus unable to differentiate between the two cases, despite them likely needing different handling.
- This behavior has the potential to "surprisingly" prevent two independent libraries from being used together. Arguably, this is good, since uncoordinated error-name reuse has a high potential to lead to error-handling bugs.
Tagged error set fields could be identified by the tuple (name, type) rather than just by name.
- In order to keep the error-tag value the same, this would require an additional "sub-tag" to disambiguate which type among several the error is. This requires some runtime computation and space overhead, but not much, and only for the rare and probably inadvisable tagged error sets which actually reuse error tag names.
- (This also provides the "feature" where an error value can be "namespaced" to a library via association with a nominal type)
- Because edge cases matter, it may make sense to require all error values not associated with void to include in their "name" the associated type.
  - This could be very inconvenient, and "feels" unnecessary 99% of the time, since such ambiguous tagged error sets are rare, probably inadvisable, and probably only created by uncommon generic manipulation of error sets.
  - Still, emphasizing edge cases is important, and naming a type is not that difficult, and I would hardly want such an "inconvenience" to block this feature.
  - This requires changes to "meta" access to tagged error sets, since access now requires both a anyerror/[]const u8 and a type.
- What you are quoting is suggesting that, as a shorthand, the type-disambiguation could be optional, at least in some cases. This is in contrast to Zig's general principles, but since it makes using one of Zig's best features -- error handling -- easier, maybe that's a reasonable trade-off.
  - An ambiguous reference would be illegal; but almost all references are not ambiguous.
  - In any case, you could always explicitly state the tag type in order to disambiguate and resolve the error.
  - "Meta" access would still need changes, and likely should require disambiguation even if literal access would not.

jumpnbrownweasel commented 3 years ago

Thanks @CurtisFenner .

Jarred-Sumner commented 3 years ago

Here's another variation of what error syntax with values could look like.

// this could define arguments passed to a formatter function for the error
// that way, the cost of formatting the string is moved to the caller if they wish to print the error
// and the allocator would be passed in that way, too.
return error.InvalidLoader("{s} is an invalid loader. Please choose from jsx, js, ts, tsx", .{loader});

// or no formatter
return error.InvalidLoader;

// or maybe a tuple value associated with it. 
return error.InvalidLoader(.{loader});

i-am-the-slime commented 3 years ago

This is an excerpt from Andrew's alternative:

         /// If error.UnknownCpuFeature is returned, this will be populated. 
         unknown_feature_name: ?[]const u8 = null,

I think the point of this proposal is to get compiler support instead of a docstring. The types in this allow for a lot of invalid state combinations like not populating unknown_feature_name or populating it when the error did not occur.

Is there a way to make the existence of the unknown_feature_name tied to having an error.UnknownCpuFeature?

ayende commented 3 years ago

Just wanted to note that this is a highly desired feature from my perspective. The ability to provide context for the error is really important to provide good user experience for the API.

The context / doc comment to provide additional information is a viable option, but that make the API much harder to express over time.

rslabbert commented 2 years ago

Conscious of cluttering this issue with too many comments, but I have a few notes to share based on a recent project. This is one of the bigger pain points right now in using the standard library and building useful APIs.

Example standard library paint point - Filesystem Operations It's very difficult to do complex filesystem operations currently without good error messages (I'm writing this comment while trying to debug a std.fs.Dir.symlink trace). I'd suggest the std.fs module could be a great testing bed for error proposals.

For example trying to use std.fs.Dir.* can result in a FileNotFound error and being able to see the path of the file, and potentially the tree of directories walked through to get to that point would make debugging much easier.

const path = "test.txt";

// error: CopyError{ .source = "/opt/src/test.txt", .destination = "/opt/dest/test.txt" }
try srcDir.copyFile(path, destDir, path, .{})

This does raise the concern around allocations, but like with a lot of std.fs functions currently, std.fs.MAX_PATH_BYTES could be used.

Error Cause / Wrapping Zig already gets some of this via error traces, but I've found the ability to wrap child errors to be enormously valuable coming from other languages (javascript, go, rust). Especially as it helps differentiate between errors being used as control flow, and errors that are actual errors.

Contrast current zig and hypothetical wrapping:

// CURRENT: big trace with difficulty identifying details
try srcDir.copyFile(path, destDir, path, .{});

// error message
"error: CopyError"
// trace
  return CopyError
  ENOTDIR => return error.NotDir
  ENOENT => return error.FileNotFound
  ENOTDIR => return error.NotDir
  ENOTDIR => return error.NotDir
  ENOENT => return error.FileNotFound

// Hypothetical: trace included, but message excludes control flow errors
try srcDir.copyFile(path, destDir, path, .{});

// error message
"error: CopyError: failed to copy file {source} to {destination}:
  FileNotFound: failed to open file {path}: 
    FileNotFound: failed to open directory {parent directory}"
// trace
  return CopyError { .source = full path, .destination = full path }
  ENOTDIR => return error.NotDir { ... } // Control flow error
  ENOENT => return error.FileNotFound { .path = full destination path }
  ENOTDIR => return error.NotDir { ... } // Control flow error
  ENOTDIR => return error.NotDir { ... } // Control flow error
  ENOENT => return error.FileNotFound { .path = destination path parent directory }

Adding a cause or format to all errors in Zig seems like it could have some potential issues with allocation and available memory. Potentially enforcing cause messages to be static, relying on the error's payload values for context so that the output message can be a easily chained series of messages, since the stdout performance here might be less important due to not being in the hot path.

Error sets / nominal types As a follow on from @CurtisFenner's comment

The caution I can see here is that if error shorthand is only a compiler error when an ambiguous reference exists, every library or standard library addition/upgrade could potentially introduce compiler issues. The ergonomic value of short hand is really valuable though, and I can see a few ways of addressing this ():

Use shadowing/scoping rules. For example:

const OpenError = @import("std").fs.File.OpenError;
// error.FileNotFound resolves to std.fs.File.OpenError

const FileOpenError = error { FileNotFound };
// error.FileNotFound resolves to FileOpenError.FileNotFound

const Struct = struct {
    const StructErrors = error { FileNotFound };
     // error.FileNotFound resolves to Struct.StructErrors.FileNotFound
};

Allow short hand for file local errors. Ambiguous compiler errors are fine inside of a file since there's no change of hidden errors cropping up. All other errors have to be absolute. For example:

const FileOpenError = error { FileNotFound };
// error.FileNotFound resolves to FileOpenError.FileNotFound

const OpenError = @import("std").fs.File.OpenError;

error.FileNotFound => ...
// Compiler error: error.FileNotFound resolves to [OpenError.FileNotFound, FileOpenError.FileNotFound], specify unambiguous reference instead

fogti commented 2 years ago

I would really prefer https://github.com/ziglang/zig/issues/2647#issuecomment-592014508 over the latest proposal, especially because mostly three features of the union(error) proposal are currently missing, and doesn't introduce any incompatibility afaik; and has no cross-library-ambiguity concerns. Missing features needed to support it are:

allow union(error), propagate the entries to the error set; currently fails like:
```
..../src/main.zig:3:26: error: expected '.', found ')'
const MyErr = union(error) {
                       ^
```
(also, does someone know why that error message on zig 0.10.0 fbe5336f3be7065ab84b797eb629ada9565341c2 does include the full path to the main.zig file, but the next error message doesn't?) I think it should be a compiler error if the union only contains void fields/kinds.
- stage1/parser.cpp: ContainerDeclType
- std/zig/Ast.zig: Node.Tag needs some way to capture union(error), either reusing existing container_decl_arg, or by introducing a new tag enum kind. I opted for not reusing for now, because it probably leads to too much overlooked code locations (I/we wouldn't benefit from exhaustiveness checks otherwise).
allow a type generated by that to be used as an error type in a ! union; currently fails like:
```
./src/main.zig:15:18: error: expected error set type, found type 'MyErrU'
pub fn martial() MyErrU!void {
               ^
```
- stage1/ir.cpp: ir_resolve_error_set_type
allow union(error) types to be combined using ||, including combining with error types (by coercing, defined below), sometimes equivalent to ++. It should be a compiler error when union types conflict (field with same name but different associated type). error types should coerce to union(error) by assuming all fields have associated type void.
- afaik seems to be handled by the same code which handles the previous case, although this might be different in the self-hosted part.

I don't think that automatic conversion from a union(error) to error is necessary, because in many cases something might need to be done to the attached data of the union. Or it might be a good idea to restrict automatic conversion to cases where no pointers are in the union; because pointers are usually attached to an allocator, and proper "deconstructing" of the union might include .deinit or allocator.free calls.

It is afaik also forward-compatible to the latest proposal, by later treating union(error) and error as equal.

The only thing which I'm unsure about that is: how should it be handled in the error backtrace.

don't print them.
print them if they don't contain any pointers (even recursively) (because we don't know the lifetime of pointers)
print them always. This is probably the preferable option (because it would really help with debugging filesystem errors), but it introduces a big problem: how to handle pointers and their lifetimes. It might be an idea to have some kind of "error allocator", such that it can be assumed that all pointers inside of an error are allocated via the error allocator (it might be necessary to recursively clone data structures for that to work). An alternative would be to format the error as soon as an union(error) -> error conversion (whether explicit or implicit) happens, so that the lifetime would then be always controlled by some kind of implicit error allocator (this is the easy way, but it introduces overhead when we don't handle+discard the error while still working with the union(error) type, but instead handle it when we only have an error, because the union(error) -> error conversion/transition might allocate).

Nevertheless, I think it would be a good idea to introduce this, perhaps with some marker flag @allowUnionError(true), although that might be unnecessary, because the syntax is distinct enough that a search for union(error) across a code base already would suffice to find all usage locations. The part about the error backtrace could be left as-is for now, such that the union data would be ignored for now, and can be added later. Especially, because this is imo really necessary for scalable error handling in large libraries.

partial impl WIP: https://github.com/zseri/zig/tree/ext-union-error

ok, it turns out that implementing it is actually a bit harder than I expected, mostly, because not everything produces an (type/comptime eval) error at places where I would expect them.

Jarred-Sumner commented 2 years ago

Some examples where returning a value with an error would be helpful:

If std.base64.Base64Decoder.decode returns an error, how do you tell the user how many bytes were successfully decoded?

https://github.com/ziglang/zig/blob/94aeb6ed2a591f0d4c21d1456cc8c2a1ba413024/lib/std/base64.zig#L189-L226

If std.fs.File.writeAll returns an error, how do you tell the user how many bytes were written successfully?

If a developer did try std.fs.openFileAbsolute(path, flags), how do you tell the user what file path failed to open successfully? You'd have to wrap every call into another function which logs errors separately.


                var body_file = std.fs.openFileAbsoluteZ(absolute_path_, .{ .mode = .read_only }) catch |err| {
                    Output.printErrorln("<r><red>{s}<r> opening file {s}", .{ @errorName(err), absolute_path });
                    Global.exit(1);
                };

scheibo commented 1 year ago

I appreciate that this is a difficult issue (I think Spec's explanation on the Zig Discord was helpful for me understanding the challenges), but I really feel the current state of things is unsatisfactory. I just spent way longer than I should have to debugging an issue of my project's build not working on Windows given that all I had to work with from the zig compiler was an error: AccessDenied and the build command that failed. When I finally gave up and switched to rewriting and then debugging things through Node the error that it returned was EBUSY and the specific path in question that Windows considered to be busy, which made the problem actually tractable[^1]. While the obvious answer here is "The Zig compiler is a work in progress and eventually we will improve our error messages using the diagnostic pattern proposed above..." (or perhaps that this is some Windows specific issue, etc), I think the fact that even the compiler can't consistently implement this pattern points to it perhaps being too manual/tedious/unergonomic/difficult to expect the Zig ecosystem at large to do the same.

(sorry for not being able to propose a concrete solution here, I just felt the need to bump this issue with a use case after a particularly unpleasant debugging experience)

[^1]: EDIT: from looking deeper at this I think maybe one problem is Zig mapping USER_MAPPED_FILE -> AccessDenied - I think if I had gotten the more specific error here I probably would have been able to solve this even without the filename, though the filename + the exact error code would obviously have been the best option

karlseguin commented 1 year ago

I think this has already been captured, but I'm dealing with error values that have resources that must be freed. I'm not sure how this would work with try.

Specifically, I've written a driver for duckdb. When failing to execute a query, duckdb provides a detailed error message. That message is valid until the underlying prepared statement is freed. Whether I dupe the error message or keep a reference to the prepared statement, I need to free something.

I'm using a Result / tagged union, but it's a bit cumbersome to consume.

VisenDev commented 1 year ago

I agree that something to this affect would be great, particularly in the instance of failed parsing, writing to file, etc...

I have also experience similar opaque error messages when parsing json with no hint of where in the json file the error occurred

ni-vzavalishin commented 7 months ago

I was thinking along the lines of @marler8997's suggestions, but from a perspective that this approach can actually solve the problem of merging conflicting error sets (which is an unsolved problem in the OP's proposal, and I find it a bit difficult to accept it to be unsolved). So, once we allow other types to be used as members of error sets, there is no problem with a conflict, since each type is globally unique in a natural way. The syntax could look e.g. like follows (I do not insist on this specific form, which is based on @marler8997's one, this is just an example)

const MyErrorSet = error {
    SomeErrorId,
    AnotherErrorId,
    error ErrorIdWithPayload,
};
const ErrorIdWithPayload = struct {
    payload_field1: []const u8,
    payload_field2: usize,
};
fn fail() MyErrorSet!void {
    return error ErrorIdWithPayload{ .payload_field1 = "", .payload_field2 = 0 };
}

fn func() void {
    fail() catch |err| switch(err) {
        error.SomeErrorId, error.AnotherErrorId => {},
        error ErrorIdWithPayload => |payload| {
            std.debug.print("{s} {}\n", .{ payload.payload_field1, payload.payload_field2 });
        }
    };
}

Notice that, differently from SomeErrorId and AnotherErrorId, the ErrorIdWithPayload is globally unique and won't clash with another identifier called ErrorIdWithPayload defined elsewhere even upon merging the error sets. It is used in exactly the same way as any other identifier. So, if func was in another file, one would have to qualify it with a namespace:

const fail_example = @import("fail_example.zig");

fn func() void {
    fail_example.fail() catch |err| switch(err) {
        error.SomeErrorId, error.AnotherErrorId => {},
        error fail_example.ErrorIdWithPayload => |payload| {
            std.debug.print("{s} {}\n", .{ payload.payload_field1, payload.payload_field2 });
        }
    };
}

ChibiBlasphem commented 4 months ago

Currently i'm doing it the Rust way. It kinda fills what I need from it but it would be better to better builtin to the language as this is, in my opinion, not totally readable and it makes 2-way of managing errors:

const std = @import("std");

fn Result(comptime O: type, comptime E: type) type {
    return union(enum) {
        Ok: O,
        Err: E,
    };
}

const CreateFolderErr = union(enum) { WriteError: []const u8 };

fn createFolders(dir: std.fs.Dir, paths: []const []const u8) Result(void, CreateFolderErr) {
    for (paths) |path| {
        dir.makeDir(path) catch {
            return .{ .Err = .{ .WriteError = path } };
        };
    }
    return .Ok;
}

pub fn main() !void {
    const paths = &[_][]const u8{
        "folder1",
        "folder1/folder1.1",
        "folder2/folder2.1",
    };
    const dir = try std.fs.cwd().openDir("tmp", .{});

    // Should print "Could not write: folder2/folder.2.1", as "folder2" does not exists
    switch (createFolders(dir, paths)) {
        .Err => |err| switch (err) {
            .WriteError => |path| std.debug.print("Could not write: {s}\n", .{path}),
        },
        .Ok => {},
    }
}

It would cleary be more readable and more the zig way to do something like this:

const std = @import("std");

const CreateFolderErr = union(error) {
    WriteError: []const u8
};

fn createFolders(dir: std.fs.Dir, paths: []const []const u8) CreateFolderErr!void {
    for (paths) |path| {
        dir.makeDir(path) catch {
            return .{ .WriteError = path };
            // Or more explicit: return error { .WriteError = path };
        };
    }
    return;
}

pub fn main() !void {
    const paths = &[_][]const u8{
        "folder1",
        "folder1/folder1.1",
        "folder2/folder2.1",
    };
    const dir = try std.fs.cwd().openDir("tmp", .{});

    // Should print "Could not write: folder2/folder.2.1", as "folder2" does not exists
    createFolders(dir, paths) catch |err| switch (err) {
        .WriteError => |path| std.debug.print("Could not write: {s}\n", .{path}),
    };
}

ericlangedijk commented 1 month ago

I just started with Zig and as far as I can see now, the error handling is absolutely perfect and extremely elegant.

(I tried Rust for a while, where error handling is insanely complicated and hardly usable, in my opinion).

For more complicated error handling, where feedback (which can be a lot) is needed, maybe it is better to leave this to the application or library programmers to handle in a creative way.

ziglang / zig