Open CurtisFenner opened 5 years ago
I see potential in that. A world where error sets are just regular unions, but given all the syntax-level amenities of today's errors.
// a regular-ass type
const InvalidChar = struct {
pos: usize,
};
// an error set containing different types
const ParseError = error {
InvalidChar: InvalidChar,
Overflow, // void
};
// merge like ya would today
const Error = ParseError || error{OutOfMemory};
fn f() void {
parse(something) catch |err| switch (err) {
.InvalidChar => |e| warn("bad character at {}", e.pos),
.Overflow => warn("overflow"),
.OutOfMemory => warn("out of memory"),
};
}
Taking it further, perhaps all today's good stuff about errors could be applied to any type, not just unions. Maybe the error
keyword "taints" a type as an error type. (Although, making errors non-unions would probably have too many effects on the language.)
const SomeError1 = error struct {
val: usize,
reason: []const u8,
};
const SomeError2 = error union(enum) {
OutOfOrder,
OutOfBounds,
OutOfIdeas,
};
// today's, which is sugar for above
const SomeError3 = error {
ResourceExhausted,
DeadlineExceeded,
};
Because you could now "bloat" an error set with types of larger size, this might affect how strongly use of the global error set is discouraged.
I remember seeing this proposed before but I can't find the issue for it. Maybe it was only on IRC?
Thank you @CurtisFenner for a well written proposal
This is just a tagged union.
And as they seem so useful, maybe we can add anonymous structs, so we can just use tagged unions instead of multiple return values.
Don't worry about the optimizations here. The compiler can handle that.
There's a previous issue here #572 (just for the record)
because errors are assigned a unique value, how about allowing for tagged unions to use errors as the tag value? this would avoid adding new syntax to language and making this feature consistent with other constructs in the language. this tangentially relies on #1945.
/// stolen from above
const ParseError = union(error) {
InvalidChar: usize,
Overflow, // equivalent to `Overflow: void`
};
pub fn parseU64(buf: []const u8, radix: u8) ParseError!u64 {
// ......
if (digit >= radix) {
return error{ .InvalidChar = index };
}
// ......
}
test "parseU64" {
if (parseU64(str, 10)) |number| {
// ......
} else |err| switch (err) {
error.Overflow => {
// ......
},
error.InvalidChar => |index| {
// ......
}
}
}
Agreeing with @emoki I'd like some syntactic sugar for multiple arguments to an error switch, if the type is defined in the same tagged union:
/// stolen from above
const ParseError = union(error) {
InvalidChar: InvalidCharStruct,
Overflow, // equivalent to `Overflow: void`
pub const InvalidCharStruct = {
i: usize,
o: bool,
}
};
pub fn parseU64(buf: []const u8, radix: u8) ParseError!u64 {
// ......
if (digit >= radix) {
return error{ .InvalidChar = .InvalidCharStruct{index, false} };
}
// ......
}
test "parseU64" {
if (parseU64(str, 10)) |number| {
// ......
} else |err| switch (err) {
error.Overflow => {
// ......
},
error.InvalidChar => |index, boolish| {
// ......
}
}
}
I think what @emekoi suggested is excellent, as it removes the need for extra syntax and sidesteps the issues of increasing the size of anyerror
and dealing with error names assigned different types, while still enabling the core idea here!
return error{ .InvalidChar = index };
I assume this should be:
return ParseError{ .InvalidChar = index };
Otherwise I love the idea!
that's what i wasn't sure about. would you still have to explicitly name the error even when using an inferred error set? or would you just use error
as you normally would with an inferred error set?
Not a proposal, but something possible currently: here's a variation on OP's "Workaround 2" (the out parameter). A struct member instead of an "out" parameter. It's still not perfect, but this or Workaround 2 is still the most flexible as they make it possible to allocate memory for the error value (e.g. a formatted error message).
const Thing = struct {
const ErrorInfo = struct {
message: []u8,
};
error_info: ?ErrorInfo,
// `allocator` could also be a parameter of an init function
fn doSomething(self: *Thing, allocator: ...) !void {
if (bad thing 1) {
self.error_info = ErrorInfo {
.message = try ...allocate a string...,
};
return error.ThingError;
} else if (bad thing 2) {
self.error_info = ErrorInfo {
.message = try ...allocate a different string...,
};
return error.ThingError;
} else {
// happy
}
}
};
fn caller() void {
var thing = Thing.init();
defer thing.deinit(); // free allocated stuff in error_info if present
thing.doSomething(some_allocator) catch |err| {
switch (err) {
error.ThingError => {
// this `.?` is the smelliest part of this idea
std.debug.warn("error: {}\n", thing.error_info.?.message);
},
else => {
// e.g. an OOM error from when we tried to alloc for the error message
std.debug.warn("some other error\n");
},
}
return;
}
std.debug.warn("success\n");
}
This might be a solution for std InStream and OutStream which currently have that annoying generic error parameter?
Also, for parsers and line numbers specifically, you don't need to include the line number in the error value itself. Just maintain it in a struct member and the caller can pull it out when catching. If these struct members aren't exclusive to failed states, then there's no smell at all here.
const Parser = struct {
...
line_index: usize,
parse(self: *Parser) !?Token {
// continually update line_index, return a regular zig error if something goes wrong
}
};
I like @emekoi's suggestion here, but I'll note that I'd like to be able to have parseU64
return !u64
and have the error type inferred, just as we do now, and still be able to do
return error{ .InvalidIndex = index };
.
I guess it would actually be return error.InvalidChar{ .index = index };
- But that's still fine by me :)
in your example doSomething
can be cleaned up using errdefer
I think the issue here can be summarized by noting that zig has 2 concepts that are tied together that probably don't need to be.
Zig has some nice constructs that make error control flow easy to work with (try
, errdefer
, catch
, orelse
, etc). However, the only way to use them is if you return "Error Codes". If Zig provides a way to enable "Error Control Flow" with more than just "Error Codes" then applications are free to choose the best type to return error information.
Maybe Zig should be able to infer an error set that includes any type, not just error codes?
fn foo() !void{
if (...)
return error SomeStruct.init(...);
if (...)
return error.SomeErrorCode;
}
this c++ Proposal is so cool with this zig Proposal, so maybe a consider.l http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0709r3.pdf
I find that the boost library outcome
also support custom types.
https://ned14.github.io/outcome/tutorial/advanced/payload/copy_file2/
I additionally propose that coercing a union(error)
to an error should be possible. That way you can still have a "returns all errors" function fn foo() anyerror!void
but it only returns the error code, and not the error value.
Is this not typically a job for some kind of interfaces, i.e. allow anything that implements IError
to be used for the error control flow syntax? This would play nicely with proposals for wrapped primitives.
Here's a pattern that I would consider to be an alternative to this proposal:
I think it's quite reasonable.
Edit: here's usage example at the callsite:
The problem with returning a value with an error, is that it is the same as returning a polymorphic type, and defining that return type inside the function, instead of in the function signature. While I think we still need inferred return types (#447) for some things, this is an advanced feature, and fancier patterns, like the example, should be required in order to utilize these features.
We also need inferred types as a way of sneaking in multiple return values (through the anonymous structs we already have), which LLVM supports, but in C requires an ugly one-use-but-defined struct (and where the C ABI layout of that struct is ignored by the optimization pass).
I think it's reasonable, but I think it could be better by making minimal changes to the language. Specifically, I think Zig is already expressive enough to more tightly reflect the function's interface in its type signature; we just need to apply the right already-existing features. My two main complaints with what you can currently achieve:
error
s to particular fields on the returned struct.successful
case. However, using a struct
instead of a tagged union means
undefined
struct
layout for diagnostic fields that aren't populated in other error cases / in successAn error set like error { A, B }
is essentially a enum
.
An error union like A!B
is already essentially a tagged union, union(enum) { success: B, err: A }
.
This (modified) proposal is to optionally transform the "error set" concept into an "error union" concept -- noting that a tagged union with all void field types is essentially the same thing as a enum; ie, we have just strengthened an existing concept (error sets) to another existing concept (tagged unions).
I don't think it's necessary to automatically generate the union, as emekoi suggested earlier -- we just use union
s, but with tags that are error
s instead of enum
s.
The example would look something like
pub const DiagnosticsErr = union(error) {
UnknownCpuFeature: ?[]const u8,
MissingArchitecture: void,
// same as UnknownCpu: void
UnknownCpu,
}
pub fn parse(args: ParseOptions) !Target { // infer DiagnosticsErr!Target
......
// equivalent to `return DiagnosticsErr{.MissingArchitecture={}};`
return error.MissingArchitecture;
......
return DiagnosticsErr{.UnknownCpuFeature = feature_name};
}
var result = Target.parse(.......) catch |err| switch (err) {
error.UnknownCpuFeature => |unknown_feature_name| {
......
},
else => |e| return e,
}
I think this is a relatively small change to the language (since it re-uses existing concepts and is totally backwards compatible) to make some situations much clearer without any additional runtime/compiletime cost.
related: #786
I'm going to restate the proposal a little more fully now that some discussion has improved it.
Currently, Zig has a concept of error sets, which are a collection of distinct error
values. They are conceptually very similar to enum
s, except that they have globally unique values and can be cast to anyerror
. Error sets can also be combined using ||
. They are written like error { MyErr1, MyErr2 }
.
Error sets can be combined with any type into error union types, written E!T
where E
is an error set and T
is any type.
Instances of an error set are error
values, written like error.MyError
or ErrorSet.MyError
.
Error unions have a few language features that make robustly handling errors easy: they trigger errdefer
s, and they have try
and catch
operators. In addition, the E
part of E!T
can be inferred.
I propose generalizing error sets into a new feature which I will call tagged error sets.
They might look like this:
const TaggedErrorSet = error {
UnknownCpuFeature: ?[]const u8,
MissingArchitecture: void,
UnknownCpu, // same as UnknownCpu: void
InvalidChar: u8,
};
Note that the existing syntax error {A, B, C}
still works, and means the same thing as before.
A || B
works as before, but requires that any errors in common to both are associated with the same type.
I propose that an instance of a tagged-error-set can be written like this:
error.InvalidChar('$')
TaggedErrorSet.InvalidChar('$')
// or perhaps
error.InvalidChar{'$'}
TaggedErrorSet.InvalidChar{'$'}
// There are other alternatives, but I find them less parallel with the existing shorthand that
// error.E
// is equivalent to
// (error {E}).E
error { .InvalidChar = 'x' }
TaggedErrorSet { .InvalidChar = 'x' }
An error union works as before. !
combines a (tagged) error set and a type. catch
and try
work as before.
A switch
on an tagged error set gains an argument: error.InvalidChar => |c|
This is the value that the tagged-error-set "constructor" was passed.
Like error unions, these tagged error sets are implemented as unions. Tag integer values are assigned as before in the global anyerror
enum.
I don't believe this dramatically affects error-inference. The error sets are still just "combined" in a way analogous to ||
, with an error raised if two errors mismatch their types.
Explicitly casting (using @as
) one error set to another can drop (ie, replacing with void
) any value type, including to anyerror
. Casting from a smaller error set to a bigger error set may require doing a copy in the event that the union of one is larger than the union of the other. For that reason and others, I don't think implicit casting should be allowed when any of the errors have an associated non-empty type.
I'm currently writing code with a very many number of exit points which throw an error and are required to also set a message, and being diligent to always do both is dragging on me. I believe this proposal is a good fit for Zig because
errdefer
and try
in more cases (when having to choose to use a union instead of an error)One problem: what if a user tries to define an error with a name already used in another error set with a different payload type? With current semantics that's no problem, but under this proposal it has to be an error. The user now has to take into account every other error set in the program, and library authors have to make sure they don't use names that anyone else might want -- #include
all over again.
This could be worked around by disallowing error type inference on functions that originate tagged errors, and requiring all tagged errors to be associated with an explicit error type, but then you have the same problem naming error types, unless you unpack and propagate errors manually, which is about as much work as just passing a diagnostic pointer.
I like the concept of this proposal, but I just don't think there's a way to do it cleanly.
That's a good callout @EleanorNB .
Here are my initial thoughts:
We could do nothing.
In general, adding a new error to an error set is a "breaking change" since callers need to be handle the new errors, so this would only affect a function which 'passes through' error sets it didn't define. (If it handles them [exhaustively] it's not surprising that it would be broken by addition of a new error set to one of the functions it calls) Perhaps this is uncommon enough that simply making this a caveat is OK.
In addition, this is already kind of a problem -- if two independently maintained error sets both coincidentally (i.e., without coordinating with each other) choose the same error name, a caller of a function which propagates both error sets can't distinguish between the two error cases (I am assuming that since they were chosen without coordination, they don't [necessarily] indicate the same kind of recovery must be done), which can result in incorrect error recovery. So, in a sense, failing compilation when a situation like this arises could actually be considered an improvement over the status quo, since otherwise code might 'coincidentally compile' even though it is not correctly handling a new type of error that was accidentally give the same name ("Compile errors are better than runtime crashes.")
Alternatively, we could make the type associated with the error part of its value, so error.MyErr
and error.MyErr(usize)
are assigned different values in the error-enum and are considered as distinct as error.Err1
and error.Err2
This would make handling errors slightly less convenient, since a switch
case would need to indicate the type ('inferring' it would come back to this problem, since I would guess approximately 0 code would take advantage of any 'optional' disambiguation), but is a relatively small price to pay if this is the only issue with this proposal.
This is the top 1 non-accepted proposal based on 👍
Relevant language features in other languages, seem to be polymorphic variants in OCaml. And of course exceptions in OCaml and SML. Ignore the fact that exceptions are thrown in OCaml / SML and only look at how when you declare them you can extend them. Indeed the SML dialect Alice had the ability to declare new extensible datatypes https://www.ps.uni-saarland.de/alice/manual/types.html#exttype.
The trickiest part wrt implementation seems to be that all of the above languages have a GC and therefore can use a uniform representation of the error type that is probably not available as easily to ZIG that is the pointer.
But the representation of polymorphic variants in OCaml (basically a integer derived via a hash function of the name) seems to be otherwise available. And you can do the usual set lowest bit if the whole representation fits into an integer representation (that is if the error has no associated data).
Polymorphic variants in OCaml are allowed to have payloads of different types for the same name, it's just that you can't write a function that can return both. That is it is a compile error to write a function that tries to use two different payload types for the same tag.
An explanation of the representation can be found here: https://dev.realworldocaml.org/runtime-memory-layout.html
I'm very new to zig, and loving everything so far :) I did stumble when trying to return errors with more context, and found a lot of help on the discord community (thanks again!) and that also brought me to this issue. I think something like this proposal would be very helpful, especially for writing general purpose libraries.
I'm perhaps biased by error handling in go, python and c++, so it's possible my thoughts will change with more experience in zig, but this is my today's wish for errors:
error data (e.g. line_num for parse errors, invalid byte value for stream decoders, etc.) should be part of the error value, rather than a separate context that the function receives. If main() called lib1.foo() that called lib2.bar(), and both lib1 and lib2 used diagnostic contexts to return errors, lib1 would have to unwrap and sometimes rewrap all of lib2's errors. "try lib2.bar()" would otherwise just return the error value without the related context that main() might have found useful to report to the user.
It would be useful if the (error+error_data) could itself provide a format() method. This would perhaps allow more dynamic error messages without needing allocators to generate the messages.
The ability to wrap errors with additional context might also be useful for more helpful end-user error messages (e.g. "unable to save report: libcsv failed writing /path/file.csv: No space left on device")
I'm not sure what happens when you have an error set with the same name twice with different types. This could possibly be a limited case where "overloading" a single name is OK, since instantiating an error is always zero-cost, but I'll ask what others think.
Would this preclude having a format method associated with individual errors, since there is no way to tell which method to use when two types are merged? A format method would be extremely useful to avoid allocation for error messages at the time the error is created, as @happyalu mentioned. But I don't know if it's possible, given the merging of errors.
@jumpnbrownweasel The revised version of the proposal linked in the original post makes clear that types shouldn't be associated with errors in the "global error set", but instead within individual error sets. In that formulation, an error set is basically just a tagged union. As long as you can identify a "field" inside the tagged union, you could get the associated type.
This comment discusses what the result of merging two error sets would be, proposing basically two alternatives:
anyerror
/[]const u8
and a type
.Thanks @CurtisFenner .
Here's another variation of what error syntax with values could look like.
// this could define arguments passed to a formatter function for the error
// that way, the cost of formatting the string is moved to the caller if they wish to print the error
// and the allocator would be passed in that way, too.
return error.InvalidLoader("{s} is an invalid loader. Please choose from jsx, js, ts, tsx", .{loader});
// or no formatter
return error.InvalidLoader;
// or maybe a tuple value associated with it.
return error.InvalidLoader(.{loader});
This is an excerpt from Andrew's alternative:
/// If error.UnknownCpuFeature is returned, this will be populated.
unknown_feature_name: ?[]const u8 = null,
I think the point of this proposal is to get compiler support instead of a docstring. The types in this allow for a lot of invalid state combinations like not populating unknown_feature_name
or populating it when the error did not occur.
Is there a way to make the existence of the unknown_feature_name
tied to having an error.UnknownCpuFeature
?
Just wanted to note that this is a highly desired feature from my perspective. The ability to provide context for the error is really important to provide good user experience for the API.
The context / doc comment to provide additional information is a viable option, but that make the API much harder to express over time.
Conscious of cluttering this issue with too many comments, but I have a few notes to share based on a recent project. This is one of the bigger pain points right now in using the standard library and building useful APIs.
Example standard library paint point - Filesystem Operations
It's very difficult to do complex filesystem operations currently without good error messages (I'm writing this comment while trying to debug a std.fs.Dir.symlink
trace). I'd suggest the std.fs
module could be a great testing bed for error proposals.
For example trying to use std.fs.Dir.*
can result in a FileNotFound
error and being able to see the path of the file, and potentially the tree of directories walked through to get to that point would make debugging much easier.
const path = "test.txt";
// error: CopyError{ .source = "/opt/src/test.txt", .destination = "/opt/dest/test.txt" }
try srcDir.copyFile(path, destDir, path, .{})
This does raise the concern around allocations, but like with a lot of std.fs
functions currently, std.fs.MAX_PATH_BYTES
could be used.
Error Cause / Wrapping Zig already gets some of this via error traces, but I've found the ability to wrap child errors to be enormously valuable coming from other languages (javascript, go, rust). Especially as it helps differentiate between errors being used as control flow, and errors that are actual errors.
Contrast current zig and hypothetical wrapping:
// CURRENT: big trace with difficulty identifying details
try srcDir.copyFile(path, destDir, path, .{});
// error message
"error: CopyError"
// trace
return CopyError
ENOTDIR => return error.NotDir
ENOENT => return error.FileNotFound
ENOTDIR => return error.NotDir
ENOTDIR => return error.NotDir
ENOENT => return error.FileNotFound
// Hypothetical: trace included, but message excludes control flow errors
try srcDir.copyFile(path, destDir, path, .{});
// error message
"error: CopyError: failed to copy file {source} to {destination}:
FileNotFound: failed to open file {path}:
FileNotFound: failed to open directory {parent directory}"
// trace
return CopyError { .source = full path, .destination = full path }
ENOTDIR => return error.NotDir { ... } // Control flow error
ENOENT => return error.FileNotFound { .path = full destination path }
ENOTDIR => return error.NotDir { ... } // Control flow error
ENOTDIR => return error.NotDir { ... } // Control flow error
ENOENT => return error.FileNotFound { .path = destination path parent directory }
Adding a cause
or format
to all errors in Zig seems like it could have some potential issues with allocation and available memory. Potentially enforcing cause messages to be static, relying on the error's payload values for context so that the output message can be a easily chained series of messages, since the stdout performance here might be less important due to not being in the hot path.
Error sets / nominal types As a follow on from @CurtisFenner's comment
The caution I can see here is that if error shorthand is only a compiler error when an ambiguous reference exists, every library or standard library addition/upgrade could potentially introduce compiler issues. The ergonomic value of short hand is really valuable though, and I can see a few ways of addressing this ():
Use shadowing/scoping rules. For example:
const OpenError = @import("std").fs.File.OpenError;
// error.FileNotFound resolves to std.fs.File.OpenError
const FileOpenError = error { FileNotFound };
// error.FileNotFound resolves to FileOpenError.FileNotFound
const Struct = struct {
const StructErrors = error { FileNotFound };
// error.FileNotFound resolves to Struct.StructErrors.FileNotFound
};
Allow short hand for file local errors. Ambiguous compiler errors are fine inside of a file since there's no change of hidden errors cropping up. All other errors have to be absolute. For example:
const FileOpenError = error { FileNotFound };
// error.FileNotFound resolves to FileOpenError.FileNotFound
const OpenError = @import("std").fs.File.OpenError;
error.FileNotFound => ...
// Compiler error: error.FileNotFound resolves to [OpenError.FileNotFound, FileOpenError.FileNotFound], specify unambiguous reference instead
I would really prefer https://github.com/ziglang/zig/issues/2647#issuecomment-592014508 over the latest proposal, especially because mostly three features of the union(error)
proposal are currently missing, and doesn't introduce any incompatibility afaik; and has no cross-library-ambiguity concerns.
Missing features needed to support it are:
union(error)
, propagate the entries to the error
set; currently fails like:
..../src/main.zig:3:26: error: expected '.', found ')'
const MyErr = union(error) {
^
(also, does someone know why that error message on zig 0.10.0 fbe5336f3be7065ab84b797eb629ada9565341c2 does include the full path to the main.zig
file, but the next error message doesn't?)
I think it should be a compiler error if the union
only contains void
fields/kinds.
stage1/parser.cpp
: ContainerDeclType
std/zig/Ast.zig
: Node.Tag
needs some way to capture union(error)
, either reusing existing container_decl_arg
, or by introducing a new tag enum kind. I opted for not reusing for now, because it probably leads to too much overlooked code locations (I/we wouldn't benefit from exhaustiveness checks otherwise).!
union; currently fails like:
./src/main.zig:15:18: error: expected error set type, found type 'MyErrU'
pub fn martial() MyErrU!void {
^
stage1/ir.cpp
: ir_resolve_error_set_type
union(error)
types to be combined using ||
, including combining with error
types (by coercing, defined below), sometimes equivalent to ++
. It should be a compiler error when union types conflict (field with same name but different associated type). error
types should coerce to union(error)
by assuming all fields have associated type void
.
I don't think that automatic conversion from a union(error)
to error
is necessary, because in many cases something might need to be done to the attached data of the union. Or it might be a good idea to restrict automatic conversion to cases where no pointers are in the union; because pointers are usually attached to an allocator, and proper "deconstructing" of the union might include .deinit
or allocator.free
calls.
It is afaik also forward-compatible to the latest proposal, by later treating union(error)
and error
as equal.
The only thing which I'm unsure about that is: how should it be handled in the error backtrace.
error allocator
(it might be necessary to recursively clone data structures for that to work). An alternative would be to format the error as soon as an union(error)
-> error
conversion (whether explicit or implicit) happens, so that the lifetime would then be always controlled by some kind of implicit error allocator
(this is the easy way, but it introduces overhead when we don't handle+discard the error while still working with the union(error)
type, but instead handle it when we only have an error
, because the union(error) -> error
conversion/transition might allocate).Nevertheless, I think it would be a good idea to introduce this, perhaps with some marker flag @allowUnionError(true)
, although that might be unnecessary, because the syntax is distinct enough that a search for union(error)
across a code base already would suffice to find all usage locations. The part about the error backtrace could be left as-is for now, such that the union
data would be ignored for now, and can be added later. Especially, because this is imo really necessary for scalable error handling in large libraries.
partial impl WIP: https://github.com/zseri/zig/tree/ext-union-error
ok, it turns out that implementing it is actually a bit harder than I expected, mostly, because not everything produces an (type/comptime eval) error at places where I would expect them.
Some examples where returning a value with an error would be helpful:
If std.base64.Base64Decoder.decode
returns an error, how do you tell the user how many bytes were successfully decoded?
If std.fs.File.writeAll
returns an error, how do you tell the user how many bytes were written successfully?
If a developer did try std.fs.openFileAbsolute(path, flags)
, how do you tell the user what file path failed to open successfully? You'd have to wrap every call into another function which logs errors separately.
var body_file = std.fs.openFileAbsoluteZ(absolute_path_, .{ .mode = .read_only }) catch |err| {
Output.printErrorln("<r><red>{s}<r> opening file {s}", .{ @errorName(err), absolute_path });
Global.exit(1);
};
I appreciate that this is a difficult issue (I think Spec's explanation on the Zig Discord was helpful for me understanding the challenges), but I really feel the current state of things is unsatisfactory. I just spent way longer than I should have to debugging an issue of my project's build not working on Windows given that all I had to work with from the zig compiler was an error: AccessDenied
and the build command that failed. When I finally gave up and switched to rewriting and then debugging things through Node the error that it returned was EBUSY
and the specific path in question that Windows considered to be busy, which made the problem actually tractable[^1]. While the obvious answer here is "The Zig compiler is a work in progress and eventually we will improve our error messages using the diagnostic pattern proposed above..." (or perhaps that this is some Windows specific issue, etc), I think the fact that even the compiler can't consistently implement this pattern points to it perhaps being too manual/tedious/unergonomic/difficult to expect the Zig ecosystem at large to do the same.
(sorry for not being able to propose a concrete solution here, I just felt the need to bump this issue with a use case after a particularly unpleasant debugging experience)
[^1]: EDIT: from looking deeper at this I think maybe one problem is Zig mapping USER_MAPPED_FILE -> AccessDenied
- I think if I had gotten the more specific error here I probably would have been able to solve this even without the filename, though the filename + the exact error code would obviously have been the best option
I think this has already been captured, but I'm dealing with error values that have resources that must be freed. I'm not sure how this would work with try
.
Specifically, I've written a driver for duckdb. When failing to execute a query, duckdb provides a detailed error message. That message is valid until the underlying prepared statement is freed. Whether I dupe the error message or keep a reference to the prepared statement, I need to free something.
I'm using a Result / tagged union, but it's a bit cumbersome to consume.
I agree that something to this affect would be great, particularly in the instance of failed parsing, writing to file, etc...
I have also experience similar opaque error messages when parsing json with no hint of where in the json file the error occurred
I was thinking along the lines of @marler8997's suggestions, but from a perspective that this approach can actually solve the problem of merging conflicting error sets (which is an unsolved problem in the OP's proposal, and I find it a bit difficult to accept it to be unsolved). So, once we allow other types to be used as members of error sets, there is no problem with a conflict, since each type is globally unique in a natural way. The syntax could look e.g. like follows (I do not insist on this specific form, which is based on @marler8997's one, this is just an example)
const MyErrorSet = error {
SomeErrorId,
AnotherErrorId,
error ErrorIdWithPayload,
};
const ErrorIdWithPayload = struct {
payload_field1: []const u8,
payload_field2: usize,
};
fn fail() MyErrorSet!void {
return error ErrorIdWithPayload{ .payload_field1 = "", .payload_field2 = 0 };
}
fn func() void {
fail() catch |err| switch(err) {
error.SomeErrorId, error.AnotherErrorId => {},
error ErrorIdWithPayload => |payload| {
std.debug.print("{s} {}\n", .{ payload.payload_field1, payload.payload_field2 });
}
};
}
Notice that, differently from SomeErrorId
and AnotherErrorId
, the ErrorIdWithPayload
is globally unique and won't clash with another identifier called ErrorIdWithPayload
defined elsewhere even upon merging the error sets. It is used in exactly the same way as any other identifier. So, if func
was in another file, one would have to qualify it with a namespace:
const fail_example = @import("fail_example.zig");
fn func() void {
fail_example.fail() catch |err| switch(err) {
error.SomeErrorId, error.AnotherErrorId => {},
error fail_example.ErrorIdWithPayload => |payload| {
std.debug.print("{s} {}\n", .{ payload.payload_field1, payload.payload_field2 });
}
};
}
Currently i'm doing it the Rust way. It kinda fills what I need from it but it would be better to better builtin to the language as this is, in my opinion, not totally readable and it makes 2-way of managing errors:
const std = @import("std");
fn Result(comptime O: type, comptime E: type) type {
return union(enum) {
Ok: O,
Err: E,
};
}
const CreateFolderErr = union(enum) { WriteError: []const u8 };
fn createFolders(dir: std.fs.Dir, paths: []const []const u8) Result(void, CreateFolderErr) {
for (paths) |path| {
dir.makeDir(path) catch {
return .{ .Err = .{ .WriteError = path } };
};
}
return .Ok;
}
pub fn main() !void {
const paths = &[_][]const u8{
"folder1",
"folder1/folder1.1",
"folder2/folder2.1",
};
const dir = try std.fs.cwd().openDir("tmp", .{});
// Should print "Could not write: folder2/folder.2.1", as "folder2" does not exists
switch (createFolders(dir, paths)) {
.Err => |err| switch (err) {
.WriteError => |path| std.debug.print("Could not write: {s}\n", .{path}),
},
.Ok => {},
}
}
It would cleary be more readable and more the zig way to do something like this:
const std = @import("std");
const CreateFolderErr = union(error) {
WriteError: []const u8
};
fn createFolders(dir: std.fs.Dir, paths: []const []const u8) CreateFolderErr!void {
for (paths) |path| {
dir.makeDir(path) catch {
return .{ .WriteError = path };
// Or more explicit: return error { .WriteError = path };
};
}
return;
}
pub fn main() !void {
const paths = &[_][]const u8{
"folder1",
"folder1/folder1.1",
"folder2/folder2.1",
};
const dir = try std.fs.cwd().openDir("tmp", .{});
// Should print "Could not write: folder2/folder.2.1", as "folder2" does not exists
createFolders(dir, paths) catch |err| switch (err) {
.WriteError => |path| std.debug.print("Could not write: {s}\n", .{path}),
};
}
I just started with Zig and as far as I can see now, the error handling is absolutely perfect and extremely elegant.
(I tried Rust for a while, where error handling is insanely complicated and hardly usable, in my opinion).
For more complicated error handling, where feedback (which can be a lot) is needed, maybe it is better to leave this to the application or library programmers to handle in a creative way.
Sometimes when a function fails, there is extra information that you have on hand that may help the caller respond to the problem or produce a diagnostic. For example, in the
parseU64
example by andrewrk here,it would be useful for the function could return the position of the invalid character so that the caller could produce a diagnostic message.
Because Zig treats
error
types specially, when using errors you get a bunch of nice features, such as!
error-set inference,try
/catch
, anderrdefer
; you currently lose these features if you want to return extra diagnostic information since that information is no longer an error type.While something like index-of-bad-character is less useful for parsing an integer, getting "bad character" with no location when parsing a 2KiB JSON blob is very frustrating! -- this is the current state of the standard library's JSON parser.
There are currently two workarounds possible today to let this extra information get out, neither of which are very ergonomic and which work against Zig's error types:
Workaround 1: Return a tagged union
You could explicitly return a tagged union that has the extra information:
This is unfortunate in a number of ways. First, because
InvalidChar
is no longer anerror
, you cannot propagate/handle the failure withtry
/catch
. Second, because theInvalidChar
case is no longer anerror
, you cannot useerrdefer
to cleanup partially constructed state in the parser. Finally, calling the function is made messy because it can fail in two separate ways -- either in the error union, or in the explicitly returned union. This means calls that distinguish different errors (as opposed to just propagating withtry
) need nestedswitch
es.Workaround 2: Write to an out parameter
You could also leave the error set alone, and instead expand the contract of
parseU64
to write to an out parameter whenever it returns aInvalidChar
error:However, this makes the function's interface much messier: it now includes mutation, and it makes it impossible to indicate that it's being called in such a way that it cannot fail, since the pointer parameter is required (where previously a
catch unreachable
could handle). Also, it won't be immediately obvious which out parameters are associated with which errors, especially if inferred error sets are being used. In particular, it gives libraries writes the opportunity to sometimes re-use out parameters (in order to prevent function signatures from growing out of hand) and sometimes not (they at least cannot when the types aren't the same).Proposal: Associate each error with a type
EDIT: Scroll down to a comment for a refreshed proposal. It looks essentially the same as here but with a bit more detail. The primary difference is not associating errors with value types, but an error within a particular error-set with a type. This means no changes to the
anyerror
type are necessary.I propose allowing a type to be associated with each error:
The value returned would be available in
switch
s:This allows a function which can fail in multiple ways to associate different value types with different kinds of failures, or just return some plain errors that worked how they did before.
With this proposal, the caller can use inferred error sets to automatically propagate extra information, and the callsite isn't made messy with extra out-parameters/an extra non-error failure handling switch. In addition, all of the features special to errors, like
errdefer
andtry
/catch
, continue to work.Errors in the global set would now be associated with a type, so that the same error name assigned two different types would be given different error numbers.
I'm not sure what happens when you have an error set with the same name twice with different types. This could possibly be a limited case where "overloading" a single name is OK, since instantiating an error is always zero-cost, but I'll ask what others think.
I'm fairly new to Zig, so some of the details may not be quite right, but hopefully the overall concept and proposal makes sense and isn't unfixably broken.