titzer / virgil

A fast and lightweight native programming language
1.13k stars 40 forks source link

RFC: Choice Types #249

Open k-sareen opened 1 month ago

k-sareen commented 1 month ago

I'm commenting on the choice type RFC that you have proposed. Idk what is the best medium to discuss this in a natural way. Google Docs might be better than GitHub Issues[^1], but no need to bikeshed right now 😆

I have a couple of comments on the RFC.

Verbosity for Complex Scenarios

It feels like choice types only reduce verbosity for simple cases where you only have two types, i.e. only for cases such as A | B.

// Generic IOErr type provided by Virgil stdlib (say)
type IOErr {
  case PermissionDenied;
  [...]
}

// Enum of errors specific to my application
type ProgramErr {
  case OhNoSomethingBadHappened;
  [...]
}

def foo(file: string) -> int | IOErr | ProgramErr {
  [...]
}

def bar(file: string) {
  var f = foo(file) ||  doSomething(that);
}

Here, to the best of my understanding, that in the function bar is going to refer to the ProgramErr. In fact, the type of f is actually int | IOErr. So if I want to do something with the actual return value or theIOErr then I need to do something like:

var f = foo(file) ||  doSomething(that);
var g = f || doSomethingElse(that);

This is, honestly, a bit confusing. The verbosity increases with the number of types in the choice type. Actually it's not clear to me from the RFC if the above snippet to get the IOErr value out is even syntactically and semantically correct.

You can argue that I should have defined a new type Err which is a union of IOErr and ProgramErr to simplify my code so that the errors are all grouped together, but that only solves it for this particular case. What if I am also using another library who exposes LibXyzErr and functions in my program can mix and match error types of all three kinds? Further, even if I can group them like so: def foo(file: string) -> int | (IOErr | ProgramErr), then that solves getting the int value out, but now makes error handling less obvious (i.e. have to make new function to handle error and then I have similar issues differentiating between the errors like above). In Rust, I often just give up and use anyhow to deal with this. This is unsatisfactory as we've sort of weakened the type system (at least intuitively).

A match-statement might be better in such complex cases, but at that point choice types essentially act as shorthand for ADTs. This is not necessarily a bad thing, but in my experience with Rust and error types, it often makes dealing with errors very annoying as a developer. The increased verbosity and visual clutter associated of peppering match-statements everywhere is unappealing.

This is a tangent, but I very much subscribe to the waterbed theory of complexity, aka trying to forcefully simplify complex aspects of a system make the complexity pop-up somewhere else. The reduced complexity of choice types with the || operator works until you deal with more complex applications. As above, having two or more different kinds of error types is not unheard of, and in fact very probable.

This is a bit of a ramble, but I hope I've made my concerns regarding the verbosity clear.

Also I'll ask a related question to the above. Since IOErr | ProgramErr is not a subtype of IOErr | ProgramErr | LibXyzErr, does the implicit conversion from a value of the former type to the latter type still work?

Semantics of &&

What are the semantics of &&? I don't see any mention of it in the actual RFC. Surely the whole point of choice types is to use the choice operator (||)?

Survey of Choice Types

I went and did a survey of some programming languages to see if anyone uses choice types in a similar fashion for error handling as above. I didn't really find any, but here are some interesting ones:

Harelang uses choice types (aka tagged unions) for error handling. I'm personally not a fan of Harelang, but that's for different reasons. However, notably, the choice types are associative. They are also essentially quite similar to Rust with their ! and ? operators which essentially map to .unwrap() and ? in Rust. You can also match on the choice type like Rust for more complex types, though they don't have ADTs (and generics!!!!) in Harelang. Further, it doesn't seem like you can attach arbitrary data to errors -- you're limited to 24-bytes of storage. This is obviously quite different from your proposal since they effectively have the same issues as Rust.

Zig has a weird error handling story. Their error handling story is essentially returning an arbitrary error set to a function and using try-catch expressions (not blocks) to handle them. They also do not have support to attach arbitrary data to errors currently but it is planned.

I guess technically Go's error handling story is essentially a choice type where you are forced to only have one error type. But let's not get into how broken its error handling is. Taking inspiration from Go is probably not the best idea.

General Comments on Error Handling System

Exceptions are bad, sure. But I equally think using generic ADTs like Result, Maybe, etc. leads to the proliferation of unwrap() etc. that you mentioned. Not to mention, the increased verbosity and visual clutter for complex cases and when you actually have to handle errors.

I would argue that an error handling system should allow custom/user-defined error types but be "special" in some way (such as your proposed choice types) that allows them to be handled tersely. Currently I'm just unsure how choice types will help in more complex scenarios. Maybe it's just impossible to make a satisfactorily terse error handling system for complex scenarios due to the waterbed theory of complexity. Or maybe all we need is a special err<T> type along with choice types which allows the developer to return arbitrary values such as tuples, strings, ints, etc. as error payloads. I'm not sure.

I'm open to discussing further since I think we would all greatly benefit from a proper error handling system in Virgil instead of coming up with our own ad-hoc solutions for every library or program we write.

[^1]: Honestly, GitHub Discussions is poorly designed -- you (as in generic you) want to consolidate information, not spread it out over five different places, but this is the old curmudgeon in me talking.

titzer commented 2 weeks ago

Thanks for sharing your thoughts, I have read this comment a couple times and will try to boil it down:

First, suppose:

def trySomething() -> A | Err1;
def tryMore(t: A) -> B | Err1;
def tryLast(r: B) -> C | Err2;
def recover1<T>(e: Err1) -> T;

The main idea is that we want to write tryLast(tryMore(trySomething())) and handle all the errors properly, by either using recover1 or returning the error to the caller.

  1. Choice type composability doesn't really work. I agree. There are a couple choices, e.g. making the (A | B) operator associative, so one could write C | Err1 | Err2, to mean `C | (Err1 | Err2) or using somehow use subtyping for unification.
  2. The && operator wasn't explained. I was thinking this would be a kind of dual of || where the success value would get bound as that in the RHS and could "bubble" right, meanwhile the error bubbles "out". So trySomething() && tryMore(that) would end up with B | Err1. You could then further do trySomething() && tryMore(that) && tryLast(that) and get type C | Err1 | Err2.
  3. Maybe the || syntax is actually misleading. I was hoping that it would be a kind of recovery mechanism where the error could be handled and then a value returned that unified with the LHS. E.g. being able to write trySomething() || recover1(that), which would have type B, e.g. by reporting and recovering from the error.

The more I think about it, the more I realize that just having something like trySomething() || return would be useful. The return expression has the bottom type; it doesn't return a value, so wouldn't contribute another choice. Or, even trySomething() || goto Error. Not literally a goto (or maybe!?). The idea being that control flow can happen instead of needing to produce a value, so that the expression can have the "success" value of B.

k-sareen commented 2 weeks ago

Choice type composability doesn't really work. I agree. There are a couple choices, e.g. making the (A | B) operator associative, so one could write C | Err1 | Err2, to mean `C | (Err1 | Err2) or using somehow use subtyping for unification.

Yes I think this is personally my biggest concern. I think making it associative will help specially with something like I've proposed below.

The && operator wasn't explained. I was thinking this would be a kind of dual of || where the success value would get bound as that in the RHS and could "bubble" right, meanwhile the error bubbles "out". So trySomething() && tryMore(that) would end up with B | Err1. You could then further do trySomething() && tryMore(that) && tryLast(that) and get type C | Err1 | Err2

Oh that's a cool idea. So similar to combining shell commands. I like this idea.

Maybe the || syntax is actually misleading. I was hoping that it would be a kind of recovery mechanism where the error could be handled and then a value returned that unified with the LHS. E.g. being able to write trySomething() || recover1(that), which would have type B, e.g. by reporting and recovering from the error.

May I suggest a try-else expression?

try tryLast(try tryMore(try trySomething()));

Semantics could be that try returns the value if everything is fine, but it will immediately exit and return the error to the caller. And then we could do:

try tryLast(try tryMore(try trySomething() else recover1()) else recover1());

In the case we want to handle the error and recover immediately. Though I don't have a good idea of how to use the error in the recover1() function. Maybe it's implicitly passed in (probably bad idea)? Or we have a keyword like that in your original proposal that binds to the error. So like:

try tryLast(try tryMore(try trySomething() else recover1(that)) else recover1(that));

The assumption here is that choice types are right-associative. So A | B | C = A | (B | C). This would allow the that keyword to bind to the type B | C.

A benefit of something like try-else is that we can easily do defaults:

var v = try doSomething() else "Hello";

And that call sites are easily auditable (grep for try!).

One issue that I haven't thought about too much yet is how does the early return work in cases like:

def doSomething() -> A | Err1;
def doSomethingElse() -> B | Err2 {
  var t = try doSomething();
}

In the case the try doSomething() fails above, the error returned will be of type Err1. But Err1 is not in the type signature of doSomethingElse. So I think this should be a compilation failure that can be fixed in one of two ways:

(i) By adding Err1 to valid list of return values for doSomethingElse

def doSomething() -> A | Err1;
def doSomethingElse() -> B | Err1 | Err2 {
  var t = try doSomething();
}

OR (ii) By handling/recovering from the error inside doSomethingElse:

def doSomething() -> A | Err1;
def doSomethingElse() -> B | Err2 {
  var t = try doSomething() else A.new();
}

The issue with this, of course, is that we propagate the error types everywhere when we don't want to handle then which might be too verbose (though this is very subjective).