nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org

Rework Nim's exception handling #8363

Closed Araq closed 1 year ago

Araq commented 5 years ago

Nim's exception handling is currently tied to Nim's garbage collector and every raised exception triggers an allocation. For embedded systems this is not optimal, see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0709r1.pdf for more details.

So, here is what I propose: Clearly distinguish between "bugs" and "errors". Bugs are not mapped to exceptions.

The builtin "bugs" are mapped to the new system.FatalError enum, runtime errors are mapped to system.Error. FatalErrors are usually not recovered from, unless a "supervisor" is used. (See the section about them.)


type
  FatalError* = enum         ## Programming bugs and other error
                             ## conditions most programs cannot
                             ## realistically recover from.
                             ## However, a "supervisor" can be
                             ## registered that does allow to
                             ## recover from them.
    IndexError,              ## index out of bounds
    FieldError,              ## invalid object field accessed
    RangeError,              ## value out of its valid range
    ReraiseError,            ## nothing to re-raise

    ObjectAssignmentError,   ## object assignment would lose data (object slicing problem)
    ObjectConversionError,   ## object is not a subtype of the given type
    MoveError,               ## object cannot be accessed as it was moved
    DivByZeroError,          ## division by zero

    OverflowError,           ## integer arithmetic under- or overflow
    AccessViolationError,    ## segfault
    NilAccessError,          ## attempt to deref a 'nil' value
    AssertionError,          ## assertion failed. Is used to indicate wrong API usage
    DeadThreadError,         ## the thread cannot be accessed as it's dead.
    LibraryError,            ## a DLL could not be loaded

    OutOfMemError,           ## the system ran out of heap space
    StackOverflowError,
    FloatInvalidOpError,
    FloatDivByZeroError,
    FloatOverflowError,
    FloatUnderflowError,
    FloatInexactError

  Error* = enum ## NOTE: Not yet the real, exhaustive list of possible errors!
    NoError,
    SyntaxError,
    IOError, EOFError, OSError,

    ResourceExhaustedError,
    KeyError,
    ValueError,
    ProtocolError,
    TimeoutError

proc isBug*(e: FatalError): bool = e <= AssertionError

Every error that can be caught is of the type Error. The effect system tracks if a proc can potentially raise such an error. This means that an "empty" except can be used to conveniently check for any meaningful exception, comparable in convenience to an if statement/expression:


let x = try: parseJson(input) except: quit(getCurrentExceptionMsg())
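
As a small sketch of the same try-expression pattern in today's Nim, the helper below turns a failed parse into a fallback value at the call site. The name portOrDefault and the fallback 8080 are made up for illustration; parseInt and ValueError are the existing stdlib ones, not the proposed Error enum.

```nim
import std/strutils

# Hypothetical helper: a failed parse becomes a default value right
# where the call happens, without a separate error-handling block.
proc portOrDefault(input: string): int =
  let port = try: parseInt(input) except ValueError: 8080
  result = port

echo portOrDefault("9000")
echo portOrDefault("not a number")
```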

The runtime calls system.panic on a detected bug, which roughly has this implementation:


var panicHook*: proc (e: FatalError; msg: cstring) {.nimcall.}

proc panic(e: FatalError; msg: cstring) {.noreturn.} =
  if panicHook != nil:
    panicHook(e, msg)
  else:
    if msg != nil: stdout.write msg
    quit(ord(e)+1)
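
To illustrate how the hook could be used, here is a self-contained sketch with a reduced enum. FatalKind, its fk-prefixed values, recordingHook and lastPanic are hypothetical names chosen for the example, and panic is declared without {.noreturn.} so that a hook which returns can be demonstrated.

```nim
type
  FatalKind = enum            # hypothetical, reduced stand-in for FatalError
    fkIndex, fkAssertion, fkOutOfMem

var panicHook: proc (e: FatalKind; msg: cstring) {.nimcall.}
var lastPanic: string         # filled in by the demo hook below

proc panic(e: FatalKind; msg: cstring) =
  if panicHook != nil:
    panicHook(e, msg)
  else:
    if msg != nil: stdout.write msg
    quit(ord(e) + 1)

# A hook that records the failure instead of terminating the process.
proc recordingHook(e: FatalKind; msg: cstring) {.nimcall.} =
  lastPanic = $e & ": " & $msg

panicHook = recordingHook
panic(fkIndex, "index 5 not in 0 .. 2")
echo lastPanic
```

Without a registered hook the same call would print the message and exit with code ord(e)+1, as in the implementation above.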

Out of memory

Out of memory is a fatal error.

Rationale: "Stack unwinding with destructor invocations that free memory" is too much of a gamble. If it works, that's mostly by chance, because in more complex programs it becomes intractable.

The systems that are designed to continue gracefully in case of an OOM situation should make use of the proposed "supervisor" feature. The systems that are proven to work with a fixed amount of heap space should allocate this space upfront (the allocator already has some support for this) and then an OOM situation is a fatal, unrecoverable error too.

Implementation of 'try'

There are different implementation strategies possible for these lightweight exceptions:

  1. C setjmp. Since this is pretty expensive, this will not be used.
  2. C++'s try statement. Will be used for the C++ target unless the C++ dialect does not support exceptions. This is likely to offer the best performance.
  3. Map Error to an additional return value and propagate it explicitly in the generated code via if (retError) goto error;
  4. Map Error to a hidden pointer parameter and propagate it explicitly in the generated code via if (*errorParam) goto error;. This is probably slower than using a return value, but this requires benchmarking.
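
Strategies (3) and (4) can be approximated by hand in Nim to show what the generated code would have to do. This is only a sketch: ErrorCode and the digit-parsing procs are hypothetical, and the real proposal would emit the checks in the generated C rather than require them in user code.

```nim
type
  ErrorCode = enum            # hypothetical stand-in for the proposed Error
    NoError, ValueErr

# Strategy 3: the error travels as an additional return value.
proc parseDigit(c: char): (ErrorCode, int) =
  if c in {'0'..'9'}: result = (NoError, ord(c) - ord('0'))
  else: result = (ValueErr, 0)

proc sumDigits(s: string): (ErrorCode, int) =
  var total = 0
  for c in s:
    let (err, d) = parseDigit(c)
    if err != NoError:        # plays the role of `if (retError) goto error;`
      return (err, 0)
    total += d
  result = (NoError, total)

# Strategy 4: the error travels through an out parameter.
proc parseDigitP(c: char; err: var ErrorCode): int =
  if c in {'0'..'9'}: result = ord(c) - ord('0')
  else: err = ValueErr

proc sumDigitsP(s: string; err: var ErrorCode): int =
  for c in s:
    let d = parseDigitP(c, err)
    if err != NoError:        # plays the role of `if (*errorParam) goto error;`
      return 0
    result += d
```

In both variants every call is followed by a branch, which is the cost the benchmarking mentioned above would need to measure.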

The supervisor

Fatal errors cannot be caught by try. For servers in particular or for embedded devices that lack an OS a "supervisor" can be used to make a subsystem shut down gracefully without shutting down the complete process. The supervisor feature requires extensive stdlib support; every open call needs to register the file descriptor to a threadlocal list so that they can be closed even in the case of a fatal error which does not run the destructors if any implementation strategy but (2) is used. Ideally also the memory allocator supports "safe points", so every allocation that was done by the failed subsystem can be rolled back. For the usual segregated free-list allocators this seems feasible to provide.

The supervisor sets the panicHook to a proc that does the raise or longjmp operation. The supervisor runs on the same thread as the supervised subsystem.

Rationale: Easier to implement, stack corruptions are super rare even in low level Nim code, embedded systems might not support threads. Erlang's supervisors conflate "recover from this subsystem" with a concurrency mechanism. This seems to be unnecessary.

A supervisor will be part of the stdlib, but one can also write one's own. A possible implementation looks like:


template shield(body: untyped) =
  let oldPanicHook = panicHook
  var target {.global.}: JmpBuf
  panicHook = proc (e: FatalError; msg: cstring) =
    log(e, msg)
    longjmp(target, 1)  # a non-zero value, so that setjmp takes the failure branch
  let oldHeap = heapSnapshot()
  var failed = false
  if setjmp(target) == 0:
    try:
      body
    except:
      failed = true
  else:
    failed = true
  if failed:
    heapRollback(oldHeap)
    closeFileDescs()
  panicHook = oldPanicHook

# in a typical web server, that must have some resistance against
# programming bugs, this would be used like so:
shield:
  handleWebRequest()

How well the heap can be rolled back and external resources can be freed is a quality of implementation issue. Even in the best case, created files would not be removed on failure. Perfect isolation is not even provided by Erlang's supervisors; in fact, no write to a disk is rolled back! The supervisor feature is not a substitute for virtualization.
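
For comparison, here is a sketch of the "raise" flavour of a supervisor, which runs in current Nim without setjmp. Everything here is hypothetical (FatalKind, PanicSignal, raisingHook, shielded), and heap rollback and file descriptor cleanup are left out entirely; the point is only how the hook is swapped in and restored.

```nim
type
  FatalKind = enum fkIndex, fkAssertion        # hypothetical reduced enum
  PanicSignal = object of CatchableError       # hypothetical carrier type
    kind: FatalKind

var panicHook: proc (e: FatalKind; msg: cstring) {.nimcall.}

proc panic(e: FatalKind; msg: cstring) =
  if panicHook != nil: panicHook(e, msg)
  else: quit(ord(e) + 1)

# The supervisor's hook turns a fatal error into an unwinding raise.
proc raisingHook(e: FatalKind; msg: cstring) {.nimcall.} =
  let ex = newException(PanicSignal, $msg)
  ex.kind = e
  raise ex

proc shielded(body: proc ()): bool =
  ## Returns true if `body` completed, false if it panicked or raised.
  let oldHook = panicHook
  panicHook = raisingHook
  try:
    body()
    result = true
  except CatchableError:
    result = false
  finally:
    panicHook = oldHook       # always restore the previous hook

proc badRequest() = panic(fkIndex, "index out of bounds")
proc goodRequest() = discard

echo shielded(badRequest)
echo shielded(goodRequest)
```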

andreaferretti commented 5 years ago

I guess it would be possible, but then you would have to deal with a value of this generated type. I am not sure this buys you anything

cooldome commented 5 years ago

@Araq, will the newly proposed exception handling still provide stack traces?

Araq commented 5 years ago

Yeah, but we also already have system.writeStackTrace and system.getStackTrace and that's independent from the exception mechanism. (Yes, I know you need to query the stack before the exception unwinds the stack.)

Varriount commented 5 years ago

@andreaferretti The compiler already knows exactly what kind of errors a procedure can throw. That means, given 2 enum types a function can throw, the compiler can turn this:

try:
  let res = exceptionalProc()
except AValue:
   echo getCurrentException(), " A"
except BValue:
   echo getCurrentException(), " B"

Into, roughly,

let (errVariant, res) = exceptionalProc()
if errVariant.errType == 1 and errVariant.err1Value == AValue:
   echo getCurrentException(), " A"
if errVariant.errType == 2 and errVariant.err2Value == BValue:
   echo getCurrentException(), " B"

The datatype used behind the scenes would be something like

type ErrValue = object
  case errTypeKind: enum # Dynamically generated based on possible thrown types
  of etk1:
    errValue1: enum
  # Repeat for other possible thrown types
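
Written out by hand as a Nim object variant, such a generated type might look roughly like the sketch below. All names (ParseErr, NetErr, ErrKind, ErrValue, exceptionalProc, describe) are invented for the example; nothing here is produced by the compiler.

```nim
type
  ParseErr = enum peBadSyntax, peUnexpectedEof   # hypothetical error enum 1
  NetErr = enum neTimeout, neRefused             # hypothetical error enum 2
  ErrKind = enum ekNone, ekParse, ekNet
  ErrValue = object
    case kind: ErrKind          # which of the possible enums is stored
    of ekNone: discard
    of ekParse: parseErr: ParseErr
    of ekNet: netErr: NetErr

proc exceptionalProc(input: string): (ErrValue, int) =
  if input.len == 0:
    return (ErrValue(kind: ekParse, parseErr: peUnexpectedEof), 0)
  if input == "offline":
    return (ErrValue(kind: ekNet, netErr: neTimeout), 0)
  result = (ErrValue(kind: ekNone), input.len)

proc describe(input: string): string =
  let (err, res) = exceptionalProc(input)
  case err.kind                 # replaces the chain of `except` branches
  of ekNone: result = "ok " & $res
  of ekParse: result = "parse error: " & $(err.parseErr)
  of ekNet: result = "network error: " & $(err.netErr)
```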

andreaferretti commented 5 years ago

@Varriount yeah, that could be done, at the cost of potentially generating a new type for each proc that throws.

That may be more convenient, but declaring your variant type has some advantages too: you are kind of forced to declare all errors that your library can throw, and convert external errors in your type. This is a pattern that is not too bad.
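
A minimal sketch of that pattern, assuming a made-up config library: ConfigError and readPort are hypothetical, while split, parseInt and ValueError are the existing stdlib ones. The external ValueError never leaves the library; it is converted into the library's own declared error.

```nim
import std/strutils

type
  ConfigError = enum          # hypothetical: every error this library reports
    ceNone, ceMissingKey, ceBadNumber

proc readPort(line: string): (ConfigError, int) =
  ## Parses "port=<number>" and translates lower-level failures.
  let parts = line.split('=')
  if parts.len != 2 or parts[0] != "port":
    return (ceMissingKey, 0)
  try:
    result = (ceNone, parseInt(parts[1]))
  except ValueError:          # raised by strutils.parseInt
    result = (ceBadNumber, 0)
```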

mratsim commented 5 years ago

To complement the distinction between bugs and recoverable errors and @zah's post, here are relevant snippets from https://gist.github.com/zah/d2d729b39d95a1dfedf8183ca35043b3 (it should probably be public btw) and the corresponding blog post http://joeduffyblog.com/2016/02/07/the-error-model/

Also this RFC is strongly linked to the strict mode RFC https://github.com/nim-lang/Nim/issues/7826


From the blog post

Principles

As we set out on this journey, we called out several requirements of a good Error Model:

Overview of error models

Bugs Aren’t Recoverable Errors!

A critical distinction we made early on is the difference between recoverable errors and bugs:


Varriount commented 5 years ago

Just a situation that I ran into today, which I think should be kept in mind for whatever system is decided upon: I'm currently writing a command shell (like, bash, cmd.exe, etc). Part of the for statement involves declaring a regular expression, like for <regex> in <command>, so that the output of the command can be split into pieces.

The code looks roughly like this:

var regex: Regex
try:
  regex = toPattern( arguments[0])
except RegexError:
  echo getCurrentException().msg # Prints out the type of error *and* its place within the input

When arguments[0] contains "a[", the following is printed out:

Invalid set. Missing `]`
a[
 ^

Note that in this case, the error contains information about both the class of error, and the place the error occurred.

Skrylar commented 5 years ago

As long as "not much" changes on the written code side (ex. @zah pointing out the category/code numbers, mapped through macros) it seems okay.

@Varriount I would argue you should read data like that from an exception's parameters. Error systems that only supply data via a string (and I'm not accusing Regex of this per se) are horrible.

Arguing that because average programmers don't wield exception types in a semantically pleasing way is the same bogus "average idiots do it wrong so take away everyone's cake" argument that has the macro system suppressed from, well, almost every language.

nitely commented 5 years ago

> @Varriount I would argue you should read data like that from an exception's parameters.

Well, I guess that can be useful for presenting the data in other formats. That does not mean the message should be removed from the traceback. Having a custom error within the traceback is beyond useful, I do that everywhere. Handling custom error messages in a different way for every library wouldn't be pretty, to say the least. Most users consuming an API just want the error to be shown within the traceback. I don't want to deal with "exception's parameters" anywhere (not in code I write nor in code I use), that's an edge case.

rayman22201 commented 5 years ago

I finally had some time to read the C++ proposal in more detail. I have some thoughts.

The C++ proposal allows both old style ref exceptions and new style variant exceptions to coexist. In the paper this is done by wrapping blocks that use variant exceptions with a "throws" keyword.

A similar thing could be implemented with Nim pragmas to identify functions that throw variant exceptions.

In the C++ proposal, catch can accept both types of exceptions, and can transparently convert from one type to the other. We could do something similar in Nim with converters.

Using the effect system we could track which functions throw which type of exception, and annotate the catching function with what type of exception it is willing to catch. Or perhaps also globally with a compiler flag?

The major cost of this from a language standpoint is that catch now has to be much smarter, as it has to handle two different classes of exceptions.

This results in two classes of exceptions, which isn't ideal, but there is some precedent for allowing multiple systems to co-exist: the multiple GC options and the destructor system, and more directly related the {.gcsafe.} pragma. I realize this is not an "apples to apples" comparison though.

rayman22201 commented 5 years ago

Regarding the discussion between @andreaferretti and @Varriount about the combinatorial explosion of variants. I am personally a big fan of @andreaferretti's proposal. It allows for a large amount of expressiveness in error handling, while still achieving the desired efficiency properties.

There are also strong arguments that forcing a library author to handle lower level exceptions and translate them into more contextually useful exceptions as you go higher up the stack is a good practice. Currently widespread exception systems don't encourage this behavior, and it causes real world issues. See this quote from section 4.1.9 of the C++ proposal:

> Propagating arbitrarily typed exceptions is not composable, and is often lossy. It is already true that intermediate code should translate lower-level exception types to higher-level types as the exception moves across a semantic boundary where the error reporting type changes, so as to preserve correct semantic meaning at each level. In practice, however, programmers almost never do that consistently, and this has been observed in every language with typed propagated exceptions that I know of, including Java, C#, and Objective-C (see [Squires 2017] for entertaining discussion). This proposal actively supports and automates this existing best practice by embracing throwing values that can be propagated with less loss of specific information....
>
> Propagating arbitrarily typed exceptions breaks encapsulation by leaking implementation detail types from lower levels. (“Huh? What’s a bsafe::invalid_key? All I did was call PrintText!”) As a direct result...

That being said, maybe there is a middle ground between libraries having to manually define and translate all error variants and the compiler auto-generating a combined variant error type that is a giant Frankenstein combination of all possible error types.

Perhaps a macro could be used to generate an error type that combines the error types the programmer knows they want to throw. A pseudo-automated approach. See the straw man example below:

type
  myLibraryError* = enum
    errA,
    errB

type combinedError* = combinedEnum
    myLibraryError,
    stdLibError,
    someOtherLibraryError

This is more work for the library author, but less work than having to manually translate all the error types across module boundaries. It also prevents the problem of a bunch of "similar but not quite the same" error types that @Varriount raised.

In theory this looks like a more labor intensive equivalent to the completely automated approach (the compiler auto-generating a bunch of combinatorial error types).

I predict that in practice the library author knows better than the compiler what exceptions they want to handle / throw: This is a programmer guided manual pruning of the possible error combinations.

If the program throws an exception that the library author didn't expect, that is arguably a bug, and should fail fast inside the library code anyway. The compiler could even detect this and report an error if the program doesn't handle all the necessary variant types.

I'm not sure if this idea is good or not, I'm throwing it out there for consideration and critique :-)

Bulat-Ziganshin commented 5 years ago

> straw man example below

your example just mimics hierarchy of exception classes - technique that is supported by all major languages (C++, Java, C#, Python), but not supported directly by errcodes approach as implemented in C/Rust/Go/Zig

rayman22201 commented 5 years ago

@Bulat-Ziganshin

> your example just mimics hierarchy of exception classes

That is exactly the point. Many of the concerns raised on this thread are about keeping the exception hierarchy with this new system. I'm attempting to find a compromise that gives errcode like performance while also letting people have exception hierarchies.

Bulat-Ziganshin commented 5 years ago

> I'm attempting to find a compromise that gives errcode like performance while also letting people have exception hierarchies.

But then why do you make this citation? It says that the exception hierarchy is broken, not that it's great and we want to find a cheap way to emulate it, in particular:

> There are also strong arguments that forcing a library author to handle lower level exceptions and translate them into more contextually useful exceptions as you go higher up the stack is a good practice. Currently widespread exception systems don't encourage this behavior,

What does this mean? Exception classes allow you to make hierarchies, but they cannot force you. The same goes for the errcode approach: you can develop a hierarchy of errcodes, but you can't force all users to do it well. So, what's the difference?

So, please make a choice:

[ ] the exception class hierarchy doesn't have some features you are proposing
[ ] you propose a cheaper implementation that provides some of the features of the class hierarchy

But please don't mix arguments for the first with a proposal of the second. Your idea doesn't give us any benefits OVER the class hierarchy, so the arguments you cited can't be used to support it.

rayman22201 commented 5 years ago

You are forcing a false dichotomy. It is not one or the other. This proposal does have benefits over the class hierarchy. It is much cheaper! That is a very real benefit.

> so the arguments you cited can't be used to support it.

You are not understanding my argument. I am personally in favor of errcode or variant style error handling. Which is what I argue at the top of the comment. Immediately below that I say:

> That being said, maybe there is a middle ground...

In other words. I prefer variant type style, but many in this thread disagree and I am attempting to design a compromise.

The referenced C++ paper gives similar arguments around a compromise, which is why they propose the system of two different exception types.

This proposal is not solely mine. It has been discussed by @Araq, @andreaferretti and @Varriount, and I reference their comments.

My contributions are specifically the idea of allowing both types of exceptions, and making the auto-generation of variant types as proposed by @Varriount more explicit. You are calling me out as if I made some radical departure from the discussion.

I argue that variant type exceptions are better for runtime performance and arguably better for library design, and cite arguments to support that claim.

I then acknowledge the earlier discussion that provides arguments against such a system, and attempt to provide a compromise.

Bulat-Ziganshin commented 5 years ago

Sorry, I don't mean that you differ from Araq's proposal, or his arguments. I just can't sum up together everything already said, so I argued over your concrete post. If that contradicts Araq's arguments too, it's OK for me.

> This proposal does have benefits over the class hierarchy. It is much cheaper!

Yes, here I agree. You can say that errcode approach is much cheaper. It's OK.

But if you think that it has other benefits, please give me details of these benefits. In particular, I don't think that errcodes will improve over classes in "encouraging" exception hierarchies. Do you agree with me here?

PS: Yes, I'm against the Araq proposal. I just try to dissolve my arguments into smaller, discrete pieces. I will be glad to see answers about "other benefits" from everyone, including Araq.

PPS: yeah, I'm going to read the C++ paper...

rayman22201 commented 5 years ago

> But if you think that it has other benefits, please give me details of these benefits. In particular, I don't think that errcodes will improve over classes in "encouraging" exception hierarchies. Do you agree with me here?

I agree with you. It does not improve over exception hierarchies.

I personally think exception hierarchies are not as useful as people say they are, which is why I am in favor of Araq's proposal. Exceptions are crap, let me have something simpler with better performance so that I can write a Nim kernel module and actually use some std lib functions and types. But I know I am in the minority on this.

If we can design a solution that is equivalent (or at least close feature parity) to exception hierarchies (the current state of Nim), but has better performance, shouldn't we do that?

No matter what your opinion is, the paper is very good. I definitely recommend it! 😄

dom96 commented 5 years ago

To be clear: I'm largely happy with this proposal, but not for Nim v1.0. I know that at least @rayman22201 doesn't think this is being suggested for Nim pre-v1.0 but as far as I know it is. This is why I am pushing back against this.

Bulat-Ziganshin commented 5 years ago

For me, the main takeaway from the C++ paper is the following citation:

> The primary design goal is conceptual integrity [Brooks "Mythical man-month"], which means that the design is coherent and reliably does what the user expects it to do. Conceptual integrity’s major supporting principles are:
>
> • Be consistent: Don’t make similar things different, including in spelling, behavior, or capability. Don’t make different things appear similar when they have different behavior or capability.
> • Be orthogonal: Avoid arbitrary coupling. Let features be used freely in combination.
> • Be general: Don’t restrict what is inherent. Don’t arbitrarily restrict a complete set of uses. Avoid special cases and partial features.

We must use those as principles for the Nim development.

endragor commented 5 years ago

TL;DR I agree with where this and #7826 RFCs go.

I recently added a bit of manually-managed data structures into a relatively complex multi-threaded framework, and faced plenty of problems with exceptions, so it's nice to see there are RFCs to address these problems. Even when not dealing with manual memory allocations/deallocations, I feel exceptions do a lot of harm in the absence of RAII semantics, since every line of code you write may raise, and you have to make sure it never leaves you in an inconsistent state (simple examples: having a mutex locked or a counter not decremented). For simple cases inserting try/finally or defer helps, but is still something you have to keep in mind and something that affects performance because of setjmp. For complex cases defer may not work as you may want to unlock/free a resource conditionally depending on what happens within the procedure. I noticed libraries that accept callbacks sometimes do not consider that the callbacks may raise and end up in a bad state if they do. This is expected, since it's easy to forget about handling exceptions when the compiler doesn't force you to.
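
A tiny sketch of the "counter not decremented" case and the defer fix, with hypothetical names (inFlight, handle):

```nim
var inFlight = 0   # hypothetical: number of requests currently being processed

proc handle(fail: bool) =
  inc inFlight
  defer: dec inFlight          # runs on normal exit and when the body raises
  if fail:
    raise newException(ValueError, "request failed")

handle(false)
try:
  handle(true)
except ValueError:
  discard
echo inFlight
```

Without the defer line, the failing call would leave inFlight stuck at 1, which is exactly the kind of inconsistent state that is easy to overlook.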

As a solution, you can try adding {.raises: [].} to procedures where it is critical, but with the current state of Nim and the stdlib it will turn your code into a mess, because you'd have to insert try/except everywhere, as even array access may raise IndexError. Even if you call alloc with --gc:none, you suddenly have to add try/except, too, since alloc may call a user-defined outOfMemHook that may potentially raise, so the effect system asks you to handle that.
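
For reference, this is roughly what the annotation looks like on a small proc. parseOrZero is a hypothetical helper; the compiler accepts the empty raises list only because the ValueError from parseInt is handled inside the proc.

```nim
import std/strutils

# The effect system rejects this proc unless every tracked exception
# raised by the body is handled inside the body.
proc parseOrZero(s: string): int {.raises: [].} =
  try:
    result = parseInt(s)
  except ValueError:
    result = 0

echo parseOrZero("42")
echo parseOrZero("forty-two")
```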

Overall, I support the idea of making writing {.raises: [].} code easier, which this RFC would help with a lot. However, one very useful property of exceptions is that you can see where a certain problem occurred. This is almost always needed in debug mode, and in production mode, too, unless you cannot allocate space to keep the stack trace in (the embedded use case?). So if Nim goes with the RFC, a must-have is a simple built-in way to obtain stack traces of exceptions (this isn't directly mentioned in the original post, but @Araq confirmed above this will be added).

Ideally, {.raises: [].} should be the default, and it's not hard to enforce that if FatalErrors just kill the program, reporting where the error occurred. The rest would be conscious decisions made by the programmer. It would help: 1) Ensure that if something within your procedure raises, you will know about it and do something about it - either handle the exception or add a raises pragma to the proc. 2) Reduce the possibility that a procedure may raise a whole bunch of unclear exceptions, like asyncdispatch.poll does now (raises: [Exception, ValueError, OSError, FutureError, IndexError]). When a programmer sees this, he or she would be encouraged to instead provide clearer exceptions and document them.

The downside of the above is that it will make Nim less ideal for quick prototyping. The solution suggested in #7826 seems to provide the best of both worlds - if you care about robustness in your project, you could use func everywhere. If you need to throw something together fast, just use proc.

dom96 commented 5 years ago

> Ideally, {.raises: [].} should be the default, and it's not hard to enforce that if FatalErrors just kill the program, reporting where the error occurred.

This is how it works in Java AFAIK and I think that's the reason people hate checked exceptions.

endragor commented 5 years ago

People may hate being forced to be disciplined, but it pays off when you develop something big that needs to be robust. The only two robust ways of handling (recoverable) errors I know are checked exceptions and returning errors, e.g. Result[T], like it's usually done in functional languages. But Result[T] doesn't work well without algebraic types and pattern matching, since you may still potentially access the value without checking for error.
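
To make the last point concrete, here is a deliberately naive Result sketch. The type and parsePositive are hypothetical, not a stdlib API. Nothing forces the caller to look at ok before reading value.

```nim
type
  Result[T] = object          # hypothetical minimal Result type
    ok: bool
    value: T
    error: string

proc parsePositive(n: int): Result[int] =
  if n > 0: result = Result[int](ok: true, value: n)
  else: result = Result[int](ok: false, error: "not positive")

let r = parsePositive(-1)
# The compiler accepts this unchecked access; on failure `value`
# simply holds the default-initialized 0.
echo r.value
```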

You usually don't care about robustness when you are throwing together a quick prototype, and that's what my last paragraph was about.

I programmed in Java in the past, and checked exceptions were definitely a good thing for anything over 1k LoC - as I described earlier, they resulted in cleaner APIs where you know what to expect and make sure you handle errors. What most people hate about checked exceptions, as I understand, is that if you want to write a simple program that, say, counts lines in a file, you'll have to add exception handling boilerplate, while in Nim you can do it in 1 line: echo readFile("file.txt").splitLines().len.

The problem is that by allowing non-robust way and keeping it the default, Nim ensures that most of the ecosystem will be written that way. I'm sure that if Java didn't have checked exceptions, the code created with it would be much worse than it already is.

Araq commented 5 years ago

Java's mistake is that the default in interface declarations (and everywhere else, but it's most problematic for interfaces) is "throws nothing" and so people are forced to handle the error where they cannot handle it so they "log" the error away and everything depends on a logging framework. Alternatively unchecked exceptions are used. We can learn from this mistake and still have a more disciplined approach to error handling.

drslump commented 5 years ago

Really interesting discussion guys. After reading it and the related material I think that @Araq's proposal can be tuned to gain a lot of benefits as others have pointed out, and not only solve the technical problem of supporting embedded systems.

Personally I think a key concept for handling errors in a way that scales from rapid prototyping to complex projects is to ensure that they are properly encapsulated at clear boundaries.

In the case of Nim there is no concept of packages so the only way I see as a natural boundary would be module exports, where each function exported must handle recoverable errors from the stuff it uses and only raise or forward its own error definitions (and perhaps a very well known set of core ones). That means that when prototyping you don't need to care but if you're creating a module then it's your responsibility to handle errors, it's just not fair to only do the fun part of implementing the success branch and push boring error handling to the module user 😄

I'm not familiar enough with Nim though to know with any certainty if a module export is a broad enough boundary for this to work right. It might be a bit of a pain if you want to refactor a library into separate modules for example. Perhaps that could be solved with a pragma that makes that module "private"?

My point though is that once the locality of the error handling is enforced, exception class hierarchies are seldom required, you can use a module and know exactly what errors you can expect from it, instead of some generic catch all that basically logs the error and wishes the program can continue running by sheer luck.

Araq commented 1 year ago

This issue should have been an RFC and it's now so old that I'm not sure it still reflects my opinions. :-)