python / exceptiongroups

An early draft of a PEP around Exception Groups in Python
22 stars 4 forks source link

Introducing try..catch #3

Closed gvanrossum closed 3 years ago

gvanrossum commented 3 years ago

If we find we really need new syntax to flag that people have thought about the consequences of multi-exceptions, we could switch the existing try..except syntax to try..catch. It seems most other languages use the latter keyword anyway.

Syntax

The syntax would just replace except with catch, leaving everything else the same.

We have the option however of disallowing catch: (i.e. with no exception, the catch-all block) -- forcing people (and automatic translators) to write catch BaseException:.

Note that catch would have to be a soft keyword (supported by the new PEG parser, see PEP 617), since there are plenty of other uses of catch in existing code that we don't want to break.

Transition

The transition plan would be that try..exept will eventually be removed from the language. There would be three stages:

  1. try..catch and try..except can both be used.
  2. try..except works but gives a deprecation warning.
  3. try..except stops working.

Possibly stage 2 can be split and try..except inside async functions can be deprecated sooner than in other contexts.

During stages 1 and 2, each try statement must use either catch or except -- you cannot have both catch and except blocks in the same statement. (But you can have them in the same file.)

Semantics

When the raised exception is not a multi-exception the semantics of try..catch is the same as for try..except.

When the raised exception is a multi-error the semantics change.

Basically when a multi-error contains different exception types it is possible that more than one catch block runs. E.g.

try:
    raise MultiError([ValueError(), ZeroDivisionError(), RuntimeError()])  # or whatever
catch ValueError:
    print("VE")
catch RuntimeError:
    print("RE")

would print "VE" and "RE" and then raise (bubble up) RuntimeError() (or MultiError([RuntimeError()]).

If there's an else block it only gets run if no exceptions were raised in the first place.

If there's a finally block it gets run after all catch blocks (if any) have run, before bubbling up the unhandled exceptions (if any).

The order in which the exceptions in the multi-error are handled is just the order in which the MultiError object regurgitates them.

Multiple exceptions of the same type

This is an open issue.

What if the try block raises MultiError([ValueError("A"), ValueError("B")])? We could define different semantics.

Note that there would also be a question about two different exception classes that derive from the same base class, where the catch class specifies that base (or ultimately catch BaseException). So we cannot rely on MultiError to split exceptions based on class before we start matching exceptions to catch blocks.

TO BE CONTINUED IN A LATER COMMENT (I deleted some meta-comments related to this.)

aeros commented 3 years ago

I very much like the semantics and look of the proposed try...catch! +1 from me (based on what's currently written).

My only hesitation really is about selling the deprecation and eventual removal of try...except in the PEP; considering the amount of existing non-concurrent code, docs, books, etc. that already uses it (where there would be no benefit from switching to try...catch). Assuming there's zero risk of that code ever seeing a multi-exception, they could automatically switch everything to using catch, but for stateful code that relies on an operation occurring once (such as a rollback), propagating a multi-exception to it could be rather disastrous (I believe @gvanrossum brought this up during the sprint meeting as an argument against changing the behavior of try...except).

Also, I anticipate that if except were to be eventually removed, the number of versions between the deprecation and removal would have to be much higher than the usual (maybe even 4.x?).

During the phase that except is around and multi-exceptions exist, would a try...except simply ignore it and treat the entire exception group as if it were uncaught or instead somehow direct users towards using try...catch with a more specific error? The latter seems preferable to me, although it could come at a slight cost to existing code that uses catch (the main concern being frequently executed functions with try: d['k'] except KeyError: ....

What if the try block raises MultiError([ValueError("A"), ValueError("B")])? We could define different semantics.

I'd expect that if there was a catch ValueError block, it would be run twice for both of the above exceptions, or "swallow" both at once and execute a single time with catch Group[ValueError]. If a user wants to handle the passed arguments differently for "A" and "B", they could access the .args of the exception object (this is assuming that they could use catch ValueError as ex, and on the first execution ex would be ValueError("A") and on the second ex would be ValueError("B")). To me, it makes sense based on the other logic of catch, but this might be too implicit.

As an alternative, maybe we could use something along the lines of the pattern matching syntax to catch an exception based on its arguments. However, that does come with the potential issue of causing breakages when exception messages change (which is not something that should be currently relied on since they're not guaranteed to be the same across versions).

Note that there would also be a question about two different exception classes that derive from the same base class, where the catch class specifies that base (or ultimately catch BaseException). So we cannot rely on MultiError to split exceptions based on class before we start matching exceptions to catch blocks.

This does not solve the issue, but I think part of this could be simplified if we were to specifically forbid catch BaseException. In cases like asyncio that rely on CancelledError for signaling (which inherits from BaseException), catch BaseException would definitely be a harmful anti-pattern.

iritkatriel commented 3 years ago

Regarding catch semantics for multiple exceptions of the same type - we could do this:

try:
    raise ValueError(1), ValueError(2)
catch ValueError as e:
    # this block executes twice, once for each ValueError
try:
    raise ValueError(1), ValueError(2)
catch Group[ValueError] as e:       # or *ValueError or some other variation
    # this block executes once, for e which is the whole group (tuple or something)

And this is a parser error:

try:
    raise ValueError(1), ValueError(2)
catch Group[ValueError] as e:       # or *ValueError or some other variation
    pass
catch ValueError as e:
    pass
aeros commented 3 years ago

Regarding catch semantics for multiple exceptions of the same type - we could do this:

It looks like at least the two of us agree on the semantics of multiple exceptions of the same type -- those examples do a good job of effectively expressing what I wrote in my above comment (other than the parser error, but I agree with that as well). :-)

iritkatriel commented 3 years ago

Actually currently you can do this:

>>> try:
...     pass
... except ValueError as e:
...     pass
... except ValueError as e:
...     pass
...
>>>

I guess the first matching block wins. So maybe we don't need to trouble the parser with this.

aeros commented 3 years ago

I guess the first matching block wins. So maybe we don't need to trouble the parser with this.

Although that's the current behavior, I'm not sure that it would be at all desirable, and might be a bit more of a subtle issue with there being a catch Group[ValueError] and a catch ValueError immediately after. Over on the pattern matching side of things (PEP 634), during the sprint, there was work on the following causing a syntax error:

>>> match ...:
...     case spam | eggs if whatever():
...         pass
... 
  File "<stdin>", line 2
SyntaxError: the previous pattern always matches, making this one unreachable

I think a somewhat similar approach could be taken with exception groups, and would likely help users that are just learning how catch works, with it having more complexity than except.

gvanrossum commented 3 years ago

Multiple exceptions of the same type

What if the try block raises MultiError([ValueError("A"), ValueError("B")])? We could define different semantics.

Note that there would also be a question about two different exception classes that derive from the same base class, where the catch class specifies that base (or ultimately catch BaseException). So we probably can't rely on MultiError to split exceptions based on class before we start matching exceptions to catch blocks.

Another concern here is that static type checkers expect the type of e in except ValueError as e: to be ValueError, and we should keep this for catch ValueError as e:. We can preserve this by running the catch block multiple times, once for each matching exception.

Another approach might be to allow a * in the syntax, like this:

catch *ValueError as e:  # type of e is MultiError[ValueError]

However I don't know how this would work if a superclass of some exception type is being caught:

class E(Exception): ...
class E1(E): ...
class E2(E): ...

try:
    raise MultiError([E1(), ValueError(), E2()])
catch *E as e:
    # Ideally e would be MultiError([E1(), E2()])

To unpack this problem I need to take a few steps back.

=== Translation into pseudo bytecode

Let's consider how try/except is currently translated into bytecode. Consider:

try:
    foo()
except E as e:
    bar(e)

This currently translates to something like

    SETUP_FINALLY (handler1)
    <foo()>
    POP_BLOCK
    JUMP_FORWARD (done)

handler1:
    DUP_TOP
    <load E>
    JUMP_IF_NOT_EXC_MATCH(cleanup)
    POP_TOP
    STORE_FAST(e)
    POP_TOP
    SETUP_FINALLY(handler1a)
    <bar(e)>
    POP_BLOCK
    POP_EXCEPT
    <e = None; del e>
    JUMP_FORWARD(done)
handler1a:
    <e = None; del e>
    RERAISE
cleanup:
    RERAISE

done:

(This looks differently from when I first invented Python bytecode, though not terribly so. :-)

Let's try to translate it to Python code. We get something like

try:
    foo()
except BaseException as _err:
    if _exc_match(_err, E1):
        e = _err
        bar(e)
        <dance to erase e>
    elif _exc_match(_err, E2):
        e = _err
        <other handler code>
        <dance to erase e>
    else:
        raise

Except that the <dance to erase e> stuff actually looks more like this:

    try:
        e = _err
        <handler code>
    finally:
        e = None
        del e

So if we factor that out, our rewrite looks a little simpler, even if we add a second except block:

try:
    <try block>
except BaseException as _err:
    if _exc_match(_err, E1):
        <handler 1>
    elif _exc_match(_err, E2):
        <handler 2>
    else:
        raise

Now I think we have the tools in hand to propose rewrites for multi-errors.

Handling multi-errors this way

The semantics that call the handler block once for each exception it matches would be like this, modulo several details:

try:
    <try block>
except MultiError as _multi:
    unhandled = []
    for _err in _multi:
        if _exc_match(_err, E1):
            <handler 1>
        elif _exc_match(_err, E2):
            <handler 2>
        else:
            unhandled.append(_err)
    if unhandled:
        raise MultiError(unhandled)

But each <handler> block should have its own try..except to deal with errors bubbling out of there, something like this:

try:
    <user handler code>
except BaseException as _err1:
    unhandled.append(_err1)

(It's really a bit more complicated still due to exception context for bare raise and raise..from, leaving this as an exercise for the reader.)

There's also another complication, the additional structure inside MultiError (or, in Trio's terms, ExceptionGroup) used for tracebacks that share a common tail. We need to preserve this structure, so instead of just calling MultiError(unhandled) we probably will have to call something like _multi.extended_subset(unhandled) (or perhaps exceptions raised from handlers will have to be collected in a separate list that is passed in separately -- though even those will have to keep their __cause__ and __context__ attributes).

Anyway, these added complexities seem manageable.

(Another issue is how to handle non-multi-errors without adding more code. Another exercise for the reader.)

But now for the alternative proposal, using similar primitives:

Handling multi-errors using catch *E as e:

Maybe this requires us to mutate _multi in place, something like this:

...
except MultiError as _multi:
    # No more loop; no elif either
    if (e := _multi.exc_match(E1)) is not None:
        <handle E1>
    if (e := _multi.exc_match(E2)) is not None:
        <handle E2>
    if not _multi.is_empty():
        raise _multi

Here _multi.exc_match(E) needs to extract all exceptions that match E (including subclasses) -- it basically splits the original MultiError into two subsets, one of individual exceptions that match E, one of individual exceptions that don't match E. It returns the matching exceptions as another MultiError instance, and prunes the original down to the non-matching exceptions.

Again the handlers need to somehow push exceptions that bubble out of them back into _multi, something like

try:
    <user handler code>
except BaseException as _err1:
    _multi.push_back(_err1)

This should of course take context, cause and bare raise (re-raise) into account.

Honestly, both approaches look doable, though maybe it wouldn't be great to combine catch E and catch *E in the same try statement.

Another option

Maybe we needn't switch from except to catch -- maybe the new syntax can just be except *E as e: and in that case the type of e will be MultiError[E]. Either all handlers should use *E or none of them should.

The semantics for except E should then be entirely unchanged from the current semantics (in particular, at most one handler runs), and if a multi-error is ever raised from the try block in that case, it will not be caught except by except BaseException and by bare except: (and by except MultiError).

iritkatriel commented 3 years ago

The alternative at the end ("Another Option") means that as the user of an async library, if you raise X you must catch *X. If you try to catch what you raised you won't get it. I think this will be confusing.

iritkatriel commented 3 years ago
class MultiTraceback(Traceback):
    tb_frame : Frame
    tb_next_all: Dict[BaseException, Traceback]

    def __init__(self, frame, tb_next_all = {}):
        self.tb_frame = frame
        self.tb_next_all = tb_next_all

    def add(exc : BaseException):
        ''' add an exception to this tb group '''
        tb_next_all[exc] = exc.tb

    def split(List[BaseException] excs):  
        ''' remove excs from this tb group and return a 
        new tb group for them, with same frame
        '''
        r = {(k,v) for k,v in self.tb_next_all if k in excs}
        self.tb_next_all.remove(r)
        return MultiTraceback(self.tb_frame, r)
iritkatriel commented 3 years ago
class MultiException(BaseException):
    excs: List[BaseException]
    tb : MultiTraceBack

    def __init__(self, excs: List[BaseException], tb=None):
        self.excs = excs
        if tb:
            self.tb = tb
        else:
            self.tb = MultiTraceback(currentFrame())
            for e in excs:
                 self.tb.add(e)

    def add_exc(self, e):
        self.excs.add(e)
        self.tb.add(e)

    def exc_match(self, E):
        ''' remove the exceptions that match E
        and return them in a new MultiException
        '''
        matches = set()
        for e in self.excs:
            if match(e, E):                
                matches.add(e)
        self.excs.remove(matches)
        tb = self.tb.split(matches)
        return MultiException(matches, tb)

    def push_frame(self, frame):
        self.tb = Traceback(frame, tb_next=self.tb)
iritkatriel commented 3 years ago

Now we use it in @gvanrossum's code snippets like this:

    if (e := _multi.exc_match(E1)) is not None:
        tb1 = _multi.tb.split(e)    #  <----     remove the matched exceptions from the traceback
        <handle E1>

At the end of all the except clauses: _multi contains the unhandled exceptions, so we do

[ _multi.add_exc(e) for e in new_exceptions_we_raised_here]

and raise _multi

gvanrossum commented 3 years ago

Thanks! We should probably compare this to https://github.com/python-trio/trio/issues/611

gvanrossum commented 3 years ago

Perhaps OT, but here's a list of things (probably incomplete) in the life of an exception.

In all cases, "traceback item pushed" is basically

e.__traceback__ = traceback(tb_frame=sys._getframe(), tb_next=e.__traceback__)

Also, raise e from e1 sets e.__cause__ = e1 and otherwise does whatever raise e does. (In particular, doing this inside an except block sets both __cause__ and __context__.)

iritkatriel commented 3 years ago

There's also raise e.with_traceback()

gvanrossum commented 3 years ago

Oh! So

raise e.with_traceback(tb)

seems to be equivalent to

e.__traceback__ = tb
raise e
gvanrossum commented 3 years ago

Anyway, the relevance of that list is that in all cases where an exception is raised, either nothing is done to it (bare raise), or one traceback item is pushed.

With this proposal, none of the places that do that have to change, and that seems a big advantage over having to change the code to reverse the linked list (note that the public API exposes the linked list so it somehow has to be reversed when the traceback goes public) or having to push a traceback item on each of several exceptions.

The code to catch multi-errors will be complicated no matter what we do.

Constructing and raising multi-errors should probably be done in "userspace", i.e. Python, not C, at least for the initial release, maybe forever. I imagine that task groups will be implemented in userspace, catching exceptions one at a time and constructing a multi-error by hand from these, then raising the multi-error using raise. The RAISE opcodes[1] then separate the exception from the traceback to conform to the triple (type, value, tb) format used internally.

Internally, the "push traceback item" is then implemented in PyTraceback_Here(), which just creates a new traceback object pointing to the current frame and pushes it onto the separately maintained list of tracebacks in the thread state (curexc_traceback).

The final piece of the puzzle (for me) is how the exception and traceback are recombined after some frames have been popped and corresponding traceback items pushed on the thread state. This seems to be the call to PyException_SetTraceback() in ceval.c, in the block labeled exception_unwind. It looks like at that point the traceback alread present in the exception is simply overwritten (by PyException_SetTraceback()) with the traceback from the thread state. Effectively this just extends the traceback with items that were pushed on the thread state since the RAISE opcode (unless the intervening C code messed with this stuff: you can only reach Python code by catching the exception, and you only go back to C by raising it).

[1] There are two separate RAISE opcodes, RERAISE which is used for bare raise and RAISE_VARARGS which is used for raise e and raise e from e1. (Actually bare 'raise' outside an 'except' block uses RAISE_VARARGS; it is valid as long as it's called from an 'except' block.)

gvanrossum commented 3 years ago

(Note that pushing a traceback item also records the frame's current instruction pointer and linenumber, so you can tell exceptions apart that were raised at different points in the same frame.)

iritkatriel commented 3 years ago

Experimental implementation of exception groups: https://github.com/iritkatriel/cpython/tree/exceptionGroup

There are two demo files there - exception_group.py along with its output.txt.

Comments:

  1. Here tb_next_map was added to traceback to avoid needing to subclass traceback for now. Q: if we turn the tb_next field from (struct _traceback *) to (PyObject*), can we then just assign a dict to it for traceback group and not subclass?

  2. There is some error checking missing (including cycle detection)

  3. the tb_next_map should be made to hold weak keys.

  4. I gave up for now on adjusting the traceback.py print_tb and implemented my own render_exception.

  5. The tracebacks contain the init of ExceptionGroup - that's an artefact of how the test was constructed.

1st1 commented 3 years ago

@iritkatriel Great! I've created a PR against your repo: https://github.com/iritkatriel/cpython/pull/1. The PR has an implementation of asyncio.TaskGroups that I wrote for EdgeDB. It's relatively well tested and we can use it as a tool to discover polish our ExceptionGroup implementation.

So far the output of a test script (tg1.py in the PR) looks a bit confusing:

import asyncio
import types

async def t1():
    await asyncio.sleep(0.5)
    1 / 0

async def t2():
    async with asyncio.TaskGroup() as tg:
        tg.create_task(t21())
        tg.create_task(t22())

async def t21():
    await asyncio.sleep(0.3)
    raise ValueError

async def t22():
    await asyncio.sleep(0.7)
    raise TypeError

async def main():
    async with asyncio.TaskGroup() as tg:
        tg.create_task(t1())
        tg.create_task(t2())

def run(*args):
    try:
        asyncio.run(*args)
    except types.ExceptionGroup as e:
        print('============')
        types.ExceptionGroup.render(e)
        print('^^^^^^^^^^^^')
        raise

run(main())
ExceptionGroup({ExceptionGroup({ValueError()})})
 <frame at 0x7fc872d13150, file '/Users/yury/dev/pydev/cpython/tg1.py', line 32, code run>
 <frame at 0x7fc862c832c0, file '/Users/yury/dev/pydev/cpython/Lib/asyncio/runners.py', line 52, code run>
 <frame at 0x1054ef240, file '/Users/yury/dev/pydev/cpython/Lib/asyncio/base_events.py', line 642, code run_until_complete>
 <frame at 0x104bda810, file '/Users/yury/dev/pydev/cpython/tg1.py', line 24, code main>
 <frame at 0x104cd5850, file '/Users/yury/dev/pydev/cpython/Lib/asyncio/taskgroup.py', line 168, code __aexit__>
 <frame at 0x1054efbf0, file '/Users/yury/dev/pydev/cpython/Lib/types.py', line 314, code __init__>
---------------------------------------
ExceptionGroup({ValueError()})
     <frame at 0x1054f0410, file '/Users/yury/dev/pydev/cpython/tg1.py', line 11, code t2>
     <frame at 0x10534d250, file '/Users/yury/dev/pydev/cpython/Lib/asyncio/taskgroup.py', line 168, code __aexit__>
     <frame at 0x1054efa00, file '/Users/yury/dev/pydev/cpython/Lib/types.py', line 314, code __init__>
---------------------------------------

         <frame at 0x1053a8dd0, file '/Users/yury/dev/pydev/cpython/tg1.py', line 15, code t21>

Looks like the traceback includes a bunch of lines from asyncio/taskgroup.py and I'm not sure it's actually correct. We'll need to figure out what to do there.

Update: For those who're curious why t22 isn't in the traceback output: it was likely cancelled early because t21 has crashed.

gvanrossum commented 3 years ago

I think this shows that Irit's strategy for grouping tracebacks is viable. In theory we could productionize this by itself, to help experiments with multi-error implementation (then again, Trio already did that without any C changes).

I think our next step is to reevaluate how to change the semantics of try..except -- switch to try..catch, or keep try..except but use except *E to catch multi-errors, etc.

Here's my strawman: multi-error is a final subclass of BaseException. except *E catches multi-errors containing E (we all agree on the semantics here, the multi-error is split in two multi-errors etc.). OTOH, except E is unchanged. So if you get a multi-error containing E and your except block says except E, it won't be caught, even if the multi-error wraps a single E instance. Instead it'll bubble just like any exception you don't catch, and presumably either your program terminates with an uncaught multi-error, or it is caught somewhere earlier on the stack. Traceback printing for uncaught multi-errors should print them nicely (both the C code and traceback.py, and whatever traceback formatting exists in the logging module or elsewhere in the stdlib).

Looking at .NET's AggregateException, it seems clear that multi-errors can be nested (the nesting reflects the nesting of task groups), but the splitting should recurse into nested multi-errors and "just do the right thing" there.

In terms of naming, I propose to name our multi-error AggregateException, since it's clear that .NET's task groups went before us, so we might as well pay homage (similar to async and await). (Though I don't know how much else of their task group API we can borrow.)

iritkatriel commented 3 years ago

We have three basic strategies now:

  1. No changes in python, use an AggregateException wrapper and do everything in user code (trio).
  2. Backwards compatible changes: Guido's strawman above which adds the except *X syntax.
  3. Non-backwards compatible changes: the catch keyword, which is like the except keyword from option 2, but also does the right thing when catch X sees an AggregateException that contains an X.

Am I right that this is incremental, so we can do 2 and then add 3?

gvanrossum commented 3 years ago

But why would we want to do 3 if we can do everything with 2? Certainly we can make except *E catch plain E exceptions as well (maybe wrapped in a singleton multi-error?).

I don't think I would ever want except E or catch E to catch a multi-error, because of the type issue with except E as e -- static checkers (and users!) assume e is an instance of E. And even if the multi-error only has one inner exception, unwrapping that would just make it harder to find buggy code, since the code will still break when a double error is thrown. So better break it early.

OTOH except *E should catch a plain E as well as one or more wrapped E instances. So except *E would have the semantics of the proposed catch E.

We could use catch E instead of except *E; it's shorter, but I don't like having both except and catch in the language with different semantics; I'd rather have except E and except *E with different semantics, since the * pretty much screams "something's different here". :-)

I think we probably should not allow some clauses using except *E1 and others except E2 -- the semantics seem too murky (e.g. could both clauses run?).

Finally, if after handling a bunch of exceptions the multi-error has exactly one inner exception left -- do we simplify (unwrap) it? I think not, because (again) that would make it harder for the user to realize that they could get a multi-error and they're not handling it right. However multi-errors should probably have methods to let user code do the unwrapping.

gvanrossum commented 3 years ago

BTW I forgot one important aspect of my strawman -- the presence of except *E implies that multiple except blocks may be run, e.g.

try:
    raise AggregateError([E1("a"), E2("b"), E1("c")])
except *E1:
    print("caught E1")
except *E2:
    print("caught E2")

will print both lines (once each).

Occasionally I forget this scenario and I believe that we can do it without changes in except syntax or handling by using

except AggregateException[E1]:
    print("caught E1")

but without changes to the generated code that would not work with the previous example (it would catch E1 but not E2). So we'll have to drop that idea -- except *E doesn't have this problem, we can generate any code we want for that.

iritkatriel commented 3 years ago

I see, this could indeed work in (2).

I thought (see above at https://github.com/python/exceptiongroups/issues/3#issuecomment-714781584) that the suggestion is that "catch *E" lets you handle the list of Es in one go while "catch E" executes the except clause once for each E in the list. That would not be possible in (2).

But your new suggestion would be possible, and I think it's better anyway. It's unlikely that you really need to process the whole list, and if you do you can construct it.

gvanrossum commented 3 years ago

It's unlikely that you really need to process the whole list [...]

Wait, are we talking about the same thing? I know in the past we were talking about some form where the handler would run once for each exception. I think I've changed my mind about that though and I'd rather run each handler 0 or 1 times, so except *E as e would bind e to a multi-error containing one or more E instances.

So for

try:
    ...
except *E1 as e:
    handler1(e)
except *E2 as e:
    handler2(e)

would become

try:
    ...
except BaseException as _err:
    if not isinstance(_err, ExceptionGroup):
        _err = ExceptionGroup(_err)
    if (e := _err.split(E1)) is not None:
        handler1(e)
    if (e := _err.split(E2)) is not None:
        handler2(e)
    if _err.excs:
        raise  # Cleverly, this raises the *original* exception if it wasn't an ExceptionGroup

If there are else and/or finally blocks those get tacked on the very end (so the else block is skipped if we caught anything, and the finally block is always executed).

iritkatriel commented 3 years ago

So e is a sequence of E1s. Makes sense. I like it.

Should there be an option to break - as in, if I got this type of exception then I don't want to process any of the others?

iritkatriel commented 3 years ago

Details:

    if not _err.excs:
        raise  # Cleverly, this raises the *original* exception if it wasn't an ExceptionGroup

You mean "if _err.excs" (so it's not empty), right? And _err needs to be checked for type - if it's not an ExceptionGroup it doesn't have .excs.

gvanrossum commented 3 years ago

Hm, I thought the type of e would be ExceptionGroup[E1]. There would of course be a method to iterate over the inner exceptions.

gvanrossum commented 3 years ago

Should there be an option to break - as in, if I got this type of exception then I don't want to process any of the others?

I don't know how common that use case would be -- they can solve it with a flag the set and test in the other handlers.

gvanrossum commented 3 years ago

You mean "if _err.excs" (so it's not empty), right?

Oops, yes. Correcting.

gvanrossum commented 3 years ago

And _err needs to be checked for type - if it's not an ExceptionGroup it doesn't have .excs.

That's why I have

    if not isinstance(_err, ExceptionGroup):
        _err = ExceptionGroup(_err)
iritkatriel commented 3 years ago

Should there be an option to break - as in, if I got this type of exception then I don't want to process any of the others?

I don't know how common that use case would be -- they can solve it with a flag the set and test in the other handlers.

break/continue can always be added later if we decide to.

iritkatriel commented 3 years ago

break/continue can always be added later if we decide to.

No, they can't be - there might be a break in an except block where the whole try-except is inside a loop.

1st1 commented 3 years ago

I've been thinking about how task groups and exceptions groups would be used by asyncio users and independently arrived to what Guido proposed in https://github.com/python/exceptiongroups/issues/3#issuecomment-716182663 and in https://github.com/python/exceptiongroups/issues/3#issuecomment-716190576.

Below is my thought train that led me to it. Apologies for the long write up, but I hope this will help us.


Types of errors

Fundamentally there are two kinds of exceptions: control flow exceptions and operation exceptions. The examples of former are KeyboardInterrupt, asyncio.CancelledError, etc. Latter are TypeError, KeyError, etc.

When writing async/await code that uses a concept of TaskGroups (or Trio's nurseries) to schedule different code concurrently, the users should approach these kinds in a fundamentally different way.

Operation exceptions such as KeyError should be handled within the async Task that runs the code. E.g. this is what users should do:

try:
  dct[key]
except KeyError:
  # handle the exception

and this is what they shouldn't do:

try:
  async with asyncio.TaskGroup() as g:
    g.create_task(task1); g.create_task(task2)
except *KeyError:
  # handling KeyError here is meaningless, there's
  # no context to do anything with it but to log it.

Control flow exceptions are a different beast. If, for example, we want to cancel an asyncio Task that spawned multiple concurrent Tasks in it with a TaskGroup, we need to make sure that:

So suppose we have the except *ExceptionType syntax that can only handle an ExceptionGroup with ExceptionType in it. This means that we'd see a lot of code like this:

try:
  async with asyncio.TaskGroup() as g:
    g.create_task(task1); g.create_task(task2)
except *CancelledError:
  log('cancelling server bootstrap')
  await server.stop()
  raise
except CancelledError:
  # Same code, really.
  log('cancelling server bootstrap')
  await server.stop()
  raise

Which led me to the conclusion that except *CancelledError as e should both:

Why "handle all exceptions at once with one run of the code code in except "? Why not run the code in the except clause for every matching exception that we have in the group? Basically because there's no need to. As I mentioned above, catching operation exceptions should be done with the regular old except KeyError within the Task boundary, where there's context to handle that KeyError. Catching control flow exceptions* is needed to react to some global signal, do cleanup or logging, but ultimately to either stop the signal or propagate it up the caller chain.

Separating exceptions kinds to two distinct groups (operation & control flow) leads to another conclusion: an individual try..except block usually handles either the former or the latter, but not a mix of both. Which led me to conclusion that except *CancelledError should switch the behavior of the entire try block to make it run several of its except * clauses if necessary. So:

try:
  # code
except KeyError:
  # handle
except ValueError:
  # handle

is the old and familiar try..except, we don't need to change it. And:

try:
  # code
except *TimeoutError:
  # handle
except *CancelledError:
  # handle

is an entirely different mode and it's OK, and moreover, almost expected from the user standpoint, to run both except clauses here.

And:

try:
  # code
except ValueError:
  # handle
except *CancelledError:
  # handle

is weird and most likely suggests that the code should be refactored.

Types of user code

Fundamentally we have applications and libraries. Applications are somewhat simpler -- they typically can dictate what version of Python they require. Which makes introducing TaskGroups and the new except * clause somewhat easier. Basically, upon switching to Python 3.10, the application developer can grep their application code for every control flow exception they handle (search for except CancelledError) and mechanically change it to except *CancelledError. Generally, judging by my own experience, there are not so many places with code like that. Typically things like CancelledError and TimeoutError are handled only in a few places, the rest of the code relies on try..finally to cleanup its state.

Library developers are in a worse position: they'll need to maintain backwards compatibility with older Pythons, so they can't start using the new except * syntax. Two thoughts here:

This means that we'll need to have a proper programmatic API to work with ExceptionGroups, so that libraries can write code like:

try:
  # run user code
except CancelledError:
  # handle cancellation
except ExceptionGroup as e:
  g1, g2 = e.split(CancelledError)
  # handle cancellation

The API isn't going to be pretty, but that's OK, because a lot of existing asyncio libraries don't run user-provided coroutines that might use TaskGroups. In other words, a mysql or redis driver will never be able to catch an ExceptionGroup until it starts using TaskGroups itself.

Summary

I just cannot wrap my head around introducing try..catch syntax to Python. I don't think we actually need it, with the right approach to documentation we can explain to the users how to use TaskGroups and except * syntax correctly and I believe it will only make their code better. Therefore, to conclude I think we should:

gvanrossum commented 3 years ago

break/continue can always be added later if we decide to.

No, they can't be - there might be a break in an except block where the whole try-except is inside a loop.

Oh, any flow control other than raise should probably just interrupt the flow -- if handler1(e) says break (it's not really a function, just the code from the block) then none of the other stuff happens except finally.

But we need to think more about this later.

1st1 commented 3 years ago

Adding to https://github.com/python/exceptiongroups/issues/3#issuecomment-716203284:

Perhaps, if there's one except in a try block, we should require all other except clauses to be written as except .

This is also the strategy we use when designing APIs/syntax for EdgeDB: if there's no clear need to allow both except and except * clauses in one try we should only allow one of them. This is something that we can relax in later releases if we discover a use case. OTOH allowing both and later restricting it would be a breaking change.

gvanrossum commented 3 years ago

Perhaps, if there's one except * in a try block, we should require all other except clauses to be written as except *.

Oh, definitely.

gvanrossum commented 3 years ago

Perhaps we should open a new issue, since this one is still called "Introducing try..catch"? The new one could be called "Introducing 'except *'", and cleanly introduce the new proposal (probably linking to Irit's traceback group prototype).

1st1 commented 3 years ago

The new one could be called "Introducing 'except *'", and cleanly introduce the new proposal (probably linking to Irit's traceback group prototype).

If you want I can take a stab at it.

gvanrossum commented 3 years ago

Go for it.

1st1 commented 3 years ago

Done: https://github.com/python/exceptiongroups/issues/4.

ShadowJonathan commented 3 years ago

What's the current status of this proposal? Will this go-ahead to also soft-introduce multierrors/exception groups in the language? Or is this specific idea forgone for a different one?

iritkatriel commented 3 years ago

A variation of this is now being proposed as PEP 654.

ShadowJonathan commented 3 years ago

According to this header, this specific idea (try...catch) seems to be rejected, should this issue be closed?