Closed Zac-HD closed 2 months ago
Relatedly, we should document any design patterns we like or dislike for interfaces with an "inner nursery", c.f. https://github.com/python-trio/trio-websocket/issues/132
The basic problems are that (1) forcing users to handle errors which may or may not be an ExceptionGroup
has horrible ergonomics, and that (2) usually one or the other is more common, so it's really easy to end up with latent bugs from only handling the common kind. That's why we made strict_exception_groups=True
the default!
Unfortunately, this also means that using a nursery internally means that error handling doesn't quite follow the black box rule - you can still catch an ungrouped exception with except*:
, but most code has good reason to prefer except:
.
Example patterns:
ExceptionGroup
, letting users handle it.
raise SomeInternalError(...) from group
SomeInternalError
instead of a group, so it's not clear that we've actually gained much - except Exception:
will catch an ExceptionGroup
directly, after all.PleaseReportThisBugError
raise SomeInternalError(...) from group
but exceptions in the body of the nursery block are raised directly.
SomeInternalError from ExceptionGroup(...) from body-error
. Verbose but clear IMO....and that's probably a decent outline for the section, I suppose.
It's scary that these approaches to the nursery-as-implementation-detail problem haven't been fleshed out before committing to strict exception groups.
I think the best practice for avoiding concurrent error headaches is to minimize their scope, converging on a winning exception in a local context wherever reasonable. Up until now, I've only had to deal with that in a few places in our app or libraries where a concurrent error might leak. Now every single use of a nursery (including as implementation detail by 3rd party libraries) needs to be bandaged over. It's a large blow to trio ergonomics.
(6) sounds interesting, but I'm not sure if the body/background distinction is sufficient. I'd have to see it and try to apply it.
The careful wrapping of API's alluded to in (7) has been the approach in my application all along, and has worked fine. For example, see multi_error_defer_to()
and defer_to_cancelled()
in trio-util. Notably, these can be decorators on API async functions. (We've only needed @defer_to_cancelled
in our codebase.)
I think it's related to the patterns discussion, so I'll share thoughts on how I might migrate trio-websocket to trio's new regime.
This library seems like it would be a simple enough case-- no async package dependencies, doesn't expose concurrent errors in its API-- yet how to reach the goal isn't obvious. The points to watch for:
except:
within an async context, it might need to be adjusted to handle an exception group. It's hard to know from just reading the code, since nursery use may be an implementation detail of any call or context manager-- as we are aware. (We'd need to rely on diligent documentation on every async definition in the world regarding potentially raising a transitive ExceptionGroup.)Between these, the latter has me more concerned. At least with the former, the code to review is clearly marked with except
.
So, along the lines of pattern (2), I might make a context manager that translates ExceptionGroup into an internal error, unless the group consists entirely of Cancelled
. Then hope by way of unit tests, integration tests, and use in the field that any internal errors might be reported, and the holes eventually patched up.
Regarding pattern (2):
users still have to deal with maybe getting a SomeInternalError instead of a group, so it's not clear that we've actually gained much -
except Exception:
will catch anExceptionGroup
directly, after all
No one should be explicity catching an internal error of an API (a general catch is fine). I guess here I'm equating "internal error" with "error that the API does not expect to raise", so it's a bug to be fixed. It's important to use a dedicated error and make it clear that a bug should be reported.
It's scary that these approaches to the nursery-as-implementation-detail problem haven't been fleshed out before committing to strict exception groups.
I'm not delighted about this gap either, but https://github.com/python-trio/trio/issues/2785#issuecomment-1775013388 explains why I think we had to do it: strict is the standard semantics across asyncio
and anyio
, and mixing strict and non-strict is a recipe for bugs as authors neglect the chance of a group. Unfortunately you might not notice in testing - it's the kind of bug that shows up under (production) load.
I think I'm less bothered by the worse ergonomics because those bugs have repeatedly bitten me, in both our own and upstream code, and I'm working in a large enough codebase that buying reliably-correct-ness for some convenience is net-positive tradeoff.
In cases where you know that only one error can every happen at a time (although how sure should we be that we'd notice if that changed?), it's not too hard to flatten the group - although it'd be nice to have a helper for that and raise a PleaseReportBug
exception if there's more than one leaf. In cases where there can be multiple concurrent errors, I'm disinclined to hide that and would defer_to_cancelled()
or just raise the whole group - except*
is pretty nice to work with.
I'm still struggling with how to mitigate strict exception groups within the trio libraries I maintain.
The worst thing is that strict_exception_groups
was made a user-configurable option that can be set globally on run()
. This means that the application can enable this mode before authors of any of the transitive dependencies have given thought to whether they want their API's to leak ExceptionGroup.
I can't declare a trio <= X
dependency in my package as a stopgap while sorting things out (where version X enabled strict exceptions), again since strict_exception_groups
is a global runtime option set by the app.
I haven't seen any reference of explicitly setting strict_exception_groups=False
for internal nurseries (as a temp stop-gap for libraries: async with trio.open_nursery(strict_exception_groups=False) as nursery: ...
). Is there a big obvious flaw with it? Maybe people avoid it cause it's deprecated?
I don't maintain a library that uses trio so if it seems like I'm missing something, I probably am :P
explicitly setting
strict_exception_groups=False
for internal nurseries
I don't know how far you'd get, as there may be 3rd party things your code calls (within trio API itself or utility libs) that wrap nursery and don't give you access to the option.
Follow-up for https://github.com/python-trio/trio/pull/2886#discussion_r1444122554:
We should also skim through other mentions of
ExceptionGroup
to ensure that our messaging is consistent across all the docs.