njsmith opened this issue 4 years ago
Yeah, this is tricky. On the one hand, you kind of have to report the failure, b/c "program hangs silently" is terrible UX. On the other hand, you kind of can't report the failure, b/c `run_process` hasn't finished yet.

That's why I ended up putting `warnings.warn` calls in the default `deliver_cancel` – it gives the user an immediate heads-up, even though we can't propagate the error properly.
Maybe we should pull the `try: ... except: warnings.warn(...)` out of the default `deliver_cancel`, and put it into the code that calls `deliver_cancel` instead?
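For reference, the default POSIX `deliver_cancel` at the time looked roughly like this (a simplified, from-memory sketch, not a verbatim copy of trio's code; the `try`/`except OSError` plus `warnings.warn` is the part that would get pulled out):

```python
import warnings
import trio

async def posix_deliver_cancel(p):
    # Roughly the default escalation: SIGTERM first, a grace period, then
    # SIGKILL. Any OSError along the way is reported via warnings.warn,
    # since run_process itself can't propagate the error yet.
    try:
        p.terminate()
        await trio.sleep(5)
        warnings.warn(
            RuntimeWarning(f"process {p!r} ignored SIGTERM; trying SIGKILL")
        )
        p.kill()
    except OSError as exc:
        warnings.warn(
            RuntimeWarning(f"tried to kill process {p!r}, but failed with: {exc!r}")
        )
```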
I guess for exceptions out of arbitrary user code, it would be nice to include the full traceback. That's an awkward fit for the `warnings` machinery, though.
In a few other places where we have this kind of problem (instrument failures, `EMFILE` in `serve_listeners`), we use `logging` to dump the warning onto the console, with the idea being that by default the user sees it, and at least it's hookable if you want to do something else in production. Maybe we should do the same here? `try: await deliver_cancel(proc) except: L = logging.getLogger("trio.run_process"); L.exception(...); raise`?
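Unpacked into ordinary code, that inline suggestion reads something like the sketch below (the wrapper name and log message are illustrative; only the `"trio.run_process"` logger name comes from the snippet above):

```python
import logging

async def call_deliver_cancel(deliver_cancel, proc):
    # The one-liner above, unpacked: run deliver_cancel, and if it fails,
    # log the full traceback on the "trio.run_process" logger, then re-raise.
    try:
        await deliver_cancel(proc)
    except:  # bare except, as in the shorthand above: catches everything
        L = logging.getLogger("trio.run_process")
        L.exception("error in deliver_cancel for process %r", proc)
        raise
```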
Logging the exception, and still waiting for the process to exit, seems like a good compromise to me. The log message should explicitly say that it's still waiting for the process to exit, so that the reason for the apparent deadlock is as obvious as we can make it.
Hey guys so I'm gonna take a crack at this issue.
If I got this right, the idea would be a change like moving the `try: ... except` to `trio._subprocess.run_process.open_process`:
```python
try:
    await proc.wait()
except trio.Cancelled:
    with trio.CancelScope(shield=True):
        killer_cscope = trio.CancelScope(shield=True)

        async def killer():
            with killer_cscope:
                try:  # here is the new try/except
                    await deliver_cancel(proc)
                except OSError as exc:
                    warnings.warn(
                        RuntimeWarning(
                            f"tried to kill process {proc!r}, but failed with: {exc!r}"
                        )
                    )

        nursery.start_soon(killer)
        await proc.wait()
        killer_cscope.cancel()
    raise
```
Also, what is a good way to trigger an `OSError` to test this? Just closing the process manually while trio thinks it's open will probably work, I think, but maybe there is a cleaner way.
> If I got this right, the idea would be a change like moving the `try: ... except` to `trio._subprocess.run_process.open_process`:

You mean `trio._subprocess.run_process.killer`, right?
> Also, what is a good way to trigger an `OSError` to test this?

You can also just do:

```python
async def custom_deliver_cancel(proc):
    raise OSError("whoops!")
```
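To exercise that in context, a failing `deliver_cancel` can be passed straight to `run_process` and then cancelled. A rough, self-contained sketch (the command and timeouts are placeholders, not from the discussion):

```python
import trio

async def custom_deliver_cancel(proc):
    raise OSError("whoops!")

async def main():
    # Cancel run_process shortly after it starts; killer() then calls
    # custom_deliver_cancel, which raises OSError and exercises the new
    # error-handling path. The child still has to exit on its own, since
    # nothing actually killed it.
    with trio.move_on_after(0.5):
        await trio.run_process(
            ["sleep", "2"],  # any command that outlives the timeout works
            deliver_cancel=custom_deliver_cancel,
        )

trio.run(main)
```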
Also:

- catch `BaseException` and then re-raise
- use `logging` instead of `warnings`. For an example of what this might look like, check out `trio/_highlevel_serve_listeners.py` and how `LOGGER` is used there.

Sorry, can you expand on the `BaseException` point?
@guilledk In your comment, you had `try`/`except OSError`. For the original version where we were just catching errors from the default `deliver_cancel`, that was fine, because we knew `OSError` was the only thing that could be raised. But now we'll be catching errors from user code, so we want to instead do `try`/`except BaseException`. And then afterwards, we want to re-raise the exception.
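Applied to the `killer()` closure from the snippet earlier in the thread, that advice would look roughly like this (an excerpt, not standalone code: `killer_cscope`, `deliver_cancel`, and `proc` are defined in that snippet, and the logger name follows the earlier `logging` suggestion):

```python
import logging

async def killer():
    with killer_cscope:
        try:
            await deliver_cancel(proc)
        except BaseException:
            # Catch BaseException so failures from arbitrary user code are
            # reported with a full traceback, then re-raise rather than
            # swallowing the error.
            logging.getLogger("trio.run_process").exception(
                "tried to kill process %r, but deliver_cancel failed; "
                "still waiting for the process to exit",
                proc,
            )
            raise
```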
Oh ok got it
> What should happen if `deliver_cancel` raises an exception? In the current implementation, the shielded cancel scope will prevent it from propagating until the process exits, which might take a while if the crash occurred before signaling the process in any way. Maybe on exception from a user-specified `deliver_cancel` we should call the default `deliver_cancel` to kill the process? Or just `kill()`, since we don't care much about polish at that point, only about letting the user know what they need to fix.

_Originally posted by @oremanj in https://github.com/python-trio/trio/pull/1525_
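A minimal sketch of that fallback idea (a hypothetical helper, not the merged fix), assuming the goal is just to make sure the process dies and the user still sees the traceback:

```python
import logging

LOGGER = logging.getLogger("trio.run_process")

async def deliver_cancel_with_fallback(deliver_cancel, proc):
    try:
        await deliver_cancel(proc)
    except BaseException:
        # Log the traceback from the user-supplied callback, then make sure
        # the process actually dies so run_process isn't stuck waiting on a
        # child that nobody ever signaled. Re-raise so the error isn't lost.
        LOGGER.exception(
            "deliver_cancel failed for process %r; falling back to kill()", proc
        )
        proc.kill()
        raise
```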