python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io
Other
6.19k stars 341 forks source link

In run_process, what should happen if the deliver_cancel function raises an exception? #1532

Open njsmith opened 4 years ago

njsmith commented 4 years ago

What should happen if deliver_cancel raises an exception? In the current implementation, the shielded cancel scope will prevent it from propagating until the process exits, which might take a while if the crash occurred before signaling the process in any way. Maybe on exception from a user-specified deliver_cancel we should call the default deliver_cancel to kill the process? Or just kill() since we don't care much about polish at that point, only about letting the user know what they need to fix.

Originally posted by @oremanj in https://github.com/python-trio/trio/pull/1525

njsmith commented 4 years ago

Yeah, this is tricky. On the one hand, you kind of have to report the failure, b/c "program hangs silently" is terrible UX. On the other hand, you kind of can't report the failure, b/c run_process hasn't finished yet.

That's why I ended up putting warnings.warn calls in the default deliver_cancel – it gives the user an immediate heads-up, even though we can't propagate the error properly.

Maybe we should pull the try: ... except: warnings.warn(...) out of the default deliver_cancel, and put it into the code that calls deliver_cancel instead?

I guess for exceptions out of arbitrary user code, it would be nice to include the full traceback. That's an awkward fit for the warnings machinery though.

In a few other places where we have this kind of problem (instrument failures, EMFILE in serve_listeners), we use logging to dump the warning onto the console, with the idea being that by default the user sees it, and at least its hookable if you want to do something else in production. Maybe we should do the same here? try: await deliver_cancel(proc) except: L = logging.getLogger("trio.run_process"); L.exception(...); raise?

oremanj commented 4 years ago

Logging the exception, and still waiting for the process to exit, seems like a good compromise to me. The log message should explicitly say that it's still waiting for the process to exit, so that the reason for the apparent deadlock is as obvious as we can make it.

guilledk commented 4 years ago

Hey guys so I'm gonna take a crack at this issue. If I got this right the idea would be a change like, moving the try: .. except to trio._subprocess.run_process.open_process:

try:
    await proc.wait()
except trio.Cancelled:
    with trio.CancelScope(shield=True):
        killer_cscope = trio.CancelScope(shield=True)

        async def killer():
            with killer_cscope:
                try:  # here is the new try except
                    await deliver_cancel(proc)
                except OSError as exc:
                    warnings.warn(
                        RuntimeWarning(
                            f"tried to kill process {proc!r}, but failed with: {exc!r}"
                        )
                    )

        nursery.start_soon(killer)
        await proc.wait()
        killer_cscope.cancel()
        raise

Also what is a good way to trigger an OSError to test this?, just closing the process manually while trio thinks is open will probably work I think, but maybe there is a cleaner way.

njsmith commented 4 years ago

If I got this right the idea would be a change like, moving the try: .. except to trio._subprocess.run_process.open_process:

You mean trio._subprocess.run_process.killer, right?

Also what is a good way to trigger an OSError to test this?

You can also just do:

async def custom_deliver_cancel(proc):
    raise OSError("whoops!")

Also:

guilledk commented 4 years ago

Sorry can you expand on the BaseException point?

njsmith commented 4 years ago

@guilledk In your comment, you had try/except OSError. For the original version where we were just catching errors from the default deliver_cancel, that was fine, because we knew OSError was the only thing that could be raised. But now we'll be catching errors from user code, so we want to instead do try/except BaseException. And then afterwards, we want to re-raise the exception.

guilledk commented 4 years ago

Oh ok got it