Document how to handle yielding many tasks, and catching exceptions.

diranged commented 9 years ago

In a recent change by @bdarnell (https://github.com/tornadoweb/tornado/commit/241956a6cdd1e96de7afe9355fc3dec303f2365e#diff-f9417e85a5edaa0ca2318cad516e1d6aL122), if yield [taska, taskb, taskc] is called, and more than one of those tasks raises an exception, Tornado jumps in and decides to log the exception for you when the Future object is being cleaned up. I get why this is done. However, I don't understand how we're supposed to handle it.

We use this yield [... many things ...] pattern a ton in Kingpin. Since Tornado 4.1, we are unable to control the logging output when multiple asynchronous tasks fail, even though we are absolutely logging each of those exceptions.

Here is a super simple example yield_all.py code:

import random
import time

from tornado import gen
from tornado import ioloop

@gen.coroutine
def tornado_sleep(seconds=1.0):
    yield gen.Task(ioloop.IOLoop.current().add_timeout,
                   time.time() + seconds)

@gen.coroutine
def raise_exception(name):
    print ("%s is sleeping" % name)
    yield tornado_sleep(random.random())
    print ("%s is raising exception!" % name)
    try:
        raise Exception('Bad things from %s' % name)
    except Exception as e:
        print ("%s raised exception: %s" % (name, e))
        raise e

@gen.coroutine
def raise_many():

    tasks = [raise_exception('one'),
             raise_exception('two'),
             raise_exception('three'),
             raise_exception('four')]

    print ("Beginning yield")
    yield tasks

def print_tidy_exception(e):
    # Skip traceback that involves site-packages.
    import traceback
    trace_lines = traceback.format_exc(e).splitlines()
    skip_next = False
    for l in trace_lines:
        if 'tornado' in l:
            skip_next = True
            continue
        if not skip_next:
            print(l)
        skip_next = False

if __name__ == '__main__':
    try:
        ioloop.IOLoop.instance().run_sync(raise_many)
    except Exception as e:
        print_tidy_exception(e)

Here is the example output in Tornado 4:

python yield_all.py 
one is sleeping
two is sleeping
three is sleeping
four is sleeping
Beginning yield
one is raising exception!
one raised exception: Bad things from one
two is raising exception!
two raised exception: Bad things from two
four is raising exception!
four raised exception: Bad things from four
three is raising exception!
three raised exception: Bad things from three
Traceback (most recent call last):
  File "yield_all.py", line 53, in <module>
    ioloop.IOLoop.instance().run_sync(raise_many)
  File "yield_all.py", line 35, in raise_many
    yield tasks
  File "yield_all.py", line 23, in raise_exception
    raise e
Exception: Bad things from one

Here is the example output in Tornado 4.1:

python yield_all.py 
one is sleeping
two is sleeping
three is sleeping
four is sleeping
Beginning yield
four is raising exception!
four raised exception: Bad things from four
one is raising exception!
one raised exception: Bad things from one
two is raising exception!
two raised exception: Bad things from two
three is raising exception!
three raised exception: Bad things from three
Traceback (most recent call last):
  File "yield_all.py", line 53, in <module>
    ioloop.IOLoop.instance().run_sync(raise_many)
  File "yield_all.py", line 35, in raise_many
    yield tasks
  File "yield_all.py", line 23, in raise_exception
    raise e
Exception: Bad things from one
ERROR:tornado.application:Future exception was never retrieved: Traceback (most recent call last):
  File "/Users/diranged/git/Nextdoor/kingpin/.venv/lib/python2.7/site-packages/tornado/gen.py", line 812, in run
    yielded = self.gen.send(value)
  File "yield_all.py", line 23, in raise_exception
    raise e
Exception: Bad things from two
ERROR:tornado.application:Future exception was never retrieved: Traceback (most recent call last):
  File "/Users/diranged/git/Nextdoor/kingpin/.venv/lib/python2.7/site-packages/tornado/gen.py", line 812, in run
    yielded = self.gen.send(value)
  File "yield_all.py", line 23, in raise_exception
    raise e
Exception: Bad things from three
ERROR:tornado.application:Future exception was never retrieved: Traceback (most recent call last):
  File "/Users/diranged/git/Nextdoor/kingpin/.venv/lib/python2.7/site-packages/tornado/gen.py", line 812, in run
    yielded = self.gen.send(value)
  File "yield_all.py", line 23, in raise_exception
    raise e
Exception: Bad things from four

The question here is, how do we properly handle multiple Future objects raising Exceptions? We're already handling and logging the Exception, but we want to continue to raise it up the stack anyways.

bdarnell commented 9 years ago

As a general rule, anything which logs an exception and re-raises it is prone to double logging (in your first example one of the four exceptions gets double-logged), so it's best to log at the point where an exception will no longer be re-raised. I hadn't really considered the possibility of log-and-re-raise but I see why you're doing it and I see two general strategies:

Log and re-raise, but do it in a way that Tornado considers the error to be "retrieved". This means moving the try/except out of the function itself and into a wrapper:

    def log_exception(f):
        """This decorator ensures that any exception raised by the decorated coroutine will be logged."""
        def wrapped(*args, **kw):
            try:
                future = f(*args, **kw)
            except Exception:
                logging.warning("exception", exc_info=True)
            def exception_logger(future):
                try:
                    future.result()
                except Exception:
                    logging.warning("exception", exc_info=True)
            future.add_done_callback(exception_logger)
            return future
        return wrapped

This is untested and the double try/except is unfortunate (I can't remember offhand whether the first one is necessary or not), but it should do what you're doing now but in a way that satisfies Tornado's destructor hook. (There are other ways to do this besides a decorator, but this demonstrates the concept)

Use a variant of gen.multi_future that doesn't ignore multiple failures. Rewrite the bit in multi_future that builds result_list in a way that logs any failure after the first:

    if not unfinished_children:
        result_list = []
        for f in children:
            try:
                result_list.append(f.result())
            except Exception:
                if future.done():
                    logging.error("exception", exc_info=True)
                else:
                    future.set_exc_info(sys.exc_info())
         if not future.done():
             future.set_result(result_list)

I think it would be a good idea to make a change like this to multi_future (and to gen.Multi). This would let you remove your own logging and trust that the framework would log things for you (without relying on the somewhat fickle GC hook). It would also remove the double-logging of the first exception to fire.

diranged commented 9 years ago

@bdarnell, thanks for the response. For the time being, given our current use case, we are working around the issue in one small place by following this pattern:

tasks = []
for job in jobs:
    tasks.append(self.do_thing_that_returns_future(job))

returns = []
exceptions = []
for task in tasks:
    try:
        ret = yield task
        returns.append(ret)
    except Exception as e:
        msg = ('%s failed: %s' % (task, e))
        exceptions.append(msg)

if exceptions:
    raise SomeOtherException(', '.join(exceptions))

raise gen.Return(returns)

Now, it seems like the multi_future() method itself should do this. @siminm and I were chatting about this, and we thought that multi_future() could raise a MultipleFutureFailures exception object that contained an actual list of all of the exceptions that were raised. This allows the caller to receive all of the failures, and handle them however they wish, while still matching pythons semantics around raising a single exception.

bdarnell commented 9 years ago

This is problematic because do_thing_that_returns_future() may raise an exception instead of returning a Future (the @gen.coroutine decorator will always wrap its exceptions in a Future, but e.g. most IOStream methods do not make this guarantee and may raise instead of returning). This complicates exception handling because you must now handle multiple exception types (instead of just MultipleFutureFailures) and you still have to deal with the remnants of the tasks list (probably resulting in exception handling logic that is repeated in several places).

My feeling is that in most cases, "raise the first and log the rest" is the right behavior for multi_future; any exception handling other than logging should be done in the parallel tasks themselves rather than after the parent task has joined with them. I think there might be room for some variations (cf. the return_exceptions option to asyncio.gather: https://docs.python.org/3/library/asyncio-task.html#asyncio.gather), but if it's easy enough to create these outside of Tornado as you've done I'm not sure if it's worth adding them to the library.

mikhail commented 9 years ago

I might be biased, but it seems reasonable to me that if you leverage this feature then you have to be prepared for a more complex exception handling.

try:
  yield simple
except MyException:
  # do stuff

vs

try:
  yield [simple1, simple2]
except MyException:
  # do stuff
except tornado.exceptions.MultipleExceptionsRaised as multiple:
  for e in multiple.get_all_exceptions():
    # do stuff

I like the above because it's explicit and considering the benefit of having access to all exception - it is very worth the overhead.

bdarnell commented 9 years ago

Yes, but when you have the more complex exception handling, how much are you gaining by yielding a list instead of the individual futures as in your first snippet? It's a few lines shorter, but how common is this? You say you were able to make this change in one place. And how do you indicate that you want to leverage this feature? All the code today that yields lists is expecting the original exception type to be raised, not a MultipleExceptionsRaised.

It's also unclear to me whether this even has the semantics you want, since it doesn't guarantee that all tasks will be started. It sounds to me like you want to start all the tasks even if some of them fail, which suggests to me that each task should be wrapped in something that does a try/except and returns rather than raises an error.

mikhail commented 9 years ago

I agree that the current implementation keeps the runtime code backwards-compatible. I was suggesting that it doesn't have to be when multiple errors happen.

The advantage is that I can still yield a list of tasks; I think that part of Tornado is sexy. Then if many exceptions are raised (a concept that's tornado-specific) its up to the developer to handle them or not. Otherwise any scenario that cares about exception handling simply cannot use yielding lists.

bdarnell commented 9 years ago

Backwards compatibility still matters when multiple exceptions occur; people today are relying on the fact that yielding a list will not raise any exceptions that none of the futures would raise individually (and this doesn't mean that they "don't care about exception handling"). It's a compatibility break unless the multi-exception is somehow opt-in.

A list of asynchronous operations is not unique to Tornado; it's directly analogous to concurrent.futures.Executor.map, and that method has no way to access any failures after the first. Exceptions are inherently "singular" in python; you're trying to work with them as data which makes me think that asyncio.gather has the right idea: with opt-in, return exceptions instead of raising them (and don't try to raise a meta-exception at the end if there were non-zero exceptions)

bdarnell commented 9 years ago

I've made a couple of changes to address this issue: multi_future now logs exceptions directly instead of leaving that to the destructor hook, and as a result it now accepts a quiet_exceptions parameter similar to with_timeout. With these changes, my recommendations are as follows:

In general, applications can rely on the framework to log exceptions for them; explicit logging as in your original raise_exception example should not be required.
When applications want to customize logging, they can either log-and-reraise as you've done and use yield gen.multi_future([futures...], quiet_exceptions=Exception) or log-and-return, turning exceptions into unexceptional return values.
In some cases, it may be possible to customize logging by attaching a handler to tornado.log.app_log, but I don't know how flexible that is in practice.

tornadoweb / tornado

Document how to handle yielding many tasks, and catching exceptions. #1378