As a general rule, anything which logs an exception and re-raises it is prone to double logging (in your first example one of the four exceptions gets double-logged), so it's best to log at the point where an exception will no longer be re-raised. I hadn't really considered the possibility of log-and-re-raise but I see why you're doing it and I see two general strategies:
import functools
import logging

def log_exception(f):
    """This decorator ensures that any exception raised by the decorated coroutine will be logged."""
    @functools.wraps(f)
    def wrapped(*args, **kw):
        try:
            future = f(*args, **kw)
        except Exception:
            # f may raise synchronously instead of returning a Future.
            logging.warning("exception", exc_info=True)
            raise
        def exception_logger(future):
            # Calling result() consumes the exception, which satisfies
            # Tornado's "exception was never retrieved" destructor hook.
            try:
                future.result()
            except Exception:
                logging.warning("exception", exc_info=True)
        future.add_done_callback(exception_logger)
        return future
    return wrapped
This is untested and the double try/except is unfortunate (the first one is only needed if f can raise synchronously instead of returning a Future), but it should do what you're doing now in a way that satisfies Tornado's destructor hook. (There are other ways to do this besides a decorator, but this demonstrates the concept.)
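For illustration, a hypothetical usage of the decorator (do_thing and some_async_operation are made-up names):

from tornado import gen

@log_exception
@gen.coroutine
def do_thing(job):
    # Any exception raised by this coroutine is logged exactly once,
    # whether or not the caller ever retrieves the result.
    result = yield some_async_operation(job)
    raise gen.Return(result)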
if not unfinished_children:
    result_list = []
    for f in children:
        try:
            result_list.append(f.result())
        except Exception:
            if future.done():
                # The first exception was already set on the parent
                # future; log the rest instead of dropping them.
                logging.error("exception", exc_info=True)
            else:
                future.set_exc_info(sys.exc_info())
    if not future.done():
        future.set_result(result_list)
I think it would be a good idea to make a change like this to multi_future (and to gen.Multi). This would let you remove your own logging and trust that the framework would log things for you (without relying on the somewhat fickle GC hook). It would also remove the double-logging of the first exception to fire.
@bdarnell, thanks for the response. For the time being, given our current use case, we are working around the issue in one small place by following this pattern:
tasks = []
for job in jobs:
    tasks.append(self.do_thing_that_returns_future(job))

returns = []
exceptions = []
for task in tasks:
    try:
        ret = yield task
        returns.append(ret)
    except Exception as e:
        msg = '%s failed: %s' % (task, e)
        exceptions.append(msg)

if exceptions:
    raise SomeOtherException(', '.join(exceptions))
raise gen.Return(returns)
Now, it seems like the multi_future() method itself should do this. @siminm and I were chatting about this, and we thought that multi_future() could raise a MultipleFutureFailures exception object that contained an actual list of all of the exceptions that were raised. This allows the caller to receive all of the failures and handle them however they wish, while still matching Python's semantics around raising a single exception.
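A minimal sketch of what such an exception might look like (only the name MultipleFutureFailures comes from this discussion; the shape is illustrative):

class MultipleFutureFailures(Exception):
    """Hypothetical aggregate of the exceptions raised by several futures."""
    def __init__(self, exceptions):
        self.exceptions = exceptions  # the underlying exception objects
        super(MultipleFutureFailures, self).__init__(
            '%d futures failed' % len(exceptions))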
This is problematic because do_thing_that_returns_future() may raise an exception instead of returning a Future (the @gen.coroutine decorator will always wrap its exceptions in a Future, but e.g. most IOStream methods do not make this guarantee and may raise instead of returning). This complicates exception handling because you must now handle multiple exception types (instead of just MultipleFutureFailures), and you still have to deal with the remnants of the tasks list (probably resulting in exception handling logic that is repeated in several places).
My feeling is that in most cases, "raise the first and log the rest" is the right behavior for multi_future; any exception handling other than logging should be done in the parallel tasks themselves rather than after the parent task has joined with them. I think there might be room for some variations (cf. the return_exceptions option to asyncio.gather: https://docs.python.org/3/library/asyncio-task.html#asyncio.gather), but if it's easy enough to create these outside of Tornado as you've done, I'm not sure it's worth adding them to the library.
I might be biased, but it seems reasonable to me that if you leverage this feature, then you have to be prepared for more complex exception handling.
try:
    yield simple
except MyException:
    pass  # do stuff

vs

try:
    yield [simple1, simple2]
except MyException:
    pass  # do stuff
except tornado.exceptions.MultipleExceptionsRaised as multiple:
    for e in multiple.get_all_exceptions():
        pass  # do stuff
I like the above because it's explicit, and considering the benefit of having access to all the exceptions, it is well worth the overhead.
Yes, but when you have the more complex exception handling, how much are you gaining by yielding a list instead of the individual futures as in your first snippet? It's a few lines shorter, but how common is this? You say you were able to make this change in one place. And how do you indicate that you want to leverage this feature? All the code today that yields lists is expecting the original exception type to be raised, not a MultipleExceptionsRaised.
It's also unclear to me whether this even has the semantics you want, since it doesn't guarantee that all tasks will be started. It sounds to me like you want to start all the tasks even if some of them fail, which suggests to me that each task should be wrapped in something that does a try/except and returns rather than raises an error.
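A sketch of that wrapping idea (capture is a hypothetical helper, not a Tornado API):

from tornado import gen

@gen.coroutine
def capture(future):
    """Resolve future to a (result, exception) pair instead of raising."""
    try:
        result = yield future
    except Exception as e:
        raise gen.Return((None, e))
    raise gen.Return((result, None))

# yield [capture(t) for t in tasks] then never raises; every task runs
# to completion, and the caller inspects the (result, exception) pairs.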
I agree that the current implementation keeps the runtime code backwards-compatible. I was suggesting that it doesn't have to be backwards-compatible when multiple errors happen.
The advantage is that I can still yield a list of tasks; I think that part of Tornado is sexy. Then, if many exceptions are raised (a concept that's Tornado-specific), it's up to the developer to handle them or not. Otherwise, any scenario that cares about exception handling simply cannot use yielded lists.
Backwards compatibility still matters when multiple exceptions occur; people today are relying on the fact that yielding a list will not raise any exceptions that none of the futures would raise individually (and this doesn't mean that they "don't care about exception handling"). It's a compatibility break unless the multi-exception is somehow opt-in.
A list of asynchronous operations is not unique to Tornado; it's directly analogous to concurrent.futures.Executor.map, and that method has no way to access any failures after the first. Exceptions are inherently "singular" in Python; you're trying to work with them as data, which makes me think that asyncio.gather has the right idea: with opt-in, return exceptions instead of raising them (and don't try to raise a meta-exception at the end if any exceptions occurred).
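For reference, the asyncio pattern being cited, as a minimal self-contained sketch (the fail coroutine is illustrative):

import asyncio

async def fail(name):
    raise RuntimeError('%s failed' % name)

async def main():
    # return_exceptions=True: exceptions come back as values in the
    # result list instead of the first one being raised.
    results = await asyncio.gather(fail('a'), fail('b'),
                                   return_exceptions=True)
    for r in results:
        if isinstance(r, Exception):
            print('caught:', r)

asyncio.run(main())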
I've made a couple of changes to address this issue: multi_future now logs exceptions directly instead of leaving that to the destructor hook, and as a result it now accepts a quiet_exceptions parameter similar to with_timeout. With these changes, my recommendations are as follows: the logging in the raise_exception example should not be required; if you want to handle the logging yourself, use yield gen.multi_future([futures...], quiet_exceptions=Exception), or log-and-return, turning exceptions into unexceptional return values.
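A short usage sketch of that parameter (run_all and MyError are illustrative names; quiet_exceptions takes an exception type or tuple of types whose logging should be suppressed):

from tornado import gen

class MyError(Exception):
    pass

@gen.coroutine
def run_all(futures):
    # The first exception is still raised to the caller; subsequent
    # MyError exceptions are silenced instead of logged by Tornado.
    results = yield gen.multi_future(futures, quiet_exceptions=(MyError,))
    raise gen.Return(results)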
In a recent change by @bdarnell (https://github.com/tornadoweb/tornado/commit/241956a6cdd1e96de7afe9355fc3dec303f2365e#diff-f9417e85a5edaa0ca2318cad516e1d6aL122), if yield [taska, taskb, taskc] is called and more than one of those tasks raises an exception, Tornado jumps in and decides to log the exception for you when the Future object is being cleaned up. I get why this is done. However, I don't understand how we're supposed to handle it.

We use this yield [... many things ...] pattern a ton in Kingpin. Since Tornado 4.1, we are unable to control the logging output when multiple asynchronous tasks fail, even though we are absolutely logging each of those exceptions.

Here is a super simple example, yield_all.py:

Here is the example output in Tornado 4:

Here is the example output in Tornado 4.1:

The question here is: how do we properly handle multiple Future objects raising exceptions? We're already handling and logging the exception, but we want to continue to raise it up the stack anyway.
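The yield_all.py code and its output did not survive in this thread; a hypothetical reconstruction of the pattern being described might look like this:

import logging
from tornado import gen, ioloop

@gen.coroutine
def fail(name):
    raise Exception('%s failed' % name)

@gen.coroutine
def yield_all():
    try:
        yield [fail('a'), fail('b'), fail('c')]
    except Exception:
        # We log and re-raise ourselves; under Tornado 4.1 the
        # remaining exceptions get logged a second time by the
        # Future destructor hook.
        logging.exception('task failed')
        raise

ioloop.IOLoop.current().run_sync(yield_all)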