ilovesoup / hyracks

Automatically exported from code.google.com/p/hyracks
Apache License 2.0

Exceptions in OperatorRunnable worker threads should cause Operator/Stagelet failure. #37

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Exceptions in OperatorRunnable worker threads do not trigger Stagelet failure.

Currently, exceptions are caught and ignored inside the Runnables 
constructed in OperatorRunnable.run().  This makes operators harder to debug.

I've included a patch that uses Callables and a CompletionService to propagate 
exceptions thrown by the tasks submitted to the OperatorRunnable's executor.  
This allows exceptions thrown in operators to properly trigger Operator/Stagelet 
failures.
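
For illustration only, here is a minimal sketch of the general Callable plus CompletionService pattern described above. It is not the attached patch and does not use any Hyracks classes; `PropagatingRunner` and `runAll` are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, not Hyracks code.
class PropagatingRunner {
    // Runs every task and rethrows the first failure instead of swallowing it.
    static void runAll(List<Callable<Void>> tasks) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, tasks.size()));
        CompletionService<Void> completion = new ExecutorCompletionService<>(executor);
        try {
            for (Callable<Void> task : tasks) {
                completion.submit(task);
            }
            // take() returns futures in completion order, so a failure is
            // seen as soon as it happens, not after slower tasks finish.
            for (int i = 0; i < tasks.size(); i++) {
                try {
                    completion.take().get();
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    if (cause instanceof Exception) {
                        throw (Exception) cause;
                    }
                    throw e;
                }
            }
        } finally {
            // Best effort: interrupts tasks that are still running.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Callable<Void> ok = () -> null;
        Callable<Void> bad = () -> {
            throw new IllegalStateException("operator failed");
        };
        try {
            runAll(Arrays.asList(ok, bad));
            System.out.println("completed");
        } catch (Exception e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Unlike a Runnable, a Callable may throw a checked exception, and the Future returned for it rethrows the failure wrapped in an ExecutionException, which is what lets the submitting thread notice the error.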

My patch doesn't perform any cleanup after an error.  In particular, 
opNode.deinitialize() may not be called if a task fails with an exception.  I 
considered using a `finally` block to call opNode.deinitialize(), but I'm not 
sure whether it's a good idea to deinitialize the opNode before the other tasks 
are aborted.  I can't simply call abort() before opNode.deinitialize() because 
it won't immediately stop the tasks, which might be blocked in the call to 
reader.nextFrame() in pushFrames().
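
One possible ordering for the cleanup concern above is sketched below: interrupt the remaining tasks, wait for them to stop, and only then release shared state. This is an assumption-laden illustration, not Hyracks code. It assumes the blocking call responds to thread interruption (a `BlockingQueue.take()` stands in for a blocked reader here); whether reader.nextFrame() is interruptible is not stated in this issue. `CleanupOrdering` and `runAndCleanUp` are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example, not Hyracks code.
class CleanupOrdering {
    // Returns true if cleanup ran only after the blocked worker had stopped.
    static boolean runAndCleanUp() throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        BlockingQueue<String> frames = new LinkedBlockingQueue<>(); // never filled
        AtomicBoolean readerStopped = new AtomicBoolean(false);
        boolean cleanedUpAfterStop = false;

        Runnable blockedReader = () -> {
            try {
                frames.take(); // blocks until interrupted
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                readerStopped.set(true);
            }
        };
        Callable<Void> failingTask = () -> {
            throw new IllegalStateException("operator failed");
        };

        try {
            executor.submit(blockedReader);
            Future<Void> failing = executor.submit(failingTask);
            try {
                failing.get();
            } catch (ExecutionException e) {
                // Failure observed; fall through to the shutdown path.
            }
        } finally {
            executor.shutdownNow(); // interrupts the blocked reader
            if (executor.awaitTermination(5, TimeUnit.SECONDS)) {
                // All workers have exited, so shared state can be released
                // here (the place a deinitialize-style call would go).
                cleanedUpAfterStop = readerStopped.get();
            }
        }
        return cleanedUpAfterStop;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cleanup after workers stopped: " + runAndCleanUp());
    }
}
```

If the blocking call ignores interruption, awaitTermination times out and the cleanup step is skipped, which is the situation the comment above is worried about.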

Original issue reported on code.google.com by jodarose...@gtempaccount.com on 14 Aug 2011 at 10:59

GoogleCodeExporter commented 9 years ago
This issue has been fixed in dev_next. The next release of Hyracks will have 
this change.

Original comment by vinay...@gmail.com on 25 Aug 2011 at 7:45

GoogleCodeExporter commented 9 years ago
The change appears to have been lost when OperatorRunnable was replaced by 
Task.  Task still ignores exceptions thrown by pushFrames().

Original comment by rosenville@gmail.com on 23 Oct 2011 at 6:54

GoogleCodeExporter commented 9 years ago
Fixed in dev_next r839. Josh, can you verify?

Original comment by vinay...@gmail.com on 30 Nov 2011 at 8:17

GoogleCodeExporter commented 9 years ago
dev_next r839 fixed the problem; I tested it by introducing a null pointer 
exception in the HashGroupOperator and running the WordCount example.  The job 
is properly marked as failed.

When the job fails, waitForCompletion() throws the exception that caused the 
failure instead of returning successfully.  I like this design because it makes 
job failures obvious and hard to ignore; the programmer does not have to 
remember to check for failure via getJobStatus().
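
To make the contrast concrete, here is a small sketch of the two styles. `JobHandle` and its methods are hypothetical stand-ins written for this example; they are not the Hyracks client API, and only the names waitForCompletion and getJobStatus come from the comment above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Hypothetical stand-in for a job handle, not the Hyracks API.
class JobHandle {
    enum Status { RUNNING, SUCCEEDED, FAILED }

    private final CompletableFuture<Void> result = new CompletableFuture<>();

    void markSucceeded() {
        result.complete(null);
    }

    void markFailed(Exception cause) {
        result.completeExceptionally(cause);
    }

    // Polling style: the failure is visible only if the caller checks.
    Status getJobStatus() {
        if (!result.isDone()) {
            return Status.RUNNING;
        }
        return result.isCompletedExceptionally() ? Status.FAILED : Status.SUCCEEDED;
    }

    // Throwing style: the caller cannot proceed without handling the failure.
    void waitForCompletion() throws Exception {
        try {
            result.get();
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        JobHandle job = new JobHandle();
        job.markFailed(new NullPointerException("injected failure"));
        System.out.println("status: " + job.getJobStatus());
        try {
            job.waitForCompletion();
            System.out.println("job succeeded");
        } catch (Exception e) {
            System.out.println("job failed: " + e);
        }
    }
}
```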

Original comment by rosenville@gmail.com on 30 Nov 2011 at 11:44

GoogleCodeExporter commented 9 years ago

Original comment by vinay...@gmail.com on 1 Dec 2011 at 12:00