raysect / source

The main source repository for the Raysect project.
http://www.raysect.org
BSD 3-Clause "New" or "Revised" License

Propagate exceptions in worker processes to caller #335

Closed by jacklovell 4 years ago

jacklovell commented 4 years ago

Ensure that exceptions in worker processes do not cause the entire program to hang. The consumer process in the Multicore engine expects every task to put something in the results queue, so we either put the result itself (if the task ran without error) or the exception raised by the task. This means the consumer is never stuck waiting for results in the queue which will never arrive. A calling program can catch the resulting RuntimeError if any of the workers fails, and deal with it accordingly.
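The protocol described above can be sketched roughly as follows. This is a minimal illustration, not the Raysect Multicore engine itself: the names `worker`, `run` and `reciprocal` are hypothetical, and threads with `queue.Queue` stand in for worker processes so the sketch stays self-contained (`multiprocessing.Queue` offers the same `put`/`get` calls). The point is only that every job yields exactly one item in the results queue, so the consumer loop always terminates.

```python
import queue
import threading


def worker(job_queue, result_queue, task):
    """Hypothetical worker: puts exactly one item in result_queue per job."""
    while True:
        job = job_queue.get()
        if job is None:  # sentinel value: shut this worker down
            break
        try:
            result_queue.put((job, task(job), None))
        except Exception as exc:
            # Report the failure instead of leaving the consumer waiting.
            result_queue.put((job, None, exc))


def run(jobs, task, n_workers=2):
    """Hypothetical consumer: gathers one item per job, then raises on failure."""
    job_queue, result_queue = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(job_queue, result_queue, task), daemon=True)
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for job in jobs:
        job_queue.put(job)

    results, failures = {}, []
    remaining = len(jobs)
    while remaining:
        job, value, exc = result_queue.get()  # always arrives, success or not
        remaining -= 1
        if exc is None:
            results[job] = value
        else:
            failures.append((job, exc))

    # Workers are shut down whether or not any task failed.
    for _ in workers:
        job_queue.put(None)
    for w in workers:
        w.join()

    if failures:
        raise RuntimeError(f"{len(failures)} task(s) failed: {failures!r}")
    return results


def reciprocal(x):
    """Example task that fails for x == 0."""
    return 1 / x
```

A caller would wrap `run(...)` in `try`/`except RuntimeError` and decide how to handle the failed tasks.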

Fixes #334

@CnlPepper I've only included the exception instance info from the workers here. Do you think it would be worthwhile to also include a traceback?

CnlPepper commented 4 years ago

Thanks @jacklovell. I'm not going to merge this right now, as I'd like to spend a bit of time thinking it through. Initial thoughts:

jacklovell commented 4 years ago

Workers are still shut down: after the while remaining loop in run, the value None is still put into the job queue len(workers) times, which should cause all the workers to shut down. I didn't notice any zombie processes when testing this, though I may not have done the right test.
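One way to check for leftover workers more directly is to send the sentinels, join with a timeout, and list anything still alive. The helper below is a hypothetical sketch, not code from this PR; it uses threads for self-containment, but relies only on `join(timeout)` and `is_alive()`, which `multiprocessing.Process` also provides.

```python
import queue
import threading


def idle_worker(job_queue):
    """Hypothetical worker that discards jobs until it receives None."""
    while job_queue.get() is not None:
        pass


def shutdown_and_check(workers, job_queue, timeout=5.0):
    """Send one None per worker, wait, and return any workers still alive."""
    for _ in workers:
        job_queue.put(None)
    for w in workers:
        w.join(timeout)
    return [w for w in workers if w.is_alive()]
```

An empty return value means every worker consumed its sentinel and exited within the timeout.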

Different exceptions may be raised in different workers, depending on the tasks being performed. I suppose we could only return the exception from the first task which raised, and then if a user fixes that they can run again and deal with the next raising task, and so on. I'm happy with this suggestion: it means dealing with errors one at a time rather than being presented with everything which went wrong all at once. We could then raise something more specific than a RuntimeError if we're only returning one exception.
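A rough sketch of the "first failure only" idea, assuming the consumer has gathered hypothetical `(job, value, exc)` tuples from the results queue. The function name and tuple layout are illustrative, not part of Raysect. Re-raising the original exception object gives the caller its specific type rather than a generic RuntimeError.

```python
def collect_or_raise_first(outcomes):
    """Return results keyed by job, or re-raise the first exception seen.

    `outcomes` is an iterable of hypothetical (job, value, exc) tuples,
    where exc is None for a successful task.
    """
    results = {}
    first_error = None
    for job, value, exc in outcomes:
        if exc is None:
            results[job] = value
        elif first_error is None:
            first_error = exc  # later failures are deliberately ignored
    if first_error is not None:
        raise first_error
    return results
```

Note that "first" here means first to arrive in the queue, which with several workers need not match the order in which tasks were submitted.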

A full stack trace shouldn't be too hard. I was just being a bit lazy...
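For the stack trace, one common approach is to format it to a string inside the worker with `traceback.format_exc()`, since traceback objects cannot be pickled and an exception loses its `__traceback__` when sent through a queue between processes. The sketch below is an assumption about how this could look, with hypothetical function names; the pickle round trip stands in for the trip through a `multiprocessing.Queue`.

```python
import pickle
import traceback


def run_task_capturing_traceback(task, job):
    """Hypothetical wrapper: returns (value, exc, traceback_text)."""
    try:
        return task(job), None, None
    except Exception as exc:
        # Format now, while the traceback is still attached to the exception.
        return None, exc, traceback.format_exc()


def failing_task(x):
    """Example task that raises ZeroDivisionError for x == 0."""
    return 1 / x


# Simulate sending the outcome from a worker process to the consumer.
payload = pickle.dumps(run_task_capturing_traceback(failing_task, 0))
value, exc, tb_text = pickle.loads(payload)
```

On the consumer side, `tb_text` could be appended to the message of whatever exception is raised to the caller, so the worker's stack is visible even though `exc.__traceback__` is gone.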