Closed: jacklovell closed this 4 years ago.
Thanks @jacklovell. I'm not going to merge this right now, as I'd like to spend a bit of time thinking it through. Initial thoughts:
Workers are still shut down: after the `while remaining` loop in `run`, the value `None` is still put into the job queue `len(workers)` times, which should cause all the workers to shut down. I didn't notice any zombie processes when testing this, though I may not have run the right test.
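The sentinel-based shutdown described above can be sketched as follows. This is a minimal sketch under stated assumptions: the names `worker` and `run` are hypothetical and not the Multicore engine's actual API, and it uses threads with `queue.Queue` so it runs anywhere, whereas the engine itself uses worker processes. Because the job queue is FIFO, every `None` sentinel arrives after all real jobs, and a worker stops reading once it sees one, so `len(workers)` sentinels shut down each worker exactly once.

```python
import queue
import threading


def worker(job_queue, result_queue, task):
    # Hypothetical worker loop: take jobs until the None sentinel arrives, then return.
    while True:
        job = job_queue.get()
        if job is None:  # sentinel: no more work for this worker
            break
        try:
            result_queue.put(task(job))
        except Exception as exc:
            # A failing task must not kill the worker before it reaches its sentinel.
            result_queue.put(exc)


def run(jobs, task, n_workers=4):
    # Hypothetical driver: threads stand in for the engine's worker processes.
    job_queue = queue.Queue()
    result_queue = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(job_queue, result_queue, task))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for job in jobs:
        job_queue.put(job)
    # After the real jobs, put one sentinel per worker so every worker exits.
    for _ in range(len(workers)):
        job_queue.put(None)
    results = [result_queue.get() for _ in jobs]
    for w in workers:
        w.join()  # returns promptly because each worker consumed one sentinel
    return results, workers
```

For example, `run([1, 2, 0, 4], lambda x: 10 // x, n_workers=3)` returns one `ZeroDivisionError` alongside the three successful results, and no worker is left alive afterwards.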
Different exceptions may be raised in different workers, depending on the tasks being performed. I suppose we could return only the exception from the first task that raised; once a user fixes that, they can run again and deal with the next failing task, and so on. I'm happy with this suggestion: it means dealing with errors one at a time rather than being presented with everything that went wrong all at once. If we're only returning one exception, we could then raise something more specific than a `RuntimeError`.
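The "first failure only" idea above might look like this. A minimal sketch, assuming the collected results are a list in task order where a failed task is represented by the exception instance it raised; `raise_first_failure` is a hypothetical helper name. Re-raising the worker's own exception gives the caller its specific type (e.g. `ValueError`) instead of a generic `RuntimeError`.

```python
def raise_first_failure(results):
    """Hypothetical helper: re-raise the first exception in results, else return them.

    Assumes results are in task order and that a failed task contributed the
    exception instance it raised instead of a value.
    """
    for item in results:
        if isinstance(item, Exception):
            # Raise the worker's own exception type rather than a generic
            # RuntimeError; any later failures surface on the next run.
            raise item
    return results
```

A design consequence worth noting: if results are gathered in arrival order rather than task order, "first" means "first to finish", which can differ between runs.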
A full stack trace shouldn't be too hard. I was just being a bit lazy...
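One way to carry a full stack trace back from a worker is sketched below. This is a sketch under assumptions, not the engine's code: `run_task_with_traceback`, `unwrap` and `RemoteTaskError` are hypothetical names. Traceback objects cannot be pickled, so they do not survive a trip through a `multiprocessing` queue; the sketch therefore formats the trace to a string inside the worker with `traceback.format_exc()` and attaches it in the consumer. The standard library's `concurrent.futures.ProcessPoolExecutor` takes a broadly similar approach, attaching the remote traceback text as the exception's cause.

```python
import traceback


class RemoteTaskError(RuntimeError):
    """Hypothetical error type carrying the worker's formatted traceback."""


def run_task_with_traceback(task, *args):
    """Worker side: return (result, None), or (exception, traceback text) on failure."""
    try:
        return task(*args), None
    except Exception as exc:
        # Format the stack trace here, while it still exists in the worker;
        # the traceback object itself cannot be pickled onto a queue.
        return exc, traceback.format_exc()


def unwrap(result, tb_text):
    """Consumer side: raise with the worker's traceback attached, or return the value."""
    if tb_text is not None:
        raise RemoteTaskError(f"worker task failed:\n{tb_text}") from result
    return result
```

Chaining with `from result` keeps the original exception available as `__cause__`, so a caller can still inspect its type while the message shows where in the worker it was raised.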
Ensure that exceptions in worker processes do not cause the entire program to hang. The consumer process in the Multicore engine expects every task to put something in the results queue, so each task now puts either its result (if it ran without error) or the exception it raised. This means the consumer is never stuck waiting for results that will never arrive. If any of the workers fails, a `RuntimeError` is raised, which the calling program can catch and handle accordingly.
Fixes #334
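The result-or-exception protocol described above can be sketched like this. It is a minimal sketch, assuming one queue shared by all tasks and a known task count; `run_task` and `consume` are hypothetical names, not the engine's API. It is shown with a thread-safe `queue.Queue`, whereas in the real engine the worker side runs in a separate process feeding a `multiprocessing.Queue`.

```python
import queue


def run_task(task, args, result_queue):
    """Worker side: put exactly one item per task, either its result or its exception."""
    try:
        result_queue.put(task(*args))
    except Exception as exc:
        result_queue.put(exc)


def consume(result_queue, n_tasks):
    """Consumer side: collect one item per task, then fail loudly if any task raised.

    Since every task reports exactly once, this loop does not wait for
    results that will never arrive.
    """
    results, errors = [], []
    for _ in range(n_tasks):
        item = result_queue.get()
        (errors if isinstance(item, Exception) else results).append(item)
    if errors:
        # The calling program can catch this RuntimeError and decide what to do.
        raise RuntimeError(f"{len(errors)} task(s) failed: {errors!r}")
    return results
```

For instance, after `run_task(pow, (2, 3), q)` and `run_task(int, ("x",), q)`, calling `consume(q, 2)` raises a `RuntimeError` mentioning the `ValueError` instead of blocking.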
@CnlPepper, I've only included the exception instances from the workers here. Do you think it would be worthwhile to also include a traceback?