Open istvan-fodor opened 3 years ago
Good idea. Unfortunately, ipyparallel still doesn't have the ability to send interrupts in general, though you can often make it work depending on the environment using things like:
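import os
import socket
import ipyparallel as ipp
rc = ipp.Client()  # assumes a cluster is already running
e_all = rc[:]
# each call returns a dict keyed by engine id
hosts = e_all.apply_async(socket.gethostname).get_dict()
pids = e_all.apply_async(os.getpid).get_dict()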
and then it's up to you to send signals to those processes whenever you need to.
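A minimal sketch of what sending those signals might look like for engines running on the same machine as the client. The helper name `signal_local_engines` and its `kill` parameter are hypothetical and not part of ipyparallel; engines on other hosts would need ssh or a similar mechanism, which is not shown here.

```python
import os
import signal
import socket


def signal_local_engines(hosts, pids, sig=signal.SIGINT, kill=os.kill):
    """Send `sig` to every engine whose hostname matches this machine.

    `hosts` and `pids` are dicts keyed by engine id, such as the
    results of get_dict(). Returns the engine ids that were signalled.
    """
    local = socket.gethostname()
    signalled = []
    for engine_id, host in hosts.items():
        if host != local:
            continue  # remote engine: cannot be reached with os.kill
        try:
            kill(pids[engine_id], sig)
        except ProcessLookupError:
            continue  # engine process has already exited
        signalled.append(engine_id)
    return signalled
```

With the `hosts` and `pids` dicts collected as above, `signal_local_engines(hosts, pids)` would send SIGINT to the local engines only. Whether the engine actually stops depends on the running task being interruptible.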
An approach that ought to work (on POSIX) would be to use signal.alarm to interrupt executions if they take too long:
import signal
import time
def interrupt_alarm(signum, frame):
    """raise KeyboardInterrupt on SIGALRM"""
    print("got alarm!")
    raise KeyboardInterrupt()
previous_handle = signal.signal(signal.SIGALRM, interrupt_alarm)
timeout = 2
signal.alarm(timeout)
# here is where your real task goes. If it takes longer than timeout, it will be interrupted.
# this assumes it is interruptible.
time.sleep(timeout + 1)
# got here, we finished. Make sure to clear the alarm
signal.alarm(0)
# may want to clear the alarm handler, but it's also okay to leave it raising interrupts instead of killing the process
signal.signal(signal.SIGALRM, previous_handle)
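The same pattern can be wrapped in a context manager so the alarm and the previous handler are always restored, even if the task raises. This is only a sketch: `alarm_timeout` is a hypothetical name, and it carries the same assumptions as the code above (POSIX only, called from the main thread, whole-second timeouts, and an interruptible task).

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def alarm_timeout(seconds):
    """Raise KeyboardInterrupt if the body runs longer than `seconds`."""

    def _raise_interrupt(signum, frame):
        raise KeyboardInterrupt()

    # install the handler, remembering whatever was there before
    previous = signal.signal(signal.SIGALRM, _raise_interrupt)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


# Example: the sleep is cut short after about one second.
try:
    with alarm_timeout(1):
        time.sleep(3)
except KeyboardInterrupt:
    print("task interrupted")
```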
With the signal/restart/streaming features we have now in 8.0, I think there's a simple missing feature: add a client-side timeout to the parallel magics. The situation is much improved, though: %px streams output and errors immediately as they happen, so if one engine actually raised or produced useful error output, it will show up immediately to give you the hint that it might not finish.
So the only missing feature is really an optional %%px --timeout to automatically stop the client waiting. Though due to streaming, it will no longer result in more or earlier feedback about the failure, only halting of the cell.
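Until such an option exists, a client-side timeout can be approximated outside the magics. The sketch below is not the proposed `--timeout` implementation; `wait_or_timeout` is a hypothetical helper that only assumes the result object has `wait(timeout)` and `ready()` methods, as ipyparallel's AsyncResult does.

```python
def wait_or_timeout(async_result, timeout, on_timeout=None):
    """Wait up to `timeout` seconds for `async_result`.

    Returns True if the result finished in time. Otherwise calls
    `on_timeout` (if given) and returns False.
    """
    async_result.wait(timeout)
    if async_result.ready():
        return True
    if on_timeout is not None:
        on_timeout()
    return False


# Possible usage with ipyparallel (assumes `rc` is a connected Client
# and that Client.send_signal is available in your version):
#
#   import signal
#   ar = rc[:].execute("comm.Barrier()", block=False)
#   if not wait_or_timeout(ar, 30, lambda: rc.send_signal(signal.SIGINT)):
#       print("cell timed out; engines were sent SIGINT")
```

The callback is kept separate from the waiting so the caller can choose what to do on timeout: interrupt the engines, restart them, or simply stop waiting.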
I would like to be able to set a timeout for the PX magics when running in blocking mode. Currently the cell can run indefinitely, which is problematic in our use case: we rely on MPI and barriers, and if one worker fails, the barrier holds up all the other workers, making the cell “freeze” indefinitely. The only option is to notice this by watching the cell, infer what happened from other sources (log files, etc.), and kill the ipyparallel engines.