Closed scttnlsn closed 2 months ago
FWIW, I can change this to `multiprocessing.get_context("spawn").Pool()` and the Procrastinate worker does not die on a multiprocessing timeout. I pay a performance penalty though, since I now need to copy a bunch of data into each child process.
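To illustrate that workaround: with the `"spawn"` start method, each pool worker starts as a fresh interpreter process rather than a fork of the parent, so the children don't inherit the Procrastinate worker's signal handlers (the trade-off being that arguments must be pickled over to each child). A minimal sketch (`square` and `run_in_spawn_pool` are my own names, not from the issue):

```python
import multiprocessing


def square(x):
    return x * x


def run_in_spawn_pool(values):
    # "spawn" starts each worker as a brand-new interpreter process, so
    # the children get default signal handlers instead of inheriting the
    # parent's (as fork-started children would)
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(run_in_spawn_pool([0, 1, 2, 3]))
```

Note that `square` must live at module level so spawn-started workers can import it by reference; this is also why shared data has to be copied (pickled) into each child.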
Another workaround is to call the following from the multiprocessing child task:
```python
import multiprocessing
import time
import signal


def sleep(sec):
    # Reset signal handling in the child so it no longer shares the
    # parent (Procrastinate worker) process's handlers
    signal.set_wakeup_fd(-1)
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    time.sleep(sec)


@app.task(name="test_multiprocessing")
def test_multiprocessing():
    with multiprocessing.Pool(maxtasksperchild=1) as pool:
        future = pool.apply_async(sleep, (10,))
        result = future.get(timeout=3)
    print("done")
```
That seems to decouple the signal handling between the parent (the Procrastinate worker) and the children (the processes in the `multiprocessing.Pool`). I don't completely understand the ramifications of this yet, though, so I'm not sure it's a proper solution.
EDIT: Never mind - this is still terminating the parent process.
Anyway, I don't think this is related to Procrastinate, so I'm going to close the issue.
I have a use case that involves a `multiprocessing.Pool` inside of a single Procrastinate task. When a `multiprocessing` timeout happens, I believe a signal is sent to the child process to terminate it, and I suspect this is interfering with Procrastinate's own signal handling. Here's a minimal reproduction:
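The reproduction code itself didn't survive the copy here, but the timeout mechanism in question can be sketched standalone, without the Procrastinate task decorator (`run_with_timeout` and its return values are my own, for illustration):

```python
import multiprocessing
import time


def sleep(sec):
    time.sleep(sec)


def run_with_timeout():
    with multiprocessing.Pool(maxtasksperchild=1) as pool:
        # Child sleeps for 10s, but we only wait 3s for the result
        future = pool.apply_async(sleep, (10,))
        try:
            return future.get(timeout=3)
        except multiprocessing.TimeoutError:
            # Leaving the with-block calls pool.terminate(), which sends
            # SIGTERM to the still-sleeping child process -- the suspected
            # interference with the Procrastinate worker's signal handling
            return "timed out"
```

Run inside a Procrastinate task (as in the workaround snippet above), the `pool.terminate()` on exit is where the trouble seems to start.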
In the Procrastinate worker logs I see:
And then the worker process terminates.
Is there an obvious mistake I'm making? I know this is an unusual use case, but I have a bunch of parallel computation to do and it's been easier to handle it all inside a single Procrastinate task so far (spreading it across multiple Procrastinate tasks and then collecting the results would require some architecture changes, which I may do in the future).
I haven't dug into the nitty-gritty of the signals yet, but I wanted to open this issue early in case you had any ideas about what specifically might be going on. If this turns out to be an actual bug, I'd be happy to attempt a fix with your guidance.