rayosborn opened this issue 6 months ago
I have just discovered that the additional process is not created when `mp_context` is forced to be 'fork' instead of 'spawn'. I thought that 'fork' was the default, but it appears that is not the case. The extra process is a ResourceTracker process, so if this is a bug, the question is whether the shutdown of the `ProcessPoolExecutor` should include a call to stop this process when running with 'spawn'.
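One way to confirm this, assuming a platform where 'fork' is available (Linux or macOS), is to check the tracker's private `_pid` attribute after running an executor under each start method; a rough sketch:

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context, resource_tracker

def run(method):
    with ProcessPoolExecutor(mp_context=get_context(method)) as ex:
        ex.submit(abs, -1).result()
    # _pid is a private attribute of the module-level tracker instance:
    # None if the tracker process was never started, its OS pid otherwise.
    print(method, resource_tracker._resource_tracker._pid)

if __name__ == '__main__':
    run('fork')   # expected: fork None (no tracker process started)
    run('spawn')  # expected: spawn <pid> (tracker process now running)
```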
I posted a related question on StackOverflow, where I was informed that there is a private `_stop` function in the `multiprocessing.resource_tracker` module that will shut down the tracker. I have confirmed that this works as I had hoped. So my question now is whether there is a reason why this cannot be added to the `ProcessPoolExecutor` code, at least as an option.
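As a stopgap, the same call can be made from user code after the executor shuts down; a minimal sketch (relying on the private `_resource_tracker._stop()` API, so subject to change):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context, resource_tracker

def main():
    with ProcessPoolExecutor(mp_context=get_context('spawn')) as executor:
        executor.submit(print, 'hello').result()
    # Private API, so subject to change, but this stops the lingering
    # ResourceTracker process once the executor has shut down.
    resource_tracker._resource_tracker._stop()

if __name__ == '__main__':
    main()
```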
For example, we could add another keyword argument to `ProcessPoolExecutor.__init__`, `shutdown_tracker=False`, creating a new private attribute, `_shutdown_tracker`, and then add the following to the end of the `shutdown` method.
```python
self._executor_manager_thread_wakeup = None
if (self._shutdown_tracker and
        self._mp_context.get_start_method(allow_none=False) != 'fork'):
    # _stop() is private, but it terminates the tracker process,
    # which is not created when the start method is 'fork'.
    from multiprocessing import resource_tracker
    resource_tracker._resource_tracker._stop()
```
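Opting in would then look like this (the `shutdown_tracker` keyword is the proposed addition, not an existing parameter):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

# 'shutdown_tracker' is the proposed keyword, not an existing parameter.
executor = ProcessPoolExecutor(mp_context=get_context('spawn'),
                               shutdown_tracker=True)
print(executor.submit(sum, (1, 2, 3)).result())
executor.shutdown()  # would now also stop the ResourceTracker process
```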
Are there good reasons not to make this an option? I am happy to submit a PR if there are no strong objections, because this is preventing us from using the 'spawn' method when submitting jobs to a distributed cluster.
# Bug report

### Bug description:
After launching jobs using a `ProcessPoolExecutor` instance and then shutting the instance down once they are complete, a subprocess launched by the executor to hold a semaphore lock is not shut down. This appears to be the reason why some jobs submitted to the batch queue of a distributed cluster are not terminated and have to be deleted manually.

I have confirmed the persistence of the process using the following code, running on macOS with Python 3.11 and on Linux with Python 3.9.
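A minimal sketch of such a test module (the `worker` job here is a stand-in invented for illustration; only the `test_pool` name comes from the description below):

```python
# pool_test.py -- sketch; the real workload is immaterial to the leak.
import time
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

def worker(i):
    # Stand-in job so the executor has something to run.
    time.sleep(1)
    return i

def test_pool():
    # Submit a few jobs, wait for the results, then shut down.
    with ProcessPoolExecutor(mp_context=get_context('spawn')) as executor:
        results = [f.result() for f in
                   [executor.submit(worker, i) for i in range(4)]]
    print(results)
    # At this point the executor's workers are gone, but the
    # ResourceTracker child process is still running.
```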
The complication is that, if you run this from the command line, the code completes as expected. To see the problem, you have to embed the functions in an importable module and run the `test_pool` function in a debugger. I ran it in an interactive IPython shell within the NeXpy application.
Here are screenshots taken in VS Code, showing the processes before, during, and after running `test_pool`.

The issue is that I believe the additional process (9812 in the above example) should be shut down when the executor's `shutdown` function is called. I have tried modifying the standard shutdown function to join and close the `executor._call_queue`, and tried to release the `executor._mp_context._rlock`, which I think is what launches the additional process, but none of these shut it down.

CPython versions tested on:
3.9, 3.11
Operating systems tested on:
Linux, macOS