rayosborn opened this issue 6 months ago
I have just discovered that the additional process is not created when `mp_context` is forced to be 'fork' instead of 'spawn'. I thought that 'fork' was the default, but it appears that is not the case. The extra process is a ResourceTracker process, so if this is a bug, the question is whether the shutdown of the `ProcessPoolExecutor` should include a call to stop this process when running with 'spawn'.
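One way to confirm this, assuming a platform where 'fork' is available (Linux or macOS), is to check the tracker's private `_pid` attribute after running an executor under each start method; a rough sketch:

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context, resource_tracker

def run(method):
    with ProcessPoolExecutor(mp_context=get_context(method)) as ex:
        ex.submit(abs, -1).result()
    # _pid is a private attribute of the module-level tracker instance:
    # None if the tracker process was never started, its OS pid otherwise.
    print(method, resource_tracker._resource_tracker._pid)

if __name__ == '__main__':
    run('fork')   # expected: fork None (no tracker process started)
    run('spawn')  # expected: spawn <pid> (tracker process now running)
```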
I posted a related question on StackOverflow, where I was informed that there is a private `_stop` function in the `multiprocessing.resource_tracker` module that will shut down the tracker. I have confirmed that this works as I had hoped. So my question now is whether there is a reason why this cannot be added to the `ProcessPoolExecutor` code, at least as an option.
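As a stopgap, the same call can be made from user code after the executor shuts down; a minimal sketch (relying on the private `_resource_tracker._stop()` API, so subject to change):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context, resource_tracker

def main():
    with ProcessPoolExecutor(mp_context=get_context('spawn')) as executor:
        executor.submit(print, 'hello').result()
    # Private API, so subject to change, but this stops the lingering
    # ResourceTracker process once the executor has shut down.
    resource_tracker._resource_tracker._stop()

if __name__ == '__main__':
    main()
```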
For example, we could add another keyword argument to `ProcessPoolExecutor.__init__`, `shutdown_tracker=False`, creating a new private attribute, `_shutdown_tracker`, and then add the following to the end of the `shutdown` method.
```python
self._executor_manager_thread_wakeup = None
if (self._shutdown_tracker and
        self._mp_context.get_start_method(allow_none=False) != 'fork'):
    # _stop() is private, but it terminates the tracker process,
    # which is not created when the start method is 'fork'.
    from multiprocessing import resource_tracker
    resource_tracker._resource_tracker._stop()
```
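Opting in would then look like this (the `shutdown_tracker` keyword is the proposed addition, not an existing parameter):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

# 'shutdown_tracker' is the proposed keyword, not an existing parameter.
executor = ProcessPoolExecutor(mp_context=get_context('spawn'),
                               shutdown_tracker=True)
print(executor.submit(sum, (1, 2, 3)).result())
executor.shutdown()  # would now also stop the ResourceTracker process
```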
Are there good reasons not to make this an option? I am happy to submit a PR if there are no strong objections, because this is preventing us from using the 'spawn' method when submitting jobs to a distributed cluster.
# Bug report

### Bug description:
After launching jobs using a `ProcessPoolExecutor` instance and then shutting the instance down once they are complete, a subprocess launched by the executor to hold a semaphore lock is not shut down. This appears to be the reason why some jobs submitted to the batch queue of a distributed cluster are not terminated and have to be deleted manually.

I have confirmed the persistence of the process using the following code, running on macOS with Python 3.11 and on Linux with Python 3.9.
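A minimal sketch of such a test module (the `worker` job here is a stand-in invented for illustration; only the `test_pool` name comes from the description below):

```python
# pool_test.py -- sketch; the real workload is immaterial to the leak.
import time
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

def worker(i):
    # Stand-in job so the executor has something to run.
    time.sleep(1)
    return i

def test_pool():
    # Submit a few jobs, wait for the results, then shut down.
    with ProcessPoolExecutor(mp_context=get_context('spawn')) as executor:
        results = [f.result() for f in
                   [executor.submit(worker, i) for i in range(4)]]
    print(results)
    # At this point the executor's workers are gone, but the
    # ResourceTracker child process is still running.
```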
The complication is that, if you run this from the command line, the code completes as expected. To see the problem, you have to embed the functions in an importable module and run the `test_pool` function in a debugger. I ran it in an interactive IPython shell within the NeXpy application.
Here are screenshots taken in VS Code, showing the processes before, during, and after running `test_pool`.

The issue is that I believe the additional process (9812 in the above example) should be shut down when the executor's `shutdown` function is called. I have tried modifying the standard shutdown function to join and close the `executor._call_queue`, and tried to release the `executor._mp_context._rlock`, which I think is what launches the additional process, but none of these shut it down.

CPython versions tested on:
3.9, 3.11
Operating systems tested on:
Linux, macOS