
Potential memory leak in multiprocessing #50902

Closed 48451349-1324-4d2a-a174-2b483bba355e closed 10 years ago

48451349-1324-4d2a-a174-2b483bba355e commented 15 years ago
BPO 6653
Nosy @pitrou
Files
  • test.zip
    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

```python
assignee = None
closed_at =
created_at =
labels = ['library', 'performance']
title = 'Potential memory leak in multiprocessing'
updated_at =
user = 'https://bugs.python.org/jnoller'
```

    bugs.python.org fields:

```python
activity =
actor = 'pitrou'
assignee = 'jnoller'
closed = True
closed_date =
closer = 'pitrou'
components = ['Library (Lib)']
creation =
creator = 'jnoller'
dependencies = []
files = ['14660']
hgrepos = []
issue_num = 6653
keywords = []
message_count = 7.0
messages = ['91332', '91333', '91334', '91335', '185597', '185602', '228511']
nosy_count = 5.0
nosy_names = ['pitrou', 'jnoller', 'asksol', 'schlesin', 'sbt']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'resource usage'
url = 'https://bugs.python.org/issue6653'
versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']
```

    48451349-1324-4d2a-a174-2b483bba355e commented 15 years ago

    I have example code to show this. It creates a system-wide memory leak on Linux/Unix (present until the next reboot), unless the last statement in the target of mp.Process ensures a manual clean-up of the globals.

    The problem is line 353 in multiprocessing/forking.py. The function exit() is defined as os._exit on Linux and ExitProcess on Windows, neither of which allows normal clean-up.

    >>> help(os._exit)
    Help on built-in function _exit in module nt:
    _exit(...)
       _exit(status)

    Exit to the system with specified status, without normal exit processing.

    The problem is fixed if line 353 in forking.py is changed from

    exit(exitcode)

    to

    sys.exit(exitcode)
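
    A minimal sketch (not the attached test.zip; the HARD_EXIT toggle is made up for illustration) of the difference being described: os._exit() skips interpreter shutdown entirely, so neither atexit handlers nor __del__ on module globals run, while sys.exit() performs normal clean-up.

```python
# Minimal sketch, not the attached test.zip: contrast os._exit() with
# sys.exit().  The HARD_EXIT environment variable is made up for this demo.
import atexit
import os
import sys

class Handle:
    def __del__(self):
        # Stands in for releasing a shared resource held by a global.
        print("closed handle")

HANDLE = Handle()                              # module-level global
atexit.register(lambda: print("atexit ran"))

if os.environ.get("HARD_EXIT"):
    os._exit(0)   # hard exit: no atexit handlers, no __del__, nothing released
sys.exit(0)       # normal exit: "atexit ran" prints, and on CPython the
                  # global is usually finalized ("closed handle") at shutdown
```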

    Test run without bugfix:

    G:\DEVELO\~1\SHARED\~2>python test.py
    open handle to 569f439b24e24fc8a547b81932616066
    [[ 0. 0. 0. 0.]
     [ 0. 0. 0. 0.]]
    open handle to 0582d4b161c546f582c1c96e7bd0c39d
    open handle to 569f439b24e24fc8a547b81932616066
    modified array
    closed handle to 569f439b24e24fc8a547b81932616066
    [[ 1. 1. 1. 0.]
     [ 1. 1. 1. 0.]]
    closed handle to 569f439b24e24fc8a547b81932616066

    You can see here that the opening and closing of handles are unmatched. This run is on Windows, where the kernel ensures the clean-up, so it may not matter there. But on Unix this would have created a permanent (system-wide) memory leak! What is happening is that the globals are not being cleaned up, because os._exit is used instead of sys.exit.

    Test run with bugfix:

    G:\DEVELO\~1\SHARED\~2>python test.py
    open handle to 930778d27b414253bc329f2b70adaa05
    [[ 0. 0. 0. 0.]
     [ 0. 0. 0. 0.]]
    open handle to 3f6cebf8c5de413685bb770d02ae9666
    open handle to 930778d27b414253bc329f2b70adaa05
    modified array
    closed handle to 930778d27b414253bc329f2b70adaa05
    closed handle to 3f6cebf8c5de413685bb770d02ae9666
    [[ 1. 1. 1. 0.]
     [ 1. 1. 1. 0.]]
    closed handle to 930778d27b414253bc329f2b70adaa05

    Now all allocations and deallocations are matched.

    Regards, Sturla Molden

    48451349-1324-4d2a-a174-2b483bba355e commented 15 years ago

    Additional comments from Sturla:

    Hello Jesse,

    Yes there is a bug in multiprocessing.

    Diagnosis:

    Sub-processes seem to commit an ungraceful suicide on exit. If the kernel cleans up after a non-graceful exit, this is OK. But if the kernel does not, as in the case of System V IPC objects, we have a permanent resource leak. This is very similar to the reason why manually killing threads is prohibited in Python.

    I have example code to show this. It creates a system-wide memory leak on Linux/Unix (present until the next reboot), unless the last statement in the target of mp.Process ensures a manual clean-up of the globals.

    48451349-1324-4d2a-a174-2b483bba355e commented 15 years ago

    > Calling os.exit in a child process may be dangerous. It can cause unflushed buffers to be flushed twice: once in the parent and once in the child.

    I assume you mean sys.exit. If that is the case, multiprocessing needs a mechanism to choose between os._exit and sys.exit for child processes. Calling os._exit might also be dangerous, because it could prevent necessary clean-up code from executing (e.g. in C extensions). I had a case where shared memory on Linux (System V IPC) leaked due to os._exit: the deallocator for my extension type, which was needed to release the shared segment when its reference count dropped to 0, never got to execute in child processes. Changing to sys.exit solved the problem. On Windows there was no leak, because the kernel did the reference counting.
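
    To make the double-flush hazard concrete, here is a minimal sketch (not from the original thread) of how a buffered write made before fork() can be flushed both by the parent and by a child that exits normally:

```python
# Minimal sketch of the double-flush hazard: a buffered write made before
# fork() is copied into the child, so a child that exits via sys.exit()
# flushes it a second time.  Unix-only, since it uses os.fork() directly.
import os
import sys

sys.stdout.write("buffered once, ")   # no newline, so it stays buffered

pid = os.fork()
if pid == 0:
    # sys.exit() flushes the inherited buffer, so the text appears twice;
    # os._exit(0) here would skip the flush and avoid the duplication.
    sys.exit(0)

os.waitpid(pid, 0)
print("then flushed by the parent")
# Typical combined output:
# buffered once, buffered once, then flushed by the parent
```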

    48451349-1324-4d2a-a174-2b483bba355e commented 15 years ago

    > In the future please use the bug tracker to file and track bugs with, so things are not as lossy.

    Ok, sorry :) Also see Piet's comment here. He makes a valid argument against sys.exit in some cases. Thus it appears that both ways of shutting down child processes can be dangerous: if we don't want buffers to be flushed twice, we have to use os._exit; if we want clean-up code to execute, we have to use sys.exit. If we want both, we are screwed. :(

    pitrou commented 11 years ago

    Richard, do you think this is an actual concern?

    e26428b1-70cf-4e9f-ae3c-9ef0478633fb commented 11 years ago

    I don't think this is a bug -- processes started with fork() should nearly always be exited with _exit(). And anyway, using sys.exit() does *not* guarantee that all deallocators will be called. To be sure of cleanup at exit you could use (the undocumented) multiprocessing.util.Finalize().
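
    A minimal sketch of the multiprocessing.util.Finalize() approach; Finalize is undocumented, so the signature given in the comment is an implementation detail, and the segment name is made up for illustration.

```python
# Minimal sketch of using multiprocessing.util.Finalize for child cleanup.
# Finalize is undocumented; its signature is roughly
# Finalize(obj, callback, args=(), kwargs=None, exitpriority=None).
# Finalizers given an exitpriority are run by multiprocessing before the
# child process calls os._exit(), unlike __del__ on module globals.
import multiprocessing as mp
from multiprocessing.util import Finalize

def release_segment(name):
    # Stand-in for real clean-up (e.g. detaching/removing shared memory).
    print("releasing shared segment", name)

def worker():
    # 'demo-segment' is a made-up name for illustration only.
    Finalize(None, release_segment, args=("demo-segment",), exitpriority=10)
    print("worker body done")

if __name__ == "__main__":
    p = mp.Process(target=worker)
    p.start()
    p.join()
    # Expected child output: "worker body done", then
    # "releasing shared segment demo-segment"
```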

    Note that Python 3.4 on Unix will probably offer the choice of using os.fork()/os._exit() or _posixsubprocess.fork_exec()/sys.exit() for starting/exiting processes on Unix.

    Sturla's scheme for doing reference counting of shared memory is also flawed because reference counts can fall to zero while a shared memory object is in a pipe/queue, causing the memory to be prematurely deallocated.

    I think a more reliable scheme would be to use fds created using shm_open(), immediately unlinking the name with shm_unlink(). Then one could use the existing infrastructure for fd passing and let the operating system handle the reference counting. This would prevent leaked shared memory (unless the process is killed in between shm_open() and shm_unlink()). I would like to add something like this to multiprocessing.
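
    A minimal sketch of that fd-based idea, under the assumption that a file created on a Linux tmpfs (/dev/shm) and unlinked immediately stands in for shm_open()/shm_unlink(), which the stdlib does not expose directly; the path and size below are made up for illustration.

```python
# Minimal sketch: create a shared-memory-backed file, unlink its name at
# once, and hand only the file descriptor to the child via the existing
# fd-passing machinery in multiprocessing.reduction.
import mmap
import os
import multiprocessing as mp
from multiprocessing import reduction

SIZE = 4096

def child(conn):
    fd = reduction.recv_handle(conn)     # receive a duplicate of the fd
    with mmap.mmap(fd, SIZE) as buf:
        buf[:5] = b"hello"
    os.close(fd)                         # kernel frees the segment once
                                         # every fd and mapping is gone

if __name__ == "__main__":
    path = "/dev/shm/fdpass-demo-%d" % os.getpid()   # made-up name
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    os.ftruncate(fd, SIZE)
    os.unlink(path)                      # the name is gone: no system-wide
                                         # leak, only live fds keep it alive

    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=child, args=(child_conn,))
    p.start()
    reduction.send_handle(parent_conn, fd, p.pid)
    p.join()

    with mmap.mmap(fd, SIZE) as buf:
        print(bytes(buf[:5]))            # b'hello'
    os.close(fd)
```

    Because the name is unlinked right away, only live file descriptors and mappings keep the segment alive, so the operating system effectively does the reference counting; the window for a leak shrinks to the instant between creation and unlink, as noted above.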

    pitrou commented 10 years ago

    Closing as won't fix.