Open c553dae6-f62f-4844-b53a-1065eb620481 opened 6 years ago
Multiprocessing's pool apparently attempts to repopulate the pool in an event of sub-process worker crash. However the pool seems to hangs after about ~ 4*(number of worker) process re-spawns.
I've tracked the issue down to queue.get() stalling at multiprocessing.pool, line 102
Is this a known issue? Are there any known workaround?
To reproduce this issue:
import multiprocessing
import multiprocessing.util
import logging
multiprocessing.util._logger = multiprocessing.util.log_to_stderr(logging.DEBUG)
import time
import ctypes
def crash_py_interpreter():
print("attempting to crash the interpreter in ", multiprocessing.current_process())
i = ctypes.c_char('a'.encode())
j = ctypes.pointer(i)
c = 0
while True:
j[c] = 'a'.encode()
c += 1
j
def test_fn(x):
print("test_fn in ", multiprocessing.current_process().name, x)
exit(0)
time.sleep(0.1)
if __name__ == '__main__':
# pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
pool = multiprocessing.Pool(processes=1)
args_queue = [n for n in range(20)]
# subprocess quits
pool.map(test_fn, args_queue)
# subprocess crashes
# pool.map(test_fn,queue)
Generally speaking, queues can remain in an inconsistent state after a process crash (because the process might have crashed just after acquiring a shared semaphore or sending part of a large message). It's not obvious to me how we could make them safer, at least under Unix where there's no widely-available message-oriented communication facility that I know of.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['3.7', 'type-bug', 'library']
title = 'Multiprocessing.Pool hangs after re-spawning several worker process.'
updated_at =
user = 'https://bugs.python.org/olarn'
```
bugs.python.org fields:
```python
activity =
actor = 'pitrou'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'olarn'
dependencies = []
files = []
hgrepos = []
issue_num = 31886
keywords = []
message_count = 2.0
messages = ['305124', '305185']
nosy_count = 5.0
nosy_names = ['pitrou', 'Olivier.Grisel', 'davin', 'tomMoral', 'olarn']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue31886'
versions = ['Python 2.7', 'Python 3.6', 'Python 3.7']
```