python / cpython

The Python programming language
https://www.python.org

Multiprocessing.Pool hangs after re-spawning several worker process. #76067

Open c553dae6-f62f-4844-b53a-1065eb620481 opened 6 years ago

c553dae6-f62f-4844-b53a-1065eb620481 commented 6 years ago
BPO 31886
Nosy @pitrou, @ogrisel, @applio, @tomMoral

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['3.7', 'type-bug', 'library']
title = 'Multiprocessing.Pool hangs after re-spawning several worker process.'
updated_at =
user = 'https://bugs.python.org/olarn'
```
bugs.python.org fields:
```python
activity =
actor = 'pitrou'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'olarn'
dependencies = []
files = []
hgrepos = []
issue_num = 31886
keywords = []
message_count = 2.0
messages = ['305124', '305185']
nosy_count = 5.0
nosy_names = ['pitrou', 'Olivier.Grisel', 'davin', 'tomMoral', 'olarn']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue31886'
versions = ['Python 2.7', 'Python 3.6', 'Python 3.7']
```

c553dae6-f62f-4844-b53a-1065eb620481 commented 6 years ago

Multiprocessing's pool apparently attempts to repopulate the pool in the event of a sub-process worker crash. However, the pool seems to hang after roughly 4 × (number of workers) process re-spawns.

I've tracked the issue down to queue.get() stalling in multiprocessing/pool.py, line 102.

Is this a known issue? Are there any known workarounds?

To reproduce this issue:

```python
import logging
import multiprocessing
import multiprocessing.util
import time
import ctypes

multiprocessing.util._logger = multiprocessing.util.log_to_stderr(logging.DEBUG)


def crash_py_interpreter():
    # Write past the end of a one-byte buffer until the interpreter
    # segfaults (a hard crash, not a clean Python exit).
    print("attempting to crash the interpreter in",
          multiprocessing.current_process())
    i = ctypes.c_char(b'a')
    j = ctypes.pointer(i)
    c = 0
    while True:
        j[c] = b'a'
        c += 1


def test_fn(x):
    # Simulate a worker that exits cleanly before finishing its task.
    print("test_fn in", multiprocessing.current_process().name, x)
    exit(0)
    time.sleep(0.1)  # never reached


if __name__ == '__main__':
    # pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    pool = multiprocessing.Pool(processes=1)

    args_queue = [n for n in range(20)]

    # subprocess quits
    pool.map(test_fn, args_queue)
    # subprocess crashes
    # pool.map(crash_py_interpreter, args_queue)
```
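One possible workaround, not mentioned in the thread but worth noting: `concurrent.futures.ProcessPoolExecutor` (Python 3.3+) detects abruptly terminated workers, marks the pool as broken, and raises `BrokenProcessPool` on pending results instead of hanging. A minimal sketch:

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def dies(x):
    # Simulate a hard worker crash; os._exit bypasses Python cleanup.
    os._exit(1)


def run():
    with ProcessPoolExecutor(max_workers=1) as pool:
        try:
            # The map iterator raises BrokenProcessPool once the
            # executor notices a worker died, rather than blocking.
            list(pool.map(dies, range(4)))
        except BrokenProcessPool:
            return "broken pool detected"
    return "no error"


if __name__ == "__main__":
    print(run())
```

The trade-off is that the whole pool becomes unusable after the first crash; callers must create a fresh executor to continue.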
pitrou commented 6 years ago

Generally speaking, queues can remain in an inconsistent state after a process crash (because the process might have crashed just after acquiring a shared semaphore or sending part of a large message). It's not obvious to me how we could make them safer, at least under Unix where there's no widely-available message-oriented communication facility that I know of.
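Given that the pool cannot reliably recover from such a crash, one defensive pattern on the caller side (an illustration, not a fix proposed in this thread) is to fetch results through `map_async` with a bounded `get`, so a dead pool raises `multiprocessing.TimeoutError` instead of blocking forever:

```python
import multiprocessing


def double(x):
    # Placeholder task for illustration.
    return x * 2


def run():
    with multiprocessing.Pool(processes=2) as pool:
        result = pool.map_async(double, range(5))
        try:
            # get(timeout=...) raises multiprocessing.TimeoutError if
            # results never arrive (e.g. workers silently died).
            return result.get(timeout=30)
        except multiprocessing.TimeoutError:
            pool.terminate()  # give up rather than hang
            return None


if __name__ == "__main__":
    print(run())
```

This does not repair the pool's internal queues; it only converts an indefinite hang into a recoverable error the caller can act on.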