numpy / numpy

The fundamental package for scientific computing with Python.
https://numpy.org

numpy.seed hangs in child processes (minimal example provided) #9248

Closed mohamed-ezz closed 5 years ago

mohamed-ezz commented 7 years ago

I have a Python script that concurrently processes numpy arrays and images in a random way. To get proper randomness inside the spawned processes, I pass a random seed from the main process to the workers so they can seed themselves.

When I use maxtasksperchild for the Pool, my script hangs after running Pool.map a number of times.

The following is a minimal snippet that reproduces the problem:

from multiprocessing import Pool
import numpy as np

def worker(n):
    # Removing np.random.seed solves the issue
    np.random.seed(1)  # any seed value
    return 1234  # trivial return value

# Removing maxtasksperchild solves the issue
ppool = Pool(20, maxtasksperchild=5)
i = 0
while True:
    i += 1
    # Removing np.random.randint(10) or taking it out of the loop solves the issue
    rand = np.random.randint(10)
    l = [3]  # trivial input to ppool.map
    result = ppool.map(worker, l)
    print i, result[0]  # Python 2 print statement; the report is against 2.7

This is the output:

1 1234
2 1234
3 1234
.
.
.
99 1234
100 1234 # at this point workers should've reached maxtasksperchild tasks
101 1234
102 1234
103 1234
104 1234
105 1234
106 1234
107 1234
108 1234
109 1234
110 1234
[hangs here indefinitely]

This snippet hangs on the following platforms:

OS: Linux (Ubuntu) and OSX
Python version: 2.7.10
NumPy versions: 1.11.0, 1.12.0, 1.13.0

If I replace np.random.randint(10) with np.random.random() then it works fine with NumPy 1.11 but not with NumPy 1.12 or 1.13.

charris commented 7 years ago

I don't know how you are doing this, but I hope that each process has its own instance of RandomState so that each process carries its own rng state. Apart from that, I seem to recall a recent report of a lock error (unlocking an unlocked lock, or some such), so there might be a locking problem, either in CPython or the Cython code.

bashtage commented 7 years ago

This line

np.random.seed(1)  # any seed value

should be something like

rng = np.random.RandomState(seed)

with a distinct seed per worker, if you want different sets of random values across workers. Seeding is lock protected, so in theory this should not cause a hang.

njsmith commented 7 years ago

When you fork, do the children get copies of the parent's lock, or do they all end up sharing a lock?

njsmith commented 7 years ago

(Also, while this oughtn't crash and we should fix it if we can: a good piece of general life advice is to never use any global np.random functions directly, and instead always do r = np.random.RandomState(seed) and then call methods on r. And in particular I suspect it would avoid the issue you're running into here.)
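A minimal sketch of that advice, assuming each worker receives its own seed as its argument (the names worker and child_seed are illustrative, not from the original report):

from multiprocessing import Pool
import numpy as np

def worker(child_seed):
    # Each process builds its own RandomState instead of touching
    # the module-global state (and its lock).
    rng = np.random.RandomState(child_seed)
    return rng.randint(10)

if __name__ == "__main__":
    pool = Pool(4)
    # Distinct seeds give each worker an independent stream.
    print(pool.map(worker, [1, 2, 3, 4]))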

pv commented 7 years ago

This also occurs on Linux. After fork(), the child process should end up with only one thread, with mutexes in whatever state they happened to be in (potentially deadlocking if another thread had locked them). AFAIK basic mutexes are per-process, but it is possible to create special synchronization objects in shared memory, and those are shared. I don't know what atfork handlers there are, though, or what multiprocessing does. But here it looks like there is only one thread in every process, modulo BLAS threads? So it's not so obvious why it hangs...

mohamed-ezz commented 7 years ago

@charris @bashtage AFAIK, r = np.random.RandomState(1) has to be passed around every time a random function is to be called.

In my real use case, I have callbacks given to workers by the user, and those callbacks likely use numpy.random functions. I don't want the user (the callback provider) to see a difference when their callbacks are called in concurrent mode versus single-process mode. If I'm to use r = numpy.random.RandomState(seed), I have to pass it to the callbacks, and the user will in turn have to pass it down to all downstream functions as an argument.

Setting the process-global seed via numpy.random.seed seems like the way to go in my case, and there's no reason for it not to work.

njsmith commented 7 years ago

Yeah, it looks like numpy.random uses a simple threading.Lock, which on Linux is secretly a POSIX semaphore, allocated here. POSIX semaphores can be cross-process, but the signature for sem_init is

int sem_init(sem_t *sem, int pshared, unsigned int value);

where pshared controls whether it can be shared across processes, so CPython's call to sem_init(lock, 0, 1) should be creating a semaphore that is not shared.

There aren't any calls to pthread_atfork in CPython or numpy.

njsmith commented 7 years ago

Oh.

multiprocessing.Pool spawns a background thread to manage workers: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L170-L173

It loops in the background calling _maintain_pool: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L366

If a worker exits, for example due to a maxtasksperchild limit, then _maintain_pool calls _repopulate_pool: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L240

And then _repopulate_pool forks some new workers, still in this background thread: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L224

So what's happening is that eventually you get unlucky, and at the same moment that your main thread is calling some np.random function and holding the lock, multiprocessing decides to fork a child, which starts out with the np.random lock already held but the thread that was holding it is gone. Then the child tries to call into np.random, which requires taking the lock, and so the child deadlocks.
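The core hazard can be shown without multiprocessing or numpy at all. Here is a minimal sketch (POSIX-only, and it intentionally hangs, so run it with a timeout): one thread holds an ordinary threading.Lock while the main thread forks, and the child then blocks forever trying to take the inherited, already-locked lock.

import os
import threading
import time

lock = threading.Lock()

def hold_lock():
    with lock:
        time.sleep(2)   # hold the lock across the fork below

threading.Thread(target=hold_lock).start()
time.sleep(0.5)         # let the thread actually acquire the lock

pid = os.fork()
if pid == 0:
    # The child inherits the lock in its locked state, but the thread
    # that held it does not exist here, so this blocks forever.
    lock.acquire()
    os._exit(0)
os.waitpid(pid, 0)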

The simple workaround here is to not use fork with multiprocessing. If you use the spawn or forkserver start methods then this should go away.
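For example (Python 3 only; a sketch adapted from the original repro, since the report itself is on Python 2):

import multiprocessing as mp
import numpy as np

def worker(n):
    np.random.seed(1)   # safe here: spawned children inherit no locks
    return 1234

if __name__ == "__main__":
    ctx = mp.get_context("spawn")   # or "forkserver"
    ppool = ctx.Pool(20, maxtasksperchild=5)
    print(ppool.map(worker, [3]))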

For a proper fix.... ughhh. I guess we.. need to register a pthread_atfork pre-fork handler that takes the np.random lock before fork and then releases it afterwards? And really I guess we need to do this for every lock in numpy, which requires something like keeping a weakset of every RandomState object, and _FFTCache also appears to have a lock...

(On the plus side, this would also give us an opportunity to reinitialize the global random state in the child, which we really should be doing in cases where the user hasn't explicitly seeded it.)

pv commented 7 years ago

This also hangs without NumPy, if you replace all the np.random calls with "with lock: pass", using a global threading.Lock defined at the top. Interestingly, it doesn't hang on Python 3.
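A sketch of that NumPy-free reproduction, reconstructed from the description above (on Python 3, the time.sleep(1) variant njsmith suggests below makes the hang much easier to hit):

from multiprocessing import Pool
import threading
import time

lock = threading.Lock()

def worker(n):
    # Stand-in for np.random.seed: take and release a module-global lock.
    with lock:
        pass            # replacing pass with time.sleep(1) widens the race
    return 1234

if __name__ == "__main__":
    ppool = Pool(20, maxtasksperchild=5)
    i = 0
    while True:
        i += 1
        with lock:      # stand-in for np.random.randint in the parent
            pass
        print(i, ppool.map(worker, [3])[0])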

bashtage commented 7 years ago

In my real use case, I have callbacks given to workers by the user, and those callbacks likely use numpy.random functions. I don't want the user (the callback provider) to see a difference when their callbacks are called in concurrent mode versus single-process mode.

I'm not sure they would see a difference since the execution order of workers in a multiprocess setup is effectively random anyway. And if you are setting the seed or restoring the state to ensure things are internally consistent, these can both be done as well with new instances of RandomState as they can with the global.

bashtage commented 7 years ago

FWIW, it does not hang on Windows on either Python 2 or 3.

njsmith commented 7 years ago

Interestingly, it doesn't hang on Python 3.

Python 2 and Python 3 have very different GIL implementations. It's possible this is just a bit of luck in how the threads are scheduled, where on Python 3 the multiprocessing thread never gets to run during the with lock: pass?

njsmith commented 7 years ago

Making it "with lock: time.sleep(1)" might make it much easier to hit.

pv commented 7 years ago

Correct, with sleeps added it hangs on Python 3.

njsmith commented 7 years ago

@bashtage: that's expected, because on Windows there is no fork, so these weird sharing issues don't arise – Windows always uses the spawn method I mentioned above as a possible workaround. (The trade-off is that on Windows spawning processes is much slower and some clever efficiency tricks are ruled out.)

mohamed-ezz commented 7 years ago

@njsmith set_start_method is Python 3 only, so on Python 2 I'm not sure one can change how processes are created.

pv commented 7 years ago

TBH, I'm not sure there's any way to fix this in Python < 3.7 while retaining thread safety in np.random. os.register_at_fork was added only in Python 3.7: https://bugs.python.org/issue16500. pthread_atfork is probably hard (what are you going to do with the GIL?)

njsmith commented 7 years ago

os.register_at_fork was added only in Python 3.7.

Whoa, they finally did it!

pthread_atfork is probably hard (what are you going to do with the GIL?)

Good question. The code calling fork either holds the GIL or not. If it does, then it will be held when the handlers are called, and all is fine. And if it doesn't, then running Python in the child process isn't safe anyway, so we don't need to worry about trying to make it safe, so the handlers can detect this case and skip doing anything. I'm not sure how you detect whether the GIL is held, but PyGILState_Ensure does it, so it must be possible, I hope?

OTOH if @gpshead and @pitrou think this won't work then I'm probably missing something. I guess code calling fork with the GIL held might be surprised if we start temporarily dropping the GIL internally? Even if we reacquire it again before returning there's some potential for weird effects there.
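On Python 3.7+, a minimal sketch of the handler idea being discussed, using os.register_at_fork to hold a lock across the fork (the lock here is a stand-in for whatever lock np.random actually uses; this is an illustration, not NumPy's implementation):

import os
import threading

_rng_lock = threading.Lock()   # stand-in for np.random's internal lock

def _before_fork():
    _rng_lock.acquire()        # no other thread can hold it during the fork

def _after_fork():
    _rng_lock.release()        # runs in both parent and child

os.register_at_fork(before=_before_fork,
                    after_in_parent=_after_fork,
                    after_in_child=_after_fork)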

bashtage commented 7 years ago

Possibly related: a seg fault on FreeBSD 11.1.

Running the test suite

nosetests numpy.random

on the default NumPy package for FreeBSD 11.1 amd64 produces a seg fault in test_random.TestThread.* (the actual function varies, but it is usually the second or third among test_exp, test_normal, and test_multinomial).

charris commented 6 years ago

Any new ideas on how to deal with this?

EDIT: Do we want to wait on Python 3.7 and make the fix version-dependent?

pitrou commented 6 years ago

Note that, from a user's POV, this is trivially fixed (on Python 3) by using the "forkserver" method (instead of "fork") in multiprocessing.
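For instance, set once at program startup (Python 3; a sketch):

import multiprocessing as mp

if __name__ == "__main__":
    # Workers come from a clean server process, so they never inherit
    # a lock that a thread in the parent happened to be holding.
    mp.set_start_method("forkserver")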

mattip commented 5 years ago

Closing, the new random APIs provide a way to seed child processes cleanly. Please reopen if I misunderstood.
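For reference, a minimal sketch of that newer API (NumPy >= 1.17), assuming a root seed combined with a per-task id so each worker builds its own independent Generator rather than touching the global state (ROOT_SEED and worker are illustrative names):

from multiprocessing import Pool
import numpy as np

ROOT_SEED = 1234  # illustrative root seed

def worker(task_id):
    # Seeding with [root, task_id] feeds both values into a SeedSequence,
    # giving each task its own independent stream.
    rng = np.random.default_rng([ROOT_SEED, task_id])
    return rng.integers(10)

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(worker, range(4)))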