Closed by mohamed-ezz 5 years ago
I don't know how you are doing this, but I hope that each process has its own instance of RandomState so that each process carries its own rng state. Apart from that, ISTR a recent report of a lock error -- unlocking an unlocked lock, or some such -- so there might be a locking problem, either in cpython or the cython code.
This line

np.random.seed(1)  # any seed value

should be

r = np.random.RandomState(1)

if you want different sets of random values across workers. Seeding is lock-protected, so in theory this should not cause a hang.
When you fork, do the children get copies of the parent's lock, or do they all end up sharing a lock?
(Also, while this oughtn't crash and we should fix it if we can: a good piece of general life advice is to never use any global np.random functions directly, and instead always do r = np.random.RandomState(seed) and then call methods on r. And in particular I suspect it would avoid the issue you're running into here.)
This also occurs on Linux. After fork(), the child process ends up with only one thread, with mutexes in whatever state they happened to be in (potentially deadlocking if another thread had locked them). AFAIK basic mutexes are per-process, but it's possible to create special synchronization objects in shared memory, and those are shared. I don't know what atfork handlers there are, though, or what multiprocessing does. But here it looks like there is only one thread in every process, modulo BLAS threads, so it's not so obvious why it hangs...
@charris @bashtage r = np.random.RandomState(1) has to be passed around every time a random function is to be called, AFAIK. In my real use case, I have callbacks given to workers by the user, and those callbacks likely use numpy.random functions. I don't want the user (callback provider) to see a difference when their callbacks are called in concurrent mode versus single-process mode. If I'm to use r = numpy.random.RandomState(seed), I have to pass it to the callbacks, and the user will need to inconveniently pass it on to all downstream functions as an argument.
Setting the process-global seed via numpy.random.seed seems like the way to go in my case, and there's no reason for it not to work.
Yeah, it looks like numpy.random uses a simple threading.Lock, which on Linux is secretly a POSIX semaphore, allocated here. POSIX semaphores can be cross-process, but the signature for sem_init is

int sem_init(sem_t *sem, int pshared, unsigned int value);

where pshared controls whether it can be shared across processes, so CPython's call to sem_init(lock, 0, 1) should be creating a semaphore that is not shared.
There aren't any calls to pthread_atfork in CPython or numpy.
Oh.
multiprocessing.Pool spawns a background thread to manage workers: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L170-L173
It loops in the background calling _maintain_pool: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L366
If a worker exits, for example due to a maxtasksperchild limit, then _maintain_pool calls _repopulate_pool: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L240
And then _repopulate_pool forks some new workers, still in this background thread: https://github.com/python/cpython/blob/aefa7ebf0ff0f73feee7ab24f4cdcb2014d83ee5/Lib/multiprocessing/pool.py#L224
So what's happening is that eventually you get unlucky, and at the same moment that your main thread is calling some np.random function and holding the lock, multiprocessing decides to fork a child, which starts out with the np.random lock already held but with the thread that was holding it gone. Then the child tries to call into np.random, which requires taking the lock, and so the child deadlocks.
The simple workaround here is to not use fork with multiprocessing. If you use the spawn or forkserver start methods then this should go away.
For a proper fix.... ughhh. I guess we need to register a pthread_atfork pre-fork handler that takes the np.random lock before fork and then releases it afterwards? And really I guess we need to do this for every lock in numpy, which requires something like keeping a weakset of every RandomState object, and _FFTCache also appears to have a lock...
(On the plus side, this would also give us an opportunity to reinitialize the global random state in the child, which we really should be doing in cases where the user hasn't explicitly seeded it.)
This also hangs without NumPy, if you replace all np.random calls with

with lock: pass

using a global threading.Lock defined at the top.
Interestingly, it doesn't hang on Python 3.
In my real use case, I have callbacks given to workers by the user, and those callbacks likely use numpy.random functions. I don't want the user (callback provider) to see a difference when their callbacks are called in concurrent mode versus single-process mode.
I'm not sure they would see a difference since the execution order of workers in a multiprocess setup is effectively random anyway. And if you are setting the seed or restoring the state to ensure things are internally consistent, these can both be done as well with new instances of RandomState as they can with the global.
FWIW, it does not hang on Windows on either Python 2 or 3.
Interestingly, it doesn't hang on Python 3.
Python 2 and Python 3 have very different GIL implementations. It's possible this is just a bit of luck in how the threads are scheduled, where on Python 3 the multiprocessing thread never gets to run during the with lock: pass?
Making it with lock: time.sleep(1) might make it much easier to hit.
Correct, with sleeps added it hangs on Python 3.
@bashtage: that's expected, because on Windows there is no fork, so these weird sharing issues don't arise – Windows always uses the spawn method I mentioned above as a possible workaround. (The trade-off is that on Windows spawning processes is much slower and some clever efficiency tricks are ruled out.)
@njsmith set_start_method is Python 3 only, so with Python 2 I'm not sure if one can change the process creation method.
TBH, I'm not sure if there's any way to fix this in Python < 3.7 while retaining thread safety in np.random. os.register_at_fork was added only in Python 3.7: https://bugs.python.org/issue16500. pthread_atfork is probably hard (what are you going to do with the GIL?)
os.register_at_fork is added only in Python 3.7.
Whoa, they finally did it!
pthread_atfork is probably hard (what are you going to do with the GIL?)
Good question. The code calling fork either holds the GIL or not. If it does, then it will be held when the handlers are called, and all is fine. And if it doesn't, then running Python in the child process isn't safe anyway, so we don't need to worry about trying to make it safe, and the handlers can detect this case and skip doing anything. I'm not sure how you detect whether the GIL is held, but PyGILState_Ensure does it so it must be possible, I hope?
OTOH if @gpshead and @pitrou think this won't work then I'm probably missing something. I guess code calling fork with the GIL held might be surprised if we start temporarily dropping the GIL internally? Even if we reacquire it again before returning, there's some potential for weird effects there.
Possibly related seg fault on FreeBSD 11.1
Running the test suite nosetests numpy.random on the default NumPy package for FreeBSD 11.1 amd64 produces a seg fault in test_random.TestThread.* (the actual function varies, but it is usually the second or third among test_exp, test_normal and test_multinomial).
Any new ideas of how to deal with this?
EDIT: Do we want to wait on Python 3.7 and make the fix version dependent?
Note that, from a user's POV, this is trivially fixed (on Python 3) by using the "forkserver" method (instead of "fork") in multiprocessing.
Closing, the new random APIs provide a way to seed child processes cleanly. Please reopen if I misunderstood.
I have a python script that concurrently processes numpy arrays and images in a random way. To have proper randomness inside the spawned processes I pass a random seed from the main process to the workers for them to be seeded.
When I use maxtasksperchild for the Pool, my script hangs after running Pool.map a number of times.
The following is a minimal snippet that reproduces the problem:
This is the output:
This snippet hangs on the following platforms: OS: Linux (Ubuntu) and OSX; Python version: 2.7.10; NumPy versions: 1.11.0, 1.12.0, 1.13.0
If I replace np.random.randint(10) with np.random.random() then it works fine with NumPy 1.11 but not with NumPy 1.12 or 1.13.