uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io

AssertionError for large data size but ok for small ones #150

Open p1m3nt opened 6 years ago

p1m3nt commented 6 years ago

I'm running a multiprocess routine to process a very large numpy array / pandas dataframe (roughly 1e8 entries). The problem is that when I wrap the routine in a loop, it runs fine on the first iteration but fails when the second one starts:

Process SpawnPoolWorker-25:
Process SpawnPoolWorker-21:
(similar lines repeated for SpawnPoolWorker-22 through SpawnPoolWorker-40)
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\py35-cpu\lib\site-packages\multiprocess\pool.py", line 108, in worker
    task = get()
  File "C:\ProgramData\Anaconda3\envs\py35-cpu\lib\site-packages\multiprocess\queues.py", line 338, in get
    res = self._reader.recv_bytes()
  File "C:\ProgramData\Anaconda3\envs\py35-cpu\lib\site-packages\multiprocess\connection.py", line 219, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "C:\ProgramData\Anaconda3\envs\py35-cpu\lib\site-packages\multiprocess\connection.py", line 321, in _recv_bytes
    return self._get_more_data(ov, maxsize)
  File "C:\ProgramData\Anaconda3\envs\py35-cpu\lib\site-packages\multiprocess\connection.py", line 340, in _get_more_data
    assert left > 0
AssertionError
(the same traceback, interleaved, is printed once per worker)

The loop also works fine with a smaller data size (~1e6 entries). I searched around and found a similar problem reported against the built-in multiprocessing package: https://stackoverflow.com/questions/47692566/python-multiprocessing-apply-async-assert-left-0-assertionerror
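One way to sanity-check whether payload size is the variable here (this is an assumption about the cause, not a confirmed diagnosis) is to measure the pickled size of what gets sent to the workers, since multiprocess, like the stdlib multiprocessing, serializes each task and pushes it through a pipe. A minimal sketch, with the array as a stand-in for the real data:

```python
import pickle
import numpy as np

# Stand-in for the real array; 1e6 float64 entries is about 8 MB pickled,
# so 1e8 entries would be roughly 800 MB per message.
arr = np.zeros(10**6)
payload = pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL)
print(len(payload))
```

Comparing this number between the working (~1e6) and failing (~1e8) cases would show how close the failing run gets to the pipe transport's limits.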

I was wondering whether my problem is related to that same issue, and how to resolve it other than by using a smaller data size. My code is:

    def loop_run(self):
        # ProcessPool here is pathos.pools.ProcessPool
        pool = ProcessPool(nodes=self.n_thread)
        count = 0
        jobs = self.get_jobs()
        for a in range(10):
            if count != 0:
                pool.restart()  # re-create workers after close()/join()
            self.set_a(a)
            outputs, out = pool.map(func, jobs), []
            for i, out_ in enumerate(outputs, 1):
                out.append(out_)
            pool.close()
            pool.join()
            count += 1
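If the failure really is driven by payload size, a common workaround is to shrink what crosses the pipe per call: split the array into chunks (or pass indices or file paths and load the data inside the worker) rather than shipping everything at once. A minimal sketch of the chunking pattern, where `process_chunk` is a hypothetical stand-in for the real `func`; it is shown sequentially, and the list comprehension is the line that `pool.map(process_chunk, chunks)` would replace:

```python
import numpy as np

def process_chunk(chunk):
    # hypothetical stand-in for the real per-job work
    return float(chunk.sum())

data = np.arange(10**6, dtype=np.float64)  # stand-in for the real array

# Split the work so each pickled message stays well below the pipe's limit.
chunks = np.array_split(data, 16)

# Sequential stand-in for: results = ProcessPool(nodes=n).map(process_chunk, chunks)
results = [process_chunk(c) for c in chunks]
total = sum(results)
```

Each chunk pickles independently, so the per-message size scales down with the number of chunks while the combined result is unchanged.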

Thanks.