uqfoundation / pathos

parallel graph management and execution in heterogeneous computing
http://pathos.rtfd.io

Pathos with Pytorch and CUDA #175

Open DonkeyShot21 opened 5 years ago

DonkeyShot21 commented 5 years ago

Hi, thanks for the awesome tool!

I am trying to do multiprocessing in a class that uses Pytorch.

This is what I am doing:

from pathos.multiprocessing import ProcessingPool as Pool
import multiprocess.context as ctx
import torch.nn as nn

ctx._force_start_method('spawn')  # fixes problems with CUDA and fork

class RNN(nn.Module):
    # do RNN stuff
    ...

class MyClass:
    def __init__(self, args):
        self.rnn_model = RNN(...)

    def predict_single(self, ...):
        # predict a single sequence
        ...

    def predict(self, ...):
        # predict all the sequences
        pool = Pool(args.num_workers)
        return pool.map(self.predict_single, test_sequences, args_list)

I am using pathos to work around the limitation of Python's standard multiprocessing, which cannot serialize class methods (pickle complains).
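For context, here is a minimal, hedged illustration of the serialization limit being worked around: the stock pickler serializes callables by reference (module plus qualified name), so anything it cannot look up by name, such as a lambda, is rejected, while pathos succeeds because it serializes with dill, which ships the code itself. The lambda below is just a stand-in for the unpicklable callable.

```python
import pickle

# pickle serializes functions by reference (module + qualified name), so a
# lambda -- or any callable it cannot look up by name -- is rejected.
# pathos avoids this because it serializes with dill instead.
f = lambda x: x + 1

try:
    pickle.dumps(f)
    picklable = True
except Exception:
    picklable = False

print("stdlib pickle handled the lambda:", picklable)
```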

With this setup the code runs, but I hit another problem, this one related to CUDA and PyTorch: the GPU goes out of memory (OOM) because the worker processes keep accumulating memory instead of releasing it. Again, this seems to be a well-known problem that can be solved by using torch.multiprocessing instead of the standard multiprocessing.

The question is: how can I use both pathos (which solves the class-method problem) and torch.multiprocessing (which solves the OOM problem)?

bastiaanzwanenburg commented 4 years ago

I'm dealing with a similar problem; have you found a solution in the meantime?

DonkeyShot21 commented 4 years ago

I ended up using torch.multiprocessing. I don't remember exactly what I did, but you can find some info here: https://github.com/pytorch/pytorch/issues/26344. You can also take a look at the implementation of the parallel_predict function in uis-rnn: https://github.com/google/uis-rnn/blob/master/uisrnn/uisrnn.py