Closed — issue by lidongke, closed 4 years ago
I wouldn't recommend using Python's native multiprocessing.Queue(), especially when you're using GPUs. If you wish to parallelize data, DataParallel or its distributed version is a better option: https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
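To make the suggestion concrete, here is a minimal sketch of wrapping a model in DataParallel instead of handing tensors between processes through multiprocessing.Queue(). The layer and batch sizes are made up for illustration; on a machine with fewer than two GPUs the wrap is skipped and the code runs unchanged on CPU.

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)
if torch.cuda.device_count() > 1:
    # DataParallel replicates the module across visible GPUs and
    # scatters each input batch along dim 0
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(32, 128, device=device)
out = model(x)  # outputs are gathered back onto the default device
print(out.shape)  # torch.Size([32, 10])
```

Because the wrapping is a single line, it is easy to keep the same training loop for the single-GPU and multi-GPU cases.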
I see that DistributedDataParallel can do model parallelism with multiple processes. In your SLM-Lab I use 'global_nets' with shared memory across processes; I guess I can use DistributedDataParallel instead of shared memory when I am using GPUs, am I right?
That's right. I think DataParallel suffices if you're not doing multi-node distributed training, but you'd have to write custom code to do that.
But I see that DataParallel can only be used with a single process and multiple GPUs, am I right? https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
DataParallel is single-process and multi-threaded, and only works on a single machine, while DistributedDataParallel is multi-process and works for both single- and multi-machine training.
And if I want to use DataParallel or DistributedDataParallel to test this, is it easy for me to make that change?
OK, then you'd need the distributed version. I'm not sure how easy it is for you to change, and at this rate it's out of the scope of what SLM Lab does.
Hi~
After issue #421, I changed your code to do async sampling and training. Now I have a subprocess (P1) created by the main process (P2). P2 runs the env and samples data, then P2 passes the data to P1 via multiprocessing.Queue(); P1 puts the data into the replay buffer and trains. Because I am using "shared" mode, the global nets are optimized by the training in P1 and can also be used for sampling in P2. With this I can do async sampling and training on CPU, and I tested that it is correct. But I still want to increase the training speed, so I want to move training to the GPU. First I got a CUDA initialization error, so I refactored my code to use the 'spawn' start method. After that I get this error:
@kengz Could you please give me any help with that?