
Hey, is there a good way to share memory among different workers? #3828

Closed. zhan0903 closed this issue 3 years ago.

zhan0903 commented 5 years ago

Describe the problem

Hello, I want to create several workers that generate experiences for a reinforcement learning problem and add all of these experiences to a shared replay buffer. Is there a good way to implement this? I know Ray's Ape-X example provides a similar solution, but that example is based on TensorFlow whereas my code is based on PyTorch, and I cannot figure out how the Ape-X example implements the replay memory. Thanks.


ericl commented 5 years ago

Hi, the steps for pytorch would be:

  1. Implement a PyTorch class that extends PolicyGraph: https://github.com/ray-project/ray/blob/master/python/ray/rllib/evaluation/policy_graph.py You just need to implement compute_actions() and compute_apply(batch).

  2. Create an AsyncReplayOptimizer (the same one used for Ape-X): https://github.com/ray-project/ray/blob/master/python/ray/rllib/optimizers/async_replay_optimizer.py Use this make function: https://github.com/ray-project/ray/blob/master/python/ray/rllib/optimizers/policy_optimizer.py#L157

-or-

You can use the replay buffer class directly (https://github.com/ray-project/ray/blob/master/python/ray/rllib/optimizers/replay_buffer.py), but you'll have to parallelize it manually with the Ray APIs; a rough sketch of that approach is below.
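Here is a minimal sketch of that second option, using a hypothetical ReplayActor and RolloutWorker rather than RLlib's own classes: the replay buffer lives inside a single Ray actor, and every worker holds a handle to it and pushes experiences through remote calls.

```python
import random
import ray

ray.init()

@ray.remote
class ReplayActor:
    """Shared replay buffer that all rollout workers write to."""
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.buffer = []

    def add(self, experience):
        self.buffer.append(experience)
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

@ray.remote
class RolloutWorker:
    """Generates placeholder experiences and pushes them into the shared buffer."""
    def __init__(self, replay_actor):
        self.replay_actor = replay_actor

    def rollout(self, num_steps):
        for t in range(num_steps):
            # In a real setup this would be env.step() plus a net.forward() call.
            self.replay_actor.add.remote(("obs", "action", "reward", t))
        return num_steps

replay = ReplayActor.remote()
workers = [RolloutWorker.remote(replay) for _ in range(4)]
ray.get([w.rollout.remote(100) for w in workers])
print(len(ray.get(replay.sample.remote(32))))
```

Because the buffer is owned by one actor process, all workers see the same data; Ray's object store handles moving the experience tuples between processes.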

zhan0903 commented 5 years ago


Thanks. I tried to use the Ray APIs to parallelize the simulation process, and I ran into an issue: when I create actors to run the simulation and generate experiences, the policy network's forward pass (net.forward) uses the GPUs, and the simulation (env.step) uses the CPUs. However, the workers only use the GPUs for simulation (I declared the worker with @ray.remote(num_gpus=2)), and no parallel CPU processes are running. Is there a way to use multiple CPUs and GPUs at the same time for the simulation? Thanks.

ericl commented 5 years ago

I think you need to configure that in PyTorch. Ray doesn't control how many CPUs a worker uses; num_cpus=N is just a hint to the scheduler to avoid over-parallelizing.
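To make that concrete, here is a small sketch (not RLlib code; the resource numbers and thread count are arbitrary, and the num_gpus=1 request assumes a machine with at least one GPU). num_cpus/num_gpus only tell Ray's scheduler how much to reserve, while the worker's actual CPU thread usage for the forward pass is configured in PyTorch, e.g. via torch.set_num_threads.

```python
import ray
import torch

ray.init()

@ray.remote(num_cpus=2, num_gpus=1)  # scheduler hints, not hard limits on threads
class SimWorker:
    def __init__(self):
        # Ray restricts this worker's visible GPUs via CUDA_VISIBLE_DEVICES.
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # PyTorch, not Ray, decides how many CPU threads the forward pass uses.
        torch.set_num_threads(2)

    def step(self, x):
        # Policy forward pass on the GPU (if present); env.step() would stay on the CPU.
        t = torch.as_tensor(x, dtype=torch.float32, device=self.device)
        return (t * 2).cpu().numpy()

worker = SimWorker.remote()
print(ray.get(worker.step.remote([1.0, 2.0, 3.0])))
```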


zhan0903 commented 5 years ago


Thanks. Another issue has come up: the experiences generated by the workers are stored on the GPUs, and Ray doesn't seem able to return experiences that are stored on GPUs. Do I have to convert the experiences to CPU memory first? I would prefer to use the experiences on the GPUs for training. Thanks.

ericl commented 5 years ago

I think something like .cpu().numpy() will move the tensor into a Ray-compatible format.
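For example (a sketch assuming a machine with one GPU; generate_experience is a made-up function, not a Ray or RLlib API): detach the tensor and copy it to host memory as NumPy before returning it from the remote task, then rebuild a GPU tensor on the learner side.

```python
import ray
import torch

ray.init()

@ray.remote(num_gpus=1)
def generate_experience():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    obs = torch.randn(4, 84, 84, device=device)  # tensor produced on the GPU
    # Detach and copy to host memory so Ray can serialize it into the object store.
    return obs.detach().cpu().numpy()

experiences = ray.get(generate_experience.remote())
# On the learner, move the batch back onto the GPU for training.
batch = torch.as_tensor(experiences)
if torch.cuda.is_available():
    batch = batch.cuda()
print(batch.shape)
```

The round trip through CPU memory is the cost of going through Ray's object store, which is what the .cpu().numpy() conversion above accounts for.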


zhan0903 commented 5 years ago


Hi. Based on your suggestions, I am wondering which part of the example code generates the experiences, and how those experiences get stored into the shared global replay memory? In AsyncReplayOptimizer I only found the methods that use the replay memory. Thanks.

stale[bot] commented 3 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity within the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public slack channel.

stale[bot] commented 3 years ago

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!