hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License
4.1k stars 727 forks

Parallel rollout implementation in HER+DDPG? #566

Open RyanRizzo96 opened 4 years ago

RyanRizzo96 commented 4 years ago

In ddpg.py, the parameter nb_rollout_steps is an integer specifying the number of rollout steps collected per training iteration. I believe this is the same as the parameter T in OpenAI Baselines, which they describe as "the time horizon for rollouts".

My question is: where is the number of parallel rollouts per DDPG agent implemented in stable-baselines? In OpenAI Baselines this value is passed as rollout_batch_size when initializing DDPG.

Any suggestions would be appreciated.
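To make the question concrete, here is a toy sketch (plain Python, not actual stable-baselines or baselines code; all names are illustrative) of how I understand the two parameters to differ: nb_rollout_steps counts individual environment steps collected before a training phase, while rollout_batch_size counts whole episodes generated per worker per cycle.

```python
# Illustrative only -- function names and stub environments are my own.

def ddpg_rollout_phase(env_step, nb_rollout_steps):
    """stable-baselines DDPG style: one environment, nb_rollout_steps
    individual transitions collected before each training phase."""
    return [env_step(t) for t in range(nb_rollout_steps)]

def her_rollout_phase(generate_episode, rollout_batch_size):
    """OpenAI baselines HER style: rollout_batch_size full episodes
    generated (conceptually in parallel) per worker per cycle."""
    return [generate_episode(i) for i in range(rollout_batch_size)]

# Stub usage: 100 single steps vs 2 whole episodes of 50 transitions each.
steps = ddpg_rollout_phase(lambda t: ("obs", "action", "reward"), 100)
episodes = her_rollout_phase(lambda i: ["transition"] * 50, 2)
print(len(steps), len(episodes))  # 100 2
```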

araffin commented 4 years ago

Hello,

It seems you are talking about the custom DDPG implementation that OpenAI created for HER. To be honest, that one is quite confusing and has a lot of tricks; that's also why we rewrote HER completely.

the number of parallel rollouts per DDPG agents implemented in stable baselines?

If you explain to me what it is, then I can maybe give you the equivalent. I don't really get what "number of parallel rollouts" means: is it a number of episodes, or a number of parallel agents?

Note that the DDPG implementation in stable-baselines is the one from the original baselines (not the custom one made for HER).

RyanRizzo96 commented 4 years ago

Hi,

Yes, I am talking about the custom DDPG implementation. In Plappert et al. (2018), 38 trajectories were generated in parallel: 19 MPI processes, each generating 2 trajectories, computing gradients locally, and aggregating the gradients across processes.
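A minimal sketch of that parallelism in plain Python (no MPI; the worker loop, gradient stubs, and averaging here are all stand-ins for what the real code does with MPI processes and an all-reduce):

```python
# Illustrative only: 19 workers x 2 rollouts per cycle = 38 trajectories,
# with each worker's gradient averaged across workers.

N_WORKERS = 19           # MPI processes in Plappert et al. (2018)
ROLLOUTS_PER_WORKER = 2  # "rollout_batch_size" in the baselines HER code

def generate_rollouts(worker_id, n_rollouts):
    # Stand-in for real environment rollouts.
    return [f"trajectory_{worker_id}_{i}" for i in range(n_rollouts)]

def local_gradient(rollouts):
    # Stand-in for a gradient computed from this worker's rollouts.
    return float(len(rollouts))

all_rollouts = []
gradients = []
for worker_id in range(N_WORKERS):
    rollouts = generate_rollouts(worker_id, ROLLOUTS_PER_WORKER)
    all_rollouts.extend(rollouts)
    gradients.append(local_gradient(rollouts))

# The real code averages gradients with an MPI all-reduce; a plain mean here.
avg_gradient = sum(gradients) / len(gradients)
print(len(all_rollouts))  # 38 trajectories per cycle in total
```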

Their code comment states:

https://github.com/openai/baselines/blob/9ee399f5b20cd70ac0a871927a6cf043b478193f/baselines/her/ddpg.py#L50

I think this refers to the set of trajectories simulated in parallel. Maybe the image below will help show what I mean.

[image]