openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

PPO2 globalseeds #822

Closed: ecada closed this issue 5 years ago

ecada commented 5 years ago

Hello, I've been experimenting with the PPO2 algorithm using DummyVecEnv with the number of environments set to 16. I've noticed that in the learn method, set_global_seeds uses a single seed for all 16 environments.

https://github.com/openai/baselines/blob/5b41c926c7a852df3f0928afdf2429f96a3965cb/baselines/ppo2/ppo2.py#L80

Doesn't that lead to the same actions being sampled for each environment when using one process, and hinder exploration in parallel sampling?
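A minimal NumPy sketch of why a single global seed need not produce identical actions across environments, assuming (hypothetically) that the actions for all 16 environments are drawn in one batched call from a single RNG stream. `sample_batch_actions` and its uniform policy are illustrative stand-ins, not baselines code. Each environment's row consumes its own uniform draw from the shared stream, so actions differ across environments while the whole batch stays reproducible for a given seed.

```python
import numpy as np


def sample_batch_actions(seed, num_envs=16, num_actions=4):
    """Hypothetical batched action sampler: one seeded RNG serves every environment."""
    rng = np.random.RandomState(seed)  # a single stream, like one global seed
    # Illustrative policy: uniform action probabilities for every environment
    probs = np.full((num_envs, num_actions), 1.0 / num_actions)
    # Inverse-CDF categorical sampling: one uniform draw per environment row
    u = rng.uniform(size=(num_envs, 1))
    return (probs.cumsum(axis=1) > u).argmax(axis=1)


actions = sample_batch_actions(seed=0)
print(actions)  # one action per environment, all drawn from a single RNG stream
```

Reseeding with the same value reproduces the same batch, which is what seeding is for; it does not copy one environment's action to the others.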

Thank you so much for this wonderful library as always, it is the most essential repository for deep reinforcement learning research.

scotthuang1989 commented 5 years ago

Action sampling is not done in the environment.

ecada commented 5 years ago

@scotthuang1989 Sorry, I phrased that badly. What I meant was carrying out the sampled action in each environment, where the sampling was done using the same seed that was set in set_global_seeds(seed). In contrast, in PPO1 the seed passed to set_global_seeds(seed) is different for every process.
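For comparison with the per-process seeding described for PPO1, here is a sketch of deriving a distinct seed for each process from one base seed. The `offset * rank` scheme and the `worker_seeds` helper are assumptions for illustration, not code taken from the PPO1 source.

```python
import numpy as np


def worker_seeds(base_seed, num_workers, offset=10000):
    """Hypothetical per-process seeding: shift the base seed by a large offset per rank."""
    return [base_seed + offset * rank for rank in range(num_workers)]


seeds = worker_seeds(base_seed=0, num_workers=4)
# Each worker seeds its own RNG, so the random streams differ between processes
first_draws = [np.random.RandomState(s).uniform() for s in seeds]
print(seeds)  # [0, 10000, 20000, 30000]
print(first_draws)
```

A large offset keeps the per-worker seeds far apart, so seeds for different ranks do not collide for small base seeds.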

scotthuang1989 commented 5 years ago

If you are talking about the stochasticity within an environment, there is a seed function in every Gym environment.
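A minimal sketch of per-environment seeding, assuming a classic Gym-style `seed(seed)` method (newer Gym and Gymnasium releases pass the seed to `reset` instead). `NoisyEnv` and `make_envs` are hypothetical stand-ins, not Gym or baselines classes. Each sub-environment owns its own RNG and receives `base_seed + rank`, so environment stochasticity differs across the 16 copies while staying reproducible.

```python
import numpy as np


class NoisyEnv:
    """Hypothetical toy environment that owns its RNG, mimicking a Gym-style seed()."""

    def __init__(self):
        self.np_random = np.random.RandomState()  # unseeded until seed() is called

    def seed(self, seed=None):
        self.np_random = np.random.RandomState(seed)
        return [seed]

    def reset(self):
        # Initial state drawn from this environment's own RNG
        return self.np_random.uniform(-0.05, 0.05)


def make_envs(base_seed, num_envs):
    """Create num_envs copies, each seeded with a distinct value base_seed + rank."""
    envs = []
    for rank in range(num_envs):
        env = NoisyEnv()
        env.seed(base_seed + rank)
        envs.append(env)
    return envs


envs = make_envs(base_seed=0, num_envs=16)
print([env.reset() for env in envs])  # a different initial state per environment
```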