ecada closed this issue 5 years ago.
Action sampling is not done in the environment.
@scotthuang1989 Sorry, I phrased that badly. What I meant was executing the sampled action in each environment, where the sampling uses the same seed that was set in set_global_seeds(seed). In contrast, in PPO1 the seed passed to set_global_seeds(seed) is different for every process.
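To illustrate the per-process seeding contrasted with PPO1 above, here is a minimal sketch. It assumes each process derives its seed as `seed + 10000 * rank`; the helpers `worker_seed` and `first_draws` are hypothetical, the rank is passed in directly instead of being read from MPI, and a private `random.Random` stands in for the globally seeded RNG.

```python
import random


def worker_seed(base_seed, rank, stride=10000):
    # Hypothetical helper: derive a distinct, reproducible seed per process rank.
    # The stride of 10000 is an assumed constant that keeps ranks from colliding.
    if base_seed is None:
        return None
    return base_seed + stride * rank


def first_draws(seed, n=3):
    # Stand-in for seeding once and then sampling a few actions:
    # a private RNG seeded with `seed`, sampled n times.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


base = 0
seeds = [worker_seed(base, rank) for rank in range(4)]
print(seeds)  # [0, 10000, 20000, 30000]

# Each process gets its own random stream, so their draws differ,
# while rerunning with the same base seed reproduces every stream.
streams = [first_draws(s) for s in seeds]
```

The design point is that the seeds differ across processes, yet the whole run stays reproducible from a single base seed.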
If you are talking about the stochasticity within an environment, there is a seed function in every Gym environment.
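As a sketch of what per-environment seeding looks like, the snippet below gives each of 16 environments its own seed (`base_seed + rank`). `ToyEnv` is a hypothetical stand-in that only mimics the older Gym convention of an `env.seed(seed)` method; it is not the Gym API itself, and the `base_seed + rank` scheme is an assumption for illustration.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a Gym environment with its own RNG.

    It mimics the older Gym convention of an ``env.seed(seed)`` method that
    seeds the environment's internal randomness (e.g. initial states).
    """

    def __init__(self):
        self.rng = random.Random()

    def seed(self, seed=None):
        # Seed only this environment's RNG, independent of any global seed.
        self.rng.seed(seed)
        return [seed]

    def reset(self):
        # Random initial state drawn from the environment's own RNG.
        return self.rng.random()


def make_envs(num_envs, base_seed):
    # Give each environment a distinct seed (base_seed + rank) so their
    # internal randomness differs while remaining reproducible.
    envs = []
    for rank in range(num_envs):
        env = ToyEnv()
        env.seed(base_seed + rank)
        envs.append(env)
    return envs


envs = make_envs(16, base_seed=42)
initial_states = [env.reset() for env in envs]
print(initial_states)
```

Environment stochasticity is therefore controlled separately from action sampling: two environments seeded differently start from different states even when the policy's RNG is shared.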
Hello, I've been experimenting with the PPO2 algorithm using DummyVecEnv with the number of environments set to 16. I've noticed that in the learn method, set_global_seeds uses a single seed for all 16 environments.
https://github.com/openai/baselines/blob/5b41c926c7a852df3f0928afdf2429f96a3965cb/baselines/ppo2/ppo2.py#L80
Doesn't that lead to sampling the same action in every environment when using a single process, and hinder exploration in parallel sampling?
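One way to reason about the single-process case: a single seed fixes the starting point of one random stream, and a batched policy step draws a separate sample for each environment from that stream. The sketch below assumes a uniform categorical policy over 4 actions and uses a private `random.Random` as a stand-in for the globally seeded RNG; `sample_batch_actions` is a hypothetical helper, not part of baselines.

```python
import random


def sample_batch_actions(rng, action_probs, num_envs):
    # One call draws an independent action for each environment from the
    # same categorical distribution, consuming successive values of one RNG.
    return rng.choices(range(len(action_probs)), weights=action_probs, k=num_envs)


SEED = 0
NUM_ENVS = 16
probs = [0.25, 0.25, 0.25, 0.25]  # assumed uniform policy over 4 actions

rng = random.Random(SEED)  # stand-in for the single global seed in one process
actions = sample_batch_actions(rng, probs, NUM_ENVS)
print(actions)

# Re-seeding with the same value reproduces the whole batch.
rng_again = random.Random(SEED)
repeat = sample_batch_actions(rng_again, probs, NUM_ENVS)
```

Under these assumptions the same seed reproduces the batch, but the 16 entries are successive draws rather than 16 copies of one action.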
Thank you so much for this wonderful library as always, it is the most essential repository for deep reinforcement learning research.