What is the point of having DummyVecEnv if it is running sequentially?

hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

http://stable-baselines.readthedocs.io/

MIT License


What is the point of having DummyVecEnv if it is running sequentially? #1113

Closed: jingxixu closed this issue 3 years ago

jingxixu commented 3 years ago

I have a conceptual question. If DummyVecEnv runs each env sequentially, what is the point of using it? Is DummyVecEnv only meant for debugging SubprocVecEnv?
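Conceptually, a DummyVecEnv holds a list of environments and, on every `step`, loops over them one after another in the calling process, resetting any env whose episode ended and returning the results as a batch. The plain-Python sketch below mimics that behaviour under stated assumptions: `CounterEnv` and `SequentialVecEnv` are hypothetical stand-ins, not the stable-baselines classes (the real ones use Gym spaces and NumPy buffers).

```python
from typing import Callable, List, Tuple


class CounterEnv:
    """Hypothetical toy env: the state grows by the action; the episode ends at `horizon`."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0
        self.state = 0

    def reset(self) -> int:
        self.t = 0
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        self.state += action
        done = self.t >= self.horizon
        return self.state, float(action), done


class SequentialVecEnv:
    """Minimal sketch of the DummyVecEnv idea: n envs stepped one after another."""

    def __init__(self, env_fns: List[Callable[[], CounterEnv]]) -> None:
        # Like DummyVecEnv, take a list of callables that each build one env.
        self.envs = [fn() for fn in env_fns]

    @property
    def num_envs(self) -> int:
        return len(self.envs)

    def reset(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[int]) -> Tuple[List[int], List[float], List[bool]]:
        obs, rewards, dones = [], [], []
        # Plain for-loop in the calling process: no parallelism at all.
        for env, action in zip(self.envs, actions):
            o, r, d = env.step(action)
            if d:
                # Auto-reset a finished env so the batch always has num_envs entries.
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones


# `h=h` binds each horizon at lambda creation time (avoids the late-binding pitfall).
vec_env = SequentialVecEnv([lambda h=h: CounterEnv(horizon=h) for h in (2, 3)])
print(vec_env.reset())       # [0, 0]
print(vec_env.step([1, 1]))  # ([1, 1], [1.0, 1.0], [False, False])
print(vec_env.step([1, 1]))  # ([0, 2], [1.0, 1.0], [True, False])
```

The agent only ever sees the batched lists, so from its point of view it is interacting with a single "environment" that returns `num_envs` transitions per step.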

araffin commented 3 years ago

Hello, you will find the answer in our multiprocessing tutorial (the link is also in the docs): https://github.com/araffin/rl-tutorial-jnrr19#content

jingxixu commented 3 years ago

Hi, thanks for the reply.

My question is not about multiprocessing vs. no multiprocessing. It is about DummyVecEnv vs. just having a single env collect data. If DummyVecEnv is sequential, why not just create a single env and collect data with it? Is it true that having multiple envs, even when they run sequentially, makes training more stable?

araffin commented 3 years ago

> Is it true that having multiple envs, even when they run sequentially, makes training more stable?

In the end, both are synchronous, so it makes no difference to the agent whether you use a DummyVecEnv with 4 envs or a SubprocVecEnv with 4 envs. What may change is the fps (cf. the notebook for a comparison).
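To illustrate why the agent sees no difference, here is a sketch: as long as every env receives the same action and all results are gathered back in env order before the next step, sequential and concurrent stepping produce exactly the same batches. The sketch assumes a thread pool as a stand-in for SubprocVecEnv's worker processes (the real class uses `multiprocessing` with pipes), and `RandomWalkEnv` is a hypothetical toy env.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


class RandomWalkEnv:
    """Hypothetical toy env with its own seeded RNG, so each trajectory is reproducible."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.pos = 0

    def step(self, action: int) -> Tuple[int, float]:
        self.pos += action + self.rng.choice((-1, 1))
        return self.pos, float(-abs(self.pos))


def rollout(n_envs: int, n_steps: int, concurrent: bool) -> List[List[Tuple[int, float]]]:
    envs = [RandomWalkEnv(seed=i) for i in range(n_envs)]
    batches = []
    with ThreadPoolExecutor(max_workers=n_envs) as pool:
        for t in range(n_steps):
            actions = [t % 2] * n_envs  # same action sequence in both modes
            if concurrent:
                # SubprocVecEnv-like: envs step at the same time, but list() waits
                # for every result (synchronous) and map() keeps env order.
                batch = list(pool.map(lambda pair: pair[0].step(pair[1]), zip(envs, actions)))
            else:
                # DummyVecEnv-like: one env after another in this thread.
                batch = [env.step(a) for env, a in zip(envs, actions)]
            batches.append(batch)
    return batches


print(rollout(4, 50, concurrent=False) == rollout(4, 50, concurrent=True))  # True
```

Only wall-clock time can differ between the two modes, which matches the point about fps.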

jingxixu commented 3 years ago

> In the end, both are synchronous, so it makes no difference to the agent whether you use a DummyVecEnv with 4 envs or a SubprocVecEnv with 4 envs. What may change is the fps (cf. the notebook for a comparison).

I am not comparing DummyVecEnv vs. SubprocVecEnv, sorry for the confusion. I am comparing a DummyVecEnv with 4 envs vs. a single env. If a DummyVecEnv with 4 envs runs sequentially, what are its advantages over a single env?

In other words, what is the advantage of having a batch of data (for example, 2000 samples) come from a single trajectory of a single env vs. having it come from multiple envs (10 envs with 200 samples each)?
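For on-policy algorithms such as PPO2 and A2C in stable-baselines, `n_steps` is the number of steps collected per environment, so the batch size is `n_steps * n_envs`. The sketch below only records where each sample of one rollout comes from, to make the two layouts in the question concrete; `rollout_layout` and `describe` are hypothetical helpers, and a "stream" here means the consecutive steps taken by one env (which may span several episodes).

```python
from typing import List, Tuple


def rollout_layout(n_envs: int, n_steps: int) -> List[List[Tuple[int, int]]]:
    """Hypothetical helper: a [n_steps][n_envs] grid of (env_id, time_step) labels."""
    return [[(env_id, t) for env_id in range(n_envs)] for t in range(n_steps)]


def describe(n_envs: int, n_steps: int) -> Tuple[int, int, int]:
    grid = rollout_layout(n_envs, n_steps)
    flat = [sample for row in grid for sample in row]
    batch_size = len(flat)                           # n_steps * n_envs
    n_streams = len({env_id for env_id, _ in flat})  # independent envs in the batch
    steps_per_stream = max(t for _, t in flat) + 1   # consecutive steps from each env
    return batch_size, n_streams, steps_per_stream


print(describe(n_envs=1, n_steps=2000))  # (2000, 1, 2000)
print(describe(n_envs=10, n_steps=200))  # (2000, 10, 200)
```

Both batches hold 2000 samples, but the second one mixes 10 independent streams of 200 consecutive steps instead of one stream of 2000.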

Miffyli commented 3 years ago

1) It makes the network inference part faster by batching the inputs from all envs into a single forward pass when doing rollouts. The overhead of each PyTorch call can be significant.
2) Having decorrelated samples (i.e. from different instances of the environment) helps training, and generally, the larger the number of environments, the more stable the training. I do not have a specific source to share on this right now; OpenAI Spinning Up might have a word or two about it.
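The first point can be made concrete by counting forward passes. Assuming the policy is called once per vectorized step on the whole batch of observations (as when a vectorized env returns one observation per env), collecting the same 2000 samples needs 10 times fewer calls with 10 envs than with 1. `CountingPolicy` and `collect` are hypothetical, and the toy dynamics exist only to drive the loop.

```python
from typing import List


class CountingPolicy:
    """Hypothetical policy that counts forward calls; each call takes a whole batch."""

    def __init__(self) -> None:
        self.calls = 0

    def predict(self, observations: List[float]) -> List[int]:
        # One call = one pass through the network and one round of framework
        # overhead, no matter how many observations are in the batch.
        self.calls += 1
        return [0 if obs < 0 else 1 for obs in observations]


def collect(n_envs: int, n_steps: int) -> int:
    """Collect n_envs * n_steps samples; return how many policy calls were needed."""
    policy = CountingPolicy()
    observations = [0.0] * n_envs
    for _ in range(n_steps):
        actions = policy.predict(observations)  # batched over all envs
        observations = [obs + (1.0 if a else -1.0) for obs, a in zip(observations, actions)]
    return policy.calls


print(collect(n_envs=1, n_steps=2000))  # 2000 calls for 2000 samples
print(collect(n_envs=10, n_steps=200))  # 200 calls for 2000 samples
```

How much wall-clock time this saves depends on how large the fixed per-call overhead is compared with the per-sample cost of the network.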

jingxixu commented 3 years ago

This makes a lot of sense, thanks so much.

araffin commented 3 years ago

That was in the notebook I linked:

> Vectorized Environments are a method for stacking multiple independent environments into a single environment. Instead of training an RL agent on 1 environment per step, it allows us to train it on n environments per step. This provides two benefits:
>
> - Agent experience can be collected more quickly
> - The experience will contain a more diverse range of states, it usually improves exploration