hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] running multiple environments in parallel without waiting for each other #866

Closed (denyHell closed this issue 4 years ago)

denyHell commented 4 years ago

I noticed that with the vectorized environments, all environments take a step and then wait for every step to finish before the next step can start. The issue with my customized environment is that a step involving a reset() takes much longer than a usual step. In that case, all the other environments have to wait for the one in reset() to finish, which makes collection very inefficient.
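For illustration, here is a minimal sketch of the behaviour (not code from stable-baselines): a toy environment with a deliberately slow reset(), run inside SubprocVecEnv. The environment name, episode lengths, and delay are made up; the point is that every vec_env.step() call stalls whenever any one worker happens to be resetting.

```python
import time

import gym
import numpy as np
from gym import spaces
from stable_baselines.common.vec_env import SubprocVecEnv


class SlowResetEnv(gym.Env):
    """Toy environment whose reset() is much slower than a normal step()."""

    def __init__(self, episode_len=5, reset_delay=2.0):
        super(SlowResetEnv, self).__init__()
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.episode_len = episode_len
        self.reset_delay = reset_delay
        self.t = 0

    def reset(self):
        time.sleep(self.reset_delay)  # the expensive part
        self.t = 0
        return np.zeros(1, dtype=np.float32)

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return np.zeros(1, dtype=np.float32), 0.0, done, {}


if __name__ == "__main__":
    # Stagger episode lengths so the workers finish episodes at different times.
    vec_env = SubprocVecEnv([lambda i=i: SlowResetEnv(episode_len=3 + i) for i in range(4)])
    vec_env.reset()
    for _ in range(10):
        start = time.time()
        obs, rewards, dones, infos = vec_env.step([vec_env.action_space.sample() for _ in range(4)])
        # Whenever any single worker hits done (and therefore resets inside the
        # worker), this call takes ~reset_delay, even though the other workers
        # finished their steps immediately.
        print("step took %.2fs, dones=%s" % (time.time() - start, dones))
    vec_env.close()
```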

I wonder if there is a way to handle an issue like this. Ideally, the agent would be able to interact with each environment independently through alternating calls to model.step() and env.step(), while keeping a counter of the number of steps taken in each individual environment. It would stop collecting experience once the total number of steps across all environments reaches a fixed threshold.
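Something along the lines of the following rough sketch is what I have in mind (this is not existing stable-baselines code; collect_async and step_or_reset are hypothetical names, and model.predict stands in for the internal model.step()). Each environment is driven by its own worker, so a slow reset() only delays that environment, and collection stops once the per-environment counters sum to the threshold. Threads are used only for brevity; a CPU-bound simulator would need a process-based version because of the GIL.

```python
import concurrent.futures


def collect_async(model, envs, total_timesteps):
    """Hypothetical sketch: step each env independently so that a slow reset()
    in one env never blocks the others."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(envs))
    steps_per_env = [0] * len(envs)
    total_steps = 0

    def step_or_reset(env, obs, done):
        # Runs in a worker thread, so a slow reset only stalls this env.
        if done:
            obs = env.reset()
        action, _ = model.predict(obs)
        return env.step(action)

    # Start every environment with an initial reset followed by one step.
    pending = {pool.submit(step_or_reset, env, None, True): i
               for i, env in enumerate(envs)}

    while total_steps < total_timesteps:
        # React to whichever environment finishes first.
        ready, _ = concurrent.futures.wait(
            pending, return_when=concurrent.futures.FIRST_COMPLETED)
        for fut in ready:
            i = pending.pop(fut)
            obs, reward, done, info = fut.result()
            steps_per_env[i] += 1
            total_steps += 1
            # ... record the transition in a rollout buffer here ...
            if total_steps < total_timesteps:
                pending[pool.submit(step_or_reset, envs[i], obs, done)] = i

    pool.shutdown(wait=False)
    return steps_per_env
```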

Miffyli commented 4 years ago

Overall, this topic is covered and discussed in papers such as the original A3C (the extra "A" stands for "Asynchronous") and IMPALA (a follow-up/improvement). These approaches come with their own difficulties, as discussed in the papers, which is why they are not included in stable-baselines.

The suggested idea could improve performance, but it is hard to say by how much (it depends heavily on the environment). It would require significant modifications to the core algorithms, and would be easier to do with PyTorch in stable-baselines3.