Closed denyHell closed 4 years ago
Overall, this topic is covered and discussed in papers such as the original A3C (the extra "A" stands for "Asynchronous") and IMPALA (a follow-up/improvement). These approaches come with their own difficulties, as discussed in the papers, which is why they are not included in stable-baselines.
The suggested idea could improve performance, but it is hard to say by how much (it depends heavily on the environment). It would require significant modifications to the core algorithms, and would be easier to do with PyTorch in stable-baselines3.
I noticed that for the vectorized environments, all environments take a step, and the agent waits for all of them to finish before taking the next step. The issue with my customized environment is that a step involving a reset() takes much longer than a usual step. In this case, all the other environments have to wait for the one in reset() to finish, which makes it very inefficient.
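To make the cost concrete, here is a minimal back-of-the-envelope sketch (with hypothetical timings) of why a synchronous vectorized step is bounded by the slowest worker: each joint step takes roughly the maximum of the per-env step times, so a single env stuck in a slow reset() stalls everyone.

```python
# Hypothetical per-env step durations (seconds); env 3 is mid-reset.
step_times = [0.01, 0.01, 0.01, 2.0]

# Synchronous vectorized step: wait for the slowest environment.
lockstep_cost = max(step_times)

# Idealized independent stepping: each env proceeds at its own pace,
# so the amortized cost per transition is the average step time.
independent_cost = sum(step_times) / len(step_times)

print(f"lockstep: {lockstep_cost:.3f}s per joint step")
print(f"independent (amortized): {independent_cost:.4f}s per transition")
```

With these made-up numbers the lockstep scheme pays 2.0 s for four transitions, while independent stepping would amortize to about 0.5 s each.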
I wonder if there is a way to handle an issue like this. Ideally, the agent would be able to interact with each environment independently through alternating calls of model.step() and env.step(), while keeping a counter of the number of steps taken in each individual environment. It would stop collecting experience once the total number of steps across all environments reaches a fixed threshold.
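A minimal single-process sketch of that collection scheme (the `ToyEnv` class, the `collect` helper, and the trivial policy are all hypothetical stand-ins, not stable-baselines API): each environment is stepped independently, a per-env counter is kept, and collection stops once the summed counters hit the threshold.

```python
class ToyEnv:
    """Hypothetical stand-in; real code would use a Gym env."""
    def __init__(self, episode_len):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return 0.0, 1.0, done, {}  # obs, reward, done, info

def collect(envs, policy, total_steps):
    """Step each env independently, tracking a per-env step counter;
    stop once the counters sum to total_steps."""
    counters = [0] * len(envs)
    obs = [env.reset() for env in envs]
    transitions = []
    while sum(counters) < total_steps:
        for i, env in enumerate(envs):
            if sum(counters) >= total_steps:
                break
            action = policy(obs[i])  # stands in for model.step()
            next_obs, reward, done, info = env.step(action)
            transitions.append((i, obs[i], action, reward, done))
            counters[i] += 1
            # Only this env pays for its own reset; the others keep going.
            obs[i] = env.reset() if done else next_obs
    return transitions, counters

envs = [ToyEnv(episode_len=3), ToyEnv(episode_len=5)]
transitions, counters = collect(envs, policy=lambda o: 0, total_steps=8)
print(counters, sum(counters))  # counters sum to exactly 8
```

Note this sketch is still sequential within one process; the actual speedup would require running each environment in its own worker process or thread (A3C/IMPALA-style) so that a slow reset() blocks only its own worker.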