pzhokhov opened 6 years ago
So the MPI-parallelized algorithms (e.g. DDPG, TRPO_MPI) are the ones that call update within the algorithm, and the subprocess-parallelized algorithms use VecNormalize? (I guess PPO2's normalization was problematic because it supports both types of parallelization, #695.) Which is the preferred way to unify them?
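A minimal sketch of the two call sites, assuming a simplified stats object and a fake step function that returns only observations. `RunningStats`, `NormalizingWrapper` and `algorithm_side_step` are hypothetical names, not baselines APIs. Both paths share the same statistics and produce the same normalized output. The only difference is who owns the `update()` call, and that is the choice a unified design would have to make.

```python
import numpy as np


class RunningStats:
    """Hypothetical running mean/variance kept as count, sum and sum of squares."""

    def __init__(self, shape):
        self.count = 0
        self.total = np.zeros(shape)
        self.total_sq = np.zeros(shape)

    def update(self, batch):
        # batch has shape (n_envs, *obs_shape); accumulate raw sums.
        batch = np.asarray(batch, dtype=np.float64)
        self.count += batch.shape[0]
        self.total += batch.sum(axis=0)
        self.total_sq += np.square(batch).sum(axis=0)

    @property
    def mean(self):
        return self.total / max(self.count, 1)

    @property
    def var(self):
        # E[x^2] - E[x]^2, clamped at 0 to absorb floating-point error.
        return np.maximum(self.total_sq / max(self.count, 1) - np.square(self.mean), 0.0)

    def normalize(self, obs, clip=5.0):
        return np.clip((obs - self.mean) / np.sqrt(self.var + 1e-8), -clip, clip)


class NormalizingWrapper:
    """Pattern B, VecNormalize-style: the env wrapper calls update()."""

    def __init__(self, step_fn, stats):
        self.step_fn = step_fn
        self.stats = stats

    def step(self, actions):
        obs = self.step_fn(actions)
        self.stats.update(obs)
        return self.stats.normalize(obs)


def algorithm_side_step(step_fn, stats, actions):
    """Pattern A, ddpg/trpo_mpi-style: the training loop calls update() itself."""
    obs = step_fn(actions)
    stats.update(obs)
    return stats.normalize(obs)


# Usage with a fake vectorized step that always returns the same two observations.
fake_obs = np.array([[1.0, 2.0], [3.0, 4.0]])
fake_step = lambda actions: fake_obs
print(NormalizingWrapper(fake_step, RunningStats(2)).step(None))
print(algorithm_side_step(fake_step, RunningStats(2), None))
```

Keeping the stats as plain sums (count, sum, sum of squares) is one possible choice: sums from several processes can be combined by simple addition. The trade-off is that it is less numerically robust than merging means and variances directly when observations have large magnitudes.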
Possibly related: are there plans to support MPI parallelization for other algorithms that currently only support subprocess parallelization (e.g. ACKTR, ACER, A2C)? I was happy to see this added to PPO2, as I didn't realize an algorithm could support both. MPI is crucial when the cost of a step varies widely across environments, or, for example, in variable-length episodic environments where reset is computationally expensive (there, having every subprocess wait for the one that got reset is a killer).
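A back-of-the-envelope model of the straggler cost described above, under simplifying assumptions: per-step costs are known in advance, communication is free, and MPI workers only synchronize once per rollout (at the gradient update). `lockstep_time` and `independent_time` are illustrative helpers, not part of any library.

```python
from typing import List


def lockstep_time(costs: List[List[float]]) -> float:
    """Subprocess vector env: each vectorized step waits for the slowest env."""
    n_steps = len(costs[0])
    return sum(max(worker[t] for worker in costs) for t in range(n_steps))


def independent_time(costs: List[List[float]]) -> float:
    """Independent (MPI-style) workers: each runs its rollout alone and syncs once at the end."""
    return max(sum(worker) for worker in costs)


# costs[i][t] = time worker i spends on step t; a reset costs 10, a normal step 1.
# Each worker hits its expensive reset at a different step.
costs = [
    [10, 1, 1, 1],
    [1, 10, 1, 1],
    [1, 1, 10, 1],
    [1, 1, 1, 10],
]
print(lockstep_time(costs), independent_time(costs))  # 40 13
```

With four workers that each hit one expensive reset at a different step, the lockstep vector env pays the expensive cost at every step (40 units), while independent workers pay it only once each (13 units).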
Some algorithms (e.g. ddpg, trpo_mpi) use a RunningMeanStd object and call update within the algorithm; others rely on the VecNormalize env wrapper for observation normalization. Also, MPI support for VecNormalize needs to be added.
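This is one way MPI support for VecNormalize could work, sketched under some assumptions. Each process computes mean, variance and count over its local batch. The per-process moments are then combined with the standard pairwise merge formula (Chan et al.), and every process applies the same global moments so the normalizers stay identical. `simulated_allreduce` stands in for the MPI collective. In a real run it might be replaced by gathering the tuples with mpi4py's `comm.allgather` and reducing locally, but that wiring is an assumption and is not shown. `MeanStdTracker` and the other names are hypothetical, not baselines code.

```python
from functools import reduce

import numpy as np


def batch_moments(x):
    """Local (mean, population variance, count) for one process's batch."""
    x = np.asarray(x, dtype=np.float64)
    return x.mean(axis=0), x.var(axis=0), x.shape[0]


def combine_moments(a, b):
    """Merge two (mean, var, count) triples with the pairwise formula of Chan et al."""
    mean_a, var_a, n_a = a
    mean_b, var_b, n_b = b
    total = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / total
    m2 = var_a * n_a + var_b * n_b + np.square(delta) * n_a * n_b / total
    return mean, m2 / total, total


def simulated_allreduce(per_process_moments):
    """Hypothetical stand-in for an MPI collective: returns the global moments
    that every process would receive."""
    return reduce(combine_moments, per_process_moments)


class MeanStdTracker:
    """Hypothetical normalizer state; every process applies the same global update."""

    def __init__(self, shape=()):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = 0

    def update_from_moments(self, moments):
        if self.count == 0:
            self.mean, self.var, self.count = moments
        else:
            self.mean, self.var, self.count = combine_moments(
                (self.mean, self.var, self.count), moments)


# Two "processes" with different local batches of a 1-D observation.
local_batches = [np.array([[0.0], [2.0]]), np.array([[4.0], [6.0], [8.0]])]
tracker = MeanStdTracker(shape=(1,))
tracker.update_from_moments(simulated_allreduce([batch_moments(b) for b in local_batches]))
print(tracker.mean, tracker.var, tracker.count)
```

Every process starts from the same state and calls `update_from_moments` with identical global moments, so the per-process normalizers remain in sync without broadcasting the full state. If the processes started with different statistics, this would no longer hold.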