openai/baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Unify observation normalization code #698


pzhokhov commented 6 years ago

Some algorithms use a RunningMeanStd object and call update within the algorithm itself (e.g. ddpg, trpo_mpi), while others rely on the VecNormalize env wrapper for observation normalization. Also, MPI support for VecNormalize needs to be added.
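
For concreteness, a minimal sketch of the two normalization paths (module paths as they exist in the codebase; the batch shapes and clipping constants here are illustrative, not pulled from any one algorithm):

```python
import numpy as np
from baselines.common.running_mean_std import RunningMeanStd

obs_dim = 4
obs_batch = np.random.randn(32, obs_dim)

# Path 1: in-algorithm normalization (ddpg / trpo_mpi style). The
# algorithm owns a RunningMeanStd, updates it on each batch, and
# normalizes observations itself.
ob_rms = RunningMeanStd(shape=(obs_dim,))
ob_rms.update(obs_batch)
obs_norm = np.clip(
    (obs_batch - ob_rms.mean) / np.sqrt(ob_rms.var + 1e-8), -10.0, 10.0)

# Path 2: wrapper-level normalization. VecNormalize keeps its own
# RunningMeanStd internally, and step()/reset() return observations
# that are already normalized, so the algorithm never sees raw obs.
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.vec_normalize import VecNormalize

venv = VecNormalize(DummyVecEnv([lambda: gym.make('CartPole-v1')]))
obs = venv.reset()  # normalized by the wrapper
```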

brendenpetersen commented 6 years ago

The MPI-parallelized algorithms (e.g. DDPG, TRPO_MPI) are the ones calling update within the algorithm, and the subprocess-parallelized algorithms use VecNormalize? (I guess PPO2's normalization was problematic because it supports both types of parallelization, #695.) Which is the preferred way to unify?
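
Whichever direction the unification goes, the missing piece seems to be MPI-aware statistics for VecNormalize. A hypothetical sketch of what that could look like (assuming mpi4py; mpi_update is an illustrative name, not baselines API, though update_from_moments is a method on baselines' RunningMeanStd): each rank computes local batch moments, pools them across ranks with allreduce, then applies a single global update.

```python
from mpi4py import MPI
import numpy as np

def mpi_update(rms, local_batch, comm=MPI.COMM_WORLD):
    """Update a RunningMeanStd-like object with stats pooled over all ranks."""
    local_sum = local_batch.sum(axis=0)
    local_sumsq = np.square(local_batch).sum(axis=0)
    local_count = np.array([local_batch.shape[0]], dtype='float64')

    global_sum = np.zeros_like(local_sum)
    global_sumsq = np.zeros_like(local_sumsq)
    global_count = np.zeros_like(local_count)
    comm.Allreduce(local_sum, global_sum, op=MPI.SUM)
    comm.Allreduce(local_sumsq, global_sumsq, op=MPI.SUM)
    comm.Allreduce(local_count, global_count, op=MPI.SUM)

    mean = global_sum / global_count
    var = global_sumsq / global_count - np.square(mean)
    # clamp tiny negative variances from floating-point cancellation
    rms.update_from_moments(mean, np.maximum(var, 0.0), global_count[0])
```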

Possibly related: are there plans to support MPI parallelization for the other algorithms that currently only support subprocess parallelization (e.g. ACKTR, ACER, A2C)? I was happy to see this added to PPO2, as I didn't realize an algorithm could support both. MPI is crucial when environments have large variance in the cost of step, or, for example, variable-length episodic environments where reset is computationally expensive (in that case, each subprocess waiting for the one that got reset is a killer; a sketch of the difference is below).
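
To illustrate the lockstep problem (a toy sketch using mpi4py directly, not any baselines helper; the environment and step count are arbitrary): with subprocess parallelization every env advances together, so one expensive reset stalls the whole batch, whereas MPI ranks each step their own env independently and only synchronize at update time.

```python
from mpi4py import MPI
import gym

comm = MPI.COMM_WORLD
env = gym.make('CartPole-v1')
obs = env.reset()

# Each rank collects experience at its own pace; a slow reset on one
# rank does not block the others (unlike SubprocVecEnv, where step()
# waits for every subprocess before returning).
local_return = 0.0
for _ in range(1000):
    obs, rew, done, _ = env.step(env.action_space.sample())
    local_return += rew
    if done:
        obs = env.reset()  # only this rank pays the reset cost

# Ranks synchronize only here, e.g. to average statistics or gradients.
mean_return = comm.allreduce(local_return, op=MPI.SUM) / comm.Get_size()
if comm.Get_rank() == 0:
    print('mean return across workers:', mean_return)
```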