hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] Using PPO2 on multiple cluster nodes (MPI) #1054

Closed. piotti closed this issue 3 years ago.

piotti commented 3 years ago

I'm looking to scale training of my PPO2 algorithms to multiple nodes of an HPC cluster. Currently, I'm constrained to the CPUs of a single node.

It looks like MPI would allow me to distribute the training to multiple nodes, but the documentation says PPO2 doesn't support MPI.

However, this thread makes it sound like OpenAI Baselines PPO2 now supports MPI:

> All that being said - ppo1 is now obsolete, and its functionality (including mpi) is fully covered by ppo2.

My questions are:

  1. Is there any plan to have Stable Baselines PPO2 support MPI?
  2. If not, is there an alternative for training PPO2 across multiple nodes?
Miffyli commented 3 years ago

TL;DR: No plans to support MPI, and no easy way to run across multiple nodes.

1) There are no plans to support MPI in new algorithms; in fact, MPI support has been completely dropped (at least for now) in the next iteration of stable-baselines. It is not a high priority due to the added complexity and the relatively small number of use cases.

2) The closest available solution is `SubprocVecEnv`, which parallelizes environment sample gathering across processes, but only on a single node. A VecEnv could be written for multi-node runs where environments live on different nodes, but each environment step would have to be very slow to get any benefit from it. A minimal single-node sketch follows below.

Considering the work required to get this working and the likely small gains, I recommend using the extra nodes to run parallel copies of the same experiment with different random seeds instead, as averaging over seeds is a crucial part of reliable RL results.
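
As a minimal sketch of the single-node setup (assuming a `CartPole-v1` Gym environment and a hypothetical convention of passing a per-node seed on the command line, so each cluster node can run its own independent copy):

```python
import sys

import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv


def make_env(env_id, rank, seed):
    """Return a thunk that builds one environment with its own seed."""
    def _init():
        env = gym.make(env_id)
        env.seed(seed + rank)
        return env
    return _init


if __name__ == '__main__':
    # Hypothetical convention: `python train.py 0` on node 0,
    # `python train.py 1` on node 1, etc.
    seed = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    n_envs = 8  # number of environment worker processes on this node

    # SubprocVecEnv runs each environment in its own process (single node only)
    env = SubprocVecEnv([make_env('CartPole-v1', i, seed) for i in range(n_envs)])

    model = PPO2('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=100000)
    model.save('ppo2_seed_{}'.format(seed))
```

Each node then produces an independent training run, and the results can be aggregated across seeds afterwards.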

araffin commented 3 years ago

I would also add that you can always use PPO1 if you want to use mpi ;)
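
For reference, a minimal sketch of that route (assuming a `CartPole-v1` Gym environment): PPO1 handles gradient averaging over MPI internally, so the same script is simply launched once per worker, e.g. `mpirun -np 16 python train_ppo1.py` (the exact launcher and hostfile setup depend on your cluster).

```python
import gym
from stable_baselines import PPO1

# Each MPI worker builds its own environment and model instance;
# PPO1 averages gradients across all workers via MPI internally.
env = gym.make('CartPole-v1')

model = PPO1('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)
model.save('ppo1_cartpole')
```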

piotti commented 3 years ago

Understood, thanks!