hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] Issue with multiple instances for DDPG-MPI from stable-baselines[mpi] #1044

Open UtkarshMishra04 opened 3 years ago

UtkarshMishra04 commented 3 years ago

Hello, I am pretty new to MPI. I am using stable-baselines DDPG for a custom environment. Everything is working fine and I am getting good results as well.

Question: When I use MPI and run the command mpirun -n 4 python train_env.py, it creates 3 instances of DDPG with tensorboard logs DDPG_1, DDPG_2, and DDPG_3. The progress is quite slow as well. Why doesn't it work as a single agent, like PPO2 with multiprocessing? Also, by the time PPO2 simulates 15k steps, DDPG-MPI does 5k for 3 separate agents. I guess they might also be saving the checkpoint weights individually and overwriting each other: in the checkpoint folder, for PPO2 with 20k steps I see weights for the 5k, 10k, and 15k iterations, but for DDPG-MPI I only see weights for 5k iterations.
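For context, my training script is roughly along these lines (simplified; the custom environment is replaced by a placeholder here):

```python
import gym

from stable_baselines import DDPG
from stable_baselines.ddpg.policies import MlpPolicy

env = gym.make("Pendulum-v0")  # placeholder for my custom environment

# Each process launched by mpirun -n 4 runs this same script,
# so each one gets its own model and its own tensorboard run.
model = DDPG(MlpPolicy, env, verbose=1, tensorboard_log="./ddpg_tensorboard/")
model.learn(total_timesteps=20000)
model.save("ddpg_custom_env")
```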

Am I missing something anywhere?

Miffyli commented 3 years ago

DDPG is an off-policy algorithm: it samples data from a replay buffer and updates the agent relatively frequently (every 1 to 100 steps), whereas PPO is designed to be updated only after hundreds of steps. This takes more compute and shows up as slower training. I recommend reading about these algorithms on, e.g., the SpinningUp blog. Also check out stable-baselines3 for MPI-free implementations of the same quality.
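Roughly, the SB2 constructor arguments that control this look like the following (just a sketch; the defaults shown are from memory, so double-check the docs):

```python
import gym
from stable_baselines import DDPG, PPO2

# PPO2 (on-policy): collects n_steps transitions per environment,
# then does one batched update on that fresh rollout.
ppo = PPO2("MlpPolicy", gym.make("Pendulum-v0"), n_steps=128)

# DDPG (off-policy): alternates short rollouts with many gradient steps
# on the replay buffer, so it spends much more compute per environment step.
ddpg = DDPG("MlpPolicy", gym.make("Pendulum-v0"),
            nb_rollout_steps=100, nb_train_steps=50)
```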

UtkarshMishra04 commented 3 years ago

Thanks for the stable-baselines3 info. But my question was not about why it's slow.

DDPG with MPI creates multiple instances that run separately and save weights individually, unlike PPO2 with multiprocessing, which runs a single instance and saves weights cumulatively.

Is there any inline way to run DDPG with MPI, i.e. without explicitly running it with mpirun?

Miffyli commented 3 years ago

Is there any inline way to run DDPG with MPI, i.e. without explicitly running it with mpirun?

Unfortunately no, the only implementation here is the MPI one (i.e. it requires mpirun). The closest alternative is the implementation in stable-baselines3.
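For reference, a minimal stable-baselines3 DDPG setup (no MPI involved) looks roughly like this; the environment and step count are placeholders:

```python
import gym
import numpy as np

from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise

env = gym.make("Pendulum-v0")  # placeholder for your custom environment

# Gaussian exploration noise on the actions (DDPG has no built-in exploration otherwise)
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=10000)
model.save("ddpg_sb3")
```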

araffin commented 3 years ago

Well, you can always use SB2 DDPG without calling mpirun, but then you will have to use only one environment.

And A2C/PPO are meant to be fast, whereas DDPG is meant to be sample efficient (btw, please use TD3, which is the next iteration of DDPG).
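For example, something along these lines (a rough sketch with a placeholder environment and noise settings), launched with plain python rather than mpirun:

```python
import gym
import numpy as np

from stable_baselines import TD3
from stable_baselines.td3.policies import MlpPolicy
from stable_baselines.common.noise import NormalActionNoise

env = gym.make("Pendulum-v0")  # placeholder for your custom environment

# Gaussian exploration noise on the actions
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

model = TD3(MlpPolicy, env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=50000)
model.save("td3_custom_env")
```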

UtkarshMishra04 commented 3 years ago

Thanks for all the answers and suggestions. I will try TD3, but I suppose it comes without MPI. I am in dire need of multiprocessing and cannot use PPO (because of constraints).

But still, why does the DDPG-MPI algorithm in this repo create multiple separate instances with mpirun? Are the nodes communicating with each other for the actor and critic updates? Even if they are communicating for the updates, shouldn't there be a single checkpoint save rather than multiple ones?