openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Is MPI being used in baselines DDPG? #168

Open hagrid67 opened 6 years ago

hagrid67 commented 6 years ago

I don't see any reference to mpiexec when searching the repo. Is it intended that we run with mpiexec to get a parallel version of DDPG?

e.g. I've tried this: mpiexec -n 4 python -m baselines.ddpg.main --env-id RoboschoolWalker2d-v1 --render (I don't have access to MuJoCo :( )

This runs 4 processes, but I'm not sure it's using episodes / transitions from the rank>0 processes in the rank 0 training... At least I can't see how it would in the code, and the transition / frame rate seems the same as for a single process. The only references to MPI I can see are functions to calculate mean, variance etc.; e.g. I can't see any Send/Recv calls, and there's some Bcast in MpiAdam, but is it taking anything from the other processes?

This raises the question: is MPI really necessary, or of any benefit, here? Is it used in some of the other agents, i.e. apart from DDPG? Could you refer me to another agent that does make real use of MPI?

Perhaps it could be removed if it's not being used; though equally any advocacy for MPI would be informative.
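For what it's worth, here's a quick way to confirm that mpiexec really is launching separate ranks, each with its own process-local state (a minimal mpi4py sketch with a hypothetical file name; mpi4py is assumed to be installed, as the baselines MPI code requires it):

```python
# sanity_check_mpi.py -- run with: mpiexec -n 4 python sanity_check_mpi.py
# Each process prints its own rank; seeing ranks 0..3 confirms that mpiexec
# launches 4 independent Python processes, each with its own memory space.
from mpi4py import MPI

comm = MPI.COMM_WORLD
print("rank %d of %d" % (comm.Get_rank(), comm.Get_size()))
```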

Many thanks!

arpit15 commented 6 years ago

I think, given the way the memory and agents are initialized, whenever a transition is stored it goes to the same location, so MPI is being used in all the relevant parts. However, when I run the code with 4 workers it freezes after a certain number of epochs. I think this happens when, say, worker 1 is still trying to collect a transition while all the other workers are trying to update the weights. I have tried 2 to 8 workers, and the program freezes in every case. Could someone verify the issue, or say whether they are able to train DDPG with more than 1 worker?
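If the hang really is a mismatched collective call, the failure mode can be sketched in isolation (a toy mpi4py example with a hypothetical file name, not the actual baselines code; in baselines the corresponding Allreduce sits inside MpiAdam's gradient averaging):

```python
# hang_demo.py -- run with: mpiexec -n 4 python hang_demo.py
# MPI collectives are blocking: if one rank is busy doing something else while
# the others enter Allreduce, every rank waits for it.
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
grad = np.ones(3, dtype=np.float64)

if comm.Get_rank() == 1:
    time.sleep(30)  # pretend rank 1 is still busy collecting transitions

out = np.empty_like(grad)
comm.Allreduce(grad, out, op=MPI.SUM)  # all other ranks stall here until rank 1 arrives
print("rank %d done" % comm.Get_rank())
```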

xuhuazhe commented 6 years ago

Has anyone reproduced the results in the blog post? How many workers are you using?

kirk86 commented 6 years ago

In my case I can't even use MPI with DDPG. It always throws a TensorFlow resource-exhausted (out of memory) error; for some reason it tries to allocate memory for the same objects multiple times. Has anyone successfully managed to run DDPG with MPI without errors? If so, are you running only on CPU, or on GPU? Even if I explicitly specify that things should run on CPU, I still get an error about the device not being found. Has anyone had the same experience?

ThGravo commented 6 years ago

@kirk86 Did you try tensorflow compiled for CPU?

kirk86 commented 6 years ago

@ThGravo yeah, an update on that: I believe the issue was that a TensorFlow build compiled for GPU throws errors when you try to run things on CPU; installing the CPU build of TF gets around those errors.
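For anyone hitting the same thing before switching builds, pinning a GPU build of TF 1.x (the API baselines targeted at the time) to the CPU can be done like this (a sketch, not baselines-specific code):

```python
# Two ways to keep the process off the GPU with a GPU build of TensorFlow 1.x.
# The environment variable must be set before TensorFlow is imported.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"   # hide all GPUs from this process

import tensorflow as tf

config = tf.ConfigProto(device_count={"GPU": 0})  # additionally refuse GPU placement
sess = tf.Session(config=config)
```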

joellutz commented 6 years ago

@arpit15

whenever transition is stored it goes to the same location

I can't see this happening in the DDPG code. Every MPI worker creates its own Memory, Actor & Critic object. When I run the algorithm with several workers and log the size of the memory each time worker 0 appends something to it, the size increases one by one. However, to my understanding the memory size should increase by the number of workers each time if they were all writing into the same memory. Or am I missing something?

So yeah, as @hagrid67 mentioned, I can't really see MPI being used in the DDPG algorithm except for summary statistics. To me it seems that neither the memory nor the network parameters are shared between the workers. That would be the primary reason for me to use MPI in reinforcement learning: having multiple agents exploring and interacting with the environment, all feeding their experience into one centralized "brain".
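The check described above can be made explicit with a small gather (a sketch with a hypothetical file name, not the baselines code; the list stands in for the per-worker Memory object):

```python
# memory_size_check.py -- run with: mpiexec -n 4 python memory_size_check.py
# Gather each rank's replay-memory size on rank 0. If memory were shared,
# sizes would grow by the number of workers; in practice each worker's own
# memory grows by 1, i.e. the memories are process-local.
from mpi4py import MPI

comm = MPI.COMM_WORLD
local_memory = []                    # stand-in for this rank's DDPG Memory
local_memory.append("transition")    # each worker stores one transition

sizes = comm.gather(len(local_memory), root=0)
if comm.Get_rank() == 0:
    print("memory sizes per rank:", sizes)   # e.g. [1, 1, 1, 1] with 4 workers
```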

ThGravo commented 6 years ago

@hagrid67 @arpit15 DDPG (and others) use baselines.common.mpi_adam. This is where it all comes together.
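As I read it, the pattern in mpi_adam is roughly the following (a sketch, not a verbatim copy of baselines.common.mpi_adam): parameters are broadcast from rank 0 once so every worker starts identical, and each step the flat gradient is averaged over all ranks before the local Adam update is applied.

```python
# Rough sketch of the MpiAdam synchronization pattern using mpi4py + numpy.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def sync_initial_params(flat_params):
    """Broadcast rank 0's flat parameter vector so every worker starts identical."""
    comm.Bcast(flat_params, root=0)

def averaged_grad(local_grad):
    """Average a flat gradient vector over all MPI workers (blocking collective)."""
    global_grad = np.zeros_like(local_grad)
    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
    return global_grad / comm.Get_size()
```

So each worker collects its own rollouts and computes its own gradients, but the gradient step itself is averaged across workers, which is where the parallelism actually enters.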