openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
https://arxiv.org/pdf/1706.02275.pdf
MIT License
1.66k stars, 494 forks

Cannot reproduce experiment results #12

Closed by arbaazkhan2 6 years ago

arbaazkhan2 commented 6 years ago

Is this code vastly different from the code used to generate the results in the paper? I cannot reproduce any of the results for the simple_spread, simple_reference, or simple_tag experiments, even after running for over 2 million iterations. The policy doesn't even look like it's getting better. Any tips on this? Has anybody else gotten it to work?

Further, I don't see the policy ensemble part or the part that estimates other agents' policies (Sections 4.2 and 4.3 in the paper) in the code. Am I missing something?

JohannesAck commented 6 years ago

Further, I don't see the policy ensemble part or the part that estimates other agents' policies (Sections 4.2 and 4.3 in the paper) in the code. Am I missing something?

This was answered in #8; it isn't in this repo.

ryan-lowe commented 6 years ago

Hi! There was a bug in the code that prevented rewards from being shared in collaborative environments. This should be fixed now! Note that the results will differ from those in the paper because we have refactored the code since publication, but the models should still train.

As for the policy ensembles and estimating other agents' policies, that code was written by Yi Wu. Please contact him if you'd like it to be open-sourced.
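
For readers unsure what "sharing of reward" refers to, here is a minimal sketch of the idea, not the repository's actual fix. It assumes that in a collaborative scenario every agent should be trained on the team total rather than on its own individual reward. The function name `share_rewards` and the `shared_reward` flag are hypothetical names chosen for illustration.

```python
def share_rewards(reward_n, shared_reward=True):
    """Return one reward per agent.

    Assumption for this sketch: when shared_reward is True, each agent
    receives the sum of all agents' rewards; otherwise rewards are left
    as they are.
    """
    reward_n = [float(r) for r in reward_n]
    if shared_reward:
        team_total = sum(reward_n)
        # Every agent sees the same team-level signal.
        return [team_total] * len(reward_n)
    return reward_n


# Example usage with three cooperating agents.
print(share_rewards([1.0, -0.5, 2.0]))
print(share_rewards([1.0, -0.5, 2.0], shared_reward=False))
```

If a step like this is skipped in a cooperative task, each agent optimizes only its own term, which is one plausible reason a cooperative policy might fail to improve.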

jxwuyi commented 5 years ago

For policy ensemble and approximation, I have put the code online for easy access: https://www.dropbox.com/s/jlc6dtxo580lpl2/maddpg_ensemble_and_approx_code.zip?dl=0
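
As a rough illustration of the policy ensemble idea discussed in this thread, the sketch below keeps several sub-policies for one agent and picks one uniformly at random at the start of each episode. This is an assumption-laden sketch based on the paper's description, not the code in the linked archive. The class `PolicyEnsemble` and its methods are hypothetical, and the sub-policies are plain callables standing in for trained networks.

```python
import random


class PolicyEnsemble:
    """Holds K sub-policies for one agent (hypothetical helper)."""

    def __init__(self, sub_policies, seed=None):
        if not sub_policies:
            raise ValueError("need at least one sub-policy")
        self.sub_policies = list(sub_policies)
        self.rng = random.Random(seed)
        self.active = 0

    def start_episode(self):
        # Pick one sub-policy uniformly at random for the whole episode.
        self.active = self.rng.randrange(len(self.sub_policies))
        return self.active

    def act(self, obs):
        # Delegate to whichever sub-policy is active this episode.
        return self.sub_policies[self.active](obs)


# Example usage: three dummy sub-policies that return a constant action.
ensemble = PolicyEnsemble([lambda o: 0, lambda o: 1, lambda o: 2], seed=0)
for episode in range(3):
    k = ensemble.start_episode()
    print(episode, k, ensemble.act(obs=None))
```

Training the sub-policies and the learning updates themselves are omitted here; the sketch only shows the per-episode selection step.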