> Further, I don't see the ensemble policy part or the part estimating other agents' policies (Sections 4.2 and 4.3 in the paper) in the code. Am I missing something?

This was answered in #8; it isn't in this repo.
Hi! There was a bug in the code that prevented the sharing of reward in collaborative environments. This should be fixed now! Note that the results will differ from the paper because the code has been refactored since publication, but the models should still train.
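For anyone hitting the same issue, here is a minimal sketch of what reward sharing in a collaborative environment typically means: every agent receives the team reward (here, the sum over all agents) instead of its own. The `share_rewards` helper is hypothetical for illustration, not the repo's actual code.

```python
import numpy as np

def share_rewards(rewards, collaborative=True):
    """In a collaborative scenario, replace each agent's individual
    reward with the team reward (the sum over all agents)."""
    if collaborative:
        team_reward = float(np.sum(rewards))
        return [team_reward] * len(rewards)
    return rewards

# Example: three cooperating agents with individual rewards
print(share_rewards([1.0, -0.5, 2.0]))  # -> [2.5, 2.5, 2.5]
```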
For the ensemble policies / estimating other agents' policies, that code was written by Yi Wu. Please contact him if you'd like it to be open-sourced.
For policy ensembles and policy approximation, I have put the code online for easy access: https://www.dropbox.com/s/jlc6dtxo580lpl2/maddpg_ensemble_and_approx_code.zip?dl=0
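For anyone who wants the gist of the policy-approximation idea (Section 4.3 of the paper) before digging into the linked code, here is a minimal sketch, not Yi Wu's actual implementation: fit a small network to another agent's observed actions by maximizing their log-likelihood plus an entropy regularizer. All names, network sizes, and hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ApproxPolicy(nn.Module):
    """Approximation of another agent's (discrete) policy: a small MLP
    mapping that agent's observation to action logits."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def approx_policy_loss(model, obs, actions, entropy_coef=1e-3):
    """Maximize the log-likelihood of the other agent's observed actions
    plus an entropy bonus, i.e. minimize the negative of both."""
    dist = torch.distributions.Categorical(logits=model(obs))
    log_prob = dist.log_prob(actions)
    return -(log_prob + entropy_coef * dist.entropy()).mean()

# Example update from a batch of (observation, action) pairs
model = ApproxPolicy(obs_dim=10, act_dim=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs = torch.randn(32, 10)
actions = torch.randint(0, 5, (32,))
loss = approx_policy_loss(model, obs, actions)
opt.zero_grad()
loss.backward()
opt.step()
```

At execution time, the learned `ApproxPolicy` stands in for the other agent's true policy when computing the centralized critic's target, which is what lets MADDPG avoid assuming access to other agents' actual policies.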
Is this code vastly different from the code used to generate the results in the paper? I cannot reproduce any of the results for the simple_spread, simple_reference, or simple_tag experiments, even after running for over 2 million iterations. The policy doesn't even look like it's getting better. Any tips on this? Has anyone else gotten it to work?
Further, I don't see the ensemble policy part or the part estimating other agents' policies (Sections 4.2 and 4.3 in the paper) in the code. Am I missing something?