openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
https://arxiv.org/pdf/1706.02275.pdf
MIT License

Episode in cooperative navigation env #47

Open kargarisaac opened 4 years ago

kargarisaac commented 4 years ago

Hi, thank you for releasing the code. I have some questions about the 'done' condition in the cooperative navigation environment. I don't see any done function for the env; the only terminal condition I can find is the maximum number of time steps per episode.

1. Is that the only situation in which the env is done and we need to reset the world?
2. What happens once the agents cover the landmarks? Do they keep trying to cover the landmarks until the max time step is reached?
3. What is the max number of steps behind the results you reported in Table 2 of the paper for the cooperative navigation env? Do you calculate the number of touches and the mean distance to landmarks over that number of time steps?
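To make the termination behaviour in question concrete, here is a minimal sketch of an episode loop in which the environment never reports done on its own and only a step cap ends the episode. `ToyCooperativeEnv`, `run_episode`, and the `max_episode_len` value of 25 are hypothetical placeholders for illustration. They are not the repository's code and not the setting used for the paper's results.

```python
import random


class ToyCooperativeEnv:
    """Hypothetical stand-in for a multi-agent env with no terminal state of its own."""

    def __init__(self, n_agents=3):
        self.n_agents = n_agents

    def reset(self):
        # One 2-D observation per agent.
        return [[0.0, 0.0] for _ in range(self.n_agents)]

    def step(self, actions):
        obs = [[random.uniform(-1, 1), random.uniform(-1, 1)]
               for _ in range(self.n_agents)]
        # Placeholder reward: negative L1 norm of each observation.
        rewards = [-(abs(o[0]) + abs(o[1])) for o in obs]
        # Assumption under discussion: the env never signals done by itself.
        dones = [False] * self.n_agents
        return obs, rewards, dones, {}


def run_episode(env, max_episode_len=25):
    """Run one episode; return (steps taken, summed reward over all agents)."""
    obs = env.reset()
    steps = 0
    total_reward = 0.0
    while True:
        actions = [[0.0, 0.0] for _ in obs]  # dummy policy
        obs, rewards, dones, _ = env.step(actions)
        steps += 1
        total_reward += sum(rewards)
        # The step cap is the only condition that can fire here.
        terminal = steps >= max_episode_len
        if all(dones) or terminal:
            return steps, total_reward


steps, total_reward = run_episode(ToyCooperativeEnv(), max_episode_len=25)
print(steps)
```

Under this assumption every episode lasts exactly `max_episode_len` steps, so any per-episode statistic (touches, mean distance to landmarks) would be accumulated over that fixed horizon, even after the landmarks are covered.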

Thank you in advance