Hi,
Thank you for releasing the code. I have some questions about the 'done' condition in the cooperative navigation environment. I don't see a done function for the env; the only terminal condition I can find is the maximum number of time steps per episode.
1- Is reaching the max time step the only situation in which the env is done and we need to reset the world?
2- What happens once the agents cover the landmarks? Do they keep trying to cover them until the max time step is reached?
3- What is the max number of steps for the results reported in Table 2 of the paper for the cooperative navigation env? Are the number of touches and the mean distance to landmarks calculated over that number of time steps?
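To make question 1 concrete, here is a minimal sketch of the episode loop as I understand it. Everything in it is my own assumption for illustration: `ToyEnv`, `run_episode`, and the value of `MAX_EPISODE_LEN` are hypothetical stand-ins, not the repository's actual code or the setting used in the paper.

```python
import random

MAX_EPISODE_LEN = 25  # hypothetical value, not taken from the paper


class ToyEnv:
    """Hypothetical stand-in for the cooperative navigation env (not the real API)."""

    def __init__(self, n_agents=3):
        self.n_agents = n_agents

    def reset(self):
        # Return one dummy observation per agent.
        return [0.0] * self.n_agents

    def step(self, actions):
        obs = [random.random() for _ in range(self.n_agents)]
        rewards = [0.0] * self.n_agents
        # Assumption: the env never signals done on its own.
        dones = [False] * self.n_agents
        return obs, rewards, dones, {}


def run_episode(env, max_episode_len=MAX_EPISODE_LEN):
    """Run one episode; return (steps taken, whether the step limit ended it)."""
    env.reset()
    steps = 0
    while True:
        actions = [0] * env.n_agents  # placeholder actions
        _, _, dones, _ = env.step(actions)
        steps += 1
        terminal = steps >= max_episode_len
        if all(dones) or terminal:
            return steps, terminal


steps, hit_limit = run_episode(ToyEnv())
```

If this matches the intended behaviour, then in the loop above the episode only ever ends through the step limit, even after the landmarks are covered.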
Thank you in advance