Closed: hiroignis closed this issue 4 years ago.
Hi @hiroignis, I also noticed that. I would like to ask something else as well. I have been training 4 adversaries and 1 good agent in the simple_tag scenario, but while the good agent receives rewards like -10.0000, -9.7257, or -17.5386, -18.1665, -19.3938, -20.0000, the adversaries always receive 0 for the same steps. I think the good agent's -10 comes from a collision with one of the adversaries, but at the same environment step no adversary receives +10 for that collision; it is always 0.
Is this normal?
Additionally, are the good agent's rewards that are not the result of a collision (the -10 and -20 values) due to the penalty function for exiting the bounds of the environment?
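If I recall the upstream code correctly (please double-check against simple_tag.py), the out-of-bounds penalty is applied per position coordinate and capped at 10 per dimension, which would explain a -20 floor in a 2-D world. A hedged reconstruction:

```python
import math

def bound(x):
    # Reconstruction of the boundary penalty used by agent_reward() in
    # simple_tag.py (from memory of the upstream repo; verify there).
    # The penalty ramps up past |position| = 0.9 and is capped at 10.
    if x < 0.9:
        return 0
    if x < 1.0:
        return (x - 0.9) * 10
    return min(math.exp(2 * x - 2), 10)

# Applied per coordinate: rew -= bound(abs(p)) for each position component,
# so a far-out-of-bounds agent loses up to 10 per dimension, 20 total in 2-D.
penalty_2d = bound(3.0) + bound(3.0)
```

With that cap, a lone boundary violation in one dimension cannot exceed -10, and -20 corresponds to being far outside the bounds in both dimensions (or a combination of a collision and a boundary penalty).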
Thank you in advance!
Hello,
Thank you, OpenAI, for the amazing contributions; the papers and code help my research a lot. While checking MADDPG on multiagent-particle-envs, I noticed that the rewards for the adversaries in simple_tag.py are always identical. As far as I can tell, this is a bug (unless it is intended) in the reward function adversary_reward() in simple_tag.py, which is reached via env.step() -> step() of MultiAgentEnv in environment.py -> _get_reward() -> reward() -> adversary_reward(). The bug: each adversarial agent's reward also includes the other adversarial agents' collision rewards, so their rewards are always the same, as if they shared a single reward. Please check the code below. The function in question is adversary_reward() in simple_tag.py.
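A runnable sketch of the buggy behavior, paraphrased from the upstream function (the Agent stub and is_collision helper are simplified stand-ins I added so it runs standalone; the real code takes self, agent, world and uses numpy):

```python
import math

class Agent:
    """Minimal stand-in for the environment's Agent class (my stub)."""
    def __init__(self, x, y, size=0.05):
        self.p_pos = (x, y)
        self.size = size
        self.collide = True

def is_collision(a, b):
    # Collision when the centers are closer than the summed radii.
    dx = a.p_pos[0] - b.p_pos[0]
    dy = a.p_pos[1] - b.p_pos[1]
    return math.hypot(dx, dy) < a.size + b.size

def adversary_reward_buggy(agent, agents, adversaries):
    # Paraphrase of the upstream adversary_reward(): the inner loop sums
    # collisions of EVERY adversary, so the `agent` argument never matters
    # and all adversaries receive the same group total.
    rew = 0
    if agent.collide:
        for ag in agents:
            for adv in adversaries:
                if is_collision(ag, adv):
                    rew += 10
    return rew

# Two adversaries: one touching the good agent, one far away.
good = Agent(0.0, 0.0)
adv_near = Agent(0.05, 0.0)   # within the 0.1 collision radius
adv_far = Agent(1.0, 1.0)
advs = [adv_near, adv_far]
r_near = adversary_reward_buggy(adv_near, [good], advs)
r_far = adversary_reward_buggy(adv_far, [good], advs)
```

Both adversaries get the same reward even though only one of them collided, which matches the identical adversary rewards observed during MADDPG training.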
The problem is the last part of that function: the inner loop sums collisions for every adversary, so each call returns the group total. It should instead count only collisions involving the adversary whose reward is being computed (agent).
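A hedged sketch of the fixed version (the Agent stub and is_collision helper are simplified stand-ins I added so it runs standalone; the real function's signature is self, agent, world):

```python
import math

class Agent:
    """Minimal stand-in for the environment's Agent class (my stub)."""
    def __init__(self, x, y, size=0.05):
        self.p_pos = (x, y)
        self.size = size
        self.collide = True

def is_collision(a, b):
    # Collision when the centers are closer than the summed radii.
    dx = a.p_pos[0] - b.p_pos[0]
    dy = a.p_pos[1] - b.p_pos[1]
    return math.hypot(dx, dy) < a.size + b.size

def adversary_reward_fixed(agent, agents, adversaries):
    # Fixed last part: count only collisions involving THIS adversary
    # (`agent`), so each adversary earns its own reward. The
    # `adversaries` list is kept only to mirror the original loop's inputs.
    rew = 0
    if agent.collide:
        for ag in agents:
            if is_collision(ag, agent):
                rew += 10
    return rew

good = Agent(0.0, 0.0)
adv_near = Agent(0.05, 0.0)   # within the 0.1 collision radius
adv_far = Agent(1.0, 1.0)
r_near = adversary_reward_fixed(adv_near, [good], [adv_near, adv_far])
r_far = adversary_reward_fixed(adv_far, [good], [adv_near, adv_far])
```

Now the colliding adversary is rewarded and the distant one is not, so the rewards are individual rather than shared.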
With this fix, each adversary trained with MADDPG on simple_tag.py receives its own individual reward. Thanks.