openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"
https://arxiv.org/pdf/1706.02275.pdf
MIT License

The reward and action are NaN? #11

Closed yexm-ze closed 6 years ago

yexm-ze commented 6 years ago

Hello, when I run your code everything seems to be fine, but when I render the result after 60000 episodes, the agent flashes and quickly disappears. I printed the reward and action: at a certain point, probably after about 1500 episodes, they become NaN. I didn't change anything and would like to know why. Maybe you can give me some advice, thank you!
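When values silently turn into NaN mid-training, it helps to fail fast at the first non-finite reward or action rather than discovering it 60000 episodes later. A minimal debugging sketch (the `check_finite` helper and the variable names are illustrative, not part of the maddpg codebase):

```python
import math

def check_finite(name, values):
    """Raise as soon as any value becomes NaN or inf, so the
    diverging episode can be inspected instead of trained on."""
    for v in values:
        if math.isnan(v) or math.isinf(v):
            raise ValueError(f"{name} became non-finite: {values}")
    return values

# e.g. called inside the training loop after each environment step:
rewards = [0.5, -1.2, 0.0]
check_finite("reward", rewards)
```

Dropping a check like this after the `env.step()` call pinpoints the exact episode where the divergence starts.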

yexm-ze commented 6 years ago

The error was solved by changing all instances of relu to tanh. I don't know why, but it worked. https://github.com/agakshat/maddpg/issues/5
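One plausible reason the swap helps: relu is unbounded, so activations in a deep stack can grow layer over layer until gradients overflow, while tanh clamps every layer's output to [-1, 1]. A small NumPy sketch (random weights, not the actual maddpg network) illustrating the difference:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=64)

def forward(x, activation, depth=10):
    """Pass x through a stack of random linear layers with the
    given nonlinearity; weights are deliberately somewhat large."""
    h = x
    for _ in range(depth):
        W = rng.normal(scale=0.5, size=(64, 64))
        h = activation(W @ h)
    return h

relu_out = forward(x.copy(), lambda z: np.maximum(z, 0.0))
tanh_out = forward(x.copy(), np.tanh)

# relu activations grow multiplicatively with depth;
# tanh activations can never leave [-1, 1].
print(np.abs(relu_out).max())
print(np.abs(tanh_out).max())
```

This doesn't prove relu is the root cause here (gradient clipping or smaller learning rates could also tame the blow-up), but it shows why a bounded activation masks the symptom.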

camigord commented 6 years ago

Hi, I am experiencing the same problem when training in the simple_spread scenario, but I do not feel like changing the activation functions is the way to go... Can we re-open this issue?

Thanks

yexm-ze commented 6 years ago

@camigord, maybe you can refer to this implementation: https://github.com/shariqiqbal2810/maddpg-pytorch. It adds a batch-normalization layer and changes the activation functions depending on the situation.
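Batch normalization addresses the same instability more directly: it rescales each feature to zero mean and unit variance over the batch, so pre-activations can't drift to extreme magnitudes. A simplified NumPy sketch of the idea (no learned scale/shift parameters, unlike the PyTorch `BatchNorm1d` used in that repo):

```python
import numpy as np

def batch_norm(h, eps=1e-5):
    """Normalize each feature over the batch dimension to roughly
    zero mean and unit variance (inference-time affine terms omitted)."""
    mean = h.mean(axis=0)
    var = h.var(axis=0)
    return (h - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
# A batch of 32 badly-scaled hidden activations with 8 features each.
h = rng.normal(loc=50.0, scale=10.0, size=(32, 8))
h_norm = batch_norm(h)

print(h_norm.mean(axis=0))  # each feature close to 0
print(h_norm.std(axis=0))   # each feature close to 1
```

Whether the relu layers feed into a normalization like this, or get replaced with tanh, either way the network stops compounding large activations, which is consistent with both workarounds curing the NaNs.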