
openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"

https://arxiv.org/pdf/1706.02275.pdf

MIT License


The sample function in distribution is an implementation of Gumbel-softmax; I added it to my code, and now it helps speed up and stabilize training, but my speaker still cannot tell the different landmarks apart. #48

Closed: tanxiangtj closed this issue 4 years ago.

tanxiangtj commented 4 years ago

The sample function in distribution is an implementation of Gumbel-softmax; I added it to my code, and now it helps speed up and stabilize training, but my speaker still cannot tell the different landmarks apart.

How do you handle the action exploration then?

Originally posted by @djbitbyte in https://github.com/openai/maddpg/issues/9#issuecomment-373083611
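The thread mentions the sample function without reproducing it. Below is a minimal pure-Python sketch of Gumbel-softmax sampling, assuming the standard formulation softmax((logits + g) / temperature) with Gumbel noise g = -log(-log(u)) and u ~ Uniform(0, 1). The names `softmax`, `sample_gumbel`, `gumbel_softmax_sample`, the `temperature` parameter and the example logits are hypothetical and chosen for illustration; they are not the repository's API. The sketch also bears on the exploration question: because fresh noise is drawn on every call, sampled actions vary, and by the Gumbel-max trick the argmax of the perturbed logits is distributed as softmax(logits).

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample as -log(-log(u)) with u ~ Uniform(0, 1)."""
    # Clamp u away from 0 and 1 so neither logarithm is evaluated at 0.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    # Hypothetical helper: one independent noise draw per logit.
    noisy = [(l + sample_gumbel(rng)) / temperature for l in logits]
    return softmax(noisy)


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 0.5, -1.0]  # hypothetical logits for three discrete actions

    # Fresh noise on every call means repeated calls give different outputs;
    # a lower temperature pushes each sample closer to one-hot.
    for t in (1.0, 0.1):
        print(t, [round(p, 3) for p in gumbel_softmax_sample(logits, t, rng)])

    # Gumbel-max check: the argmax of the noisy sample follows softmax(logits).
    n = 20000
    counts = [0] * len(logits)
    for _ in range(n):
        sample = gumbel_softmax_sample(logits, 1.0, rng)
        counts[sample.index(max(sample))] += 1
    print([round(c / n, 3) for c in counts], [round(p, 3) for p in softmax(logits)])
```

The empirical argmax frequencies should land close to softmax(logits). Whether this built-in noise alone gives enough exploration for the speaker to learn the landmarks is the open question the thread asks.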