mzho7212 / LICA

[NeurIPS 2020] PyTorch implementation of "Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning"
https://arxiv.org/abs/2007.02529
MIT License

Gumbel-softmax implementation error #1

Closed by hijkzzz 4 years ago

hijkzzz commented 4 years ago

The 'logits' argument refers to log-probabilities, but LICA uses the raw output of the DNN as 'logits'. See https://github.com/shaabhishek/gumbel-softmax-pytorch/blob/master/Categorical%20VAE.ipynb

kenziyuliu commented 4 years ago

Hi there,

Thanks a lot for your comment! For this implementation, we mostly just followed existing open-source implementations for consistency, such as [1]-[5] below.

In practice, with NN weights properly initialized, there is unlikely to be any material difference, though to be precise we should apply log_softmax to the network outputs, as you suggested.

[1] https://github.com/ericjang/gumbel-softmax/blob/master/Categorical%20VAE.ipynb, block 4
[2] https://github.com/ericjang/gumbel-softmax/blob/master/gumbel_softmax_vae_v2.ipynb, block 5
[3] https://github.com/openai/maddpg/blob/master/maddpg/trainer/maddpg.py#L45
[4] https://github.com/shariqiqbal2810/maddpg-pytorch/blob/master/algorithms/maddpg.py#L143
[5] https://github.com/hsvgbkhgbv/SQDDPG/blob/master/models/maddpg.py#L104
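
For anyone comparing the two variants discussed above, here is a minimal, framework-free sketch. It is not the LICA code: it assumes the standard relaxation softmax((logits + g) / tau) with Gumbel(0, 1) noise g, and the helper names and the example network outputs are hypothetical. Because log_softmax only subtracts a per-row constant (the log-sum-exp) and softmax is invariant to such a shift, feeding raw outputs or log_softmax outputs should give the same relaxed sample for the same noise, up to floating-point error.

```python
import math
import random


def log_softmax(xs):
    """Numerically stable log-softmax: x_i - logsumexp(x)."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(n, rng, eps=1e-20):
    """Draw n samples from Gumbel(0, 1) via -log(-log(U)), U ~ Uniform(0, 1)."""
    return [-math.log(-math.log(rng.random() + eps) + eps) for _ in range(n)]


def gumbel_softmax(logits, noise, tau=1.0):
    """Relaxed one-hot sample: softmax((logits + noise) / tau)."""
    return softmax([(l + g) / tau for l, g in zip(logits, noise)])


rng = random.Random(0)
raw_outputs = [2.0, -1.0, 0.5, 3.2]  # hypothetical unnormalized network outputs
noise = sample_gumbel(len(raw_outputs), rng)  # shared noise for a fair comparison

# Variant A: raw network outputs used directly as "logits".
y_raw = gumbel_softmax(raw_outputs, noise, tau=0.5)
# Variant B: log-probabilities obtained with log_softmax first.
y_log = gumbel_softmax(log_softmax(raw_outputs), noise, tau=0.5)

max_diff = max(abs(a - b) for a, b in zip(y_raw, y_log))
print("max abs difference:", max_diff)
```

The equivalence shown here relies on the final softmax normalization. A variant that uses logits + g in some other way, without renormalizing, would not necessarily be shift-invariant.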