starry-sky6688 / MADDPG

PyTorch implementation of the MARL algorithm MADDPG, which corresponds to the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments".

Question about the Actor parameter update #18

Closed: Duke-Allen closed this issue 2 years ago

Duke-Allen commented 2 years ago

I see that when MADDPG updates the actor, it uses:

[Screenshot (WeChat screenshot 20220328214211): actor update code]

whereas the Critic network's computation simply concatenates the states and actions:

[Screenshot (WeChat screenshot 20220328214223): Critic forward pass]

But according to the pseudocode in the paper, it seems to be a multiplication?

[Screenshot (WeChat screenshot 20220328214655): pseudocode from the paper]
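For reference, the actor update in Algorithm 1 of the MADDPG paper (presumably the pseudocode in the screenshot) is the sampled policy gradient below. The "multiplication" is between the gradient of the policy with respect to its parameters and the gradient of the centralized critic with respect to agent i's action, which is exactly what the chain rule gives when differentiating the critic evaluated at the actor's output:

```latex
\nabla_{\theta_i} J \approx \frac{1}{S} \sum_{j} \nabla_{\theta_i} \mu_i(o_i^j)\, \nabla_{a_i} Q_i^{\mu}\left(\mathbf{x}^j, a_1^j, \ldots, a_i, \ldots, a_N^j\right)\Big|_{a_i = \mu_i(o_i^j)}

% chain rule for a single sample j:
\nabla_{\theta_i}\, Q_i^{\mu}\left(\mathbf{x}^j, \ldots, \mu_i(o_i^j), \ldots\right)
  = \nabla_{\theta_i} \mu_i(o_i^j)\, \nabla_{a_i} Q_i^{\mu}\left(\mathbf{x}^j, \ldots, a_i, \ldots\right)\Big|_{a_i = \mu_i(o_i^j)}
```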

I don't quite understand this part yet and hope you can explain it. Thanks.

starry-sky6688 commented 2 years ago

What the paper shows is the gradient obtained after differentiating the loss function.
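A minimal sketch of this point, written in plain Python so it runs without PyTorch. The linear actor, the critic weights `CRITIC_W`, and all function names are hypothetical stand-ins, not the repo's code. It assumes the actor loss has the common form "minimise -Q(o, mu(o))" with the critic taking the concatenation of observation and action. It checks that the gradient of that loss (estimated here with a central finite difference, standing in for autograd's `backward()`) equals the paper's product of the policy gradient and the critic's action gradient:

```python
"""Sketch: the paper's "product" is the chain rule applied to -Q(o, mu(o))."""

# Hypothetical fixed critic weights for the concatenated input [o_1, o_2, a].
CRITIC_W = [0.5, -1.0, 2.0]


def actor(theta, obs):
    """Hypothetical linear deterministic policy mu_theta(o) = theta . o (scalar action)."""
    return sum(t * o for t, o in zip(theta, obs))


def critic(obs, action):
    """Hypothetical critic: concatenate observation and action, then a nonlinear map."""
    x = list(obs) + [action]  # analogous to concatenating state and action tensors
    return sum(w * xi for w, xi in zip(CRITIC_W, x)) - 0.3 * action ** 2


def actor_loss(theta, obs):
    """Actor loss as commonly written in code: minimise -Q(o, mu(o))."""
    return -critic(obs, actor(theta, obs))


def grad_by_chain_rule(theta, obs):
    """Paper-style gradient: grad_theta mu(o) * grad_a Q at a = mu(o), negated for the loss."""
    a = actor(theta, obs)
    dq_da = CRITIC_W[-1] - 0.6 * a  # dQ/da for the critic above
    dmu_dtheta = list(obs)          # d(theta . o)/dtheta = o
    return [-(g * dq_da) for g in dmu_dtheta]


def grad_by_finite_diff(theta, obs, eps=1e-6):
    """Central-difference gradient of the loss, standing in for autograd."""
    grads = []
    for k in range(len(theta)):
        plus = list(theta)
        plus[k] += eps
        minus = list(theta)
        minus[k] -= eps
        grads.append((actor_loss(plus, obs) - actor_loss(minus, obs)) / (2 * eps))
    return grads


if __name__ == "__main__":
    theta = [0.2, -0.4]
    obs = [1.0, 3.0]
    print("chain rule :", grad_by_chain_rule(theta, obs))
    print("finite diff:", grad_by_finite_diff(theta, obs))
```

Both printed gradients agree (about -2.6 and -7.8 here), because differentiating -Q through the concatenated input is the chain rule itself. In PyTorch, calling `backward()` on such a loss performs this step automatically, so the multiplication in the pseudocode never has to be written out by hand.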