Question about Actor parameter updates

starry-sky6688 / MADDPG

PyTorch implementation of the MARL algorithm MADDPG, which corresponds to the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments".


Question about Actor parameter updates #18

Closed by Duke-Allen 2 years ago

Duke-Allen commented 2 years ago

I see that when this MADDPG implementation updates the actor, it uses:

while the computation in the Critic network simply concatenates the state and the action:

But according to the pseudocode in the paper, it looks like the two are multiplied?

I don't fully understand this part yet and hope you can explain it. Thanks.

starry-sky6688 commented 2 years ago

What the paper shows is the gradient obtained after differentiating the loss function.
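
To make the reply above concrete, here is a minimal sketch in plain Python (no PyTorch, so the chain rule stays visible). It assumes the actor loss is the negative critic value, as is common in DDPG-style code. The one-parameter `actor`, the tiny `critic`, and all numeric values are hypothetical and not taken from this repository. The critic only combines the state and action as a concatenated input, yet differentiating the loss with respect to the actor parameter produces the product of the two gradients that the paper's pseudocode writes out.

```python
import math

def actor(theta, s):
    # Hypothetical deterministic policy with one parameter: a = tanh(theta * s)
    return math.tanh(theta * s)

def critic(w, s, a):
    # Hypothetical critic: a linear layer on the concatenated input [s, a],
    # followed by tanh and a scalar output weight.
    w_s, w_a, w_out = w
    h = math.tanh(w_s * s + w_a * a)
    return w_out * h

def actor_loss(theta, w, s):
    # Assumed actor loss: minus the critic value of the actor's own action.
    return -critic(w, s, actor(theta, s))

def chain_rule_grad(theta, w, s):
    # The "multiplication" form: dLoss/dtheta = -(dQ/da) * (da/dtheta)
    w_s, w_a, w_out = w
    a = actor(theta, s)
    h = math.tanh(w_s * s + w_a * a)
    dq_da = w_out * (1.0 - h * h) * w_a      # gradient of Q w.r.t. the action
    da_dtheta = (1.0 - a * a) * s            # gradient of the action w.r.t. theta
    return -dq_da * da_dtheta

def numeric_grad(theta, w, s, eps=1e-6):
    # Central finite difference of the loss itself, with no chain rule written out.
    return (actor_loss(theta + eps, w, s) - actor_loss(theta - eps, w, s)) / (2 * eps)

theta, w, s = 0.3, (0.5, -0.8, 1.2), 0.7
analytic = chain_rule_grad(theta, w, s)
numeric = numeric_grad(theta, w, s)
print(analytic, numeric)  # the two values should agree closely
```

In an autograd framework the same product is formed automatically when the loss is backpropagated through the critic into the actor, which is why the code never needs to write the multiplication explicitly.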