Question about agent.train

starry-sky6688 / MARL-Algorithms

Implementations of IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, and G2ANet on SMAC, the decentralised micromanagement scenario of StarCraft II


Question about agent.train #42

Closed: Sobbbbbber closed this issue 3 years ago

Sobbbbbber commented 3 years ago

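Hello. In your code, the off-policy algorithms use an RNN as the function approximator. During training, samples are drawn in units of whole episodes. Because the RNN takes a hidden state as input, training iterates over each sampled episode from its first <o, a, r, o'>, so every transition in a sampled episode is used as training data.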
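To make "sampling in units of episodes" concrete, here is a minimal sketch of an episode-level replay buffer. It is not the repository's code: `EpisodeBuffer`, `store_episode`, `episode_limit` and the padding mask are hypothetical names chosen for illustration, observations are scalars, and terminal flags are omitted for brevity. Each episode is padded to a fixed length and stored together with a mask, and `sample` returns whole episodes rather than individual transitions.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

# A transition <o, a, r, o'> with scalar observations (simplifying assumption).
Transition = Tuple[float, int, float, float]


@dataclass
class EpisodeBuffer:
    """Hypothetical replay buffer that stores and samples whole episodes.

    Each stored item is a full episode padded to `episode_limit` steps, plus a
    mask marking real steps (1.0) and padding steps (0.0).
    """
    capacity: int
    episode_limit: int
    episodes: List[List[Transition]] = field(default_factory=list)
    masks: List[List[float]] = field(default_factory=list)

    def store_episode(self, episode: List[Transition]) -> None:
        if len(episode) > self.episode_limit:
            raise ValueError("episode longer than episode_limit")
        pad = self.episode_limit - len(episode)
        # Pad with dummy transitions so every stored episode has the same length.
        padded = episode + [(0.0, 0, 0.0, 0.0)] * pad
        mask = [1.0] * len(episode) + [0.0] * pad
        if len(self.episodes) >= self.capacity:
            # Drop the oldest episode (FIFO) once the buffer is full.
            self.episodes.pop(0)
            self.masks.pop(0)
        self.episodes.append(padded)
        self.masks.append(mask)

    def sample(self, batch_size: int):
        # The sampling unit is an episode, not an individual transition.
        idx = random.sample(range(len(self.episodes)), batch_size)
        return [self.episodes[i] for i in idx], [self.masks[i] for i in idx]
```

Storing whole episodes is what lets the learner later replay each one from step 0, so the RNN hidden state is rebuilt in the same order it was produced during the rollout.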

Is my understanding above correct?

starry-sky6688 commented 3 years ago

Yes. Because of the RNN, each episode has to be processed from its beginning, and every transition is used to compute the loss.
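The following sketch illustrates the answer under simplifying assumptions; it is not the repository's PyTorch implementation. `rnn_step` is a hypothetical one-unit recurrent cell with two actions, each transition carries an extra `terminated` flag, and the online and target "networks" are just weight pairs. Both hidden states start from zero at step 0 and are advanced at every step, and the TD error of every real (unmasked) transition contributes to the loss.

```python
import math
from typing import List, Tuple

# (o, a, r, o_next, terminated): scalar observation, action index, reward,
# next observation, and 1.0 if the episode ended at this step.
Transition = Tuple[float, int, float, float, float]


def rnn_step(obs: float, h: float, w_in: float, w_h: float) -> Tuple[List[float], float]:
    """Toy one-unit recurrent cell (hypothetical): Q-values for 2 actions + new hidden state."""
    h_new = math.tanh(w_in * obs + w_h * h)
    return [h_new, -h_new], h_new


def episode_td_loss(episode: List[Transition], mask: List[float],
                    online: Tuple[float, float] = (0.5, 0.8),
                    target: Tuple[float, float] = (0.5, 0.8),
                    gamma: float = 0.99) -> float:
    """Masked mean squared TD error over every transition of one episode."""
    h = 0.0      # online hidden state, reset at the start of the episode
    h_tgt = 0.0  # target-network hidden state, also reset
    sq_err_sum, n_real = 0.0, 0.0
    for (o, a, r, o_next, done), m in zip(episode, mask):
        # Advance both hidden states every step, so step t depends on steps 0..t-1.
        q, h = rnn_step(o, h, *online)
        q_next, h_tgt = rnn_step(o_next, h_tgt, *target)
        y = r + gamma * (1.0 - done) * max(q_next)
        td = q[a] - y
        sq_err_sum += m * td * td  # padded steps (m == 0) contribute nothing
        n_real += m
    return sq_err_sum / max(n_real, 1.0)
```

Because `h` at step t depends on every earlier observation, the same transition yields a different Q-value (and loss) when replayed in the middle of an episode than when fed in isolation. That is why sampling individual transitions would not reproduce the hidden states seen during the rollout.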