starry-sky6688 / MARL-Algorithms

Implementations of IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, and G2ANet on SMAC, the decentralised micromanagement scenario of StarCraft II

A small question about _get_individual_q in qtran_base.py #108

Closed Johnson221b closed 8 months ago

Johnson221b commented 8 months ago

Hello, in the _get_individual_q function of qtran_base.py you first do the following: if transition_idx == 0: _, self.target_hidden = self.target_rnn(inputs, self.eval_hidden). May I ask why this is done, and what error would occur if it were omitted?
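For readers without the repo open, here is a minimal, self-contained sketch of the pattern being asked about. The RNNAgent class, dimensions, and loop below are a simplified reconstruction for illustration, not the repo's verbatim code:

```python
import torch
import torch.nn as nn

class RNNAgent(nn.Module):
    """Toy stand-in for the repo's RNN agent: returns (q_values, new_hidden)."""
    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim, hidden_dim)
        self.q = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, hidden):
        h = self.rnn(obs, hidden)
        return self.q(h), h

obs_dim, hidden_dim, n_actions, max_episode_len = 8, 16, 4, 5
eval_rnn = RNNAgent(obs_dim, hidden_dim, n_actions)
target_rnn = RNNAgent(obs_dim, hidden_dim, n_actions)

episode_obs = torch.randn(max_episode_len + 1, 1, obs_dim)  # toy episode, batch of 1
eval_hidden = torch.zeros(1, hidden_dim)
target_hidden = torch.zeros(1, hidden_dim)

for transition_idx in range(max_episode_len):
    inputs = episode_obs[transition_idx]           # o_t, fed to the eval network
    inputs_next = episode_obs[transition_idx + 1]  # o_{t+1}, fed to the target network
    if transition_idx == 0:
        # Warm-up: push the very first obs through the target RNN once and keep
        # only its hidden state (the Q output is discarded), mirroring the
        # `_, self.target_hidden = self.target_rnn(inputs, self.eval_hidden)` line.
        _, target_hidden = target_rnn(inputs, eval_hidden)
    q_eval, eval_hidden = eval_rnn(inputs, eval_hidden)
    q_target, target_hidden = target_rnn(inputs_next, target_hidden)
```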

Also, while integrating your code with my own work I hit the following error: Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 9420]], which is output 0 of AsStridedBackward0, is at version 40; expected version 39 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True). Do you have any idea what might cause this? Sorry for bothering you, and thank you very much for reading and replying!

starry-sky6688 commented 8 months ago
  1. Because the target_rnn also needs the history of observations as input, and target_hidden is what remembers it. The target in RL is always computed for the next obs, so for the first transition of each episode the first obs has to be fed into target_rnn once so that it is remembered; otherwise the target_rnn's memory would only start from the second obs.

  2. That one you will have to debug yourself. It looks like you modified a tensor in place; the usual fix is to assign the modified result to a new variable instead (see the sketch below).
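To illustrate the second point, here is a toy PyTorch reproduction of the same class of error and the usual out-of-place fix. This is made-up example code, not the asker's actual code, which is not shown in the issue:

```python
import torch

# Broken: exp() saves its output for the backward pass, and the in-place `+=`
# bumps that saved tensor's version counter, so backward() raises
# "one of the variables needed for gradient computation has been modified
#  by an inplace operation ... is at version 1; expected version 0 instead".
x = torch.randn(3, requires_grad=True)
y = torch.exp(x)
y += 1
try:
    y.sum().backward()
except RuntimeError as e:
    print("caught:", e)

# Fixed: leave the saved tensor untouched and write the result to a new variable.
x = torch.randn(3, requires_grad=True)
y = torch.exp(x)
z = y + 1
z.sum().backward()  # works

# As the error hint suggests, torch.autograd.set_detect_anomaly(True) will point
# at the forward op whose saved tensor was later modified in place.
```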

Johnson221b commented 8 months ago

Thank you very much!!!