starry-sky6688 / MARL-Algorithms

Implementations of IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, and G2ANet on SMAC, the decentralised micromanagement scenario of StarCraft II

A small question about _get_individual_q in qtran_base.py #108

Closed Johnson221b closed 8 months ago

Johnson221b commented 8 months ago

Hello, in the _get_individual_q function of qtran_base.py you first do the following: if transition_idx == 0: _, self.target_hidden = self.target_rnn(inputs, self.eval_hidden). May I ask why this is done, and what error would occur if it were omitted?
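For readers without the repo open, here is a minimal, self-contained sketch of the pattern being asked about. The RNNAgent class, dimensions, and loop below are a simplified reconstruction for illustration, not the repo's verbatim code:

```python
import torch
import torch.nn as nn

class RNNAgent(nn.Module):
    """Toy stand-in for the repo's RNN agent: returns (q_values, new_hidden)."""
    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim, hidden_dim)
        self.q = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, hidden):
        h = self.rnn(obs, hidden)
        return self.q(h), h

obs_dim, hidden_dim, n_actions, max_episode_len = 8, 16, 4, 5
eval_rnn = RNNAgent(obs_dim, hidden_dim, n_actions)
target_rnn = RNNAgent(obs_dim, hidden_dim, n_actions)

episode_obs = torch.randn(max_episode_len + 1, 1, obs_dim)  # toy episode, batch of 1
eval_hidden = torch.zeros(1, hidden_dim)
target_hidden = torch.zeros(1, hidden_dim)

for transition_idx in range(max_episode_len):
    inputs = episode_obs[transition_idx]           # o_t, fed to the eval network
    inputs_next = episode_obs[transition_idx + 1]  # o_{t+1}, fed to the target network
    if transition_idx == 0:
        # Warm-up: push the very first obs through the target RNN once and keep
        # only its hidden state (the Q output is discarded), mirroring the
        # `_, self.target_hidden = self.target_rnn(inputs, self.eval_hidden)` line.
        _, target_hidden = target_rnn(inputs, eval_hidden)
    q_eval, eval_hidden = eval_rnn(inputs, eval_hidden)
    q_target, target_hidden = target_rnn(inputs_next, target_hidden)
```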

Also, while integrating your code with my own work I hit the following error: Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32, 9420]], which is output 0 of AsStridedBackward0, is at version 40; expected version 39 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True). Do you have any idea what might cause this? Sorry for bothering you, and thank you very much for reading and replying!

starry-sky6688 commented 8 months ago
  1. Because the target_rnn also needs the history of observations as input, and target_hidden is what remembers it. The target in RL is always computed for the next obs, so for the first transition of each episode the first obs has to be fed into target_rnn once so that it is remembered; otherwise the target_rnn's memory would only start from the second obs.

  2. That one you will have to debug yourself. It looks like you modified a tensor in place; the usual fix is to assign the modified result to a new variable instead (see the sketch below).
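To illustrate the second point, here is a toy PyTorch reproduction of the same class of error and the usual out-of-place fix. This is made-up example code, not the asker's actual code, which is not shown in the issue:

```python
import torch

# Broken: exp() saves its output for the backward pass, and the in-place `+=`
# bumps that saved tensor's version counter, so backward() raises
# "one of the variables needed for gradient computation has been modified
#  by an inplace operation ... is at version 1; expected version 0 instead".
x = torch.randn(3, requires_grad=True)
y = torch.exp(x)
y += 1
try:
    y.sum().backward()
except RuntimeError as e:
    print("caught:", e)

# Fixed: leave the saved tensor untouched and write the result to a new variable.
x = torch.randn(3, requires_grad=True)
y = torch.exp(x)
z = y + 1
z.sum().backward()  # works

# As the error hint suggests, torch.autograd.set_detect_anomaly(True) will point
# at the forward op whose saved tensor was later modified in place.
```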

Johnson221b commented 8 months ago

Thank you very much!!!