marlbenchmark / off-policy

PyTorch implementations of popular off-policy multi-agent reinforcement learning algorithms, including QMix, VDN, MADDPG, and MATD3.
MIT License

fix(wzl): fix vdn mixer to avoid Q value dim mismatch with reward #13

Open zerlinwang opened 1 year ago

zerlinwang commented 1 year ago

command

./train_mpe_vdn.sh

problem

```
Traceback (most recent call last):
  File "train/train_mpe.py", line 192, in <module>
    main(sys.argv[1:])
  File "train/train_mpe.py", line 177, in main
    total_num_steps = runner.run()
  File "/home/zerlinwang/Projects/off-policy/offpolicy/runner/rnn/base_runner.py", line 190, in run
    self.train()
  File "/home/zerlinwang/Projects/off-policy/offpolicy/runner/rnn/base_runner.py", line 272, in batch_train_q
    train_info, new_priorities, idxes = self.trainer.train_policy_on_batch(sample)
  File "/home/zerlinwang/Projects/off-policy/offpolicy/algorithms/qmix/qmix.py", line 164, in train_policy_on_batch
    Q_tot_target_seq = rewards + (1 - dones_env_batch) * self.args.gamma * next_step_Q_tot_seq
RuntimeError: The size of tensor a (32) must match the size of tensor b (800) at non-singleton dimension 1
```

reason

`next_step_Q_tot_seq` has the wrong shape: the VDN mixer's `view(-1, 1, 1)` flattens the time and batch dimensions of the summed agent Q values into a single axis, so the mixed `Q_tot` no longer broadcasts against `rewards`, which still carries the batch size (32) at dimension 1 where the flattened tensor has 800.
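The mismatch can be reproduced with a minimal NumPy sketch. The shapes below (25-step chunks, batch of 32, 3 agents) are illustrative assumptions, not values read from the repo, but 25 × 32 = 800 matches the sizes in the traceback:

```python
import numpy as np

# Assumed shapes for illustration: 25-step chunks, batch of 32, 3 agents.
chunk_len, batch_size, n_agents = 25, 32, 3

rewards = np.zeros((chunk_len, batch_size, 1, 1))
agent_q_inps = np.ones((chunk_len, batch_size, n_agents))

# Buggy VDN mixer: view(-1, 1, 1) merges the time and batch axes into one.
q_tot_bad = agent_q_inps.sum(axis=-1).reshape(-1, 1, 1)
print(q_tot_bad.shape)  # (800, 1, 1) -- 800 = 25 * 32

try:
    rewards + q_tot_bad  # 32 vs 800 at a non-singleton dimension
except ValueError as err:
    print("broadcast mismatch:", err)
```

NumPy raises `ValueError` where torch raises `RuntimeError`, but the broadcasting rule being violated is the same.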

solution

```python
# before
return agent_q_inps.sum(dim=-1).view(-1, 1, 1)

# after
batch_size = agent_q_inps.size(1)
return agent_q_inps.sum(dim=-1).view(-1, batch_size, 1, 1)
```
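With the batch axis preserved, the mixed Q values broadcast cleanly against the rewards. A NumPy sketch of the patched reshape, under the same assumed shapes as above (25-step chunks, batch of 32, 3 agents):

```python
import numpy as np

# Assumed shapes for illustration, consistent with the traceback sizes.
chunk_len, batch_size, n_agents = 25, 32, 3
rewards = np.zeros((chunk_len, batch_size, 1, 1))
agent_q_inps = np.ones((chunk_len, batch_size, n_agents))

# Patched mixer: read the batch size from the input and keep that axis.
bs = agent_q_inps.shape[1]
q_tot = agent_q_inps.sum(axis=-1).reshape(-1, bs, 1, 1)
print(q_tot.shape)  # (25, 32, 1, 1)

# The Bellman target from the traceback now broadcasts without error
# (dones omitted, gamma = 0.99 for the sketch).
target = rewards + 0.99 * q_tot
print(target.shape)  # (25, 32, 1, 1)
```

Each entry of `q_tot` is 3.0 (three agents with unit Q values), so every target entry is 0.99 × 3.0 = 2.97.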

result

(image attachment: not preserved)