starry-sky6688 / MARL-Algorithms

Implementations of IQL, QMIX, VDN, COMA, QTRAN, MAVEN, CommNet, DyMA-CL, and G2ANet on SMAC, the decentralised micromanagement scenario of StarCraft II
1.46k stars · 283 forks

The algorithms do not perform well in hard scenarios? #35

Closed Bingoyww closed 4 years ago

Bingoyww commented 4 years ago

I've already tested some hard scenarios, such as 5m_vs_6m and 3s_vs_5z, but the results don't seem to match Pymarl's. I have also found what look like some issues: for example, in VDN, 'args.target_update_cycle = 200' together with 'train_step % self.args.target_update_cycle == 0'. This should be changed either to 'args.target_update_cycle = 20000' or to 'epoch % self.args.target_update_cycle == 0'.

starry-sky6688 commented 4 years ago

Did you change the default arguments? 'args.target_update_cycle = 200' means that the target net is updated every 200 training steps, not every 200 epochs (in one epoch the network is trained 'args.train_steps' times, and the default 'args.train_steps' is 1).
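
As a hedged illustration of the schedule described above (the names `train_step`, `train_steps`, and `target_update_cycle` follow the thread; the counting helper itself is hypothetical, not the repo's code), updating every 200 training steps coincides with "every 200 epochs" only while `train_steps` stays at its default of 1:

```python
def target_updates(n_epochs, train_steps=1, target_update_cycle=200):
    """Count how many target-net syncs the step-based schedule produces.

    Each epoch runs `train_steps` training steps; the target net syncs
    whenever the global training-step counter hits a multiple of
    `target_update_cycle` (mirroring `train_step % cycle == 0`).
    """
    updates = 0
    train_step = 0
    for _epoch in range(n_epochs):
        for _ in range(train_steps):
            if train_step > 0 and train_step % target_update_cycle == 0:
                updates += 1
            train_step += 1
    return updates
```

With the defaults, 400 epochs produce one sync (at step 200); doubling `train_steps` doubles the training steps per epoch and therefore the sync frequency per epoch.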

I have uploaded the results of 5m_vs_6m and 3s_vs_5z to this repo, and they are better than Pymarl's; you can find them under './result'.

If you are still confused, feel free to ask questions in this issue.

Bingoyww commented 4 years ago

Thanks for your reply. I did not change the default arguments, and I ran "5m_vs_6m with VDN" again after receiving your reply. By the way, the 5m_vs_6m result is missing from the folder './result/vdn'. When I found the 5m_vs_6m result in './result/qmix', I realized that I needed to change 'args.n_epoch = 20000' to '50000'.

'args.target_update_cycle = 200' means the target net is updated every 200 training steps. In Pymarl, however, the target network is updated every 200 episodes ('target_update_interval: 200', set in vdn.yaml). I checked the experimental output and found that the number of target-net updates was about 10,000 in the earlier period and 20,000 in the later period.
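
For contrast, here is a minimal sketch of an episode-based schedule like the one described by Pymarl's `target_update_interval: 200` (the helper and its names are illustrative, not Pymarl's actual code): the sync is triggered by the episode counter, regardless of how many training steps each episode incurs.

```python
def episode_updates(n_episodes, target_update_interval=200):
    """Count target-net syncs under an episode-based schedule:
    sync whenever at least `target_update_interval` episodes have
    elapsed since the last sync."""
    updates = 0
    last_update_episode = 0
    for episode in range(n_episodes):
        if episode - last_update_episode >= target_update_interval:
            updates += 1
            last_update_episode = episode
    return updates
```

Under this schedule, 1000 episodes yield syncs at episodes 200, 400, 600, and 800; the two schedules only agree when exactly one training step is run per episode.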

The other question is about "--reuse_network, type=bool, default=True, help='whether to use one network for all agents'". When it is False, the code still seems to use one network for all agents (I am not sure); the only difference is whether an agent ID is added to the input. I mean, it does not create several different networks for the agents (networks with the same structure but different names). In other words, can this operation in the source code achieve the effect of separate networks?

Thanks again for your reply.

starry-sky6688 commented 4 years ago
  1. You can read the SMAC paper: there, the target net is updated every 200 *training* episodes, not simply every 200 episodes.

  2. '--reuse_network' should be set to True, because only the network-sharing version is implemented; independent networks are not supported.
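
A hedged sketch of the input construction being discussed (names and shapes are illustrative, not the repo's exact code): in both settings a single shared set of weights serves all agents; the flag only controls whether a one-hot agent ID is appended to each agent's observation, which lets the one network condition its behaviour on which agent it is acting for.

```python
import numpy as np

def build_inputs(obs, reuse_network=True):
    """obs: (n_agents, obs_dim) array of per-agent observations.

    If reuse_network is True, append a one-hot agent ID to each row,
    so the single shared network can distinguish agents; otherwise
    feed the raw observations to the same shared network.
    """
    n_agents = obs.shape[0]
    if reuse_network:
        agent_ids = np.eye(n_agents)  # one-hot ID per agent
        return np.concatenate([obs, agent_ids], axis=1)
    return obs
```

So the flag changes the input dimension (obs_dim + n_agents vs. obs_dim), not the number of networks; truly separate per-agent networks would require instantiating n_agents copies of the model, which this code path does not do.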