qian18long / epciclr2020

Cannot recover the group evading behavior in the Adversary battle #7

Open KuoZhong opened 4 years ago

KuoZhong commented 4 years ago

In my experiments on the adversary battle, the agents show far more interest in the food than in evading: they will take a big risk and cut straight through a group of enemies to reach food. This looks like short-sighted behavior and might be caused by a short episode length, so I checked the number of frames in the gif file. The gif you provide has 52 frames, which is inconsistent with the default max-episode-len of 25.
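For reference, the check I ran looks roughly like this (a minimal sketch assuming Pillow is installed; the file name is hypothetical):

```python
from PIL import Image

with Image.open("adversary_battle.gif") as gif:  # hypothetical file name
    n_frames = getattr(gif, "n_frames", 1)  # frame count of the animation

print(n_frames)  # prints 52 for the provided gif
```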

What value of max-episode-len do you use for the adversary battle game, and what is the exact configuration?

KuoZhong commented 4 years ago

Moreover, since the roles in the adversary battle game are symmetric, why do you use different models for each role?

footoredo commented 4 years ago

The max-episode-len for the adversary battle game is 25. The provided scripts use the exact configurations we used in our experiments.

For your second question, are you referring to parameter sharing? We opt not to use parameter sharing because we want our algorithm to apply to more general games; we leave parameter sharing as future work.
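To illustrate the distinction (a minimal PyTorch-style sketch, not our actual code; all names and sizes are made up):

```python
import torch.nn as nn

n_agents, obs_dim, act_dim = 4, 16, 5

def make_policy():
    # a tiny stand-in policy network
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

# no parameter sharing: each agent owns its weights,
# so asymmetric roles are handled naturally
policies = [make_policy() for _ in range(n_agents)]

# parameter sharing: every agent reuses one set of weights,
# which presumes the roles are interchangeable
shared = make_policy()
shared_policies = [shared] * n_agents
```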

KuoZhong commented 4 years ago

As for the second question: in the adversary game, the policy network uses three fully connected layers for the good players but only two for the adversaries. What is the reason for using different architectures for symmetric roles?
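Concretely, the asymmetry I mean looks like this (an illustrative PyTorch sketch with hypothetical layer widths, not your actual code):

```python
import torch.nn as nn

obs_dim, act_dim, hidden = 16, 5, 64  # hypothetical sizes

# good players: three fully connected layers
good_policy = nn.Sequential(
    nn.Linear(obs_dim, hidden), nn.ReLU(),
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, act_dim),
)

# adversaries: only two fully connected layers
adversary_policy = nn.Sequential(
    nn.Linear(obs_dim, hidden), nn.ReLU(),
    nn.Linear(hidden, act_dim),
)
```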

Furthermore, one architecture detail also confuses me: your Q network uses a fairly sophisticated module between the self-attention module and the observation-action encoder (the red dashed box in the attached figure). My guess is that it is something like channel attention. Can you explain the purpose of this module?

[figure: Q-network architecture with the module in question highlighted by a red dashed box]
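For clarity, here is a sketch of what I mean by "something like channel attention" (a squeeze-excitation-style gate, purely my guess written in illustrative PyTorch, not your code):

```python
import torch.nn as nn

class ChannelGate(nn.Module):
    """Squeeze-excitation-style gate over embedding channels (my guess)."""

    def __init__(self, dim, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, x):                # x: (batch, n_entities, dim)
        squeezed = x.mean(dim=1)         # pool over entities -> (batch, dim)
        weights = self.gate(squeezed)    # per-channel weights in (0, 1)
        return x * weights.unsqueeze(1)  # re-weight every embedding channel
```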