oxwhirl / smac

SMAC: The StarCraft Multi-Agent Challenge

QMIX code seems unable to learn a good policy #42

Closed: GoingMyWay closed this issue 4 years ago

GoingMyWay commented 4 years ago

I ran the QMIX example code and it seems that it cannot learn a good policy. The reward is close to that of a random policy.

samvelyan commented 4 years ago

What QMIX code are you running?

GoingMyWay commented 4 years ago

What QMIX code are you running?

Thanks for the fast response. I am using this code: https://github.com/oxwhirl/smac/blob/master/smac/examples/rllib/run_qmix.py

samvelyan commented 4 years ago

Have you tried using oxwhirl/pymarl? That's the codebase the results of our paper are based on. The RLlib examples here are provided by our friends from Berkeley who integrated QMIX with their RLlib library. Therefore, there are likely to be differences in results.
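
For reference, the pymarl README launches QMIX on SMAC with a command along the lines of `python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z` (the map name here is just an example; see the pymarl README for the exact options).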

GoingMyWay commented 4 years ago

Have you tried using oxwhirl/pymarl? That's the codebase the results of our paper are based on. The RLlib examples here are provided by our friends from Berkeley who integrated QMIX with their RLlib library. Therefore, there are likely to be differences in results.

Wow, that looks good. I have not tried it yet, but I will. Are there any known issues with QMIX in RLlib? @richardliaw

samvelyan commented 4 years ago

There are some notable differences. For example, the RLlib qmix code doesn't use the global state that smac provides and instead uses only per-agent observations.
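
For anyone comparing the two, here is a minimal sketch of that distinction using the standard smac API (the map name "3m" and the printed labels are just illustrative):

```python
from smac.env import StarCraft2Env
import numpy as np

# SMAC exposes both per-agent observations and a global state.
# QMIX as described in the paper conditions its mixing network on the global
# state; per the comment above, the RLlib example uses only per-agent observations.
env = StarCraft2Env(map_name="3m")
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()

obs = env.get_obs()      # list of n_agents local observation vectors
state = env.get_state()  # single global state vector (centralised information)
print("per-agent obs shapes:", [o.shape for o in obs])
print("global state shape:", state.shape)

# One random (action-masked) step, just to show the step API.
actions = []
for agent_id in range(n_agents):
    avail = env.get_avail_agent_actions(agent_id)
    actions.append(np.random.choice(np.nonzero(avail)[0]))
reward, terminated, info = env.step(actions)

env.close()
```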

GoingMyWay commented 4 years ago

There are some notable differences. For example, the RLlib qmix code doesn't use the global state that smac provides and instead uses only per-agent observations.

Thanks, that's a big difference. I think that is the reason why QMIX from RLlib fails to learn a good policy.

samvelyan commented 4 years ago

No problem. Let us know should you face any issues.