semitable / lb-foraging

Level-based Foraging (LBF): A multi-agent environment for RL
MIT License

Are cooperative tasks hard to train with QMIX? #13

Closed: GoingMyWay closed this issue 2 years ago

GoingMyWay commented 2 years ago

Dear authors, in the cooperative tasks (-coop) it seems hard to train policies that converge with QMIX: the episode rewards stay near zero. I used the default settings provided by PyMARL, and also trained LBF with QMIX in RLlib. In your paper, you trained Foraging-2s-8x8-2p-2f-coop-v2 and Foraging-8x8-2p-2f-coop-v2 with QMIX and the runs converged. It would be great if you could provide some suggestions.
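For context, here is a minimal sketch (not from the thread) of instantiating the coop task directly through gym, following the pattern in the lb-foraging README; it assumes the old gym step/reset API that the v2 env ids use. The -coop- variants are designed so food can only be loaded jointly, which is why a random or poorly coordinated policy tends to see near-zero return, matching the flat curves described above.

```python
import gym
import lbforaging  # noqa: F401 -- importing registers the Foraging-* env ids

env = gym.make("Foraging-8x8-2p-2f-coop-v2")
obs = env.reset()
done = (False, False)  # one termination flag per agent (2p task)
while not all(done):
    # joint action: one discrete action sampled per agent
    actions = env.action_space.sample()
    obs, rewards, done, info = env.step(actions)
env.close()
```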

semitable commented 2 years ago

Dear @GoingMyWay, have you tried our fork of PyMARL (EPyMARL)? It can be found at https://github.com/uoe-agents/epymarl, with instructions on how to run level-based foraging. On the last page of the paper (p. 33, https://arxiv.org/pdf/2006.07869.pdf) you will also find the exact hyperparameters we used to generate the results in the paper.
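For anyone landing here later: the EPyMARL README documents invocations roughly like the one below (config names and flags may differ across versions; the time limit and env key here are illustrative, with the key pointing at the coop task from the question).

```sh
# Roughly the EPyMARL run command from its README (flags may vary by version)
python3 src/main.py --config=qmix --env-config=gymma with \
    env_args.time_limit=50 env_args.key="lbforaging:Foraging-8x8-2p-2f-coop-v2"
```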

GoingMyWay commented 2 years ago

@semitable Dear author, thank you. I will try your code. I had tried RLlib, but training with it seemed difficult.