oxwhirl / wqmix

Code for Weighted QMIX

Could you share the training logs for Figure 2? We cannot reproduce the results shown in this figure. #3

Open shuyuandeqipa opened 3 years ago

shuyuandeqipa commented 3 years ago

| Methods | pred_prey_punish [test_return_mean in the log files] |
|---|---|
| OW-QMIX (w=0.1) | 36.8333 |
| OW-QMIX (w=0.5) | 36.5000 |
| CW-QMIX (w=0.1) | 37.5417 (37.6667) |
| CW-QMIX (w=0.5) | 37.5417 (36.9583) |
| QTRAN | 38.0833 |
| QPLEX | 36.1667 |
| QMIX | 33.6250 |
| COMA | 0.0000 |
| VDN | 36.7083 (35.7500) |

| Method | Run | Value |
|---|---|---|
| VDN | run 1 | 37.0833 |
| VDN | run 2 | 36.1250 |
| VDN | run 3 | 36.4167 |
| QMIX | run 1 | 38.0417 |
| QMIX | run 2 | 30.2083 |
| QMIX | run 3 | 36.0000 |
| QPLEX | run 1 | 36.7083 |
| QPLEX | run 2 | 30.3333 |
| QPLEX | run 3 | 24.5417 |
| MADDPG | | 0 |
| MASAC | | 0 |
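For comparison against a plotted curve, the per-run values above can be collapsed into one summary per method. This is a minimal sketch in Python, assuming the mean and sample standard deviation are the statistics of interest. The thread does not say how Figure 2 aggregates its seeds, and the `runs` dictionary and `summarize` helper are illustrative names, not part of the repository.

```python
from statistics import mean, stdev

# Per-run values copied from the results listed above.
runs = {
    "VDN": [37.0833, 36.1250, 36.4167],
    "QMIX": [38.0417, 30.2083, 36.0000],
    "QPLEX": [36.7083, 30.3333, 24.5417],
}


def summarize(values):
    """Return (mean, sample standard deviation) for one method's runs."""
    return mean(values), stdev(values)


for method, values in runs.items():
    m, s = summarize(values)
    print(f"{method:<6} mean={m:.4f} std={s:.4f} (n={len(values)})")
```

With only three runs per method the spread estimate is rough, so it mainly shows how much a single seed can move the result.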

tabzraz commented 3 years ago

Are you annealing epsilon over 50k or over 1mil timesteps? For the results in Figure 2 of the paper, epsilon is annealed over 50k timesteps.

shuyuandeqipa commented 3 years ago

CUDA_VISIBLE_DEVICES=3 nohup python3 -u src/main.py --config=vdn_smac --env-config=pred_prey_punish with epsilon_anneal_time=1000000 use_tensorboard=True > ./wjx_logs_1211/vdn_smac_pred_prey_punish_tensorboard_V2.log 2>&1 &

I am annealing over 1 million timesteps (epsilon_anneal_time=1000000), as in the command above. With this setting we cannot reproduce the VDN, QMIX, and QPLEX results in Figure 2.

shuyuandeqipa commented 3 years ago

Is my parameter setting wrong?

tabzraz commented 3 years ago

Yeah, for Figure 2 in the paper set epsilon_anneal_time=50000 (or remove it altogether since 50k is the default).

It seems that setting it to 1mil helps performance (Appendix K.2 of https://openreview.net/forum?id=Rcmk0xxIQV shows results similar to yours, I think).
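To see why the two settings behave so differently, here is a rough sketch of how the exploration rate would evolve under each. It assumes a linear decay from 1.0 to 0.05; the schedule shape and both endpoint values are assumptions for illustration and are not confirmed in this thread, and `linear_epsilon` is a hypothetical helper rather than a function from the repository.

```python
def linear_epsilon(t, anneal_time, start=1.0, finish=0.05):
    """Linearly decay epsilon from `start` to `finish` over `anneal_time` steps.

    `start` and `finish` are assumed values chosen for illustration.
    """
    frac = min(t / anneal_time, 1.0)  # fraction of the annealing period elapsed
    return start + frac * (finish - start)


for t in (0, 25_000, 50_000, 500_000, 1_000_000):
    fast = linear_epsilon(t, 50_000)      # epsilon_anneal_time=50000
    slow = linear_epsilon(t, 1_000_000)   # epsilon_anneal_time=1000000
    print(f"t={t:>9,}  eps(50k)={fast:.3f}  eps(1mil)={slow:.3f}")
```

Under these assumptions the 50k schedule has finished exploring by the time the 1mil schedule has barely started decaying, so the agents see far more random joint actions early in training with the longer setting.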

shuyuandeqipa commented 3 years ago

Thanks for your help!