rafaelmp2 / marl-indep-comm


On Reproducing results of NPS+IQL+COMM #1

Open bbrighttaer opened 3 months ago

bbrighttaer commented 3 months ago

Hi, thank you for your paper and for sharing your project publicly. Using your shared code, I have been able to reproduce the results of PS+IQL, NPS+IQL, and PS+IQL+COMM, but after several attempts I have not yet succeeded with NPS+IQL+COMM. I am trying to reproduce the results of the 128_hidden setting. The command I run is: python no_param_share/main.py --env PredatorPrey --n_steps 2000000 --alg idql --with_comm True --rnn_hidden_dim 128. The highest eval reward I get is mostly -4. What am I missing? All other settings are the project defaults. Thanks once again.

rafaelmp2 commented 1 month ago

Hi, thanks, and sorry for the late response! This might just be down to the random seeds: that scenario is quite challenging by itself, and the structural configuration of that algorithm makes it even more difficult to solve. Increasing the value of the penalty for non-cooperative behaviour in that specific PredatorPrey environment (currently -0.75) will make the task easier. Increasing --rnn_hidden_dim further might also help.
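
To separate seed variance from the effect of the hidden size, one option is to repeat the same command several times per setting. The following is only a minimal sketch: it reuses the exact arguments quoted above and varies nothing but --rnn_hidden_dim. It assumes each launch draws a fresh random seed, which this thread does not confirm, and the value 256, the repeat count, and the helper name build_sweep are illustrative choices rather than anything taken from the project.

```python
import shlex

# Arguments copied from the command quoted in this thread.
BASE_CMD = [
    "python", "no_param_share/main.py",
    "--env", "PredatorPrey",
    "--n_steps", "2000000",
    "--alg", "idql",
    "--with_comm", "True",
]


def build_sweep(hidden_dims, runs_per_setting):
    """Return one command (as an argument list) per (hidden size, repeat) pair.

    Hypothetical helper: it only builds the commands and does not launch them.
    """
    commands = []
    for dim in hidden_dims:
        for _ in range(runs_per_setting):
            commands.append(BASE_CMD + ["--rnn_hidden_dim", str(dim)])
    return commands


if __name__ == "__main__":
    # 128 is the setting discussed above; 256 is an illustrative larger value.
    for cmd in build_sweep(hidden_dims=[128, 256], runs_per_setting=3):
        print(shlex.join(cmd))
```

Each printed line can be run from the repository root as is, or the argument lists can be passed to subprocess.run(cmd, check=True) to launch the runs one after another. Comparing the best eval reward across the repeats of one setting gives a rough idea of how much of the gap comes from seeds alone.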

bbrighttaer commented 1 month ago

Thanks for the feedback and tips. I will try that out.