Open bbrighttaer opened 3 months ago
Hi, thanks, and sorry for the late response! This might simply be down to random seeds: the scenario is quite challenging on its own, and the structural configuration of that algorithm makes it even harder to solve. Increasing the value of the penalty for non-cooperative behaviours in that specific `PredatorPrey` environment (currently `-0.75`) will make the task easier. Increasing `--rnn_hidden_dim` further might also help.
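One way to try the `--rnn_hidden_dim` suggestion is to generate the command reported in this issue for several hidden sizes. The sketch below is only an illustration: it reuses the script path and flags exactly as quoted in the issue, while the helper names `build_command` and `sweep_commands` and the larger sizes 256 and 512 are assumptions, not values recommended by the authors. It only prints the commands and does not touch the penalty, which is not exposed as a flag in the quoted command.

```python
import shlex

# Flags and values copied from the command quoted in this issue.
BASE_ARGS = {
    "--env": "PredatorPrey",
    "--n_steps": "2000000",
    "--alg": "idql",
    "--with_comm": "True",
}


def build_command(rnn_hidden_dim, script="no_param_share/main.py"):
    """Return the argv list for one run with the given hidden size."""
    cmd = ["python", script]
    for flag, value in BASE_ARGS.items():
        cmd += [flag, value]
    cmd += ["--rnn_hidden_dim", str(rnn_hidden_dim)]
    return cmd


def sweep_commands(hidden_dims):
    """Return one shell-ready command string per hidden size."""
    return [shlex.join(build_command(dim)) for dim in hidden_dims]


if __name__ == "__main__":
    # 128 is the setting from the issue; 256 and 512 are assumed examples.
    for line in sweep_commands([128, 256, 512]):
        print(line)
```

Each printed line can be pasted into a shell from the project root. Because seed variance is a suspected cause, running each command more than once before comparing eval rewards may be more informative than a single run per size.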
Thanks for the feedback and tips. I will try that out.
Hi, thank you for your paper and for sharing your project publicly. I have been able to reproduce the results of `PS+IQL`, `NPS+IQL`, and `PS+IQL+COMM` (using your shared code), but after several trials I have not yet succeeded with `NPS+IQL+COMM`. I am hoping to reproduce the results of the `128_hidden` setting. The running arguments are: `python no_param_share/main.py --env PredatorPrey --n_steps 2000000 --alg idql --with_comm True --rnn_hidden_dim 128`. The highest eval reward is mostly `-4`. What am I missing here? All other properties are the defaults that come with the project. Thanks once again.