marlbenchmark / on-policy

This is the official implementation of Multi-Agent PPO (MAPPO).
https://sites.google.com/view/mappo
MIT License

Question about the episode length of 1000 in the gfootball env despite the env's maximum limit of 400 steps #83

Open DeeDive opened 1 year ago

DeeDive commented 1 year ago

Dear authors,

Thank you for this work! Could you please clarify something that confuses me? I notice that the gfootball env terminates after at most 400 steps, as stated in its paper, yet the gfootball training scripts set an episode length of 1000. Could you explain the motivation for that? (See, e.g., the football script https://github.com/marlbenchmark/on-policy/blob/b21e0f743bd4516086825318452bb6927a33538d/onpolicy/scripts/train_football_scripts/train_football_ca_hard.sh#L14C16-L14C20)

Best!

DeeDive commented 1 year ago

I know that the vec env automatically resets the environment when it encounters the done=True flag (see the sketch after these questions), but I would appreciate it if you could address two questions:

  1. How do we typically choose this episode length value?
  2. Why is it set here to more than twice the maximum allowed episode length?
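For reference, here is a minimal toy sketch (my own code, not from this repo; all class and variable names are hypothetical) of how I understand the rollout collection to work: the buffer gathers `episode_length` steps per update, and the vec env simply resets whenever done=True, so a single rollout can span several environment episodes even when `episode_length` exceeds the env's own step limit.

```python
import numpy as np


class ToyEnv:
    """Stand-in environment that terminates after at most `max_steps` steps."""

    def __init__(self, max_steps=400):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(4)  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.max_steps
        return np.zeros(4), 0.0, done, {}


class AutoResetVecEnv:
    """Mimics the auto-reset behaviour: a done env is reset immediately."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        obs, rews, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, _ = env.step(action)
            if d:
                o = env.reset()  # the rollout continues into a fresh episode
            obs.append(o)
            rews.append(r)
            dones.append(d)
        return np.stack(obs), np.array(rews), np.array(dones)


if __name__ == "__main__":
    episode_length = 1000  # rollout length per update (the value from the script)
    vec_env = AutoResetVecEnv([ToyEnv(max_steps=400) for _ in range(2)])

    obs = vec_env.reset()
    episodes_finished = 0
    for step in range(episode_length):
        actions = [None, None]  # dummy actions for the toy env
        obs, rews, dones = vec_env.step(actions)
        episodes_finished += dones.sum()

    # With episode_length=1000 and a 400-step env, each rollout spans multiple
    # full episodes (prints 4 here: 2 envs x 2 completed episodes each, plus a
    # partial third episode per env that carries into the next rollout).
    print(f"episodes finished inside one rollout: {episodes_finished}")
```

If that understanding is correct, my remaining confusion is only about why 1000 in particular was chosen rather than a multiple of the 400-step limit.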