Parallel_runner and episode_runner show obvious reward difference for the same test_battle_won_mean

oxwhirl / pymarl

Python Multi-Agent Reinforcement Learning framework

Apache License 2.0

1.89k stars 386 forks source link

Hi, thank you for your amazing contribution! I'm doing some research based on qmix which may include parallel and episode runner at the same training stage. But I got reward around 18 when test_battle_won_mean reached 85% for parallel_runner, while episode_runner only produced reward around 11 for the similar test_battle_won_mean on map MMM2. Can you tell me the crucial difference between the 2 runner that produce different reward? By the way, it seems that parallel_runner's 8*sample number performs worse when trained for 2 million steps on MMM2, could you please shed some light on this? Thanks a lot!

oxwhirl / pymarl

Parallel_runner and episode_runner show obvious reward difference for the same test_battle_won_mean #88