Hi, thank you for your amazing contribution! I'm doing some research based on QMIX that may use both parallel_runner and episode_runner at the same training stage. With parallel_runner I get a reward of around 18 when test_battle_won_mean reaches 85%, while episode_runner only produces a reward of around 11 for a similar test_battle_won_mean on map MMM2. Can you tell me the crucial difference between the two runners that produces the different reward? By the way, parallel_runner, despite its 8x sample count, seems to perform worse when trained for 2 million steps on MMM2. Could you please shed some light on this? Thanks a lot!
It seems that I was using an older version of SMAC, which can count the enemy health twice when computing the max reward if the first attempt at init_unit fails. After upgrading to the new SMAC, the problem disappeared.
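For anyone who runs into the same gap, here is a rough sketch of why double-counting enemy health would lower the reported reward for the same win rate. It is not the SMAC source: the function names and the enemy health values are made up for illustration, and the defaults (death reward 10, win reward 200, scale rate 20) are what I believe SMAC uses, so please check them against your installed version.

```python
# Illustrative sketch only, not SMAC's actual code.
# Assumption: the per-episode reward is normalised by
# max_reward / reward_scale_rate, where max_reward includes enemy health.

def max_reward(enemy_health, reward_death_value=10.0, reward_win=200.0,
               health_counted_times=1):
    """Upper bound on the raw episode return.

    health_counted_times=2 mimics the bug where enemy health is added
    a second time after a failed first unit initialisation.
    """
    return (len(enemy_health) * reward_death_value
            + reward_win
            + health_counted_times * sum(enemy_health))


def normalized_return(raw_return, max_r, reward_scale_rate=20.0):
    """Scale a raw return so that a perfect episode maps to reward_scale_rate."""
    return raw_return / (max_r / reward_scale_rate)


# Hypothetical enemy team; these are NOT the real MMM2 unit stats.
enemy_health = [100.0] * 12

correct_max = max_reward(enemy_health)
buggy_max = max_reward(enemy_health, health_counted_times=2)

# A perfect win earns the full (correctly computed) raw return.
raw = correct_max

print(normalized_return(raw, correct_max))  # normalised with the correct bound
print(normalized_return(raw, buggy_max))    # same episode, inflated bound
```

The same raw return divided by the inflated bound gives a noticeably smaller number, which would explain seeing different rewards at similar win rates.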