oxwhirl / smac

SMAC: The StarCraft Multi-Agent Challenge
MIT License
1.04k stars 227 forks source link

Simplify dense reward calculation if reward_only_positive is true #54

Closed douglasrizzo closed 3 years ago

douglasrizzo commented 3 years ago

In this part of the dense reward calculation method, the damage and deaths of ally units are accumulated into variables delta_ally and delta_deathsand used to compose the reward later. Notice how dealth_deaths is only changed if self.reward_only_positive is false:

https://github.com/oxwhirl/smac/blob/a185b7082dc5a12debdec8a344cf5177a7f67fff/smac/env/starcraft2/starcraft2.py#L684-L701

When the reward is calculated using the previous accumulated values, delta_ally is only used if self.reward_only_positive is false. The version of delta_deaths that is altered in the ally loop above is also only used if self.reward_only_positive is false.

https://github.com/oxwhirl/smac/blob/a185b7082dc5a12debdec8a344cf5177a7f67fff/smac/env/starcraft2/starcraft2.py#L716-L719

This makes me conclude that we only need to process ally units in this method if self.reward_only_positive is false, otherwise we can ignore the first loop. I don't know how much this would affect performance (this is a method that runs on every game step, after all) but I could come up with this simplified version. I'd just like others to validate if what I said is true.

douglasrizzo commented 3 years ago

Actually, I did a quick test and got different results after changing the method, so my assumptions were wrong.