In this part of the dense reward calculation method, the damage and deaths of ally units are accumulated into the variables `delta_ally` and `delta_deaths`, which are used to compose the reward later. Notice how `delta_deaths` is only changed if `self.reward_only_positive` is false:

https://github.com/oxwhirl/smac/blob/a185b7082dc5a12debdec8a344cf5177a7f67fff/smac/env/starcraft2/starcraft2.py#L684-L701

When the reward is calculated from these accumulated values, `delta_ally` is only used if `self.reward_only_positive` is false. The change to `delta_deaths` made in the ally loop above likewise only takes effect if `self.reward_only_positive` is false:

https://github.com/oxwhirl/smac/blob/a185b7082dc5a12debdec8a344cf5177a7f67fff/smac/env/starcraft2/starcraft2.py#L716-L719

From this I conclude that we only need to process ally units in this method if `self.reward_only_positive` is false; otherwise the first loop can be skipped entirely. I don't know how much this would affect performance (this method runs on every game step, after all), but I could come up with a simplified version. I'd like others to validate whether my reasoning is correct.
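
To make the argument easier to check, here is a minimal, self-contained sketch of the idea. It is not the code from `starcraft2.py`: the `Unit` class, the `prev_health` field, the function names, and the default values for `death_value` and `neg_scale` are all hypothetical stand-ins, and the reward formulas are simplified assumptions. It also assumes the ally loop has no side effects beyond accumulating `delta_ally` and `delta_deaths`; any other bookkeeping done inside that loop would still have to run.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical stand-in for a game unit; not the SMAC unit type.
    health: float       # health on the current step
    prev_health: float  # health on the previous step


def _enemy_deltas(enemies, death_value):
    """Accumulate damage dealt to enemies and the bonus for enemy deaths."""
    delta_enemy = 0.0
    delta_deaths = 0.0
    for u in enemies:
        if u.prev_health == 0:
            continue  # already dead before this step
        if u.health == 0:
            delta_deaths += death_value
            delta_enemy += u.prev_health
        else:
            delta_enemy += u.prev_health - u.health
    return delta_enemy, delta_deaths


def reward_full(allies, enemies, only_positive, death_value=10.0, neg_scale=0.5):
    """Always runs the ally loop, even when its results are discarded."""
    delta_ally = 0.0
    delta_deaths = 0.0
    for u in allies:
        if u.prev_health == 0:
            continue
        if u.health == 0:
            if not only_positive:
                delta_deaths -= death_value * neg_scale
            delta_ally += u.prev_health * neg_scale
        else:
            delta_ally += neg_scale * (u.prev_health - u.health)
    delta_enemy, enemy_deaths = _enemy_deltas(enemies, death_value)
    delta_deaths += enemy_deaths
    if only_positive:
        return abs(delta_enemy + delta_deaths)  # delta_ally is never read here
    return delta_enemy + delta_deaths - delta_ally


def reward_simplified(allies, enemies, only_positive, death_value=10.0, neg_scale=0.5):
    """Skips the ally loop entirely when only positive rewards are used."""
    delta_ally = 0.0
    delta_deaths = 0.0
    if not only_positive:
        for u in allies:
            if u.prev_health == 0:
                continue
            if u.health == 0:
                delta_deaths -= death_value * neg_scale
                delta_ally += u.prev_health * neg_scale
            else:
                delta_ally += neg_scale * (u.prev_health - u.health)
    delta_enemy, enemy_deaths = _enemy_deltas(enemies, death_value)
    delta_deaths += enemy_deaths
    if only_positive:
        return abs(delta_enemy + delta_deaths)
    return delta_enemy + delta_deaths - delta_ally


allies = [Unit(health=0, prev_health=30), Unit(health=40, prev_health=45)]
enemies = [Unit(health=0, prev_health=20), Unit(health=50, prev_health=60)]

# Both variants should agree for either value of the flag.
for flag in (True, False):
    assert reward_full(allies, enemies, flag) == reward_simplified(allies, enemies, flag)
```

Under these assumptions the two functions return the same reward for both values of the flag, which is the equivalence claimed above. Whether the same holds for the real method depends on whether its ally loop does anything else that later code relies on.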