iacore closed this 10 months ago
Hi, yes, this is intentional, since the reward for sample t actually occurs at t - 1 (an action happens, then a reward is received for that action on the next step).
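Here is a minimal C++ sketch of this reward alignment. It assumes a hypothetical newest-first history buffer (index 0 is the most recent step) where each sample's reward is the one received on arriving at that step. Sample and discounted_return are illustrative names, not AOgmaNeo's actual code. Under that layout, the first reward earned by the action in sample t sits in sample t - 1, so the inner loop starts at t - 1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sample layout (not AOgmaNeo's actual struct). history[0] is the
// newest step. Each sample's reward was received on arriving at that step,
// so it pays off the action stored one step older (index + 1).
struct Sample {
    int action;
    float reward;
};

// Discounted return credited to the action stored in history[t].
// The first reward that action earned lives in history[t - 1], so the
// inner index t2 starts at t - 1 and walks toward the newest sample.
float discounted_return(const std::vector<Sample>& history, int t, float gamma) {
    assert(t >= 1 && t < static_cast<int>(history.size()));

    float ret = 0.0f;
    float discount = 1.0f;

    for (int t2 = t - 1; t2 >= 0; t2--) {
        ret += discount * history[t2].reward;
        discount *= gamma;
    }

    return ret;
}
```

For example, with history = {{0, 1.0f}, {1, 0.0f}, {2, 2.0f}} (newest first) and gamma = 0.5f, discounted_return(history, 2, 0.5f) gives 0.0f + 0.5f * 1.0f = 0.5f. The reward stored in the oldest sample, history[2], is never read, because in this layout it belongs to an action older than anything in the buffer.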
Hopefully this makes sense!
Thanks for answering!
Another question: the code here looks like it should be sum /= max(1, count) * 255;
That's just to avoid a divide by zero in rare cases; the two statements are equivalent since count is an integer.
When count is zero, they are not the same.
True, but in the 0 case no value is valid anyway. Also, in those rare cases (only when the hierarchy is strangely configured), sum will also be zero, so the result is 0 / something, which is 0 either way.
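A small C++ check makes the equivalence concrete. It assumes the existing expression is max(1, count * 255) and that sum is a float; both are assumptions, since the original line is not quoted in this thread, and the helper names are hypothetical. Because count is an integer, count * 255 is either 0 or at least 255, so max(1, ...) only changes anything when count == 0.

```cpp
#include <algorithm>
#include <cassert>

// Assumed current form: clamp the whole denominator to at least 1.
float normalize_clamp_product(float sum, int count) {
    assert(count >= 0);
    return sum / std::max(1, count * 255);
}

// Suggested form from the question: sum /= max(1, count) * 255;
float normalize_clamp_count(float sum, int count) {
    assert(count >= 0);
    return sum / (std::max(1, count) * 255);
}
```

For any count >= 1 both functions divide by the same integer count * 255 and return identical results. At count == 0 they differ (sum / 1 versus sum / 255), but since sum is also 0 in that case, both return 0.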
Here, t is never history_size, and t2 starts at t-1. As a result, history_samples[history_size-1] is never touched here. Is this intentional?
https://github.com/ogmacorp/AOgmaNeo/blob/9e2e3c80c19eb46ef63cba75455ffd05ee2b2c1d/source/aogmaneo/actor.cpp#L411-L421