Why is the value loss multiplied by 0.5?

mimoralea / gdrl

Grokking Deep Reinforcement Learning

BSD 3-Clause "New" or "Revised" License

Closed. ZachariahRosenberg closed this issue 2 years ago

ZachariahRosenberg commented 3 years ago

In the VPG implementation, the value loss is calculated as:

value_loss = value_error.pow(2).mul(0.5).mean()

Isn't the value loss simply the MSE, i.e. value_error.pow(2).mean()? Why the additional multiplication by 0.5?

Thank you!

ZachariahRosenberg commented 3 years ago

Is this so that the factor of 2 drops out of the derivative? That is, with x = value_error and f(x) = (x ** 2) / 2, the derivative is f'(x) = x, i.e. just value_error (divided by the batch size once the .mean() is applied).
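To check the intuition above numerically, here is a minimal sketch. It uses plain Python and finite differences rather than PyTorch autograd, so it is not the repo's code; the helper names (half_mse, mse, numerical_grad) and the sample errors are made up for illustration.

```python
# Hypothetical helpers: plain-Python stand-ins for the tensor expressions
# in the question, so the check runs without PyTorch installed.

def half_mse(errors):
    """Mirrors value_error.pow(2).mul(0.5).mean()."""
    return sum(0.5 * e ** 2 for e in errors) / len(errors)

def mse(errors):
    """Mirrors value_error.pow(2).mean()."""
    return sum(e ** 2 for e in errors) / len(errors)

def numerical_grad(loss_fn, errors, eps=1e-6):
    """Central finite-difference gradient of loss_fn w.r.t. each error."""
    grads = []
    for i in range(len(errors)):
        up = list(errors)
        down = list(errors)
        up[i] += eps
        down[i] -= eps
        grads.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grads

# Made-up value errors for illustration only.
value_error = [0.5, -1.0, 2.0, 3.5]
n = len(value_error)

grad_half = numerical_grad(half_mse, value_error)  # expected: e / n
grad_full = numerical_grad(mse, value_error)       # expected: 2 * e / n

for e, gh, gf in zip(value_error, grad_half, grad_full):
    print(f"error={e:+.2f}  half_mse grad={gh:+.4f}  mse grad={gf:+.4f}")
```

With the 0.5 factor, the gradient with respect to each error comes out as that error divided by the batch size; without it, the gradient is exactly twice as large. So the two losses point in the same direction and differ only by a constant scale on the gradient.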