mimoralea / gdrl

Grokking Deep Reinforcement Learning
https://www.manning.com/books/grokking-deep-reinforcement-learning
BSD 3-Clause "New" or "Revised" License

Why is the value loss multiplied by 0.5? #10

Closed: ZachariahRosenberg closed this issue 2 years ago

ZachariahRosenberg commented 3 years ago

In the VPG implementation, the value loss is calculated as:

value_loss = value_error.pow(2).mul(0.5).mean()

Isn't the value loss simply the MSE, i.e. just value_error.pow(2).mean()? Why the additional multiplication by 0.5?
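One way to see what the 0.5 changes is to compare the two losses numerically. The sketch below is plain Python rather than PyTorch, and the returns and value predictions are made-up numbers, not data from the gdrl repo. It computes the plain MSE and the half-scaled version, then estimates each loss's gradient with respect to the predictions by central finite differences. The half-scaled loss has exactly the same minimizer, and its gradient is half as large.

```python
# Hypothetical returns G and value predictions V (not taken from the gdrl repo).
returns = [1.0, 2.0, 3.0]
values = [0.5, 2.5, 2.0]

def mse(values, returns):
    # Plain mean squared error: mean of (G - V)^2, like value_error.pow(2).mean().
    errors = [g - v for g, v in zip(returns, values)]
    return sum(e * e for e in errors) / len(errors)

def half_mse(values, returns):
    # Same loss scaled by 0.5, mirroring value_error.pow(2).mul(0.5).mean().
    return 0.5 * mse(values, returns)

def grad_wrt_values(loss_fn, values, returns, h=1e-6):
    # Central finite-difference estimate of d(loss)/d(V_i) for each prediction.
    grads = []
    for i in range(len(values)):
        up = list(values)
        up[i] += h
        down = list(values)
        down[i] -= h
        grads.append((loss_fn(up, returns) - loss_fn(down, returns)) / (2 * h))
    return grads

full_grads = grad_wrt_values(mse, values, returns)
half_grads = grad_wrt_values(half_mse, values, returns)

print(mse(values, returns), half_mse(values, returns))  # 0.5 0.25
# Each full-MSE gradient is twice the corresponding half-MSE gradient.
print([f / h for f, h in zip(full_grads, half_grads)])
```

Because the two losses differ only by a constant factor, the choice mostly acts as a rescaling of the step size: with plain SGD, using 0.5 is equivalent to halving the learning rate. If the value loss were summed with other loss terms in a shared network, the factor would also change its weight relative to those terms.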

Thank you!

ZachariahRosenberg commented 3 years ago

Is this so that the factor of 2 cancels in the derivative? That is, with f(x) = x**2 / 2, where x = value_error, the derivative is f'(x) = x = value_error?
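The derivative in the question can be checked directly. This is a minimal plain-Python sketch using a central finite difference and a single hypothetical sample (G = 3.0, V = 1.5, so value_error = 1.5). It confirms that d/dx (x**2 / 2) = x. It also shows that, with respect to the prediction V, where value_error = G - V, the chain rule flips the sign, so the gradient is -value_error.

```python
def half_squared(x):
    # f(x) = x**2 / 2, the per-sample term in value_error.pow(2).mul(0.5)
    return 0.5 * x * x

def derivative(f, x, h=1e-6):
    # Central finite-difference estimate of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

# Hypothetical single sample: return G and value prediction V.
G, V = 3.0, 1.5
value_error = G - V  # 1.5

# Derivative with respect to the error itself: approximately value_error.
d_error = derivative(half_squared, value_error)

# Derivative with respect to the prediction V: approximately -value_error.
d_value = derivative(lambda v: half_squared(G - v), V)

print(d_error, d_value)
```

Without the 0.5, both results would be doubled (2 * value_error and -2 * value_error), which is the factor the question refers to.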