Closed: onlytailei closed this issue 7 years ago
The 0.5 is actually a hyperparameter that the authors failed to mention in the paper. I'm not exactly sure why it appears a second time, though (I've opted to use it only once in my own code, on the loss itself), so I'd like to know too.
It's a typo. I will fix it.
value_loss = value_loss + 0.5 * advantage.pow(2)
This is the Huber loss.
@dgriff777 It should be just a weighted MSE loss (as in the paper), not the Huber loss (which was used for DQN, not for A3C). You can see that the code can be written as L += 1/2 * (R - V)^2:
advantage = R - values[i]
value_loss = value_loss + 0.5 * advantage.pow(2)
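To make the context of those two lines concrete, here is a minimal sketch of the A3C value-loss accumulation loop they come from. It uses plain Python floats in place of torch tensors, and the function name, `gamma`, and the bootstrap argument are illustrative assumptions, not the repo's actual code:

```python
GAMMA = 0.99  # discount factor (assumed value, a common default)

def accumulate_value_loss(rewards, values, bootstrap):
    """Sum 0.5 * (R - V(s_i))^2 over an n-step rollout.

    rewards   -- r_0 ... r_{T-1} collected during the rollout
    values    -- V(s_0) ... V(s_{T-1}) from the critic
    bootstrap -- V(s_T) used to seed the n-step return R
    """
    R = bootstrap
    value_loss = 0.0
    for i in reversed(range(len(rewards))):
        R = rewards[i] + GAMMA * R            # n-step discounted return
        advantage = R - values[i]
        value_loss = value_loss + 0.5 * advantage ** 2  # weighted MSE term
    return value_loss
```

With a single step, reward 1.0, value estimate 0.0, and zero bootstrap, the return is R = 1.0 and the loss is 0.5 * 1.0^2 = 0.5.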
Yeah, my mistake was from testing both out. L2 it is 👍 They're the same thing most of the time, lol. Yeah, the paper does MSE, but it's always good to mix it up and do better than the papers 😉
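The "same thing most of the time" remark can be checked directly: the Huber loss with delta = 1 is exactly 0.5 * advantage^2 whenever |advantage| <= 1, and only diverges (becoming linear) beyond that. A small self-contained sketch, with hand-rolled functions rather than the torch built-ins:

```python
def weighted_mse(adv):
    # The A3C value-loss term: 0.5 * (R - V)^2
    return 0.5 * adv ** 2

def huber(adv, delta=1.0):
    # Huber loss: quadratic for |adv| <= delta, linear beyond it
    if abs(adv) <= delta:
        return 0.5 * adv ** 2
    return delta * (abs(adv) - 0.5 * delta)
```

For a small advantage like 0.5 both give 0.125; for a large one like 3.0, weighted MSE gives 4.5 while Huber gives 2.5, so gradients are clipped only in the Huber case.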
In train.py, Line 95 and Line 107 both apply a 0.5 weight to the value loss. Why is that? The original paper does not describe it.