ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
MIT License

question about the hyper-parameters #27

Closed · onlytailei closed 7 years ago

onlytailei commented 7 years ago

In train.py, Line 95 and Line 107 both apply a 0.5 factor to the value loss. Why is that? The original paper does not describe it.

Kaixhin commented 7 years ago

One is actually a hyperparameter that the authors failed to mention in the paper. I'm not exactly sure why it appears a second time, though (I've opted to use it only once in my own code, on the loss itself), so I'd like to know too.
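
For context, here is a minimal runnable sketch (not the repo's exact code; dummy tensors stand in for the rollout) of where the two 0.5 factors appear, with names following the snippets quoted in this thread:

import torch

# Dummy stand-ins so the sketch runs; in train.py these come from the rollout.
values = [torch.tensor(0.4, requires_grad=True)]  # critic outputs V(s_i)
R = torch.tensor(1.0)                             # bootstrapped n-step return
policy_loss = torch.tensor(0.0, requires_grad=True)
value_loss = torch.tensor(0.0)

# First 0.5: inside the per-step value-loss accumulation (the quoted line).
for i in reversed(range(len(values))):
    advantage = R - values[i]
    value_loss = value_loss + 0.5 * advantage.pow(2)

# Second 0.5: weighting the value loss against the policy loss.
(policy_loss + 0.5 * value_loss).backward()

# Net effect: the squared error reaches the gradient scaled by 0.25.
print(values[0].grad)  # tensor(-0.3000) == -0.25 * 2 * (R - V)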

ikostrikov commented 7 years ago

It's a typo. I will fix it.
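
For reference, a sketch of the single-coefficient variant Kaixhin describes above (an assumption about what the fix looks like, not necessarily the actual commit): accumulate the plain squared error per step and apply the 0.5 weight exactly once, on the total loss.

import torch

values = [torch.tensor(0.4, requires_grad=True)]
R = torch.tensor(1.0)
policy_loss = torch.tensor(0.0, requires_grad=True)
value_loss = torch.tensor(0.0)

for i in reversed(range(len(values))):
    advantage = R - values[i]
    value_loss = value_loss + advantage.pow(2)  # no per-step 0.5 here

# The 0.5 value-loss weight is applied in one place only.
(policy_loss + 0.5 * value_loss).backward()

print(values[0].grad)  # tensor(-0.6000) == -0.5 * 2 * (R - V)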

dgriff777 commented 7 years ago

value_loss = value_loss + 0.5 * advantage.pow(2)

This is the Huber loss.

Kaixhin commented 7 years ago

@dgriff777 It should be just a weighted MSE loss (as in the paper), not the Huber loss (which was used for DQN, but not for A3C). You can see that the code amounts to L += 1/2 * (R - V)^2:

advantage = R - values[i]
value_loss = value_loss + 0.5 * advantage.pow(2)
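
To make the distinction concrete, a small hedged comparison (assuming a current PyTorch, where F.smooth_l1_loss implements the Huber loss with delta = 1): the two losses agree for |R - V| < 1 and diverge in the tails, where the Huber loss grows only linearly.

import torch
import torch.nn.functional as F

# Illustrative advantages, advantage = R - V
advantage = torch.tensor([0.3, 2.0, -1.5])

# Weighted MSE term from the paper and this repo: 1/2 * (R - V)^2
mse_term = 0.5 * advantage.pow(2)

# Huber (smooth L1) loss as used in DQN: quadratic near 0, linear in the tails
huber_term = F.smooth_l1_loss(advantage, torch.zeros_like(advantage),
                              reduction='none')

print(mse_term)    # tensor([0.0450, 2.0000, 1.1250])
print(huber_term)  # tensor([0.0450, 1.5000, 1.0000])

The two coincide whenever |advantage| < 1, which matches the observation below that they behave the same most of the time.
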
dgriff777 commented 7 years ago

Yeah, my mistake, I was testing both out. L2 it is 👍 They're the same thing most of the time lol. Yeah, the paper does MSE, but it's always good to mix things up and do better than the paper 😉