muupan / async-rl

Replicating "Asynchronous Methods for Deep Reinforcement Learning" (http://arxiv.org/abs/1602.01783)
MIT License

Trivial scaling question #11

Closed. hholst80 closed this issue 8 years ago.

hholst80 commented 8 years ago

The value loss v_loss is accumulated as

v_loss += (v - R) ** 2 / 2

but it is then scaled by v_loss *= self.v_loss_coef, where v_loss_coef is 0.5 by default.

Is there a reason we're scaling it twice, both term-wise and on the final sum?
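
For concreteness, the combined effect of the two factors can be sketched like this (a standalone snippet with assumed scalar inputs, not the repo's actual training code):

```python
def value_loss(vs, Rs, v_loss_coef=0.5):
    """Accumulate the per-step squared error, halved term-wise,
    then scale the sum by v_loss_coef -- an overall factor of 0.25."""
    v_loss = 0.0
    for v, R in zip(vs, Rs):
        v_loss += (v - R) ** 2 / 2   # term-wise 1/2
    return v_loss * v_loss_coef      # extra 1/2 on the sum

# Effective loss per term: 0.25 * (v - R) ** 2
print(value_loss([1.0], [3.0]))  # (1 - 3)**2 / 2 * 0.5 = 1.0
```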

muupan commented 8 years ago

By (v - R) ** 2 / 2 I mean just dividing the squared errors by 2, which I think is common (the 1/2 cancels when taking the gradient, leaving just v - R), though I'm not sure if the authors also did so.

By v_loss_coef I mean a scaling factor for tuning the relative learning rate of v. One of the authors told me they multiplied the gradients of v by 0.5.
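
If it helps, the equivalence can be checked in a tiny sketch (an assumed scalar example, not the repo's code): since the gradient of c * (v - R) ** 2 / 2 with respect to v is c * (v - R), setting c = 0.5 is the same as halving the gradient of the halved squared error.

```python
def dloss_dv(v, R, coef):
    # Analytic gradient of coef * (v - R)**2 / 2 with respect to v
    return coef * (v - R)

v, R = 2.0, 5.0
g_unscaled = dloss_dv(v, R, coef=1.0)  # gradient of (v - R)**2 / 2
g_scaled = dloss_dv(v, R, coef=0.5)    # gradient with v_loss_coef applied
assert g_scaled == 0.5 * g_unscaled    # -1.5 == 0.5 * -3.0
```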

hholst80 commented 8 years ago

It seems a bit non-standard to scale both the terms and the sum like this. I'm trying it without the 0.5 scaling of the sum, just dividing the terms by 2 as you do.

hholst80 commented 8 years ago

Of course, for a particular problem there could be a reason to balance the two loss functions differently, so the constants are good to have. I was just curious about their default values. Thanks for the clarification!