google / dopamine

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
https://github.com/google/dopamine
Apache License 2.0

About the IQN loss #27

Closed boluoweifenda closed 6 years ago

boluoweifenda commented 6 years ago

Hi everyone, I am reading the great IQN paper and following the implementation, but I find that the definition of the loss function is slightly different from the one described in IQN and the earlier QR-DQN paper: https://github.com/google/dopamine/blob/master/dopamine/agents/implicit_quantile/implicit_quantile_agent.py#L348

Why is the final quantile Huber loss divided by kappa?

quantile_huber_loss = (
    tf.abs(replay_quantiles -
           tf.stop_gradient(tf.to_float(bellman_errors < 0))) *
    huber_loss) / self.kappa

Although kappa equals 1.0 here, so the division makes no numerical difference, I'm still confused: is it a typo?

psc-g commented 6 years ago

hi, no it's not a typo. if you look at the definition of $\rho^{\kappa}_{\tau}$ in the paper you can see that $L_{\kappa}(\delta_{ij})$ (the huber loss) is divided by $\kappa$.
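for anyone following along, here is a minimal NumPy sketch of that definition, $\rho^{\kappa}_{\tau}(\delta) = |\tau - \mathbb{1}\{\delta < 0\}| \cdot L_{\kappa}(\delta) / \kappa$. this is an illustrative re-derivation, not the Dopamine code itself; the function name and argument names are made up for the example.

```python
import numpy as np

def quantile_huber_loss(bellman_errors, quantiles, kappa=1.0):
    """Per-element rho^kappa_tau(delta) = |tau - 1{delta < 0}| * L_kappa(delta) / kappa."""
    abs_err = np.abs(bellman_errors)
    # Huber loss L_kappa(delta): quadratic inside [-kappa, kappa], linear outside.
    huber = np.where(abs_err <= kappa,
                     0.5 * bellman_errors ** 2,
                     kappa * (abs_err - 0.5 * kappa))
    # Asymmetric quantile weight, then the division by kappa discussed above.
    weight = np.abs(quantiles - (bellman_errors < 0).astype(np.float64))
    return weight * huber / kappa

# with kappa = 1.0 the division is a no-op, matching the observation in the issue:
delta, tau = np.array([0.5]), np.array([0.5])
print(quantile_huber_loss(delta, tau, kappa=1.0))  # |0.5 - 0| * 0.125 / 1 = 0.0625
```

with a different kappa the division matters: it keeps the loss's slope with respect to large errors equal to $|\tau - \mathbb{1}\{\delta < 0\}|$ regardless of kappa, so the huber threshold only smooths the loss near zero rather than rescaling it.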