When calculating the quantile Huber loss in QR-DQN (here), the whole term `torch.abs(taus[..., None] - (td_errors.detach() < 0).float()) * element_wise_huber_loss` is divided by `self.kappa`.
I cannot find this division by kappa in the equation given in the paper. Is there a reason for this implementation?
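To make the question concrete, here is a minimal sketch of the computation being asked about. It is not the repository's code: the function names `huber` and `quantile_huber`, the `divide_by_kappa` flag, and the tensor shapes are assumptions made for illustration. It only shows that the version with the division and the version without it differ by the constant factor `1 / kappa`, so they coincide when `kappa = 1`.

```python
import torch


def huber(td_errors: torch.Tensor, kappa: float = 1.0) -> torch.Tensor:
    """Element-wise Huber loss with threshold kappa."""
    abs_err = td_errors.abs()
    return torch.where(
        abs_err <= kappa,
        0.5 * td_errors.pow(2),          # quadratic region
        kappa * (abs_err - 0.5 * kappa),  # linear region
    )


def quantile_huber(td_errors, taus, kappa=1.0, divide_by_kappa=True):
    """Quantile Huber loss per element (hypothetical helper).

    Assumed shapes: td_errors is (batch, N, N_dash), taus is (batch, N).
    """
    # Asymmetric quantile weight |tau - 1{td_error < 0}|.
    weight = torch.abs(taus[..., None] - (td_errors.detach() < 0).float())
    loss = weight * huber(td_errors, kappa)
    # The flag toggles the division that the question is about.
    return loss / kappa if divide_by_kappa else loss


torch.manual_seed(0)
td_errors = torch.randn(4, 8, 8)
taus = ((torch.arange(8) + 0.5) / 8).expand(4, 8)  # quantile midpoints

for kappa in (1.0, 2.0):
    with_div = quantile_huber(td_errors, taus, kappa, divide_by_kappa=True)
    without_div = quantile_huber(td_errors, taus, kappa, divide_by_kappa=False)
    # The two variants differ only by the constant factor 1 / kappa.
    assert torch.allclose(with_div * kappa, without_div)
```

Because the factor is constant, the division rescales the loss and its gradients but does not change where the minimum lies; whether that rescaling is intended is exactly what the question asks.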