ufal / npfl139

Materials for Deep Reinforcement Learning – ÚFAL course NPFL139
https://ufal.mff.cuni.cz/courses/npfl139
Creative Commons Attribution Share Alike 4.0 International

Incorrect quantile regression loss #17

Closed Reblexis closed 1 week ago

Reblexis commented 1 week ago

Hi,

I think there is a mistake in the quantile regression loss definition on slide 29 (lecture 5). The indicator should be the other way around: $L(\hat{x}) = \mathbb{E}_{x \sim P} \left[ (\hat{x}-x) \left( [x \le \hat{x}] - \tau \right) \right]$
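A quick numerical check of the claim above, as a minimal sketch rather than anything taken from the slides. It assumes that the incorrect version differs only by flipping the indicator to $[x \ge \hat{x}]$ (the original slide is not reproduced in this thread), uses a standard normal for $P$, and searches $\hat{x}$ over a coarse grid. The helper names `quantile_loss` and `empirical_quantile` are made up for illustration.

```python
import random

def quantile_loss(x_hat, samples, tau, flipped=False):
    """Monte Carlo estimate of E[(x_hat - x) * ([indicator] - tau)]."""
    total = 0.0
    for x in samples:
        # flipped=False: [x <= x_hat], the orientation proposed in this issue.
        # flipped=True:  [x >= x_hat], an ASSUMED form of the reported mistake.
        indicator = (x >= x_hat) if flipped else (x <= x_hat)
        total += (x_hat - x) * (float(indicator) - tau)
    return total / len(samples)

def empirical_quantile(samples, q):
    """Simple order-statistic estimate of the q-quantile."""
    ordered = sorted(samples)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]  # assumed P = N(0, 1)
tau = 0.9
grid = [i / 50 for i in range(-150, 151)]  # candidate x_hat values in [-3, 3]

# With [x <= x_hat], the loss is minimized near the tau-quantile.
argmin_correct = min(grid, key=lambda g: quantile_loss(g, samples, tau))
# With the flipped indicator, the expression is maximized near the (1 - tau)-quantile.
argmax_flipped = max(grid, key=lambda g: quantile_loss(g, samples, tau, flipped=True))

print("argmin, [x <= x_hat]:", argmin_correct, "vs", empirical_quantile(samples, tau))
print("argmax, [x >= x_hat]:", argmax_flipped, "vs", empirical_quantile(samples, 1 - tau))
```

Under these assumptions, the first pair of numbers should agree up to the grid resolution, and so should the second pair, which would be consistent with the flipped formula having a stationary point that is a maximum at the $(1-\tau)$-quantile.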

I wonder whether the proof actually found the $\tau$ at which the incorrect formula is maximized rather than minimized.

Reblexis commented 1 week ago

I think there is a mistake in the first step of the proof (where it's evaluated into the right form).

foxik commented 1 week ago

Yes, you are absolutely right; with the previous formulation, $\tau$ in fact behaved as $1-\tau$.
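One way to see the $1-\tau$ behaviour, sketched under the assumption that the previous formulation differed only by using the indicator $[x \ge \hat{x}]$ (the old slide is not quoted in this thread) and that $P$ is continuous with CDF $F$, so that $[x < \hat{x}] = [x \le \hat{x}]$ almost surely:

$$
\begin{aligned}
L_{\text{prev}}(\hat{x})
  &= \mathbb{E}_{x \sim P} \left[ (\hat{x}-x) \left( [x \ge \hat{x}] - \tau \right) \right] \\
  &= \mathbb{E}_{x \sim P} \left[ (\hat{x}-x) \left( (1-\tau) - [x < \hat{x}] \right) \right] \\
  &= -\,\mathbb{E}_{x \sim P} \left[ (\hat{x}-x) \left( [x \le \hat{x}] - (1-\tau) \right) \right].
\end{aligned}
$$

So, under that assumption, the previous expression is the negative of the corrected loss with $\tau$ replaced by $1-\tau$. Its derivative, $\frac{\partial}{\partial \hat{x}} L_{\text{prev}}(\hat{x}) = (1-\tau) - F(\hat{x})$, vanishes at $F(\hat{x}) = 1-\tau$, and that stationary point is a maximum, not a minimum.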

Using $\hat x - x$ seems more natural to me (the derivative of this term with respect to $\hat x$ is 1), but the QR-DQN and IQN papers consistently use the order $x - \hat x$ (as do the algorithms copy-pasted into the slides). I therefore changed the formulation to use the $x - \hat x$ order, and for consistency I also updated the order on the previous slides with the MSE and MAE errors. The definition of the quantile Huber loss has been fixed accordingly.
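To illustrate that the two orders describe the same loss, here is a small sketch. The functions are hypothetical helpers, not code from the course repository. The $x - \hat x$ form, $u\,(\tau - [u < 0])$ with $u = x - \hat x$, and the quantile Huber form, $|\tau - [u < 0]| \cdot L_\kappa(u)$, are written as I understand them from the QR-DQN paper, so treat both as assumptions. The updated slides may differ in details such as normalization by $\kappa$.

```python
def loss_issue_order(x_hat, x, tau):
    """(x_hat - x) * ([x <= x_hat] - tau), the form written in this issue."""
    return (x_hat - x) * (float(x <= x_hat) - tau)

def loss_paper_order(x_hat, x, tau):
    """u * (tau - [u < 0]) with u = x - x_hat, i.e. the x - x_hat order."""
    u = x - x_hat
    return u * (tau - float(u < 0))

def quantile_huber(x_hat, x, tau, kappa=1.0):
    """|tau - [u < 0]| * Huber_kappa(u) with u = x - x_hat (assumed form)."""
    u = x - x_hat
    # Huber loss: quadratic for |u| <= kappa, linear beyond.
    huber = 0.5 * u * u if abs(u) <= kappa else kappa * (abs(u) - 0.5 * kappa)
    return abs(tau - float(u < 0)) * huber

# The two orders agree pointwise, including at x == x_hat where both are 0.
for x_hat, x in [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (-2.0, 3.0)]:
    a = loss_issue_order(x_hat, x, 0.9)
    b = loss_paper_order(x_hat, x, 0.9)
    print(x_hat, x, a, b)
```

Negating both factors of the product leaves it unchanged, which is why switching from $\hat x - x$ to $x - \hat x$ only requires swapping the order inside the second bracket as well.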

Sorry for the complications; you get 3 community work points for finding this and creating the issue (a typo earns 1 point, but that is not comparable to this kind of error).