marload / DistRL-TensorFlow2

🐳 Implementation of various Distributional Reinforcement Learning Algorithms using TensorFlow2.
Apache License 2.0

Bug in the Quantile Huber loss? #1

Open mcshel opened 3 years ago

mcshel commented 3 years ago

Hi,

First of all, thanks for publicly sharing your implementations of the reinforcement learning algorithms. I find your repos very useful!

As I was playing around with the QR-DQN, I think I noticed a bug in your implementation of the Quantile Huber loss function. The code seems to run fine if batch_size == atoms. However, if the two differ, you get an error due to incompatible tensor shapes in line 75 of QR-DQN.py:

loss = tf.where(tf.less(error_loss, 0.0), inv_tau * huber_loss, tau * huber_loss)

I think the error is related to the fact that the TF2 implementation of the Huber loss reduces the output by one dimension with respect to the inputs (docs), even when setting reduction=tf.keras.losses.Reduction.NONE. This is different from the behavior in TF1, where the output shape matches that of the input (docs). Therefore, if I am not mistaken, one could fix this by changing self.huber_loss to tf.compat.v1.losses.huber_loss? I am having a bit of a hard time working out the exact dimensions on which the different operations act, so I would be happy to hear from your side whether my theory is correct :P
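
In case it helps, here is a minimal sketch of the kind of fix I have in mind: computing the Huber loss element-wise by hand, so the pairwise error tensor keeps its full (batch, atoms, atoms) shape instead of being averaged over the last axis. The function names, the kappa parameter, and the exact axis layout are my own assumptions, not the code from this repo:

```python
import tensorflow as tf

def elementwise_huber(errors, kappa=1.0):
    # Element-wise Huber loss: preserves the input shape, unlike
    # tf.keras.losses.Huber, which averages over the last axis
    # even with reduction=Reduction.NONE.
    abs_errors = tf.abs(errors)
    quadratic = tf.minimum(abs_errors, kappa)
    linear = abs_errors - quadratic
    return 0.5 * tf.square(quadratic) + kappa * linear

def quantile_huber_loss(target, pred, tau, kappa=1.0):
    # target: (batch, atoms) quantiles of the Bellman target
    # pred:   (batch, atoms) predicted quantiles
    # tau:    (atoms,) quantile fractions paired with the predicted quantiles
    # Pairwise TD errors, shape (batch, atoms, atoms):
    # errors[b, i, j] = target[b, j] - pred[b, i]
    errors = target[:, tf.newaxis, :] - pred[:, :, tf.newaxis]
    huber = elementwise_huber(errors, kappa)                 # (batch, atoms, atoms)
    tau = tf.cast(tf.reshape(tau, [1, -1, 1]), tf.float32)   # broadcast over batch / target axes
    weight = tf.abs(tau - tf.cast(errors < 0.0, tf.float32)) # |tau - 1{u < 0}|
    loss = weight * huber / kappa                            # with kappa = 1.0 the division is a no-op
    # Mean over target samples, sum over predicted quantiles, mean over the batch
    return tf.reduce_mean(tf.reduce_sum(tf.reduce_mean(loss, axis=2), axis=1))
```

With this, the asymmetric weighting (the tf.where with tau / inv_tau in line 75) can be applied to a tensor whose shape no longer depends on batch_size being equal to atoms, which is where the error seems to come from.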

yubobao27 commented 1 year ago

Could you post the corrected solution here? When training IQN, the loss doesn't seem to converge.