Closed: RuofanKong closed this issue 8 years ago
I have never checked the plot of the Q loss over training time. My interpretation is that we usually expect the Q loss to decrease over time. That would hold if we had a perfect supervisor giving us the true target value (e.g., the expected return). Since we approximate this target with only the Q value at the next step (and also approximate the value function with a neural network), we cannot expect a steady pattern in the Q loss. It may fluctuate (sometimes diverging a bit and then recovering), but eventually the loss decreases.
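To illustrate why the critic loss is non-stationary, here is a minimal numpy sketch (not the repo's code; the toy linear critic and all names are illustrative) of the bootstrapped TD target used in DDPG: the regression target y = r + γ·Q_target(s', a') itself depends on the slowly updated target network, so it keeps shifting as training progresses, and the loss need not decrease monotonically.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_value(w, s, a):
    # toy linear critic: Q(s, a) = w . [s, a]
    return w @ np.concatenate([s, a])

gamma = 0.99
w = rng.normal(size=4)        # online critic weights (state dim 3 + action dim 1)
w_target = w.copy()           # target critic weights
tau = 0.01                    # soft-update rate

s, a = rng.normal(size=3), rng.normal(size=1)
r, s_next = 1.0, rng.normal(size=3)
a_next = rng.normal(size=1)   # in full DDPG this comes from the target actor

for step in range(3):
    y = q_value(w_target, s_next, a_next) * gamma + r  # moving target
    td_error = y - q_value(w, s, a)
    w += 0.1 * td_error * np.concatenate([s, a])       # gradient step on (y - Q)^2
    w_target = (1 - tau) * w_target + tau * w          # soft target update
```

Because `y` is recomputed from `w_target` every step, the critic is chasing a moving target, which is why some fluctuation in the loss is expected.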
This implementation does not diverge. In particular, I saw a good improvement in convergence speed after adding batch normalization.
From my experience, the following checks helped me debug the divergence issue.
Hello,
I just read through your DDPG implementation, and it looks awesome :) I have a question: what does the curve of the Q loss look like with respect to training time when you train Inverted Pendulum with DDPG? I also implemented DDPG myself, and I noticed that Inverted Pendulum did learn something, but the Q loss diverged. I wonder if you have the same issue with your implementation. Thank you so much!