stevenpjg / ddpg-aigym

Continuous control with deep reinforcement learning - Deep Deterministic Policy Gradient (DDPG) algorithm implemented in OpenAI Gym environments
MIT License

Question on Loss function of Critic Network training #7

Closed · RuofanKong closed this issue 8 years ago

RuofanKong commented 8 years ago

Hello,

I just read through your code on the DDPG implementation, and it looks awesome :) I have a question for you: what does the curve of the critic's Q loss look like over training time when you train Inverted Pendulum with DDPG? I also implemented DDPG myself, and I noticed that the Inverted Pendulum agent did learn something, but the Q loss diverged. I wonder if you have the same issue with your implementation.

Thank you so much!

stevenpjg commented 8 years ago

I have never checked the plot of the Q loss over time. My interpretation is that we usually expect the Q loss to decrease with time. That would hold if we had a perfect supervisor giving us the true target value (the expected return). Since we approximate that target using only the Q value at the next time step (and also approximate the value function with a neural network), we cannot expect a steadily decreasing Q loss. It may fluctuate (sometimes diverging a bit and recovering), but eventually the loss decreases.
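
For reference, here is a minimal sketch of that bootstrapped target and the resulting critic loss. The names (`target_actor`, `target_critic`, `critic_targets`) are illustrative placeholders, not functions from this repository, and the discount factor is an assumed value:

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value for this sketch)

def critic_targets(r, s2, done, target_actor, target_critic):
    """y = r + gamma * Q'(s2, mu'(s2)) for non-terminal transitions."""
    a2 = target_actor(s2)                      # action from the target actor
    q_next = target_critic(s2, a2)             # Q value at the next instant
    return r + GAMMA * (1.0 - done) * q_next   # bootstrapped target

def critic_loss(q_pred, y):
    # Mean squared error between Q(s, a) and the moving target y; because
    # y itself shifts as the networks change, this loss need not decrease
    # monotonically.
    return np.mean((q_pred - y) ** 2)
```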

This implementation does not diverge. In particular, I found a good improvement in convergence speed after adding batch normalization.
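
As an illustration only, batch normalization can be added on the state path of the critic, roughly in the spirit of the DDPG paper. This sketch uses tf.keras for brevity (this repository uses lower-level TensorFlow), and the layer sizes are placeholders, not the values used here:

```python
import tensorflow as tf

def make_critic(state_dim, action_dim):
    state_in = tf.keras.Input(shape=(state_dim,))
    action_in = tf.keras.Input(shape=(action_dim,))
    x = tf.keras.layers.Dense(400)(state_in)
    x = tf.keras.layers.BatchNormalization()(x)         # normalize state features
    x = tf.keras.layers.Activation("relu")(x)
    x = tf.keras.layers.Concatenate()([x, action_in])   # actions join after the first layer
    x = tf.keras.layers.Dense(300, activation="relu")(x)
    q_out = tf.keras.layers.Dense(1)(x)                  # scalar Q value
    return tf.keras.Model(inputs=[state_in, action_in], outputs=q_out)
```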

From my experience, I used the following checks to debug divergence issues.

  1. Set the learning rates to zero (for both the actor and critic networks) and check whether the loss still diverges. If it does, there is likely a division by zero somewhere. (Also make sure you have not initialized the weights to zero.)
  2. If it no longer diverges with zero learning rates, the divergence is probably caused by exploding gradients; in that case, try clipping the gradients (see the sketch after this list).
  3. Alternatively, use the gradient inverter to keep the parameters bounded. Check the implementation here: https://github.com/stevenpjg/ddpg-aigym/blob/master/tensorflow_grad_inverter.py
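
A rough sketch of check 2 (gradient clipping), not this repository's training code; `critic_loss_fn` and `critic_vars` are placeholder names for your own loss closure and trainable variables:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # learning rate is an assumed value

def train_critic_step(critic_loss_fn, critic_vars, clip_norm=1.0):
    # Compute the critic loss and its gradients, clip the gradients by
    # global norm to bound the update size, then apply them.
    with tf.GradientTape() as tape:
        loss = critic_loss_fn()
    grads = tape.gradient(loss, critic_vars)
    clipped, _ = tf.clip_by_global_norm(grads, clip_norm)
    optimizer.apply_gradients(zip(clipped, critic_vars))
    return loss
```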