I am training using approx_ndcg_loss as the loss. I see very weird results in TensorBoard: the gradient decreases for the first few steps and then steadily increases, while the loss for both train and eval decreases. Is this expected behavior for the approx_ndcg_loss loss function?

@HongleiZhuang, assigning this to you. I remember that you've seen something similar.
Hi Degao,

There are multiple factors that could lead to this behavior, e.g., the learning rate or the optimizer. Empirically, when Adam is used as the optimizer with approx_ndcg_loss, unstable behavior is often observed. If this is the case, you may want to switch to the AdaGrad optimizer. Also, using batch normalization makes training more stable in many cases. Can you provide more details about the parameter configuration you are using, so we can dig deeper?

Thanks!
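A minimal sketch of what this suggestion might look like with the tfr.keras API; the scoring model, feature count, layer sizes, and learning rate below are illustrative assumptions, not the configuration from this issue:

```python
import tensorflow as tf
import tensorflow_ranking as tfr

# Illustrative listwise scoring model; feature count and layer sizes are placeholders.
def make_scorer(num_features=136):
    inputs = tf.keras.Input(shape=(None, num_features))   # [batch, list_size, num_features]
    x = tf.keras.layers.Dense(64, activation="relu")(inputs)
    x = tf.keras.layers.Dense(32, activation="relu")(x)
    scores = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs, tf.squeeze(scores, axis=-1))  # [batch, list_size]

model = make_scorer()
model.compile(
    # AdaGrad instead of Adam, per the suggestion above.
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1),
    # ApproxNDCG ranking loss from TF-Ranking.
    loss=tfr.keras.losses.ApproxNDCGLoss(),
)
```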
@HongleiZhuang Hi, I was using AdaGrad and approx_ndcg_loss. I added batch normalization to each layer, and training is more stable now.
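A rough sketch of adding batch normalization after each hidden layer, assuming a Keras-style scoring stack (layer sizes and the placement of normalization before the activation are assumptions):

```python
import tensorflow as tf

def make_scorer_with_bn(num_features=136, hidden_units=(64, 32)):
    inputs = tf.keras.Input(shape=(None, num_features))
    x = inputs
    for units in hidden_units:
        x = tf.keras.layers.Dense(units)(x)
        # Batch normalization after each hidden layer, before the activation.
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.Activation("relu")(x)
    scores = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs, tf.squeeze(scores, axis=-1))
```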
Thanks for reporting! Closing this for now.
TDDFT closed this 5 years ago.