Closed jankyee closed 4 years ago
I find that grad_l2_sum may explode after several steps, so adding 1e-8 doesn't help. But why is it exploding?
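To illustrate why the epsilon can't rescue this case, here is a minimal sketch (not the project's code; safe_grad_norm and max_sq are hypothetical names): the epsilon inside the sqrt only guards against the NaN gradient of sqrt at 0, while an exploding grad_l2_sum overflows to inf, which no epsilon can fix. Clamping the squared sum before the sqrt is one common workaround:

```python
import torch

def safe_grad_norm(grads, eps=1e-8, max_sq=1e8):
    # Sum of squared gradient entries; this is the quantity that explodes.
    grad_l2_sum = sum(g.pow(2).sum() for g in grads)
    # Keep the sum finite: eps cannot help once this overflows to inf.
    grad_l2_sum = grad_l2_sum.clamp(max=max_sq)
    # eps must sit *inside* the sqrt to avoid NaN gradients at 0.
    return torch.sqrt(grad_l2_sum + eps)
```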
Have you modified the training code or the config of the project?
I just modified the dataset and pretrain path.
I just modified the dataset and pretrain path. Now I am trying adv-loss-type: hinge and have trained 1300 steps; so far everything seems OK...
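For reference, this is the standard hinge adversarial loss that adv-loss-type: hinge typically selects (a sketch of the usual formulation, assuming the project follows it; I don't have its exact code). Because the hinge objective bounds the discriminator loss and needs no gradient-penalty sqrt, it sidesteps the NaN entirely:

```python
import torch
import torch.nn.functional as F

def d_hinge_loss(d_real, d_fake):
    # D wants d_real >= 1 and d_fake <= -1; loss is zero beyond the margin.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def g_hinge_loss(d_fake):
    # G simply pushes D's score on generated samples upward.
    return -d_fake.mean()
```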
Ok, the hinge loss can also get similar performance.
Hmm... the reproduced result seems worse. I sent you my training log; would you kindly help review my settings, or share your logs?
I have already sent the training log to you through email. What data are you using? Have you obtained a good teacher?
@Jankyee Hi, I got the same error! Did you resolve this problem? If so, could you help me? Thanks.
Can you send your training log to me? I also got a worse result.
Hi, I am trying to reproduce the training process, but I get a NaN D_loss after several training steps. I think that is because torch.sqrt in CriterionAdditionalGP returns NaN. I referred to https://github.com/pytorch/pytorch/issues/2534 and added 1e-8 to grad_norm, but it didn't help. Any suggestions?
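For context, here is a sketch of a WGAN-GP-style gradient penalty, assuming CriterionAdditionalGP follows the usual recipe (the function name gradient_penalty and 4D image inputs are assumptions, not the project's actual code). The key point from pytorch/pytorch#2534 is that sqrt has a NaN gradient at 0, so the epsilon must be added to the squared norm *before* the sqrt, not to grad_norm afterwards:

```python
import torch

def gradient_penalty(discriminator, real, fake, eps=1e-8):
    # Interpolate between real and fake samples (assumes 4D image batches).
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_out = discriminator(interp)
    # Gradients of D's output w.r.t. the interpolated inputs.
    grads, = torch.autograd.grad(
        outputs=d_out.sum(), inputs=interp, create_graph=True)
    grad_l2_sum = grads.reshape(grads.size(0), -1).pow(2).sum(dim=1)
    # eps inside the sqrt: sqrt(0) has a NaN gradient without it.
    grad_norm = torch.sqrt(grad_l2_sum + eps)
    return ((grad_norm - 1.0) ** 2).mean()
```

If the NaN persists with the epsilon placed inside the sqrt, the cause is more likely the exploding grad_l2_sum discussed above than the sqrt itself.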