Hi, I ran into a problem during training: D_loss becomes NaN.
I have tried several times, and the problem always appears at around 2600 steps.
I loaded the teacher ckpt and only changed data-dir and batch-size (from 8 to 6, because of limited memory) in "run_train_val.sh".
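In case it helps with debugging, here is a minimal sketch of a check that finds the first step at which a logged loss is no longer finite. The function name `first_non_finite_step` and the example values are hypothetical and not taken from the repo; in the real training loop, D_loss would first have to be converted to a plain Python float.

```python
import math


def first_non_finite_step(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


# Hypothetical logged values for illustration, not real training output.
example_losses = [0.71, 0.68, float("nan"), 0.65]
print(first_non_finite_step(example_losses))
```

Stopping at that step would make it possible to inspect the inputs and the other losses just before the NaN appears.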
Issue 25 seems to describe a similar problem, but I can't find the branch d1ec858 mentioned in that issue.
Could you please give me some advice?
Thank you, and best wishes!