Closed freddy5566 closed 1 year ago
Hi, have you tried to train the model with more than one epoch? The inf loss issue also happened to me, but only in the first epoch. I suspect that it is because, at the beginning of training, the model got some inf loss due to the overflow issue; and it caused the avg loss of the whole epoch to be shown as inf. If you look at the actual loss at each step instead of the average loss, it shouldn't always be inf.
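To illustrate the point about average vs. per-step loss: a minimal sketch (illustrative numbers, not from any real run) of how a single overflowed step early in the epoch makes the reported epoch average `inf`, even though the per-step losses afterwards are perfectly normal:

```python
import math

# Illustrative per-step losses: the first step overflowed to inf
# (e.g. an fp16 intermediate exceeded float16's ~65504 maximum),
# while the remaining steps are finite and reasonable.
step_losses = [float("inf"), 2.31, 2.05, 1.98, 1.87]

# The epoch average is contaminated by the single inf step.
epoch_avg = sum(step_losses) / len(step_losses)
print(epoch_avg)  # inf

# Looking at the finite steps directly shows training is actually fine.
finite_steps = [l for l in step_losses if math.isfinite(l)]
finite_avg = sum(finite_steps) / len(finite_steps)
print(finite_avg)  # ~2.05
```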
> Hi, have you tried to train the model with more than one epoch?
No, I have not tried it yet. I terminated my training when I saw the inf. loss.
> The inf loss issue also happened to me, but only in the first epoch. I suspect that it is because, at the beginning of training, the model got some inf loss due to the overflow issue; and it caused the avg loss of the whole epoch to be shown as inf.
Do you have any suggestions for it? Just wait a couple more epochs? I also noticed that you used cross-entropy for the first three epochs in this project. I tried a similar approach, but I still got inf loss during training.
Yes, after the first epoch, you should be able to see normal loss values. Let me know if you have further questions!
Thank you for your help, you saved my day! After a couple of epochs, the loss finally became normal. I really appreciate your suggestions. Thank you!
That's great! Nice to hear! I am closing this issue now. Feel free to reopen it if you have further questions!
Hi,
I noticed that every script in this repo runs in fp16, including the scripts that use adaptive input. Previously, I encountered a gradient overflow issue when I ran the Adaptive Input Representations example from the fairseq examples. I just want to know whether you modified any part of the code base to make fp16 training work.
Here is the reference for this issue: https://github.com/facebookresearch/fairseq/issues/4293
Thank you in advance.
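For context, a minimal sketch (using numpy as a stand-in; fairseq's actual overflow detection and dynamic loss scaling live in its fp16 optimizer) of the underlying issue: float16 can only represent magnitudes up to about 65504, so larger intermediate values overflow to inf, which is what triggers fairseq's "gradient overflow" warnings:

```python
import numpy as np

# float16 tops out around 65504; doubling a large fp16 value overflows to inf.
x = np.float16(60000.0)
with np.errstate(over="ignore"):
    y = x * np.float16(2.0)
print(np.isinf(y))  # True

# A common remedy is to do the sensitive arithmetic in fp32 instead:
z = np.float32(x) * np.float32(2.0)
print(z)  # 120000.0, finite
```

This is also why the inf loss tends to disappear after the first epoch: fairseq's dynamic loss scaler skips overflowing steps and lowers the loss scale until the arithmetic stays in range.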