Closed freddy5566 closed 1 year ago
Hi, have you tried to train the model with more than one epoch? The inf loss issue also happened to me, but only in the first epoch. I suspect that it is because, at the beginning of training, the model got some inf loss due to the overflow issue; and it caused the avg loss of the whole epoch to be shown as inf. If you look at the actual loss at each step instead of the average loss, it shouldn't always be inf.
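To illustrate the point about average vs. per-step loss: a minimal sketch (illustrative numbers, not from any real run) of how a single overflowed step early in the epoch makes the reported epoch average `inf`, even though the per-step losses afterwards are perfectly normal:

```python
import math

# Illustrative per-step losses: the first step overflowed to inf
# (e.g. an fp16 intermediate exceeded float16's ~65504 maximum),
# while the remaining steps are finite and reasonable.
step_losses = [float("inf"), 2.31, 2.05, 1.98, 1.87]

# The epoch average is contaminated by the single inf step.
epoch_avg = sum(step_losses) / len(step_losses)
print(epoch_avg)  # inf

# Looking at the finite steps directly shows training is actually fine.
finite_steps = [l for l in step_losses if math.isfinite(l)]
finite_avg = sum(finite_steps) / len(finite_steps)
print(finite_avg)  # ~2.05
```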
> Hi, have you tried to train the model with more than one epoch?
No, I have not tried it yet. I terminated my training when I saw the inf. loss.
> The inf loss issue also happened to me, but only in the first epoch. I suspect that it is because, at the beginning of training, the model got some inf loss due to the overflow issue; and it caused the avg loss of the whole epoch to be shown as inf.
Do you have any suggestions for it? Just wait a couple more epochs? I also noticed that you used cross-entropy for the first three epochs in this project. I tried a similar approach, but I still got inf loss during training.
Yes, after the first epoch, you should be able to see normal loss values. Let me know if you have further questions!
Thank you for your help, you saved my day! After a couple of epochs, the loss finally became normal. I really appreciate your suggestions. Thank you!
That's great! Nice to hear! I am closing this issue now. Feel free to reopen it if you have further questions!
Hi,
I noticed that every script in this repo runs in fp16, including the scripts that use adaptive input. Previously, I encountered a gradient overflow issue when I ran the Adaptive Input Representations example from the fairseq examples. I just want to know whether you modified any part of the code base to make fp16 training work.
Here is the reference for this issue: https://github.com/facebookresearch/fairseq/issues/4293
Thank you in advance.
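For context, a minimal sketch (using numpy as a stand-in; fairseq's actual overflow detection and dynamic loss scaling live in its fp16 optimizer) of the underlying issue: float16 can only represent magnitudes up to about 65504, so larger intermediate values overflow to inf, which is what triggers fairseq's "gradient overflow" warnings:

```python
import numpy as np

# float16 tops out around 65504; doubling a large fp16 value overflows to inf.
x = np.float16(60000.0)
with np.errstate(over="ignore"):
    y = x * np.float16(2.0)
print(np.isinf(y))  # True

# A common remedy is to do the sensitive arithmetic in fp32 instead:
z = np.float32(x) * np.float32(2.0)
print(z)  # 120000.0, finite
```

This is also why the inf loss tends to disappear after the first epoch: fairseq's dynamic loss scaler skips overflowing steps and lowers the loss scale until the arithmetic stays in range.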