microsoft / IRNet

An algorithm for cross-domain NL2SQL
MIT License

getting loss as 'nan' after 1st epoch only? #47

Open shubhamk16 opened 4 years ago

shubhamk16 commented 4 years ago

Hello guys, just like GloVe, I created a dictionary of all the possible words, with words as keys and 768-dimensional BERT embedding vectors as values. But when I use this dictionary to train the model, the loss becomes NaN within the 1st epoch. 1) How can I handle this problem? 2) What are the possible reasons for getting a NaN loss? 3) Is building a dictionary of embedding vectors a good approach?
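One quick thing worth checking before anything else: if even a single vector in the dictionary contains NaN or Inf, it will propagate through the network and make the loss NaN. A minimal sketch of that sanity check (the `emb` dictionary and its entries here are hypothetical stand-ins for your word-to-vector table, not IRNet code):

```python
import torch

# hypothetical embedding dictionary: word -> 768-dim BERT vector
emb = {
    "select": torch.randn(768),
    "from": torch.randn(768),
    "where": torch.randn(768),
}

# collect any word whose vector contains NaN or Inf; a single bad
# entry is enough to turn the training loss into NaN
bad = [w for w, v in emb.items() if not torch.isfinite(v).all()]
print(bad)  # [] when every vector is finite
```

Also note that BERT embeddings are contextual: precomputing one fixed vector per word (as with GloVe) discards the context-dependence that BERT provides, so even when training is stable this may underperform running the encoder on full sentences.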

alkaideemo commented 4 years ago

I ran into a similar problem. The loss computation here is not numerically stable: https://github.com/microsoft/IRNet/blob/c32946061b6640a4a715e663ba17bceeca2a05bf/src/model.py#L308-L309 https://github.com/microsoft/IRNet/blob/c32946061b6640a4a715e663ba17bceeca2a05bf/src/model.py#L479-L480 I added a small number before the log operation, and that solved the problem.
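The failure mode is `log(0) = -inf`: if any action ever gets probability 0, the negative log-likelihood becomes infinite and subsequent gradients become NaN. A minimal sketch of the epsilon fix described above (the function names are illustrative, not the ones in `src/model.py`):

```python
import torch

def unstable_nll(probs):
    # log(0) = -inf, so one zero probability poisons the whole loss
    return -torch.log(probs).mean()

def stable_nll(probs, eps=1e-8):
    # adding a small epsilon before the log keeps the result finite
    return -torch.log(probs + eps).mean()

probs = torch.tensor([0.9, 0.0, 0.5])  # one action with zero probability
print(torch.isinf(unstable_nll(probs)).item())   # True: loss blows up
print(torch.isfinite(stable_nll(probs)).item())  # True: loss stays finite
```

An equivalent (and often cleaner) option is `torch.clamp(probs, min=eps)` before the log, which avoids shifting probabilities that are already well away from zero.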

liguozhanglearner commented 3 years ago

I don't understand how the loss function is computed here.

ersaurabhverma commented 2 years ago

Try reducing the learning rate. Your gradient is exploding because the learning rate is too high.
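A minimal sketch of both mitigations together: a smaller learning rate plus gradient-norm clipping, which caps how large a single update can be even on a bad batch. The model and data here are placeholders, not IRNet's; the value `1e-4` is an illustrative reduction, not a tuned setting:

```python
import torch

model = torch.nn.Linear(4, 1)
# lower the learning rate (e.g. from 1e-3 down to 1e-4)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

opt.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
# clip the global gradient norm so one outlier batch cannot
# produce an exploding update that drives the weights to NaN
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

If the loss still hits NaN with a tiny learning rate, the cause is more likely the `log(0)` instability discussed earlier in this thread than an exploding gradient.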