pandeydivesh15 / NER-using-Deep-Learning

A project on achieving Named-Entity Recognition using Deep Learning.
MIT License

loss: nan from the start of training #2

Open jenniferzhu opened 7 years ago

jenniferzhu commented 7 years ago

For the English datasets, the loss has been nan since training started. I tried matching the Keras and TensorFlow versions (except that I run on CPU), but it didn't help. I also tried different datasets and optimizers; none of them helped. Could you please share some thoughts on troubleshooting?

FYI, every output looks the same before m.train(epochs=10) in the "NER-using-Deep-Learning" notebook.

pandeydivesh15 commented 7 years ago

I hope you have all the spaCy data downloaded (needed for getting word vectors).

Different versions can certainly be a problem, but I also faced a similar issue while working on this. The value of dropout was actually causing the problem. Try replacing m.make_and_compile() with m.make_and_compile(units=100, dropout=0.0, regul_alpha=0.0001) in the main IPython notebook. Also, for better results, if you have enough memory, you can increase the word vector dimension to 300: change the value of self.LEN_WORD_VECTORS (in process_data.py) to 300.
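A minimal sketch of the suggested change, using a hypothetical stand-in for the repo's model class (in the real code, make_and_compile builds and compiles the Keras network; the default dropout value here is assumed):

```python
# Hypothetical stand-in mirroring the make_and_compile() signature,
# only to illustrate the suggested hyperparameter change.
class Model:
    def make_and_compile(self, units=100, dropout=0.2, regul_alpha=0.0001):
        # The real method builds and compiles the Keras model;
        # here we just record the chosen settings.
        self.config = {"units": units, "dropout": dropout,
                       "regul_alpha": regul_alpha}
        return self.config

m = Model()
# Instead of the default m.make_and_compile(), disable dropout explicitly:
cfg = m.make_and_compile(units=100, dropout=0.0, regul_alpha=0.0001)
```

With dropout set to 0.0, only the L2 penalty (regul_alpha) regularizes the network.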

jenniferzhu commented 7 years ago

Thanks for the prompt reply, Divesh! Do you mind checking which version of the spaCy library you are using? I noticed that my spaCy gave slightly different results in the spaCy tests, but no errors were reported. And thank you again for the suggestions on the Keras code. I will try that and see what happens.

pandeydivesh15 commented 7 years ago

You’re welcome, @jenniferzhu. I am currently using spacy==1.6.0.

jenniferzhu commented 7 years ago

Hi Divesh, I modified the code as you suggested. It does solve the "loss: nan" issue, but now the model predicts the category "O" for every token. I guess dropout helps with the imbalanced dataset. Are there other things you have done to balance the dataset?

pandeydivesh15 commented 7 years ago

Do your versions of Keras and TensorFlow match Keras==1.2.1 and tensorflow==0.12.1?

pandeydivesh15 commented 7 years ago

You can try changing regul_alpha while keeping dropout at 0. Also, if you have sufficient memory, try increasing the vector length to 300; it should boost the results. Regarding the imbalanced dataset, try pruning it with some conditions, e.g. maximum sentence length below a threshold, at least one entity in every sentence, etc.
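The pruning conditions above can be sketched as a simple filter (the threshold, the (tokens, tags) data format, and the BIO-style "O" tag for non-entities are assumptions):

```python
# Sketch of the suggested dataset pruning: drop overly long sentences
# and sentences that contain no entity at all.
MAX_LEN = 50  # assumed threshold on sentence length

def prune(sentences):
    """Keep sentences shorter than MAX_LEN tokens that contain
    at least one non-'O' entity tag."""
    kept = []
    for tokens, tags in sentences:
        if len(tokens) < MAX_LEN and any(t != "O" for t in tags):
            kept.append((tokens, tags))
    return kept

data = [
    (["John", "lives", "here"], ["B-PER", "O", "O"]),  # kept: has an entity
    (["nothing", "to", "see"], ["O", "O", "O"]),       # dropped: no entity
]
pruned = prune(data)
```

Dropping all-"O" sentences raises the fraction of entity tokens the model sees, which makes the trivial all-"O" prediction less attractive.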

jenniferzhu commented 7 years ago

These are great tips! I got results similar to yours by reinstalling spaCy and using your parameters! Thank you, Divesh! BTW, do you have any recommended webpages so that I can read up on why and how to tune those parameters?

pandeydivesh15 commented 7 years ago

Currently I don't have any NER-specific webpages, but if you like, you can go through these links: how_to_choose_a_neural_network's_hyper-parameters, distill.pub

jenniferzhu commented 7 years ago

This is great! Thanks Divesh. You’re an expert!

pandeydivesh15 commented 7 years ago

definitely not an expert :smile:

jenniferzhu commented 7 years ago

@pandeydivesh15 Your suggestions worked great for the English dataset, but the same issue occurred again when I switched datasets. I am trying to understand what your logic was when you tuned the model, in order to avoid the "O" prediction for all cases. I tried a grid search over learning rates, but it did not help. Could you please share your logic for tuning the model?

pandeydivesh15 commented 7 years ago

Sorry for the late reply. I had no specific logic while tuning the model. The arguments to m.make_and_compile() were the important factor, the most important being dropout. In my case, setting dropout helped on the Hindi dataset while it was failing for English.
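Since there was no fixed tuning logic, one systematic option is a small grid search over the two arguments that mattered most, dropout and regul_alpha. A sketch, where train_and_score is a hypothetical stand-in for building, training, and evaluating the model on a validation set:

```python
# Hypothetical grid search over the two make_and_compile() arguments
# that mattered most in this thread: dropout and regul_alpha.
import itertools

def train_and_score(dropout, regul_alpha):
    # Stand-in for: m.make_and_compile(units=100, dropout=dropout,
    # regul_alpha=regul_alpha); m.train(epochs=10); then return a
    # validation metric such as entity-level F1.
    return 0.0  # replace with the real validation score

grid = list(itertools.product([0.0, 0.1, 0.2],    # dropout candidates
                              [1e-4, 1e-3]))      # regul_alpha candidates
best = max(grid, key=lambda params: train_and_score(*params))
```

Scoring with entity-level F1 rather than token accuracy is important here, because a model that predicts "O" everywhere can still score a high token accuracy on an imbalanced dataset.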