jenniferzhu opened this issue 7 years ago
I hope you have all spacy data downloaded (for getting vectors).
Different versions can surely be a problem.
But I also faced a similar problem while I was working on this.
Actually, the value of `dropout` was creating problems. Try replacing `m.make_and_compile()` with `m.make_and_compile(units=100, dropout=0.0, regul_alpha=0.0001)` in the main IPython notebook.
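For reference, here is that change as a notebook cell (a minimal sketch; `m` is the model object already constructed in the repo's notebook):

```python
# Replace the default call
#   m.make_and_compile()
# with explicit hyperparameters (units, dropout, and L2 alpha as suggested above):
m.make_and_compile(units=100, dropout=0.0, regul_alpha=0.0001)
```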
Also, for better results, if you have enough memory, you can increase the word-vector dimension to 300: change the value of `self.LEN_WORD_VECTORS` (in process_data.py) to 300.
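A sketch of that edit, assuming `LEN_WORD_VECTORS` is set in the data-processing class's constructor (the class name here is hypothetical):

```python
# process_data.py (sketch): switch to 300-dimensional word vectors.
# Needs the larger spaCy vector data and more memory.
class ProcessData:  # hypothetical name; edit the actual class in the repo
    def __init__(self):
        self.LEN_WORD_VECTORS = 300  # previously a smaller dimension
```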
Thanks for the prompt reply, Divesh! Do you mind checking which version of the spacy library you are using? I did notice that my spacy gave slightly different results in the spacy tests, but no errors were reported. And thank you again for the suggestions on the Keras code. I will try that and see what happens.
You're welcome @jenniferzhu. I am currently using spacy==1.6.0.
Hi Divesh, I modified the code as you suggested. It does solve the "loss: nan" issue, but now it predicts every token as the category "O". I guess dropout helps with the imbalanced dataset. Are there other things you did to balance the dataset?
Do your Keras and TensorFlow versions match Keras==1.2.1 and tensorflow==0.12.1?
You can try changing `regul_alpha` while keeping `dropout` at 0, for example with a small sweep like the one below. Also, if you have sufficient memory, try changing the vector length to 300; it will boost results.
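Something like this (a sketch; `m.make_and_compile()` and `m.train(epochs=10)` are the calls from the notebook, and the alpha values are just examples):

```python
# Sweep the L2 penalty with dropout pinned at 0.0 and compare runs.
for alpha in (1e-5, 1e-4, 1e-3):
    m.make_and_compile(units=100, dropout=0.0, regul_alpha=alpha)
    m.train(epochs=10)
```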
Regarding the imbalanced dataset, try pruning it using some conditions, e.g. maximum sentence length below a threshold, at least one entity in every sentence, etc.; a minimal sketch follows below.
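A pruning pass along those lines, assuming each sentence is a (tokens, tags) pair with "O" marking non-entity tokens (the names and threshold are illustrative):

```python
MAX_LEN = 50  # assumed threshold; tune for your corpus

def prune(sentences, max_len=MAX_LEN):
    """Keep short sentences that contain at least one entity tag."""
    return [
        (tokens, tags)
        for tokens, tags in sentences
        if len(tokens) < max_len and any(tag != "O" for tag in tags)
    ]
```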
These are great tips! I got results similar to yours by reinstalling spacy and using your parameters! Thank you, Divesh! BTW, do you have any recommended webpages where I can read up on why and how to tune those parameters?
Currently, I don't have any specific webpages (for NER), but if you like, you can go through these links: [how_to_choose_a_neural_network's_hyper-parameters](http://neuralnetworksanddeeplearning.com/chap3.html#how_to_choose_a_neural_network's_hyper-parameters) and [distill.pub](https://distill.pub/).
This is great! Thanks Divesh. You’re an expert!
definitely not an expert :smile:
@pandeydivesh15 Your suggestions worked great for the English dataset, but the same issue occurred again when I switched datasets. I am trying to understand what your logic was when you tuned the model, in order to avoid the all-"O" prediction. I tried a grid search over learning rates, but it did not help. Can you please share your logic for tuning the model?
Sorry for the late reply. I had no specific logic while tuning the model. The arguments to `m.make_and_compile()` played an important role; the most important one was `dropout`. In my case, setting dropout helped on the Hindi dataset while it was failing for English.
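So it was mostly trial and error per dataset; a sketch of such a sweep (same notebook calls as above, the dropout values are just examples):

```python
# Try a few dropout settings and watch for nan loss or all-"O"
# predictions; the best value differed between the Hindi and English data.
for dr in (0.0, 0.2, 0.5):
    m.make_and_compile(units=100, dropout=dr, regul_alpha=0.0001)
    m.train(epochs=10)
```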
For the English datasets, the loss is nan from the moment training starts. I tried matching the Keras and TensorFlow versions (except that I use a CPU), but it doesn't help. I also tried different datasets and optimizers; none of them helped. Can you please share some thoughts on troubleshooting?
FYI, every output looks the same up to `m.train(epochs=10)` in the "NER-using-Deep-Learning" notebook.
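One quick sanity check before `m.train(epochs=10)` is to look for nan/inf in the arrays being fed to the model; `X_train` and `y_train` below are hypothetical names for whatever inputs the notebook builds:

```python
import numpy as np

# nan loss from the very first batch often means nan/inf values are
# already in the data (e.g. tokens that got no word vector). Check first:
for name, arr in (("X_train", X_train), ("y_train", y_train)):
    arr = np.asarray(arr, dtype="float64")
    print(name, "nan:", np.isnan(arr).any(), "inf:", np.isinf(arr).any())
```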