lantianx2020 closed this issue 5 years ago
It is related to the way fastText manages n-grams. Unigrams are stored in the dictionary, but n-grams are handled by the hashing trick. Because hashing implies collisions, the more n-grams you have, the more collisions you get. To mitigate this, you can increase the size of the hash table (I have used up to 2^30; usually 2^25 is good enough for everything). But if just adding bigrams decreases the quality of your model, chances are that n-grams are not a good choice in your case.
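To make the collision point concrete, here is a small self-contained sketch of the hashing trick. It is illustrative only: fastText's real hash is an FNV-1a variant, and `zlib.crc32` here is just a stand-in; the bigram vocabulary is synthetic.

```python
# Sketch of the hashing trick fastText uses for word n-grams:
# each n-gram is hashed into one of `bucket` slots, so distinct
# n-grams can collide and end up sharing an embedding row.
import zlib

def bucket_of(ngram, bucket):
    # Map an n-gram string to a slot in [0, bucket).
    # (Stand-in hash; fastText actually uses an FNV-1a variant.)
    return zlib.crc32(ngram.encode("utf-8")) % bucket

def collision_rate(ngrams, bucket):
    # Fraction of n-grams that landed in an already-occupied slot.
    slots = {bucket_of(g, bucket) for g in ngrams}
    return 1 - len(slots) / len(ngrams)

# 90,000 synthetic bigrams: a smaller table forces more collisions.
bigrams = [f"word{i} word{j}" for i in range(300) for j in range(300)]
small = collision_rate(bigrams, 2 ** 16)  # tight table: many collisions
large = collision_rate(bigrams, 2 ** 25)  # roomy table: few collisions
print(f"bucket=2^16: {small:.3f}, bucket=2^25: {large:.3f}")
```

With 90,000 bigrams and only 2^16 = 65,536 slots, collisions are unavoidable by the pigeonhole principle; at 2^25 slots they become rare. This is why raising the hash-table size helps once n-grams are enabled.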
@pommedeterresautee Thanks for your reply! Do you have any idea in what cases n-grams do not improve the performance of training? Thank you!
When the task works without bigrams :-) Some topic-classification tasks do fine with unigrams alone, for example when your texts are long. N-grams are a way to capture local word order, which is very useful for sentiment classification, for instance.
Hi, I noticed that the accuracy of my predictions (using the predict function) decreased from 0.9 to 0.8 when I increased the wordNgrams parameter from 1 to 2 during training. As I kept increasing wordNgrams, the accuracy kept decreasing, and it even hit 0.002 when wordNgrams reached 5. I am really confused, since I thought increasing wordNgrams would improve the performance of training. Can anyone tell me what's going on? Thanks!
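For reference, the hash-table size discussed above is the `-bucket` option of the fastText command line (its default is 2000000). A hedged sketch of a training run with bigrams and a larger table; `train.txt` and `model` are placeholder paths:

```shell
# Train a supervised model with bigrams but a larger n-gram hash
# table to reduce collisions. Paths are placeholders.
./fasttext supervised -input train.txt -output model \
    -wordNgrams 2 -bucket 33554432   # 33554432 = 2^25 slots
```

If accuracy still drops as wordNgrams grows even with a larger bucket, that is a sign n-grams simply do not help on this dataset, as noted above.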