@nzw0301 Hi, thank you very much for your valuable comments! I just tried 100 epochs with my preprocessing (cleaning and tokenizing the text as in Kim (2014), removing stop words with NLTK, and removing words that appear fewer than 5 times).
Now the accuracy for fastText is 0.7876 and the accuracy for fastText (bigrams) is 0.7978.
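For reference, the bigram run corresponds to something like the following sketch using the official fasttext Python bindings (the file name is a placeholder, and all other hyperparameters are left at their defaults):

```python
import fasttext

# 100 epochs instead of the default 5; wordNgrams=2 adds bigram
# features, i.e. the "fastText (bigrams)" setting.
model = fasttext.train_supervised(input="20ng.train",  # placeholder path
                                  epoch=100,
                                  wordNgrams=2)
```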
Thanks again for your comments and correction!
Thank you for your quick response and experiment. My pleasure.
I read the paper "Graph Convolutional Networks for Text Classification" on arXiv. I noticed a strange point in Table 2 and a typo, so I am reporting them here.
Summary: fastText's test accuracy is too low on 20NG.
Pre-processing
I used the script below to normalize train and test data.
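The script does roughly the following (a minimal sketch, not the exact code; the "label<TAB>text" input format and the file paths are assumptions, and clean_str follows the cleaning regexes of Kim (2014)):

```python
import re
from nltk.corpus import stopwords  # requires nltk.download("stopwords")

STOP = set(stopwords.words("english"))

def clean_str(s):
    """Kim (2014)-style cleaning: keep word characters and basic
    punctuation, split off clitics, collapse whitespace, lowercase."""
    s = re.sub(r"[^A-Za-z0-9(),!?'`]", " ", s)
    s = re.sub(r"'s", " 's", s)
    s = re.sub(r"'ve", " 've", s)
    s = re.sub(r"n't", " n't", s)
    s = re.sub(r"'re", " 're", s)
    s = re.sub(r"'d", " 'd", s)
    s = re.sub(r"'ll", " 'll", s)
    s = re.sub(r"([,!?()])", r" \1 ", s)
    s = re.sub(r"\s{2,}", " ", s)
    return s.strip().lower()

def normalize(in_path, out_path):
    """Write fastText-format lines: '__label__<class> <tokens>'.
    Assumes one 'label<TAB>text' document per input line."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            label, text = line.rstrip("\n").split("\t", 1)
            tokens = [w for w in clean_str(text).split() if w not in STOP]
            fout.write("__label__{} {}\n".format(label, " ".join(tokens)))

normalize("20ng.train.tsv", "20ng.train")  # placeholder paths
normalize("20ng.test.tsv", "20ng.test")
```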
Training
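With default hyperparameters, training looks like this (again a sketch with the fasttext Python bindings; the command-line tool takes the same options):

```python
import fasttext

# All hyperparameters left at their defaults; in particular epoch=5.
model = fasttext.train_supervised(input="20ng.train")
model.save_model("20ng.bin")  # placeholder path
```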
Evaluate
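Evaluation, sketched the same way (model.test returns the number of examples, precision@1, and recall@1; P@1 equals accuracy for single-label data such as 20NG):

```python
import fasttext

model = fasttext.load_model("20ng.bin")  # placeholder path
n, p_at_1, r_at_1 = model.test("20ng.test")
print(p_at_1)
```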
So the accuracy is 0.769, but Table 2 reports 0.1138. Even if I do not remove stopwords using NLTK, this accuracy is too low in my experience. For the same reason, the result for fastText (bigrams) also seems to be unfair. The default number of epochs, 5, is too small to learn a classifier because 20NG is a small corpus. I suggest that you use a larger number of epochs for a fair comparison.

Misc
The BibTeX entry for the Adam paper seems to be wrong: the authors are Diederik P. Kingma and Jimmy Lei Ba, but your paper cites Kinga, D., and Adam, J. B.
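A corrected entry would look something like this (venue details filled in from the ICLR 2015 publication of the Adam paper):

```bibtex
@inproceedings{kingma2015adam,
  author    = {Diederik P. Kingma and Jimmy Lei Ba},
  title     = {Adam: {A} Method for Stochastic Optimization},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2015}
}
```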