Closed by shlomihod 6 years ago
Data: Improvement with unlabeled negative training data
Corpus:
- Weekly Reader (2400 articles)
- Encyclopedia Britannica
- Britannica Elementary
- CNN - Western/Pacific Literacy Network
- TIPSTER - negative training data (treated as grade 5+)
- Kidspot - Washington Post - generalization test
Model: SVM
Lexical, syntactic and other traditional features
A2.2 Didn't introduce the length of the article as a feature.
They also tried regression (not just classification).
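A minimal sketch of how unlabeled negative training data could be folded into a classification setup. It assumes one binary detector per grade (one-vs-rest), which these notes do not confirm, and it assumes the unlabeled newspaper-style text is harder than every labeled grade, so it counts as a negative example for each detector. The document ids, the grade range 2-5 and the function name are hypothetical; the SVM training itself is left out.

```python
from typing import Dict, List, Optional, Tuple

# (document id, grade). A grade of None marks unlabeled text that is
# only ever used as a negative example (hypothetical convention).
Doc = Tuple[str, Optional[int]]


def one_vs_rest_labels(
    docs: List[Doc], grades: List[int]
) -> Dict[int, List[Tuple[str, int]]]:
    """Build one binary labeling per grade: +1 for that grade, -1 otherwise.

    Unlabeled documents (grade None) never match a grade, so they end up
    as -1 in every task, i.e. as extra negative training data.
    """
    tasks: Dict[int, List[Tuple[str, int]]] = {}
    for g in grades:
        tasks[g] = [(doc_id, 1 if grade == g else -1) for doc_id, grade in docs]
    return tasks


# Toy data: four graded articles plus two unlabeled negatives.
docs: List[Doc] = [
    ("graded_001", 2),
    ("graded_002", 3),
    ("graded_003", 4),
    ("graded_004", 5),
    ("unlabeled_001", None),
    ("unlabeled_002", None),
]

tasks = one_vs_rest_labels(docs, grades=[2, 3, 4, 5])
```

Each entry in `tasks` could then be paired with feature vectors and passed to a binary SVM trainer. A regression variant would instead keep the numeric grade as the target, which requires assigning some numeric value to the unlabeled documents.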
https://pdfs.semanticscholar.org/288f/5d916d90d986b23d06660d89c71adfb1bf92.pdf