kecaps closed 10 years ago
ping @japerk
Thanks for the updates @kecaps. If you have the time, I'd really appreciate more tests in tests/train_classifier.sh, especially for multi binary classifiers. This uses http://github.com/bmizerany/roundup to check script output. Also, any functions you want to extract for use elsewhere can be put in a module in the nltk_trainer package, or one of the subpackages (like featx).
I found your nltk-trainer, and it worked great as a basis for comparing different classifiers for my project. While working with it, I fixed a few bugs and ran into some pain points with large datasets. I changed some code to use generators rather than lists for intermediate processing, refactored the code to read the dataset only once, and changed the word feature scoring to be based only on the training set rather than the test set.
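To give a rough idea of the shape of these changes, here is a minimal sketch. It is not the actual nltk-trainer code: the function names (`split_once`, `iter_words`, `top_words`, `featurize`) are hypothetical, and raw word frequency stands in for whatever scoring function the trainer is configured with. The point is only that the dataset is materialized once, intermediate steps are generators, and the vocabulary is chosen from the training split alone.

```python
from collections import Counter


def split_once(instances, train_fraction=0.75):
    """Read the dataset a single time, then split it into train and test."""
    data = list(instances)  # the only place the full dataset is materialized
    cutoff = int(len(data) * train_fraction)
    return data[:cutoff], data[cutoff:]


def iter_words(instances):
    """Lazily yield (word, label) pairs instead of building a list."""
    for words, label in instances:
        for word in words:
            yield word, label


def top_words(train, n):
    """Pick the n highest-scoring words using the training split only.

    Assumption: raw frequency is used as the score here for brevity.
    """
    counts = Counter(word for word, _ in iter_words(train))
    return {word for word, _ in counts.most_common(n)}


def featurize(instances, vocab):
    """Lazily yield (featureset, label) pairs restricted to vocab."""
    for words, label in instances:
        yield {word: True for word in words if word in vocab}, label


# Hypothetical toy dataset of (words, label) pairs.
dataset = [
    (["good", "fine"], "pos"),
    (["bad", "good"], "neg"),
    (["good"], "pos"),
    (["unseen", "unseen", "unseen"], "neg"),
]

train, test = split_once(dataset, train_fraction=0.75)
vocab = top_words(train, n=1)
train_feats = featurize(train, vocab)
test_feats = featurize(test, vocab)
```

Because `vocab` is computed from `train` alone, words that occur only in the test split (like `"unseen"` above) can never be selected as features, no matter how frequent they are there.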