Closed kisabaka closed 13 years ago
Thanks for this. The only thing I don't like is the nmin_one arg (though I do understand why you added it). But I think it'd be better to support multiple ngram flags, like --ngrams 1 --ngrams 2 --ngrams 3 ... Since argparse supports an "append" option, you up for adding that?
I've changed the action of ngrams to "store". Have a look at https://github.com/dougk7/nltk-trainer/blob/master/train_classifier.py#L189 and let me know what you think
Pretty slick, though I don't want to lose --bigrams. Maybe --bigrams implies --ngrams 1 --ngrams 2, unless --ngrams is already given. If --ngrams and --bigrams are both given, then it could either raise ValueError (which is something I do in the other train_*.py scripts when two or more args are incompatible) or --bigrams would be equivalent to --ngrams 2. Thoughts? And I'm happy to add --bigrams back in myself if you're not interested, though it won't be for at least a week.
Ok, you're already ahead of me updating the documentation :) And --ngrams 1 --ngrams 2 isn't that much extra typing compared to --bigrams, so you can ignore most of my previous comment. And thanks again!
Currently train_classifier only supports unigrams and bigrams. I added support for ngrams of length > 2 but kept the current behaviour of the --bigrams flag.