Experiment with different settings of the StringToWordVector filter:
- Different tokenizer
- Different stemmer
- Different stopword list or no stopword list
- Different minimum word frequency
- Different number of words to keep
- Different pruning
...
Document how each of these settings influences the results.
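
To make the comparison concrete, here is a minimal, library-free sketch of how a few of the settings above change the resulting vocabulary. It is a toy stand-in and not Weka's StringToWordVector: the class `BowSettingsSketch`, the method `vocabulary`, and its parameters are hypothetical names chosen for illustration, and they only loosely mirror the tokenizer, stopword list, minimum word frequency, and number-of-words-to-keep options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class BowSettingsSketch {

    /**
     * Builds a vocabulary under one combination of settings (all hypothetical,
     * not Weka's API): a delimiter regex standing in for the tokenizer, a
     * stopword set, a minimum corpus-wide word frequency, and a cap on the
     * number of words to keep.
     */
    public static List<String> vocabulary(List<String> docs,
                                          String delimiterRegex,
                                          Set<String> stopwords,
                                          int minWordFreq,
                                          int wordsToKeep) {
        // TreeMap keeps words in alphabetical order, which makes ties deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            for (String token : doc.toLowerCase().split(delimiterRegex)) {
                if (token.isEmpty() || stopwords.contains(token)) {
                    continue;
                }
                counts.merge(token, 1, Integer::sum);
            }
        }

        // Drop words below the minimum frequency.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.removeIf(e -> e.getValue() < minWordFreq);

        // Most frequent first; the sort is stable, so ties stay alphabetical.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        // Keep at most wordsToKeep words.
        List<String> vocab = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (vocab.size() == wordsToKeep) {
                break;
            }
            vocab.add(e.getKey());
        }
        return vocab;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "The cat sat on the mat",
                "the dog sat",
                "a cat and a dog");
        Set<String> noStopwords = Collections.emptySet();
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "a", "and", "on"));

        // Print one line per configuration so the effect of each setting is visible.
        System.out.println("baseline:        " + vocabulary(docs, "\\s+", noStopwords, 1, 100));
        System.out.println("with stopwords:  " + vocabulary(docs, "\\s+", stopwords, 1, 100));
        System.out.println("min freq 2:      " + vocabulary(docs, "\\s+", stopwords, 2, 100));
        System.out.println("keep 2 words:    " + vocabulary(docs, "\\s+", stopwords, 2, 2));
    }
}
```

Changing one setting at a time against a fixed baseline, as `main` does, makes it easier to attribute a change in the results to a single setting. For the real experiment, the same one-at-a-time pattern would apply to the actual filter options and to classifier results rather than to vocabulary contents.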
Original issue reported on code.google.com by markus.neubrand on 5 Apr 2011 at 11:58