Closed: BasicallyOk closed this issue 7 months ago
924d8f5 Uses NLTK stop words and tokenizer to support pruning. Language support now depends on which languages NLTK itself supports (most popular languages).
close issue if done @BasicallyOk
I completely forgot, this issue was solved as part of #32 with NLTK tokenizer.
Issue persists, will fix in #66
Is your feature request related to a problem? Please describe.
Word Cloud currently removes hyphens (-) and single quotes ('). This causes short forms (contractions) like "can't" or "I've" to be treated as new words.
Describe the solution you'd like
Possibly support custom rules per language? For example, a language parser that converts every short form to its full form. Hyphenated words are a little more complicated, so each one should be treated as its own word for now.
Describe alternatives you've considered
Record possible short forms in a database. This would be a little too inefficient imo. Custom rules could also be beneficial for more features.
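To make the custom-rules idea concrete, here is a rough sketch in plain Python. It is not the project's implementation (the fix that landed uses NLTK's tokenizer). The names `CONTRACTIONS`, `WORD_RE`, `tokenize` and `top_words` are hypothetical, and the contraction table is a tiny illustrative sample. It assumes English text, keeps apostrophes and hyphens that sit inside a word, expands known short forms to their full form, and leaves hyphenated words as single tokens.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny rule table; a real one would be per-language
# and much larger.
CONTRACTIONS = {
    "can't": "cannot",
    "i've": "i have",
    "won't": "will not",
}

# A word is a run of letters, optionally joined by internal apostrophes or hyphens,
# so "can't" and "well-known" each match as one token.
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def tokenize(text):
    """Lowercase the text, split it into words, and expand known short forms."""
    tokens = []
    # Normalise the typographic apostrophe so "can’t" matches the rule for "can't".
    for match in WORD_RE.finditer(text.replace("\u2019", "'")):
        word = match.group().lower()
        # Known contractions become one or more full words; everything else is kept as-is.
        tokens.extend(CONTRACTIONS.get(word, word).split())
    return tokens


def top_words(text, n=10):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(n)
```

With this, `tokenize("I can't stop; I've seen a well-known cat.")` yields `cannot` and `i have` in place of the short forms, and `well-known` stays one word. Stop-word pruning is left out here; it would be a filter applied to the token list before counting.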
Additional context
The text
Returns
as its 10 most significant words