learntextvis / textkit

Command line tool for manipulating and analyzing text
MIT License

Expanding stopwords corpus #52

Closed irealva closed 7 years ago

irealva commented 7 years ago

Hi textkit team, I was using your awesome little tool but noticed that a ton of languages are missing from the stopwords corpus. I was specifically trying to use a Spanish stopwords list, but many other languages are missing too. I updated my own fork of this repo with the corpus used by NLTK.

I updated the code as thoroughly as I could and ran a local test, but someone else should certainly take a look if you'd like to incorporate it. Otherwise, go ahead and close the PR.
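For anyone following along, here is a rough sketch of what filtering with a Spanish stopwords list amounts to. This is not textkit's actual code: `filter_stopwords` and `SPANISH_STOPWORDS_SAMPLE` are hypothetical names, and the word set is a tiny hand-picked sample rather than the NLTK list.

```python
# Illustrative only: a tiny hand-picked sample, NOT the full NLTK Spanish list.
SPANISH_STOPWORDS_SAMPLE = {"el", "la", "los", "las", "de", "y", "en", "que", "un", "una"}


def filter_stopwords(tokens, stopwords):
    """Return the tokens whose lowercased form is not in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


tokens = "El gato duerme en la casa de Ana".split()
filtered = filter_stopwords(tokens, SPANISH_STOPWORDS_SAMPLE)
print(filtered)  # ['gato', 'duerme', 'casa', 'Ana']
```

With NLTK installed and its stopwords corpus downloaded, the sample set could presumably be swapped for `set(nltk.corpus.stopwords.words("spanish"))`.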

vlandham commented 7 years ago

Thanks very much!

Sorry for the delayed reply. You are completely correct - we should have a more diverse stop word corpus.

I will try to test this out more thoroughly this week or next and then merge.

Thanks again!