kbenoit closed this issue 4 years ago
NLTK has a nice multilingual stopwords list: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip
@koheiw good idea. We should add them all as an NLTK source, so that it's source = "nltk".
Here is a source for ancient Greek and Latin: https://wiki.digitalclassicist.org/Stopwords_for_Greek_and_Latin
https://www.ranks.nl/stopwords includes some interesting discussion of the sources of their stopword lists in 40 languages, with some history of their usage and different versions for some languages, such as English. But they appear to be already in stopwords-iso.
Kevin Bougé's stopword lists: https://sites.google.com/site/kevinbouge/stopwords-lists