quanteda / stopwords

Multilingual Stopword Lists in R
http://stopwords.quanteda.io

Add more stopword sources #3

Closed: kbenoit closed this issue 4 years ago

kbenoit commented 6 years ago

Here is a source for ancient Greek and Latin: https://wiki.digitalclassicist.org/Stopwords_for_Greek_and_Latin

https://www.ranks.nl/stopwords includes an interesting discussion of the sources of their stopword lists in 40 languages, along with some history of their usage and the different versions that exist for some languages, such as English. However, these lists appear to be in stopwords-iso already.

Kevin Bougé's stopword lists: https://sites.google.com/site/kevinbouge/stopwords-lists.

koheiw commented 5 years ago

NLTK has a nice multilingual stopword list: https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/stopwords.zip

kbenoit commented 5 years ago

@koheiw Good idea. We should add them all as an NLTK source, so that they are selected with source = "nltk".
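For illustration only, here is a minimal Python sketch of how the NLTK archive linked above might be turned into per-language lists before being exposed as a source. It is not the package's implementation. It assumes the zip holds a top-level stopwords/ directory with one plain-text file per language (one word per line) plus a README. The names read_stopword_zip and build_demo_zip are hypothetical, and the demo archive is built in memory so nothing is downloaded.

```python
import io
import zipfile


def read_stopword_zip(zip_bytes, root="stopwords"):
    """Return {language: [words]} from a zip laid out as <root>/<language>.

    The layout is an assumption about the NLTK archive, not a documented API.
    """
    lists = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Keep only files directly under the root directory.
            if len(parts) != 2 or parts[0] != root or not parts[1]:
                continue
            language = parts[1]
            # Skip the README that is assumed to ship alongside the lists.
            if language.upper() == "README":
                continue
            text = archive.read(name).decode("utf-8")
            lists[language] = [
                line.strip() for line in text.splitlines() if line.strip()
            ]
    return lists


def build_demo_zip():
    """Build a tiny in-memory archive that mimics the assumed layout."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("stopwords/README", "demo archive\n")
        archive.writestr("stopwords/english", "the\nand\nof\n")
        archive.writestr("stopwords/german", "der\nund\n")
    return buffer.getvalue()


demo_lists = read_stopword_zip(build_demo_zip())
print(sorted(demo_lists))
```

The real archive's bytes could be passed to read_stopword_zip in place of the demo archive. Mapping NLTK's language file names (such as english) to the language codes used by the other sources would be a separate step.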