MangoPomelo closed this 2 years ago.
What is the source of these?
He says this stopword list comes from Baidu, the largest simplified Chinese search engine. I have checked it and removed the entries that contain non-Chinese characters.
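The cleanup described above (dropping entries that contain non-Chinese characters) could be sketched in Python roughly as follows. This is a minimal sketch, not the script that was actually used. It assumes "Chinese characters" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so full-width Latin letters, circled digits, punctuation and the CJK extension blocks all count as non-Chinese. The function name `keep_chinese_only` is hypothetical.

```python
import re

# Assumption: "Chinese character" is approximated by the basic
# CJK Unified Ideographs block U+4E00..U+9FFF; anything else
# (ASCII, full-width letters, symbols, extension blocks) is non-Chinese.
_CJK_ONLY = re.compile(r"[\u4e00-\u9fff]+")


def keep_chinese_only(words):
    """Return the stopwords made up solely of Chinese characters, in order."""
    cleaned = []
    for word in words:
        word = word.strip()  # stopword files usually hold one entry per line
        if word and _CJK_ONLY.fullmatch(word):
            cleaned.append(word)
    return cleaned


raw = ["的", "了", "a", "①", "ＡＢ", "我们", "..."]
print(keep_chinese_only(raw))  # ['的', '了', '我们']
```

Using `fullmatch` rather than `search` means an entry is kept only if every character is in the range, which matches "words which contain non-Chinese characters" being dropped entirely rather than trimmed.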
It would be great to have Chinese in NLTK.
Resolved in https://github.com/nltk/nltk_data/commit/aa54613807a97886516d5f0d13c1374d29bf4257. Sorry for the long delay.
chinese_simplified_stopwords.txt