nltk / nltk_data

NLTK Data

Chinese simplified stopwords #137

Closed MangoPomelo closed 2 years ago

MangoPomelo commented 5 years ago

chinese_simplified_stopwords.txt

stevenbird commented 4 years ago

What is the source of these?

MangoPomelo commented 4 years ago

https://github.com/goto456/stopwords/blob/master/%E7%99%BE%E5%BA%A6%E5%81%9C%E7%94%A8%E8%AF%8D%E8%A1%A8.txt

The author of that repository says the stopword list comes from Baidu, the largest Simplified Chinese search engine. I checked it and removed the entries that contain non-Chinese characters.
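The filtering step described above could look roughly like the sketch below. It is not the script actually used for the attached file. It assumes "Chinese characters" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the helper name `keep_chinese_only` is hypothetical.

```python
import re

# Assumption: an entry counts as "Chinese" only if every character falls in
# the basic CJK Unified Ideographs block (U+4E00..U+9FFF).
_CHINESE_ONLY = re.compile(r"[\u4e00-\u9fff]+")


def keep_chinese_only(words):
    """Return the stripped entries that consist solely of Chinese characters.

    Blank entries and entries with Latin letters, digits, or punctuation
    (ASCII or full-width) are dropped. Input order is preserved.
    """
    kept = []
    for word in words:
        word = word.strip()
        if word and _CHINESE_ONLY.fullmatch(word):
            kept.append(word)
    return kept


if __name__ == "__main__":
    # Hypothetical sample entries, not taken from the real list.
    sample = ["的", "a", "一些", "A1", "，", " 了 \n", ""]
    print(keep_chinese_only(sample))
```

To run this over a file, read it with `open(path, encoding="utf-8")` and pass the file object straight to the function, since it accepts any iterable of lines.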

Alqua commented 3 years ago

It would be great to have Chinese in NLTK.

stevenbird commented 2 years ago

Resolved in https://github.com/nltk/nltk_data/commit/aa54613807a97886516d5f0d13c1374d29bf4257. Sorry for the long delay.