Hello @MissSoYa, we do have custom stop words in DeepcutTokenizer, but not currently in deepcut.tokenize itself. However, for deepcut.tokenize you can remove the stop words manually, e.g. with a list comprehension:
import deepcut
stop_words = ['ฉัน', 'อยาก']
[w for w in deepcut.tokenize('ฉันอยากกินข้าวของฉัน') if w not in stop_words]
>> ['กิน', 'ข้าว', 'ของ']
For DeepcutTokenizer, adding stop words can be done as follows:
from deepcut import DeepcutTokenizer
raw_documents = ['ฉันอยากกินข้าวของฉัน',
'ฉันอยากกินไก่มาก',
'อยากนอนอย่างสงบ']
tokenizer = DeepcutTokenizer(ngram_range=(1, 1), stop_words=['ฉัน', 'อยาก'])
X = tokenizer.fit_transform(raw_documents)  # 'ฉัน' and 'อยาก' will not appear in tokenizer.vocabulary_
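As a quick sanity check, here is a minimal sketch (continuing the snippet above, and assuming tokenizer.vocabulary_ is a dict keyed by token, as the comment above suggests) to confirm the stop words were dropped:
# the stop words should not appear as keys of the fitted vocabulary
assert 'ฉัน' not in tokenizer.vocabulary_
assert 'อยาก' not in tokenizer.vocabulary_
# print the remaining tokens, e.g. 'กิน', 'ข้าว', 'ไก่', ...
print(sorted(tokenizer.vocabulary_))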
👍 Thank you, but DeepcutTokenizer does not work.
Can you give a little more detail and an example of why it doesn't work?
@MissSoYa, if you re-install deepcut from the repository with the proper dependencies, DeepcutTokenizer should now work correctly. I will close the issue for now.
Hello Deepcut team, I would like to manually customize the stop words. Please let me know how.
Thank you :)