Yes, because we have to use deepcut to tokenize Thai text; Thai is written without spaces between words, so we can't just split on whitespace like in English. If you have English text, use CountVectorizer instead.
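For reference, here is a minimal sketch of the two approaches (assuming deepcut is installed; the Thai strings are just placeholder examples). You can also plug deepcut.tokenize into CountVectorizer as a custom tokenizer:

```python
import deepcut
from sklearn.feature_extraction.text import CountVectorizer

# English: splitting on whitespace/word boundaries is enough,
# so the default CountVectorizer analyzer is fast.
english_docs = ["the cat sat", "the dog ran"]
X_en = CountVectorizer().fit_transform(english_docs)

# Thai: no spaces between words, so a neural segmenter like deepcut
# is needed. Each call runs model inference, which is why it is slower.
thai_docs = ["ฉันกินข้าว", "ฉันอยากบิน"]  # placeholder examples
X_th = CountVectorizer(tokenizer=deepcut.tokenize).fit_transform(thai_docs)
```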
OK, since I have to tokenize Thai text, I will keep using deepcut. Would running the process on a GPU make it faster or not? Thank you. // Sorry for asking so many questions, I've only been learning ML for a few months.
CPU is fine. You can tweak the for loop (in DeepcutTokenizer) to use multiprocessing to make it faster.
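Something like the following rough sketch (not the library's actual loop): tokenize the documents in parallel with multiprocessing.Pool, then hand the pre-tokenized lists to CountVectorizer. Note that each worker process loads its own copy of the deepcut model on first use, so there is a one-time startup cost per worker:

```python
from multiprocessing import Pool

import deepcut
from sklearn.feature_extraction.text import CountVectorizer

def tokenize_one(text):
    # Each worker loads the deepcut model lazily on its first call.
    return deepcut.tokenize(text)

if __name__ == "__main__":
    docs = ["ฉันกินข้าว", "ฉันอยากบิน"]  # placeholder examples

    # Tokenize documents in parallel, then let CountVectorizer just count.
    with Pool(processes=4) as pool:
        tokenized = pool.map(tokenize_one, docs)

    # A callable analyzer receives each pre-tokenized document as-is.
    vectorizer = CountVectorizer(analyzer=lambda tokens: tokens)
    X = vectorizer.fit_transform(tokenized)
```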
Thank you so much
Why is the performance of DeepcutTokenizer so much slower than CountVectorizer? With CountVectorizer, fit_transform on 1000 sentences takes about 2-3 seconds, but with DeepcutTokenizer, fit_tranform on the same dataset takes about 5 minutes in the same environment.