rkcosmos / deepcut

A Thai word tokenization library using Deep Neural Network
MIT License

Performance of DeepcutTokenizer #44

Closed Zylinks closed 6 years ago

Zylinks commented 6 years ago

Why is the performance of DeepcutTokenizer so much slower than CountVectorizer? With CountVectorizer, fit_transform on 1000 sentences takes about 2-3 seconds, but with DeepcutTokenizer, fit_transform on the same data set takes about 5 minutes in the same environment.
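
For reference, a minimal sketch of the kind of timing comparison being described (the corpus below is a hypothetical placeholder, and deepcut.tokenize is used directly as a stand-in for the per-document tokenization that DeepcutTokenizer performs internally):

```python
import time

import deepcut
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical placeholder corpus; the original 1000-sentence data set is not shown in this issue.
sentences = ['ตัดคำภาษาไทยด้วยโครงข่ายประสาทเทียม'] * 1000

# Baseline: CountVectorizer with its default regex tokenizer.
start = time.time()
CountVectorizer().fit_transform(sentences)
print('CountVectorizer fit_transform: %.1f s' % (time.time() - start))

# Stand-in for DeepcutTokenizer's per-document work:
# neural-network tokenization of every sentence.
start = time.time()
tokenized = [deepcut.tokenize(s) for s in sentences]
print('deepcut.tokenize over corpus:  %.1f s' % (time.time() - start))
```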

titipata commented 6 years ago

Yes, because we have to use deepcut to tokenize Thai text, rather than just splitting words as in English. If you have English text, use CountVectorizer instead.
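
To illustrate the difference, here is a minimal sketch (the Thai sentence is just an example):

```python
import deepcut

# English: splitting on whitespace is enough for a bag-of-words model.
print('I like machine learning'.split())
# ['I', 'like', 'machine', 'learning']

# Thai: there are no spaces between words, so deepcut's neural network
# has to predict the word boundaries.
print(deepcut.tokenize('ตัดคำได้ดีมาก'))
# ['ตัดคำ', 'ได้', 'ดี', 'มาก']
```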

Zylinks commented 6 years ago

OK, since I need to tokenize Thai text, I will keep using deepcut. Would running the process on a GPU make it faster or not? Thank you. // Sorry for asking so many questions, I have only been learning ML for a few months.

titipata commented 6 years ago

CPU is fine. You can tweak the for loop (in DeepcutTokenizer) to use multiprocessing to make it faster.
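
A minimal sketch of that idea, running deepcut.tokenize over the documents with a multiprocessing pool instead of a plain for loop (the worker count and corpus are placeholders; note that each worker process loads its own copy of the model, so memory use increases):

```python
from multiprocessing import Pool

import deepcut

def tokenize_doc(doc):
    # Each worker process loads its own copy of the deepcut model on first use.
    return deepcut.tokenize(doc)

if __name__ == '__main__':
    docs = ['ตัดคำภาษาไทยด้วยโครงข่ายประสาทเทียม'] * 1000  # placeholder corpus

    with Pool(processes=4) as pool:  # 4 workers is just an example
        tokenized = pool.map(tokenize_doc, docs)

    print(len(tokenized), 'documents tokenized')
```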

Zylinks commented 6 years ago

Thank you so much