goiPP closed this issue 6 years ago
Oh, you cannot use `deepcut.tokenize` in `CountVectorizer`! scikit-learn requires a completely different tokenizer interface. So there are two ways to turn your text into bag-of-words format:
1) Use `DeepcutTokenizer`; see the example in https://github.com/rkcosmos/deepcut/blob/master/deepcut/deepcut.py#L116-L120 (you can also use unigrams/bigrams in this case). A sketch follows after this list.
2) Tokenize the text yourself and transform it into a sparse matrix, as in the second sketch below. I wrote a blog post about it at https://tupleblog.github.io/deepcut-classify-news/
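
A minimal sketch of option 1, assuming the `DeepcutTokenizer` interface in the file linked above (the `ngram_range` parameter and the method name, which the library spells `fit_tranform`, may differ across versions):

```python
from deepcut import DeepcutTokenizer

docs = ['ฉันกินข้าว', 'ฉันอยากบิน']  # placeholder Thai sentences

# ngram_range works like scikit-learn's: (1, 2) gives unigrams and bigrams
tokenizer = DeepcutTokenizer(ngram_range=(1, 2))
X = tokenizer.fit_tranform(docs)  # sparse document-term matrix
```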
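And a minimal sketch of option 2 (one possible route, not the exact code from the blog post): run `deepcut.tokenize` yourself, then hand the pre-tokenized text to `CountVectorizer` with a plain whitespace split:

```python
import deepcut
from sklearn.feature_extraction.text import CountVectorizer

docs = ['ฉันกินข้าว', 'ฉันอยากบิน']  # placeholder Thai sentences
tokenized = [' '.join(deepcut.tokenize(d)) for d in docs]

# Tokens are already space-separated, so splitting on whitespace is enough
vectorizer = CountVectorizer(tokenizer=str.split)
X = vectorizer.fit_transform(tokenized)  # scipy sparse document-term matrix
```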
Let me know if you have further questions.
Sorry, it turns out to be an issue with my server instead. And thanks for the blog suggestion.
I have used `deepcut.tokenize` as an analyzer in a `CountVectorizer` and it raises an error. So I don't know whether this is a problem with `deepcut` using `keras` or not. Reference issue -> https://github.com/keras-team/keras/issues/2397
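
For reference, a hedged reconstruction of the setup I mean (assumed, not my exact code):

```python
import deepcut
from sklearn.feature_extraction.text import CountVectorizer

# deepcut.tokenize takes a string and returns a list of tokens, so it is
# passed here as the analyzer callable
vectorizer = CountVectorizer(analyzer=deepcut.tokenize)

# On my server this call raised an error, apparently related to deepcut's
# keras backend (see the linked keras issue)
X = vectorizer.fit_transform(['ฉันกินข้าว', 'ฉันอยากบิน'])
```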