Korean pre-processing (POS tagger)

kozistr / movie-rate-prediction

Movie Rate Prediction with Tensorflow

MIT License


kozistr closed this issue 6 years ago

kozistr commented 6 years ago

There are lots of Korean morphological analyzers, such as Twitter, Mecab, Hannanum, etc.

Try all of them! However, a dataset collected in the wild is unverified and noisy, and of course it contains lots of coined words. So, a proper analyzer is needed.
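One way to "try all of them" is to run the same sentences through each analyzer and compare the outputs side by side. The snippet below is only a rough sketch under some assumptions: it assumes KoNLPy is the wrapper in use (its `Okt`, `Mecab`, and `Hannanum` classes each expose `morphs()`; `Okt` is the newer name of the Twitter analyzer), and the helper names `load_tokenizers` and `compare` are made up for illustration. If KoNLPy or one of its backends is not installed, that analyzer is skipped and only a plain whitespace baseline remains.

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


def load_tokenizers() -> Dict[str, Tokenizer]:
    """Collect every analyzer that can be created on this machine.

    Hypothetical helper: a whitespace baseline is always included, and
    KoNLPy analyzers are added only when they import and start correctly.
    """
    tokenizers: Dict[str, Tokenizer] = {"whitespace": str.split}
    try:
        from konlpy import tag  # assumption: KoNLPy is the wrapper in use
    except Exception:
        return tokenizers
    for name in ("Okt", "Mecab", "Hannanum"):
        try:
            # Each KoNLPy tagger exposes morphs(phrase) -> list of morphemes.
            tokenizers[name] = getattr(tag, name)().morphs
        except Exception:
            # Missing backend (e.g. no Java or no mecab-ko): skip it.
            continue
    return tokenizers


def compare(sentences: List[str],
            tokenizers: Dict[str, Tokenizer]) -> Dict[str, List[List[str]]]:
    """Tokenize every sentence with every analyzer for manual inspection."""
    return {name: [tok(s) for s in sentences]
            for name, tok in tokenizers.items()}


if __name__ == "__main__":
    samples = ["영화 진짜 재밌음", "이영화진짜노잼"]  # made-up review-like inputs
    for name, outputs in compare(samples, load_tokenizers()).items():
        print(name, outputs)
```

Reading the outputs next to each other on noisy, slang-heavy reviews is usually enough to see which analyzer breaks coined words apart badly.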

kozistr commented 6 years ago

I think soynlp (L-Tokenizer) should be used after dealing with the spacing problem.
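To show why spacing has to be fixed first, here is a minimal pure-Python sketch of the L-tokenizer idea: each space-separated word is split into a left part (L) and the remainder (R), where L is the prefix with the highest word score. This is not soynlp's own implementation; the function name `l_tokenize`, the tie-breaking rule, and the score values are assumptions for illustration. In soynlp itself, the scores would normally be learned from the corpus and passed to its `LTokenizer`.

```python
from typing import Dict, List


def l_tokenize(sentence: str, scores: Dict[str, float]) -> List[str]:
    """Split each space-separated word into [L, R] by the best-scoring prefix.

    Hypothetical sketch: prefixes missing from `scores` count as 0.0, and
    ties go to the longer prefix, so an unknown word is kept whole.
    """
    tokens: List[str] = []
    for word in sentence.split():
        best_end, best_score = len(word), -1.0
        # Scan from the longest prefix to the shortest; only a strictly
        # higher score replaces the current best, so longer prefixes win ties.
        for end in range(len(word), 0, -1):
            score = scores.get(word[:end], 0.0)
            if score > best_score:
                best_end, best_score = end, score
        tokens.append(word[:best_end])
        if best_end < len(word):
            tokens.append(word[best_end:])
    return tokens


if __name__ == "__main__":
    # Made-up scores; higher means "more likely to be a standalone word".
    scores = {"데이터": 0.9, "데이": 0.5, "분석": 0.8}
    print(l_tokenize("데이터는 분석을 합니다", scores))
    print(l_tokenize("데이터는분석을합니다", scores))
```

With correct spacing the first call separates the noun from its particle in each word. Without spaces the whole sentence is treated as one word, so only the first L part is recovered and everything else ends up in a single R chunk, which is why the spacing step should come before tokenizing.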

I'll try it later!