kozistr closed this 6 years ago
There are lots of Korean morphological analyzers, such as Twitter, Mecab, and Hannanum.
Try all of them! But datasets in the wild aren't verified or clean, and of course they contain lots of coined words. So a proper analyzer is needed.
I think soynlp (L-Tokenizer) should be used after dealing with the spacing problem.
I'll try it later!
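For reference, here is a rough sketch of the idea behind an L-tokenizer as I understand it: each space-separated word is split into a left part (L) and a remainder (R), choosing the left part with the highest score. This is a simplified re-implementation for illustration, not soynlp's actual code; the function name `l_tokenize`, the `default_score` parameter, and the example scores are all made up here. In practice the scores would be learned from the corpus.

```python
def l_tokenize(sentence, scores, default_score=0.0):
    """Split each space-separated word into an L part and an optional R part.

    `scores` maps candidate L parts to a word-likeness score (hypothetical
    values in this sketch). The whole word is the initial candidate, and a
    shorter prefix replaces it only if its score is strictly higher.
    """
    tokens = []
    for word in sentence.split():
        best_cut = len(word)
        best_score = scores.get(word, default_score)
        # Try every proper prefix of the word as the L part.
        for cut in range(1, len(word)):
            score = scores.get(word[:cut], default_score)
            if score > best_score:
                best_cut, best_score = cut, score
        tokens.append(word[:best_cut])
        if best_cut < len(word):
            tokens.append(word[best_cut:])  # the remainder (R part)
    return tokens


# Made-up scores, for illustration only.
example_scores = {"데이터": 0.9, "데이": 0.5, "분석": 0.8}
print(l_tokenize("데이터는 분석이 필요하다", example_scores))
```

Since the sketch splits on whitespace first, wrong spacing in the input leads directly to wrong tokens, which is why the spacing problem has to be handled before tokenizing.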