polm closed this 4 years ago
Korean support has been in since 0.1.8, but it needs more testing. If anyone could take a look at it and make sure it's OK, that'd be much appreciated.
It's not clear anyone has used the Korean support and I still don't have a good way of testing it. Since it turns out there's a well-maintained Korean-specific NLP library, KoNLPy, that wraps MeCab, I'm going to remove Korean support from fugashi for now. If anyone has a need for it I can try to add it back in later.
One other thing to note is that mecab-ko makes some Korean-specific changes to MeCab's internal scoring algorithm, so it doesn't work with fugashi wheels anyway.
It'd be nice to support Korean. A simple way to do this would be to subclass the tagger with a KoreanTagger and override the field names, or to allow fields to be passed in at creation time.
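A rough sketch of what that could look like, written as standalone Python so it doesn't depend on fugashi or MeCab being installed. Everything here is hypothetical: `Tagger`, `KoreanTagger`, `make_feature_wrapper`, and `KOREAN_FIELDS` are illustration names, not fugashi's actual API, and the field names are guesses that would need checking against the mecab-ko-dic tagspec.

```python
import csv
from collections import namedtuple

# Hypothetical field names for a Korean dictionary entry.
# These are assumptions for illustration; verify against the tagspec.
KOREAN_FIELDS = (
    "pos", "semantic_class", "has_final_consonant", "reading",
    "type", "first_pos", "last_pos", "expression",
)


def make_feature_wrapper(name, fields):
    """Return a function that parses a CSV feature string into a namedtuple."""
    Features = namedtuple(name, fields)

    def wrap(feature_string):
        # MeCab-style features are one comma-separated line.
        values = next(csv.reader([feature_string]))
        # Pad missing fields with None and drop any extras.
        values = (values + [None] * len(fields))[:len(fields)]
        return Features(*values)

    return wrap


class Tagger:
    """Stand-in for a generic tagger; not the real fugashi class."""

    fields = ()

    def __init__(self, fields=None):
        # Option 2 from the comment: fields passed in at creation time.
        chosen = tuple(fields) if fields is not None else self.fields
        self.wrap = make_feature_wrapper(type(self).__name__ + "Features", chosen)


class KoreanTagger(Tagger):
    """Option 1 from the comment: a subclass that overrides the field names."""

    fields = KOREAN_FIELDS
```

With this, `KoreanTagger().wrap("NNG,*,T,한국,*,*,*,*").pos` gives named access to the first feature, and `Tagger(fields=("a", "b"))` covers the pass-fields-at-creation variant. Hooking the wrapper into actual tokenization is left out, since that depends on the tagger internals.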
The tagspec for mecab-ko-dic is here. 2.0 seems to be the most recent one, so I guess it makes sense to support that.
Field names and meaning based on Google translate:
/ as delimiter)
Korean uses a fork of MeCab, and one difference appears to be how whitespace is handled. I'm not sure fugashi will just work with it, but since natto-py seems to work, there should be a way to support it.