polm / fugashi

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
MIT License

Is it possible to use a user dictionary that is an object instead of a file? #74

Closed rabbit19981023 closed 1 year ago

rabbit19981023 commented 1 year ago

Hello,

I would like to write a simple tool for processing Japanese ebooks, and I want to wrap the service in a web UI.

One of the features I would like to implement is letting users provide a custom kana tag for specific words, such as 地名 (place names) or 人名 (personal names).

I have read the source code, and it seems to use MeCab's API to build the user dictionary and specify it in GenericTagger, but I am still wondering whether it is possible to use a JSON object (or some other format) directly.

polm commented 1 year ago

No, MeCab can only use user dictionaries that are files; that's simply a design limitation.

For your use case, you can automate the process of creating a file from a JSON dictionary. Alternatively, you can skip the user dictionary and just do post-processing, which will work if the tokenization is already what you want and you only need to override the kana.
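
As a rough illustration of the post-processing route, here is a minimal sketch. It assumes the tokens have already been reduced to (surface, kana) pairs and that the user's overrides arrive as a JSON object mapping surface forms to kana. The function name `override_kana`, the sample JSON, and the sample tokens are all hypothetical and not part of fugashi.

```python
import json


def override_kana(tokens, overrides):
    """Return (surface, kana) pairs, replacing the kana of any surface
    form that appears in the overrides mapping."""
    return [(surface, overrides.get(surface, kana)) for surface, kana in tokens]


# Hypothetical user-supplied JSON: surface form -> desired kana.
user_json = '{"日本橋": "ニッポンバシ"}'
overrides = json.loads(user_json)

# Stand-in for tagger output. With fugashi and a UniDic dictionary, pairs
# like these could probably be built with something along the lines of
#   [(w.surface, w.feature.kana) for w in Tagger()(text)]
# but the available feature fields depend on the dictionary in use.
tokens = [("日本橋", "ニホンバシ"), ("に", "ニ"), ("行く", "イク")]

result = override_kana(tokens, overrides)
```

This only helps when each word to override already comes out as a single token. If the tokenizer splits it differently, a user dictionary file is still needed.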

rabbit19981023 commented 1 year ago

OK, I will go with the post-processing approach.

Thanks a lot for your answer!