hexenq / kuroshiro

Japanese language library for converting Japanese sentences to Hiragana, Katakana or Romaji, with furigana and okurigana modes supported.
https://kuroshiro.org
MIT License

Dictionary size #70

Closed: gonzales-gerald closed this issue 4 years ago

gonzales-gerald commented 4 years ago

Hi,

I would like to know if it is possible to load only the files required for Kanji to Katakana conversion. Our problem is that the dictionary files are too big to load on the first visit to our application.

hexenq commented 4 years ago

Hello there. Bootstrapping is slow when you use the Kuromoji analyzer. Converting Kanji to Katakana rather than Kanji to Romaji doesn't make much difference: what costs the most is tokenization, which relies on the big dictionary files, and that step is fundamental to every conversion.
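Why tokenization needs a dictionary at all: the reading of a kanji depends on the word it belongs to, so the text has to be segmented into words before it can be converted. The sketch below is a toy illustration only. It uses greedy longest-match over a tiny hand-written dictionary (the entries and the `tokenize`/`toKatakana` names are hypothetical), whereas Kuromoji uses a much larger dictionary and a cost-based lattice search. It shows 日 being read differently depending on the surrounding word.

```typescript
// HYPOTHETICAL toy dictionary: surface form -> katakana reading.
// Real analyzers ship dictionaries that are orders of magnitude larger.
const TOY_DICT: ReadonlyMap<string, string> = new Map([
  ["今日", "キョウ"],
  ["日曜日", "ニチヨウビ"],
  ["日本", "ニホン"],
  ["日", "ヒ"],
  ["本", "ホン"],
  ["は", "ハ"],
  ["です", "デス"],
]);

// Longest surface form in the dictionary, to bound the search window.
const MAX_ENTRY_LENGTH = Math.max(...Array.from(TOY_DICT.keys()).map((k) => k.length));

// Greedy longest-match tokenization: at each position take the longest
// dictionary entry that matches, else fall back to one unknown character.
function tokenize(text: string): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < text.length) {
    let match = text.charAt(i); // unknown single character by default
    for (let len = Math.min(MAX_ENTRY_LENGTH, text.length - i); len > 0; len--) {
      const candidate = text.slice(i, i + len);
      if (TOY_DICT.has(candidate)) {
        match = candidate;
        break;
      }
    }
    tokens.push(match);
    i += match.length;
  }
  return tokens;
}

// Replace each token with its reading; unknown tokens pass through unchanged.
function toKatakana(text: string): string {
  return tokenize(text)
    .map((t) => TOY_DICT.get(t) ?? t)
    .join("");
}
```

With this toy dictionary, `toKatakana("今日は日曜日です")` yields `キョウハニチヨウビデス`, while `toKatakana("日本")` yields `ニホン` and `toKatakana("日")` yields `ヒ`: the same character gets a different reading in each word, which is why the analyzer must load a dictionary before converting anything.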

I suggest you have a look at the mecab or yahoo-api analyzer. Both of them would save front-end loading time by handing the workload over to a back-end server. You could consider using one of them instead if your situation allows it.
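With the work moved to a back-end, the front-end only sends text and receives the converted result, so it never downloads a dictionary. A minimal client sketch follows, assuming a hypothetical `POST /api/convert` endpoint that accepts JSON `{ text, to }` and answers with JSON `{ result }`; the endpoint, its payload shape and the `convertViaApi` name are illustrative, not part of kuroshiro. The `to` values mirror kuroshiro's `to` option.

```typescript
// Target scripts, mirroring kuroshiro's `to` option.
type Target = "hiragana" | "katakana" | "romaji";

// Minimal subset of the Fetch API used here, so a stub can be injected.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Sends the text to the HYPOTHETICAL conversion endpoint and returns the result.
async function convertViaApi(
  text: string,
  to: Target,
  fetchImpl: FetchLike,
  endpoint = "/api/convert",
): Promise<string> {
  const res = await fetchImpl(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, to }),
  });
  if (!res.ok) {
    throw new Error(`conversion failed with HTTP ${res.status}`);
  }
  // Validate the response shape instead of trusting it blindly.
  const data: unknown = await res.json();
  const result = (data as { result?: unknown } | null)?.result;
  if (typeof result !== "string") {
    throw new Error("unexpected response shape");
  }
  return result;
}
```

In a browser the global `fetch` can be passed as `fetchImpl`; taking it as a parameter keeps the function easy to test with a stub.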

gonzales-gerald commented 4 years ago

Hello! Yes, now I understand that it is not a one-to-one conversion, so these dictionaries are needed. Thank you for the suggestions, I will take a look at them!

gonzales-gerald commented 4 years ago

I would like to clarify the usage of mecab as an alternative. If I understand correctly, I will need to install the mecab dictionaries on the web server and then call it from my front-end application. Am I correct?

Also, does the demo use an API version?

hexenq commented 4 years ago

Hello Gonzales. You're right. Both the mecab dictionary and kuroshiro should be deployed on the server side, which exposes an API to your front-end. I think mecab-ipadic-neologd would be a good choice for you.
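On the server side, the endpoint mostly needs to validate the request and delegate to an already-initialized kuroshiro instance. The framework-agnostic sketch below assumes the same hypothetical `{ text, to }` request shape; `makeConvertHandler`, `Converter` and the length limit are illustrative names and values, and the converter is injected so that the logic can be exercised without loading any dictionary.

```typescript
// Target scripts, mirroring kuroshiro's `to` option.
type Target = "hiragana" | "katakana" | "romaji";

// In a real service this would wrap kuroshiro, e.g.
// (text, to) => kuroshiro.convert(text, { to }).
type Converter = (text: string, to: Target) => Promise<string>;

interface HandlerResponse {
  status: number;
  body: { result: string } | { error: string };
}

const TARGETS: readonly string[] = ["hiragana", "katakana", "romaji"];
const MAX_TEXT_LENGTH = 10000; // HYPOTHETICAL limit to bound server work

function isTarget(value: unknown): value is Target {
  return typeof value === "string" && TARGETS.includes(value);
}

// Builds a handler that validates a parsed JSON payload and runs the converter.
function makeConvertHandler(convert: Converter) {
  return async (payload: unknown): Promise<HandlerResponse> => {
    if (typeof payload !== "object" || payload === null) {
      return { status: 400, body: { error: "expected a JSON object" } };
    }
    const { text, to } = payload as { text?: unknown; to?: unknown };
    if (typeof text !== "string" || text.length === 0 || text.length > MAX_TEXT_LENGTH) {
      return { status: 400, body: { error: "invalid text" } };
    }
    if (!isTarget(to)) {
      return { status: 400, body: { error: "to must be hiragana, katakana or romaji" } };
    }
    try {
      return { status: 200, body: { result: await convert(text, to) } };
    } catch {
      return { status: 500, body: { error: "conversion failed" } };
    }
  };
}
```

With Express, a route could call `const r = await handler(req.body)` (after the `express.json()` middleware has parsed the body) and reply with `res.status(r.status).json(r.body)`. Calling `await kuroshiro.init(analyzer)` once at startup means the dictionary is loaded once per server process instead of on every page load in the browser.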

The demo page uses server-side kuroshiro with the Yahoo-WebAPI analyzer plugin.

gonzales-gerald commented 4 years ago

Hi! We ended up using Express. Nothing else changed: we are still using the kuroshiro analyzer, except that it now runs as its own service and provides an API for our front-end.

Thank you for your help!