shirakaba / mecab-ko

A fork of Mecab, with support for both Japanese and Korean, organised as a Cocoapod and npm package for usage with iOS/macOS.

How to support Chinese? #3

Open fishfree opened 1 year ago

fishfree commented 1 year ago

I have downloaded a trained Chinese tokenization model and its dictionaries. How can I integrate them the same way you did for Korean? Which source files did you modify, and what code did you change? Thanks very much!

shirakaba commented 1 year ago

Sorry, just saw this now.

The hard work was done by eunjeon, in the original mecab-ko project. I just organised all the source files into a Cocoapod. The only sources I've contributed are the bindings to Obj-C and Swift (all the files in the ios folder).

To my understanding, the source code changes made by eunjeon were only to better support whitespace, which is not an issue for Chinese. So I imagine you could use the source code as-is and just provide the assets: the dictionary file and all the other resources that go alongside it. In other words, the entire contents of the dicDir (dictionary directory).

In my case, I created another Cocoapod to organise all these assets into a resource bundle. So you could just fork that project and place your dictionary assets in there.
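
Before copying the assets into the bundle, it may be worth checking that the dicDir is complete. The sketch below is only a sanity check under some assumptions: that the Chinese dictionary is a compiled MeCab dictionary (for example, built with `mecab-dict-index`), and that it uses the file names such a directory typically contains. The helper name `missing_dictionary_files` and the `./chinese-dic` path are hypothetical, not part of this project.

```python
from pathlib import Path

# Files a compiled MeCab dictionary directory typically contains.
# Assumption: the Chinese dictionary follows the same layout.
EXPECTED_FILES = ("dicrc", "sys.dic", "unk.dic", "matrix.bin", "char.bin")


def missing_dictionary_files(dic_dir):
    """Return the expected file names that are absent from dic_dir."""
    root = Path(dic_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    # Hypothetical default path; pass the real dicDir as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "./chinese-dic"
    missing = missing_dictionary_files(target)
    if missing:
        print("Missing from", target + ":", ", ".join(missing))
    else:
        print("All expected dictionary files are present in", target)
```

If the check reports missing files, the downloaded model may still be in source form (CSV and `.def` files) and would need compiling before it can be used as a dicDir.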