kurtisc opened 4 years ago
Rebased on master and confirmed it works with #125 merged.
With regard to #145: I do have a test for this morphemizer, so hopefully that addresses @shanrauf's comment.
Would you mind rebasing again, so I can see if the tests pass? I'll submit after.
I am really interested in this.
I haven’t been able to build Anki from source so that I can import pyvi (I think because my hardware is a little old). Is there any other way to get Vietnamese parsing working with MorphMan?
Hi!
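Unlike most other languages that use the Latin alphabet[1], Vietnamese doesn't separate words with spaces. It puts spaces between syllables instead, so a multi-syllable word such as "học sinh" (student) would be split into two tokens. The current spaces morphemizer is therefore unsuitable.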
[1] Fun read https://www.tandfonline.com/doi/pdf/10.1080/00437956.1963.11659787
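To illustrate the problem, here is a toy comparison between plain space splitting and greedy forward maximum matching, a classic dictionary-based segmentation technique. This is a sketch under stated assumptions: the tiny dictionary `MULTI_SYLLABLE_WORDS` and the function names are hypothetical, and it is not how pyvi works (pyvi uses a trained model), just a demonstration of why syllables need to be regrouped into words.

```python
import unicodedata
from typing import Iterable, List

# Hypothetical mini-dictionary of multi-syllable words; a real segmenter
# needs a full lexicon or a trained model.
MULTI_SYLLABLE_WORDS = {"học sinh", "đại học", "hà nội", "bách khoa"}
MAX_WORD_SYLLABLES = 4


def space_split(text: str) -> List[str]:
    """What a spaces-based morphemizer does: one token per syllable."""
    return text.split()


def max_match(text: str,
              dictionary: Iterable[str] = MULTI_SYLLABLE_WORDS,
              max_len: int = MAX_WORD_SYLLABLES) -> List[str]:
    """Regroup syllables into words by always taking the longest dictionary match."""
    # Normalise to NFC so precomposed and combining diacritics compare equal.
    known = {unicodedata.normalize("NFC", w) for w in dictionary}
    syllables = unicodedata.normalize("NFC", text).lower().split()
    words: List[str] = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first; a single syllable always matches.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in known:
                words.append(candidate)
                i += n
                break
    return words
```

For example, `space_split("Tôi là học sinh")` gives four tokens, while `max_match` keeps "học sinh" together as one word. Greedy matching is simple but can pick the wrong split when words overlap, which is one reason libraries such as pyvi use statistical models instead.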
I wasn't able to find a small library that does word segmentation for Vietnamese the way Jieba does for Chinese. Bundling pyvi in the code the way Jieba has been bundled would also require bundling several large dependencies (e.g. NumPy).
So, if this is merged as is, getting Vietnamese support working is unfortunately left to the end user. On the other hand, users who don't want it won't see it, and it won't affect their usage.
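The "it won't appear unless available" behaviour can be sketched as an optional import guard. This is an illustration only: the class names, the `name` attribute and `get_all_morphemizers` are hypothetical, not MorphMan's actual API. The one real call is pyvi's `ViTokenizer.tokenize`, which, per pyvi's README, returns the text with the syllables of each word joined by underscores (e.g. "đại_học").

```python
from typing import List

try:
    # pyvi is an optional dependency; most Anki installs will not have it.
    from pyvi import ViTokenizer
    PYVI_AVAILABLE = True
except ImportError:
    ViTokenizer = None
    PYVI_AVAILABLE = False


class SpaceMorphemizer:
    """Hypothetical stand-in for the existing spaces morphemizer."""
    name = "space"

    def get_words(self, text: str) -> List[str]:
        return text.split()


class VietnameseMorphemizer:
    """Hypothetical morphemizer that delegates segmentation to pyvi."""
    name = "vietnamese"

    def get_words(self, text: str) -> List[str]:
        # pyvi separates words with spaces and joins a word's syllables with "_".
        tokens = ViTokenizer.tokenize(text).split()
        return [t.replace("_", " ") for t in tokens]


def get_all_morphemizers() -> list:
    morphemizers = [SpaceMorphemizer()]
    if PYVI_AVAILABLE:
        # Only offered when pyvi imported successfully, so users without it
        # never see the option.
        morphemizers.append(VietnameseMorphemizer())
    return morphemizers
```

Because the import failure is caught at load time, a missing pyvi only hides the Vietnamese option rather than breaking the add-on.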
If this gets included, I'll look into packaging pyvi and its dependencies as a separate add-on, as has been done for MeCab, licences permitting. That would make installation more straightforward and avoid forcing users onto the source version of Anki.