obynio / anki-japanese-furigana

Anki add-on providing support for adding furigana on Japanese text
https://ankiweb.net/shared/info/678316993
GNU General Public License v3.0
21 stars 7 forks source link

Exclude kana in between kanji from generating furigana #16

Closed ahlec closed 2 years ago

ahlec commented 2 years ago

This closes #12.

The response from Mecab for a word looks like 食べる[タベル]. The current implementation has support for hiragana/katakana at the starts and ends of the word, and will not include these in the furigana output (eg: 食べる[タベル]食[た]べる. However, for a word that has hiragana between two kanji (eg 走り出す), the kana in between the kanji would be included in the furigana.

Expected: 走[はし]り出[だ]す Actual: 走り出[はしりだ]す

This PR reworks the algorithm that was handling words with kanji support, away from an implementation that had custom logic for the starts and ends of words, and into one that operates on a general string matching algorithm. Going character by character through both the reading and the kanji, it matches hiragana substrings together into ReadingNodes with no reading, and uses lazy string matching to find the shortest reading for the kanji before we encounter another kana match between reading and kanji.

In order to ensure there were no regressions, I fixed up the unit tests (which had some existing breaks) and added new ones that supported all previous cases as well as new ones that specifically target this use case.

I've tested this on both Anki 2.1.54 (M1 Silicon) and Anki 2.1.49 (Intel), and it loads and appears to work fine. I ran a large number of cards through this algorithm and found no outliers for this algorithm.

obynio commented 2 years ago

Oh great work ! I wrongly assumed it was not possible but I supposed I did not studied enough how MeCab is working.

That will be a major change for this module since it completely change the way furigana are generated historically, since ClozeFuriganaTool was released. But I believe it is a good change.

Thank you for testing for regressions on other Anki versions, I'm also happy to see that the native M1 support is working correctly for other users.

obynio commented 2 years ago

Done, the pull request is merged and published. This change was released in 1.3.0