obynio / anki-japanese-furigana

Anki add-on providing support for adding furigana on Japanese text
https://ankiweb.net/shared/info/678316993
GNU General Public License v3.0

Furigana added to katakana for single words that mix katakana and kanji #18

Closed ahlec closed 1 year ago

ahlec commented 1 year ago

In most cases, a word passed to MeCab that mixes katakana and kanji (e.g. イタリア語) is recognized and treated as two separate words (e.g. イタリア[イタリア] 語[ゴ]). The first word therefore goes down the pathway for standalone katakana words, and we don't generate furigana for it (as expected).

However, it seems that some words aren't split into separate nodes. For example: ローマ字 → ローマ字[ローマジ]. Because the word comes back as a single node, it doesn't go down the "the whole word is katakana" pathway and is instead treated as a complex word. As a result, we wind up attaching furigana to the katakana.

Expected: ローマ字[じ]

Actual: ロ[ろ]ーマ字[まじ] (#16 refactor), or ローマ字[ろーまじ] (prior to #16)

I think the solution here would be to use Kakasi inside of the "complex word" loop. The problem (at least in the new version) is that we compare characters with plain string equality, so 'ロ' == 'ろ' is false and the katakana in the word is never matched against its hiragana reading. If that comparison treated hiragana and katakana as equivalent, it should fix it.
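To make the idea concrete, here is a minimal sketch of a kana-insensitive comparison. It is not the add-on's actual code and it does not call Kakasi: as an assumption for illustration, it normalizes katakana to hiragana by shifting code points (the two Unicode blocks are laid out in parallel, 0x60 apart). The names `to_hiragana`, `same_kana`, `is_kana` and `attach_furigana` are hypothetical, and the output follows the bracket notation used in this issue.

```python
# Hypothetical sketch, not the add-on's real implementation.
KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
KANA_OFFSET = 0x60       # distance between the katakana and hiragana blocks


def to_hiragana(text: str) -> str:
    """Map katakana to hiragana; every other character (ー, kanji) is kept."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


def same_kana(a: str, b: str) -> bool:
    """Compare two strings while ignoring the hiragana/katakana distinction."""
    return to_hiragana(a) == to_hiragana(b)


def is_kana(ch: str) -> bool:
    """True for hiragana, katakana and the long-vowel mark."""
    return "\u3041" <= ch <= "\u3096" or "\u30a1" <= ch <= "\u30fa" or ch == "ー"


def attach_furigana(surface: str, reading: str) -> str:
    """Strip kana shared by the word and its reading, annotate only the rest."""
    reading_h = to_hiragana(reading)

    # Skip the leading kana that already match the reading.
    start = 0
    while (
        start < len(surface)
        and start < len(reading_h)
        and is_kana(surface[start])
        and same_kana(surface[start], reading_h[start])
    ):
        start += 1

    # Skip the trailing kana that already match the reading.
    end = 0
    while (
        end < len(surface) - start
        and end < len(reading_h) - start
        and is_kana(surface[-1 - end])
        and same_kana(surface[-1 - end], reading_h[-1 - end])
    ):
        end += 1

    core = surface[start:len(surface) - end]
    if not core:
        return surface  # the whole word is kana, so no furigana is needed
    core_reading = reading_h[start:len(reading_h) - end]
    return f"{surface[:start]}{core}[{core_reading}]{surface[len(surface) - end:]}"


print(attach_furigana("ローマ字", "ローマジ"))
```

With this comparison, the ロ, ー and マ in ローマ字 all match the start of the reading, so only 字 is left to annotate. The sketch handles a single kanji run surrounded by kana only; words with kana between two kanji runs would still need the existing complex-word logic.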

I haven't been able to find a pattern that explains this, or any other test cases that trigger it. Most other words are split into two separate nodes.