obynio / anki-japanese-furigana

Anki add-on providing support for adding furigana on Japanese text
https://ankiweb.net/shared/info/678316993
GNU General Public License v3.0
21 stars, 7 forks

Rework kanji/reading association using RegEx to address 息抜き bug #24

Closed ahlec closed 2 years ago

ahlec commented 2 years ago

This will close #23.

There was a bug with generating readings for 息抜き under the current kanji/reading association algorithm. The reading is いきぬき, and when the algorithm goes character by character and arrives at the first き, it uses kanji.index, detects the き at the end of the kanji string, and skips ahead to there, believing that 息抜 together receives only the part of the reading seen so far. It then runs out of characters in the kanji and crashes.

For this PR, I've rewritten the association code using regular expressions. What was important was to have an algorithm that had a full view of the entire string — one that would realize there's a second き in the reading.

The idea here is that we take the kanji (息抜き) and convert it into a regular expression. We want the plugin to generate furigana only for kanji and not for kana, so this regular expression helps us detect which "holes" should have furigana and which ones shouldn't.

kanji (息抜き) becomes → ^(.+?)き$
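As a rough illustration of that conversion (a minimal sketch, not the code in this PR), the snippet below keeps kana runs as literals and replaces every other run with a lazy capture group. The name `kanji_to_pattern` and the choice to treat the Hiragana and Katakana Unicode blocks as "kana" are assumptions made for the example.

```python
import re

# Assumption: "kana" means anything in the Hiragana/Katakana Unicode blocks.
KANA_RUN = re.compile(r"[\u3040-\u30ff]+")


def kanji_to_pattern(text: str) -> str:
    """Hypothetical helper: build an anchored regex from a kanji/kana string."""
    parts = []
    pos = 0
    for m in KANA_RUN.finditer(text):
        if m.start() > pos:
            # Non-kana characters before this kana run become one "hole".
            parts.append("(.+?)")
        # Kana contain no regex metacharacters, so they can be used literally.
        parts.append(m.group())
        pos = m.end()
    if pos < len(text):
        # Trailing non-kana characters become a final "hole".
        parts.append("(.+?)")
    return "^" + "".join(parts) + "$"


print(kanji_to_pattern("息抜き"))  # ^(.+?)き$
print(kanji_to_pattern("食べ物"))  # ^(.+?)べ(.+?)$
```

Anchoring with `^` and `$` is what forces the match to account for the whole reading rather than stopping at the first matching kana.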

We then apply this regular expression to the reading (いきぬき), which results in a groups match of [ "いきぬ" ]. We then use the kanji to piece it all back together in the original order, reading from the regular expression match whenever we're replacing a (.+?).
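The match-and-reassemble step could look roughly like the following. This is a sketch under assumptions, not the add-on's implementation: `associate` is a hypothetical name, the result is returned as (text, reading) pairs rather than any particular furigana markup, and returning the whole input without a reading when the pattern does not match is an arbitrary fallback chosen for the example.

```python
import re
from typing import List, Optional, Tuple

# Assumption: "kana" means anything in the Hiragana/Katakana Unicode blocks.
KANA = re.compile(r"[\u3040-\u30ff]+")
# Splits a string into alternating kana runs and non-kana runs.
SEGMENT = re.compile(r"[\u3040-\u30ff]+|[^\u3040-\u30ff]+")


def associate(kanji: str, reading: str) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical helper: pair each non-kana run with its slice of the reading."""
    segments = SEGMENT.findall(kanji)
    # Kana stay literal; every other run becomes a lazy capture group.
    pattern = "^" + "".join(
        seg if KANA.fullmatch(seg) else "(.+?)" for seg in segments
    ) + "$"
    match = re.match(pattern, reading)
    if match is None:
        # Illustrative fallback: leave the text without furigana.
        return [(kanji, None)]
    groups = iter(match.groups())
    # Walk the segments in their original order, consuming one captured
    # group each time a non-kana run is encountered.
    return [
        (seg, None) if KANA.fullmatch(seg) else (seg, next(groups))
        for seg in segments
    ]


print(associate("息抜き", "いきぬき"))  # [('息抜', 'いきぬ'), ('き', None)]
```

Because the pattern is anchored at both ends, the lazy group first tries い, fails to reach the end of the reading after the first き, and backtracks until it captures いきぬ.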

I've added the example sentence from the bug report as a unit test, and ensured that all existing unit tests continue to pass. I've also run it through more cards in my personal deck and found no issues with this algorithm yet.

I've tested in both Anki 2.1.54 and Anki 2.1.49.

obynio commented 2 years ago

Cool work! I haven't stumbled upon this bug in my own studies so far, but I can reproduce it using your examples. I'll merge it and deploy everything. Thanks 👍