There was a bug with generating readings for 息抜き under the current kanji/reading association algorithm. The reading is いきぬき, and the algorithm walks it character by character. When it reaches the first き, it uses `kanji.index`, finds the き at the end of the kanji string, and skips ahead to it, believing that `息抜` together receives the reading `い`. It then runs out of characters in the kanji and crashes.
For this PR, I've rewritten the association code using regular expressions. What mattered was having an algorithm that sees the entire string at once, so that it realizes there's a second き in the reading.
The idea here is that we take the `kanji` (息抜き) and convert it into a regular expression. We want the plugin to generate furigana only for kanji and not for kana, so this regular expression tells us which "holes" should get furigana and which ones should not.
`kanji` (息抜き) becomes → `^(.+?)き$`
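A minimal sketch of this conversion step in Python, assuming a simple kana test (the hiragana and katakana blocks, U+3041 to U+30FF) and treating every other character as kanji. Each run of consecutive kanji collapses into one lazy group, matching the 息抜 example. `is_kana` and `kanji_to_pattern` are hypothetical names, not the plugin's actual functions.

```python
import re


def is_kana(ch: str) -> bool:
    # Assumption: anything in the hiragana/katakana blocks (U+3041..U+30FF)
    # is kana; every other character is treated as kanji.
    return "\u3041" <= ch <= "\u30ff"


def kanji_to_pattern(kanji: str) -> str:
    """Hypothetical helper: turn an expression into an anchored regex."""
    parts = []
    in_hole = False
    for ch in kanji:
        if is_kana(ch):
            # Kana must appear literally in the reading.
            parts.append(re.escape(ch))
            in_hole = False
        elif not in_hole:
            # A run of consecutive kanji becomes a single lazy "hole".
            parts.append("(.+?)")
            in_hole = True
    return "^" + "".join(parts) + "$"


print(kanji_to_pattern("息抜き"))  # ^(.+?)き$
print(kanji_to_pattern("食べ物"))  # ^(.+?)べ(.+?)$
```

On Python 3.7 and later, `re.escape` leaves kana untouched, so the pattern stays readable.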
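We then apply this regular expression to the `reading` (いきぬき), which results in a groups match of `[ "いきぬ" ]`. Because the pattern is anchored with `^` and `$`, the lazy group has to stretch to いきぬ so that the literal き lines up with the end of the reading, which is exactly the "second き" the old algorithm missed. We then use the kanji to piece it all back together in the original order, reading from the regular expression match whenever we're replacing a `(.+?)`.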
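The match-and-reassemble step could look roughly like the sketch below. It is self-contained and makes the same kana assumption as before; it uses `itertools.groupby` so that the pattern and the reassembly walk the exact same kanji/kana runs. `associate` and `to_brackets` are hypothetical names, and the `kanji[reading]` output is only an Anki-style illustration, not necessarily the plugin's real output format.

```python
from __future__ import annotations

import re
from itertools import groupby
from typing import Optional


def is_kana(ch: str) -> bool:
    # Assumption: hiragana and katakana (U+3041..U+30FF) count as kana.
    return "\u3041" <= ch <= "\u30ff"


def associate(kanji: str, reading: str) -> list[tuple[str, Optional[str]]]:
    # Split the expression into alternating runs of kana and kanji.
    runs = [(kana, "".join(chars)) for kana, chars in groupby(kanji, key=is_kana)]
    # Kana runs must appear literally; each kanji run is one lazy hole.
    pattern = "^" + "".join(re.escape(t) if kana else "(.+?)" for kana, t in runs) + "$"
    match = re.match(pattern, reading)
    if match is None:
        raise ValueError(f"reading {reading!r} does not fit {kanji!r}")
    holes = iter(match.groups())
    # Rebuild in the original order, pulling the next group for every hole.
    return [(t, None if kana else next(holes)) for kana, t in runs]


def to_brackets(segments: list[tuple[str, Optional[str]]]) -> str:
    # Illustrative output: "kanji[reading]" for holes, kana left as-is.
    return "".join(t if r is None else f"{t}[{r}]" for t, r in segments)


print(associate("息抜き", "いきぬき"))  # [('息抜', 'いきぬ'), ('き', None)]
print(to_brackets(associate("息抜き", "いきぬき")))  # 息抜[いきぬ]き
```

Raising on a failed match, rather than guessing, surfaces kanji/reading pairs that genuinely do not line up instead of producing misplaced furigana.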
I've added the example sentence from the bug report as a unit test, and ensured that all existing unit tests continue to pass. I've also run it through more cards in my personal deck and found no issues with this algorithm yet.
Cool work! I haven't stumbled upon this bug in my lessons so far, but I can reproduce it using your examples. I'll merge it and deploy everything! Thanks 👍
This will close #23.
I've tested in both Anki 2.1.54 and Anki 2.1.49.