ripose-jp / Memento

An mpv-based video player for studying Japanese
https://ripose-jp.github.io/Memento/
GNU General Public License v2.0

Add furigana on subtitles #89

Open ghost opened 2 years ago

ripose-jp commented 2 years ago

I'm aware that Voracious has this, but it will likely never get added to Memento. There are two reasons:

  1. Qt doesn't support `<ruby>` tags in QTextEdit, which is the backend for rendering subtitles in Memento. To get around this, I would have to rewrite Memento to use a custom subtitle rendering backend, likely using OpenGL and requiring a fork of libass for the reasons described in #45. This isn't something I'm willing to do because of the sheer time commitment. (A rough illustration of the QTextEdit limitation follows this list.)
  2. The feature would work poorly because kanji can often map to a bunch of different furigana. There's likely no way to tell which furigana is correct without some heavyweight machine learning approach.
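
To be concrete about point 1: as far as I can tell, Qt's rich text engine only handles a subset of HTML, and tags it doesn't recognize (including `<ruby>`/`<rt>`) are dropped while their text content is kept, so the reading ends up inline after the kanji instead of above it. A minimal standalone illustration, not Memento's actual subtitle code:

```cpp
// Minimal standalone illustration (not Memento code): feed <ruby> markup to a
// QTextEdit and see that Qt's rich text engine has no ruby support. The
// expectation is that the unsupported tags are dropped and their text kept,
// so the reading renders inline ("山やまを歩く") instead of above the kanji.
#include <QApplication>
#include <QTextEdit>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    QTextEdit edit;
    edit.setReadOnly(true);
    // What we'd want: やま rendered as small text above 山.
    edit.setHtml("<p><ruby>山<rt>やま</rt></ruby>を歩く</p>");
    edit.show();

    return app.exec();
}
```
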
kik4444 commented 2 years ago

What about just displaying the furigana as a second line above the normal subtitles, with some calculations to determine how long or big the furigana line should be so it aligns with the normal line?

ripose-jp commented 2 years ago

That assumes I have way more control over QTextEdits and fonts than I actually do.

  1. I have no control over the position of the text beyond justifying it. For positioning, I would literally need to use whitespace characters.
  2. Most fonts are variable width, so I'd need some way to know the width of both the kanji and the furigana in advance. This isn't easy (see the sketch at the end of this comment).
  3. If the furigana is wider than the kanji, I'd need to add padding to the kanji so the furigana doesn't hang over unrelated terms. This is also not easy.
  4. New lines in a QTextEdit add a lot of vertical space by default. This can be fixed, but requires complicated logic.
  5. What do I do if kanji is split across multiple lines?

If you're not discouraged, feel free to implement it yourself. Personally, I have no interest in hacking a QTextEdit to deal with this sort of thing. Not to mention that even after all this work is done, there's still the problem of kanji-to-furigana mappings not being one-to-one.
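
To put numbers on point 2: measuring the widths themselves is mechanically possible with QFontMetrics, as in the rough standalone sketch below (not Memento code; the font names and sizes are just placeholders). The hard part is everything after that: turning the difference into whitespace padding inside a QTextEdit and keeping it correct across scaling, styling, and line breaks.

```cpp
// Rough standalone sketch (not Memento code): measure how wide a kanji run and
// its furigana would be in their respective fonts, then compute the padding
// the base line would need so the furigana doesn't overhang unrelated terms.
#include <QApplication>
#include <QDebug>
#include <QFont>
#include <QFontMetricsF>
#include <QString>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv); // the font database needs an application object

    QFont baseFont("Noto Sans CJK JP", 24); // placeholder subtitle font
    QFont rubyFont("Noto Sans CJK JP", 12); // furigana at roughly half size

    QFontMetricsF baseMetrics(baseFont);
    QFontMetricsF rubyMetrics(rubyFont);

    const QString kanji = QStringLiteral("山道");
    const QString furigana = QStringLiteral("やまみち");

    const qreal kanjiWidth = baseMetrics.horizontalAdvance(kanji);
    const qreal rubyWidth = rubyMetrics.horizontalAdvance(furigana);

    // Extra space the base line would need if the furigana is wider.
    const qreal padding = qMax<qreal>(0.0, rubyWidth - kanjiWidth);

    qDebug() << "kanji:" << kanjiWidth
             << "furigana:" << rubyWidth
             << "padding needed:" << padding;
    return 0;
}
```

Even with those numbers, points 1 and 3 through 5 are still the parts I don't see a clean way to do in a QTextEdit.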

Calvin-Xu commented 2 years ago

From a non-technical point of view, I'd say generating furigana is a very hard problem to get right (elaborating on the point that "kanji can often be mapped to a bunch of different furigana").

As an example, ImmersionKit provides a huge trove of sentences with furigana, taken from existing decks (mostly Jo-Mako's) that probably had human oversight. If https://github.com/mathewthe2/immersion-kit-api/blob/3ec3a75f84fdc99ceb5967e345b009e19cf7d783/tokenizer/japanesetokenizer.py#L26 still reflects their process, you can see their NLP setup and the content-specific tweaks it needed.

But there are still many issues. For a trivial example, search for sentences with 山道. The generated furigana for every instance of 山道 is さんどう (the onyomi reading), but if you listen to the sentences you'll find that some of them use the kunyomi reading やまみち. Both readings are valid and have the same meaning, and the only way to tell is to listen to the original dialogue audio. For examples where only one reading would be considered correct in the respective sentences, you can check out how 弾く(ひく、はじく)、堪える(こたえる、たえる、こらえる)and 惚ける(とぼける、ほうける、ほける)all get messed up.
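
To make the failure mode concrete, the lookup such a pipeline performs is context-free. The sketch below is a standalone C++/MeCab toy, not ImmersionKit's actual code (theirs is Python and does more), and it assumes MeCab with an IPAdic-style dictionary is installed. Whatever reading the dictionary lists for 山道 is the reading every sentence gets, because nothing in the lookup ever hears the audio.

```cpp
// Standalone toy (not ImmersionKit's code): the context-free dictionary lookup
// that a tokenizer-based furigana pipeline ultimately boils down to.
// Assumes MeCab and an IPAdic-style dictionary are installed.
#include <mecab.h>

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    MeCab::Tagger *tagger = MeCab::createTagger("");
    if (!tagger)
    {
        std::cerr << "Couldn't create a MeCab tagger" << std::endl;
        return 1;
    }

    // Whichever reading the dictionary assigns to 山道 here, it will assign
    // the same one to every other sentence containing 山道.
    const char *sentence = "険しい山道を歩いた。";

    for (const MeCab::Node *node = tagger->parseToNode(sentence);
         node != nullptr; node = node->next)
    {
        // Skip the begin/end-of-sentence pseudo-nodes.
        if (node->stat == MECAB_BOS_NODE || node->stat == MECAB_EOS_NODE)
            continue;

        // node->surface is not null-terminated; it's a slice of the input.
        const std::string surface(node->surface, node->length);

        // IPAdic features are CSV: POS fields, conjugation, base form,
        // reading (読み), pronunciation. The reading is the 8th field.
        std::vector<std::string> fields;
        std::stringstream ss(node->feature);
        std::string field;
        while (std::getline(ss, field, ','))
            fields.push_back(field);
        const std::string reading = fields.size() > 7 ? fields[7] : "*";

        std::cout << surface << " -> " << reading << std::endl;
    }

    delete tagger;
    return 0;
}
```

A lattice-based analyzer can use textual context to pick between candidate segmentations, but the signal needed here, which reading the voice actor actually used, isn't in the text at all.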

I think some ML solution could help, and speech recognition might help in Memento's case specifically. But ultimately it's a hard problem that is influenced by literary sensibilities and artistic license (see the scholarly debate over whether 国境 should be くにざかい or こっきょう at the beginning of Snow Country). Because even state-of-the-art generated furigana likely has errors, relying on it when consuming new content can be detrimental: the mistakes can subvert your expectations of what you hear in subtle ways.