melink14 / rikaikun

rikaikun is a Chrome extension that helps you to read Japanese web pages by showing the reading and English definition of Japanese words when you hover over them.
https://chrome.google.com/webstore/detail/rikaikun/jipdnfibhldikgcjhfnomkfpcebammhp
GNU General Public License v3.0
424 stars 80 forks

Add `lang="ja"` to rikaikun content HTML so that Chrome uses Japanese fonts to render it always #220

Closed ChocoChopin closed 4 years ago

ChocoChopin commented 4 years ago

Chrome has known problems with Japanese text--if a particular font isn't specified, it'll arbitrarily render hanzi (from an actual hanzi font, which is Microsoft JhengHei UI by the looks of it), instead of kanji, alongside kana. This can be clearly observed in the characters that differ between Japanese and simplified Chinese, such as 直 and 辶.

It appears that Rikaikun doesn't specify any font at all, because not only have I noticed that its kana font displays as whatever the default kana font is for that particular version of Chrome (and the default has varied throughout a few different versions of Chrome), it consistently exhibits the problem I've described above, as seen here (the highlighted character is the correct Japanese character, displayed on a page I designed myself that manually specifies the Japanese font via CSS; the character Rikaikun displays is a hanzi character): https://prnt.sc/u4jtp4

The solution to this should be trivially simple: add the line `font-family: Yu Gothic UI;` (or whatever your font preference is, that one's my personal favorite by far) wherever Rikaikun's CSS code goes, and Rikaikun will never incorrectly display hanzi again.

For good measure, I'm on Windows 10 and Chromium 86, but the same problem exists on various versions of Chrome and Chromium I've used Rikaikun with.

Edit: as an example, I've added said line in all the relevant places in the Light Blue CSS file. The change works as expected, and the correct characters are now displayed. Also, the kana look a hell of a lot prettier if you ask me, those fat Meiryo kana drive me up the wall for some reason. https://prnt.sc/u4ked2

melink14 commented 4 years ago

Thanks for the report.

I agree it's annoying when the Japanese glyphs get rendered with Chinese fonts but I'm not sure if I should force a font in rikaikun or leave it up to the user to configure their system and fonts to their preference.

On my system, Chrome is set up to use Japanese fonts, so even when the page is displaying Chinese glyphs, rikaikun will display Japanese: [image]

(Taken from https://en.wiktionary.org/wiki/%E7%BD%AE which has separate sections for Chinese and Japanese)

Interestingly, #53 asked for the opposite configuration because they were confused that the page was displaying Chinese fonts even though rikaikun was displaying Japanese (due to lang='ja' I believe).

I'm not against choosing a font for rikaikun but there are some complications:

  1. Everybody has their own favorite font.
  2. Different systems have different fonts installed. I'd definitely need to include the chosen font inside the extension (or refer to fonts.google.com and hope caching is good).

I feel like it's a bit more flexible to allow the user to configure their system to their liking unless there's some reason you think people would want rikaikun to be different than the rest of their Japanese pages?

ChocoChopin commented 4 years ago

User preference is certainly a relevant concern, and a bit of a tedious nut to crack in terms of implementing font options, isn't it? From a purely practical perspective (rather than an aesthetic one), the real concern is that students of the language might be doing a lot of their immersion using Rikaikun (which is precisely what I do, thanks for that), and it could eventually prove somewhat confusing if they're learning hanzi instead of kanji.

I don't know enough to say for certain, but although it seems there's not an enormous amount of difference between the relevant characters, nor an enormous amount of characters that differ, the differences are meaningful--I myself wasn't actually aware that Chrome had this issue until I noticed its fonts looked different between different versions, and different from Firefox, and made a reddit thread in an attempt to identify which font was which. Someone with amazing eyes immediately recognized the exact hanzi font in question, and promptly warned me that I'd do best to learn Japanese with an actual Japanese font instead of Chrome's bugged one. Suddenly I realized why I was seeing all the discrepancies between characters I'd noticed, such as when looking them up on Jisho (which does specify its fonts).

As for users choosing their own font, is there some way they're able to alter Rikaikun's font without obtaining the source and editing it themselves, as I did? It didn't occur to me that I might do it any other way. It seems that as Rikaikun is, users are simply stuck with whatever their browsers choose for them, and in the case of Chrome, by default it chooses an incorrect font. In my opinion, it's better that Rikaikun's users are stuck with a correct Japanese font as determined by Rikaikun itself than that they're stuck with whatever nonsense Chrome foists upon them.

Of course, Chrome's devs are ultimately to blame for all of this inconsistency, and it's not even clear how they managed to bungle this as they did, since Chrome's actually mashing together two separate typefaces (JhengHei UI for kanji and Meiryo for kana, by default) to display Japanese text, instead of just using, you know, one Japanese typeface. Chrome's apparently had the problem for years, and who knows when it'll be addressed. As for CSS font selection, you don't have to worry much about compatibility with different platforms and machines, since you can just list several widely available fonts in order of preference and cover the defaults included on every platform. Yu Gothic UI or Meiryo UI should be available on all Windows machines, though I'm not familiar with the options for other platforms off the top of my head.
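To make the "list several fonts in order of preference" idea concrete, here is a minimal TypeScript sketch that builds a CSS `font-family` value from an ordered list. The helper name `buildFontFamily` is hypothetical and not part of rikaikun; the font names are just the ones mentioned in this thread, and their availability on any given machine is an assumption.

```typescript
// Hypothetical helper (not from rikaikun): builds a CSS font-family value
// from an ordered preference list, ending with a generic fallback family.
function buildFontFamily(preferred: string[], generic: string = "sans-serif"): string {
  const quoted = preferred
    .map((name) => name.trim())
    .filter((name) => name.length > 0)
    // Quote every family name so that names containing spaces are unambiguous.
    .map((name) => `"${name.replace(/"/g, '\\"')}"`);
  // The generic family goes last and stays unquoted.
  return quoted.concat(generic).join(", ");
}

// Fonts named in this thread; the browser uses the first one that is installed.
const popupFontFamily = buildFontFamily(["Yu Gothic UI", "Meiryo UI"]);
console.log(popupFontFamily);
```

The browser walks the list left to right, so the order encodes preference, and the trailing generic family covers machines that have none of the named fonts.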

Even better than that, of course, would be some sort of font selection option--one that could also simply be disabled should you prefer whatever your system/browser defaults end up producing. But I don't know enough about how Rikaikun is coded to know how difficult that might be to code. If you think it's worth pursuing, maybe I'd work on it. I'm only a beginner coder, but I imagine I'd be able to figure it out.

In your case, how did you get Chrome to display Japanese fonts correctly by default? Does that have to do with a system language setting?

Edit: holy hell, literally nine years Chrome has been doing this. https://www.reddit.com/r/LearnJapanese/comments/iup3n/psa_google_chrome_kanji_rendering/

melink14 commented 4 years ago

As you noticed, it's a long-standing problem, but in general it is fixable using settings. I researched the underlying technical reasons a bit, which I'll document first.

First, I agree that showing the correct character is important but wonder if we can accomplish that in another way besides forcing a font.

Root Cause and history

This was reported as a bug to Chromium a long time ago via https://bugs.chromium.org/p/chromium/issues/detail?id=338076. There we can see the root cause is simply: Missing lang attributes specifying the correct language

At the time the dev mentioned that it was maybe possible to hope for better locale detection but would probably not be reasonable for a page mostly in another language with 1 or 2 characters sprinkled about.

He did mention that inferring default fallback font based on Chrome language settings was already being tracked (https://bugs.chromium.org/p/chromium/issues/detail?id=179331) and I found that feature already worked for me.

Root cause 2: Japanese not set as a known language in Chrome settings

I did some testing and noticed that even if a page is 100% Japanese (https://www1.gifu-u.ac.jp/~satopy/ronnanori.htm) it still detects locale as Chinese so I opened a new issue here: https://bugs.chromium.org/p/chromium/issues/detail?id=1121074

Summary

For pages without a lang attribute, or where the lang attribute is not 'ja', Chrome has no way of knowing which font to use and on Windows falls back to Chinese. If the user has Japanese as a known language in Chrome settings, then it falls back to Japanese instead.
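The decision described in this summary can be written down as a small TypeScript function. This is only a simplified model of the behavior observed in this thread on Windows, not Chrome's actual font-selection code; the names `expectedKanjiFallback` and `FallbackScript` are made up for illustration, and the handling of an explicit Chinese `lang` is an assumption based on the Wiktionary observation earlier in the thread.

```typescript
type FallbackScript = "japanese" | "chinese";

// True for language tags such as "ja" or "ja-JP".
function isJapaneseTag(tag: string): boolean {
  const t = tag.trim().toLowerCase();
  return t === "ja" || t.indexOf("ja-") === 0;
}

// Simplified model of the behavior described in this thread (Windows only).
// It is NOT Chrome's real algorithm.
function expectedKanjiFallback(
  langAttr: string | null,
  chromeLanguages: string[]
): FallbackScript {
  // An explicit lang="ja" on the content always selects Japanese fonts.
  if (langAttr !== null && isJapaneseTag(langAttr)) return "japanese";
  // Assumption: content explicitly marked as Chinese keeps Chinese fonts.
  if (langAttr !== null && langAttr.trim().toLowerCase().indexOf("zh") === 0) return "chinese";
  // No useful lang attribute: the known languages in Chrome settings decide.
  if (chromeLanguages.some(isJapaneseTag)) return "japanese";
  // Otherwise Windows falls back to a Chinese font for the shared characters.
  return "chinese";
}

console.log(expectedKanjiFallback(null, ["en-US"]));
console.log(expectedKanjiFallback(null, ["en-US", "ja"]));
console.log(expectedKanjiFallback("ja", ["en-US"]));
```

The third call is the case that matters for rikaikun: once the popup carries `lang="ja"`, the result no longer depends on how the user has configured Chrome.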

In my case, I have Japanese added as a secondary language in my Chrome settings which is why my rikaikun renders correctly. (If I disable that I get SimSun for kanji and Meiryo for kana.)

Next Steps

  1. It seems clear that we should add lang="ja" to the rikaikun popup, which will ensure Chrome falls back to the correct system font. I'm not sure if we should add it to the entire popup or just to the sections which are known to contain Japanese. The Latin characters seem to render fine in lang="ja", so the whole popup is easiest.

  2. Maybe write an FAQ about font rendering since it seems there's a lot of misinformation out there around exactly what's wrong and how to fix it.

  3. Possibly open a new issue for font customization. We could add a 'string' option to allow people to add any font string they wanted and add it directly to the injected HTML. Though maybe we can also just publicize "Advanced Font Settings", written by a Chromium dev, which allows the user to pick their own fonts per script type (https://chrome.google.com/webstore/detail/advanced-font-settings/caclkomlalccbpcdllchkeecicepbmbm).
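Steps 1 and 3 could look roughly like the following TypeScript sketch. It does not reflect rikaikun's actual code: `preparePopup`, `PopupLike`, and the `userFontFamily` parameter (standing in for the proposed 'string' option) are all hypothetical names. `PopupLike` is a minimal structural type so the example runs outside a browser; a real `HTMLElement` would satisfy it.

```typescript
// Minimal shape of the popup element needed here (an HTMLElement fits it).
interface PopupLike {
  setAttribute(name: string, value: string): void;
  style: { fontFamily: string };
}

// Hypothetical: marks the whole popup as Japanese and, if the user supplied
// a font string via an option, applies it on top of the default fallback.
function preparePopup(popup: PopupLike, userFontFamily: string = ""): void {
  // Step 1: lang="ja" on the entire popup so Chrome picks a Japanese font.
  popup.setAttribute("lang", "ja");
  // Step 3: an empty option means "leave font choice to the system/browser".
  const font = userFontFamily.trim();
  if (font.length > 0) {
    popup.style.fontFamily = font;
  }
}

// Usage with a stand-in object instead of a real DOM element.
const attrs: { [name: string]: string } = {};
const fakePopup: PopupLike = {
  setAttribute: (name, value) => {
    attrs[name] = value;
  },
  style: { fontFamily: "" },
};
preparePopup(fakePopup, '"Yu Gothic UI", sans-serif');
console.log(attrs["lang"], fakePopup.style.fontFamily);
```

Leaving the font untouched when the option is empty keeps the default behavior discussed above, where the user's system and Chrome settings decide which Japanese font is used.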

Thanks again for the feedback.