scean / noto

Automatically exported from code.google.com/p/noto

Change the mapping for a few characters in Noto Sans CJK KR #151

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Noto Sans CJK KR uses the Japanese glyphs (which look roughly half-width) for the following
characters:

关 (U+5173) and 复 (U+590D) 

They should be mapped to the Simplified Chinese glyphs in Noto Sans CJK KR.

The following code points may also need to be mapped to Hans glyphs:
甩 (U+7529), 门 (U+95E8)
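A minimal sketch of what "mapping to Hans glyphs" means at the cmap level, assuming each regional font instance is modeled as a plain dict from code point to glyph name. The real Noto Sans CJK fonts are CID-keyed, and every glyph name below (`cid_jp_5173` and so on) and the helper `remap_to_hans` are hypothetical placeholders, not the project's actual identifiers or tooling.

```python
# Hypothetical cmaps: code point -> glyph name. Real Noto Sans CJK glyph
# identifiers differ; these names only illustrate the remapping idea.
KR_CMAP = {
    0x5173: "cid_jp_5173",  # 关, currently points at the Japanese glyph
    0x590D: "cid_jp_590D",  # 复, currently points at the Japanese glyph
    0x7529: "cid_jp_7529",  # 甩
    0x95E8: "cid_jp_95E8",  # 门
    0xAC00: "cid_kr_AC00",  # 가, Hangul, not affected by the change
}
SC_CMAP = {
    0x5173: "cid_sc_5173",
    0x590D: "cid_sc_590D",
    0x7529: "cid_sc_7529",
    0x95E8: "cid_sc_95E8",
}


def remap_to_hans(kr_cmap, sc_cmap, code_points):
    """Return a copy of kr_cmap in which each listed code point uses the SC glyph."""
    new_cmap = dict(kr_cmap)  # leave the original KR mapping untouched
    for cp in code_points:
        if cp not in sc_cmap:
            raise KeyError(f"U+{cp:04X} has no Hans glyph to borrow")
        new_cmap[cp] = sc_cmap[cp]
    return new_cmap


remapped = remap_to_hans(KR_CMAP, SC_CMAP, [0x5173, 0x590D, 0x7529, 0x95E8])
for cp in sorted(remapped):
    print(f"U+{cp:04X} -> {remapped[cp]}")
```

Only the listed code points change; everything else in the KR instance, including Hangul, keeps its existing glyph.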

I'm aware that this is beyond what's scoped for Korean, but Korean users do not
expect the roughly half-width Japanese glyphs for U+5173 and U+590D. This matters
when their font setting is 'Noto Sans CJK KR' and no separate language
information is available (i.e., when visiting Simplified Chinese web pages with
NO 'lang' tag). Given that U+5173/U+590D are much less frequent in Japanese
documents than in SC documents, picking SC glyphs for KR would work better on
average for Korean users visiting 'untagged' web pages.

If there are other characters like U+5173 / U+590D (roughly half-width JP glyphs
used in the KR font instance, and much more frequently used in zh-Hans than in
ja), the same change would be necessary.

Original issue reported on code.google.com by jungs...@google.com on 15 Sep 2014 at 11:49

GoogleCodeExporter commented 9 years ago
These mapping changes can be made for the Korean fonts, but I don’t 
understand the precedent for changing the mapping for 门 (U+95E8) other than 
possibly consistency with the other characters that use that component, all of 
which are completely out of scope of Korean.

Original comment by ken.lu...@gmail.com on 22 Sep 2014 at 10:14

GoogleCodeExporter commented 9 years ago
> I don’t understand the precedent for changing the mapping for 门 (U+95E8)
> other than possibly consistency with the other characters that use that
> component, all of which are completely out of scope of Korean.

Somehow I misunderstood your question during the meeting. The rationale for 
doing so is already given in the bug report, but maybe it's not clear enough. 

They're out of scope for Korean (i.e., they won't show up in ordinary Korean
documents).

A scenario I'm concerned about is when Korean users visit Simplified Chinese
web pages *without* the 'lang' attribute set properly (to zh-Hans or zh-CN). In
that case, web browsers have no other information about the language of the
document [1], so fallback font selection relies on the user's font choice in
the browser's font preferences for Korean.

If that font is 'Noto Sans CJK KR', I want the glyph for U+95E8 to be the Hans
one instead of the Japn one, because the chance of U+95E8 showing up in a
Japanese document is much lower than in a Hans document (weighted by how likely
a Korean user is to visit a Hans page vs. a Japn page).
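The weighting argument above can be written as a simple comparison. This is a minimal sketch, assuming the decision is "serve whichever glyph matches the more likely source of an untagged occurrence"; the class, function names, and all numeric inputs are hypothetical illustrations, not measured data.

```python
from dataclasses import dataclass


@dataclass
class UntaggedPageScenario:
    """Hypothetical inputs for one code point; none of these are measured values."""
    p_visit_hans: float   # share of untagged CJK pages a Korean user visits that are zh-Hans
    p_visit_japn: float   # share that are Japanese
    freq_in_hans: float   # how often the character appears in zh-Hans text
    freq_in_japn: float   # how often it appears in Japanese text


def preferred_glyph(s: UntaggedPageScenario) -> str:
    """Pick the glyph matching the more likely source of an untagged occurrence.

    An occurrence comes from a zh-Hans page with weight p_visit_hans * freq_in_hans
    and from a Japanese page with weight p_visit_japn * freq_in_japn; the KR font
    should serve whichever side is heavier.
    """
    hans_weight = s.p_visit_hans * s.freq_in_hans
    japn_weight = s.p_visit_japn * s.freq_in_japn
    return "Hans" if hans_weight > japn_weight else "Japn"


# Made-up numbers: a character common in Hans text but rare in Japanese text.
print(preferred_glyph(UntaggedPageScenario(0.3, 0.5, 900.0, 1.0)))  # Hans
```

Even if Korean users visit Japanese pages more often than Hans pages, a large enough frequency gap for the character itself tips the choice toward the Hans glyph.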

If there's an easy way to identify 'predominantly Hans characters' (i.e.,
characters that are extremely rare in Japn but relatively common in Hans), we
want Noto Sans CJK KR to use Hans glyphs for them.
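One possible way to flag "predominantly Hans" characters is a frequency-ratio filter over two corpora. This is a sketch under stated assumptions: the per-million frequencies below are made-up placeholders, and the thresholds `min_ratio` and `min_hans_freq` are illustrative choices, not values used by the Noto project.

```python
def predominantly_hans(freq_hans, freq_japn, min_ratio=50.0, min_hans_freq=1.0):
    """Return code points that are relatively common in Hans but rare in Japn.

    freq_hans / freq_japn map code point -> occurrences per million characters
    in a (hypothetical) zh-Hans and Japanese corpus.
    """
    result = []
    for cp, f_hans in freq_hans.items():
        if f_hans < min_hans_freq:
            continue  # too rare in Hans text to be worth remapping
        f_japn = freq_japn.get(cp, 0.0)
        # A character never seen in the Japanese corpus counts as Hans-leaning.
        if f_japn == 0.0 or f_hans / f_japn >= min_ratio:
            result.append(cp)
    return sorted(result)


# Placeholder frequencies for illustration only, not real corpus data.
HANS = {0x5173: 800.0, 0x590D: 400.0, 0x95E8: 900.0, 0x4E00: 5000.0, 0x6F22: 0.5}
JAPN = {0x5173: 2.0, 0x590D: 1.0, 0x95E8: 0.0, 0x4E00: 4000.0, 0x6F22: 300.0}

print(", ".join(f"U+{cp:04X}" for cp in predominantly_hans(HANS, JAPN)))
# U+5173, U+590D, U+95E8
```

A character shared heavily by both languages (like 一, U+4E00) is left alone, since its ratio is close to 1.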

[1] Web browsers can take other signals into account, such as the encoding (if
it's a non-Unicode encoding such as GBK, the browser can assume that the
document is in zh-Hans, although there's no guarantee) and the ccTLD (e.g., if
it's .cn, assume that the web page is in zh-Hans). Some browsers do this while
others do not. My scenario is the case where even these signals are unavailable
(e.g., foobar.com with UTF-8 encoding), in which case browsers fall back to the
browser's UI language to disambiguate CJK variants. Yet another signal browsers
could use is the Accept-Language list (picking the first CJK language in the
list to decide which CJK font to use), but no browser does that yet (I plan to
implement that for Chrome).
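The order of signals in the footnote can be sketched as a small fallback chain. This is a minimal illustration, assuming simplified lookup tables; the function names, the encoding and ccTLD tables, and the `use_accept_language` flag are hypothetical and do not reflect any browser's actual code. The Accept-Language step is off by default because, per this thread, no browser used it at the time.

```python
from typing import Optional, Sequence

# Illustrative tables only; real browsers keep their own, more complete lists.
ENCODING_HINTS = {
    "gbk": "zh-Hans", "gb2312": "zh-Hans", "gb18030": "zh-Hans",
    "big5": "zh-Hant", "shift_jis": "ja", "euc-jp": "ja", "euc-kr": "ko",
}
CCTLD_HINTS = {"cn": "zh-Hans", "tw": "zh-Hant", "hk": "zh-Hant", "jp": "ja", "kr": "ko"}


def cjk_bucket(tag: str) -> Optional[str]:
    """Map a language tag such as 'zh-CN' or 'ko-KR' to a CJK bucket, or None."""
    parts = tag.strip().lower().replace("_", "-").split("-")
    primary = parts[0]
    if primary in ("ja", "ko"):
        return primary
    if primary == "zh":
        if "hant" in parts or any(p in ("tw", "hk", "mo") for p in parts[1:]):
            return "zh-Hant"
        return "zh-Hans"
    return None


def guess_cjk_lang(lang_attr: Optional[str], encoding: str, hostname: str,
                   accept_languages: Sequence[str], ui_lang: str,
                   use_accept_language: bool = False) -> Optional[str]:
    # 1. An explicit CJK lang attribute wins.
    if lang_attr and cjk_bucket(lang_attr):
        return cjk_bucket(lang_attr)
    # 2. A legacy (non-Unicode) encoding is a hint, with no guarantee.
    hint = ENCODING_HINTS.get(encoding.lower())
    if hint:
        return hint
    # 3. The country-code top-level domain.
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld in CCTLD_HINTS:
        return CCTLD_HINTS[tld]
    # 4. Proposed: the first CJK language in Accept-Language (q-values ignored).
    if use_accept_language:
        for entry in accept_languages:
            bucket = cjk_bucket(entry.split(";")[0])
            if bucket:
                return bucket
    # 5. Fall back to the browser UI language (None if it is not CJK).
    return cjk_bucket(ui_lang)


# The scenario from this thread: UTF-8 page on foobar.com, Korean browser UI.
print(guess_cjk_lang(None, "UTF-8", "foobar.com", [], "ko"))  # ko
```

In that last case the Korean font preference, and therefore Noto Sans CJK KR, ends up rendering the Simplified Chinese page, which is why the KR glyph choice for characters like U+5173 and U+95E8 matters.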

Original comment by jshin@chromium.org on 24 Sep 2014 at 12:05