notofonts / noto-cjk

Noto CJK fonts
http://www.google.com/get/noto/help/cjk

Some AJ1-6 kanji characters are missing from JP/CJKjp fonts #113

Closed HisashiS closed 6 years ago

HisashiS commented 6 years ago

It looks like three AJ1-6 kanji characters are missing from NotoSerifJP, NotoSansJP and their CJK-jp counterparts, at least in the "Regular" weight font files. The apparently missing characters are U+6AF8 (CID 20152), U+90DE (CID 8635) and U+96B8 (CID 7114). Were they intentionally omitted? Maybe U+237F1 was thought to be more appropriate for CID 20152, but for the other two I don't know of any such plausible alternatives.

HisashiS commented 6 years ago

If I may refer to CJK radicals in the same thread, I realized U+2EA9 (CID 13729) is missing from the NotoSerif/Sans JP/CJKjp font files. U+2E8D (CID 13834) also seems to be missing from the NotoSerif JP/CJK-jp, but not from NotoSans JP/CJK-jp.

kenlunde commented 6 years ago

I am curious about how you determined that the glyphs for these characters were missing, because not only are the glyphs for those five characters present in both the full glyph set and the region-specific Japanese subset, and for both typeface families, they also all use an explicit JP (Japanese) glyph:

Noto Sans CJK:
U+2E8D ⺍ uni2E8D-JP (CID+1355; Adobe-Japan1-6 CID+13834)
U+2EA9 ⺩ uni2EA9-JP (CID+1364; Adobe-Japan1-6 CID+13729)
U+6AF8 櫸 uni6AF8-JP (CID+22837; Adobe-Japan1-6 CID+20152)
U+90DE 郞 uni90DE-JP (CID+40890; Adobe-Japan1-6 CID+8635)
U+96B8 隸 uni96B8-JP (CID+43439; Adobe-Japan1-6 CID+7114)

Noto Serif CJK:
U+2E8D ⺍ uni2E8D-JP (CID+1359; Adobe-Japan1-6 CID+13834)
U+2EA9 ⺩ uni2EA9-JP (CID+1368; Adobe-Japan1-6 CID+13729)
U+6AF8 櫸 uni6AF8-JP (CID+22899; Adobe-Japan1-6 CID+20152)
U+90DE 郞 uni90DE-JP (CID+41716; Adobe-Japan1-6 CID+8635)
U+96B8 隸 uni96B8-JP (CID+44146; Adobe-Japan1-6 CID+7114)

HisashiS commented 6 years ago

> I am curious about how you determined that the glyphs for these characters were missing

Thank you for your comments. When I open the font file with FontForge (on Windows 7) and apply "CID->Flatten" followed by "CID->convert to CID", I get the output below for U+96B8 (CID 7114). I presumed this meant that the glyph in question is missing. The same thing happens with the other four characters that I listed.

[image: cid7114]

I also tried extracting the character codes using the simple FontForge script shown below, and found that the characters in question are again missing from the output, even when I treat characters whose equivalents in the CJK compatibility ideograph area appear as not missing (i.e. regarding U+52D2 as not missing because U+F952 appears).

    SelectWorthOutputting()
    WriteStringToFile("","code_check.txt",0)
    foreach
      code=GlyphInfo("Unicode")
      if (code>0)
        WriteStringToFile(ToString(code),"code_check.txt",1)
        WriteStringToFile(Utf8(0x2c),"code_check.txt",1)
        WriteStringToFile(Utf8(code),"code_check.txt",1)
        WriteStringToFile(Utf8(0x0d),"code_check.txt",1)
        WriteStringToFile(Utf8(0x0a),"code_check.txt",1)
      endif
    endloop

I'm not a font expert, so probably it is ME who is missing something. I would appreciate it if you could enlighten me about what it is.

*Please excuse me if this comment is shown in large typeface, which I don't know how to control.

kenlunde commented 6 years ago

This is a known FontForge bug/issue. It simply cannot deal with Adobe-Identity-0 ROS fonts. Search for FontForge in this repository's issues, or see Issue #109 for the best example.
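For readers who want to cross-check coverage without FontForge, the following is a minimal sketch that reads a font's Unicode 'cmap' table with the fontTools Python library and reports which of the code points from this thread are absent. Using fontTools is an assumption made for this illustration (it is not a tool recommended anywhere in this thread), and the helper names `load_cmap` and `missing_codepoints` are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Mapping

# Code points reported as missing earlier in this thread.
REPORTED = [0x2E8D, 0x2EA9, 0x6AF8, 0x90DE, 0x96B8]


def load_cmap(font_path: str, font_number: int = 0) -> dict[int, str]:
    """Return the preferred Unicode cmap (code point -> glyph name) of a font file."""
    # Imported here so that missing_codepoints() works without fontTools installed.
    from fontTools.ttLib import TTFont

    # fontNumber selects one face inside a .ttc collection.
    font = TTFont(font_path, fontNumber=font_number)
    try:
        return dict(font.getBestCmap())
    finally:
        font.close()


def missing_codepoints(cmap: Mapping[int, str], wanted: Iterable[int]) -> list[int]:
    """Return the code points in `wanted` that the cmap does not cover, in input order."""
    return [cp for cp in wanted if cp not in cmap]


if __name__ == "__main__":
    # Stand-in cmap for illustration only (glyph names are made up).
    # With a real font, use: cmap = load_cmap("<path to the font file>")
    demo_cmap = {0x6AF8: "glyphA", 0x90DE: "glyphB"}
    for cp in missing_codepoints(demo_cmap, REPORTED):
        print(f"U+{cp:04X} is not in the cmap")
```

This checks only the character-to-glyph mapping, which is enough to tell whether a code point is encoded in the font. It says nothing about which regional glyph variant is used.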

HisashiS commented 6 years ago

Thank you for your quick reply. I'll look into the fontforge issue.

BTW, if you know of other good tools (preferably on Windows) for displaying and editing the contents of font files, would you please let me know? For now, I want to verify for myself (all) the characters included in the Noto CJK-jp/JP and other font files.

kenlunde commented 6 years ago

If your sole purpose is to confirm that all Adobe-Japan1-6 kanji are present in the Noto CJK typefaces, I can assure you that they are, in both the full glyph set fonts, and also in the region-specific subset fonts for Japanese. I also suggest that you explore the materials that are provided in the "release" branch of the Adobe-branded Source Han Sans and Source Han Serif typefaces.

HisashiS commented 6 years ago

Thank you again for the information. (I'm still downloading the materials...) May I take it that the Adobe-branded 源ノ明朝/源ノ角ゴシック fonts are equivalent to Noto Serif CJK-jp/Sans CJK-jp in terms of which characters are included and their glyphs? I'm also interested in making all the AJ1-6 characters (kanji and non-kanji) available solely using Noto font files, so I would like to determine which AJ1-6 characters (e.g. those in Latin Extended-A/B, symbols) are not in Noto JP.

kenlunde commented 6 years ago

Yes, the Adobe-branded Source Han (源ノ明朝/源ノ角ゴシック) typefaces are identical to the Google-branded Noto CJK ones. They differ only by name. I prepared very extensive (aka long) readMe files for the Source Han typefaces, which provide a lot of detail, and the materials that are in the "Resources" directory of the "release" branch should be helpful.

There was no attempt made to include all Adobe-Japan1-6 glyphs, because such typefaces serve a completely different purpose than these open source Pan-CJK typefaces. The only category of Adobe-Japan1-6 glyphs that are included in these open source Pan-CJK typefaces is kanji. The aj16-kanji.txt data file is what you need, along with the AI0-SourceHan{Sans,Serif} ordering file that maps the Unicode-based working glyph names in the second column of the aj16-kanji.txt data file to actual CIDs.
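The two-file lookup described above could be sketched roughly as follows. The file layouts are assumptions, not confirmed in this thread: both files are taken to be tab-separated, the aj16-kanji.txt data file is assumed to carry the Adobe-Japan1-6 CID in its first column (only the glyph name in the second column is stated above), and the ordering file is assumed to carry the font's CID in its first column and the glyph name in its last. The function names are hypothetical; check the real files before relying on any of this.

```python
from __future__ import annotations


def parse_ordering(text: str) -> dict[str, int]:
    """Map working glyph name -> font CID.

    ASSUMPTION: tab-separated lines, CID first, glyph name last.
    """
    name_to_cid: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # Skip blank lines, comments, and anything without a numeric first column.
        if len(fields) < 2 or not fields[0].strip().isdigit():
            continue
        name_to_cid[fields[-1].strip()] = int(fields[0])
    return name_to_cid


def map_aj16_kanji(
    kanji_text: str, name_to_cid: dict[str, int]
) -> tuple[dict[int, int], list[tuple[int, str]]]:
    """Return (AJ1-6 CID -> font CID, entries whose glyph name was not found).

    ASSUMPTION: tab-separated lines, AJ1-6 CID first, glyph name second.
    """
    resolved: dict[int, int] = {}
    unresolved: list[tuple[int, str]] = []
    for line in kanji_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or not fields[0].strip().isdigit():
            continue
        aj16_cid, name = int(fields[0]), fields[1].strip()
        if name in name_to_cid:
            resolved[aj16_cid] = name_to_cid[name]
        else:
            unresolved.append((aj16_cid, name))
    return resolved, unresolved


if __name__ == "__main__":
    # Sample rows in the assumed layout; the values are the Noto Sans CJK ones quoted
    # earlier in this thread. uni6AF8-JP is deliberately left out of the ordering
    # sample to show the "unresolved" path.
    ordering = "40890\tuni90DE-JP\n43439\tuni96B8-JP\n"
    kanji = "7114\tuni96B8-JP\n8635\tuni90DE-JP\n20152\tuni6AF8-JP\n"
    resolved, unresolved = map_aj16_kanji(kanji, parse_ordering(ordering))
    print(resolved)
    print(unresolved)
```

An empty `unresolved` list for the real files would be consistent with the statement that every Adobe-Japan1-6 kanji is present.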

Also note that the Source Han/Noto CJK glyph sets are not a superset or subset of Adobe-Japan1-6, because thousands of glyphs of the latter are not present in the former, and there are even many Japanese glyphs in the former that are not present in the latter (such as additional 濁点-annotated kana and other symbols like U+3031 and U+3032). If you want to map Source Han/Noto CJK glyphs to Adobe-Japan1-6 CIDs, I'm afraid that you're on your own.

HisashiS commented 6 years ago

Thank you for the detailed and practical guidance. I was unaware that mapping to AJ1-6 is by no means a simple task. I'll study the materials you've pointed me to and see what I should/want to do.