Closed by satbyy 2 years ago
NotoSansCJKsc-Regular.otf’s 'cmap' table is shorter because it has fewer segments. A single continuous range of code points is more efficient for 'cmap' subtable format 4, but subsetting punches lots of little holes in the code point coverage.
$ spot -t cmap cached_fonts/NotoSansCJKsc-Regular.otf | grep segCountX2 | uniq
segCountX2 =1364
$ spot -t cmap GoNotoCJKCore2005.otf | grep segCountX2 | uniq
segCountX2 =10088
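To put the segCountX2 values above in terms of bytes, here is a minimal sketch of the format 4 subtable size arithmetic. It assumes the layout from the OpenType specification: a 14-byte header, four parallel 16-bit arrays of segCount entries each, a 2-byte reservedPad, and a trailing glyphIdArray. The function name and parameters are made up for illustration, and the glyphIdArray size is left as an input because spot's segCountX2 line does not reveal it.

```python
def format4_min_length(seg_count_x2, glyph_id_array_len=0):
    """Byte length of a cmap format 4 subtable.

    seg_count_x2       -- the segCountX2 field (2 * number of segments)
    glyph_id_array_len -- number of 16-bit entries in glyphIdArray
                          (unknown here, so it defaults to 0 = lower bound)
    """
    seg_count = seg_count_x2 // 2
    header = 14                       # format .. rangeShift: seven uint16 fields
    parallel_arrays = 4 * 2 * seg_count  # endCode, startCode, idDelta, idRangeOffset
    reserved_pad = 2
    return header + parallel_arrays + reserved_pad + 2 * glyph_id_array_len


# Values taken from the spot output in this thread.
for name, seg_count_x2 in [
    ("NotoSansCJKsc-Regular.otf", 1364),
    ("GoNotoCJKCore2005.otf", 10088),
]:
    print(name, format4_min_length(seg_count_x2))
```

By this arithmetic the segment arrays alone account for 5472 bytes in the original font and 40368 bytes in the subsetted one, before any glyphIdArray entries are counted. Since the subtable's length field is 16 bits wide, that leaves the subsetted font roughly 25K for everything else in the subtable.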
Thanks, I am a font hobbyist, so finding my way still :) Do you have any ideas on how to accomplish this task? From what you said, I think I'll try to reduce the number of "holes" in the range of code points covered.
That sounds like a good plan. You could find all the small ranges of ideographs not included in IICore and add support for them. The more gaps you fill, the smaller 'cmap' will be, but the bigger everything else will be. There’s a trade-off and it will require some experimentation to determine what counts as a small enough range.
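The gap-filling idea can be sketched roughly as follows. This is only an illustration under some assumptions: `wanted` stands for the code points in the subset list, `available` for the code points the source font actually supports, and `max_gap` for the experimental threshold; all three names are hypothetical. The run count is just a proxy for the number of format 4 segments, since the real segment count also depends on how glyph IDs are assigned.

```python
def count_runs(codepoints):
    """Count maximal runs of consecutive code points."""
    runs = 0
    prev = None
    for cp in sorted(set(codepoints)):
        if prev is None or cp != prev + 1:
            runs += 1
        prev = cp
    return runs


def fill_small_gaps(wanted, available, max_gap):
    """Return `wanted` plus every gap of at most `max_gap` code points,
    provided the source font supports the whole gap."""
    cps = sorted(set(wanted))
    result = set(cps)
    for lo, hi in zip(cps, cps[1:]):
        gap = range(lo + 1, hi)
        if 0 < len(gap) <= max_gap and all(cp in available for cp in gap):
            result.update(gap)
    return result


# Toy data, not taken from IICore.
wanted = {0x4E00, 0x4E01, 0x4E03, 0x4E10}
available = set(range(0x4E00, 0x4E20))
filled = fill_small_gaps(wanted, available, max_gap=2)
print(count_runs(wanted), "->", count_runs(filled))
```

Raising `max_gap` trades fewer runs for more glyphs, which is the trade-off described above.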
Thanks David. Your idea seems to have worked. I blindly added all code points in the range 0x4E00-0x5FFF to the IICore list. Subsetting then succeeded: the cmap format 4 subtable ended up with size 63984, under the 65535 limit.
On the other hand, the number of glyphs increased by about 3000, but that's OK. The final font (with CJK + everything else) contains 57,474 glyphs, and pyftmerge succeeded. I will probably make a pull request tomorrow.
Thanks for your help!
The latest CI builds create GoNotoContemporary.ttf, which is a superset of all regional fonts (Asia + Africa + Europe + Americas), excluding historical scripts (and sign-writing). The only missing region is East Asia, aka CJK. It has plenty of room for expansion: as of now it encompasses 11706 code points and 34256 glyphs.
So there is space for at least (65K - 34K) ~ 30K glyphs before we max out the 65535 glyph limit.
We also generate GoNotoCJKCore.otf, which has about 10K code points and 20K glyphs, so it should all fit nicely in the same font.
However, the idea fails because the cmap format 4 subtable hits its 65535 length limit (the subtable's length field is only 16 bits wide).
The actual length is about 66600, so just a little over 65K. The aim of this issue/ticket is to figure out a way to overcome the cmap limit.
What is strange is that the original, non-subsetted CJK font itself has a cmap length of about 40K, but the subsetted CJK font has a cmap length of about 51K!
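For comparing cmap sizes between two fonts without extra tools, the format 4 length and segment count can be read straight from the file. The following is a minimal sketch using only the standard library; it assumes a single-font sfnt file (.otf or .ttf, not a .ttc collection), and the function names are made up for illustration.

```python
import struct


def find_table(font_bytes, tag):
    """Return the raw bytes of one sfnt table, or None if it is absent."""
    num_tables = struct.unpack_from(">H", font_bytes, 4)[0]
    for i in range(num_tables):
        # Table record: tag, checksum, offset, length (16 bytes each),
        # starting right after the 12-byte sfnt header.
        rec_tag, _checksum, offset, length = struct.unpack_from(
            ">4sLLL", font_bytes, 12 + 16 * i
        )
        if rec_tag == tag:
            return font_bytes[offset:offset + length]
    return None


def format4_subtables(cmap_bytes):
    """Yield (platformID, encodingID, length, segCount) per format 4 record."""
    _version, num_subtables = struct.unpack_from(">HH", cmap_bytes, 0)
    for i in range(num_subtables):
        platform_id, encoding_id, offset = struct.unpack_from(
            ">HHL", cmap_bytes, 4 + 8 * i
        )
        fmt = struct.unpack_from(">H", cmap_bytes, offset)[0]
        if fmt == 4:
            length, _language, seg_count_x2 = struct.unpack_from(
                ">HHH", cmap_bytes, offset + 2
            )
            yield platform_id, encoding_id, length, seg_count_x2 // 2


def report(path):
    """Print the format 4 subtable sizes of the font at `path`."""
    with open(path, "rb") as f:
        cmap = find_table(f.read(), b"cmap")
    for entry in format4_subtables(cmap):
        print(path, entry)
```

Several encoding records may point at the same subtable, so the same length can be reported more than once for one font.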
A brute-force way I found is to use --no-layout-closure while subsetting, but it also removes the locl feature, so JP or KR cannot use a CN font.