notofonts / notobuilder

Python module for building Noto fonts

Add blank control characters to all script files that need it #1

Open davelab6 opened 3 years ago

davelab6 commented 3 years ago

@chrissimpkins noted that Noto Hebrew includes characters for

Raph said "BiDi control characters (that includes 200E and 200F, along with 202A-202E and 2066-2069) are handled entirely in the text shaping and layout engine, and do not need cmap entries in the font."

@simoncozens said "you don't normally have explicit glyphs in the font for control characters, since they are handled higher up the text-processing stack and won't appear in runs to be shaped."

I propose in the next build of Noto Hebrew, we remove them.

davelab6 commented 3 years ago

Behdad weighed in,

So, we don't show those characters. But in some systems they still affect font selection, i.e. they might break a shape run if they are not in the cmap. I'd check at least Android code and Firefox (ask Jonathan?). Chrome has no problem. Having them with an empty shape is a fine compromise IMO.

Raph confirmed,

Android is good here: https://android.googlesource.com/platform/frameworks/minikin/+/refs/heads/master/libs/minikin/FontCollection.cpp#302

That might serve as a useful reference for the future - all of the code points in that list are safe to leave out of a cmap with respect to breaking itemization on Android. I definitely agree with Behdad that including them as empty glyphs (zero advance) is the safest thing if we are worried about third party text layout.

So, I'll rename this issue to make sure we roll that out correctly across all the Noto fonts.

@marekjez86 noted,

LTR mark (U+200E), and RTL mark (U+200F):

In Noto, ALL CJK fonts, LGC (Latin, Greek, Cyrillic) fonts, all Hebrew fonts and all Arabic fonts support them.

  • Arimo
  • Cousine
  • NotoKufiArabic
  • NotoNaskhArabic
  • NotoNaskhArabicUI
  • NotoNastaliqUrdu
  • NotoRashiHebrew
  • NotoSans
  • NotoSans-Italic
  • NotoSansArabic
  • NotoSansArabicUI
  • NotoSansDisplay
  • NotoSansDisplay-Italic
  • NotoSansHebrew
  • NotoSansMono
  • NotoSerif
  • NotoSerif-Italic
  • NotoSerifDisplay
  • NotoSerifDisplay-Italic
  • NotoSerifHebrew
  • Tinos
  • NotoSansCuneiform
  • NotoSansNKo
  • NotoSansPhagsPa
  • NotoSansSyriac
  • NotoSansThaana

zero width nonjoiner (U+200C), zero width joiner (U+200D):

I believe it is a requirement for ALL Noto fonts to support these (I believe noto_lint.py checks for it), but only 112 out of 200 (or so) fonts support them (all CJK, all LGC, all Hebrew, all Arabic, all South and Southeast Asian fonts, ... support them).

So, we need script-aware FontBakery (FB) checks in the Google Fonts (GF) profile, so that fonts for every script on GF handle this correctly; this issue can then track passing those checks across the Noto collection.
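
As a rough illustration of what such a check could verify, here is a minimal sketch in plain Python. It is not FontBakery code: the function name `check_blank_controls` and its inputs are hypothetical. It assumes the caller has already extracted a cmap (code point to glyph name) and the horizontal advances (glyph name to advance width) from the font, and it leaves the script-aware part (which code points a given script requires) as the `required` parameter.

```python
from typing import Dict, Iterable, List, Tuple


def check_blank_controls(
    cmap: Dict[int, str],
    advances: Dict[str, int],
    required: Iterable[int],
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: report required control characters that are
    missing from the cmap, and those mapped to a glyph whose advance is
    not zero (an unknown advance is also flagged)."""
    missing: List[int] = []
    not_blank: List[int] = []
    for cp in sorted(required):
        glyph = cmap.get(cp)
        if glyph is None:
            missing.append(cp)
        elif advances.get(glyph) != 0:
            not_blank.append(cp)
    return missing, not_blank


# Made-up example data: ZWNJ, ZWJ, LRM are mapped, RLM is not,
# and the LRM glyph has a non-zero advance.
example_cmap = {0x200C: "uni200C", 0x200D: "uni200D", 0x200E: "uni200E"}
example_advances = {"uni200C": 0, "uni200D": 0, "uni200E": 250}
example_required = {0x200C, 0x200D, 0x200E, 0x200F}

missing, not_blank = check_blank_controls(
    example_cmap, example_advances, example_required
)
print("missing:", [f"U+{cp:04X}" for cp in missing])
print("non-zero advance:", [f"U+{cp:04X}" for cp in not_blank])
```

This only looks at the cmap and the advance width; whether the glyph outline itself is empty would need a separate test.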

chrissimpkins commented 3 years ago

Full set of code points from the Android source that Raph linked:

0x00AD                            // SOFT HYPHEN
0x034F                            // COMBINING GRAPHEME JOINER
0x061C                            // ARABIC LETTER MARK
(0x200C <= c && c <= 0x200F)      // ZERO WIDTH NON-JOINER..RIGHT-TO-LEFT MARK
(0x202A <= c && c <= 0x202E)      // LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
(0x2066 <= c && c <= 0x2069)      // LEFT-TO-RIGHT ISOLATE..POP DIRECTIONAL ISOLATE
0xFEFF                            // BYTE ORDER MARK
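
For use in Python tooling, the list above can be transcribed into a small predicate. This is only a sketch of the ranges quoted in this comment, not a copy of the Android implementation; the names `is_listed_control` and `LISTED_CONTROLS` are made up for illustration.

```python
def is_listed_control(cp: int) -> bool:
    """Return True if cp is one of the code points in the list quoted above."""
    return (
        cp in (0x00AD, 0x034F, 0x061C, 0xFEFF)
        or 0x200C <= cp <= 0x200F  # ZWNJ, ZWJ, LRM, RLM
        or 0x202A <= cp <= 0x202E  # LRE..RLO
        or 0x2066 <= cp <= 0x2069  # LRI..PDI
    )


# All listed code points are in the Basic Multilingual Plane,
# so scanning U+0000..U+FFFF is enough to enumerate them.
LISTED_CONTROLS = [cp for cp in range(0x10000) if is_listed_control(cp)]

print(" ".join(f"U+{cp:04X}" for cp in LISTED_CONTROLS))
```

A set like this could be compared against a font's cmap to see which of these code points it maps.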

marekjez86 commented 3 years ago

ALL the characters/glyphs present in Noto were specified as REQUIRED for delivery before we would approve the fonts. I will NOT delete anything unless I understand that this requirement no longer applies. In particular, I don't want to touch the Indic fonts (the rule here is "if you break it for any language in India's constitution, you will need to train a Google employee to deal with BIS to allow sales of Android phones [if there's a BIS issue :-)]").