Open dscorbett opened 3 years ago
Thanks for at-including me on this, @dscorbett. Whenever I see a bug I always wonder whether and how it could have been caught automatically. Are the Noto fonts tested by fontbakery? If so, these bugs could be "formally" caught by passing the handy test string you made to a collision check done by @simoncozens' collidoscope. Simon and I are both using collidoscope under fontbakery on our own projects but I don't think there is any "official" use of collidoscope in fontbakery yet.
On the substance of the bugs themselves, I'm not sure how important these bugs are, because I'm not sure how common it is to expect these special cases to be handled in fonts for non-Biblical Hebrew. (I'm guessing the Noto fonts only strive to support non-Biblical Hebrew.)
These special cases do not include trope (aka cantillation or accent) marks. Thus one could say they are not Biblical Hebrew, according to one definition of Biblical Hebrew. These special cases contain only vowel marks.
But the way that these special cases use the vowel marks is specific to a Biblical situation known as ketiv/qere. A ketiv/qere situation occurs when what is traditionally read aloud for a word (its qere) is incompatible with the consonants that are traditionally written for that word (its ketiv). Ketiv/qere situations are notated in a variety of ways, but in the cases we're concerned with here, the way they are notated is by putting the vowel marks of the (implied) qere on the consonants of the ketiv. So that's how you end up with the "illegal" situation of two vowels on a single consonant (lamed): in the (implied) qere, there's a yod consonant after the lamed that renders the situation at least sensible if not "legal."
While fonts for non-Biblical Hebrew should probably have some degree of support for vowel marks (though even that is debatable), it may be unreasonable to expect such fonts to handle this weird, Biblically-specific use of vowel marks.
@dscorbett: thank you, and keep these coming :-) whether they concern non-Biblical or Biblical Hebrew. Sooner or later I'd like Noto to support Biblical Hebrew.
@bdenckla: we use fontbakery when testing newly built fonts, but we are not using @simoncozens' collidoscope. Thanks for pointing this out; good to know. Sooner or later we might want to start using it.
Note that the example string is given in normalized (e.g. NFD) order, which does not correspond to the desired RTL visual order of the marks (vowels). Some shapers (notably HarfBuzz) will transiently reorder the code points of this string into the desired order before the font ever sees it. Other shapers (notably MS DirectWrite, descended from MS Uniscribe) will leave the string alone.
Thus, what behavior one should expect from the example string depends on the shaper in use.
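To make the ordering point concrete, here is a minimal sketch using only Python's standard `unicodedata` module. It prints the canonical combining classes of the first word of the example string and checks that the whole string is already in NFD order. The helper names (`combining_classes`, `is_nfd`) and the `VISUAL_ORDER` variant are mine, for illustration only; they say nothing about what any shaper actually does.

```python
import unicodedata

# The example string from this issue, spelled out with escapes.
EXAMPLE = (
    "\u05DC\u05B0\u05B7\u05DE"  # lamed, sheva, patah, mem
    "\u05DC\u05B0\u05B8\u05DE"  # lamed, sheva, qamats, mem
    "\u05DC\u05B4\u05B7\u05DD"  # lamed, hiriq, patah, final mem
    "\u05DC\u05B4\u05B8\u05DD"  # lamed, hiriq, qamats, final mem
)

# Hypothetical variant: patah typed before hiriq (not canonical order).
VISUAL_ORDER = "\u05DC\u05B7\u05B4\u05DD"


def combining_classes(text):
    """Return the canonical combining class of each code point."""
    return [unicodedata.combining(ch) for ch in text]


def is_nfd(text):
    """True if NFD normalization leaves the text unchanged."""
    return unicodedata.normalize("NFD", text) == text


print(combining_classes(EXAMPLE[:4]))  # [0, 10, 17, 0]
print(is_nfd(EXAMPLE))                 # True
print(is_nfd(VISUAL_ORDER))            # False: NFD puts hiriq (14) before patah (17)
```

Any process that normalizes text will therefore turn the patah-then-hiriq spelling back into the order used in the example string.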
So this bug report could be improved in the following ways:
Making some of my suggestions above concrete, here is the original example string and two other versions of it, rendered in Taamey Frank CLM under the "triple" of DirectWrite/Windows/MS Word:
The two other versions of the example string shown above are:
Note that beyond the basic issues of visual order and collision, there are more subtle micro-positioning issues at play here, not all of which are well-handled by Taamey Frank CLM (TFC). In particular, TFC lacks rules to handle the sheva cases the same as the hiriq cases, and lacks rules to handle the CGJ cases the same as the non-CGJ cases.
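For readers unfamiliar with why CGJ cases come up at all: U+034F COMBINING GRAPHEME JOINER has combining class 0, so placing it between two marks stops normalization from reordering them. The sketch below shows this with the standard `unicodedata` module. The specific sequence (lamed, patah, CGJ, hiriq, final mem) is my assumption about what a CGJ version of the example looks like, not a quote from the strings tested above.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

# Assumed illustrative sequences (not taken verbatim from the thread).
without_cgj = "\u05DC\u05B7\u05B4\u05DD"            # lamed, patah, hiriq, final mem
with_cgj = "\u05DC\u05B7" + CGJ + "\u05B4\u05DD"    # same, with CGJ between the vowels


def survives_nfd(text):
    """True if NFD normalization leaves the code point order untouched."""
    return unicodedata.normalize("NFD", text) == text


print(unicodedata.combining(CGJ))  # 0
print(survives_nfd(without_cgj))   # False: the two vowels get swapped
print(survives_nfd(with_cgj))      # True: CGJ blocks the reordering
```

The CGJ keeps the typed order intact through normalization, but the font still sees an extra glyph between the two vowels, which is why a font needs separate rules for the CGJ cases.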
There is another overlap issue with the letter kaf sofit (final kaf) with a vowel underneath it (i.e. ךָּ, which is fairly common), but only in the Noto Serif Hebrew version.
Just to clarify what @moshfrid reports above (assuming I understand it), it is referring to the code point sequence:
Or, hopefully equivalently, it refers to the same code point sequence with the two "HEBREW POINT" elements reversed. This would be the normalized order, which most people find unintuitive; apparently the authors of the Unicode standard did not, unless the combining classes they chose were not what they intended.
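As a quick check that the two orders really are canonically equivalent, here is a small sketch with Python's `unicodedata`. It assumes the sequence in question is final kaf (U+05DA), dagesh (U+05BC), and qamats (U+05B8); that reading is my inference from the rendered text, and the names `typed_order` and `normalized_order` are just labels for this example.

```python
import unicodedata

FINAL_KAF = "\u05DA"
QAMATS = "\u05B8"  # combining class 18
DAGESH = "\u05BC"  # combining class 21

# Assumed spellings of the same cluster, differing only in mark order.
typed_order = FINAL_KAF + DAGESH + QAMATS
normalized_order = FINAL_KAF + QAMATS + DAGESH


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(canonically_equivalent(typed_order, normalized_order))         # True
print(unicodedata.normalize("NFD", typed_order) == normalized_order)  # True
```

Since the lower combining class sorts first, the qamats (18) ends up before the dagesh (21) in normalized text, so a font should position both orders identically.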
And, to further clarify, the appearance is expected to be something like this:
(Tastes may differ as to the details, for instance whether the qamats should float upward a bit compared to this example, but of course all agree that collision (overlap) between the qamats and the dagesh should not occur!)
This construction appears about a hundred times in the Hebrew Bible; I don't have the exact number. An early example is the word אַרְאֶֽךָּ׃, shown as an image below:
This example is from בראשית יב,א (Genesis 12:1).
The thing is: Noto basically already supports Biblical Hebrew. Cantillation marks and unique letters such as the nun hafuḵa (U+05C6) have practically no use in regular Hebrew. It'd be a shame to pass up this opportunity, especially since there is virtually no free Hebrew serif font that supports Biblical Hebrew and has multiple weights (which would be needed to highlight individual words or phrases in a Biblical text, for example). Of course, there are a couple of super-specific exceptions in Biblical Hebrew (https://www.win.tue.nl/~aeb/natlang/hebrew/hebrew_bible.html), but none of these are nearly as common as the issue at hand.
Fonts
NotoRashiHebrew-Regular.otf
NotoSansHebrew-Regular.otf
NotoSerifHebrew-Regular.otf
Where the fonts came from, and when
Site: https://github.com/googlefonts/noto-fonts/blob/81b283b55b3e5b80ec0e410d4b246d3573e1c7de/unhinted/otf/NotoRashiHebrew/NotoRashiHebrew-Regular.otf
Site: https://github.com/googlefonts/noto-fonts/blob/81b283b55b3e5b80ec0e410d4b246d3573e1c7de/unhinted/otf/NotoSansHebrew/NotoSansHebrew-Regular.otf
Site: https://github.com/googlefonts/noto-fonts/blob/81b283b55b3e5b80ec0e410d4b246d3573e1c7de/unhinted/otf/NotoSerifHebrew/NotoSerifHebrew-Regular.otf
Date: 2021-04-15
Font versions
Noto Rashi Hebrew: Version 1.002
Noto Sans Hebrew: Version 3.000
Noto Serif Hebrew: Version 2.000
Issue
Various inflections of “Jerusalem” in the Tanakh include two vowel signs side by side after the lamed, but in Noto, the vowel signs overlap. See Firefox bug 662055 or ask @bdenckla for more information.
Character data
לְַמלְָמלִַםלִָם
U+05DC HEBREW LETTER LAMED
U+05B0 HEBREW POINT SHEVA
U+05B7 HEBREW POINT PATAH
U+05DE HEBREW LETTER MEM
U+05DC HEBREW LETTER LAMED
U+05B0 HEBREW POINT SHEVA
U+05B8 HEBREW POINT QAMATS
U+05DE HEBREW LETTER MEM
U+05DC HEBREW LETTER LAMED
U+05B4 HEBREW POINT HIRIQ
U+05B7 HEBREW POINT PATAH
U+05DD HEBREW LETTER FINAL MEM
U+05DC HEBREW LETTER LAMED
U+05B4 HEBREW POINT HIRIQ
U+05B8 HEBREW POINT QAMATS
U+05DD HEBREW LETTER FINAL MEM
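A listing like the one above can be regenerated from any test string with a few lines of Python, which may help when filing similar reports. This is only a convenience sketch using the standard `unicodedata` module; the function name `describe` is made up for this example.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' line per code point in the text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# First word of the test string: lamed, sheva, patah, mem.
for line in describe("\u05DC\u05B0\u05B7\u05DE"):
    print(line)
```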
Screenshots