n8willis / opentype-shaping-documents

Documentation of OpenType shaping behavior

[Indic] Pre-base matra reordering with ZWJ #73

Closed adrianwong closed 3 years ago

adrianwong commented 5 years ago

In section 4.2 of our spec, we state that a pre-base matra should be moved after a ZWJ/ZWNJ if said joiner/non-joiner follows a "Halant". The OpenType spec agrees with this.

However, the good folk at HarfBuzz have found that statement to be untrue for ZWJs.
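To make the two behaviors concrete, here is a minimal Python sketch of the rule as worded in section 4.2 next to the ZWNJ-only variant reported for HarfBuzz. It is a simplified model, not code from any shaper: the category labels, the function name, and the two rule constants are all hypothetical. It assumes the pre-base matra has already been moved to the front of the syllable, that the last glyph is the base, and that every halant is still a standalone glyph (no half form was produced). What happens when the joiner is not honored (the matra simply stays put) is also an assumption of the sketch.

```python
from typing import FrozenSet, List

# Hypothetical category labels for this sketch (not taken from any shaper's source).
MATRA, CONS, HALANT, ZWJ, ZWNJ = "M", "C", "H", "ZWJ", "ZWNJ"

SPEC_RULE: FrozenSet[str] = frozenset({ZWJ, ZWNJ})  # rule as worded in section 4.2
ZWNJ_ONLY_RULE: FrozenSet[str] = frozenset({ZWNJ})  # behavior reported for HarfBuzz


def reorder_prebase_matra(glyphs: List[str],
                          joiners_to_pass: FrozenSet[str]) -> List[str]:
    """Return a copy of `glyphs` with the leading pre-base matra repositioned.

    Assumptions: glyphs[0] is the pre-base matra, the last element is the
    base consonant, and every halant is still a standalone glyph.
    """
    if not glyphs or glyphs[0] != MATRA:
        raise ValueError("expected a pre-base matra at index 0")
    rest = glyphs[1:]
    base = len(rest) - 1
    insert_at = 0  # default: the matra stays at the front
    # Search backward from the base for the last halant.
    for i in range(base - 1, -1, -1):
        if rest[i] != HALANT:
            continue
        follower = rest[i + 1]
        if follower in (ZWJ, ZWNJ):
            # Move past the joiner only if the chosen rule allows it;
            # otherwise (sketch assumption) the matra does not move.
            insert_at = i + 2 if follower in joiners_to_pass else 0
        else:
            insert_at = i + 1  # plain halant: the matra lands right after it
        break
    return rest[:insert_at] + [MATRA] + rest[insert_at:]


with_zwj = [MATRA, CONS, HALANT, ZWJ, CONS]
print(reorder_prebase_matra(with_zwj, SPEC_RULE))
print(reorder_prebase_matra(with_zwj, ZWNJ_ONLY_RULE))
```

Under these assumptions the two rules agree for a ZWNJ and differ only for a ZWJ, which is the discrepancy this issue is about.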

In addition to the Devanagari test case in the code comments, here are some other examples suggesting that the OpenType spec is likely incorrect:

Bengali - Nirmala UI or Noto Sans/Serif Bengali, using a glyph that doesn't possess a half form:

[Image: Screenshot from 2019-06-13 13-43-40]

[Image: Screenshot from 2019-06-13 13-43-53]

Oriya - Noto Sans Oriya

[Image: Screenshot from 2019-06-13 13-49-55]

On top of the above, Uniscribe does something extra that HarfBuzz doesn't, which is to move a pre-base matra after a "Halant, ZWJ" if the glyph following the ZWJ is a half form.

Bengali - Noto Sans/Serif Bengali:

[Image: zwj-half-uniscribe]
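The extra Uniscribe condition described above can be sketched as a small predicate. This is only an illustration of the observed behavior, not Uniscribe's actual logic, which is not public: the category labels and the function name `moves_past_zwj` are hypothetical, and the sketch assumes the glyph sequence has already been through half-form substitution, so a half form shows up as its own `HALF` entry.

```python
from typing import List

# Hypothetical category labels for this sketch.
MATRA, CONS, HALF, HALANT, ZWJ = "M", "C", "HALF", "H", "ZWJ"


def moves_past_zwj(glyphs: List[str], halant_index: int) -> bool:
    """Return True if, in this model, a pre-base matra would be moved after
    the "Halant, ZWJ" pair that starts at `halant_index`.

    The condition modeled here: the halant is followed by a ZWJ, and the
    glyph right after that ZWJ is a half form.
    """
    if glyphs[halant_index] != HALANT:
        raise ValueError("halant_index must point at a halant")
    following = glyphs[halant_index + 1:halant_index + 3]
    return following == [ZWJ, HALF]


# "Halant, ZWJ" followed by a half form: the matra would move past the ZWJ.
print(moves_past_zwj([MATRA, CONS, HALANT, ZWJ, HALF, CONS], 2))
# "Halant, ZWJ" followed directly by the base: the matra would not move.
print(moves_past_zwj([MATRA, CONS, HALANT, ZWJ, CONS], 2))
```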

n8willis commented 4 years ago

So, in trying to regularize and clean up our ZW[N]J treatment, this issue stands out as potentially flagging a spec problem that we ought to document in the errata.

Although the HarfBuzz test case is documented (inline) as a Uniscribe-compatibility thing, the original test case seems to be an Adobe thing (Adobe Devanagari, "wrservices", which a web search says is a Photoshop library).

@lianghai do we have any guidance on this from Unicode?

lianghai commented 4 years ago

I’m not aware of any prescription from the Unicode Standard regarding ZWJ’s direct effect on reordering.

The Unicode Standard (more or less) specifies that ZWJ is generally used to request the regular conjoining form of a base when the base is together with a virama on the same side of the ZWJ. (Then the formation of the conjoining form naturally affects the sign reordering mechanism.) The standard does not yet have a clear recommendation about whether ZWJ should nonetheless affect reordering when the font is not able to shape the said conjoining form.

Well, the Unicode Standard’s R16 (page 462, Core Spec)—

[Image: Screen Shot 2020-04-22 at 21 49 06]

—actually seems to be pretty explicit about the condition for reordering: as long as the virama is still explicit, the previous akshara should terminate.

Now, I understand the OTL Indic model allows pseudo half forms to be formed in the half feature, but letting ZWJ affect reordering without even forming a pseudo half form? This doesn’t sound right. I assume this behavior (that <base, virama, ZWJ> acts as a half form in reordering, regardless of the actual formation of a half form) is some overlooked legacy from Indic 1 (where such properties are very much resolved on the level of character sequences, without consulting the font’s shaping rules).

(This issue is also relevant, right? https://github.com/n8willis/opentype-shaping-documents/issues/68)

n8willis commented 3 years ago

So, re-examining this, it seems to me that the OTL script-doc and the Unicode Standard concur that an explicit virama should stop the matra move, and that what the HarfBuzz folk discovered is far more in line with a Uniscribe-compatibility guarantee than it is an assertion that the specification is incorrect.

The "Consonant,halant,ZWJ,halfform,Base" case behaving differently from the "Consonant,halant,ZWJ,Base" case in Uniscribe makes me suspicious that they're just testing separately for half-forms.

I also don't think the fact that the parenthetical comment in R16 mentions ZWNJ but doesn't mention ZWJ can be taken as prescribing that a ZWJ would (or should) trigger the opposite behavior.

Since HarfBuzz does follow Uniscribe on the ZWJ case by default, however, to my mind that means we must note the discrepancy in the main docs rather than leave it to the Uniscribe-compatibility note.

I've put that into a PR; open for feedback. The only other facet of this that I have a lingering concern about is that there are known Uniscribe test cases for so few scripts.

n8willis commented 3 years ago

Closed by e514603.