[Indic] Indic1 double Halant

n8willis / opentype-shaping-documents

Documentation of OpenType shaping behavior


[Indic] Indic1 double Halant #64

Closed adrianwong closed 2 years ago

adrianwong commented 5 years ago

This one's a fun one, and concerns the reordering of post-base "Halant, Consonant" to "Consonant, Halant" sequences for compatibility with Indic1 shaping.

Some scripts do not perform this reordering if a Halant already exists after the final post-base consonant. HarfBuzz has discovered that Kannada is one of them.

lianghai commented 5 years ago

So, considering an abstract script with a consonant ka that can form a post-base sign:

char <ka, virama, ka> => char/glyph <ka, ka, virama> => glyph <ka, kasign>
char <ka, virama, ka, virama> => char/glyph <ka, ka, virama, virama> => glyph <ka, kasign, virama>
- but not for Kannada, so => glyph <ka, virama, kasign>
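
To make the two behaviors above concrete, here is a rough Python sketch. It is a toy model of the reordering as described in this thread, not HarfBuzz or Uniscribe code: the token names, the function `reorder_post_base_halant`, and the `skip_if_trailing_halant` flag are hypothetical, and it assumes the first token of the syllable is the base consonant.

```python
HALANT = "virama"


def is_consonant(token):
    # Toy assumption: every token that is not the virama is a consonant.
    return token != HALANT


def reorder_post_base_halant(syllable, skip_if_trailing_halant=False):
    """Move the first post-base halant to just after the last consonant.

    With skip_if_trailing_halant=True (the Kannada-style behavior discussed
    here), nothing moves when a halant already ends the syllable.
    """
    out = list(syllable)
    base = 0  # assumed position of the base consonant
    for i in range(base + 1, len(out)):
        if out[i] != HALANT:
            continue
        # Scan backwards for the last consonant, or for a trailing halant
        # when the Kannada-style exception is enabled.
        j = len(out) - 1
        while j > i:
            if is_consonant(out[j]) or (skip_if_trailing_halant and out[j] == HALANT):
                break
            j -= 1
        if j > i and out[j] != HALANT:
            # Shift the halant rightwards so it follows the last consonant.
            out.insert(j, out.pop(i))
        break  # only the first post-base halant is considered
    return out


print(reorder_post_base_halant(["ka", "virama", "ka"]))
print(reorder_post_base_halant(["ka", "virama", "ka", "virama"]))
print(reorder_post_base_halant(["ka", "virama", "ka", "virama"],
                               skip_if_trailing_halant=True))
```

The first two calls give the reordered character sequences from the examples above; the third leaves the input untouched, which is what lets the font form `<ka, virama, kasign>` in the Kannada case.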

This feels like the result of confusion between virama reordering (for Indic1) and vowel/consonant sign reordering (for Telugu, Kannada, etc). For one, this is certainly not Kannada-specific, as Telugu has pretty much the same expectation. It's just that these days fonts generally use GSUB tricks like IgnoreMarks or MarkAttachmentType to allow a virama to skip conjoining signs in contexts like <ka, kasign, virama>.

n8willis commented 4 years ago

I'm a tad reluctant to say we're specifying it if we don't know for sure which scripts to say it applies to. @lianghai, would this be shared with any of the other South Brahmic scripts besides Telugu?

n8willis commented 4 years ago

(Follow-up: not meaning to say we'd skip this, just saying I think it bears at least a little more poking around)

lianghai commented 4 years ago

This Indic 1 behavior for Kannada, …, is just a bad (if not wrong) design, so if we want to know the situation for other scripts, we'd better build minimal fonts and test them all. What do you think is a maximal suspected group of OTL script tags/shapers?

n8willis commented 4 years ago

For documentary purposes, I only want to be clear whether we should say that this is

a. a Kannada thing (to be followed)
b. a Kannada & Telugu thing (to be followed)
c. a thing for multiple specific scripts (eg, Kannada, Telugu, Tamil, & Malayalam) (also to be followed)
d. a mystery to be aware of

The HarfBuzz comments make it look like it's only been observed in Kannada, but they're vague, which is a little strange given how long it's been known. I hate to be one of Those Guys, but I don't have any Windows machines, and here in isolation I don't have access to one either, so I can't test.

If there are plausible strings to test in Uniscribe/DirectWrite and it wouldn't take too long, that would be useful. And if it was discovered that Uniscribe does the same thing for Telugu and/or other scripts, we could pass that info along to HarfBuzz as well.
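
As a starting point for such strings, here is a small Python sketch that builds the `<ka, virama, ka>` and `<ka, virama, ka, virama>` sequences for the four scripts named above. The choice of ka as the test consonant and the helper name `candidate_strings` are assumptions for illustration; whether these inputs actually trigger the behavior in Uniscribe/DirectWrite would still need checking with an Indic1 font.

```python
import unicodedata

# (letter ka, sign virama) for each script mentioned in this thread.
KA_AND_VIRAMA = {
    "Kannada": ("\u0C95", "\u0CCD"),
    "Telugu": ("\u0C15", "\u0C4D"),
    "Tamil": ("\u0B95", "\u0BCD"),
    "Malayalam": ("\u0D15", "\u0D4D"),
}


def candidate_strings(ka, virama):
    """Return the single-halant and double-halant test sequences."""
    return {
        "single": ka + virama + ka,
        "double": ka + virama + ka + virama,
    }


for script, (ka, virama) in KA_AND_VIRAMA.items():
    # Print the character names so the table above is easy to double-check.
    print(script, unicodedata.name(ka), "/", unicodedata.name(virama))
    for label, text in candidate_strings(ka, virama).items():
        codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
        print(f"  {label:7} {codepoints}")
```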

n8willis commented 4 years ago

** the "to be followed" distinction is really about whether it's documented as "how you ought to handle Indic1" or is put into the "Uniscribe compatibility notes".

n8willis commented 3 years ago

@lianghai I was revisiting some Telugu; do you have any further thoughts on this?

n8willis commented 3 years ago

Without further evidence that this ought to be addressed to Telugu as well, there's now a patch in #140 to further explain the background and designate the entire thing as purely a Uniscribe bug-for-bug compatibility target.