n8willis / opentype-shaping-documents

Documentation of OpenType shaping behavior

Mark tagging #46

Closed adrianwong closed 5 years ago

adrianwong commented 5 years ago

Steps 2-4 in section 2.8 state the following:

(2) All remaining marks must be tagged with the same positioning tag as the closest non-mark character the mark has affinity with, so that they move together during the sorting step.

(3) For all marks preceding the base consonant, the mark must be tagged with the same positioning tag as the closest preceding non-mark consonant.

(4) For all marks occurring after the base consonant, the mark must be tagged with the same positioning tag as the closest subsequent consonant.

Does (2) effectively cover (3)?

Also, in (3), what is a "non-mark consonant"?

n8willis commented 5 years ago

Both (3) and (4) are examples specifying (2). Or, to put it another way, (3) and (4) define the meaning of "has affinity with": one for the pre-base case, one for the post-base case.

As for 'non-mark consonant', there are consonants like Reph that are in the Mark category; the wording is meant to make it explicit that the 'tag with the same position as the closest consonant' step doesn't stop at those.

adrianwong commented 5 years ago

Thanks Nathan. If (3) and (4) are elaborations of (2) (instead of (2) being a separate step), there may be cases where certain codepoints remain untagged.

One example that I can think of off the top of my head is the "Ka, Nukta" consonant syllable. "Ka" is tagged as base, but "Nukta" remains untagged due to there being no "closest subsequent consonant".

What HarfBuzz appears to do is perform step (2) over the entire buffer. This covers step (3) as well as cases such as the one I mentioned above.
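The gap described above can be reproduced with a minimal sketch that reads steps (3) and (4) literally, with no separate step (2). The `Glyph` class, the tag strings, and the function name are hypothetical and used only for illustration; consonants are assumed to be tagged already, and step (4) is assumed to skip Mark-category consonants in the same way as step (3).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    # Hypothetical glyph record; not taken from the shaping documents or HarfBuzz.
    name: str
    is_consonant: bool = False
    is_mark: bool = False
    tag: Optional[str] = None  # positioning tag, e.g. "base" or "pre-base"


def tag_marks_literal(glyphs: List[Glyph], base_index: int) -> None:
    """Apply only steps (3) and (4), with no general pass for step (2)."""

    def is_anchor(g: Glyph) -> bool:
        # "Non-mark consonant": a consonant that is not in the Mark category.
        return g.is_consonant and not g.is_mark

    for i, g in enumerate(glyphs):
        if not g.is_mark or g.tag is not None:
            continue  # only untagged ("remaining") marks are handled
        if i < base_index:
            # Step (3): closest preceding non-mark consonant.
            candidates = range(i - 1, -1, -1)
        else:
            # Step (4): closest subsequent consonant (assumed non-mark here).
            candidates = range(i + 1, len(glyphs))
        for j in candidates:
            if is_anchor(glyphs[j]):
                g.tag = glyphs[j].tag
                break


syllable = [Glyph("Ka", is_consonant=True, tag="base"), Glyph("Nukta", is_mark=True)]
tag_marks_literal(syllable, base_index=0)
print(syllable[1].tag)  # None: there is no subsequent consonant to inherit from
```

With no post-base consonant in the syllable, the search in step (4) finds nothing, so the "Nukta" is left without a tag.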

n8willis commented 5 years ago

But it is doing (4) for post-base consonants just below....

adrianwong commented 5 years ago

Sorry, I don't think I've described this very well.

What I'm saying is that if (2) consists only of (3) and (4), some glyphs may remain untagged. In the example I gave, step (4) is never going to tag the "Nukta" because no post-base consonant exists.

What I think HarfBuzz is doing is first tagging all marks (not just marks preceding the base consonant) with the same tag as the preceding non-mark character, and then proceeding to do step (4). I.e. step (2) is distinct from steps (3) and (4).
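The two-pass reading described above can be sketched as follows. This is an illustration of the behavior as described in this comment, not a copy of HarfBuzz code; the `Glyph` class, the tag strings, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    # Hypothetical glyph record used only for this sketch.
    name: str
    is_consonant: bool = False
    is_mark: bool = False
    tag: Optional[str] = None  # positioning tag, e.g. "base" or "post-base"


def tag_marks_two_pass(glyphs: List[Glyph], base_index: int) -> None:
    # Pass 1, step (2) over the whole buffer: every untagged mark inherits
    # the tag of the closest preceding non-mark glyph.
    inherited = []  # indices of marks that were tagged in this pass
    last_tag = None
    for i, g in enumerate(glyphs):
        if not g.is_mark:
            last_tag = g.tag
        elif g.tag is None:
            g.tag = last_tag
            inherited.append(i)

    # Pass 2, step (4): a mark after the base is re-tagged to match the
    # closest subsequent non-mark consonant, if one exists. Otherwise the
    # tag from pass 1 is kept.
    for i in inherited:
        if i < base_index:
            continue
        for j in range(i + 1, len(glyphs)):
            if glyphs[j].is_consonant and not glyphs[j].is_mark:
                glyphs[i].tag = glyphs[j].tag
                break


syllable = [Glyph("Ka", is_consonant=True, tag="base"), Glyph("Nukta", is_mark=True)]
tag_marks_two_pass(syllable, base_index=0)
print(syllable[1].tag)  # base
```

Because the first pass runs over every mark, the "Nukta" in "Ka, Nukta" picks up the base tag even though the second pass has nothing to do.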

n8willis commented 5 years ago

An attempt to address this -- in IG only for the moment -- is in b610ae3. Have a look and, if it works, I can push the related commits for the other Indic2 scripts.

adrianwong commented 5 years ago

Thanks Nathan! A couple of points for discussion:

Unless we want to draw specific attention to the "Nukta", the step:

All "Nukta"s must be tagged with the same positioning tag as the preceding consonant.

should already be covered by:

  1. Initially, all remaining marks should be tagged with the same positioning tag as the closest preceding consonant.

I'm not certain how descriptive we want to be here, but in this paragraph:

In other words, all consonants preceding the base consonant "own" the marks that follow them, while all consonants after the base consonant "own" the marks that come before them.

is it worth mentioning something along the lines of: "in the case of a syllable not having any post-base consonants, the base consonant should "own" all the marks that follow it"? Or is that already implied by:

  1. Initially, all remaining marks should be tagged with the same positioning tag as the closest preceding consonant.

n8willis commented 5 years ago

> Unless we want to draw specific attention to the "Nukta", the step:

I did want to call special attention to Nukta. The concern is that Nukta occupies a unique role, since it changes a letter into a different letter, which is why there's the special step to re-order adjacent marks ensuring that the Nukta is always first.

> is it worth mentioning something along the lines of: "in the case of a syllable not having any post-base consonants, the base consonant should "own" all the marks that follow it"? Or is that already implied

It was supposed to be implied, yeah. But I think you're right that saying it explicitly is important.

adrianwong commented 5 years ago

> I did want to call special attention to Nukta. The concern is that Nukta occupies a unique role, since it changes a letter into a different letter, which is why there's the special step to re-order adjacent marks ensuring that the Nukta is always first.

Ah, right! Makes sense.
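The re-ordering idea quoted above, keeping the Nukta first within a run of adjacent marks, can be sketched with a stable sort. This is a minimal illustration rather than the procedure from the shaping documents: the function name is hypothetical, the run of marks is assumed to have been collected already, and the Devanagari code points are used only as examples.

```python
from typing import List

NUKTA = "\u093C"   # Devanagari Sign Nukta, used here only as an example
HALANT = "\u094D"  # Devanagari Sign Virama, used here only as an example


def nukta_first(marks: List[str]) -> List[str]:
    """Return the run of adjacent marks with any Nukta moved to the front."""
    # sorted() is stable and False sorts before True, so every Nukta moves
    # to the front while the other marks keep their relative order.
    return sorted(marks, key=lambda m: m != NUKTA)


run = nukta_first([HALANT, NUKTA])
print([f"U+{ord(m):04X}" for m in run])  # ['U+093C', 'U+094D']
```

A stable sort is used so that nothing other than the position of the Nukta changes.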

n8willis commented 5 years ago

OK. Merged into the other script docs with 3e5ebe6.