Closed adrianwong closed 5 years ago
Both (3) and (4) are examples specifying (2). Or, to put it another way, (3) and (4) define the meaning of "has affinity with": one for the pre-base case, one for the post-base case.
As for 'non-mark consonant', there are consonants like Reph that are in the Mark category; the wording is meant to make it explicit that the 'tag with the same position as the closest consonant' step doesn't stop at those.
Thanks Nathan. If (3) and (4) are elaborations of (2) (instead of (2) being a separate step), there may be cases where certain codepoints remain untagged.
One example that I can think of off the top of my head is the "Ka, Nukta" consonant syllable. "Ka" is tagged as base, but "Nukta" remains untagged due to there being no "closest subsequent consonant".
What HarfBuzz appears to do is perform step (2) over the entire buffer. This covers step (3) as well as cases such as the one I mentioned above.
But it is doing (4) for post-base consonants just below...
Sorry, I don't think I've described this very well.
What I'm saying is that if (2) consists of (3) and (4), some glyphs may remain untagged. In the example I gave, step (4) is never going to tag the "Nukta" because no post-base consonant exists.
What I think HarfBuzz is doing is first tagging all marks (not just marks preceding the base consonant) with the same tag as the preceding non-mark character, and then proceeding to do step (4). I.e. step (2) is distinct from steps (3) and (4).
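The two-pass reading described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not HarfBuzz's actual code: the `Glyph` type, the `tag_marks` function, and the tag names `"PRE"`, `"BASE"`, and `"POST"` are all hypothetical, and anything that is not a non-mark consonant is treated as a mark.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    name: str
    is_consonant: bool          # True only for non-mark consonants (assumption)
    pos: Optional[str] = None   # positioning tag; None means "not yet tagged"


def tag_marks(syllable: List[Glyph], base_index: int) -> List[Glyph]:
    """Hypothetical sketch of step (2) followed by step (4)."""
    inherited = set()  # indices of marks tagged by the first pass

    # Pass 1 (step 2, over the whole buffer): every untagged mark takes
    # the tag of the closest preceding non-mark consonant.
    owner_pos = None
    for i, glyph in enumerate(syllable):
        if glyph.is_consonant:
            owner_pos = glyph.pos
        elif glyph.pos is None and owner_pos is not None:
            glyph.pos = owner_pos
            inherited.add(i)

    # Pass 2 (step 4): each post-base consonant takes over the marks
    # that sit between it and the previous consonant.
    last = base_index
    for i in range(base_index + 1, len(syllable)):
        if syllable[i].is_consonant:
            for j in range(last + 1, i):
                if j in inherited:
                    syllable[j].pos = syllable[i].pos
            last = i
    return syllable


# "Ka, Nukta": there is no post-base consonant, so pass 2 does nothing,
# but pass 1 has already tagged the Nukta with the base position.
ka_nukta = tag_marks([Glyph("Ka", True, "BASE"), Glyph("Nukta", False)], 0)
```

With only the second pass, the Nukta in `ka_nukta` would keep `pos=None`, which is the untagged case raised in this thread.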
An attempt to address this -- in IG only for the moment -- is in b610ae3. Have a look and, if it works, I can push the related commits for the other Indic2 scripts.
Thanks Nathan! A couple of points for discussion:
Unless we want to draw specific attention to the "Nukta", the step:
All "Nukta"s must be tagged with the same positioning tag as the preceding consonant.
should already be covered by:
- Initially, all remaining marks should be tagged with the same positioning tag as the closest preceding consonant.
I'm not certain how descriptive we want to be here, but in this paragraph:
In other words, all consonants preceding the base consonant "own" the marks that follow them, while all consonants after the base consonant "own" the marks that come before them.
is it worth mentioning something along the lines of: "in the case of a syllable not having any post-base consonants, the base consonant should 'own' all the marks that follow it"? Or is that already implied by:
- Initially, all remaining marks should be tagged with the same positioning tag as the closest preceding consonant.
> Unless we want to draw specific attention to the "Nukta", the step:
I did want to call special attention to Nukta. The concern is that Nukta occupies a unique role, since it changes a letter into a different letter, which is why there's the special step to re-order adjacent marks ensuring that the Nukta is always first.
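As a rough illustration of that re-ordering step, the sketch below moves any Nukta to the front of each run of adjacent marks. It is only a sketch under assumptions: the function name `nukta_first`, the `MARKS` set, and the placeholder glyph names `"MarkA"` and `"MarkB"` are hypothetical, and the real step operates on classified glyphs rather than on strings.

```python
from typing import List

# Hypothetical set of glyph names treated as marks in this sketch.
MARKS = {"Nukta", "MarkA", "MarkB"}


def nukta_first(glyphs: List[str]) -> List[str]:
    """Within each run of adjacent marks, move any Nukta to the front."""
    out: List[str] = []
    i = 0
    while i < len(glyphs):
        if glyphs[i] not in MARKS:
            out.append(glyphs[i])
            i += 1
            continue
        # Find the end of the current run of adjacent marks.
        j = i
        while j < len(glyphs) and glyphs[j] in MARKS:
            j += 1
        # sorted() is stable, so the other marks keep their relative order.
        out.extend(sorted(glyphs[i:j], key=lambda g: g != "Nukta"))
        i = j
    return out


reordered = nukta_first(["Ka", "MarkA", "Nukta", "Ya"])
# reordered == ["Ka", "Nukta", "MarkA", "Ya"]
```

Because the Nukta ends up directly after its consonant, it stays attached to the letter it modifies regardless of which other marks were typed first.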
> is it worth mentioning something along the lines of: "in the case of a syllable not having any post-base consonants, the base consonant should 'own' all the marks that follow it"? Or is that already implied
It was supposed to be implied, yeah. But I think you're right that saying it explicitly is important.
> I did want to call special attention to Nukta. The concern is that Nukta occupies a unique role, since it changes a letter into a different letter, which is why there's the special step to re-order adjacent marks ensuring that the Nukta is always first.
Ah, right! Makes sense.
OK. Merged into the other script docs with 3e5ebe6.
Steps 2-4 in section 2.8 state the following:
Does (2) effectively cover (3)?
Also, in (3), what is a "non-mark consonant"?