n8willis / opentype-shaping-documents

Documentation of OpenType shaping behavior

"Consonant, Halant, ZWJ" #43

Closed adrianwong closed 3 years ago

adrianwong commented 5 years ago

Our state machine recognises a Consonant, Halant, ZWJ sequence as a valid consonant syllable.

Is there such a thing as a consonant syllable that exists in half form?

Our spec states that it's only a Consonant, Halant, ZWJ, Consonant sequence that should receive the half form treatment.
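To make the question concrete, here is a minimal sketch of how a consonant-syllable pattern can end up accepting "Consonant, Halant, ZWJ". It is not the regex from the spec or from HarfBuzz: the category letters, the `categorize` helper, and the `CONSONANT_SYLLABLE` pattern are all hypothetical and heavily simplified (Devanagari consonants only, no nukta, matras, or other marks).

```python
import re

# Illustrative code points (Devanagari); the names are just local constants.
KA = "\u0915"
SSA = "\u0937"
VIRAMA = "\u094D"   # the halant
ZWJ = "\u200D"
ZWNJ = "\u200C"


def categorize(text: str) -> str:
    """Map each code point to a one-letter category (hypothetical scheme).

    C = consonant, H = halant, J = joiner (ZWJ or ZWNJ), X = anything else.
    """
    cats = []
    for ch in text:
        if ch == VIRAMA:
            cats.append("H")
        elif ch in (ZWJ, ZWNJ):
            cats.append("J")
        elif "\u0915" <= ch <= "\u0939":  # Devanagari KA..HA
            cats.append("C")
        else:
            cats.append("X")
    return "".join(cats)


# Simplified assumption: zero or more "consonant + halant (+ joiner)" units,
# then a final consonant that may itself carry a halant and a joiner.
CONSONANT_SYLLABLE = re.compile(r"(?:C(?:HJ?|JH))*C(?:HJ?)?")


def is_consonant_syllable(text: str) -> bool:
    """Return True if the whole string matches the simplified pattern."""
    return CONSONANT_SYLLABLE.fullmatch(categorize(text)) is not None


print(is_consonant_syllable(KA + VIRAMA + ZWJ))        # C, H, ZWJ
print(is_consonant_syllable(KA + VIRAMA + ZWJ + SSA))  # C, H, ZWJ, C
print(is_consonant_syllable(VIRAMA + ZWJ))             # no consonant at all
```

In this sketch the first two inputs match and the third does not: the trailing "halant, joiner" is allowed after the last consonant, so "C, H, ZWJ" is accepted without a following consonant.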

n8willis commented 5 years ago

I don't think that a real-world half-form would exist as a syllable in a word, but by correctly shaping "Consonant, Halant, ZWJ" you enable users to show a half-form on its own, such as for explanatory purposes. So it's not "language" but it is useful text.

adrianwong commented 5 years ago

Considering that we should correctly shape a "Consonant, Halant, ZWJ" sequence to display a standalone half-form, what are your thoughts on updating the spec to reflect that?

Also, is the notion of a "half form base consonant" something that even exists?

n8willis commented 5 years ago

Do you mean rewording the "Consonant,Halant,ZWJ,Consonant" bulleted example? I think yes, we should explain the C,H,Z standalone case. It might be something to explain earlier, too, when first discussing standalone syllables -- right now, we don't say much about why they're important, and sort of just lump them in with broken syllables.

adrianwong commented 5 years ago

> Do you mean rewording the "Consonant,Halant,ZWJ,Consonant" bulleted example?

Yup!

> It might be something to explain earlier, too, when first discussing standalone syllables

An explanation of the importance of standalone syllables would be very handy, in my opinion.

While we're on the topic - I used the term "standalone" in my previous message rather loosely, which is probably contributing to my confusion. Is a "Consonant, Halant, ZWJ" sequence considered a "standalone" syllable in the sense that it does not possess a base consonant? Our regex considers this same sequence a valid consonant syllable.

n8willis commented 5 years ago

To be frank, I got that term from HarfBuzz, and I suspect that HarfBuzz got it from the Microsoft Docs, where it (or "stand-alone") also is used to refer to showing marks in isolation and other such things. I suspect that the regular expression was written with more concern for getting the mark-shaping issues correct, since that often involves the dotted-circle / placeholders.

Would it help if we defined a fallback order for the regular expressions? It's kind of implied as things stand now: you try to match "normal" syllables first, then when that doesn't work you figure out what to do.
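As a rough illustration of what an explicit fallback order could look like, the sketch below tries a list of patterns in a fixed order and takes the first one that matches at the current position. Everything in it is an assumption made for illustration: the category letters, the simplified patterns, and the first-match-wins rule (a generated scanner may instead pick the longest match). It is not the regex set from the spec.

```python
import re

# Hypothetical category letters:
#   C consonant, V independent vowel, H halant, J joiner (ZWJ/ZWNJ),
#   M matra, P placeholder (NBSP or dotted circle)
ORDERED_PATTERNS = [
    # "Normal" syllables are tried first ...
    ("consonant_syllable", re.compile(r"(?:C(?:HJ?|JH))*C(?:HJ?|M+)?")),
    ("vowel_syllable", re.compile(r"VM*")),
    # ... then the placeholder-based cluster ...
    ("standalone_cluster", re.compile(r"PM*")),
    # ... and finally a catch-all for marks with no valid base.
    ("broken_cluster", re.compile(r"[HJM]+")),
]


def segment(cats: str) -> list[tuple[str, str]]:
    """Split a category string into (kind, text) runs, first match wins."""
    out = []
    pos = 0
    while pos < len(cats):
        for name, pattern in ORDERED_PATTERNS:
            m = pattern.match(cats, pos)
            if m and m.end() > pos:
                out.append((name, m.group()))
                pos = m.end()
                break
        else:
            # Nothing matched: emit the single character as "other".
            out.append(("other", cats[pos]))
            pos += 1
    return out


print(segment("CHJ"))
print(segment("PM"))
print(segment("MCM"))
```

With these toy patterns, "CHJ" is consumed by the first (consonant syllable) pattern, "PM" falls through to the standalone cluster, and the leading "M" in "MCM" is only picked up by the broken-cluster fallback.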

adrianwong commented 5 years ago

> Would it help if we defined a fallback order for the regular expressions? It's kind of implied as things stand now: you try to match "normal" syllables first, then when that doesn't work you figure out what to do.

I'd already gathered that from the order in which the regex was specified. It's probably implied clearly enough that an explicit definition would be unnecessary, I think.

lianghai commented 5 years ago

> Is a "Consonant, Halant, ZWJ" sequence considered a "standalone" syllable in the sense that it does not possess a base consonant? Our regex considers this same sequence a valid consonant syllable.

> To be frank, I got that term from HarfBuzz, and I suspect that HarfBuzz got it from the Microsoft Docs, where it (or "stand-alone") also is used to refer to showing marks in isolation and other such things. I suspect that the regular expression was written with more concern for getting the mark-shaping issues correct, since that often involves the dotted-circle / placeholders.

The “stand alone cluster” concept in the Microsoft Indic specs seems to be only an attempt to address the need to use NBSP as a placeholder base for (contextually encoded) combining marks. It certainly doesn’t address the conceptual relationship between various (contextually encoded) dependent signs.
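For readers unfamiliar with the NBSP convention mentioned here, a small sketch of building such a sequence follows. The `standalone_mark` helper is hypothetical; it only constructs the text (NBSP followed by one combining mark) and says nothing about how a given shaper or font will render it.

```python
import unicodedata

NBSP = "\u00A0"  # NO-BREAK SPACE, used here as the placeholder base


def standalone_mark(mark: str) -> str:
    """Return NBSP + mark, so the mark can be shown in isolation.

    Raises ValueError if `mark` is not a single code point whose
    Unicode general category is a mark (Mn, Mc, or Me).
    """
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError("expected a single combining mark")
    return NBSP + mark


# DEVANAGARI VOWEL SIGN I (U+093F) shown on a placeholder base.
text = standalone_mark("\u093F")
print([f"U+{ord(ch):04X}" for ch in text])
```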

n8willis commented 3 years ago

I put a WIP set of changes up in the pull request above; it adds info to 3.9 (half) and 3.12 (cjct) regarding the ZWJ syllable-break-regex behavior, and some minor enhancements to the tangentially-related issue of what "standalone syllable" is intended to mean. Well, technically it is all about detecting syllable boundaries when invisible acronymic codepoints are present, so maybe it's one big happy changeset.

Regardless, eyes are welcome.

n8willis commented 3 years ago

I believe this to be fixed by #119.