Closed adrianwong closed 3 years ago
I don't think that a real-world half-form would exist as a syllable in a word, but by correctly shaping "Consonant, Halant, ZWJ" you enable users to show a half-form on its own, such as for explanatory purposes. So it's not "language" but it is useful text.
Considering that we should correctly shape a "Consonant, Halant, ZWJ" sequence to display a standalone half-form, what are your thoughts on updating the spec to reflect that?
Also, is the notion of a "half form base consonant" something that even exists?
Do you mean rewording the "Consonant,Halant,ZWJ,Consonant" bulleted example? I think yes, we should explain the C,H,Z standalone case. It might be something to explain earlier, too, when first discussing standalone syllables -- right now, we don't say much about why they're important, and sort of just lump them in with broken syllables.
> Do you mean rewording the "Consonant,Halant,ZWJ,Consonant" bulleted example?
Yup!
> It might be something to explain earlier, too, when first discussing standalone syllables
An explanation on the importance of standalone syllables would be very handy, in my opinion.
While we're on the topic - I used the term "standalone" in my previous message rather loosely, which is probably contributing to my confusion. Is a "Consonant, Halant, ZWJ" sequence considered a "standalone" syllable in the sense that it does not possess a base consonant? Our regex considers this same sequence a valid consonant syllable.
To be frank, I got that term from HarfBuzz, and I suspect that HarfBuzz got it from the Microsoft Docs, where it (or "stand-alone") also is used to refer to showing marks in isolation and other such things. I suspect that the regular expression was written with more concern for getting the mark-shaping issues correct, since that often involves the dotted-circle / placeholders.
Would it help if we defined a fallback order for the regular expressions? It's kind of implied as things stand now: you try to match "normal" syllables first, then when that doesn't work you figure out what to do.
> Would it help if we defined a fallback order for the regular expressions? It's kind of implied as things stand now: you try to match "normal" syllables first, then when that doesn't work you figure out what to do.
I'd already gathered that from the order in which the regex was specified. It's probably implied enough such that an explicit definition would be unnecessary, I think.
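For anyone skimming the thread, the implied fallback order could be sketched roughly like this. This is only an illustration, not the spec's regular expressions: the one-letter categories, the simplified patterns, and the names `categorize`, `SYLLABLE_PATTERNS` and `classify_syllable` are all made up for the example, and only a small slice of Devanagari is covered.

```python
import re

def categorize(text):
    """Map each codepoint to a one-letter category (illustrative subset only)."""
    cats = []
    for ch in text:
        if "\u0915" <= ch <= "\u0939":        # Devanagari consonants KA..HA
            cats.append("C")
        elif ch == "\u094D":                   # Devanagari virama (halant)
            cats.append("H")
        elif ch in ("\u200C", "\u200D"):       # ZWNJ / ZWJ
            cats.append("J")
        elif "\u093E" <= ch <= "\u094C":       # dependent vowel signs (matras)
            cats.append("M")
        elif ch in ("\u00A0", "\u25CC"):       # NBSP / dotted circle placeholders
            cats.append("P")
        else:
            cats.append("X")
    return "".join(cats)

# Hypothetical, heavily simplified patterns, listed in the order they are tried.
SYLLABLE_PATTERNS = [
    ("consonant", re.compile(r"(?:CHJ?)*C(?:HJ?)?M?")),
    ("standalone", re.compile(r"PM?")),
    ("broken", re.compile(r"[HM]+")),
]

def classify_syllable(text):
    """Return the name of the first pattern that matches the whole text."""
    cats = categorize(text)
    for name, pattern in SYLLABLE_PATTERNS:
        if pattern.fullmatch(cats):
            return name
    return "other"
```

With these simplified patterns, KA + halant + ZWJ is picked up by the first ("consonant") pattern, so it never reaches the standalone or broken fallbacks, which matches the behaviour described earlier in the thread.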
> Is a "Consonant, Halant, ZWJ" sequence considered a "standalone" syllable in the sense that it does not possess a base consonant? Our regex considers this same sequence a valid consonant syllable.
>
> To be frank, I got that term from HarfBuzz, and I suspect that HarfBuzz got it from the Microsoft Docs, where it (or "stand-alone") also is used to refer to showing marks in isolation and other such things. I suspect that the regular expression was written with more concern for getting the mark-shaping issues correct, since that often involves the dotted-circle / placeholders.
Microsoft Indic specs having this “stand alone cluster” conception seems to be only an attempt to address the need of using NBSP to provide a placeholding base for (contextually encoded) combining marks. It certainly doesn’t address the conceptual relationship between various (contextually encoded) dependent signs.
I put a WIP set of changes up in the pull request above; it adds info to 3.9 (half) and 3.12 (cjct) regarding the ZWJ syllable-break-regex behavior, and some minor enhancements to the tangentially-related issue of what "standalone syllable" is intended to mean. Well, technically it is all about detecting syllable boundaries when invisible acronymic codepoints are present, so maybe it's one big happy changeset.
Regardless, eyes are welcome.
I believe this to be fixed by #119.
Our state machine recognises a "Consonant, Halant, ZWJ" sequence as a valid consonant syllable. Is there such a thing as a consonant syllable that exists in half form?
Our spec states that it's only a "Consonant, Halant, ZWJ, Consonant" sequence that should receive the half form treatment.
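To make the two readings concrete, here is a minimal sketch that finds half-form candidates under each of them. It is not the project's state machine or the spec's algorithm: the helper names `categories` and `half_form_candidates`, the `require_following_consonant` flag, and the restriction to the Devanagari consonants KA..HA are assumptions for illustration only.

```python
import re

HALANT = "\u094D"  # Devanagari virama
ZWJ = "\u200D"     # zero width joiner

def categories(text):
    """One category letter per codepoint: C, H, J (ZWJ only), or X."""
    out = []
    for ch in text:
        if "\u0915" <= ch <= "\u0939":   # Devanagari consonants KA..HA
            out.append("C")
        elif ch == HALANT:
            out.append("H")
        elif ch == ZWJ:
            out.append("J")
        else:
            out.append("X")
    return "".join(out)

def half_form_candidates(text, require_following_consonant):
    """Return codepoint indices of consonants that are half-form candidates.

    require_following_consonant=True models the reading where only
    "Consonant, Halant, ZWJ, Consonant" receives half-form treatment;
    False also accepts a trailing "Consonant, Halant, ZWJ".
    """
    cats = categories(text)
    pattern = r"C(?=HJC)" if require_following_consonant else r"C(?=HJ)"
    return [m.start() for m in re.finditer(pattern, cats)]
```

For the bare KA + halant + ZWJ sequence, the strict reading yields no candidates while the permissive one yields the KA at index 0. Both readings agree once another consonant follows the ZWJ.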