w3c / i18n-glossary

Definitions of terms used in W3C Internationalization documents.
https://w3c.github.io/i18n-glossary/

Errors in definition of orthographic syllable #76

Open NorbertLindenberg opened 5 months ago

NorbertLindenberg commented 5 months ago

The glossary entry for "orthographic syllable" describes it as a "typographic character unit", which in turn is described as a unit "that is indivisible with respect to a particular typographic operation" such as line breaking. Orthographic syllables are in fact not always indivisible. For example, in line breaking, both Batak and Tulu-Tigalari allow line breaks within orthographic syllables (see Line breaking at orthographic syllable boundaries).

The entry also describes an orthographic syllable as a "unit that includes more than one grapheme cluster". That's not correct – simple orthographic syllables may consist of a single grapheme cluster. "क" is both an orthographic syllable and a grapheme cluster. So, change to "one or more grapheme clusters".
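To make the "one or more" point concrete, here is a minimal Python sketch. It assumes a deliberately simplified segmentation rule: a unit is extended by any combining mark and by any character that follows a virama. That rule is an illustration only, not the Unicode or glossary definition, and the function name `orthographic_syllables` is hypothetical. Under it, "क" comes out as a unit of one code point (and a single code point is trivially one grapheme cluster), while "क्षि" comes out as one unit of four code points.

```python
import unicodedata

# Canonical combining class 9 is the class assigned to virama characters.
VIRAMA_CCC = 9

def orthographic_syllables(text):
    """Split text into units using a simplified, illustrative rule.

    A character joins the current unit if it is a combining mark
    (general category M*) or if the previous character is a virama.
    Real scripts need more rules (joiners, script-specific behaviour).
    """
    units = []
    for ch in text:
        prev = units[-1][-1] if units else ""
        is_mark = unicodedata.category(ch).startswith("M")
        after_virama = prev != "" and unicodedata.combining(prev) == VIRAMA_CCC
        if units and (is_mark or after_virama):
            units[-1] += ch
        else:
            units.append(ch)
    return units

ka = "\u0915"                      # क  (KA)
kssi = "\u0915\u094d\u0937\u093f"  # क्षि (KA, VIRAMA, SSA, VOWEL SIGN I)

print(orthographic_syllables(ka))
print(orthographic_syllables(kssi))
```

Both calls return a list with a single unit, which is why a definition requiring "more than one" of anything would exclude the simple case.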

The entry also states that "this term is used but not defined in the Unicode Standard". That's no longer true: Both section 6.1 Writing Systems of the Core Specification and the Unicode glossary now define the term.

aphillips commented 5 months ago

Need to add "O" to the alphabetic index at the top of the glossary.

Discussing in the 2024-04-25 teleconference.

r12a commented 3 months ago

Hi @NorbertLindenberg. Thanks for your comments.

> The glossary entry for "orthographic syllable" describes it as a "typographic character unit", which in turn is described as a unit "that is indivisible with respect to a particular typographic operation" such as line breaking. Orthographic syllables are in fact not always indivisible. For example, in line breaking, both Batak and Tulu-Tigalari allow line breaks within orthographic syllables (see Line breaking at orthographic syllable boundaries).

This is a quote from a CSS spec, so the comment should be made there. But note that the quoted text provides a Thai example where an orthographic syllable (and grapheme cluster!) is split for letter-spacing, which is also a script-specific variant.

> The entry also describes an orthographic syllable as a "unit that includes more than one grapheme cluster". That's not correct – simple orthographic syllables may consist of a single grapheme cluster. "क" is both an orthographic syllable and a grapheme cluster. So, change to "one or more grapheme clusters".

Fix proposed.

> The entry also states that "this term is used but not defined in the Unicode Standard". That's no longer true: Both section 6.1 Writing Systems of the Core Specification and the Unicode glossary now define the term.

Fix proposed.

PR is at https://github.com/w3c/i18n-glossary/pull/77