w3c / ttml2

Timed Text Markup Language 2 (TTML2)
https://w3c.github.io/ttml2/

Upright orientation involves more than just glyph orientation #281

Closed: r12a closed this issue 6 years ago

r12a commented 7 years ago

10.2.46 tts:textOrientation http://w3c.github.io/ttml2/spec/ttml2.html#style-attribute-textOrientation

If the value of this attribute is upright, then, in vertical writing modes, glyphs from horizontal scripts are set upright, i.e., using their nominal orientation in horizontal text, while glyphs from vertical scripts are not affected. In addition, for purposes of bidirectional processing, this value causes all affected characters to be treated as strong left-to-right, i.e., to be treated as if a tts:direction of ltr and tts:unicodeOverride of override were applied.
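In plain Unicode bidi terms, tts:direction="ltr" combined with tts:unicodeOverride="override" roughly corresponds to enclosing the run in LEFT-TO-RIGHT OVERRIDE (U+202D) and POP DIRECTIONAL FORMATTING (U+202C). Inside that override every character behaves as strong L, so the top-to-bottom order in an upright vertical line is simply the logical order. The Python sketch below illustrates this. The helper names are hypothetical and are not part of TTML2 or of any processor API.

```python
import unicodedata

LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def wrap_upright_run(run: str) -> str:
    """Hypothetical helper: plain-text analogue of applying
    direction=ltr plus unicodeOverride=override to a run."""
    return LRO + run + PDF


def effective_bidi_classes(run: str, upright: bool) -> list[str]:
    """Bidi class each character is treated as when reordering.

    With an LTR override every character counts as strong L, so the
    visual order equals the logical order. Otherwise each character
    keeps its own Unicode bidi class (e.g. 'AL' for Arabic letters)."""
    if upright:
        return ["L"] * len(run)
    return [unicodedata.bidirectional(ch) for ch in run]


word = "\u0627\u0644\u062A\u062F\u0648\u064A\u0644"  # Arabic letters
print(effective_bidi_classes(word, upright=False))  # every entry is 'AL'
print(effective_bidi_classes(word, upright=True))   # every entry is 'L'
```

This covers only the ordering aspect. It says nothing about which glyph forms are used or how the run is grouped before stacking.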

There are additional things to bear in mind here. The treatment as strong left-to-right will put Arabic script characters in the correct visual order down the vertical line, but it should also be said that the characters should use the isolated form. Furthermore, the rotations should be applied to groups of glyphs that constitute a grapheme cluster, so that, for example, Indic syllables remain together. (Even so, in scripts like Devanagari some consonant clusters are not fully encompassed by grapheme clusters.)
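The grouping described above can be sketched as follows: segment the text into units before stacking them upright, so that each base character keeps its combining marks. This is a deliberate simplification of Unicode extended grapheme clusters (UAX #29). It assumes only that marks (categories Mn, Mc, Me) and ZERO WIDTH JOINER extend the preceding unit, and it ignores the other break rules (Hangul syllables, emoji sequences, regional indicators, CR LF). A real processor would use a full UAX #29 implementation. The function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER


def simple_clusters(text: str) -> list[str]:
    """Approximate extended grapheme clusters: marks and ZWJ extend the
    previous unit; every other character starts a new unit."""
    units: list[str] = []
    for ch in text:
        extends = unicodedata.category(ch) in ("Mn", "Mc", "Me") or ch == ZWJ
        if extends and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def stack_upright(text: str) -> str:
    """Lay the units out one per line, like an upright vertical column."""
    return "\n".join(simple_clusters(text))


# Decomposed é (e + COMBINING ACUTE ACCENT) stays a single unit.
print(len(simple_clusters("e\u0301")))  # 1

# अंतर्राष्ट्रीयकरण: the virama leaves र् and रा as separate units.
word = ("\u0905\u0902\u0924\u0930\u094D\u0930\u093E\u0937"
        "\u094D\u091F\u094D\u0930\u0940\u092F\u0915\u0930\u0923")
print(len(simple_clusters(word)))  # 11
```

The Devanagari result illustrates the parenthetical caveat: grouping by these clusters keeps vowel signs with their consonants, but it still splits conjuncts such as ष्ट्री across several units.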

I think there's some wording to this effect in the CSS Writing Modes spec that you could look at.

css-meeting-bot commented 6 years ago

The Working Group just discussed "Upright orientation involves more than just glyph orientation" (ttml2#281), and agreed to the following resolution:

RESOLUTION: Change "glyph" to "glyph area" in the quoted text.

The full IRC log of that discussion:
<nigel> Topic: Upright orientation involves more than just glyph orientation ttml2#281
<nigel> github: https://github.com/w3c/ttml2/issues/281
<nigel> Nigel: It looks like Richard's proposals are certainly missing right now and would make a
<nigel> .. big difference to readability for the scripts he mentions.
<nigel> Glenn: You would never set arabic in upright form as he describes there... Actually there's
<nigel> .. a language in one of the Maldive islands that uses arabic letters only in their isolated form
<nigel> .. to write their language.
<nigel> Glenn: Right now we don't say the things he proposes but that doesn't mean that
<nigel> .. processors couldn't do it.
<nigel> Nigel: We talk about individual glyphs, which is the problem.
<nigel> Glenn: I think the language came from an earlier version of Writing Modes, and they've
<nigel> .. refined those but we haven't updated ours accordingly.
<nigel> Nigel: Is the task therefore to update TTML2 to match the more up to date CSS language?
<nigel> Glenn: I've avoided using grapheme cluster so far but there were other questions about Ruby
<nigel> .. that mention them, so I think we may not be able to avoid them. It's a really complex
<nigel> .. concept and very few people understand it. I'm not sure of the value of introducing it
<nigel> .. here. I prefer to leave it under-specified and let implementations do the right thing.
<nigel> .. We can resolve it editorially with notes later.
<nigel> Nigel: We refer to glyphs and don't allow for groups of glyphs.
<nigel> Glenn: We do allow for that.
<nigel> Nigel: How do we do that?
<nigel> Glenn: "glyphs from horizontal scripts" is in my mind ambiguous - scripts do not have
<nigel> .. glyphs, they have characters, and glyphs are the result of a complex mapping from characters
<nigel> .. to glyphs.
<nigel> Nigel: We already defined "glyph area" - why don't we just replace "glyph" with "glyph area"?
<nigel> Glenn: That would work for me. That's in fact exactly what would happen.
<nigel> .. I anticipate Richard will come back with some questions.
<nigel> RESOLUTION: Change "glyph" to "glyph area" in the quoted text.
<nigel> Cyril: I will prepare that pull request.

r12a commented 6 years ago

[I passed this by Fantasai for a sense check before posting, and she said LGTM.]

I traced the links again, but still find the definition of 'glyph area' very vague. Given https://github.com/w3c/ttml2/issues/236#issuecomment-275459765, I see it as meaning a grapheme cluster such as é (when decomposed) or a Tamil conjunct, but also as representing a whole word in joined-up Arabic. The latter is odd.

What if the Arabic word contains one or more letters that don't join on the left side, e.g. التدويل?
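To make the question concrete: even if a glyph area were a whole joined-up Arabic word, التدويل would still fall apart into several connected pieces, because some of its letters do not join on their left side. The sketch below assumes a small, hard-coded subset of the letters with Joining_Type R in the Unicode ArabicShaping.txt data. A real implementation would load the full table, and the function name is hypothetical.

```python
import unicodedata

# Assumption: partial list of letters that join only on their right side
# (Joining_Type=R in ArabicShaping.txt), not the complete set.
RIGHT_JOINING_ONLY = {
    "\u0622", "\u0623", "\u0625", "\u0627",  # alef with madda/hamza, plain alef
    "\u0624",  # waw with hamza above
    "\u0629",  # teh marbuta
    "\u062F", "\u0630", "\u0631", "\u0632",  # dal, thal, reh, zain
    "\u0648",  # waw
}


def joining_pieces(word: str) -> list[str]:
    """Split an Arabic word (in logical order) into cursively connected pieces.

    A piece ends after any letter that cannot join to the following letter.
    Nonspacing marks (harakat) stay with the letter they follow."""
    pieces: list[str] = []
    current = ""
    for ch in word:
        if unicodedata.category(ch) == "Mn" and not current and pieces:
            pieces[-1] += ch  # mark after a piece-final letter
            continue
        current += ch
        if ch in RIGHT_JOINING_ONLY:
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces


word = "\u0627\u0644\u062A\u062F\u0648\u064A\u0644"  # التدويل
print(len(joining_pieces(word)))  # 4
```

Under this model the single word yields four connected pieces, so "one word" and "one glyph image" already diverge for this example.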

What about northern Indic scripts such as Devanagari, where a top line joins most of the characters in a word, in a similar way to Arabic joining, e.g. अंतर्राष्ट्रीयकरण?

The latter example is relevant here. Although one could argue that upright Arabic text is rare, upright Devanagari text is less so (see for example https://github.com/w3c/type-samples/issues/52). The important point in the Devanagari example just pointed to is that the word is not simply split at letter boundaries: it is split at syllable boundaries (which in that particular case coincide with grapheme cluster boundaries).
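The difference between splitting at letter boundaries and splitting at syllable boundaries can be sketched by adding one rule to mark attachment: a letter that directly follows a virama continues the current unit. This is a rough, hypothetical approximation of an orthographic syllable. It detects viramas by their canonical combining class (9), ignores ZWJ/ZWNJ and other script-specific rules, and is not the CSS Text definition of a typographic character unit.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def is_virama(ch: str) -> bool:
    # Viramas (halants) have canonical combining class 9.
    return unicodedata.combining(ch) == 9


def rough_syllables(text: str) -> list[str]:
    """Group text into approximate orthographic syllables.

    Marks (including the virama itself) attach to the previous unit, and
    a letter (category Lo) right after a virama continues that unit."""
    units: list[str] = []
    for ch in text:
        if units and is_mark(ch):
            units[-1] += ch
        elif units and is_virama(units[-1][-1]) and unicodedata.category(ch) == "Lo":
            units[-1] += ch
        else:
            units.append(ch)
    return units


word = ("\u0905\u0902\u0924\u0930\u094D\u0930\u093E\u0937"
        "\u094D\u091F\u094D\u0930\u0940\u092F\u0915\u0930\u0923")  # अंतर्राष्ट्रीयकरण
# Units: अं | त | र्रा | ष्ट्री | य | क | र | ण
print(len(rough_syllables(word)))  # 8
```

Stacked one unit per line, this keeps र्रा and ष्ट्री whole, which a letter-by-letter split would not.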

I think it may be time to define a glyph area as corresponding to a 'typographic character unit' as defined at https://drafts.csswg.org/css-text-3/#typographic-character-unit, which generally equates to a grapheme cluster, though it may cover more for some complex conjuncts (of which there are many in Indic scripts).

Btw:

Glenn: You would never set arabic in upright form as he describes there...

It's likely that this is not at all common (we are trying to ascertain whether it might be more common for Uighur), but see https://w3c.github.io/alreq/#h_vertical_upright for a picture showing it (and following the CSS rules).
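Richard's point that upright Arabic letters should use the isolated form can be illustrated with the compatibility decompositions in the Unicode Character Database. Each character in the Arabic presentation-form blocks whose decomposition is tagged `<isolated>` names the letter whose isolated form it is. This is only a sketch built with Python's `unicodedata` module: shaping engines normally select isolated glyphs through the font (for example the OpenType `isol` feature) rather than by substituting presentation-form characters. The function names are hypothetical.

```python
import unicodedata


def isolated_form_table() -> dict[str, str]:
    """Map Arabic letters to isolated presentation-form characters,
    using decompositions of the form '<isolated> XXXX' in U+FB50..U+FEFF."""
    table: dict[str, str] = {}
    for cp in range(0xFB50, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep single-letter isolated forms only; ligatures have more parts.
        if len(parts) == 2 and parts[0] == "<isolated>":
            table.setdefault(chr(int(parts[1], 16)), chr(cp))
    return table


def to_isolated(word: str) -> list[str]:
    """Replace each letter by its isolated form where one exists,
    ready to be stacked upright one per line (illustration only)."""
    table = isolated_form_table()
    return [table.get(ch, ch) for ch in word]


word = "\u0627\u0644\u062A\u062F\u0648\u064A\u0644"  # التدويل
for form in to_isolated(word):
    print(f"U+{ord(form):04X} {unicodedata.name(form)}")
```

Letters with no isolated presentation form are passed through unchanged.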

Glenn: Actually there's a language in one of the Maldive islands that uses arabic letters only in their isolated form to write their language.

You are perhaps referring to Dhivehi written in the Thaana script. For more information see http://r12a.github.io/scripts/thaana/

cconcolato commented 6 years ago

For the record, there is one occurrence of "grapheme cluster" in the TTML2 ED (as of today, commit 492604f), in the <emphasis-style> definition.

cconcolato commented 6 years ago

I traced the links again, but still find the definition of 'glyph area' very vague.

For the record, XSL 1.1 defines glyph area as follows:

A glyph-area is a special kind of inline-area which has no child areas, and has a single glyph image as its content.

The term "glyph image" is not defined.

XSL 1.1 also says:

The most common inline-area is a glyph-area, which contains the representation for a character (or characters) in a particular font.

nigelmegitt commented 6 years ago

[Meeting 2018-02-15] The WG has resolved not to expand the definition of "glyph area" further, nor to adopt "grapheme cluster" or "typographic character unit", but notes that all three concepts may be coincident from an implementation perspective. The group is willing to revisit this later.