w3c / ttml2

Timed Text Markup Language 2 (TTML2)
https://w3c.github.io/ttml2/

Meaning of 'glyph area descendant' #236

Closed r12a closed 6 years ago

r12a commented 7 years ago

[This was intended to be a quick query by email, but it is developing into a thread, so i'm copying the discussion here for ongoing discussion.]

my original question: I'm working on review comments for the ruby section, but i'm a bit stuck because i don't understand the meaning of the term 'glyph area descendant', and i don't see it described anywhere. Could one of you explain what that is? I think it will be important to understand it in order to assess the ruby section.

r12a commented 7 years ago

[reply by Glenn]

It is referring to the area tree produced in formatting. See XSL-FO for the conceptual model. The basic hierarchy in this local context is:

    block area
        line area
            inline area
                glyph area

A glyph area is an inline area, so this may reduce to the following depending on context:

    block area
        line area
            glyph area

Thus, glyph areas are always descendants of a line area, and may have an intervening inline area or even an inline-block and inline area between the glyph area and the ancestor line area.

Note that XSL-FO classifies a line area as a special type of block area.
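
For illustration only, the hierarchy above can be sketched as a small tree in Python. The `Area` class and `count_glyph_descendants` helper are hypothetical names invented for this sketch and do not come from XSL-FO or TTML2; the sketch assumes that glyph areas are leaves and that "descendant" means any depth below the given area.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Area:
    # kind is one of "block", "line", "inline", "glyph" (hypothetical labels)
    kind: str
    children: List["Area"] = field(default_factory=list)


def count_glyph_descendants(area: Area) -> int:
    """Count glyph areas at any depth strictly below `area`."""
    total = 0
    for child in area.children:
        if child.kind == "glyph":
            total += 1
        # Glyph areas have no children here, so this adds 0 for them.
        total += count_glyph_descendants(child)
    return total


# A line area holding one inline area (two glyphs) and one direct glyph area.
line = Area("line", [
    Area("inline", [Area("glyph"), Area("glyph")]),
    Area("glyph"),
])
block = Area("block", [line])

print(count_glyph_descendants(block))  # 3
```

Both the direct glyph child and the two glyphs nested inside the inline area are counted, which matches the statement that an inline area may or may not intervene between a glyph area and its ancestor line area.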

r12a commented 7 years ago

[my response]

So is it equivalent to a character? grapheme cluster? font glyph? something else? I'm trying to figure out what we're counting when looking for N.

r12a commented 7 years ago

[response from Nigel]

"Glyph area" itself is a term widely used in the spec, though there is no definition for it, since it is an XSL-FO concept - see https://www.w3.org/TR/2006/REC-xsl11-20061205/#d0e723

I see that for Ruby there are two classes of glyph area, spacing ones and non-spacing ones. Maybe it would also be an improvement to be explicit and say "the number of non-spacing glyph areas that are descendants of these inline areas" where that applies.

[quoting r12a] So is it equivalent to a character? grapheme cluster? font glyph? something else? I'm trying to figure out what we're counting when looking for N.

Since non-spacing glyphs are omitted from counting, that effectively leaves the other (spacing) glyphs. I suppose counting grapheme clusters would be equivalent: the non-spacing glyph areas would be combined into the same grapheme cluster as the spacing glyph they attach to, so for counting purposes it is irrelevant whether a cluster includes them or not.

In other words, "a" and "á" would both count as 1 whether you count the grapheme clusters or the spacing glyph areas. If you were to count all the glyph areas then "a" would be 1 and "á" would be 2.
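
A minimal sketch of the two counts described above, assuming (purely for illustration) that each code point of the canonically decomposed text produces one glyph area and that the non-spacing ones are exactly those with Unicode general category Mn or Me. Real glyph areas depend on the font and the shaping engine, so this is an approximation, and the function names are hypothetical.

```python
import unicodedata


def count_all_glyph_areas(text: str) -> int:
    """One per code point after canonical decomposition (NFD)."""
    return len(unicodedata.normalize("NFD", text))


def count_spacing_glyph_areas(text: str) -> int:
    """As above, but skip non-spacing marks (categories Mn and Me)."""
    decomposed = unicodedata.normalize("NFD", text)
    return sum(
        1 for ch in decomposed
        if unicodedata.category(ch) not in ("Mn", "Me")
    )


for sample in ("a", "\u00e1"):  # "a" and "á"
    print(sample,
          count_all_glyph_areas(sample),
          count_spacing_glyph_areas(sample))
# a 1 1
# á 2 1
```

Under these assumptions "á" decomposes to "a" plus U+0301 COMBINING ACUTE ACCENT, so it counts as 2 when every glyph area is included and as 1 when only spacing ones are counted.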

We haven't discussed it but I assume the spec text for ruby is based on the glyph area counts rather than the other concepts you mention, since it is directly concerned with how the glyphs should be laid out.

r12a commented 7 years ago

[from here, discussion is moved to this issue]

10.2.34 tts:rubyAlign https://www.w3.org/TR/2016/WD-ttml2-20161117/#style-attribute-rubyAlign

Let IR and IB be, respectively, the inline areas generated by (1) a ruby text container or ruby text annotation and (2) an associated ruby base container or ruby base. Further, let NR and NB, be, respectively, the number of glyph area descendants of these inline areas.

If the value of this attribute is auto, then if NR equals NB, the semantics of withBase apply; otherwise, if NR is less than NB, the semantics of spaceAround or spaceBetween apply, respectively, according to whether NB is less than or equal to one (1) or is greater than one; otherwise (NR is greater than NB), the semantics of center apply.
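
Read as a decision procedure, the quoted rule for auto could be sketched like this. The function name and the plain integer inputs are assumptions made for illustration; how NR and NB are actually obtained is the open question in this issue.

```python
def resolve_auto_ruby_align(nr: int, nb: int) -> str:
    """Map the glyph area counts NR (ruby text) and NB (ruby base)
    to the alignment whose semantics apply when rubyAlign is auto."""
    if nr == nb:
        return "withBase"
    if nr < nb:
        # spaceAround if NB is at most one, otherwise spaceBetween
        return "spaceAround" if nb <= 1 else "spaceBetween"
    # Remaining case: NR is greater than NB
    return "center"


print(resolve_auto_ruby_align(2, 2))  # withBase
print(resolve_auto_ruby_align(1, 3))  # spaceBetween
print(resolve_auto_ruby_align(0, 1))  # spaceAround
print(resolve_auto_ruby_align(4, 2))  # center
```

The result depends entirely on the two counts, which is why the definition of what is being counted matters.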

I'm not sure how this works for the following cases:

  1. the ruby annotation contains the text 'píng'. How many glyph areas is that? What if the annotation is 'píng píng'?
  2. the ruby annotation contains Hindi text with conjuncts. Is the conjunct + vowel sign one glyph area? (Conjuncts may or may not be equivalent to grapheme clusters.)
  3. the ruby annotation contains Arabic text with the ligature lam-alif, composed of two font glyphs. Is this one glyph area or two?
  4. the ruby annotation contains any Arabic text, which is mostly joined up consonants. How many glyph areas is that?

skynavga commented 7 years ago

The term glyph area is operationally defined by XSL-FO. The terms spacing and non-spacing are terms of art from the Unicode Standard (see, e.g., combining character), and are also used in font specifications, e.g., OpenType Font.

As presently specified, the definition of glyph area is effectively implementation dependent. That may not be the best we can do, however, so let's consider where it is necessary to improve this.

First, in the context of Japanese text, this isn't a practical problem; however, in the general case, I agree it could be, as some of your examples demonstrate.

  1. the ruby annotation contains the text 'píng'. How many glyph areas is that? What if the annotation is 'píng píng'?

4 and 9, respectively

  2. the ruby annotation contains Hindi text with conjuncts. Is the conjunct + vowel sign one glyph area? (Conjuncts may or may not be equivalent to grapheme clusters.)

a reasonable implementation would likely map a single grapheme cluster to a single glyph area for the purpose of counting ruby in this context

  3. the ruby annotation contains Arabic text with the ligature lam-alif, composed of two font glyphs. Is this one glyph area or two?

a reasonable implementation would likely combine all joined glyphs, including any non-spacing marks that apply to them, into a single glyph area for the purpose of counting ruby in this context; in other words, it would look extremely strange to separate otherwise joined glyphs or component glyphs

  4. the ruby annotation contains any Arabic text, which is mostly joined up consonants. How many glyph areas is that?

see above

skynavga commented 7 years ago

Merge editorial clarifications/elaborations from PR #314.

nigelmegitt commented 7 years ago

Reopening and moving to Group's WR action required list - @r12a please could you review the pull request and check that the editorial wording added does answer your question?

css-meeting-bot commented 7 years ago

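The Working Group just discussed Meaning of 'glyph area descendant' #236, and agreed to the following resolutions:

RESOLUTION: Add "which is a" before "descendant".
RESOLUTION: Mark rubyAlign="withBase" as at risk for CR

The full IRC log of that discussion
<nigel> Topic: Meaning of 'glyph area descendant' #236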
<nigel> github: https://github.com/w3c/ttml2/issues/236
<nigel> Nigel: Would switching "glyph area descendant" to "descendant glyph area" help?
<nigel> r12a: Yes it would help a bit - now I understand you mean the first descendant that is a glyph area.
<nigel> .. And that you don't mean a descendant of a glyph area.
<nigel> glenn: I could add a note to the definition of glyph area to say that glyph areas have no descendant areas
<nigel> .. I could also add a Note to §10.2.3.7.
<nigel> r12a: Would you consider "which is" before each "descendant"?
<nigel> glenn: I could do that.
<nigel> glenn: I also note that the third instance of "glyph area descendant" in that section doesn't say what it is a descendant of.
<nigel> RESOLUTION: Add "which is a" before "descendant".
<nigel> r12a: I will review the definition of glyph area from the pull request too.
<nigel> r12a: Where the text says to count the glyph areas, what is a glyph area?
<nigel> glenn: In the area tree there are a number of glyph areas, but the question is how is that
<nigel> .. tree constructed. This algorithm doesn't define that, it assumes it has already been constructed.
<nigel> .. It's viewed as being outside the scope.
<nigel> nigel: Any other implementers here who need to construct the area tree?
<nigel> cyril: Yes, we have done it for Japanese.
<nigel> r12a: That's the easy case.
<nigel> glenn: I wanted to avoid saying anything about graphemes here and keep it on the topic of layout.
<nigel> r12a: It is much less likely that there would be Arabic or Hindi ruby, which would be a more complex case.
<nigel> glenn: There may be other open issues that are affected by this too.
<nigel> pal: How does this compare with what CSS does?
<nigel> r12a: CSS doesn't talk about glyph areas at all.
<nigel> glenn: They leave that to implementation details.
<nigel> r12a: They don't have to count things.
<nigel> glenn: They don't have a withBase alignment.
<nigel> r12a: Correct.
<nigel> flick: Or height in vertical flows.
<nigel> r12a: The context here is withBase that requires counting.
<nigel> glenn: The JLReq does go into this area, talking about sizing effects of 1 vs 2 vs 3 ruby
<nigel> .. associated with a base, which in Japanese typography there are conventions for.
<nigel> .. This doesn't go that far but it does start going further than CSS.
<nigel> cyril: CSS talks about boxes rather than glyph areas, and the alignment is based on boxes.
<nigel> glenn: They're synonyms - XSL uses "area", CSS uses "box" to mean the same thing.
<nigel> cyril: Netflix doesn't use "withBase" so possibly a solution would be to defer this functionality.
<nigel> r12a: We were quite surprised about withBase. My understanding is that this withBase is a
<nigel> .. thing that lambdaCap does so it is in the spec.
<nigel> glenn: It came out of my analysis of what would be needed to support lambdaCap.
<nigel> pal: If there are no lambdaCap files including it, then it would be safer to put it at risk or remove it.
<nigel> nigel: We are working towards CR now so we could mark it at risk now.
<nigel> flick: That would be a signal to people who are interested.
<nigel> RESOLUTION: Mark rubyAlign="withBase" as at risk for CR
<nigel> glenn: If it is an optional feature and our exit criteria require one implementation of each
<nigel> .. optional feature then this is already satisfied.
<nigel> r12a: I checked my dictionary for any examples where this would apply, and could not find any.
<nigel> .. Either in Japanese or with Pinyin. So it was all a bit weird to me.
<nigel> glenn: I'm pretty sure there's language on this in the lambdaCap spec also.

nigelmegitt commented 7 years ago

Since we made some actionable resolutions when we discussed this, fixing the labels...

css-meeting-bot commented 6 years ago

The Working Group just discussed Meaning of 'glyph area descendant' ttml2#236.

The full IRC log of that discussion
<nigel> Topic: Meaning of 'glyph area descendant' ttml2#236
<nigel> github: https://github.com/w3c/ttml2/issues/236
<nigel> Nigel: On reviewing this, it appears that there is work to do.
<nigel> Glenn: I missed that - I will tag it for my priority.