w3c / ttml1

Timed Text Markup Language 1 (TTML1)
http://w3c.github.io/ttml1/

Ambiguous definition for determination of descendant region identifier. #194

Open plehegar opened 8 years ago

plehegar commented 8 years ago

Step 3 of the [association region] procedure in Section 9.3.2 is ambiguous if multiple descendants are associated with a region (possibly distinct).

Suggest changing from:

"if the element contains a descendant element that specifies a region attribute, then the element is associated with the region referenced by that attribute;"

to

"if the element contains a descendant element that specifies a region attribute, then the element is associated with the first region referenced by that attribute using a breadth-first pre-traversal search of descendant elements;"

(raised by Glenn Adams on 2014-08-27) From tracker issue http://www.w3.org/AudioVideo/TT/tracker/issues/341
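
For illustration only, the proposed search could be sketched as below. This is a minimal sketch under stated assumptions, not spec text: the function name `first_descendant_region` is hypothetical, the sample markup omits TTML namespaces, and "breadth-first" is read as visiting descendants level by level in document order.

```python
import xml.etree.ElementTree as ET
from collections import deque


def first_descendant_region(element):
    """Return the region referenced by the first descendant (breadth-first)
    that specifies a region attribute, or None if there is none."""
    queue = deque(element)  # start with the children, not the element itself
    while queue:
        node = queue.popleft()
        region = node.get("region")
        if region is not None:
            return region
        queue.extend(node)  # enqueue this node's children for the next level
    return None


# The shallower <p region="r1"/> wins over the deeper <span region="r2"/>,
# even though the span comes first in document order.
doc = ET.fromstring('<div><p><span region="r2"/></p><p region="r1"/></div>')
print(first_descendant_region(doc))  # prints "r1"
```

A depth-first search in document order would return r2 for the same input, which is why the traversal order has to be stated if a single region is to be chosen.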

skynavga commented 8 years ago

Before I can fully resolve my understanding of the intended semantics, I believe we need to determine which of the following semantics apply when @region is specified on an element A and one of its descendant elements D and the two specified regions, R(A) and R(D), are different:

  1. R(D) (temporarily) OVERRIDES the region on A, forcing A to be selected into R(D); in this case, D is not selected when evaluating R(A), but A is selected when evaluating R(D): thus D is selected for R(D), but not selected for R(A)
  2. R(A) (temporarily) OVERRIDES the region on D, forcing D to be selected into R(A); in this case, D is selected when evaluating R(A), but A is not selected when evaluating R(D): thus D is selected for R(A), but not selected for R(D)
  3. R(D) DOES NOT OVERRIDE the region on A; in this case, D is not selected when evaluating R(A), and A is not selected when evaluating R(D): thus D is never selected
  4. R(A) DOES NOT OVERRIDE the region on D; in this case, A is not selected when evaluating R(D), and D is not selected when evaluating R(A): thus D is never selected

Of these four options, I believe option 1, R(D) overrides R(A), is the most intuitive and useful choice.

nigelmegitt commented 8 years ago

What we have now are both 3 and 4.

palemieux commented 7 years ago

The [associate region] algorithm can be disambiguated by tweaking it to ask the question [is the element associated with region R?]. Specifically,

A Content element is associated with a region R if any of the following ordered assertions is true, where the first assertion satisfied is used and remaining assertions are skipped

(1) if the element specifies a region attribute that references region R;
(2) if the region attribute of the first ancestor of the element that specifies a region attribute references region R;
(3) if the element contains a descendant element that specifies a region attribute that references region R;
(4) if a default region was implied (due to the absence of any region element) and R is the default region;

This means that an element that does not specify a region and whose parents do not specify a region, will be flowed into all regions specified by its children.
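
The ordered assertions above could be sketched roughly as follows. This is only a sketch under assumptions: the `Element` class and the `is_associated` and `descendants` names are hypothetical, and it assumes that an element (or its first region-specifying ancestor) that references a region other than R ends the test with a negative answer rather than falling through to the next assertion.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Element:
    """Hypothetical stand-in for a TTML content element."""
    name: str
    region: Optional[str] = None
    children: List["Element"] = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Link children back to this element so ancestors can be walked.
        for child in self.children:
            child.parent = self


def descendants(element: Element) -> Iterator[Element]:
    for child in element.children:
        yield child
        yield from descendants(child)


def is_associated(element: Element, region: str,
                  default_region: Optional[str] = None) -> bool:
    # (1) the element itself specifies a region attribute
    if element.region is not None:
        return element.region == region
    # (2) the first ancestor that specifies a region attribute
    ancestor = element.parent
    while ancestor is not None:
        if ancestor.region is not None:
            return ancestor.region == region
        ancestor = ancestor.parent
    # (3) any descendant that specifies a region attribute
    specified = {d.region for d in descendants(element) if d.region is not None}
    if specified:
        return region in specified
    # (4) the implied default region, if any
    return default_region is not None and region == default_region


p = Element("p", children=[Element("span", region="r1"),
                           Element("span", region="r2")])
print(is_associated(p, "r1"), is_associated(p, "r2"))  # prints "True True"
```

Here the p specifies no region and has no region-specifying ancestor, so assertion (3) associates it with both r1 and r2.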

nigelmegitt commented 7 years ago

@palemieux From that list it looks as though the same element containing character content might find itself associated with multiple regions. Specifically the third point about containing a descendant element that specifies a region attribute is not mutually exclusive from the first two that look at the element or its ancestors. While this is probably reasonable for body and div it is less clearly a good idea for p and span elements which may contain character content.

For example:

<region xml:id="r1"/>
<region xml:id="r2"/>
...
<body>
  <div>
    <p>Some text
      <span region="r1">in r1</span>
      <span region="r2">in r2</span>
    </p>
  </div>
</body>

Would you expect "Some text" to appear in both regions? Or should this be non-conformant TTML?

skynavga commented 7 years ago

On Wed, Dec 21, 2016 at 5:32 AM, Nigel Megitt notifications@github.com wrote:

@palemieux From that list it looks as though the same element containing character content might find itself associated with multiple regions. Specifically the third point about containing a descendant element that specifies a region attribute is not mutually exclusive from the first two that look at the element or its ancestors. While this is probably reasonable for body and div it is less clearly a good idea for p and span elements which may contain character content.

For example:

...

Some text in r1 in r2

Would you expect "Some text" to appear in both regions? Or should this be non-conformant TTML?

both regions and conformant

palemieux commented 7 years ago

Would you expect "Some text" to appear in both regions? Or should this be non-conformant TTML?

I would not expect "some text" to appear in any region according to the algorithm specified at https://github.com/w3c/ttml1/issues/194#issuecomment-257047647 since the anonymous span containing "some text" has neither parents nor children that specify @region.

skynavga commented 7 years ago

On Wed, Dec 21, 2016 at 3:35 PM, Pierre-Anthony Lemieux < notifications@github.com> wrote:

Would you expect "Some text" to appear in both regions? Or should this be non-conformant TTML?

I would not expect "some text" to appear in any region according to the algorithm specified at https://github.com/w3c/ttml1/issues/194#issuecomment-257047647 since the anonymous span containing "some text" has neither parents nor children that specify @region.

It is by my choice of option 1 in https://github.com/w3c/ttml1/issues/194#issuecomment-224911805, where, even though no R(A) is specified, all R(A)s are effectively set (temporarily) to R(D).

palemieux commented 7 years ago

https://github.com/w3c/ttml1/issues/194#issuecomment-224911805 does not apply in the example at https://github.com/w3c/ttml1/issues/194#issuecomment-268512070 since neither descendants nor parents of "Some text" specify @region.

skynavga commented 7 years ago

On Wed, Dec 21, 2016 at 4:12 PM, Pierre-Anthony Lemieux < notifications@github.com> wrote:

https://github.com/w3c/ttml1/issues/194#issuecomment-224911805 does not apply in the example at https://github.com/w3c/ttml1/issues/194#issuecomment-268512070 since neither descendants nor parents of "Some text" specify @region.

in your comment https://github.com/w3c/ttml1/issues/194#issuecomment-257047647, you appear to agree with my thinking, when you say

This means that an element that does not specify a region and whose parents do not specify a region, will be flowed into all regions specified by its children.

to which my response is: we need to ensure the algorithm produces this result

palemieux commented 7 years ago

This means that an element that does not specify a region and whose parents do not specify a region, will be flowed into all regions specified by its children.

Yes... but in the example at https://github.com/w3c/ttml1/issues/194#issuecomment-268512070 , the anonymous span "Some Text" does not have children!

skynavga commented 7 years ago

On Wed, Dec 21, 2016 at 4:34 PM, Pierre-Anthony Lemieux < notifications@github.com> wrote:

This means that an element that does not specify a region and whose parents do not specify a region, will be flowed into all regions specified by its children.

Yes... but in the example at https://github.com/w3c/ttml1/issues/194#issuecomment-268512070 , the anonymous span "Some Text" does not have children!

ah, ok, but, in that case, your suggestion doesn't go far enough (or is overly restrictive);

it may be that our current algorithm (and its proposed updates) is going about this the wrong way; perhaps a better way to qualify region association is with the evaluation of all content in the context of a given region, and then state which of that content is excluded from the region, and thus leaving all remaining content included in (selected into) the region;

so, e.g.,

  1. for every region R, evaluate all content in body for inclusion in R, such that
  2. if an element E is explicitly associated with another region R', then E and its descendants are not selected into R'

palemieux commented 7 years ago

  1. if an element E is explicitly associated with another region R', then E and its descendants are not selected into R'

Does this mean that if no content element specifies @region attribute then all content elements are flowed in all regions?

I am not convinced the algorithm above is too restrictive, especially if another cannot be found.

skynavga commented 7 years ago

On Wed, Dec 21, 2016 at 5:38 PM, Pierre-Anthony Lemieux < notifications@github.com> wrote:

  1. if an element E is explicitly associated with another region R', then E and its descendants are not selected into R'

Does this mean that if no content element specifies @region attribute then all content elements are flowed in all regions?

hmm, right

I am not convinced the algorithm above is too restrictive, especially if another cannot be found.

nigelmegitt commented 7 years ago

Sorry to be controversial, but perhaps we should begin with the semantic that seems most appropriate and design the algorithm around that rather than the other way around...

I would expect all character content and <br> to be in exactly one region, and for all ancestor elements to be homed within that region to set up the styling inheritance chain. So body and div could be in multiple regions, but span and br would not. Happy to put character content that is a child of a p in an anonymous span for this purpose. The only question I have not answered then is what we do with p elements.

palemieux commented 7 years ago

@nigelmegitt I believe the algorithm at https://github.com/w3c/ttml1/issues/194#issuecomment-257047647 achieves these objectives while remaining close to the current prose: elements are associated with a region if a descendant or parent specifies @region.

nigelmegitt commented 7 years ago

What should happen if an element (let's say div) specifies a different region to one or more of its ancestors/descendants (let's say a p) with that algorithm? Would you put the div into both the region that it specifies and all the regions specified by its descendants?

(reminder: current specification is to prune the p altogether in this case)

palemieux commented 7 years ago

What should happen if an element (let's say div) specifies a different region to one or more of its ancestors/descendants (let's say a p) with that algorithm?

Given:

<div region="r1">
<p region="r2">
</p>
</div>

div is associated with region r1 and p is associated with region r2 per rule (1). As a result, the p is never displayed since it is pruned when region r1 is processed and div is pruned when region r2 is processed. I think this is clearly the intent of TTML1.
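
The pruning in this example can be traced with a small sketch. It is only an illustration under assumptions: each element is represented by the list of region attributes on the path from the root down to it, the helper names `associated` and `displayed_in` are hypothetical, and the nearest specified region on the element or its ancestors is taken to decide the association (rules (1) and (2)); rules (3) and (4) are not modeled because this example does not exercise them.

```python
from typing import List, Optional

# Region attributes from the root down to the element (None if unspecified).
Path = List[Optional[str]]


def associated(path: Path, region: str) -> bool:
    """Nearest specified region on the element or its ancestors decides."""
    for specified in reversed(path):
        if specified is not None:
            return specified == region
    return False


def displayed_in(path: Path, region: str) -> bool:
    """An element survives pruning for a region only if it and every
    ancestor are associated with that region (pruning removes subtrees)."""
    return all(associated(path[: i + 1], region) for i in range(len(path)))


div = ["r1"]        # <div region="r1">
p = ["r1", "r2"]    # <p region="r2"> inside that div
for region in ("r1", "r2"):
    print(region, displayed_in(div, region), displayed_in(p, region))
# prints:
# r1 True False
# r2 False False
```

The p is absent from both rows: it is pruned directly for r1, and it disappears with its pruned parent div for r2.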

nigelmegitt commented 7 years ago

It is certainly the effect of the algorithm in TTML1 but when we discussed it before this seemed surprising to @skynavga so I think it is open to debate if it is the correct/desired behaviour. I don't have a problem with it myself, since I think that it is deterministic and well defined, and effectively makes an 'error' safe by conservatively not showing content.

In your algorithm above @palemieux however I am not clear if the same behaviour would occur?

nigelmegitt commented 7 years ago

PS I also seem to recall that @tairt questioned whether this (pruning) is the desired behaviour in the past too.

palemieux commented 7 years ago

before this seemed surprising to @skynavga

Where? I thought @skynavga's surprise came with the following example:

<div>
<p>
Foo
<span region="r1">
Blah
</span>
<span region="r2">
Doh
</span>
</p>
</div>

where "Foo" would never be displayed since none of its ancestors or descendants have @region specified, which makes sense to me.

In other words, authors need to specify @region on parents for their children to be flowed in regions.

nigelmegitt commented 7 years ago

@palemieux I was referring to the originally filed issue and the first comment above.

nigelmegitt commented 7 years ago

[Meeting 2017-01-13] Agreed that content elements may be associated with multiple regions, so [construct intermediate document] step 2 needs to test if a content element is associated with a specific region R and [associate region] needs to be able to state whether or not the element is in fact associated with that region R.

Noted also that if the region attribute were IDREFS then multiple regions could be specified for content elements but this is currently prohibited. No current request to make this change.

@skynavga wishes to review this and test it against the TTT implementation in case it breaks anything.

skynavga commented 6 years ago

Regarding https://github.com/w3c/ttml1/issues/194#issuecomment-268512070, I just tested this content on TTPE and it (1) does not produce any error/warning; and (2) the text "Some text" does not appear, which concurs with @palemieux's https://github.com/w3c/ttml1/issues/194#issuecomment-268661659 but not my https://github.com/w3c/ttml1/issues/194#issuecomment-268532253. See attached for test file, config file, and output ISD.

test.out.zip

nigelmegitt commented 6 years ago

Thanks @skynavga - that suggests that TTPE conforms to the specification, but the specification does not match our expectations. I believe #288 resolves this in favour of your https://github.com/w3c/ttml1/issues/194#issuecomment-268532253 - do you agree?

skynavga commented 6 years ago

If I can convince myself that the apply all rules (rather than apply first rule) version of the algorithm does not oversubscribe an E to some R, then I can agree (that it better matches the original intentions and our intuition). However, I would like to revert the language of #288 to the original spec language plus only the following changes (and no other):

  1. change "the following ordered rules, where the first rule satisfied is used and remaining rules are skipped" to "the following rules:";
  2. remove step 5;

Pierre's rewrite of the text of each of steps (1) through (4), while being of good intention, has the disadvantage of looking like a wholesale rewrite, as opposed to a fine surgery. We need to be extremely conservative about changing the text of these fundamental algorithms in TTML1.

skynavga commented 6 years ago

It would also be useful to add a clarifying note just after the last algorithm step that says:

"The [associate region] procedure will, in general, associate multiple regions with a given element."

skynavga commented 6 years ago

I think there is a problem in step (2) with the new all rules apply algorithm. Consider

<div region="B">
  <p region="A">...</p>
</div>

Then the p will be selected into both A and B, since both rules (1) and (2) apply; however, clearly, p is intended to be selected into only A and not B.

[GA] added content to <p> to make it clear that I was abbreviating this example in the first instance, and that <p> is not empty.

palemieux commented 6 years ago

Then the p will be selected into both A and B, since both rules (1) and (2) apply

p will never be displayed because the div is pruned when constructing the ISD for region A, and p is pruned when constructing the ISD for region B.

See https://github.com/w3c/ttml1/issues/194#issuecomment-268848537

skynavga commented 6 years ago

@palemieux not true according to the current "all rules apply" algorithm, since it will associate both A and B with both div and p

skynavga commented 6 years ago

There is another problem with the new algorithm as well: it does not obtain the results desired for https://github.com/w3c/ttml1/issues/194#issuecomment-268512070. Specifically, the anonymous span containing "Some text" is still not associated with any region.

palemieux commented 6 years ago

@palemieux not true according to the current "all rules apply" algorithm, since it will associate both A and B with both div and p

<div region="B"> gets pruned by Step 2 of [construct intermediate document] when evaluating region A, therefore the p is never displayed.

skynavga commented 6 years ago

@palemieux no, that is not true, step 2 (d) states

they are Content elements and aren't associated with region R according to the [associate region] procedure.

and, since the (newly proposed) [associate region] procedure associates both A and B with both div and p, then div is not pruned when processing region A

skynavga commented 6 years ago

Just to be clear, my example above was abbreviated; in particular, I mean the <p> to be non-empty. I will edit the example in place to make this clear.

palemieux commented 6 years ago

@skynavga The proposed algorithm at https://github.com/w3c/ttml1/issues/194#issuecomment-257047647 specifies "where the first assertion satisfied is used and remaining assertions are skipped", so step (1) is satisfied and <div region="B"> is only associated with region B.

skynavga commented 6 years ago

@palemieux except that isn't what you wrote in the PR at [1]

[1] https://github.com/w3c/ttml1/pull/288/files#diff-6036d776aaf698d95dec714264977eb9R7473

palemieux commented 6 years ago

@skynavga Thanks for the catch -- not quite sure what happened. I have updated the PR to match the issue. Apologies for the confusion.

skynavga commented 6 years ago

Thanks for fixing that, as it indeed led to many of the above comments. However, it does not address the following:

  1. the example https://github.com/w3c/ttml1/issues/194#issuecomment-268512070, that is, the anonymous span containing "Some text" is still not associated with a region;

  2. since application of rule (2) can result in early termination, then a non-leaf element may fail to be associated with regions of descendants via rule (3);

In order to address these points, I have below documented what I believe are the semantic invariants (requirements) that apply, followed by two possible implementations.

INVARIANTS

  1. if default region R exists, [default region applies] E is associated with R

  2. if E specifies region R, [specified region of element applies] E is associated with R; otherwise, [in absence of specified region, regions of parent element apply] E is associated with all regions associated with the parent of E;

  3. if a descendant D of E specifies region R, [specified region of descendant element applies] E is associated with R

PROCEDURE 1 (unoptimized)

  1. [default region applies] for every E in any order, E associates with the default region;

  2. [specified region of element applies] for every E in any order, if E specifies a region R, then E associates with R;

  3. [in absence of specified region, regions of parent element apply] for every E in any order, if E does not specify a region, then E associates with every region R associated with parent of E;

  4. [specified region of descendant element applies] for every E in any order, E associates with every region specified by descendant D of E;

PROCEDURE 2 (optimized)

  1. if default region applies, then for every E in pre-order, E associates with the default region; exit

    => satisfies [default region applies]

  2. for every E in pre-order,

    (a) if E specifies region R, then E associates with R; otherwise, E associates with every region R associated with PARENT(E);

    (b) if E is not an anonymous span, E associates with every region specified by descendant D of E.

    => satisfies [specified region of element applies], [in absence of specified region, regions of parent element apply], and [specified region of descendant element applies]

Notes

  1. In Procedure 2, step 2, pre-order traversal is required so that step 2(b) will have already computed the associations with PARENT(E).
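
Procedure 2 could be sketched along the following lines. This is a rough sketch under assumptions, not a reference implementation: the `Element` class, the `anonymous` flag used to model anonymous spans, the unique `label` used as a dictionary key, and the `associate` and `preorder` names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Set


@dataclass(eq=False)
class Element:
    """Hypothetical content element; labels are assumed to be unique."""
    label: str
    region: Optional[str] = None
    anonymous: bool = False
    children: List["Element"] = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def preorder(element: Element) -> Iterator[Element]:
    yield element
    for child in element.children:
        yield from preorder(child)


def associate(root: Element,
              default_region: Optional[str] = None) -> Dict[str, Set[str]]:
    result: Dict[str, Set[str]] = {}
    # Step 1: the default region applies to every element; exit.
    if default_region is not None:
        for e in preorder(root):
            result[e.label] = {default_region}
        return result
    # Step 2: pre-order, so PARENT(E) is always computed before E.
    for e in preorder(root):
        if e.region is not None:            # 2(a), specified region
            regions = {e.region}
        elif e.parent is not None:          # 2(a), regions of the parent
            regions = set(result[e.parent.label])
        else:
            regions = set()
        if not e.anonymous:                 # 2(b), regions of descendants
            regions |= {d.region for d in preorder(e)
                        if d is not e and d.region is not None}
        result[e.label] = regions
    return result


body = Element("body", children=[Element("div", children=[Element("p", children=[
    Element("anon", anonymous=True),   # anonymous span holding "Some text"
    Element("span1", region="r1"),
    Element("span2", region="r2"),
])])])
for label, regions in associate(body).items():
    print(label, sorted(regions))
# prints:
# body ['r1', 'r2']
# div ['r1', 'r2']
# p ['r1', 'r2']
# anon ['r1', 'r2']
# span1 ['r1']
# span2 ['r2']
```

Under these assumptions the anonymous span containing "Some text" inherits both r1 and r2 from its parent p, which is the outcome the invariants are meant to guarantee.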

palemieux commented 6 years ago

the example #194 (comment), that is, the anonymous span containing "Some text" is still not associated with a region;

I do not think that "Some text" should be associated with any region since I do not see a clear use case where this would be desirable.

since application of rule (2) can result in early termination, then a non-leaf element may fail to be associated with regions of descendants via rule (3);

This is clearly the intent of the algorithm as currently specified: ancestor-specified regions take precedence.

css-meeting-bot commented 6 years ago

The Working Group just discussed Ambiguous definition for determination of descendant region identifier. ttml1#194, and agreed to the following resolutions:

The full IRC log of that discussion

<nigel> Topic: Ambiguous definition for determination of descendant region identifier. ttml1#194
<nigel> github: https://github.com/w3c/ttml1/issues/194
<nigel> group: [whiteboard discussion of example in issue confirming shared understanding]
<nigel> Glenn: I've convinced myself that this is no longer an issue.
<nigel> .. In TTML2 multiple regions can be referenced though.
<nigel> Nigel: This issue maybe belongs on TTML2 not TTML1 then.
<nigel> Cyril: It would be useful to add a note though.
<nigel> Glenn: [supports adding a clarifying note]
<nigel> group: Evaluates example using current published spec, and finds a possible ambiguity
<nigel> .. in [associate region] step 1, that it doesn't clarify that a specified region that is not the
<nigel> .. one being looked for causes the algorithm to return false immediately.
<nigel> David: Suggests refactoring the algorithm to cascaded if/then/else instead of numbered
<nigel> .. bullets.
<glenn> https://github.com/w3c/ttml1/issues/194#issuecomment-354492791
<nigel> Tess: If step 1 should exit, then it needs to say so.
<nigel> Pierre: I want to create a pull request based on this, to check it.
<nigel> .. I will close the existing pull request
<nigel> SUMMARY: @palemieux to propose a new pull request clarifying step 1

css-meeting-bot commented 6 years ago

The Working Group just discussed Ambiguous definition for determination of descendant region identifier. ttml1#194, and agreed to the following resolutions:

The full IRC log of that discussion

<nigel> Topic: Ambiguous definition for determination of descendant region identifier. ttml1#194
<nigel> github: https://github.com/w3c/ttml1/issues/194
<nigel> Nigel: See also discussion on #288 (pull request).
<nigel> RESOLUTION: Close issue with no spec change.
<nigel> Pierre: That's the only thing we can do now.
<nigel> Glenn: Close this without prejudice.
<nigel> Pierre: We should defer it - characterise it as to be fixed later.
<nigel> RESOLUTION: (updated) Don't close the issue, close the pull request and Defer the issue to a later edition.